Tag Archives: Data Science

Data Quality, Evolved

Data quality doesn’t have to be a train wreck. Increased regulatory scrutiny, NoSQL performance gains, and the needs of data scientists are quietly changing views and approaches toward data quality. The result: a pathway to optimism and data quality improvement.

Here’s how you can get on the new and improved data quality train in each of those three areas: Continue reading

Tableau Rollout Across Five Dimensions

Standing up any new analytics tool in an organization is complex, and early on, new adopters of Tableau often struggle to include all the complexities in their plan. This post proposes a mental model in the form of a story of how Tableau might have rolled out in one hypothetical installation to uncover common challenges for new adopters.

Tableau’s marketing lends one to imagine that introducing Tableau is easy: “Fast Analytics”, “Ease of Use”, “Big Data, Any Data” and so on. (here, 3/31/2017). Tableau’s position in Gartner’s Magic Quadrant (referenced on the same page) attests to the huge upside for organizations that successfully deploy Tableau, which I’ve been lucky enough to witness firsthand. Continue reading

Tableau Startup: First Lessons Learned

XAs I mentioned in the February post, I’m new to Tableau, and as the tone of that post implied,enjoying it very much. Tableau is a robust and flexible solution for data delivery. Like Qlikview, which I worked with a while ago, it is supported by outstanding, and free, introductory training and a very active user community.

As I’ve made my first steps in Tableau I’ve been a frequent user community visitor, and generally have gotten the answers I’ve been looking for. However, like any tool there still have been a few surprises. I’ll run down the top few in this post:

  • Measures can have complex logic
  • Big extracts are tricky
  • Changing data sources is really tricky
  • Sorry, there are some things you just can’t do

Hopefully this post helps other novices negotiate those first few steps a bit more easily. Continue reading

No Silver BI Bullet: Tableau Edition (It’s a good thing!)

For complex work, a very simple app requires a very smart user. That point was driven home to me in Tableau Fundamentals class this week. I don’t see that as bad news at all.

Not so long ago I wrote a piece that attempted to inject a bit of reality into the claims then made by some data visualization tool vendors. I cited unexpected challenges that those adopting such tools for their obvious and compelling data presentation abilities might face. The challenges included unexpectedly complex data integration, establishing solid reporting standards and practices, scaling report distribution as demand for the visualizations expands, and the conversion work that can result from version upgrades.

Although a Fundamentals class, the experienced and enthusiastic instructor and the small, intelligent student group combined to make the two days immensely valuable, going far beyond the basics on the program (more on specific lessons learned will appear in an upcoming post). The instructor’s focus on principles rather than recipes drove home this point: in order to effectively use Tableau you have to understand not only how to operate Tableau itself but also the underlying data management, usability, and statistics principles.

Could it be that adopting easy-to-use Tableau in place of, say, SSRS, Cognos, or SAS requires an upgrade in staff knowledge and expertise? Continue reading

What is Big Data Creativity and How Do You Get It?

Thomas EdisonIn a recent Smart Data Collective post, Bernard Marr cites creativity as a top big data skill, but what is creativity?

His point is, since big data applications are often off the beaten IT path, big data professionals must solve “problems that companies don’t even know they have – as their insights highlight bottlenecks or inefficiencies in the production, marketing or delivery processes,” often with “data which does not fit comfortably into tables and charts, such as human speech and writing.” Continue reading

A Field Guide to Overloaded Data

BugAt the very first TDWI Conference, Duane Hufford described a phenomenon he called “embedded data”, now more commonly called “overloaded data”, where two or more concepts are stuffed into a single data field (“Metadata Repositories,” TDWI Conference 1995). He described and portrayed in graphics three types of overloaded data. Almost 20 years later, overloaded data remains rampant but Mr Hufford’s ideas, presented below with updated examples, are unfortunately not widely discussed.

[Note: in March of 2021 I added one further category, bundling. – BL]

Overloaded data breeds in areas not exposed to sound data management techniques for one reason or the other. Big data acquisition typically loads data uncleansed, shifting the burden of unpacking overloaded fields to the receiver (pity the poor data scientist spending 70% of her time acquiring and cleaning data!)

One might refer to non-overloaded data as “atomic”. Beyond making data harder to use, overloaded data requires more code to manage than atomic data (see why in the sections below) so by extension it increases IT costs.

Here’s a field guide to three different types of overloaded data, associated risks, and how to avoid them: Continue reading

Three things about “Interview with a Data Scientist”

Chemistry-labRecently, I posted “Interview with a Data Scientist” at my company’s blog site. In it, my friend and colleague Yan Li answers four questions about being a data scientist and what it takes to become one. In my view Yan’s responses provide a bracing reminder that data science is something truly new, but that it rests on universal principles of application development. Continue reading