Tag Archives: Data Science

No Silver BI Bullet: Tableau Edition (It’s a good thing!)

For complex work, a very simple app requires a very smart user. That point was driven home to me in Tableau Fundamentals class this week. I don’t see that as bad news at all.

Not so long ago I wrote a piece that attempted to inject a bit of reality into the claims then made by some data visualization tool vendors. I cited unexpected challenges that those adopting such tools for their obvious and compelling data presentation abilities might face. The challenges included unexpectedly complex data integration, establishing solid reporting standards and practices, scaling report distribution as demand for the visualizations expands, and the conversion work that can result from version upgrades.

Although a Fundamentals class, the experienced and enthusiastic instructor and the small, intelligent student group combined to make the two days immensely valuable, going far beyond the basics on the program (more on specific lessons learned will appear in an upcoming post). The instructor’s focus on principles rather than recipes drove home this point: in order to effectively use Tableau you have to understand not only how to operate Tableau itself but also the underlying data management, usability, and statistics principles.

Could it be that adopting easy-to-use Tableau in place of, say, SSRS, Cognos, or SAS requires an upgrade in staff knowledge and expertise? Continue reading

What is Big Data Creativity and How Do You Get It?

Thomas EdisonIn a recent Smart Data Collective post, Bernard Marr cites creativity as a top big data skill, but what is creativity?

His point is, since big data applications are often off the beaten IT path, big data professionals must solve “problems that companies don’t even know they have – as their insights highlight bottlenecks or inefficiencies in the production, marketing or delivery processes,” often with “data which does not fit comfortably into tables and charts, such as human speech and writing.” Continue reading

A Field Guide to Overloaded Data

BugAt the very first TDWI Conference, Duane Hufford described a phenomenon he called “embedded data”, now more commonly called “overloaded data”, where two or more concepts are stuffed into a single data field (“Metadata Repositories,” TDWI Conference 1995). He described and portrayed in graphics three types of overloaded data. Almost 20 years later, overloaded data remains rampant but Mr Hufford’s ideas, presented below with updated examples, are unfortunately not widely discussed.

Overloaded data breeds in areas not exposed to sound data management techniques for one reason or the other. Big data acquisition typically loads data uncleansed, shifting the burden of unpacking overloaded fields to the receiver (pity the poor data scientist spending 70% of her time acquiring and cleaning data!)

One might refer to non-overloaded data as “atomic”. Beyond making data harder to use, overloaded data requires more code to manage than atomic data (see why in the sections below) so by extension it increases IT costs.

Here’s a field guide to three different types of overloaded data, associated risks, and how to avoid them: Continue reading

Three things about “Interview with a Data Scientist”

Chemistry-labRecently, I posted “Interview with a Data Scientist” at my company’s blog site. In it, my friend and colleague Yan Li answers four questions about being a data scientist and what it takes to become one. In my view Yan’s responses provide a bracing reminder that data science is something truly new, but that it rests on universal principles of application development. Continue reading