Tag Archives: Big Data

What is Big Data Creativity and How Do You Get It?

In a recent Smart Data Collective post, Bernard Marr cites creativity as a top big data skill, but what is creativity?

His point is that, because big data applications are often off the beaten IT path, big data professionals must solve “problems that companies don’t even know they have – as their insights highlight bottlenecks or inefficiencies in the production, marketing or delivery processes,” often with “data which does not fit comfortably into tables and charts, such as human speech and writing.” Continue reading

A Field Guide to Overloaded Data

At the very first TDWI Conference, Duane Hufford described a phenomenon he called “embedded data”, now more commonly called “overloaded data”, where two or more concepts are stuffed into a single data field (“Metadata Repositories,” TDWI Conference 1995). He described and portrayed in graphics three types of overloaded data. Almost 20 years later, overloaded data remains rampant but Mr Hufford’s ideas, presented below with updated examples, are unfortunately not widely discussed.

[Note: in March of 2021 I added one further category, bundling. – BL]

Overloaded data breeds in areas not exposed to sound data management techniques, for one reason or another. Big data acquisition typically loads data uncleansed, shifting the burden of unpacking overloaded fields to the receiver (pity the poor data scientist spending 70% of her time acquiring and cleaning data!).

One might refer to non-overloaded data as “atomic”. Beyond making data harder to use, overloaded data requires more code to manage than atomic data (see why in the sections below), so by extension it increases IT costs.
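To make the "more code" point concrete, here is a minimal sketch in Python. The field layout is invented for illustration and is not one of Mr Hufford's examples: it assumes an account code such as "NE-RET-00123" that packs a region, a customer segment, and a sequence number into one string. The names `Account`, `unpack_account_code`, and the lists of known regions and segments are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical overloaded field: region, customer segment, and a sequence
# number packed into one string, e.g. "NE-RET-00123".
KNOWN_REGIONS = {"NE", "SE", "MW", "W"}   # assumed reference values
KNOWN_SEGMENTS = {"RET", "WHL"}           # assumed reference values


@dataclass(frozen=True)
class Account:
    """Atomic form: one concept per field."""
    region: str
    segment: str
    sequence: int


def unpack_account_code(code: str) -> Account:
    """Split the overloaded code into atomic fields, validating each part."""
    parts = code.strip().upper().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts in {code!r}, got {len(parts)}")
    region, segment, sequence = parts
    if region not in KNOWN_REGIONS:
        raise ValueError(f"unknown region {region!r} in {code!r}")
    if segment not in KNOWN_SEGMENTS:
        raise ValueError(f"unknown segment {segment!r} in {code!r}")
    if not sequence.isdecimal():
        raise ValueError(f"non-numeric sequence {sequence!r} in {code!r}")
    return Account(region, segment, int(sequence))


codes = ["NE-RET-00123", "w-whl-7", "NE-RET-00456"]
accounts = [unpack_account_code(c) for c in codes]

# With atomic fields, filtering by region is a plain comparison.
northeast = [a for a in accounts if a.region == "NE"]
print(northeast)
```

Every consumer of the overloaded field needs something like `unpack_account_code`, with its own error handling, before it can ask a simple question such as "which accounts are in the Northeast?". Had the data been stored in atomic form, only the final comparison would be needed.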

Here’s a field guide to three different types of overloaded data, associated risks, and how to avoid them: Continue reading

Three things about “Interview with a Data Scientist”

Recently, I posted “Interview with a Data Scientist” at my company’s blog site. In it, my friend and colleague Yan Li answers four questions about being a data scientist and what it takes to become one. In my view Yan’s responses provide a bracing reminder that data science is something truly new, but that it rests on universal principles of application development. Continue reading

To SQL or to NoSQL?

Recently there was a great post at DZone recounting how one “tech savvy startup” moved away from its NoSQL database management system to a relational one. The writer, Matt Butcher, lays out the reasons under these main points:

  1. Our data is relational
  2. We need better querying
  3. We have access to better resources

Summing up: “The bottom line: choose the right tool.” Continue reading

A New Framework for Data Management?

I hold a strong prejudice that IT paradigms are useful for about 30 years. The PC was dominant from 1980 to 2010, “online” mainframe systems from 1970 to 2000, and so on. If that’s the case then time’s up for Bill Inmon’s data warehousing framework. So far no widely held pattern has emerged to help us envision data management in today’s big data, mobile BI, end-user visualization, predictive analytics world, but at their recent Business Technology conference, Forrester Research took a swing at it by presenting their 2009 “hub and spoke” organizational strategy as a data management vision. Continue reading

Relational DB Pros: The Times They Are A-Changin’

Recently I read a thoughtful post at the PASS Business Analytics Conference site discussing how different the world is now for database professionals. Author Chris Webb focuses on the data science side in this post. His analysis made me think of the challenges and opportunities “big data” serves up to relational database designers.

To me these challenges are fundamental. Big Data and NoSQL bring lots of what we know about data elements, inherent data design, and data management into question. I think considering these elements closely leads to a sensible to-do list for relational database professionals. Continue reading

Skills of the Data Architect

One common theme in recent tectonic shifts in information technology is data management. Analyzing customer responses may require combing through unstructured emails and tweets. Timely analysis of web interactions may demand a big data solution. Deployment of data visualization tools to users may dictate redesign of warehouses and marts. The data architect is a key player in harnessing and capitalizing on new data technologies. Continue reading

Big Data opportunities and NoSQL challenges

As a relational database professional I couldn’t help but feel like something would be lost with the emergence of the new Big Data/NoSQL database management systems (DBMS). After about two years of buzz around the topic, I’m really excited about the emerging possibilities. However, I’m pretty sure we’ll miss the relational model’s strengths in requirements definition and conceptual design. Continue reading

Special considerations in health care data

I’ve worked with health care data for the past few years, and in a recent conversation I realized it might be valuable to detail some of the complexities of health care data for those who might enter this growing field. Of course these considerations aren’t unique to health care, but they are typical of the challenges that the new health care application developer or analyst might face. Continue reading