A while back I wrote the post A Field Guide to Overloaded Data, which publicized the work of Duane Hufford, who examined different types of overloaded data during the 1990s. Over the years his classifications of overloaded data effectively categorized data anomalies I encountered in the wild.
That is until recently, when a colleague encountered an array of file names in a single SQL Server column. This instance didn’t fit into the three categories detailed in the earlier post, so I’m presenting it here. I’ve also added it to the the original post.
Definition: Bundled data is a situation where a single column in a table contains an array of values. Continue reading →
“Business process reengineering is the act of recreating a core business process with the goal of improving product output, quality, or reducing costs.”* Recently I’ve perused articles on business process reengineering and have been surprised to find that they share a lack of emphasis on data definition.
By establishing a shared business vocabulary, identifying and describing business-critical entities and events, and applying the defined entities and events in process and system design, BPR teams can ensure an efficient redesigned process that works smoothly from end to end.
In spite of data concerns making up two of the seven key BPR principles (“Merging data collection and processing units” and “Shared databases to interconnect dispersed departments”), articles on the topic tend to lump these concerns into general Information Technology topics, without acknowledging the need for business driven data definition and management. For example, this post stresses the need for “more sources of data and enhanced connectedness to information”. This one recounts a famous Ford BPR example where a new database was central to the solution. Many, like this one, cite “shared databases” as a core principle. However, none details the business leadership and participation necessary to define a common data foundation across a reengineered business process. Continue reading →
Sometimes success seems like a data analytics team’s worst enemy. A few successful visualizations packaged up into a dashboard by a small skunkworks team can generate interest such that a year later the team has published scores of mission critical dashboards. As their use spreads throughout the organization, and as features expand to meet the needs of an expanding user base, the dashboards can slow down and data refreshes fail as they exceed database and analytics tool time and resource limits.
There are steps teams can take to deal with such slowdowns. Analytics tool vendors typically offer efficiency guides, like this one, that help resolve dashboard response time issues. A frequent recommendation is for the dashboard to use summary tables rather than full detail, reducing the amount of data that the dashboard has to parse as the user waits for a viz to render.*
Summary tables also help resolve data refresh timeouts, but their long term success for the team depends on the foundation on which they are built and how they are organized. The most obvious approach is to build custom summaries serving each dashboard. While report-specific tables stand out as a quick win, analysis shows they are a suboptimal solution because they tend to (1) reduce ability to respond to requirements evolution, and (2) make metrics in different dashboards less consistent. Continue reading →
It’s not unusual for talented teams of business analysts to find themselves maintaining significant inventories of Tableau dashboards. In addition to sound development practices, following two key principles in data source design help these teams spend less time in maintenance and focus more on building new visualizations: publishing Tableau data sources separately from workbooks and waiting until the last opportunity to join dimension and fact data.
Imagine a business team — let’s call it Marketing Analytics — with read-only access to a Hadoop store or an enterprise data warehouse. They gain approval for Tableau licenses and Tableau Server publication rights for five tech-savvy data analysts. After a few initial successes with some impactful visualizations, the team gathers steam. After a while the team finds itself supporting scores of published workbooks serving a few hundred managers and executives. In spite of generally sound practices, Marketing Analytics struggles to maintain consistency from one Tableau workbook to another.
Of course no one would do that on purpose, but I as a consultant over many years I’ve often seen it. A vendor fulfills a contract to the letter, which unfortunately allows them to deliver required reports in various, sometimes changing, formats with suspect data quality. The customer company absorbs these costs, leaning on the data analyst to update PowerPoint decks on schedule before the next monthly management meeting in spite of the extra programming work.
These contracts have been for various goods and services, but almost every business contract today is also a contract for data. If a regional gas company hires a vendor to inspect residential lines, then I suspect it wants reports showing inspections conducted and results; a healthcare firm that sends nurses on house calls needs data detailing call schedules and results; and so on.
Companies that supply goods or provide services often don’t feature data management as a core competency, and the quality of their reporting often doesn’t match the quality of their goods or services. Someone in the customer organization has to code around every addition or omission of an expected Excel column, every “N/A” in a numeric field, and every unexpected change from imperial to metric units. Continue reading →
Data quality improvements follow specific, clear leadership from the top. Project leaders count data quality among project goals when senior management encourages them to do so with unequivocal incentives, a common business vocabulary, shared understanding of data quality principles, and general agreement on the objects of interest to the business and their key characteristics.
Poor data quality costs businesses about “$15 million per year in losses, according to Gartner.” AsTendü Yoğurtçuputs it, “artificial intelligence (AI) and machine learning algorithms are only as effective as the data they use.” Data scientistsunderstandthe difficulties well, as they spend over 70% of their time in data prep.
Recent studies report that data entry typos are the largest source of poor data quality (here and here). My experience says otherwise. From what I’ve seen, operational data is generally good, and data errors only appear when data changes context. In this post I’ll detail why data quality is management’s responsibility, and why data quality will remain poor until leadership makes it a priority.Continue reading →
It’s been a truism that data is a resource, but to prove it you just have to follow the money. As the illustration shows, the vast majority of corporate market value draws from intangible assets. Just as money is an abstraction that represents wealth, data is an abstraction that represents these intangible assets.
It’s year three after initial rollout of the Leader’s Data Manifesto (LDM). Since then, many widely publicized events have highlighted the value of data and metadata, and the importance of sound data management (here, here, and here). Recently at Enterprise Data World, John Ladley, Danette McGilvray, James Price, and Tom Redman presented this year’s LDM update. They reintroduced the Manifesto, recounted events of the past year, discussed strategy for the coming year, and issued a call to action for data professionals. Continue reading →
“At least 84 percent of consumers across all industries say their experiences using digital tools and services fall short of expectations.”* That quote headed a recent article by David Roe on the role of data integration in digital workplace apps. However, the opening quote reflects the pervasive dearth of integrated data among the companies most of us frequent.
We’ve all experienced the effects. Last week I was in a fender bender. Due to a mixup I didn’t have my insurance card with me, so I called the insurance company to get the info. They had no record of me associated with my car. It turned out that my car is insured under my wife’s name, hers under mine. Although I’ve been their customer for 25 years, and was driving my own car, they couldn’t give me insurance info. Sure, they were following good security practices. But I’m not letting them off the hook. Continue reading →
What is Data Quality anyway? If you are a data professional, I’m sure someone from outside our field has asked you that question, and if you’re like me you’ve fallen into the trap of answering in data-speak.
To my listener, I’d guess that the experience was similar to having a customer service rep who has just turned down his simple request justify it by describing byzantine company policies.
There’s a ton of great writing available on data quality, and I in no way mean to disparage it or its value in the field. But in that writing I’ve yet to find a concise and compelling definition that’s useful to non-data professionals. I’ll review one or two prevailing definitions and then offer one that could help us unlock real data quality improvements. Continue reading →
Modern data architectures, by enabling data analytics insights, promise to drive order of magnitude value gains across many business sectors (here, here, and here). Not so long ago, big data presented a daunting challenge. Although tools were plentiful, we struggled to conceptualize the architecture and organization within which to capitalize on those tools. Now solid frameworks have emerged. This post reviews two promising models for modern data architecture, and discusses two key cultural values critical to their successful adoption: drive to solve business challenges and drive for universal data correctness. Continue reading →