Data management professionals have long and sometimes rather Quixotically driven organizations to “get past the spreadsheet culture.” Maybe that’s misguided. The recent furor over a widely read social science paper may show how we can look to scientific peer review for a way to govern data, spreadsheets and all.
Recently, it was found that a key study underpinning debt-reduction as a driver of economic growth based its conclusions on a flawed spreadsheet. As this ArsTechnica article describes, Carmen Reinhart and Kenneth Rogoff’s Growth in a Time of Debt seemingly proved a connection between “high levels of debt and negative average economic growth”. But, per a recent study by Thomas Herndon, Michael Ash, and Robert Pollin, it turns out that the study’s conclusions drew from a Microsoft Excel formula mistake, questionable data exclusions, and non-standard weightings of base data. The ArsTechnica piece finds those conclusions fade to a more ambiguous outcome with errors and apparent biases corrected. Continue reading →
As important as it is, data modeling has always had a geeky, faintly impractical tinge to some. I’ve seen application development projects proceed with a suboptimal, “good enough”, model. The resulting systems might otherwise be well-architected, but sometimes strange vulnerabilities emerge that track directly to data design flaws.
Recently I saw an example where a “good enough” data design, similar to the one pictured, enabled a significant application bug.
In some presentations, I assert that top-down data modeling should result in not only a business-consistent model but also a pretty well normalized model.
One of the basic concepts behind normalization is functional dependency. In layperson’s terms, functional dependency means separating entities from each other and putting attributes into the obviously correct entity. For example, a business person knows that item color doesn’t belong in the order table because it describes the item, not the order. Everyone knows that the order isn’t green! Continue reading →
Recently I was in a conversation about data modeling standards. I confess that I’m not really the standards type. I understand the value of standards and especially how important it is to follow them so others can interpret and use work products. It is just that I prefer to focus on understanding of the principles behind the standards. In general, it seems to me that following standards is trivial for someone who understand the principles, but impossible for someone who doesn’t. But there doesn’t seem to be general understanding of data modeling principles. Continue reading →
Data quality in most large organizations is commonly known to be rather lacking. Most would argue that things haven’t gotten much better since this 2007 Accenture study found that “Managers Say the Majority of Information Obtained for Their Work Is Useless”. To some, quotes like that are shocking, but if you think about how information is processed in most Fortune 1000 sized organizations it is surprising that data available to managers is as good as it is. These slides have been useful in my efforts to explain the persistence of data quality problems in large organizations. Continue reading →
I’ve posted a couple of articles at my company’s blog site that reflect my view on data quality efforts:
Yes, there is a business case for improving data quality, and I’ve got real business value examples. If you look for real money where you anecdotally know there are data quality problems, you’ll likely find it in high costs of data correction and rework, and savings related to business process improvements that reliable data enables.
There are distinct things an organization can do to reap benefits of improved data management and data quality. (1) Get started in the first place, (2) find the tangible benefits, (3) cross the departmental silos that exist in every large organization, and (4) promote sound data management practices.
Who would want to be a national health care administrator? Who would want the responsibility for managing health care and formulating health policy for tens or hundreds of millions of people? It seems obvious that such decisions would rely on quality data. A recent interview impressed upon me how much data managers can learn from a field where data recording millions of separate life and death decisions aggregates to support decisions on the future allocation of health care resources.
The Atlantic, not typically a technical rag, recently presented an article by business and economics editor Megan McArdle on health care data integration entitled “Paging Dr. Luddite”. The article brings to a mass audience an understanding of both the importance and difficulty of data integration, but the title and general anti-healthcare-professional tone seem counterproductive.
I’ve worked with health care data for the past few years, and in a recent conversation I realized it might be valuable to detail some of the complexities of health care data for those who might enter this growing field. Of course these considerations aren’t unique to health care, but they are typical of the challenges that the new health care application developer or analyst might face. Continue reading →
Recently there has been a long, and very interesting, discussion of do-it-yourself versus third-party metadata tools on LinkedIn’s TDWI BI and DW discussion forum (membership required to follow the link). I have followed but haven’t commented, but I suppose I contributed when Information Management kindly published my article on DIY metadata.
The discussion is extremely informative, presenting the views of a variety of knowledgeable professionals in different situations, and describing successful and sometimes not-so-successful efforts to solve the essential metadata challenge: how to document what information is locked up in databases. Continue reading →