Application developers and business people accessing relational databases need data dictionaries in order to properly load or query a database. The data dictionary provides a source of information about the model for those without model access, including entity/table and attribute/column definitions, datatypes, primary keys, relationships among tables, and so on. The data dictionary also provides data modelers with a useful cross reference that improves modeling productivity.
It is particularly useful for the dictionary to be a filterable/sortable Excel document, but out of the box ERwin, one of the leading data modeling tools, includes a notably inflexible reporting capability. Luckily, it is possible to directly query the ERwin “metamodel”. However, I found the ERwin documentation a bit hard to decipher and not quite accurate. Hopefully this post will save modelers some steps in figuring out how to query the metamodel.
The data integration process is traditionally thought of in three steps: extract, transform, and load (ETL). Putting aside the often-discussed order of their execution, “extract” is pulling data out of a source system, “transform” means validating the source data and converting it to the desired standard (e.g. yards to meters), and load means storing the data at the destination.
An additional step, data “enrichment”, has recently emerged, offering significant improvement in business value of integrated data. Applying it effectively requires a foundation of sound data management practices. Continue reading →
The well-publicized problems with healthcare.gov are disturbing, especially when we remember they might result in many continuing without health insurance. But it seemed a step in the right direction when recent a news report differentiated between “front end” and “back end” problems. The back end problems were data issues, like a married applicant with two kids being sent to an insurer’s systems as a man with three wives.
Coincidently, I recently responded to a questionnaire about health care data. I’ve paraphrased the questions and my responses below. Perhaps the views of someone who’s spent a lot of time in the health care engine room might provide some useful perspective. Continue reading →
I recently stumbled upon one of The Martin Agency’s hilarious Geico caveman ads and wondered, rather geekily, why they didn’t do one about data analysis. I think if a caveman suddenly arrived in the 2010s he or she would see parallels between his life and the activities of today’s knowledge worker. When I thought it through, it seemed obvious that knowledge workers need to be more like farmers and less like hunter/gatherers if they want to achieve the full potential of business intelligence.
I hold a strong prejudice that IT paradigms are useful for about 30 years. The PC was dominant from 1980 to 2010, “online” mainframe systems from 1970 to 2000, and so on. If that’s the case then time’s up for Bill Inmon’s data warehousing framework. So far no widely held pattern has emerged to help us envision data management in today’s big data, mobile BI, end-user visualization, predictive analytics world, but at their recent Business Technology conference, Forrester Research took a swing at it by presenting their 2009 “hub and spoke” organizational strategy as a data management vision. Continue reading →
Data management professionals have long and sometimes rather Quixotically driven organizations to “get past the spreadsheet culture.” Maybe that’s misguided. The recent furor over a widely read social science paper may show how we can look to scientific peer review for a way to govern data, spreadsheets and all.
Recently, it was found that a key study underpinning debt-reduction as a driver of economic growth based its conclusions on a flawed spreadsheet. As this ArsTechnica article describes, Carmen Reinhart and Kenneth Rogoff’s Growth in a Time of Debt seemingly proved a connection between “high levels of debt and negative average economic growth”. But, per a recent study by Thomas Herndon, Michael Ash, and Robert Pollin, it turns out that the study’s conclusions drew from a Microsoft Excel formula mistake, questionable data exclusions, and non-standard weightings of base data. The ArsTechnica piece finds those conclusions fade to a more ambiguous outcome with errors and apparent biases corrected. Continue reading →
Recently I read a thoughtful post
at the PASS Business Analytics Conference site discussing how different the world is now for database professionals. Author Chris Webb focuses on the data science side in this post. His analysis made me think of the challenges and opportunities “big data” serves up to relational database designers.
To me these challenges are fundamental. Big Data and NoSQL bring lots of what we know about data elements, inherent data design, and data management into question. I think considering these elements closely leads to a sensible to-do list for relational database professionals. Continue reading →
One common theme in recent tectonic shifts in information technology is data management. Analyzing customer responses may require combing through unstructured emails and tweets. Timely analysis of web interactions may demand a big data solution. Deployment of data visualization tools to users may dictate redesign of warehouses and marts. The data architect is a key player in harnessing and capitalizing on new data technologies. Continue reading →
Recently I was in a conversation about data modeling standards. I confess that I’m not really the standards type. I understand the value of standards and especially how important it is to follow them so others can interpret and use work products. It is just that I prefer to focus on understanding of the principles behind the standards. In general, it seems to me that following standards is trivial for someone who understand the principles, but impossible for someone who doesn’t. But there doesn’t seem to be general understanding of data modeling principles. Continue reading →
When Tom Petty sang, “Hey baby, there ain’t no easy way out” he wasn’t referring to business intelligence (BI) reporting but he might have been. Current generation reporting engines, AKA data visualization or data discovery tools, market their products with statements like these, emphasizing quick development and ease of use:
“The democratization of data is here. In minutes, create an interactive viz (sic) and embed it in your website. Anyone can do it— and it’s free.” (Tableau Products Page)
“Easy yet sophisticated report design empowers your employees to design professional and telling reports in minutes not days” (Windward)
I like these tools, and I do believe that they can provide a leaner, more productive, and more informative approach to BI reporting than some more mature products. However, none is a silver bullet for all data integration and reporting woes. Continue reading →