Application developers and business people accessing relational databases need data dictionaries in order to properly load or query a database. The data dictionary provides a source of information about the model for those without model access, including entity/table and attribute/column definitions, datatypes, primary keys, relationships among tables, and so on. The data dictionary also provides data modelers with a useful cross reference that improves modeling productivity.
It is particularly useful for the dictionary to be a filterable/sortable Excel document, but out of the box ERwin, one of the leading data modeling tools, includes a notably inflexible reporting capability. Luckily, it is possible to directly query the ERwin “metamodel”. However, I found the ERwin documentation a bit hard to decipher and not quite accurate. Hopefully this post will save modelers some steps in figuring out how to query the metamodel.
Here are the topics covered:
- ODBC drivers in the ERwin install
- Reporting experience in MS Access, WinSQL, and MS Excel Continue reading
The data integration process is traditionally thought of in three steps: extract, transform, and load (ETL). Putting aside the often-discussed order of their execution, “extract” is pulling data out of a source system, “transform” means validating the source data and converting it to the desired standard (e.g. yards to meters), and load means storing the data at the destination.
An additional step, data “enrichment”, has recently emerged, offering significant improvement in business value of integrated data. Applying it effectively requires a foundation of sound data management practices. Continue reading
The well-publicized problems with healthcare.gov are disturbing, especially when we remember they might result in many continuing without health insurance. But it seemed a step in the right direction when recent a news report differentiated between “front end” and “back end” problems. The back end problems were data issues, like a married applicant with two kids being sent to an insurer’s systems as a man with three wives.
Coincidently, I recently responded to a questionnaire about health care data. I’ve paraphrased the questions and my responses below. Perhaps the views of someone who’s spent a lot of time in the health care engine room might provide some useful perspective. Continue reading
A technique for reporting requirements has emerged as the de facto standard in the business intelligence community. The technique, which emerged in the mid-2000s, is new enough to be as yet unacknowledged by the requirements analysis powers that be. David Loshin describes how it works in this 2007 post:
- Start with a business question about how to monitor a business process using a metric, like “How many widgets have been shipped
by size each week by warehouse?” Continue reading
I recently stumbled upon one of The Martin Agency’s hilarious Geico caveman ads and wondered, rather geekily, why they didn’t do one about data analysis. I think if a caveman suddenly arrived in the 2010s he or she would see parallels between his life and the activities of today’s knowledge worker. When I thought it through, it seemed obvious that knowledge workers need to be more like farmers and less like hunter/gatherers if they want to achieve the full potential of business intelligence.
I hold a strong prejudice that IT paradigms are useful for about 30 years. The PC was dominant from 1980 to 2010, “online” mainframe systems from 1970 to 2000, and so on. If that’s the case then time’s up for Bill Inmon’s data warehousing framework. So far no widely held pattern has emerged to help us envision data management in today’s big data, mobile BI, end-user visualization, predictive analytics world, but at their recent Business Technology conference, Forrester Research took a swing at it by presenting their 2009 “hub and spoke” organizational strategy as a data management vision. Continue reading
Data management professionals have long and sometimes rather Quixotically driven organizations to “get past the spreadsheet culture.” Maybe that’s misguided. The recent furor over a widely read social science paper may show how we can look to scientific peer review for a way to govern data, spreadsheets and all.
Recently, it was found that a key study underpinning debt-reduction as a driver of economic growth based its conclusions on a flawed spreadsheet. As this ArsTechnica article describes, Carmen Reinhart and Kenneth Rogoff’s Growth in a Time of Debt seemingly proved a connection between “high levels of debt and negative average economic growth”. But, per a recent study by Thomas Herndon, Michael Ash, and Robert Pollin, it turns out that the study’s conclusions drew from a Microsoft Excel formula mistake, questionable data exclusions, and non-standard weightings of base data. The ArsTechnica piece finds those conclusions fade to a more ambiguous outcome with errors and apparent biases corrected. Continue reading
Recently I read a thoughtful post
at the PASS Business Analytics Conference site discussing how different the world is now for database professionals. Author Chris Webb focuses on the data science side in this post. His analysis made me think of the challenges and opportunities “big data” serves up to relational database designers.
To me these challenges are fundamental. Big Data and NoSQL bring lots of what we know about data elements, inherent data design, and data management into question. I think considering these elements closely leads to a sensible to-do list for relational database professionals. Continue reading
As important as it is, data modeling has always had a geeky, faintly impractical tinge to some. I’ve seen application development projects proceed with a suboptimal, “good enough”, model. The resulting systems might otherwise be well-architected, but sometimes strange vulnerabilities emerge that track directly to data design flaws.
Recently I saw an example where a “good enough” data design, similar to the one pictured, enabled a significant application bug.
One common theme in recent tectonic shifts in information technology is data management. Analyzing customer responses may require combing through unstructured emails and tweets. Timely analysis of web interactions may demand a big data solution. Deployment of data visualization tools to users may dictate redesign of warehouses and marts. The data architect is a key player in harnessing and capitalizing on new data technologies. Continue reading