The data integration process is traditionally thought of in three steps: extract, transform, and load (ETL). Putting aside the often-discussed order of their execution, “extract” means pulling data out of a source system, “transform” means validating the source data and converting it to the desired standard (e.g., yards to meters), and “load” means storing the data at the destination.
An additional step, data “enrichment”, has recently emerged, offering significant improvement in business value of integrated data. Applying it effectively requires a foundation of sound data management practices.
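To make the three ETL steps concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the CSV source text, the `id` and `length_yd` column names, and the in-memory list standing in for the destination are all hypothetical, not taken from any particular tool. The yards-to-meters conversion is the example mentioned above.

```python
import csv
import io

YARDS_TO_METERS = 0.9144  # exact, by international definition of the yard


def extract(source_text):
    """Extract: pull raw rows out of the source (here, hypothetical CSV text)."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: validate each row and convert yards to meters.

    Rows with a missing, non-numeric, or negative length are skipped.
    """
    clean = []
    for row in rows:
        try:
            yards = float(row["length_yd"])
        except (KeyError, TypeError, ValueError):
            continue  # fails validation: not a usable number
        if yards < 0:
            continue  # fails validation: a length cannot be negative
        clean.append({"id": row["id"],
                      "length_m": round(yards * YARDS_TO_METERS, 4)})
    return clean


def load(rows, destination):
    """Load: store the transformed rows at the destination (here, a list)."""
    destination.extend(rows)
    return len(rows)


# Hypothetical source data: two good rows and two that fail validation.
SOURCE = "id,length_yd\na,100\nb,not-a-number\nc,-5\nd,2.5\n"

warehouse = []
loaded = load(transform(extract(SOURCE)), warehouse)
print(loaded, warehouse)
```

In a real pipeline the source and destination would be systems rather than strings and lists, and rejected rows would normally be logged rather than silently dropped.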
I believe that early, effective big-picture diagrams are key to application development project success. According to the old saw, no project succeeds without a catchy acronym. Maybe so, but I’d say no project succeeds without a good big-picture diagram. The question: what constitutes a good one? To me, good high-level diagrams have four key characteristics: they are simple, precise, expressive, and correct.
I recently stumbled upon one of The Martin Agency’s hilarious Geico caveman ads and wondered, rather geekily, why they didn’t do one about data analysis. I think if a caveman suddenly arrived in the 2010s, he or she would see parallels between his or her life and the activities of today’s knowledge worker. When I thought it through, it seemed obvious that knowledge workers need to be more like farmers and less like hunter-gatherers if they want to achieve the full potential of business intelligence.
I hold a strong prejudice that IT paradigms are useful for about 30 years. The PC was dominant from 1980 to 2010, “online” mainframe systems from 1970 to 2000, and so on. If that’s the case then time’s up for Bill Inmon’s data warehousing framework. So far no widely held pattern has emerged to help us envision data management in today’s big data, mobile BI, end-user visualization, predictive analytics world, but at their recent Business Technology conference, Forrester Research took a swing at it by presenting their 2009 “hub and spoke” organizational strategy as a data management vision.
Recently the BBC posted this video. On first view it is just funny, but watching those dogs learn to drive really reminded me of personal experiences with IT teams making big learning transitions. To represent those real situations let’s consider a fictional team of SQL developers facing the daunting task of deploying a functional Hadoop-based analytics prototype in two months. The video parallels their critical learning success factors: (1) set audacious goals, (2) learn bit by bit, and (3) know your limits.
One common theme in recent tectonic shifts in information technology is data management. Analyzing customer responses may require combing through unstructured emails and tweets. Timely analysis of web interactions may demand a big data solution. Deployment of data visualization tools to users may dictate redesign of warehouses and marts. The data architect is a key player in harnessing and capitalizing on new data technologies.
I’ve posted a couple of articles at my company’s blog site that reflect my view on data quality efforts:
- Yes, there is a business case for improving data quality, and I’ve got real business value examples. If you look for real money where you anecdotally know there are data quality problems, you’ll likely find it in high costs of data correction and rework, and savings related to business process improvements that reliable data enables.
- There are distinct things an organization can do to reap the benefits of improved data management and data quality: (1) get started in the first place, (2) find the tangible benefits, (3) cross the departmental silos that exist in every large organization, and (4) promote sound data management practices.
Who would want to be a national health care administrator? Who would want the responsibility for managing health care and formulating health policy for tens or hundreds of millions of people? It seems obvious that such decisions would rely on quality data. A recent interview impressed upon me how much data managers can learn from a field where data recording millions of separate life and death decisions aggregates to support decisions on the future allocation of health care resources.
“Our goals can only be reached through a vehicle of a plan, in which we must fervently believe, and upon which we must vigorously act. There is no other route to success.” – Pablo Picasso
It is an old story: about 30% of IT application projects succeed, 45% are “challenged,” and the other quarter fail altogether. That’s the consistent result over the years of the Standish Group Study of Project Outcomes. Jorge Dominguez, here, displays a chart of the remarkably similar results since 1994. Not a pretty picture, right? Some question the validity of the Standish studies, but Scott Ambler parallels the Standish story in a recent Dr. Dobb’s column called “Lies, Great Lies, and Software Development Project Plans,” which itemizes the strategies commonly used by IT project managers to “stay out of trouble” when schedule/budget results don’t match initial estimates. For example, “18% change the original schedule to reflect the actual results”.