I hold a strong prejudice that IT paradigms are useful for about 30 years. The PC was dominant from 1980 to 2010, “online” mainframe systems from 1970 to 2000, and so on. If that’s the case then time’s up for Bill Inmon’s data warehousing framework. So far no widely held pattern has emerged to help us envision data management in today’s big data, mobile BI, end-user visualization, predictive analytics world, but at their recent Business Technology conference, Forrester Research took a swing at it by presenting their 2009 “hub and spoke” organizational strategy as a data management vision.
[I should say at the outset that the diagrams here are not Forrester’s, and that this post is based on my notes from the BT conference. For definitive info on the hub and spoke approach please go to their site.]
Working with reporting databases in the early 1980s, Bill Inmon discovered that reporting and operational systems have very different usage patterns, and that the benefits of duplicating operational data in a separate reporting database far outweighed the redundancy cost. This was a revolutionary idea at a time when most practitioners looked forward to a day when relational technology would “eliminate data duplication”.
Mr. Inmon developed this insight into a four-sector data management model consisting of operational data sources; an “atomic” database, later called the data warehouse, containing full historical detail and optimized for storage and archival; a “departmental” sector containing datamarts where data drawn from the atomic database serves specific business needs; and finally a presentation layer that includes reporting and analysis.
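To make the four sectors concrete, here is a minimal Python sketch of data flowing through them. It is an illustration under my own assumptions, not Mr. Inmon's specification: the order records, field names, and function names (load_atomic, build_sales_mart, present) are all hypothetical.

```python
from collections import defaultdict

# Sector 1: operational sources (hypothetical order records).
operational_orders = [
    {"order_id": 1, "region": "east", "amount": 120.0, "day": "2013-05-01"},
    {"order_id": 2, "region": "west", "amount": 80.0, "day": "2013-05-01"},
    {"order_id": 3, "region": "east", "amount": 50.0, "day": "2013-05-02"},
]


def load_atomic(atomic, source_rows):
    """Sector 2: append full detail to the atomic store; nothing is overwritten."""
    atomic.extend(dict(row) for row in source_rows)
    return atomic


def build_sales_mart(atomic):
    """Sector 3: a departmental datamart, here revenue summarized by region."""
    totals = defaultdict(float)
    for row in atomic:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def present(mart):
    """Sector 4: presentation, a plain-text report sorted by region."""
    return [f"{region}: {total:.2f}" for region, total in sorted(mart.items())]


atomic_store = load_atomic([], operational_orders)
sales_mart = build_sales_mart(atomic_store)
report = present(sales_mart)
```

Each stage reads only from the one before it, which is the property that makes the model easy to reason about and also what makes data slow to reach the presentation layer.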
Of course there have been many variations and controversies over three decades (most notably the “Inmon vs Kimball” brouhaha) but virtually all data management approaches adopted Mr. Inmon’s core “operational vs. decision support” concept.
Today, traditional data warehousing frameworks are under pressure. The Forrester team turned the Inmon model on its side and called it a “layer cake” architecture, where data works its way up successive time-consuming processes before presentation for business purposes.
According to Forrester analysts, the layered data architecture prevents IT from meeting today’s business needs, which center on real time data capture, analysis, and utilization. They propose instead a hub and spoke architecture where a central data pool supports a variety of different business uses on the end of the spokes.
The Forrester team also builds from a single core concept. In the Inmon model, the “extract, transform, and load” (ETL) interface from operational systems to the atomic database conforms all data to the organizational standard, but in the Forrester model integration is applied at the spokes: each business application applies its own semantics to data extracted from the central store. So the core concept is that data semantics vary by application. The result, as presented by the Forrester analysts, is “hyperflexibility”, presumably because the business isn’t delayed by IT’s semantic preprocessing on the way in.
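The difference between conforming data on the way in and applying semantics at the spokes can be sketched in a few lines of Python. This is my own illustration, not code from Forrester or Mr. Inmon: the event records, field names, and the finance_spoke and marketing_spoke functions are hypothetical.

```python
# Hypothetical raw events from two sources that disagree on field names and units.
raw_events = [
    {"source": "crm", "cust": "A17", "revenue_usd": 100.0},
    {"source": "web", "customer_id": "A17", "revenue_cents": 2500},
]


def conform(event):
    """Layer-cake style: apply one organizational standard before storage."""
    if event["source"] == "crm":
        return {"customer_id": event["cust"], "revenue_usd": event["revenue_usd"]}
    return {
        "customer_id": event["customer_id"],
        "revenue_usd": event["revenue_cents"] / 100,
    }


# Inmon-style load: every record is conformed on the way in.
warehouse = [conform(e) for e in raw_events]

# Hub-and-spoke style: the pool keeps records exactly as received.
data_pool = list(raw_events)


def finance_spoke(pool):
    """One spoke: interprets the raw records as revenue in dollars."""
    total = 0.0
    for e in pool:
        if "revenue_usd" in e:
            total += e["revenue_usd"]
        else:
            total += e["revenue_cents"] / 100
    return total


def marketing_spoke(pool):
    """Another spoke: cares only about event counts per source."""
    counts = {}
    for e in pool:
        counts[e["source"]] = counts.get(e["source"], 0) + 1
    return counts
```

In the first style the unit conversion lives in one place and every consumer waits for it. In the second, each spoke carries only the interpretation it needs, at the cost of repeating that logic wherever it is needed.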
Critiquing the Forrester Model
A cynic may quickly find flaws in the hub and spoke architecture: isn’t the data pool just another version of the atomic database? Not at all. At the risk of repeating myself, a core element of the atomic database was its role as the rationalized, fully detailed, historical record of data of interest to the organization. The data pool is certainly detailed and retains history, but in the hub and spoke world data is not processed on the way in.
Furthermore, even though external data was listed as part of it, the data warehouse model focused on internally generated data, while the hub and spoke vision focuses on data from external sources, for example social media and the like. Forrester’s Michele Goetz described the data pool as a hub for data governance, but the emphasis on external data means that data governance in the hub and spoke architecture surely focuses more on regulatory and privacy compliance, availability, and delivery SLAs, and less on validity, correctness, and consistency.
Ms. Goetz’s Forrester colleague Boris Evelson channeled the thoughts of many IT professionals who attended one of his presentations when he said, while discussing the challenges of a Hadoop architecture, that there are layer cakes inside the data pool. The hub and spoke pattern emphasizes data delivery for business purposes on business terms, but the Forrester researchers didn’t provide much input on underlying technical challenges. How exactly does a more responsive data architecture avoid replacing a legacy layer cake with a big data layer cake? The Inmon model pointed the way for IT to solve a business problem, providing benefits all around. The hub and spoke model as expressed at the conference sidesteps the IT questions.
Is This the New Shared Vision?
Even so, a great virtue of the hub and spoke model is that it is possible.
I don’t know of any organization that fully realized Inmon’s vision. After 30 years it’s safe to say that integrating an organization’s data resource into a single coherent structure was just too hard, even allowing for all kinds of complexity within the atomic database. There are millions of successful tactical datamarts that have proven the wisdom of Inmon’s core idea, but for most organizations the strategic data warehouse remains elusive.
On the other hand, organizations can define the data pool their own way, figure out SLAs and governance structures that make sense for different types of data, and use that flexible framework as a foundation for cutting edge data-enabled applications.
Kudos to Forrester for delivering a useful model for data management. Will it be the new shared data management vision? Only time will tell.