Modern data architectures, by enabling analytics insights, promise order-of-magnitude value gains across many business sectors. Not so long ago, big data presented a daunting challenge: although tools were plentiful, we struggled to conceptualize the architecture and organization within which to capitalize on them. Now solid frameworks have emerged. This post reviews two promising models for modern data architecture and discusses two cultural values critical to their successful adoption: the drive to solve business challenges and the drive for universal data correctness.
Modern Data Architectures
A while back I wrote about one emerging model: Forrester’s “hub and spoke” concept. The hub and spoke helped us “find the handle” on important conversations about governance, integration of external data, and how data lakes differ in concept from data warehouses. Once we found that handle, however, the model’s challenges became apparent, and new frameworks evolved.
One such framework is the Data Fabric, introduced by Robert Abate at the 2008 DAMA International Symposium. The Data Fabric, or Data Marketplace, would serve as a central source of data but also as a common focus for metadata and master data management, data quality, data governance, and even business process control (by virtue of serving as a controlled source for the data driving business processes). In data management terms, the data fabric concept is still being fleshed out, but vendors have jumped into the breach, offering a “converged platform that supports the storage, processing, analysis and management of disparate data” across in-house and cloud data sources.
Mike Ferguson, presenting at April’s Enterprise Data World, added data management flesh onto the data fabric’s bones, offering broad brush business and data management steps to “Conquer Complexity, Govern Data And Accelerate Time to Value”. Key recommendations: establish a shared business vocabulary, institute a data catalog, align organization and initiatives with business strategy, and “facilitate a change in culture”.
So now we’re armed with solid visions for integrating modern data tools. But just as in past business computing revolutions, culture is key. In the 80s, those who resisted structured programming (yes, there were many) struggled with CASE tools, and in the 90s batch COBOL programmers struggled to transition to client-server computing. More recently, “big data culture” has been noted as key to reaping the benefits of modern data analytics and data science. In my experience, organizations that focus on technologies at the expense of culture rarely transition to new tools successfully.
To me, the two key elements of that culture are (1) drive to solve business challenges and (2) drive for universal data correctness.
Drive to Solve Business Challenges
Mr. Abate describes a flipped information triangle to illustrate his six-step process for answering business questions with data. Rather than the data warehouse’s six-month, internal-data-focused process of ETL followed by reporting, he suggests a lean six-to-eight-week data discovery and analytics effort that single-mindedly drives to business insights using both internal and external data sources.
Mr. Abate is an energetic, charismatic guy with a laser focus on business success. For those who would emulate his success, these personal qualities are as important as the process. In order to deliver value, analysts must understand relevant business challenges and care about solving them. They have to rapidly identify, cleanse, and integrate source data, imaginatively combine and visualize results, then present them to a skeptical audience of busy senior managers.
Success in modern data architecture demands data professionals who can build effective relationships with senior managers, understand business challenges, efficiently wrangle diverse, poor quality source data, attractively visualize results, and communicate imaginative business solutions.
Drive for Universal Data Correctness
To Mr. Abate, the six-to-eight-week effort described above is a pilot that “creates the ‘interest’ in the business to invest” in a modern data management capability. Mike Ferguson weaves the elements that make up this capability into a plan for harnessing data for business gain. The plan is all-encompassing, but for our purposes he cites “a common vocabulary and lineage” as “the foundation for smart data management, governance, and creating business value.” In short, the common vocabulary provided by a well-managed and widely-used data catalog is the basis for capitalizing on data and analytics.
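To make the idea concrete, a catalog entry typically ties a vocabulary term to its agreed definition, an accountable steward, upstream lineage, and quality rules. The sketch below is purely illustrative; the field names and the example term are my own assumptions, not taken from any particular catalog product:

```python
# A hypothetical, minimal data-catalog entry showing the kinds of metadata
# a shared business vocabulary ties together: definition, owner, lineage,
# and quality rules. Field names here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    term: str                                      # business vocabulary term
    definition: str                                # agreed business meaning
    steward: str                                   # accountable owner
    sources: list = field(default_factory=list)    # upstream lineage
    quality_rules: list = field(default_factory=list)

entry = CatalogEntry(
    term="policy_creation_date",
    definition="Date the policy record was first created in the system of record",
    steward="Policy Operations",
    sources=["policy_admin.policies.created_at"],
    quality_rules=["must parse as a valid calendar date"],
)
```

Even a skeleton like this makes the governance conversation concrete: every term has one definition, one owner, and a traceable lineage back to its source.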
Beyond the data catalog, requirements such as the Basel Accords and GDPR mean “a requirement for data governance, single source of truth and [advanced toolsets that] utilize new technologies to trace information flow and provide for lineage / impact analysis of information – a key for compliance…. Consider a solution such as Global IDs – [an] automated toolset that applies both rules based and machine learning/AI to achieve 80+% confidence in data organization, domain categorization, and semantic mapping.”*
But while those technologies make their way into general use, data analysts still spend most of their time cleaning and integrating data. Often, critical data is missing or obviously incorrect. It’s easy to blame the data source when this happens, but in my experience most development projects operate under stress and sacrifice priorities along the way. Downstream data quality is rarely on the priority list, so we end up with situations like these: a retail company where “over one million records contained a home telephone number of ‘000000000’ and addresses contained flight numbers,” and an insurance company whose customer records carried 99/99/99 in the policy creation date field.
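Failures like the placeholder phone numbers and sentinel dates above are exactly what simple, explicit validation rules can catch before records flow downstream. The sketch below is a hypothetical example; the field names and the specific rules are assumptions for illustration:

```python
# Hypothetical data-quality checks for the two failure modes described above:
# placeholder phone numbers and sentinel dates. Field names are assumptions.
from datetime import datetime

def check_record(record):
    """Return a list of data-quality issues found in a customer record."""
    issues = []

    phone = record.get("home_phone", "")
    # A value like "000000000" (all one repeated digit) is a classic sign the
    # source system required the field but nobody cared about its content.
    if phone and len(set(phone)) == 1:
        issues.append("placeholder phone number")

    created = record.get("policy_created", "")
    # Sentinel dates such as 99/99/99 fail strict parsing.
    try:
        datetime.strptime(created, "%m/%d/%y")
    except ValueError:
        issues.append("invalid policy creation date")

    return issues

bad = check_record({"home_phone": "000000000", "policy_created": "99/99/99"})
good = check_record({"home_phone": "3125551234", "policy_created": "04/15/19"})
```

The point is not the specific rules but where they run: applied at the source, before the data lands downstream, checks like these are cheap; applied months later by an analyst, the same problems consume most of a project’s time.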
Universal correctness means that data accurately reflects business facts, and is therefore correct from any perspective and for any use. In the policy date example above, perhaps the customer records in question came from a sales mailing list for which policy creation date was irrelevant, and the sales app dev team didn’t think much of it. A team that valued universal data correctness, however, would either get the policy creation date right or drop the field if it’s not needed.
Sure, that’s just one data point, but if universal correctness is instilled as a value, then over time data quality in general improves, the time analysts spend cleaning and integrating data shrinks, and the time devoted to solving business problems grows. Even improving from 70% wrangling and 30% analysis to a 50/50 split would be a huge productivity gain.
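The arithmetic behind that claim is worth spelling out: moving from a 70/30 wrangling-to-analysis split to 50/50 raises analysis time from 30% to 50% of the week, roughly a two-thirds increase in time spent on business problems. A quick check, assuming a 40-hour week:

```python
# Quick arithmetic behind the productivity claim: if wrangling drops from
# 70% to 50% of an analyst's time, analysis time rises from 30% to 50%.
hours_per_week = 40
analysis_before = hours_per_week * 0.30   # 12 hours of analysis per week
analysis_after = hours_per_week * 0.50    # 20 hours of analysis per week
gain = (analysis_after - analysis_before) / analysis_before
print(f"{gain:.0%} more analysis time")   # prints "67% more analysis time"
```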
In the same way we now prioritize data security, organizations with effective data management capabilities use training, project reviews, and compensation incentives to raise data consciousness. If, as Mr. Ferguson says, we’re building “the ‘logical’ data lake as an organized collection of raw, in-progress and trusted data in multiple data stores,” then it’s a world where any data destination can be a data source, downstream data quality determines analysis efficiency, and universal data correctness is a core value.
Thinkers like Mike Ferguson and Robert Abate have shown the way ahead. It’s up to us to foster the drive to solve business challenges and the drive for universal data correctness as core values of a successful modern data capability.
*Robert Abate, email conversation