Obviously, data management is important. Unfortunately, it is not prioritized in most organizations. Those that effectively manage data perform far better than organizations that don’t. Everyone who needs data to do his/her job must drive change to improve data management.
That was the theme of the recent Enterprise Data World (EDWorld) conference this week. This year’s EDWorld event might be the start of a new vitality and influence for the field, marked by introduction of a Leader’s Data Manifesto.
Over the years, data practitioners struggled for recognition and resources within their organizations. In reaction, they often focused on data “train wrecks” that this neglect causes. This year’s conference was no exception. For example: Continue reading →
Even though it happens annually, teams building new visualizations often forget to think about the effects of turning over from one year to another.
In today’s fast paced, Agile world, requirements for even the most critical dashboards and visualizations tend to evolve, and development often proceeds iteratively from a scratchpad sketch through successively more detailed versions to release of a “1.0” production version. Organized analytics teams evolve dashboards within a process framework that include checkpoints ensuring standards are met for security, reliability, usability, and so on.
A reporting team can build a revolutionary analytics capability enabling unprecedented visibility into operations, and then, if year turnover isn’t included in requirements, experience embarrassing errors and usability challenges in the January after initial deployment. In effect, the system experiences its own Y2.xK crisis, not too different from the expected Y2K crisis 16 years ago. Continue reading →
The term “trust” implies absolutes, and that’s a good thing for relationships and art. However, in the business of data management, framing trust in data in true or false terms puts data governance at odds with good practice. A more nuanced view that recognizes the usefulness of not-fully-trusted data can bring vitality and relevance to data governance, and help it drive rather than restrict business results.
The Wikipedia entry — for many a first introduction to data governance — cites Bob Seiner’s definition: “Data governance is the formal execution and enforcement of authority over the management of data and data related assets.” The entry is accurate and useful, but words like “trust”, “financial misstatement”, and “adverse event” lead the reader to focus on the risk management role of governance.
However, the other role of data governance is to help make data available, useful, and understood. That means sometimes making data that’s not fully trusted available and easy to use. Continue reading →
Recently I’ve had the opportunity to dig deeply into the CMMI Data Management Maturity model. Since its release, the DMM model has emerged as the de facto standard data management maturity framework (I’ve listed other frameworks at the end of this post).
I’m deeply impressed by the completeness and polish of the DMM model as a comprehensive catalog of processes required for effective data management. Even after decades in the business the broad scope and business focus of the model changed the way I think about data management.
Over the past year I’ve reviewed what seem like countless plans for enterprise data warehouses. The plans address real problems in the organizations involved: the organization needs better data to recognize trends and react faster to opportunities and challenges; business measures and analyses are unavailable because data in source systems is inconsistent, incomplete, erroneous, or contains current values but no history; and so on.
The plans detail source system data and its integration into a central data hub. But the ones I’m referring to don’t tell how the data will be delivered, or portray a specific vision of how the data is to drive business value. Instead, their business case rests on what I’ll call the “railroad hypothesis”. No one could have predicted how the railroads enabled development of the West, so the improved data infrastructure will create order of magnitude improvements in ability to access, share, and utilize data, from which order of magnitude business benefits will follow.* All too often these plans just build bridges to nowhere. Continue reading →
There’s an unfortunate and rather rude saying about assumptions that I’ve found popular among IT folks I’ve worked with. I say unfortunate because, to me, assumptions that are recognized early and handled the right way are a key to successful projects. Technical players who use assumptions well can help set projects on the right path long before they go astray.
Sometimes on waterfall and hybrid projects technical players are asked to estimate work early, before requirements are complete. My instinctive reaction is not to provide an ungrounded estimate, but that’s not helpful. The way to handle this uncomfortable uncertainty is to fill out the unknowns with assumptions: detailed, realistic statements that provide grounding for your estimate. Continue reading →
A quick Google search seems to reveal if you manage People, Process, and Technology you’ve got everything covered. That’s simply not the case. Data is separate and distinct from the things it describes — namely people, processes, and technologies — and organizations must separately and intentionally manage it.
The data management message seems a tough one to deliver effectively. Data management interest groups have hammered at it for years, but a sometimes preachy and jargon laden approach relying on data quality train wreck stories hasn’t generally loosened corporate purse strings. Yes, financial companies’ data-first successes in the 1990s paved the way for the ’00s dot com juggernauts, whose market capitalization stems largely from innovative data management. Yet, we still have huge personal data breaches at some of our most trusted companies, and data scientists spend the bulk of their valuable time acquiring, cleaning, and integrating poorly organized data.
The first steps are often the hardest, so here’s a short, no jargon, big picture guide to getting started with effective data management in three steps:
I had pondered writing a post called “Requirements Decay” about how requirements don’t last forever. In my research I found that such a post, complete with “my” words “requirements decay” and “requirements half-life”, had already been done comprehensively here. In a compact argument underpinned by half-life mathematics, the anonymous author proposes that a requirement isn’t likely to stand unchanged forever and explores the implications.
For me, requirements decay is an idea that helps us think realistically about project planning and improves our chances of meeting business needs. Continue reading →
The data integration process is traditionally thought of in three steps: extract, transform, and load (ETL). Putting aside the often-discussed order of their execution, “extract” is pulling data out of a source system, “transform” means validating the source data and converting it to the desired standard (e.g. yards to meters), and load means storing the data at the destination.
An additional step, data “enrichment”, has recently emerged, offering significant improvement in business value of integrated data. Applying it effectively requires a foundation of sound data management practices. Continue reading →