I recently found myself in a series of conversations in which I needed to make a case for dimensional data modeling. The discussions involved a group of highly skilled data architects who were surely familiar with dimensional techniques but didn’t see them as the best solution in the case at hand.
I thought it would be easy to find a quick, jargon free summary of best reporting database design principles aimed at a technical audience. There were a number of good summaries (cited at the end of this post), but none pitched just right for this highly-technical-but-outside-the-data-warehouse-world crowd.
I wanted to raise the dimensional model because, for most business reporting scenarios, it not only delivers on reporting needs, but also helps report developers handle changes to those needs as a side effect of the design.
So these are the notes I prepared for the conversation. They helped us all get on the same page, hopefully they will be useful to others: Continue reading →
Data quality doesn’t have to be a train wreck. Increased regulatory scrutiny, NoSQL performance gains, and the needs of data scientists are quietly changing views and approaches toward data quality. The result: a pathway to optimism and data quality improvement.
Here’s how you can get on the new and improved data quality train in each of those three areas: Continue reading →
Obviously, data management is important. Unfortunately, it is not prioritized in most organizations. Those that effectively manage data perform far better than organizations that don’t. Everyone who needs data to do his/her job must drive change to improve data management.
That was the theme of the recent Enterprise Data World (EDWorld) conference this week. This year’s EDWorld event might be the start of a new vitality and influence for the field, marked by introduction of a Leader’s Data Manifesto.
Over the years, data practitioners struggled for recognition and resources within their organizations. In reaction, they often focused on data “train wrecks” that this neglect causes. This year’s conference was no exception. For example: Continue reading →
For complex work, a very simple app requires a very smart user. That point was driven home to me in Tableau Fundamentals class this week. I don’t see that as bad news at all.
Not so long ago I wrote a piece that attempted to inject a bit of reality into the claims then made by some data visualization tool vendors. I cited unexpected challenges that those adopting such tools for their obvious and compelling data presentation abilities might face. The challenges included unexpectedly complex data integration, establishing solid reporting standards and practices, scaling report distribution as demand for the visualizations expands, and the conversion work that can result from version upgrades.
Although a Fundamentals class, the experienced and enthusiastic instructor and the small, intelligent student group combined to make the two days immensely valuable, going far beyond the basics on the program (more on specific lessons learned will appear in an upcoming post). The instructor’s focus on principles rather than recipes drove home this point: in order to effectively use Tableau you have to understand not only how to operate Tableau itself but also the underlying data management, usability, and statistics principles.
Could it be that adopting easy-to-use Tableau in place of, say, SSRS, Cognos, or SAS requires an upgrade in staff knowledge and expertise? Continue reading →
The term “trust” implies absolutes, and that’s a good thing for relationships and art. However, in the business of data management, framing trust in data in true or false terms puts data governance at odds with good practice. A more nuanced view that recognizes the usefulness of not-fully-trusted data can bring vitality and relevance to data governance, and help it drive rather than restrict business results.
The Wikipedia entry — for many a first introduction to data governance — cites Bob Seiner’s definition: “Data governance is the formal execution and enforcement of authority over the management of data and data related assets.” The entry is accurate and useful, but words like “trust”, “financial misstatement”, and “adverse event” lead the reader to focus on the risk management role of governance.
However, the other role of data governance is to help make data available, useful, and understood. That means sometimes making data that’s not fully trusted available and easy to use. Continue reading →
Recently I’ve had the opportunity to dig deeply into the CMMI Data Management Maturity model. Since its release, the DMM model has emerged as the de facto standard data management maturity framework (I’ve listed other frameworks at the end of this post).
I’m deeply impressed by the completeness and polish of the DMM model as a comprehensive catalog of processes required for effective data management. Even after decades in the business the broad scope and business focus of the model changed the way I think about data management.
Over the past year I’ve reviewed what seem like countless plans for enterprise data warehouses. The plans address real problems in the organizations involved: the organization needs better data to recognize trends and react faster to opportunities and challenges; business measures and analyses are unavailable because data in source systems is inconsistent, incomplete, erroneous, or contains current values but no history; and so on.
The plans detail source system data and its integration into a central data hub. But the ones I’m referring to don’t tell how the data will be delivered, or portray a specific vision of how the data is to drive business value. Instead, their business case rests on what I’ll call the “railroad hypothesis”. No one could have predicted how the railroads enabled development of the West, so the improved data infrastructure will create order of magnitude improvements in ability to access, share, and utilize data, from which order of magnitude business benefits will follow.* All too often these plans just build bridges to nowhere. Continue reading →
A quick Google search seems to reveal if you manage People, Process, and Technology you’ve got everything covered. That’s simply not the case. Data is separate and distinct from the things it describes — namely people, processes, and technologies — and organizations must separately and intentionally manage it.
The data management message seems a tough one to deliver effectively. Data management interest groups have hammered at it for years, but a sometimes preachy and jargon laden approach relying on data quality train wreck stories hasn’t generally loosened corporate purse strings. Yes, financial companies’ data-first successes in the 1990s paved the way for the ’00s dot com juggernauts, whose market capitalization stems largely from innovative data management. Yet, we still have huge personal data breaches at some of our most trusted companies, and data scientists spend the bulk of their valuable time acquiring, cleaning, and integrating poorly organized data.
The first steps are often the hardest, so here’s a short, no jargon, big picture guide to getting started with effective data management in three steps:
There’s consensus among data quality experts that, generally speaking data quality is pretty much bad (here, here, and here). Data quality approaches generally focus on profiling, managing, and correcting data after it is already in the system. This makes sense in a data science or warehousing context, which is often where quality problems surface. To quote William McKnight at the first of those sources:
“Data quality is no longer the domain of just the data warehouse. It is accepted as an enterprise responsibility. If we have the tools, experiences, and best practices, why, then, do we continue to struggle with the problem of data quality?”
So if the data quality problem is Garbage In Garbage Out (GIGO), then I would think that it would be easy to find data quality guidelines for app dev, and that those guidelines would be lightweight and helpful to those projects. Based on my research there are few to none such sources (please add them to the comments if you find otherwise).
So, all that said here’s my cut at app dev data quality guidelines by project activity: Continue reading →
As you’ve read on this site and many others, the database world is well into a transition from a relational focus to a focus on non-relational tools. While the relational approach underpins most organizations’ data management cycles, I’d venture to say that all have a big chunk of big data, NoSQL, unstructured data, and more in their five-year plans, and that chunk is what’s getting most of the executive “mind share”, to use the vernacular.
Some are well along the way in their big data learning adventure, but others haven’t started yet. One thing about this IT revolution is that there’s no shortage of highly accessible training options. But several people have complained to me about the sheer quantity of options, not to mention the sheer number of new words the novice needs to learn in order to figure out what the heck big data is.
So here’s a very short list of training options accessible to the IT professional who is a rank big data beginner, starting with a very brief classification of the tools that I hope provides a some context. Continue reading →