There’s consensus among data quality experts that, generally speaking data quality is pretty much bad (here, here, and here). Data quality approaches generally focus on profiling, managing, and correcting data after it is already in the system. This makes sense in a data science or warehousing context, which is often where quality problems surface. To quote William McKnight at the first of those sources:
“Data quality is no longer the domain of just the data warehouse. It is accepted as an enterprise responsibility. If we have the tools, experiences, and best practices, why, then, do we continue to struggle with the problem of data quality?”
So if the data quality problem is Garbage In Garbage Out (GIGO), then I would think that it would be easy to find data quality guidelines for app dev, and that those guidelines would be lightweight and helpful to those projects. Based on my research there are few to none such sources (please add them to the comments if you find otherwise).
So, all that said here’s my cut at app dev data quality guidelines by project activity: Continue reading →
Logical data modeling is one of my tools of choice in business analysis and requirements definition. That’s not particularly unusual – the BABOK (Business Analysis Body of Knowledge) recognizes the Entity-Relationship Diagram (ERD) as a business analysis tool, and for many organizations it’s a non-optional part of requirements document templates.
In practice, however, data models in requirements packages often include many-to-many relationships. I’ve heard experienced data modelers advocate this practice, and it unfortunately seems consistent with the “just enough, just in time” approach associated with agile culture.
In my experience unresolved M:M relationships indicate equally unresolved business questions. The result: schedule delays and budget overruns as missed requirements are built back in to the design, or the familiar “that’s not what we wanted” reaction during User Acceptance Testing (UAT). Continue reading →
I had pondered writing a post called “Requirements Decay” about how requirements don’t last forever. In my research I found that such a post, complete with “my” words “requirements decay” and “requirements half-life”, had already been done comprehensively here. In a compact argument underpinned by half-life mathematics, the anonymous author proposes that a requirement isn’t likely to stand unchanged forever and explores the implications.
For me, requirements decay is an idea that helps us think realistically about project planning and improves our chances of meeting business needs. Continue reading →
A technique for reporting requirements has emerged as the de facto standard in the business intelligence community. The technique, which emerged in the mid-2000s, is new enough to be as yet unacknowledged by the requirements analysis powers that be. David Loshin describes how it works in this 2007 post:
Start with a business question about how to monitor a business process using a metric, like “How many widgets have been shipped by size each week by warehouse?” Continue reading →
In some presentations, I assert that top-down data modeling should result in not only a business-consistent model but also a pretty well normalized model.
One of the basic concepts behind normalization is functional dependency. In layperson’s terms, functional dependency means separating entities from each other and putting attributes into the obviously correct entity. For example, a business person knows that item color doesn’t belong in the order table because it describes the item, not the order. Everyone knows that the order isn’t green! Continue reading →
Recently I was in a conversation about data modeling standards. I confess that I’m not really the standards type. I understand the value of standards and especially how important it is to follow them so others can interpret and use work products. It is just that I prefer to focus on understanding of the principles behind the standards. In general, it seems to me that following standards is trivial for someone who understand the principles, but impossible for someone who doesn’t. But there doesn’t seem to be general understanding of data modeling principles. Continue reading →
In my experience, some BI projects ultimately finish as a success, but exceed budget and schedule targets and fall short of functional goals along the way. On projects like this, somewhere in the midst of report development, things get sticky and tasks fall behind schedule as the team runs into unexpected complexities. Continue reading →
As a relational database professional I couldn’t help but feel like something would be lost with the emergence of the new Big Data/NoSQL database management systems (DBMS). After about two years of buzz around the topic, I’m really excited about the emerging possibilities. However, I’m pretty sure we’ll miss the relational model’s strengths in requirements definition and conceptual design. Continue reading →
I’m a data modeler, so I enjoyed Jonathon Geiger’s recent article entitled “Why Does Data Modeling Take So Long”. But why does he say it like it’s a bad thing?
Mr. Geiger’s bottom line is exactly right: “Most of the time spent developing data models is consumed developing or clarifying the requirements and business rules and ensuring that the data structure can be populated by the existing data sources.” On the projects he describes, no one took time before modeling to determine available data sources and identify business entities of interest, relationships among them, and attributes that describe them before database design started, so the data modeler had to do it.