I recently found myself in a series of conversations in which I needed to make a case for dimensional data modeling. The discussions involved a group of highly skilled data architects who were surely familiar with dimensional techniques but didn’t see them as the best solution in the case at hand.
I thought it would be easy to find a quick, jargon free summary of best reporting database design principles aimed at a technical audience. There were a number of good summaries (cited at the end of this post), but none pitched just right for this highly-technical-but-outside-the-data-warehouse-world crowd.
I wanted to raise the dimensional model because, for most business reporting scenarios, it not only delivers on reporting needs, but also helps report developers handle changes to those needs as a side effect of the design.
So these are the notes I prepared for the conversation. They helped us all get on the same page, hopefully they will be useful to others: Continue reading →
Even though it happens annually, teams building new visualizations often forget to think about the effects of turning over from one year to another.
In today’s fast paced, Agile world, requirements for even the most critical dashboards and visualizations tend to evolve, and development often proceeds iteratively from a scratchpad sketch through successively more detailed versions to release of a “1.0” production version. Organized analytics teams evolve dashboards within a process framework that include checkpoints ensuring standards are met for security, reliability, usability, and so on.
A reporting team can build a revolutionary analytics capability enabling unprecedented visibility into operations, and then, if year turnover isn’t included in requirements, experience embarrassing errors and usability challenges in the January after initial deployment. In effect, the system experiences its own Y2.xK crisis, not too different from the expected Y2K crisis 16 years ago. Continue reading →
It’s fashionable today to talk about the risks of authoritarianism in the political sphere. I’m not going to speculate on that, but such talk got me thinking about the same tendencies among IT project leaders. What is an authoritarian personality? (Yes, that’s actually a thing.) Is it truly antithetical to a healthy project? If so, how can you screen for it in hiring?
Recently, ArsTechnica ran an article that offers a survey of research on authoritarian personalities conducted since the 1940s. The bottom line for us is that those with authoritarian tendencies more often Continue reading →
I’ve written about groupthink-related quality challenges on Agile projects, and the architect’s role in preventing groupthink from degrading quality. I’ve seen other risks related to the cohesion and potential insularity of successful Agile teams, and the architect is also well positioned to help prevent these: a tendency to neglect setting up and documenting repeatable processes, and a similar tendency not to share of knowledge and lessons learned outside the Agile team. Continue reading →
There’s an unfortunate and rather rude saying about assumptions that I’ve found popular among IT folks I’ve worked with. I say unfortunate because, to me, assumptions that are recognized early and handled the right way are a key to successful projects. Technical players who use assumptions well can help set projects on the right path long before they go astray.
Sometimes on waterfall and hybrid projects technical players are asked to estimate work early, before requirements are complete. My instinctive reaction is not to provide an ungrounded estimate, but that’s not helpful. The way to handle this uncomfortable uncertainty is to fill out the unknowns with assumptions: detailed, realistic statements that provide grounding for your estimate. Continue reading →
There’s consensus among data quality experts that, generally speaking data quality is pretty much bad (here, here, and here). Data quality approaches generally focus on profiling, managing, and correcting data after it is already in the system. This makes sense in a data science or warehousing context, which is often where quality problems surface. To quote William McKnight at the first of those sources:
“Data quality is no longer the domain of just the data warehouse. It is accepted as an enterprise responsibility. If we have the tools, experiences, and best practices, why, then, do we continue to struggle with the problem of data quality?”
So if the data quality problem is Garbage In Garbage Out (GIGO), then I would think that it would be easy to find data quality guidelines for app dev, and that those guidelines would be lightweight and helpful to those projects. Based on my research there are few to none such sources (please add them to the comments if you find otherwise).
So, all that said here’s my cut at app dev data quality guidelines by project activity: Continue reading →
Logical data modeling is one of my tools of choice in business analysis and requirements definition. That’s not particularly unusual – the BABOK (Business Analysis Body of Knowledge) recognizes the Entity-Relationship Diagram (ERD) as a business analysis tool, and for many organizations it’s a non-optional part of requirements document templates.
In practice, however, data models in requirements packages often include many-to-many relationships. I’ve heard experienced data modelers advocate this practice, and it unfortunately seems consistent with the “just enough, just in time” approach associated with agile culture.
In my experience unresolved M:M relationships indicate equally unresolved business questions. The result: schedule delays and budget overruns as missed requirements are built back in to the design, or the familiar “that’s not what we wanted” reaction during User Acceptance Testing (UAT). Continue reading →
At the very first TDWI Conference, Duane Hufford described a phenomenon he called “embedded data”, now more commonly called “overloaded data”, where two or more concepts are stuffed into a single data field (“Metadata Repositories,” TDWI Conference 1995). He described and portrayed in graphics three types of overloaded data. Almost 20 years later, overloaded data remains rampant but Mr Hufford’s ideas, presented below with updated examples, are unfortunately not widely discussed.
Overloaded data breeds in areas not exposed to sound data management techniques for one reason or the other. Big data acquisition typically loads data uncleansed, shifting the burden of unpacking overloaded fields to the receiver (pity the poor data scientist spending 70% of her time acquiring and cleaning data!)
One might refer to non-overloaded data as “atomic”. Beyond making data harder to use, overloaded data requires more code to manage than atomic data (see why in the sections below) so by extension it increases IT costs.
Here’s a field guide to three different types of overloaded data, associated risks, and how to avoid them: Continue reading →
I had pondered writing a post called “Requirements Decay” about how requirements don’t last forever. In my research I found that such a post, complete with “my” words “requirements decay” and “requirements half-life”, had already been done comprehensively here. In a compact argument underpinned by half-life mathematics, the anonymous author proposes that a requirement isn’t likely to stand unchanged forever and explores the implications.
For me, requirements decay is an idea that helps us think realistically about project planning and improves our chances of meeting business needs. Continue reading →