There’s consensus among data quality experts that, generally speaking data quality is pretty much bad (here, here, and here). Data quality approaches generally focus on profiling, managing, and correcting data after it is already in the system. This makes sense in a data science or warehousing context, which is often where quality problems surface. To quote William McKnight at the first of those sources:
“Data quality is no longer the domain of just the data warehouse. It is accepted as an enterprise responsibility. If we have the tools, experiences, and best practices, why, then, do we continue to struggle with the problem of data quality?”
So if the data quality problem is Garbage In Garbage Out (GIGO), then I would think that it would be easy to find data quality guidelines for app dev, and that those guidelines would be lightweight and helpful to those projects. Based on my research there are few to none such sources (please add them to the comments if you find otherwise).
So, all that said here’s my cut at app dev data quality guidelines by project activity: Continue reading →
Logical data modeling is one of my tools of choice in business analysis and requirements definition. That’s not particularly unusual – the BABOK (Business Analysis Body of Knowledge) recognizes the Entity-Relationship Diagram (ERD) as a business analysis tool, and for many organizations it’s a non-optional part of requirements document templates.
In practice, however, data models in requirements packages often include many-to-many relationships. I’ve heard experienced data modelers advocate this practice, and it unfortunately seems consistent with the “just enough, just in time” approach associated with agile culture.
In my experience unresolved M:M relationships indicate equally unresolved business questions. The result: schedule delays and budget overruns as missed requirements are built back in to the design, or the familiar “that’s not what we wanted” reaction during User Acceptance Testing (UAT). Continue reading →
I recently stumbled upon one of The Martin Agency’s hilarious Geico caveman ads and wondered, rather geekily, why they didn’t do one about data analysis. I think if a caveman suddenly arrived in the 2010s he or she would see parallels between his life and the activities of today’s knowledge worker. When I thought it through, it seemed obvious that knowledge workers need to be more like farmers and less like hunter/gatherers if they want to achieve the full potential of business intelligence.
Recently I was in a conversation about data modeling standards. I confess that I’m not really the standards type. I understand the value of standards and especially how important it is to follow them so others can interpret and use work products. It is just that I prefer to focus on understanding of the principles behind the standards. In general, it seems to me that following standards is trivial for someone who understand the principles, but impossible for someone who doesn’t. But there doesn’t seem to be general understanding of data modeling principles. Continue reading →
In the past I’ve never understood what people really mean they say “think outside the box” but Jim Harris, in a recent OCDQ blog post, helped me figure it out.
Mr. Harris ends with this provocative line: “the bottom line is Google and Facebook have socialized data in order to capitalize data as a true corporate asset.” The post starts with a cold war analogy and proceeds to describe how Facebook and Google have made big money as “internet advertising agencies:” offering free services with which users (like us) serve up personal data in return for use of the service, then selling advertising space based on our data (hopefully anonymized).
I’m a data modeler, so I enjoyed Jonathon Geiger’s recent article entitled “Why Does Data Modeling Take So Long”. But why does he say it like it’s a bad thing?
Mr. Geiger’s bottom line is exactly right: “Most of the time spent developing data models is consumed developing or clarifying the requirements and business rules and ensuring that the data structure can be populated by the existing data sources.” On the projects he describes, no one took time before modeling to determine available data sources and identify business entities of interest, relationships among them, and attributes that describe them before database design started, so the data modeler had to do it.
One of the key skills needed in today’s IT shop is communication, and one of the best ways to improve ability to communicate is to write blog posts and articles.
In spite of “IT guy” stereotypes, communication and analytical thinking about business are among the most important skills in application development. Developers, analysts, and managers require ability to interact effectively with business people, to conceptualize solutions that match business needs, critically evaluate those solutions, and effectively make the case for one of them. Of course this is true of the overall project business case, but more importantly it applies to the daily “IT guy” to business person conversations that happen throughout analysis, design, development, and testing. Continue reading →
Recently my friend Mark Hudson posted about the inappropriateness of the term “sprint” for an agile project phase, preferring the cycling term “interval.” That post really struck a chord with me.
As a rugby union fan and former wing/fullback I’ve always thought the whole rugby analogy was wrong. Agile development is continuous and fluid, yet the agile originators chose the word “scrum” for its daily standup meetings. In rugby union a scrum is a set play resulting from a minor penalty, like offside in American football or a foot fault in tennis. If you like the rugby analogy the right term would have been “ruck,” which is kind of like a scrum but part of the continuous run of play (in the other kind of rugby, called rugby league, the scrum has devolved into an almost meaningless stylized ritual – which I guess happens on some agile projects). Continue reading →
I’ve often thought that conceptual data modeling was an underused tool in the arsenal available to requirements analysts, and in a recent conversation I found that many were surprised that it would be used in the requirements phase at all. Checking the Business Analysis Body of Knowledge (BABOK) I found data modeling listed among the tools available to requirements analysts to “to describe the concepts relevant to a domain, the relationships between those concepts, and information associated with them.” There’s also Steve Hoberman’s excellent book on the topic, Data Modeling for the Business, an introduction to data modeling aimed at a business audience. Continue reading →
Here are some quick notes for those looking to run the Metadata prototype:
The prototype metadata database includes SQL Server 2008 data definition language and data manipulation language (DDL and DML) needed to create the database and populate it with tables and columns from Microsoft’s AdventureWorksDW sample database. It also includes a sample requirements spreadsheet and source-to-target map, and SSIS jobs to load the spreadsheets to corresponding metadata tables. These define fictional requirements and mappings to populate the AdventureWorksDW FACTInternetSales table from tables in the AdventureWorks sample database.