As important as it is, data modeling has always had a geeky, faintly impractical tinge to some. I’ve seen application development projects proceed with a suboptimal, “good enough”, model. The resulting systems might otherwise be well-architected, but sometimes strange vulnerabilities emerge that track directly to data design flaws.
Recently I saw an example where a “good enough” data design, similar to the one pictured, enabled a significant application bug.
One common theme in recent tectonic shifts in information technology is data management. Analyzing customer responses may require combing through unstructured emails and tweets. Timely analysis of web interactions may demand a big data solution. Deployment of data visualization tools to users may dictate redesign of warehouses and marts. The data architect is a key player in harnessing and capitalizing on new data technologies. Continue reading →
In some presentations, I assert that top-down data modeling should result in not only a business-consistent model but also a pretty well normalized model.
One of the basic concepts behind normalization is functional dependency. In layperson’s terms, functional dependency means separating entities from each other and putting attributes into the obviously correct entity. For example, a business person knows that item color doesn’t belong in the order table because it describes the item, not the order. Everyone knows that the order isn’t green! Continue reading →
Recently I was in a conversation about data modeling standards. I confess that I’m not really the standards type. I understand the value of standards and especially how important it is to follow them so others can interpret and use work products. It is just that I prefer to focus on understanding of the principles behind the standards. In general, it seems to me that following standards is trivial for someone who understand the principles, but impossible for someone who doesn’t. But there doesn’t seem to be general understanding of data modeling principles. Continue reading →
As a relational database professional I couldn’t help but feel like something would be lost with the emergence of the new Big Data/NoSQL database management systems (DBMS). After about two years of buzz around the topic, I’m really excited about the emerging possibilities. However, I’m pretty sure we’ll miss the relational model’s strengths in requirements definition and conceptual design. Continue reading →
QlikTech’s QlikView reporting and analysis tool is among a new class of Business Intelligence (BI) software tools. As Ben Harden reported in a recent blog post, BI vendors like SAP, Microsoft, and IBM have traditionally sold “to the IT enterprise, but companies like QlikTech and Tableau are targeting the business and bypassing IT. Their tools are quicker to stand up, more intuitive and don’t need the configuration, support, and hardware that the bigger players require.”
A Quick Overview
At first look QlikView is fairly accessible to those experienced with BI tools. A “.qvw” QlikView file contains three classes of user-facing components: a script-based data integration language that runs when the user requests a “reload”, a data modeling component that looks deceptively like a relational data modeling tool, and a familiar array of data visualizations: graphics, charts, lists, etc.
In the past I’ve never understood what people really mean they say “think outside the box” but Jim Harris, in a recent OCDQ blog post, helped me figure it out.
Mr. Harris ends with this provocative line: “the bottom line is Google and Facebook have socialized data in order to capitalize data as a true corporate asset.” The post starts with a cold war analogy and proceeds to describe how Facebook and Google have made big money as “internet advertising agencies:” offering free services with which users (like us) serve up personal data in return for use of the service, then selling advertising space based on our data (hopefully anonymized).
I’m a data modeler, so I enjoyed Jonathon Geiger’s recent article entitled “Why Does Data Modeling Take So Long”. But why does he say it like it’s a bad thing?
Mr. Geiger’s bottom line is exactly right: “Most of the time spent developing data models is consumed developing or clarifying the requirements and business rules and ensuring that the data structure can be populated by the existing data sources.” On the projects he describes, no one took time before modeling to determine available data sources and identify business entities of interest, relationships among them, and attributes that describe them before database design started, so the data modeler had to do it.
I’ve often thought that conceptual data modeling was an underused tool in the arsenal available to requirements analysts, and in a recent conversation I found that many were surprised that it would be used in the requirements phase at all. Checking the Business Analysis Body of Knowledge (BABOK) I found data modeling listed among the tools available to requirements analysts to “to describe the concepts relevant to a domain, the relationships between those concepts, and information associated with them.” There’s also Steve Hoberman’s excellent book on the topic, Data Modeling for the Business, an introduction to data modeling aimed at a business audience. Continue reading →
Here are some quick notes for those looking to run the Metadata prototype:
The prototype metadata database includes SQL Server 2008 data definition language and data manipulation language (DDL and DML) needed to create the database and populate it with tables and columns from Microsoft’s AdventureWorksDW sample database. It also includes a sample requirements spreadsheet and source-to-target map, and SSIS jobs to load the spreadsheets to corresponding metadata tables. These define fictional requirements and mappings to populate the AdventureWorksDW FACTInternetSales table from tables in the AdventureWorks sample database.