Logical data modeling is one of my tools of choice in business analysis and requirements definition. That’s not particularly unusual – the BABOK (Business Analysis Body of Knowledge) recognizes the Entity-Relationship Diagram (ERD) as a business analysis tool, and for many organizations it’s a non-optional part of requirements document templates.
In practice, however, data models in requirements packages often include many-to-many relationships. I’ve heard experienced data modelers advocate this practice, and it unfortunately seems consistent with the “just enough, just in time” approach associated with agile culture.
In my experience, unresolved M:M relationships indicate equally unresolved business questions. The result: schedule delays and budget overruns as missed requirements are built back into the design, or the familiar “that’s not what we wanted” reaction during User Acceptance Testing (UAT).
At the very first TDWI Conference, Duane Hufford described a phenomenon he called “embedded data”, now more commonly called “overloaded data”, where two or more concepts are stuffed into a single data field (“Metadata Repositories,” TDWI Conference 1995). He described and portrayed in graphics three types of overloaded data. Almost 20 years later, overloaded data remains rampant but Mr Hufford’s ideas, presented below with updated examples, are unfortunately not widely discussed.
Overloaded data breeds in areas not exposed to sound data management techniques for one reason or another. Big data acquisition typically loads data uncleansed, shifting the burden of unpacking overloaded fields to the receiver (pity the poor data scientist spending 70% of her time acquiring and cleaning data!).
One might refer to non-overloaded data as “atomic”. Beyond making data harder to use, overloaded data requires more code to manage than atomic data (see why in the sections below) so by extension it increases IT costs.
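To make the extra-code point concrete, here is a minimal sketch in Python. The product code format (`RD-XL-2019`, packing color, size, and model year into one field) is a made-up illustration, not one of Mr Hufford's examples, and the names `Product` and `unpack_product_code` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Atomic representation: one concept per field."""
    color: str
    size: str
    model_year: int


def unpack_product_code(code: str) -> Product:
    """Split a hypothetical overloaded code such as 'RD-XL-2019'.

    Every consumer of the overloaded field needs parsing and
    validation logic like this; atomic fields need none of it.
    """
    parts = code.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {code!r}")
    color, size, year = parts
    if not year.isdigit():
        raise ValueError(f"model year is not numeric: {year!r}")
    return Product(color=color, size=size, model_year=int(year))
```

With atomic columns the same record could be read directly into `Product` with no parsing step, which is where the cost difference comes from.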
Here’s a field guide to three different types of overloaded data, associated risks, and how to avoid them:
I had pondered writing a post called “Requirements Decay” about how requirements don’t last forever. In my research I found that such a post, complete with “my” words “requirements decay” and “requirements half-life”, had already been done comprehensively here. In a compact argument underpinned by half-life mathematics, the anonymous author proposes that a requirement isn’t likely to stand unchanged forever and explores the implications.
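As a rough illustration of the half-life idea, the sketch below uses the standard exponential decay formula. It is an assumption that the linked post uses this exact form, and the function name and the 12-month half-life in the usage note are hypothetical.

```python
def fraction_unchanged(elapsed_months: float, half_life_months: float) -> float:
    """Expected fraction of requirements still unchanged after some time.

    Standard half-life decay: the fraction halves every
    `half_life_months`. The half-life value is an input the
    team would have to estimate; none is given in the post.
    """
    if half_life_months <= 0:
        raise ValueError("half_life_months must be positive")
    if elapsed_months < 0:
        raise ValueError("elapsed_months must not be negative")
    return 0.5 ** (elapsed_months / half_life_months)
```

For example, with an assumed half-life of 12 months, `fraction_unchanged(24, 12)` gives 0.25: a quarter of the original requirements would be expected to survive a two-year project untouched.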
For me, requirements decay is an idea that helps us think realistically about project planning and improves our chances of meeting business needs.
Recently, I posted “Interview with a Data Scientist” at my company’s blog site. In it, my friend and colleague Yan Li answers four questions about being a data scientist and what it takes to become one. In my view Yan’s responses provide a bracing reminder that data science is something truly new, but that it rests on universal principles of application development.
Recently there was a great post at Dzone recounting how one “tech savvy startup” moved away from its NoSQL database management system to a relational one. The writer, Matt Butcher, plays out the reasons under these main points:
Application developers and business people accessing relational databases need data dictionaries in order to properly load or query a database. The data dictionary provides a source of information about the model for those without model access, including entity/table and attribute/column definitions, datatypes, primary keys, relationships among tables, and so on. The data dictionary also provides data modelers with a useful cross reference that improves modeling productivity.
It is particularly useful for the dictionary to be a filterable/sortable Excel document, but out of the box ERwin, one of the leading data modeling tools, includes a notably inflexible reporting capability. Luckily, it is possible to directly query the ERwin “metamodel”. However, I found the ERwin documentation a bit hard to decipher and not quite accurate. Hopefully this post will save modelers some steps in figuring out how to query the metamodel.
Asking fact questions in technical interviews is like eating a donut: it feels great at the time but is not so satisfying later.
Let’s say the interview consists of facts like this “softball question”: “What is the default port number for SQL Server?” The linked list of questions is a really good high level study guide for a SQL Server student. If a SQL Server developer candidate answers all correctly, then the interviewer can be confident that the candidate knows a lot about SQL Server.
However, few development jobs require only technical fact knowledge. Typically, developers must apply creativity when working with unclear or poorly expressed requirements under tight schedules. They must be versatile so that they can take on unforeseen roles in case of resignations or transfers of team members. If you make an investment in an individual by hiring her or him, you’ll look for a return in the form of professional development as the individual grows their skills.
So how do you test creativity, versatility, and ability to learn, while still gauging raw technical talent? My method is to ask opinion rather than fact questions.
Recently I read an editorial about job interviews. It was breezy and funny, but not very helpful. Given that millions are out there looking for work, I want to help by giving my perspective on how to “win” the interview.
I do a lot of interviewing, from both sides of the desk. As a consultant I am interviewed by clients. As one of many technical and behavioral interviewers for my employer, I talk with candidates about their skills, goals, and fit with our business.
Of course, winning the interview may not get you the job. An interview is just one part of a many-step process. Getting a job involves showing you have the skills, establishing mutual fit, coming to terms on salary, and standing out versus the competition. This post is only about how to do well in the interview.
Assuming you’re qualified for the job, you can set up a good interview experience by applying the right mental model, preparing well, and interacting effectively during the conversation.
The data integration process is traditionally thought of in three steps: extract, transform, and load (ETL). Putting aside the often-discussed order of their execution, “extract” means pulling data out of a source system, “transform” means validating the source data and converting it to the desired standard (e.g. yards to meters), and “load” means storing the data at the destination.
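The three steps can be sketched in a few lines of Python. This is a toy illustration only: the row layout (`id`, `length_yd`, `length_m`), the in-memory source and destination lists, and the function names are all assumptions made for the example.

```python
YARDS_TO_METERS = 0.9144  # exact by international definition of the yard


def extract(source):
    """Pull rows out of the source system (here, just an in-memory list)."""
    return list(source)


def transform(rows):
    """Validate each row and convert lengths from yards to meters."""
    converted = []
    for row in rows:
        length_yd = row.get("length_yd")
        is_number = isinstance(length_yd, (int, float)) and not isinstance(length_yd, bool)
        if not is_number or length_yd < 0:
            # Validation failure: this sketch simply skips the row.
            continue
        converted.append({"id": row["id"], "length_m": length_yd * YARDS_TO_METERS})
    return converted


def load(rows, destination):
    """Store the transformed rows at the destination."""
    destination.extend(rows)
    return destination


def run_etl(source, destination):
    """Run the three steps in the traditional order."""
    return load(transform(extract(source)), destination)
```

A real pipeline would log or quarantine rejected rows rather than silently dropping them; the skip here only keeps the sketch short.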
An additional step, data “enrichment”, has recently emerged, offering significant improvement in business value of integrated data. Applying it effectively requires a foundation of sound data management practices.
I believe that early, effective big picture diagrams are key to application development project success. According to the old saw, no project succeeds without a catchy acronym. Maybe so, but I’d say no project succeeds without a good big picture diagram. The question: what constitutes a good one? To me good high-level diagrams have four key characteristics: they are simple, precise, expressive, and correct.