In some presentations, I assert that top-down data modeling should result in not only a business-consistent model but also a pretty well normalized model.
One of the basic concepts behind normalization is functional dependency. In layperson’s terms, functional dependency means separating entities from each other and putting attributes into the obviously correct entity. For example, a business person knows that item color doesn’t belong in the order table because it describes the item, not the order. Everyone knows that the order isn’t green!
If functional dependency “just makes sense”, then the steps to go from an unnormalized list of fields to a normalized relational database, as described here, should also make sense from a business perspective.
Enter the puppy poster: published by the now bygone magazine Database Programming and Design, it was ubiquitous on 1990s IT cubicle walls. It presented each normalization step using business examples from dog breeding. (Some reading today might associate this with less savory aspects of pet breeding and sale. I share the opinion that the poster lacked sensitivity to this question).
Following the steps on the poster, you can see that they do make sense from a business perspective:
First Normal Form: Eliminate Repeating Groups
- The table before taking this step clearly includes attributes of three different business objects, puppies, kennels, and tricks, so from a business point of view we would identify them as separate entities.
Second Normal Form: Eliminate Redundant Data
- Here we have a situation like the example above where we were tempted to call the order green rather than the item. This table tells us about a puppy’s tricks, so it has a key which includes both the trick id and the puppy number. The name of the trick is only about the trick, regardless of which puppy can do it, so clearly from a business perspective we would say that the trick name is an attribute of a trick, regardless of which puppy had learned it.
Third Normal Form: Eliminate Columns Not Dependent on Key
- Just as in the First Normal Form example, we have attributes of two different entities jammed into the same table. The key is puppy number, but the table includes three attributes of kennels. From a business point of view they just don’t belong there.
Fourth Normal Form: Isolate Independent Multiple Relationships
- At this point, the headings start to become more complex, but the business questions remain simple. The example shows a puppy tricks table, with attributes describing where the puppy learned the trick, the puppy’s skill level on this trick, and the costume the puppy wears when doing the trick. But wait – the puppy can do other tricks in that costume, and can surely do this trick in other costumes. Therefore, clearly, costume isn’t a business attribute of a puppy’s trick, so it belongs somewhere else.
Fifth Normal Form: Isolate Semantically Related Multiple Relationships
- Now this name is a real mouthful, but again the initial table doesn’t make sense in business terms for the same reasons the examples above didn’t. The example table shows breeders and breeds in a table keyed by kennel number. There are three attributes of three different business objects crammed into a single table. I think a business person would know immediately that “breeder name of kennel” and “breed name of kennel” sound fishy. Clearly, kennel number is an attribute of kennel, breeder (name, I assume) of breeder, and breed name of breed, so they just don’t go together.
Of course the puppy poster isn’t an exhaustive treatment of normalization; a data model based on business analysis may not always reach fifth normal form. It will however be a “pretty good” normalization, a solid requirements guide, and help avoid green orders in the database design.