Logical data modeling is one of my tools of choice in business analysis and requirements definition. That’s not particularly unusual – the BABOK (Business Analysis Body of Knowledge) recognizes the Entity-Relationship Diagram (ERD) as a business analysis tool, and for many organizations it’s a non-optional part of requirements document templates.
In practice, however, data models in requirements packages often include many-to-many relationships. I’ve heard experienced data modelers advocate this practice, and it unfortunately seems consistent with the “just enough, just in time” approach associated with agile culture.
In my experience unresolved M:M relationships indicate equally unresolved business questions. The result: schedule delays and budget overruns as missed requirements are built back in to the design, or the familiar “that’s not what we wanted” reaction during User Acceptance Testing (UAT).
What’s a Many-to-many Relationship?
An ERD depicts entities, objects, events of interest, and the relationships among them.
Relationships between entities are said to have “cardinality”, meaning how many instances of each entity* participate in the relationship. In the case of employees and dependents, one employee may have many dependents (spouse and children, typically), so there is a one-to-many relationship between the entities Employee and Dependent.
On the other hand, the relationship between Employee and Project might be more complex. If an employee only ever works on one project at a time, then there’s a 1:M relationship between Project and Employee. However, in most organizations at least some employees are shared among many projects, resulting in a M:M relationship. Figure 1 shows the Project and Employee entities, with a relationship labeled “Works on/Team includes”, where:
- An Employee works on many Projects
- A Project team includes many Employees.
The simple diagram shown might be a fragment from an ERD in a business analysis document that includes a requirement something like “the system shall enable assignment of many employees to a project.”
Relational database management systems can’t house M:M relationships, so before converting a data model to a database in Access, SQL Server, Teradata, or other relational DBMS, the data modeler must convert the M:M into something else. It turns out that there’s a straight translation that works perfectly. As shown in Figure 2, the M:M between employee and project is logically equivalent to two one-to-many relationships among employee, project, and a new entity that we’ll rather obviously call Assignment. We call this “resolving” the M:M relationship.
Resolving M:Ms Uncovers Business Rules
The middle entity that resolves a many-to-many is typically called an “association” or “intersection” entity, implying that it is just an index connecting the other two entities. But more often than not resolving a M:M reveals unexpected complexity of great interest to the business.
Using our example, if a requirements document left the employee to project M:M unresolved it would have hidden interesting characteristics of the assignment like start date, end date, and expected hours. In my line of work, billing rate would be another really important data element. Uncovering the Assignment entity by resolving the M:M would enable definition of important business rules, like the relationship between an employee’s skill level with his or her assignment billing rate.
Not Resolving the M:Ms is Risky Business
It is unlikely that an app dev effort where managing projects is a core concern would leave out detailed study of employee assignments. Why, then, would analysts leave out the Assignment entity, and miss the opportunity to identify and define the Assignment attributes in the requirements?
In our example, if requirements analysts had left in the M:M and as a result left out the rule relating billing rates and skill levels, the need for that rule might show up as missing during UAT, disappointing the business customer, delaying delivery, and burning unexpected funds.
In conclusion, just as the ERD helps “describe the concepts relevant to a domain, the relationships between those concepts, and information associated with them”, resolving the ERD’s M:Ms improves accuracy and depth of business requirements, and therefore helps the project deliver the right product on time and within budget.
* The more proper term here would be “entity type” rather than “entity”, where an entity (say, this blog post) would be an instance of the entity type (blog post), but most modelers in the business settings I frequent use the terms entity and instance rather than entity type and entity.
Pingback: Data Modeling Quality | Kevin's Blog