In my experience, some BI projects ultimately finish as a success, but exceed budget and schedule targets and fall short of functional goals along the way. On projects like this, somewhere in the midst of report development, things get sticky and tasks fall behind schedule as the team runs into unexpected complexities.
There are many reasons that reporting projects can fall behind. They can fall into the traps common to any application development project. Widely cited studies (two linked here) say two thirds of application projects suffer due to poor planning, lack of executive support, and inadequate requirements definition. But I’ve seen a different pattern on reporting projects that can cause them to fall short of schedule, budget, and functional targets even when proper planning, support, and requirements are in place.
Well, maybe not the requirements part. The challenge with requirements for reporting is that accurate estimates depend on in-depth understanding of source data. Quite often, data design in the source system is somehow adequate for the operational purpose but difficult to use for reporting. Planners typically set reporting project schedules when high level requirements are fleshed out but significant research on source data remains to be done. Then, the unknowns in those source data details come back to bite the project; the schedule starts slipping and the team finds that some objectives are unreasonable given limitations in source data.
Here are just a few common examples of unexpected source data problems that I’ve experienced on reporting projects:
- Source system data retention differences: Operational systems like payroll, purchasing, inventory, or accounts receivable tend to maintain just sufficient historical data to support the time horizon required for the operational process. If a reporting system needs, say, 10 years history, but its three source systems only maintain records for 7, 5, and 3 years respectively, then the need for 10 years history will go unmet.
- Inconsistencies in historical data: Even if historical data in a given source system goes back X years, there’s no guarantee that it is consistent year to year. Changes over time in organizational policy or government regulation can mean that the same source data element can change its meaning over time. Think about those footnotes in annual reports explaining why a given measure jumps X percent in a given quarter.
- Codes and qualifiers that aren’t mutually exclusive: Sometimes codes and qualifiers in operational systems become complex over time in a way that makes them difficult to use in reporting. I was auditing a database for a large and prosperous company and it turned out that in one circumstance a given column could be the name of a nurse and in another case the same column held the age of a pet. In another situation a “transaction type” had values that likewise meant different things in different contexts, but also sometimes values “1” or “3” meant the same thing and sometimes they didn’t.
- Summary data recorded as the system of record that is inconsistent with its supporting detail: I’ve seen cases where summary data is sent to the general ledger, but reporting requires additional data beyond the GL transaction. However, when you add up the amounts of the detailed transactions they don’t balance to the summary total.
As I recall these types of problems aren’t usually bad enough to derail most reporting projects, but they are almost always unexpected and left out of original estimates. The problem is they require unexpected labor on the part of the reporting team: research to understand the extent and nature of the problem, then development and testing of code to overcome the problem, then explanation of the time lost and, sometimes, functionality that can’t be delivered.
The reader with data management experience will recognize that data integration is a discipline unto itself that is best part of a data warehousing or datamart development effort, and these problems are best managed as part of data integration. I couldn’t agree more, but reporting projects unsupported by warehouses and marts will persist. It is important that reporting estimates reflect the risk of the unknown in source databases, or even better defer the estimate until source data is understood in all its gory detail.