Even now the business case for a metadata tool seems unclear and difficult to quantify, but it isn’t impossible.
We in the data management business tend to devalue solutions that don’t clearly derive from a coherent top-level view. We seek applications defined from an enterprise architecture, database designs from an enterprise data model, and data elements consistent with the enterprise business glossary.
However, sometimes tactical gains make sense even when the big picture is missing, and tactical successes of metadata for analytics teams can raise consciousness that helps set the stage for evolving data management improvements.
A metadata tool is a software solution that
- Extracts data elements, along with their definition and usage information, from diverse database management, reporting, and app dev environments
- Integrate that data into a single metadata database
- Enable pre-packaged and ad hoc reporting and analysis from the metadata database
This page presents a comparison of some leading metadata tool examples.
There are two use cases critical to the metadata tool business case from an analytics perspective: data lineage and impact analysis.
Data lineage provides information on how a data element was generated. For example, an executive dashboard of the fictitious sports retailer The Gut Check might present a figure for total sales. A report from the metadata tool might tell us that total sales is the sum of sales for the east and west regions, where the east region total is the sum of individual sales from the corporate point-of-sale system, while the west region total includes data loaded from the POS system but also totals keyed manually for the recently merged Fabulous Sports Palace, whose POS system is yet to be integrated.
Solid data lineage enables impact analysis, the second key metadata capability for analytics. The Gut Check might have a complex formula for expected customer retention based on data from a loyalty card application and recent customer purchases. Over its portfolio of 500 Tableau reports that formula might be used in 30 or so reports, but which ones? If The Gut Check wants to revise the way it calculates customer retention, then data analysts have to comb through each of the reports to see if they present customer retention or use it in calculations.
(OK, yes, important calculations used in many reports should be done in the data layer, not in the presentation layer. However, in many organizations data analysts on the business side have read-only database rights, and the long turnaround time for IT projects means that many key business metrics are calculated at the reporting level. At least in Tableau, reporting tool architecture means that the same calculation code must be rewritten for each report, resulting in risks of subtle differences among copies and the challenge of finding each instance if a change is needed, which we’re discussing here.)
So, we base the business case on these assumptions:
- Reporting and metadata functions:
- The metadata application assembles a metadata database from the reporting tool and its data sources. A typical scenario might include Tableau workbooks and data sources; SQL Server, Oracle, or Teradata; Excel, and various NoSQL databases like Mongo DB or Cassandra, for example.
- Metadata reporting supports data lineage and impact analysis.
- Key stakeholders from report consumer groups participate in defining data elements of interest
- Rudimentary “out of the box” metadata reports are good enough to support data lineage and impact analysis needs.
- Team and report inventory:
- Team of 10 data analysts supporting a user community of 2000 named users
- Report inventory includes 500 Tableau workbooks supported by 500 Tableau data sources (assume most data sources support a single workbook, and the number of data sources that support multiple workbooks is offset by data sources provided to advanced users for ad hoc query).
- It takes 20 analyst-hours to develop and test the average data source/workbook pair, and analysts spend two thirds of their time on tasks directly related to report development.
- One analyst hour costs $200 fully loaded (salary/benefits or billing rate plus office space and other facilities)
With these assumptions, we can estimate opportunities created and data analyst time saved by a metadata tool:
- Data Lineage Inquiries: Assume one user per week has a question about derivation of a measure or dimension, and that it takes one analyst-hour to answer each question:
- Total data lineage inquiry cost per year is 52 analyst hours times $200, or $10,400 per year
- Opportunity cost of data lineage inquiries is $10,400 divided by 20 analyst hours per report, divided by $200 per analyst hour, or 2.6 new reports that can’t be developed due to analyst time lost
- Impact Analysis: Assume five times a year analysts must alter calculation of a key measure or dimension used in an average of 10 reports.
- Identification of affected reports, assuming it involves review of 30 workbooks at ½ hour each, costs $3000 each time, or $15,000 for the five annual revisions of important measures
- Opportunity cost of manual impact analysis is $15,000 divided by 20 analyst hours per report, divided by $200 per analyst hour, or 3.8 new reports that can’t be developed due to analyst time lost
So, with these assumptions, addition of a metadata tool to an analytics team increases productivity by about a percent (0.85%), adding back 128 analyst hours to the annual development workload of 15,000 hours. By this logic, the team has about $25,600 per year to spend on a metadata tool (128 hours times the $200 per analyst-hour).
Is that enough for a convincing business case? Based on an informal conversation with a leading tool vendor staffer, and speaking very roughly, cost for a tool to support this team would be somewhere around $50,000. At that price it isn’t a slam dunk, but it can work if you can:
- Make the financial case in a way that clearly shows our year benefits after initial installation
- Successfully negotiate with IT about who supports the tool, where to house it (cloud versus in-house, dedicated server versus existing, etc), and so on
- Successfully present the less tangible benefits, including growth in the community of analysts using the tool, development of a metadata portal for analytics users, potential presentation of key measures and related metadata to customers and suppliers, and so on.
While well-organized metadata provides an order of magnitude shift in business efficiency and effectiveness, a well-organized business case grounded in tangible near term benefits helps make that order of magnitude shift possible.