Sometimes success seems like a data analytics team’s worst enemy. A few successful visualizations packaged up into a dashboard by a small skunkworks team can generate interest such that a year later the team has published scores of mission critical dashboards. As their use spreads throughout the organization, and as features expand to meet the needs of an expanding user base, the dashboards can slow down and data refreshes fail as they exceed database and analytics tool time and resource limits.
There are steps teams can take to deal with such slowdowns. Analytics tool vendors typically offer efficiency guides, like this one, that help resolve dashboard response time issues. A frequent recommendation is for the dashboard to use summary tables rather than full detail, reducing the amount of data that the dashboard has to parse as the user waits for a viz to render.*
Summary tables also help resolve data refresh timeouts, but their long term success for the team depends on the foundation on which they are built and how they are organized. The most obvious approach is to build custom summaries serving each dashboard. While report-specific tables stand out as a quick win, analysis shows they are a suboptimal solution because they tend to (1) reduce ability to respond to requirements evolution, and (2) make metrics in different dashboards less consistent.
Summary Tables Require a Dimensional Foundation
The latter is especially true if the underlying database is normalized or unstructured. When that’s the case, report-specific logic can include code integrating data from many source tables. Code varies in different dashboards that collect different measures and group data differently, making it difficult to show the same value for the same measure on different tables.
Therefore, the first foundational step to better dashboard performance is to develop a detailed dimensional reporting database, organized by business objects and events. The dimensional foundation dramatically reduces data integration code needed to build the report-focused summary table, but still doesn’t eliminate it.
Summary Tables Should Reflect Business Objects and Events
A better approach is to develop summary tables representing business events and objects. Such a table in a retail setting, for example, might summarize customers-without-orders measures by location by month. Although tables like this still require integration with dimensions and other facts at the dashboard level, refresh performance improves due to reduced row count, dashboards are easier to change as requirements change, and are more consistent since more business logic is common across different dashboards.
Here’s how I play out the comparison between report-specific and object/event based summaries:
- To improve reporting performance by building an aggregate table schema that reduces the quantity of data needed, and thereby increases speed, of report queries.
- A dimensional base schema exists where underlying detail tables each record a single event or business object (like a single sale or customer)
- Key relations among base tables are consistent (for example a sale event includes a customer key that identifies a single customer record)
- “The aggregate data schema must always provide the same results as the base schema.”
- “The [non-surrogate key] attributes of each aggregate table must be a subset of those from a base schema table.”
Report-Specific Summary Tables
- In response to performance problems with specific reports, build tables tailored to each report to speed data load.
- Base tables feed Aggregate tables as needed, feeds are 1:1 with reports
Event/Object Summary Tables
- Develop summary tables based on business events without regard to which reports use them. For example, build a table summarizing customers-without-orders measures by location by month.
- An aggregate table typically feeds from a single base fact table. A report might draw from two or more aggregate tables linked via dimensions.
Pros and Cons
|Report-Specific Tables||Event/Object Summary Tables|
|Report data refresh performance||Good||Good|
|Flexibility as report requirements change||Poor||Good|
|Supports consistency of data in different reports||Poor||Good|
Report-Specific Tables provide more efficient single sources of data for any given dashboard. But given that logic is hard-coded in the data layer and, if controlled by the typical corporate IT team, suffers a three-month change cycle, this solution is inflexible and unresponsive to requirements changes. In addition, data load logic customized for each dashboard would likely mean different dashboards would show different results due to timing and varying grouping logic.
Event/Object Summary Tables still require integration SQL at the dashboard level, so this option is somewhat less performant than Report-Specific Tables. However, dashboards don’t have to build up from detail so load times improve. Since these tables serve the business model rather than specific reports, they’re less subject to requirements changes and better support consistent results across different dashboards.
*This post leaves aside server concerns like configuration or number of users, and doesn’t recommend that database administrators relax resource governors. On the first point, if the organization applies sufficient resources, modern analytics tools provide ways to diagnose and correct server-level issues. On the second, again assuming standard resourcing and management practices, you most likely have sufficient database resources.
**Adamson, Christopher, Mastering Data Warehouse Aggregates, John Wiley & Sons, Inc., 2006, pp 27-8.