Two Design Principles for Tableau Data Sources

It’s not unusual for talented teams of business analysts to find themselves maintaining significant inventories of Tableau dashboards. In addition to sound development practices, following two key principles in data source design help these teams spend less time in maintenance and focus more on building new visualizations: publishing Tableau data sources separately from workbooks and waiting until the last opportunity to join dimension and fact data.

Imagine a business team — let’s call it Marketing Analytics — with read-only access to a Hadoop store or an enterprise data warehouse. They gain approval for Tableau licenses and Tableau Server publication rights for five tech-savvy data analysts. After a few initial successes with some impactful visualizations, the team gathers steam. After a while the team finds itself supporting scores of published workbooks serving a few hundred managers and executives. In spite of generally sound practices, Marketing Analytics struggles to maintain consistency from one Tableau workbook to another.

For example, at one point the imagined company introduced a new “customer value index (CVI)”. Two months later, in the midst of adding that new field to the 30 relevant workbooks, managers requested a revision to CVI and three other calculations shown on the team’s dashboards. Marketing Analytics suspends new development for two sprints to focus only on updating these fields across all reports.

Best practice for Tableau teams supporting enterprise analytics and reporting is to draw data from a well curated dimensional database. With that architecture, CVI would be a column in the Customer dimension table, and a CVI revision would require only a relatively quick database logic fix.

However, there are teams, like the fictional Marketing Analytics, that support significant Tableau complexes accessing large, lightly designed data stores. There are different ways for such a team to pull source data efficiently, like using Tableau Prep or Alteryx. But regardless of whether a team uses a data prep tool or custom queries in Tableau data sources, workbook developers should stick with two rules of the road to increase maintainability and reduce time to market for revisions and corrections.

Publish Data Sources and Workbooks Separately

Tableau provides the ability to publish data sources independently of workbooks. Using the CVI example, when Marketing Analytics separately publishes workbooks and data sources, then they can change the CVI calc without the risk of inadvertently altering visualizations. Likewise, the team could change a viz without dragging the entire data source down from the server and thereby slowing down the change cycle. In my experience, large data sources embedded in workbooks have a tendency to unexpectedly reload themselves in Tableau Desktop, delaying development by tying up the workbook for up to 30 minutes or more while displaying a “Processing Request” message box.

(For those using Tableau data sources without a data prep tool, publishing separately enables many workbooks to share a single data source. However, in my experience different visualizations have unique requirements such that shared data sources haven’t made much sense. Another reasonable expectation is that teams might assemble data for a dashboard by blending data sources, and thereby achieve reusability of single-subject data sources. But in the cases I’ve seen, challenges associated with data blending make it more practical to tailor a custom data source.)

Separate Dimensions From Facts

Inasmuch as is possible, modularize and keep separate collection of dimension data and fact data, waiting to join them together at the last possible opportunity. In dimensional data terms, fact data describes an event or transaction, like a store purchase, and dimension data represents reference or master data that describes the event. For example, in the statement “a patient in a hospital at a given time with a medical condition who is attended by a medical professional”, the fact might be the patient encounter, and the associated dimensions might include the date and time, medical professional, medical condition, and facility (hospital).

Some dimensions, like the date/time calendar, change very little, if ever. However, other dimensions tend to change much more frequently than facts. Think of changes to region realignments, the company organization chart, or item price. Separation of dimensions from facts, and dimensions from other dimensions for that matter, helps reduce complexity and risk of error and increase speed of corrections and revisions.

For example, consider a Tableau Prep flow that gathers sales transactions from three different point of sale systems, and joins it with sales territories, marketing campaigns, and product data. The flow should gather, clean, and integrate sales transactions first, and afterwards bring in territory, campaign, and product data. That way, when the dimensional data changes there are fewer steps that require changes in the Tableau Prep flow.

For those building data sources without a data prep tool, Tableau offers cross database joins. Using this capability, a data source can have multiple database connections to different data stores or to the same database. So the developer can write separate queries for sales transactions, territories, campaigns, and products and join them together in the Tableau data source. Even though maintenance involves opening up the entire data source, the the developer can independently maintain the different fact and dimension queries.

Whatever tool Tableau teams use to gather data, publishing workbooks and data sources separately and joining facts and dimensions as late as possible will help reduce maintenance effort and enable them to focus more on adding business value.