Recently, I've been working with larger datasets. Anyone who has done extensive data preparation at scale, cleaning and shaping data for Tableau visualizations, soon realizes the importance of a workbook performance budget.
When I think about the performance of a Tableau workbook, I count the cost of every point and click within the system. I mostly use Tableau Desktop because it allows for more complex data transformations and offers shortcuts for managing default properties across my fields.
Overall, I want to touch on a few quick hacks that might help you buy back time, both for your end users (avoiding the spinning wheel of death on a dashboard) and for yourself, keeping ongoing maintenance, change requests, and workbook updates quick and painless.
There are lots of useful resources and checklists online that cover the whole picture of your workbook performance budget. I hand-picked a few to shine a light on here because they offer the most impact in the shortest amount of time.
Avoid High Mark Counts
But how?
(This isn’t a new topic for us, but it’s important enough to reiterate before sharing some additional tips)
As a developer, you can see certain components on each of your worksheets that are available for your eyes only. One of these is the Summary Card, which shows various summary statistics about the data in the view, including the total count of marks.
Why is that important?
A high mark count naturally leads to slowness, as each mark represents a data point that must be rendered on the worksheet. Before we get to the options for avoiding a high mark count while still giving your end users the big picture, it helps to see just how quickly marks add up.
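As a rough illustration, here's a pandas sketch (field names and values are invented for this example, not pulled from any real workbook): a raw table asks Tableau to render one mark per row, while a pre-aggregated view leaves only a handful of marks to draw.

```python
import pandas as pd

# Hypothetical order-level data; field names are invented for illustration.
orders = pd.DataFrame({
    "Category": ["Technology", "Furniture", "Technology", "Office Supplies"] * 25_000,
    "Sales":    [450.00, 120.50, 89.99, 15.25] * 25_000,
})

# A raw table of these rows asks Tableau to render one mark per row.
print(len(orders))       # 100,000 rows -> ~100,000 marks

# Pre-aggregating to one row per category leaves only a handful of marks.
summary = orders.groupby("Category", as_index=False)["Sales"].sum()
print(len(summary))      # 3 rows -> 3 marks
```

With that picture in mind, here are some options you might think through in order to create a win-win scenario.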
Best Practice: Consider using tables only when they're needed. Now, there are lots of times when stakeholders think tables are absolutely necessary. You can still offer a table as additional detail on demand, but don't sandwich one into a dashboard unnecessarily.
Instead, you can use a dashboard action that sends the user, based on a field selection from a sheet on your primary dashboard, to a separate (i.e. secondary) dashboard where the full (possibly even filtered) table of data lives. You can then provide a navigation button to bring the user back to the overview dashboard. This will keep things speedy.
If you're asked to keep all of the content within a single dashboard, then consider a dashboard filter action that keeps the table hidden by default (exclude all values). This forces a selection before any rows appear, avoiding the raw data and high mark count altogether.
The table shown above only has to process a mark count of 6,000 versus the original mark count of the entire table, which was nearly 107,000. Once the first category selection is made (by clicking the Technology header/bar), the table appears in the dashboard with a much more limited mark count, and additional filters can be used from there.
The key takeaway is ensuring that a table is not displayed on the dashboard's initial load, which can massively reduce overall load time.
Best Practice: Consider using sets, too. This might require an extra conversation upfront, but I would argue that sets are the best way to bring the most important data in a table to the top. I don't think stakeholders will explore all of the already reduced 6,000 marks. Maybe they would prefer the top or bottom 10, 25, or 100 instead?
In the filtered view above, you can see selections made for both Technology and Phones. Then, from our table, we're getting the 10 most profitable orders. Bonus: Grand Totals are shown at the top via the Analysis menu in the workbook. Also, the scatterplot helps shine a light on outliers, good and bad, for products based on their aggregated sales and profit values.
No delays here at all.
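If you want a feel for what a Top N set does conceptually, here's a minimal pandas analogue with made-up order data; in Tableau itself, the real work happens in the set dialog, not in code.

```python
import pandas as pd

# Made-up order-level data; "Order ID" and "Profit" are illustrative names.
orders = pd.DataFrame({
    "Order ID": ["US-101", "US-102", "US-103", "US-104", "US-105", "US-106"],
    "Profit":   [520.10, -35.20, 88.90, 310.55, 760.00, 12.40],
})

# A Top N set surfaces only the most (or least) profitable rows
# instead of rendering every order as a mark.
top_orders = orders.nlargest(3, "Profit")   # think: a Top 10/25/100 set
print(top_orders)
```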
The Makings of a Beautiful Relationship
In Tableau, don't discount the defaults. What I mean is that Tableau's defaults are oftentimes also the most performant way of accomplishing a desired result, up to and including the data model inside your Tableau workbook.
There are exceptions, such as hierarchies, which I avoid at all costs.
Best Practice: Consider data relationships rather than joins. In Tableau, when you connect to your data source, the connection window defaults to the logical layer. This is where you can “relate” (or spaghetti-noodle) two or more tables together using common fields, aka identifiers.
Now, joins are still possible using the “Open” option (dropping into the physical layer) for a data table from the workbook connection screen. And if big, denormalized tables are the only possibility given your required mode of operations, then getting your data right might require extra work. Joins can behave badly when data is captured at different granularity levels, leading to overcounting or duplication of data.
Messy data makes the analysis messy too!
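To make the overcounting concrete, here's a toy example in pandas; both tables and their values are invented, but the duplicated rows are exactly what a careless join at mismatched granularity produces.

```python
import pandas as pd

# Toy tables at different granularity levels: multiple sales rows per
# region vs. multiple target rows per region.
sales = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Sales":  [100, 150, 200],
})
targets = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Target": [120, 130, 210],
})

# A join matches each East sales row against both East target rows,
# duplicating the sales values in the result.
joined = sales.merge(targets, on="Region")
print(joined["Sales"].sum())  # 700 -- but the true total is only 450
```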
In version 2020.2, that all changed with the permanent inclusion of Tableau relationships. In essence, Tableau pushed the “join” behavior from the data source level down into the visualization layer; the tables only work together toward a common goal when a specific viz asks them to.
So now, in a perfect data model world, you would have an anchor table in the form of a fact table that you pull into your connection window first, within the logical layer. Then you'd have neatly organized dimension tables offering additional context for the data captured in your fact table. This efficient model is referred to as a star schema.
Why is this better?
The biggest impact is that data is only processed when it's needed for a specific visualization to be computed. For example, if I have one fact table and two dimension tables, but I only build a sheet using fields from my fact table and a single dimension table, then my other dimension table lies dormant and gets ignored.
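Here's the same idea as a miniature star schema in pandas (all fields invented for illustration): a viz built only on Category touches the fact table and the product dimension, while the customer dimension never gets joined at all.

```python
import pandas as pd

# A star schema in miniature: one fact table plus two dimension tables
# keyed by ID. All names and values here are made up.
fact_sales = pd.DataFrame({
    "ProductID":  [1, 2, 1],
    "CustomerID": [10, 11, 10],
    "Sales":      [250.0, 90.0, 40.0],
})
dim_product  = pd.DataFrame({"ProductID":  [1, 2],   "Category": ["Technology", "Furniture"]})
dim_customer = pd.DataFrame({"CustomerID": [10, 11], "Segment":  ["Consumer", "Corporate"]})

# A sheet that only needs Category joins just the product dimension;
# dim_customer lies dormant, exactly like an unused related table.
by_category = (
    fact_sales.merge(dim_product, on="ProductID")
              .groupby("Category", as_index=False)["Sales"].sum()
)
print(by_category)
```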
It's the same concept as using “Hide All Unused Fields” once you complete your workbook: those hidden fields are ignored, saving storage space and computing power. In the old world of Tableau versions 2019 and prior, joins, unions, and blends were all you had access to.
But then everything changed for the better with relationships. Especially if you have data at different levels of granularity, Tableau now handles it with ease.
Here are a few parting tips to help your workbook performance as you develop your sheets and dashboards.
Best Practice: Use fixed dashboard sizing, so Tableau can stash the layout in its cache for quicker response times. I'd even go the extra mile of making sure that all of your objects are housed in neatly structured containers, down to the exact pixel of your chosen dashboard size. Thank me later. To confirm: size does matter here!
Best Practice: Use “Count” over “Count Distinct.” Now, you might tell me that your single table doesn't allow for that. I'll argue that maybe it could. If you used SQL to pack the data together, or used joins for added context, try a relationship instead, keeping the contexts separate; then a plain “Count” may well be possible via the built-in “Count of Records” metric that Tableau generously provides, avoiding a “Count Distinct” altogether.
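The difference is easy to see in a small pandas sketch with a made-up ID column: a plain count is a cheap row tally, while a distinct count forces the engine to track every unique value it has seen.

```python
import pandas as pd

# Made-up order rows to contrast a plain count with a distinct count.
orders = pd.DataFrame({"Order ID": ["A", "A", "B", "C", "C", "C"]})

# Plain count: a simple row tally, analogous to summing Tableau's
# built-in "Count of Records" (a constant 1 per row).
print(len(orders))                   # 6

# Distinct count: every unique value must be tracked and deduplicated,
# which is far more expensive at scale.
print(orders["Order ID"].nunique())  # 3
```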
Also, if you adhere to the advice above and end up with a very large single dimension table connected to a similar, if not larger, fact table, have no fear. Try using the dimension table as the “anchor” and see if that leads to improvement. You can test this with the Performance Recorder feature and see the improvement measured in seconds or milliseconds.
Best Practice: In terms of data tables, cardinality and referential integrity both have a “say” in your workbook's overall performance budget, too.
If all of the values in your dimension table are unique, then you can relate it to your fact table as a many-to-one relationship. Similarly, if you know that all of the data matches in both tables being related, then use “All Records Match.”
But a word of caution, specifically on the latter: I'd leave the default in place (“Some Records Match”) and adjust only as needed for workbook performance triage. If you aren't sure, you could end up with missing data, which leads to bad analysis, which leads to finger pointing, which leads to… you get the picture.
Thanks for reading! And I hope this helps with some of the ways you can work more efficiently with Tableau in the new year.