In this series, Tableau Zen Master Dan Murray takes a closer look at the Tableau Prep Builder and Conductor tools, their power and scope and how they stack up against other ETL tools.
In the previous three posts in this series, we used Tableau Prep Builder to create a workflow to join six tables from two data sources. Our flow contains 11 steps. The challenge presented with the data we have from the Census comes from the different levels of detail (LOD) our sales data has—specific transactions every day for the last four years—and the Census data, which includes population data aggregated at the State-level for the same four years. The Census data is much less detailed.
This is a common issue with financial data, for example, because budget data tends to be less detailed than the transaction-level details you can obtain from business systems. So, we need to use an aggregate step to reduce the LOD of the Superstore sales data so that it’s the same as the Census data.
Using the Aggregate Step in Tableau Prep
The Census data provides population data by State and Year. We’ll use an aggregate step to reduce the detail in the Superstore data to match the Census data.
After performing the aggregation of the Superstore data, we can now get a resulting set that combines sales and population data for every state. The Census data actually includes information for Alaska and Hawaii, for which we don’t have any sales. That’s fine. Delaware is also mismatched in 2018 because there were no sales in the Superstore data that year. By using the aggregate step, we’ve now brought both data sources together and achieved a consistent LOD.
Creating a Per-Capita Sales Calculation
The primary reason I wanted to bring in the Census data was to normalize the sales by state for population. Superstore is anonymized retail sales information. While it’s useful to know that California has the most sales in this data, how does Vermont compare to California when you take into account the population differences between states?
Saving Your Tableau Prep Workflow
At the end of the last video, I showed you how to add an output step, save a workflow and then run the workflow to generate a Hyper output file that we’ll use to build a visualization of the data. Saving the workflow enables you to reuse it later. Running the workflow generates the Hyper data.
Cleaning up the Workflow Visualization
We now have a workflow that consists of 18 steps. Let’s step back for a moment and enhance the information in our workflow so that other people can more easily follow its logic:
The video shows how you can change step colors and add comments, making each step easier for others to follow and understand. I didn’t rename any of them. You may also experiment with the ways you can use color, descriptions and step renaming to make your flows easier for other people to track with.
In the next post in this series, I’m going to build a visualization using the dataset we created in the workflow.