In this series, Tableau Zen Master Dan Murray takes a closer look at the Tableau Prep Builder and Conductor tools, their power and scope and how they stack up against other ETL tools.
The last post in this series showed you how to use the Wildcard union and the Manual union to join four years of sales data that was stored in four different worksheets. In this post, you’ll see how the cleaning step can be used to do the following:
- Change field-data roles
- Group and replace members of a specific field set
- Change the data types of a field and rename fields
All this can be done by using the Profile Cards within the Profile Pane in Tableau Prep Builder.
Using the Clean Step in Tableau Prep
The cleaning step behaves a lot like the data interpreter in Tableau Desktop, but it adds additional visualizations in the Profile Pane and the field Profile Cards that help you understand the shape and contents of each field. In addition, clicking on elements within the Profile Cards highlights related details in the other field cards. I find myself using the cleaning step frequently just to examine the results of prior steps.
Adding a Join Step
The Superstore dataset also includes another worksheet containing the names of Regional Managers. Regional Managers are assigned groups of States. We’ll use a join to add that information to the flow:
Now that the sales data from the Superstore tables have been cleaned up, we will bring some public data from the Census Bureau so that we can enhance our sales data by normalizing sales for the population by state.
Adding the Census Data into the Flow
I’ve been using Census data for many years. It’s useful when you want to account for the population density in analyses. In the example we’re building, this data will ultimately be used to express the sales by state in a way that accounts for the population density of each geography.
Using the Pivot Step and Tableau Prep Builder
The Census data isn’t perfectly formatted. We’ll use Prep Builder to fix the structural problems in that dataset:
In the video, I chose to do most of the field clean-up in the pivot step. I could have performed the same cleaning operations in the cleaning step that I added after the pivot. If the work you’re doing is going to be utilized only by use, fewer steps may save you time. If you work with a team of people who are new to Prep Builder, adding more steps to segregate individual cleaning operations may make your flow easier for others to understand. There aren’t hard and fast rules.
This workflow now includes two different data sources and six tables. You’ve seen two different ways to create a union; you’ve seen a join step and a pivot step; and you’ve learned about different ways you can use the cleaning step to improve the formatting and consistency of the data in the workflow. My colleague Katie wrote a blog post that takes a closer look at splitting and pivoting your data, so read it if you need more in-depth insights into those steps. For further information on cleansing your data, look at my colleague Spencer‘s blog on the topic.
In the next post in this series, we’re going to join the Superstore data to the Census data. Because these two data sources are not aggregated in the same way, we’ll be presented with a challenge that we’ll address with an aggregate step.