Hello again, data friends. My focus for this blog post will be the variety of formidable data cleansing options available in Tableau Prep (TP for short). I long for the day when data arrives clean – no bogus characters, mismatched naming conventions and or even duplicates. While I’m dreaming, I’d also love to have a private island and an unlimited supply of coffee ice cream. Unfortunately, that isn’t happening, and sets of data will always need massaging and wrangling.
For this (and forthcoming) Tableau Prep how-to blog post(s), we’re going to be looking at some spaceship manufacturing, sales and pricing data. The year is 3015, and I am the Sales Manager for Toyota. Sales for the company have struggled as of late, and I need to dig into both mine and my competitors’ numbers and see just how our models are stacking up in price, sales and specifications. Let’s dig in!
Set Phasers to Cleanse
The year may be 3015, but data management has frozen in time. In this example, I have a variety of .xls, .xlsx and .csv files with similar elements that I need to combine for analysis in Tableau. Before I can build visualizations on these data sources, I need to make sure my disparate data sets contain valid information and can talk to each other. Chances are, since my base files are from a variety of different sources and systems, there are going to be some anomalies that need addressing.
Opening the ship_sales_dataset1.xlsx file, I can already see some potential issues:
Now that I have an idea of what I’m dealing with, I’m going to put this file into a Tableau Prep flow and do some data profiling. First, click the + sign by Connections and add the Excel file to Tableau Prep, then drag the sheet you want out onto the flow if necessary. Once I have our sales data loaded up, if I click to add step I can see that the Profile pane now shows me a nice summary of the fields:
Since I know this dataset is from a system prone to human error, one of the first things I’ll do is look for abnormal values. If you’re familiar with your data, like I am with Ship Makers, for example, the anomalies will be easy to spot by simply sorting the desired column by count, and then eye-balling the singular values for errors:
I can see right away that these values should be Ford and Maybach, respectively. Instead of manually grouping and replacing these erroneous values, I’ll let Tableau Prep work its magic and then clean up anything left over. For this field, a common character group and replacement makes the most sense since any “bad” fields are likely a result of bad data entry or concatenation:
After I run the common character group and replace cleanse, I can scan through the results and see what Tableau Prep was able to fix for me. Here we can see that GMCC was automatically re-mapped to GMC (good), but it was unable to combine FordGMC into Ford like I expected:
That’s easy to fix; however, I can simply manually group FordGMC into Ford! Once you’re done making changes, clicking Done will essentially commit your groupings. As you can see, Ford and GMC are fixed and there is no trace of the incorrect values. Our distinct ShipMaker count is now 62, reflecting the fix made to the inaccurate fields:
Moving on, I know my boss prefers to see ShipMaker in all caps. This is an easy fix in Tableau Prep, simply click on specific field’s drop-down and in the Clean sub-option, selecting Make Uppercase will adjust your values as such:
Now that I’m happy with my ShipMaker field, I’m going to go through similar steps on the ShipModel field. Since there are so many distinct values (537), I’m going to tell Tableau Prep to take a pass at common character grouping and replacing. Scrolling through the results (changes identified by the paper clip), I can see some wanted adjustments, like this one to Avalon:
There are some groupings that I think are incorrect or am not sure of just yet, like this one, so to revert I’ll simply uncheck the 330 and remove it from the grouping:
Side note: If you go a little too fast, like me, you can easily revert any committed adjustments with an undo command, or by opening up the Changes tab and removing the unwanted alteration by clicking on the corresponding X:
Moving on to ShipCode, I know this field is supposed to be in an alpha-numeric format with a three-letter prefix and eight-number suffix, e.g. ABC-12345678. Keeping that in mind, I’m going to sort this field alphabetically, so any incorrect prefixes will stick out, like these:
To fix, I’ll manually adjust with a right-click and Edit Value:
My boss also hates hyphens, so I will go ahead and Clean off the punctuation as well:
Continuing with our cleansing crusade, we come to the MfgLocation field. I’m not familiar with all of the cities, but I do know these numerical values are wrong. I’m not sure what they should be, so I’m just going to turn these into nulls:
This field is a good candidate for a pronunciation-based group and replace, so I’ll have Tableau Prep work its magic again and review results for possible consolidation:
I’ll have to check with the regional specialists on some of these spellings, but it’s possible that these could be data entry errors. I’ll remove the ones I’m not sure of and commit the others. If I’m wrong in my assumptions, no worry – I’ll simply revert the changes via the changelog:
Now that I’ve got a clean set of values to work with, I can move forward with my data preparation and transformation. Here’s more information on the available sub-options under Clean in Tableau Prep. All of these are very useful for quick and painless transformations:
- Make Uppercase: Change all values to uppercase text.
- Make Lowercase: Change all values to lowercase text.
- Remove Letters: Remove all letters and leave only other characters.
- Remove Numbers: Remove all numbers and leave letters and other characters.
- Remove Punctuation: Remove all punctuation.
- Trim Spaces: Remove leading and trailing spaces.
What’s Next in Tableau Prep?
In my next posts, I’ll show you how easily Tableau Prep can transform your disparate data sources through joins, pivots, unions, aggregation and much more! In the meantime, please reach out to us today and let us know if you have a specific question or want more information on Tableau Prep or any of our partner products.