Tableau Prep: How to Cleanse Your Data and Prepare It for World Domination Analysis

Data

Tableau Prep: How to Cleanse Your Data and Prepare It for World Domination Analysis

Hello again, data friends. My focus for this blog post will be the variety of formidable data cleansing options available in Tableau Prep (TP for short). I long for the day when data arrives clean – no bogus characters, mismatched naming conventions and or even duplicates. While I’m dreaming, I’d also love to have a private island and an unlimited supply of coffee ice cream. Unfortunately, that isn’t happening, and sets of data will always need massaging and wrangling.

For this (and forthcoming) Tableau Prep how-to blog post(s), we’re going to be looking at some spaceship manufacturing, sales and pricing data. The year is 3015, and I am the Sales Manager for Toyota. Sales for the company have struggled as of late, and I need to dig into both mine and my competitors’ numbers and see just how our models are stacking up in price, sales and specifications. Let’s dig in!

Set Phasers to Cleanse

The year may be 3015, but data management has frozen in time. In this example, I have a variety of .xls, .xlsx and .csv files with similar elements that I need to combine for analysis in Tableau. Before I can build visualizations on these data sources, I need to make sure my disparate data sets contain valid information and can talk to each other. Chances are, since my base files are from a variety of different sources and systems, there are going to be some anomalies that need addressing.

Opening the ship_sales_dataset1.xlsx file, I can already see some potential issues:

XLSX File

Now that I have an idea of what I’m dealing with, I’m going to put this file into a Tableau Prep flow and do some data profiling. First, click the + sign by Connections and add the Excel file to Tableau Prep, then drag the sheet you want out onto the flow if necessary. Once I have our sales data loaded up, if I click to add step I can see that the Profile pane now shows me a nice summary of the fields:

Tableau Prep: Add Step - Profile
Since I know this dataset is from a system prone to human error, one of the first things I’ll do is look for abnormal values. If you’re familiar with your data, like I am with Ship Makers, for example, the anomalies will be easy to spot by simply sorting the desired column by count, and then eye-balling the singular values for errors:

Tableau Prep - Sort by Count Tableau Prep - Single Out Errors

I can see right away that these values should be Ford and Maybach, respectively. Instead of manually grouping and replacing these erroneous values, I’ll let Tableau Prep work its magic and then clean up anything left over. For this field, a common character group and replacement makes the most sense since any “bad” fields are likely a result of bad data entry or concatenation:

Tableau Prep - Common Characters

After I run the common character group and replace cleanse, I can scan through the results and see what Tableau Prep was able to fix for me. Here we can see that GMCC was automatically re-mapped to GMC (good), but it was unable to combine FordGMC into Ford like I expected:

Tableau Prep - Remapping

That’s easy to fix; however, I can simply manually group FordGMC into Ford! Once you’re done making changes, clicking Done will essentially commit your groupings. As you can see, Ford and GMC are fixed and there is no trace of the incorrect values. Our distinct ShipMaker count is now 62, reflecting the fix made to the inaccurate fields:

Tableau Prep - Updated Count

Moving on, I know my boss prefers to see ShipMaker in all caps. This is an easy fix in Tableau Prep, simply click on specific field’s drop-down and in the Clean sub-option, selecting Make Uppercase will adjust your values as such:

Tableau Prep - Make Uppercase

Now that I’m happy with my ShipMaker field, I’m going to go through similar steps on the ShipModel field. Since there are so many distinct values (537), I’m going to tell Tableau Prep to take a pass at common character grouping and replacing. Scrolling through the results (changes identified by the paper clip), I can see some wanted adjustments, like this one to Avalon:

Tabelau Prep - COmmon Character Grouping and Replacing

There are some groupings that I think are incorrect or am not sure of just yet, like this one, so to revert I’ll simply uncheck the 330 and remove it from the grouping:

Tableau Prep - Incorrect Groupings

Side note: If you go a little too fast, like me, you can easily revert any committed adjustments with an undo command, or by opening up the Changes tab and removing the unwanted alteration by clicking on the corresponding X:

Tableau Prep - Group and Replace

Moving on to ShipCode, I know this field is supposed to be in an alpha-numeric format with a three-letter prefix and eight-number suffix, e.g. ABC-12345678. Keeping that in mind, I’m going to sort this field alphabetically, so any incorrect prefixes will stick out, like these:

Tableau Prep - Sorting Alphabetically

To fix, I’ll manually adjust with a right-click and Edit Value:

Tableau Prep - Edit Value

My boss also hates hyphens, so I will go ahead and Clean off the punctuation as well:

Tableau Prep - Remove Punctuation

Continuing with our cleansing crusade, we come to the MfgLocation field. I’m not familiar with all of the cities, but I do know these numerical values are wrong. I’m not sure what they should be, so I’m just going to turn these into nulls:

Tableau Prep - MfgLocation Field

This field is a good candidate for a pronunciation-based group and replace, so I’ll have Tableau Prep work its magic again and review results for possible consolidation:

Tableau Prep - Pronunciation

I’ll have to check with the regional specialists on some of these spellings, but it’s possible that these could be data entry errors. I’ll remove the ones I’m not sure of and commit the others. If I’m wrong in my assumptions, no worry – I’ll simply revert the changes via the changelog:

Tableau Prep - Group and Replace

Now that I’ve got a clean set of values to work with, I can move forward with my data preparation and transformation. Here’s more information on the available sub-options under Clean in Tableau Prep. All of these are very useful for quick and painless transformations:

  • Make Uppercase: Change all values to uppercase text.
  • Make Lowercase: Change all values to lowercase text.
  • Remove Letters: Remove all letters and leave only other characters.
  • Remove Numbers: Remove all numbers and leave letters and other characters.
  • Remove Punctuation: Remove all punctuation.
  • Trim Spaces: Remove leading and trailing spaces.

 

What’s Next in Tableau Prep?

In my next posts, I’ll show you how easily Tableau Prep can transform your disparate data sources through joins, pivots, unions, aggregation and much more! In the meantime, please reach out to us today and let us know if you have a specific question or want more information on Tableau Prep or any of our partner products.

More About the Author

Spencer Parker

Data Engineer
Tableau Prep: How to Union and Join Your Data to Infinity and Beyond As you may recall from my previous post, the year is 3015. As the spaceship sales manager for Toyota, I’m gathering data from the ...
Tableau Prep: How to Cleanse Your Data and Prepare It for World Domination Analysis Hello again, data friends. My focus for this blog post will be the variety of formidable data cleansing options available in Tableau ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!