As you may recall from my previous post, the year is 3015. As the spaceship sales manager for Toyota, I’m gathering data from the marketplace to analyze it, identify trends and eventually make pricing decisions based on my findings. We’ve been struggling to keep up with other manufacturers as of late, and it’s time to reverse the trend. They just don’t make spaceships like they used to.
I’ve been able to scrounge up planetary sales data from four different sources; Earth, Mars, Earth’s Moon, Io and data from the Andromeda galaxy. While my sales data is coming from different parts of the galaxy, it’s in a standardized format. Additionally, I have some data that lives in SQL Server containing customer information and demographics as well as various spaceship specifications.
Using Tableau Prep, I’ll show you how to combine all these disparate data sources into something easily digested by Tableau Desktop. If you need a refresher on joining data, Zac Heacker wrote a great blog on Combining Data with Joins. Let’s begin!
State of the Union
I’m going to start by opening Tableau Prep and connecting to my .xlsx sales files. For my analysis, I don’t want these datasets to be separated; since these four sources are identical in number of fields, field names and data types, I can union them into one set of data. Far out!
Here are my four standardized sales files waiting in the Flow Pane – all of them have TransactionID, CustomerID, ShipCode, SalePrice and SaleDate:
To create the union, I’ll simply click on the + sign next to one of my data sources and select Add Union. You can see data profiling is available on the Union step, which is tremendously helpful:
To add the other data sources to the new union, drag them onto the Union step and drop them on the +Add as shown here:
Once I’ve added all four data sets to the union, the Profile Pane shows what I’ve included:
Victory! I have now successfully combined the four standardized sales files into one set of data and any new connections to this Union step will treat it as such. Next, I need to join this unioned data to the SQL tables containing customer information and spaceship specifications.
… I know Join-Fu
After connecting to the SQL database, I’ll drop the customer_info and spaceship_mfg_info tables onto my Flow Pane. Notice the Custom SQL option here; you can craft custom SQL to retrieve data if necessary, but I just need these tables in their existing form:
I’ll be joining customer_info and spaceship_mfg_info to the Union data set sales_1 (just created) by way of their respective foreign keys (CustomerID and ShipCode). Do this by dragging and dropping from the Flow Pane onto the New Join side of sales_1:
Looking at the Join Panel in the Profile Pane, Tableau Prep smartly recognizes the join will be on Customer_ID. As expected, the data resulting from the join is now available for review in the Profile Pane, and the column headers are color-coded to match their source in the Flow Pane:
Since I’m just supplementing customer information to the sales data where it’s available, I’m going to adjust my inner join to a left join. Tableau Prep continues to be fantastically helpful with the visual aids and data profiling, even on something as seemingly mundane as applying a join.
Now, I want to join spaceship_mfg_info to this freshly joined sales and customer data. To accomplish this, I’ll drop spaceship_mfg_info onto the Join step I just made as another New Join:
Once again, Tableau Prep identifies the correct key field to join on (ShipCode). The Profile Pane shows me there were 15 ShipCodes that were not involved in any sales, and therefore will be excluded from the Join Results. I can even click on the Excluded values to see them profiled, showing exactly which ships were not found:
Bonus time: Doing a cursory review of my final set of combined data, I noticed the SalePrice and SaleDate columns must have been set to text in the Excel files (so much for standardized format!). I want to change these into decimal and date respectively, so I can use them more efficiently in Tableau Desktop. Both will be easy fixes in Tableau Prep and will require no backtracking because I can simply transform the fields from the “final” Join step itself!
For SaleDate, I’ll just click on the data type indicator within the Profile Pane, select date and “viola!” – a Clean step is added with the appropriate transformation:
To fix SalePrice, I’ll try an Automatic Split – to my delight, the $ symbol has been stripped out and my newly created value is in decimal format. Then, I’ll just delete the “bad” field and rename the split value accordingly. Now we’re cooking with lasers!
I have now successfully combined my four .xlsx sales files with the two SQL tables containing customer information and spaceship specifications! Adding an output step at the end, here’s the current state of my Flow. From here, I can create a Tableau Data Extract, .csv output file or even publish a data source to Tableau Server:
That’s all for now! Stay tuned for more Tableau Prep fun, and please reach out to us today if you have any questions or want to learn more about how InterWorks can help you master your data.