It’s no secret that Tableau Prep is a game changer for those looking to clean, explore and connect their data sources together. There are myriad join options, from your traditional left, right and inner joins to more advanced options such as Left Unmatched Only, which will include only rows that exist on the left data source. But one join type eluded me—the self join. So when my student, Greg, asked me for a good example of a self join, I knew why I would do it but not how.
Enter Harry Potter.
Here’s what I am trying to accomplish: I have a list of all the Harry Potter films, the year of their release and what their sequel is:
Using the Self Join in Tableau Prep
I want a new data source that will show me a few new columns per movie: what is the sequel’s release date, and what is the sequel’s sequel? For example, for Harry Potter and the Sorcerer’s Stone, I should still see the release date of 2001, the sequel should still be Harry Potter and the Chamber of Secrets, but I also want a column for the release date of 2002 for Chamber of Secrets, and for Prisoner of Azkaban as the sequel to Chamber of Secrets.
In order to get that data source, I can do a self join in Tableau Prep (note that this could also be achieved in Tableau Desktop). While this isn’t a pre-defined join option, I can join my Sheet 1 to my Sheet 1. What’s interesting is my join clause, and this took a bit of experimentation.
Finding the Right Join Clause
The challenge I encountered was, how do I join my two instances of the same sheet together in order to get the information I wanted? I needed to make sure that I was pulling in each sequel’s release date as well as its own sequel, information contained in the three columns of my original data source:
If I join my two worksheets with Sequel=Movie, I can then obtain information for that sequel’s release date, as well as the sequel’s sequel:
Of course, I will have one result excluded. As the first film in the series, Harry Potter and the Sorcerer’s Stone is not a sequel. I can discard this, add a clean step and voila! I have all the data I need:
Self joins are a great alternative to what might have been a very tedious copy-paste situation. Thank you, Greg, for the challenge!