Now, let’s look into the second part of the Performance Options: Referential Integrity. What a daunting phrase! Sounds scary, but it’s actually simpler than the Cardinality.
With this option, we tell Tableau whether it should expect values to have a match in the other table or not. Let’s have a look at our example from before. We connected our posts with our comments, but there were two posts that actually didn’t get any comments:
The items in our right table all have a match in the other table. Every Post_ID on the right is connected to a Post_ID on the left. All records match.
But the items in our left table do not match, at least not all of them. Two don’t have a connection to the right table and, as such, only some records match.
This option is basically the difference between Inner Joins and Outer Joins. And while I am saying this, I am also giving you a word of caution: No, we don’t set up joins here. We are limiting them. The performance options have influence on how Tableau is joining the tables later on, but we don’t set up any join. With Relationships, the join type is determined by the fields we drag into the view.
Joining tables means merging them in a way that columns are appended. We have a table on the left (like the posts in our example) and another table on the right (our comments). What we want to do is this: Build a new table out of the two that has all the data we need so we don’t have to use the two separate tables anymore.
To make that work, we cannot just put both tables next to each other. If we did, the data lines of both tables wouldn’t match, unless we have the rare case that both tables have the same granularity, same size and are ordered exactly the same. If it were rows A, B, C and D in the left table and A, C, B and D in the right table, the two in the middle would not match. We need to force these two tables together, so that the letters match up. This is, what joining does.
We select a specific field – a key – that appears in both tables and is an identifier (in our example, the Post_ID from the posts table and the Post_ID from the comments table), that tells to which post each comment belongs. Once we do that, all rows from the left table get attached to the rows of the right table wherever the Post_IDs are matching up.
Now, you may ask the question: What happens when there are rows that don’t have a match? And what happens when I have Post_IDs that have several matches in the comments table?
Enter the join variations. There are quite a few of them, of which Tableau officially uses only four. The Venn diagram is the most common icon for joins, and how this diagram is filled tells us what kind of join it is. This, for example, is a Left Join:
It indicates that it takes all the rows from the left table and only the rows from the right table that have a match in the left one. If I had any comments that had a weird Post_ID and wouldn’t appear in the posts table, then these comments would not be part of the join.
For a thorough blog article about joins, have a look here.
Join Limiting via Referential Integrity
When we change the settings for Referential Integrity, we limit which joins can happen in a view:
This option can help Tableau to filter before joining. When we say ALL RECORDS MATCH, but there are records in there that don’t match, then we’ll mostly lose these records. This is like what would happen with an Inner Join, where not matching items are discarded. There are two reasons why I added the word mostly here:
- It may happen that data is not lost but fetched across tables, although it should not be. You may see additional rows in the end and wonder where they might be coming from. This seems to be because Tableau is considering the relationship fields from two tables as interchangeable when we say ALL RECORDS MATCH. With that, the field headers also become interchangeable, and you might see data from afield in table B that suddenly belongs to table A.
- As another reminder: Do NOT use these Performance Options as a filter or join mechanic, unless you know exactly what you do. Filtering (by joining) is a side effect that can be helpful but also unpredictable if you do it wrong. Use the native filter options in Tableau instead.
Because we limit join types in the views later on, Tableau will be faster. For example, once we set either Table to ALL RECORDS MATCH, Tableau won’t perform FULL JOINS anymore.
Remember: As long as we use SOME RECORDS MATCH, Tableau will always get all the data from our data source and select the appropriate join without discarding anything while joining.