There are quite a few tips and tricks out there for Relationships in Tableau; just not nearly as many as for every other topic in Tableau, which is a shame. Nevertheless, this article is supposed to grow over time, and we start with something very simple!
What Is a Root Table?
Like with our traditional joins in the Physical Layer, the very first table we drag into the Logical Layer also defines the rest of the data model. Whatever we join/relate next, will be joined/related to this first table and from there onwards. This first table is called the root table. As soon as we have two tables related to each other, the third one can be related to the root table or the second table we dragged into the data source.
Why Was There a Problem?
Usually, we were and are quite free when we want to drag any secondary table to another place in our data model. Tables that are related to that secondary one will just move along with it. Not so with our root table. If we drag this one somewhere else, it won’t work. We cannot relate the root table with other tables in an already set structure. And if we drag the root table out of the data source, our whole model is gone. That makes it extremely tedious if we need to switch the root table with another one. Our whole model would disappear and we would have to build it again, defining all the different relationships. (Imagine our data model not having three tables, but twenty).
And the Solution?
Unlike with our traditional joins, we can swap the root table with any other logical table that is part of the data model. This functionality has been added with Tableau version 2022.1 in late March of 2022. We can now go to any other table that is part of the relationship model, right-click it and select Swap with root. Once we do that, our model is still intact and Tableau will just swap these two tables and retain all built up Relationships. We don’t need to fix anything after that because no Relationship will have changed.
The IFNULL Helper
What Is the Problem?
If we drag measures into our view, we may get a few NULLs in there. Measures retain values, as well as records, that don’t have a match in the other table will be part of our view. This is a classic problem as these NULLs are very visible in our tables. Let’s say we have two tables: one with birthdays of our friends or colleagues, and one with their hobbies. So, we now have some inspiration for birthday presents.
Both tables contain their first names, but unfortunately, for Carl and Chris, we don’t know their hobbies and for Matt, nobody has told us his birthday. And so, we find a NULLs beside their names, when we drag them into our view in Tableau. This makes reading our data quite tedious, not talking about users, who might expect that something is broken and thus don’t trust the data.
IFNULL Helps Out
Now, we cannot get rid of the NULLs that are shown in the green rectangles. We don’t know these hobbies and we cannot conjure them up. The NULLs in the blue rectangles are a different story. Tableau’s function IFNULL checks if the records of a specific field contain NULL values. If they don’t, the function will display the field as it is. If they do, we can tell Tableau what to fill those records with instead. In our case, we will tell Tableau to fetch the First Name (A); unless there is a NULL in there, then we want to fetch the First Name (B):
Querying – A Few Explanations
There are things that changed with Relationships, like the joining behavior in a query and the number of queries processed. We also have to talk about the order in which tables are queried.
With traditional joins, we defined the join type in the data source tab. When dragging fields into the view, Tableau was sending a query to first one of the tables to fetch dimensions from that table, but it was also sending a query to both tables at the same time to build out the join – the join that we had defined before. But the thing was: It always fired the join query out there, even if we worked with fields from the left table only.
This is not so with Relationships. The join type gets orders from our Ground Rule #2 from before: The dimensions and measures in the sheet define what join type we will get. And if we drag fields in from one table only, there will never be a join – just a query on that one table.
Defining a relationship tells Tableau how to relate the data if we choose to build a view with data from both tables, and only then will a Join be enforced.
Number of Queries
With traditional joins, we had a very granular query firework. Every dimension dragged in created one or two queries, especially when there were measures in there because the measure was re-calculated then, as well.
Relationships summarize or shorten a few of these queries. As a rule of thumb: Whenever the dimension structure changes, then the measures are re-calculated as before. But if we add new dimensions without changing the granularity of the viz (like a new dimension on Rows does not add new rows of data to the viz), then Tableau saves time by not firing the re-calculation queries again.
Parent-Child, Left-Right? In Which Order Do the Queries Run?’
This question has reached us quite a few times. When I relate a parent table and a child table, then use fields from the child table, is the parent table queried as well?
The answer is no. If we use fields from a single table out of a Relationship, only this table is queried. This means that if you have a data model with 20 related tables, each and every one really massive, then you will still query only one table when you have fields from that table only in your view. Tableau ignores the rest because it doesn’t need anything from there.
So, if a table is in a parent-child relationship or in a lookup relationship, it doesn’t matter: The fields alone determine which table is queried in our view. By the way, this was one of Tableau’s promises: When the new data model was introduced, query time shall be reduced due to only fetching what is needed. This is good news.
Of course, we tested that instead of just believing what Tableau said. We connected quite a few tables in a big relationship model, then we took the time for querying single tables compared to more than one table. The result was definite: shorter query times for single tables. This is also supported by the Performance Recorder in Tableau, which showed its SQL query getting data from one table only, used a cached query after we swapped the root table in our model, and built the same chart again.
For more on performance, stay tuned for later and more advanced articles in this series.
What Kind of Constant Are We Talking About Here?
Numbers in calculated fields. When we open up a calculated field and type in a 1, then this 1 is duplicated for every row of that data source. Let’s say our data source has ten rows. When we open up the View Data window, we will see ten 1s.
This becomes relevant of course, as soon as we drag this field that just contains the number 1 into our view. By default, Tableau will aggregate this number and sum it up. As such, when we have ten rows in our data source, we get the sum of 10 in the end.
What Is Different Here with Relationships?
With Relationships (and only then!), constants do not belong to a specific table anymore but are global fields. (Meaning, they reside in the lower area of the data pane). If we do not drag any other field into the view but our constant, then this constant will be exactly one number. This sounds weird, but believe me, it does make sense. As the 1 in our field does not belong to a table, there aren’t any rows for which it could be duplicated. The table consists of exactly one cell. As such, the 1 happens exactly one times, and the sum of that one 1 is 1 again.
But the moment we drag a dimension into the view, that changes. Now there are more rows/columns to that table, and the 1 is going to duplicate. In other words: Constants adapt to the granularity of the table they are used with.
Now, what we used to do without Relationships was calculating a row count with that trick. When there is only one logical table – not related to any other table – that still works. A field that contains the digit 1 will give us the row count of that table when it is summed up. If we want to do this trick with Relationships, we have to address the table in that calculated field.
There is a simple way to do that: Use an IF statement and check for something, that is always true, for example:
IF [Dimension1] = [Dimension1] THEN 1 END
That forces our 1 to be written into every row of the table that is coming from and will give us the row count.