The new data model brought us the Logical Layer, from which we can draw specific tables without being forced to load the data from all the related tables. Remember, with traditional joins, all sheets that worked with that join had to load the full data set from that joined table regardless if we were only interested in rows of one of the tables. With Relationships, we can drag the fields from only one table into the view while the rows from the other tables aren’t queried at all.
This behavior is resembled when Tableau is building extracts. When we extract our data and go into the Edit window on our data source, we will find two options for Data Storage at the top:
- Logical Tables
- Physical Tables
The first option is Tableau’s default way of storing extracts while the second option is not always available (more on that further down).
Extracting Logical Tables
With this option, Tableau saves each logical table within the extract file (aka hyper file). Three logical tables related to each other will result in three separate tables in one extract file. It doesn’t matter at all if there are any joins happening within one or all of them – three logical tables will be stored as three entities in the extract file.
Now, if we don’t use traditional joins at all (in the Physical Layer), then we won’t have to think about the way of storing the extract too much as the physical tables are our logical tables. But then, if we do use traditional joins, Tableau will treat those joined tables as one entity. Two tables joined together? For the Logical Layer, this new entity has now become one logical table. That means when we build traditional joins, these joins are manifested within the extract file. Tableau joins the data, then this new table is stored as one table in the hyper file.
Why does that matter, you might ask? Well, let’s say our two joined tables don’t have the same granularity. Let’s even say it’s a many-to-many connection, which will result in a lot of row duplication. This exploding table is then saved as an exploding extract file, which can get quite big.
Let’s say we have two tables of 10,000 unique rows each. Both tables take up 0.2 MB of disc space each, together 0.4 MB. Now, imagine we do a cross join between them (also cartesian join or 1to1 join) where every row from the left table will be mapped with every row from the right table. The result of that cross join will be a table with 100,000,000 rows with a file size of 4,000 MB. That would then be the size of our hyper file.
Now, let’s have a look at the alternative:
Extracting Physical Tables
With a traditional join happening, in our extract window, we can choose the Data Storage option for Physical Tables. Long story short: With this option, our tables are not joined when extracted. In fact, the extract ignores all joining that is going on in the Physical Layer.
For our example, in which we are cross-joining two tables of 10,000 rows each, that means these two tables will remain separate. As their combined file size is only 0.4 MB, the hyper file will be approximately 10,000 times smaller than before. The downside is, of course, that the actual process of joining will have to be performed when the view in Tableau is built.
For extracting physical tables, there are a lot of extract limitations. In fact, except for hiding unused fields, the whole Edit window for the extract will be greyed out: No extract filters, no data roll up, no sample rows, no TOP N rows, no incremental refresh. It’s just the full physical tables that will be extracted (Filters of course can then be added in the Data Source Filter menu.)
Also, the only possible join operator is the equal sign. If you are working with a greater than, less than or are doing an inequality join, extracting physical tables will not be available.
With logical tables, the extract filters are still working for us if we want to. But there is an important detail that might lead to confusion if not mentioned: Extract Filters process the whole data source model when the hyper file is written. They do not only affect the table where the field for filtering was coming from.
Let me explain as it is a bit counter intuitive. In our “First Rule of Relationships,” we learned that, with Relationships, the joins are happening in our view and not in the data source. That is still correct. Now, you might expect that extract filters treat the extracted tables separately because a logical table keeps its integrity in our extract file.
They don’t. For the extracting process, Tableau builds out a Full Join of all the data of both or more related tables. Then, it performs the filtering on this new full joined table. Then, it separates the full joined table again and only keeps the rows for each table that were left. From there, we use them like in any other Relationship model, and the joins will happen in the view. And yes: That means Tableau is performing a join for the extract filtering and is then performing other ones for the actual sheet building. And another yes: That same mechanism of joining at least once before the sheet is built applies to data source filters, too.
Currently, there is no native button for filtering tables separately when creating an extract. We would have to go to our database and filter the table there before connecting it to Tableau. We would do this by creating a specific view or working with a WHERE statement in SQL or directly in CustomSQL in Tableau.
I once got the question whether a Relationship Calculation could be a workaround. Unfortunately, this doesn’t really work as the full data load would still be sent to the view. Scenario: We could add an IF statement in the Edit calculation window when relating tables that filters for a specific dimension, for example. Then, we could map this calculation to a field from the related table. The result would be unmatched rows for everything, which does not fit the check of the IF statement. But as no rows get lost with Relationships, they would still appear in our view once we add a measure from this table. We could filter out all NULL values of that field later on (in a data source filter or somewhere else), but that doesn’t change the extract performance or size.
A Few Final Notes
Don’t connect directly to the hyper file of the extract unless you have a good reason to. There are a few disadvantages when doing so: For example, the table names will be lost and replaced by Tableau generic names. Also, the whole data model won’t be visible in the data source. The extract file only saves the actual data, not how it was modelled or how the tables were related to each other.
Tableau’s Performance Options for Relationships (coming in the next chapter) do not affect extracts in any special way. They manage how logical tables are treated when related, but it doesn’t matter if they are stored as logical tables, built in the view from physical tables or are filtered for within the extract.