Comments on The Flerlage Twins: Analytics, Data Visualization, and Tableau: Tableau's New Data Model & Relationships

The article is a rare case of looking at Tableau t...

2025-04-05T01:28:46.526-04:00

The article is a rare case of looking at Tableau through SQL lenses. I put it into my spaced repetion cards, so, forgive me for having questions over and over again. 'Spaced' indeed. ))

Now, I lost the understanding why we use INNER JOIN twice.

Instead of
SELECT [Customer Name], SUM([Sales])
FROM [Orders]
INNER JOIN (
SELECT [Orders].[Region]
FROM [Orders]
INNER JOIN [People_Multiple]
ON [Orders].[Region] = [People_Multiple].[Region]
WHERE [Person] = 'Central Person 1'
GROUP BY [Orders].[Region]
) AS [Table 1]
ON [Orders].[Region] = [Table 1].[Region]
GROUP BY [Customer Name]

I use

SELECT [Customer Name], SUM([Sales])
FROM [Orders]
INNER JOIN (
SELECT [People_Multiple].[Region]
FROM [People_Multiple]
WHERE [People_Multiple].[Person] = 'Central Person 1' //or whatever
GROUP BY [People_Multiple].[Region]
) AS [Table 1]
ON [Orders].[Region] = [Table 1].[Region]
GROUP BY [Customer Name]

I don't see the difference. There mustn't be duplication due to "GROUP BY [People_Multiple].[Region]". What do I miss here?

Yep, you're right. That was a poor example, wa...

2024-12-31T14:41:17.256-05:00

Yep, you're right. That was a poor example, wasn't it? I should have filtered on the "Central" Region instead of the specific person.

A great article. Imho, if you are going to be fair...

2024-12-18T23:35:03.995-05:00

A great article. Imho, if you are going to be fair to the old Data Model, "WHERE [Person] = 'Central Person 1'" will eliminate replication of the rows. The problem will arise with two persons like WHERE [Person] = 'Central Person 1' OR [Person] = 'Central Person 2'
Correct me if I wrong.
Many thanks for the knowledge!

Hi Ken On the topic of join culling which Tamas a...

2020-10-15T03:56:19.225-04:00

Hi Ken

On the topic of join culling which Tamas already corrected you on. While an Inner join + "Assume referential integrity" will cull the joins in the statement passed to the underlying database, most databases will cull a Left Join on a key field automatically if you are not selecting anything from the right table.

Example:
You have a fact sales table called dbo.Sales and a dimension table called dbo.SalesPeople.

If you set up an "old" data model with the fact first and the dimension table to the right, and do a left join, tableau might pass a query like this to the underlying database:

SELECT SUM(1) as [Number of records]
from dbo.Sales sales
left join dbo.SalesPeople SalesPeople
on Sales.SalesPeopleID=SalesPeople.SalesPeopleID

However, if the SalesPeopleID is the Primary Key for the SalesPeople table, any good RDBMS will now that a left join like the one above will lead to neither removal nor duplication of fact-table rows, and since you are not using any fields from the dimension table in your select, the query optimizer will simply not join the two tables.

You do not need to setup referential integrity in your database for this to work (at least not in SQL-Server)

Of course I remember you!! Yes, I think it's g...

2020-09-26T19:05:18.813-04:00

Of course I remember you!! Yes, I think it's going to make our lives so much easier. And just think of how much easier it will be for newbies. You no longer even really need to understand the concept of a join, which is pretty cool.

So nice of you to remember me Ken! Thanks so much ...

2020-09-26T12:41:14.762-04:00

So nice of you to remember me Ken! Thanks so much for your help with this; I feel I understand what's happening a lot better now. I can see how the new logical layer can be really helpful as we design our published datasources, allowing us to provide more tables that won't merge into a billion-row denormalized table at extract time. Thanks so much!

Oh I didn't realize that was you, Susan! :) Y...

2020-09-25T16:19:04.738-04:00

Oh I didn't realize that was you, Susan! :)

Yes, I think you have it correct. The size of the extract will depend on a number of different factors. If you perform physical joins, then if there is a one-to-many relationship, then data in one table will be duplicated, creating a larger extract. Whereas, if you relate them logically, the extract stores each set of data separately, so that extract should be smaller. Of course, they could also change things about extracts, compression, etc. from version to version so that may not always be exactly the case.

I should note that I haven't dug into this in ...

2020-09-24T13:36:43.149-04:00

I should note that I haven't dug into this in detail, so I'm not 100% about what I'm about to say, but I believe I have it correct...

By default, Tableau stores each "logical table" separately in the extract. Are you 3 SQL statements included in one logical table (with physical joins connecting them) or are they each their own logical table? If the latter, then Tableau will store them each separately within the extract. When the extract refreshes, it will execute each custom SQL separately and refresh each logical table in the extract. At that point, your SQL is done--it's only used for the extract refresh. However, when you use these extracts in a workbook, Tableau has to communicate to them and will use SQL to do that (acting as each is it's own table). At this point, it should be leveraging the new data model to determine how to build those SQL statements. A key point is that Tableau always had to execute SQL to communicate with extracts--it's really just another database--so this is not really creating extra load on the Tableau Server. It's just formulating better SQL statements.

Original comment was prior to download and playing...

2020-05-18T09:09:40.314-04:00

Original comment was prior to download and playing. So far, no noticeable performance impact on legacy queries. However, I'm frequently having to blend many data sources and so this really is a game changer for my work flow.
My new big question is whether this would somehow enable a row level security table using extracts as row duplication was always holding me back. Hmm.... Great stuff.

Good question. Without seeing your data, I can'...

2020-05-15T10:41:07.460-04:00

Good question. Without seeing your data, I can't be sure how this will work. Tableau has noted, however, that the new data model does not yet handle two fact tables unless they are joined together via a common dimension (see the "Multi-fact analysis" section of https://help.tableau.com/v2020.2/pro/desktop/en-us/datasource_datamodel.htm). It sounds like you may have this scenario (or something similar) so it may not address your level of detail problems in this case. Of course, this is just version 1 so keep an eye out--I have no idea if they intend to address this, but I'm sure they'll be making continued improvements over time.

Hi Sir, Thanks for the great post. I just wonder h...

2020-05-15T03:37:14.038-04:00

Hi Sir,
Thanks for the great post. I just wonder how it is possible to use relationships when you have different level of details data sources. For instance, in a common blending scenario that is I have sales data on daily level, and target quota data on monthly level, I try to compare my monthly sales (aggregated to month from day) with monthly target quota data.
Sales data has Category, Order ID, Product Name, Order Date columns.
Quato data has Category, Target, Month of Order Date columns.
How can I achieve a monthly level comparison?

You beat me to it, Nancy. I was just about to add ...

2020-05-13T19:54:37.841-04:00

You beat me to it, Nancy. I was just about to add a link to it. Thanks!

Hi Ken - you might want to link this great blog po...

2020-05-13T15:41:09.548-04:00

Hi Ken - you might want to link this great blog post from your post as well - went live this week: https://www.tableau.com/about/blog/2020/5/relationships-part-1-meet-new-tableau-data-model

Nancy Matthew (tech writer at Tableau)

My way to view the SQL query is using Tableau Log ...

2020-05-11T12:56:21.151-04:00

My way to view the SQL query is using Tableau Log Viewer (https://github.com/tableau/tableau-log-viewer) to trace httpd.log. By highlighting/filtering "query-begin" and "query-end" with Live mode in Tableau Log Viewer, you can lively see what SQL statements was actually executing.

Thank you! 😊

2020-05-10T11:52:12.857-04:00

Thank you! 😊

Thanks much Ken for your crisp and clear explanati...

2020-05-10T11:04:15.923-04:00

Thanks much Ken for your crisp and clear explanation of the new data model and I can foresee the ease on complex data models build by self service consumers. Definitely a game change in the Tableau data model world. As always, I admire your posts.

Tamas, what are your thoughts on the performance i...

2020-05-08T16:18:29.118-04:00

Tamas, what are your thoughts on the performance impact? I mean, I assume it's going to work the same way in that it formulates a new query each time (when using the logical model). Would you expect any performance impact? My gut tells me that it wouldn't be that noticeable with most reasonably sized extracts, but I'm really not sure. Would love your thoughts...

you can still aggregate tables (which is the defau...

2020-05-08T14:09:50.408-04:00

you can still aggregate tables (which is the default) for each logical tables. if you move your joins inside a logical table, the behavior is the same as in pre2020.2

I don't know a lot about hyper under the cover...

2020-05-08T08:54:53.796-04:00

I don't know a lot about hyper under the covers, but it's a database like any other and there is a need to communicate with it via a SQL. So, the impact should be pretty similar. Of course, hyper is really fast and sits right next to the workbook, so it's performance is really good, even with poorly optimized SQL. Thus, the impact may not be that noticeable.

Old workbooks will continue to work the same way a...

2020-05-08T08:52:48.985-04:00

Old workbooks will continue to work the same way as they always have. The new data model exposes two layers--the physical layer and the logical layer. Old fashioned joins can still be created using the physical layer. If you open an old workbook, you'll see the logical layer, but the joins will still be set up at the physical layer. You can get to that layer with a couple of clicks.

Hey Ken, does this new feature only affect new viz...

2020-05-08T02:03:13.885-04:00

Hey Ken, does this new feature only affect new viz's going forward or does Tableau somehow 'refactor' existing/old viz's as well so that they perform better?

Love that every new release has such great stuff p...

2020-05-07T21:32:15.555-04:00

Love that every new release has such great stuff packed in. As an ESRI user there's a lot to play with.

For this post, how will this new process impact Extracts? My understanding is that no matter how complicated the SQL, once extracted into a hyper DB it's all the same. Will this change anything for us extract heavy users?

Hi Michael. The good news is that Tableau has not ...

2020-05-07T16:48:07.156-04:00

Hi Michael. The good news is that Tableau has not eliminated the old way. There is not a physical layer and a logical layer. The logical layer deals with relationships, but the physical layer is the way it's always worked. So, I suspect that custom SQL will just operate at the physical layer. You can then add an additional logical layer on top of that, if desired. And getting to the physical model is just a couple of clicks.

You easily edit the relationships right in the dat...

2020-05-07T16:44:54.152-04:00

You easily edit the relationships right in the data model. This "logical" model is now the default, but you can get to the "physical" model (the old way) through a couple of clicks. So, if you are uncomfortable with the new data model, you can use the old way. I capture the SQL by placing a trace on my SQL Server database. There are other ways to do this, but this is the method I personally use. And I agree--a "Show Query" option would be great, particularly with this new setup.

Thanks for this post Ken - its a good first look i...

2020-05-07T14:14:16.073-04:00

Thanks for this post Ken - its a good first look into this pretty dramatic change. I haven't had a chance to play with 2020.2 yet - how or where do you define or edit the 'relationships' that Tableau uses to create this on-the-fly SQL for each worksheet or view? It seems cool, but also a little worrying, My admittedly somewhat pessimistic concern is always when an automated system tries to 'help' it will quite often do the exact thing you don't want. I'm hoping they've left in the manual override switch and you can still expressly define how you want the data to be linked if the automated query is returning incorrect results. Also how do you see the SQL that's being generated? Do you still have to generate a performance recording or pull it from the logs? I've been hoping for years Tableau would add a 'view query' option - this seems like an necessity now in order to verify results are accurate. Thanks!