Comments on The Flerlage Twins: Analytics, Data Visualization, and Tableau: Tableau's New Data Model & Relationships
Hi Ken

On the topic of join culling, which Tamas already corrected you on: while an inner join + "Assume Referential Integrity" will cull the joins in the statement passed to the underlying database, most databases will automatically cull a left join on a key field if you are not selecting anything from the right table.

Example: you have a fact table called dbo.Sales and a dimension table called dbo.SalesPeople.

If you set up an "old" data model with the fact table first and the dimension table to the right, and do a left join, Tableau might pass a query like this to the underlying database:

SELECT SUM(1) as [Number of Records]
FROM dbo.Sales Sales
LEFT JOIN dbo.SalesPeople SalesPeople
ON Sales.SalesPeopleID = SalesPeople.SalesPeopleID

However, if SalesPeopleID is the primary key of the SalesPeople table, any good RDBMS will know that a left join like the one above will lead to neither removal nor duplication of fact-table rows, and since you are not using any fields from the dimension table in your SELECT, the query optimizer will simply not join the two tables.

You do not need to set up referential integrity in your database for this to work (at least not in SQL Server).
-- Morten (https://twitter.com/MortenBoD), 2020-10-15

Of course I remember you!! Yes, I think it's going to make our lives so much easier. And just think of how much easier it will be for newbies.
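[Editor's aside] Morten's claim above rests on an invariant: a left join against a unique key can neither remove nor duplicate fact-table rows, which is exactly what lets an optimizer drop the join when no dimension columns are selected. A minimal sketch of that invariant, using an in-memory SQLite database purely as a stand-in for his dbo.Sales/dbo.SalesPeople example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Dimension table: SalesPeopleID is the primary key, so it is unique per row.
cur.execute("CREATE TABLE SalesPeople (SalesPeopleID INTEGER PRIMARY KEY, Name TEXT)")
cur.executemany("INSERT INTO SalesPeople VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

# Fact table: several sales per salesperson, plus one with no match at all.
cur.execute("CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, SalesPeopleID INTEGER)")
cur.executemany("INSERT INTO Sales VALUES (?, ?)", [(10, 1), (11, 1), (12, 2), (13, 99)])

(plain,) = cur.execute("SELECT COUNT(*) FROM Sales").fetchone()
(joined,) = cur.execute(
    "SELECT COUNT(*) FROM Sales s "
    "LEFT JOIN SalesPeople p ON s.SalesPeopleID = p.SalesPeopleID"
).fetchone()

# Matched fact rows join exactly one dimension row (unique key) and unmatched
# rows are kept by the LEFT JOIN, so the row count is unchanged either way --
# the join is a no-op unless dimension columns are actually selected.
assert plain == joined == 4
```

Whether a given RDBMS actually performs this culling is up to its optimizer, as Morten notes; the sketch only demonstrates why the optimization is safe.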
You no longer even really need to understand the concept of a join, which is pretty cool.
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-09-26

So nice of you to remember me, Ken! Thanks so much for your help with this; I feel I understand what's happening a lot better now. I can see how the new logical layer can be really helpful as we design our published data sources, allowing us to provide more tables that won't merge into a billion-row denormalized table at extract time. Thanks so much!
-- Susan Glass (https://www.blogger.com/profile/00673949704628313554), 2020-09-26
Oh, I didn't realize that was you, Susan! :)

Yes, I think you have it correct. The size of the extract will depend on a number of different factors. If you perform physical joins and there is a one-to-many relationship, data in one table will be duplicated, creating a larger extract. Whereas if you relate the tables logically, the extract stores each set of data separately, so that extract should be smaller. Of course, they could also change things about extracts, compression, etc. from version to version, so that may not always be exactly the case.
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-09-25

Thank you, Ken! Apologies for the anonymous initial post - I hope I've fixed my info for this reply.

Your last sentence above is really helpful to me. My understanding now is that there are two sets of SQL: the first to populate/refresh the data in the extract from your source database (in your example, a MySQL db), and the second to populate the views when a workbook is used. And the new data model changes the queries used in the latter case (extract > workbook) but remains the same in the former (db > extract)? So the Orders/People queries you show above are the extract > workbook queries?
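[Editor's aside] Ken's duplication point above is just arithmetic: in a one-to-many physical join, each "one"-side row's attributes are repeated once per matching "many"-side row, so the materialized join carries more data than the two tables stored separately. A small sketch with hypothetical people/orders data (the names and values are illustrative, not from the post):

```python
# Hypothetical one-to-many data: each person (the "one" side) has many orders.
people = {1: "Ann", 2: "Bob"}                      # 2 dimension rows
orders = [(101, 1), (102, 1), (103, 1), (104, 2)]  # 4 fact rows: (order_id, person_id)

# A physical join materializes one output row per fact row, copying the
# person's attributes into each -- this is the duplication Ken describes.
joined = [(order_id, person_id, people[person_id]) for order_id, person_id in orders]

# Stored separately: 2 + 4 = 6 narrow rows.  Stored joined: 4 wide rows,
# with "Ann" repeated once per matching order.
assert len(joined) == 4
assert [name for _, _, name in joined].count("Ann") == 3
```

Relating the tables logically keeps the 2-row and 4-row tables separate in the extract, which is why the relationship-based extract should be smaller (subject to compression and version-to-version changes, as Ken cautions).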
If this is all correct, then is the size of the extract the same regardless of whether you're using joins vs. relationships with the same set of tables?
-- Susan Glass (https://www.blogger.com/profile/15098209742145024127), 2020-09-24

I should note that I haven't dug into this in detail, so I'm not 100% sure about what I'm about to say, but I believe I have it correct...

By default, Tableau stores each "logical table" separately in the extract. Are your 3 SQL statements included in one logical table (with physical joins connecting them), or are they each their own logical table? If the latter, then Tableau will store them each separately within the extract. When the extract refreshes, it will execute each custom SQL statement separately and refresh each logical table in the extract. At that point, your SQL is done--it's only used for the extract refresh. However, when you use these extracts in a workbook, Tableau has to communicate with them and will use SQL to do that (treating each as its own table). At this point, it should be leveraging the new data model to determine how to build those SQL statements. A key point is that Tableau always had to execute SQL to communicate with extracts--an extract is really just another database--so this is not really creating extra load on the Tableau Server. It's just formulating better SQL statements.
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-09-24

Ken, thank you so much for this post! It really helped me to understand relationships when I presented the changes to my team yesterday.
I really appreciate your expertise and willingness to share it.

I'm a bit unclear on *when* the various SQL statements are generated, and how they relate to what is stored in a published extract on the server. If I have a workbook that generates 3 different SQL statements and I publish the workbook to our server with an extract, are the 3 SQL statements executed every time the extract refreshes? Or is the extract stored as separate tables, with the SQL executed when someone actually interacts with a view? If the former is true, will the published data source be different depending on which workbook I publish it from? If the latter is true, doesn't that mean there's more load on the server, because SQL queries are being executed every time a view is opened?

Somewhat relatedly, we're a bit concerned about the ability of a user of a published data source to generate a killer query. For example, Tableau's example Bookshop data source could in theory generate over a billion rows if all tables are used (haha, I tried it with joins!); we could publish the Bookshop data source to the server without a problem, but there could be users generating queries with terrible performance. With joins, at least we know the total size of the dataset ahead of time.

I'm sure there are some basic facts about extracts and memory management on the server that I should learn - not trying to put it all on you to teach me! But if you could point me in the right direction to learn more, I'd really appreciate it. Thank you!
-- Susan Glass (https://www.blogger.com/profile/15098209742145024127), 2020-09-24

My original comment was prior to downloading and playing. So far, no noticeable performance impact on legacy queries.
However, I'm frequently having to blend many data sources, so this really is a game changer for my workflow.

My new big question is whether this would somehow enable a row-level security table using extracts, as row duplication was always holding me back. Hmm... Great stuff.
-- Daniel Cole (https://www.blogger.com/profile/08332512612908221902), 2020-05-18

Good question. Without seeing your data, I can't be sure how this will work. Tableau has noted, however, that the new data model does not yet handle two fact tables unless they are joined together via a common dimension (see the "Multi-fact analysis" section of https://help.tableau.com/v2020.2/pro/desktop/en-us/datasource_datamodel.htm). It sounds like you may have this scenario (or something similar), so it may not address your level-of-detail problems in this case. Of course, this is just version 1, so keep an eye out--I have no idea if they intend to address this, but I'm sure they'll be making continued improvements over time.
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-05-15
Hi Sir,

Thanks for the great post. I just wonder how it is possible to use relationships when you have data sources at different levels of detail. For instance, in a common blending scenario, I have sales data at the daily level and target quota data at the monthly level, and I try to compare my monthly sales (aggregated from day to month) with the monthly target quota data.

The sales data has Category, Order ID, Product Name, and Order Date columns.
The quota data has Category, Target, and Month of Order Date columns.

How can I achieve a monthly-level comparison?
-- Kürşat, 2020-05-15

You beat me to it, Nancy. I was just about to add a link to it. Thanks!
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-05-13

Hi Ken - you might want to link this great blog post from your post as well - it went live this week: https://www.tableau.com/about/blog/2020/5/relationships-part-1-meet-new-tableau-data-model

Nancy Matthew (tech writer at Tableau)
-- Unknown (https://www.blogger.com/profile/17691817375024835785), 2020-05-13

My way to view the SQL query is to use Tableau Log Viewer (https://github.com/tableau/tableau-log-viewer) to trace httpd.log.
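[Editor's aside] Kürşat's question a few comments up is the classic aggregate-then-join pattern: roll the daily sales up to the month, then match them to the quota table at that same level of detail. A plain-Python sketch with hypothetical sample data (an amount column is added for illustration; it is not in the column list Kürşat gives):

```python
from collections import defaultdict

# Hypothetical daily sales rows: (category, order_id, product, order_date, amount)
sales = [
    ("Furniture", "O-1", "Desk", "2020-04-03", 200.0),
    ("Furniture", "O-2", "Chair", "2020-04-20", 100.0),
    ("Technology", "O-3", "Phone", "2020-04-11", 500.0),
]
# Hypothetical monthly quota rows: (category, month, target)
quotas = [("Furniture", "2020-04", 250.0), ("Technology", "2020-04", 400.0)]

# Step 1: aggregate daily sales up to (category, month).
monthly_sales = defaultdict(float)
for category, _order, _product, order_date, amount in sales:
    month = order_date[:7]  # "YYYY-MM"
    monthly_sales[(category, month)] += amount

# Step 2: join to the quota table, now that both sides share a level of detail.
comparison = {
    (category, month): (monthly_sales.get((category, month), 0.0), target)
    for category, month, target in quotas
}

assert comparison[("Furniture", "2020-04")] == (300.0, 250.0)   # 200 + 100 vs quota
assert comparison[("Technology", "2020-04")] == (500.0, 400.0)
```

In Tableau terms, a relationship on Category plus a month-truncated Order Date should let the engine do this aggregate-then-match for you, but see Ken's caveat elsewhere in this thread about multi-fact limitations in the first release.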
By highlighting/filtering "query-begin" and "query-end" with Live mode in Tableau Log Viewer, you can see live which SQL statements are actually being executed.
-- Terence Zhang (https://public.tableau.com/profile/terence.zhangteng), 2020-05-11

Thank you! 😊
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-05-10

Thanks much, Ken, for your crisp and clear explanation of the new data model; I can foresee the ease it will bring to complex data models built by self-service consumers. Definitely a game changer in the Tableau data model world. As always, I admire your posts.
-- Unknown (https://www.blogger.com/profile/01954225977680808927), 2020-05-10

Tamas, what are your thoughts on the performance impact? I mean, I assume it's going to work the same way in that it formulates a new query each time (when using the logical model). Would you expect any performance impact? My gut tells me that it wouldn't be that noticeable with most reasonably sized extracts, but I'm really not sure. Would love your thoughts...
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-05-08

You can still aggregate tables (which is the default) for each logical table.
If you move your joins inside a logical table, the behavior is the same as in pre-2020.2.
-- Tamas Foldi (https://www.blogger.com/profile/04007178828159061311), 2020-05-08

I don't know a lot about Hyper under the covers, but it's a database like any other, and there is a need to communicate with it via SQL. So, the impact should be pretty similar. Of course, Hyper is really fast and sits right next to the workbook, so its performance is really good, even with poorly optimized SQL. Thus, the impact may not be that noticeable.
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-05-08

Old workbooks will continue to work the same way as they always have. The new data model exposes two layers--the physical layer and the logical layer. Old-fashioned joins can still be created using the physical layer. If you open an old workbook, you'll see the logical layer, but the joins will still be set up at the physical layer. You can get to that layer with a couple of clicks.
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-05-08

Hey Ken, does this new feature only affect new vizzes going forward, or does Tableau somehow 'refactor' existing/old vizzes as well so that they perform better?
-- Anonymous (https://www.blogger.com/profile/05160499895090973609), 2020-05-08

Love that every new release has such great stuff packed in. As an ESRI user, there's a lot to play with.

For this post, how will this new process impact extracts? My understanding is that no matter how complicated the SQL, once extracted into a Hyper DB it's all the same. Will this change anything for us extract-heavy users?
-- Daniel Cole (https://www.blogger.com/profile/08332512612908221902), 2020-05-07

Hi Michael. The good news is that Tableau has not eliminated the old way. There is now a physical layer and a logical layer. The logical layer deals with relationships, but the physical layer is the way it's always worked. So, I suspect that custom SQL will just operate at the physical layer. You can then add an additional logical layer on top of that, if desired. And getting to the physical model is just a couple of clicks.
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-05-07

You can easily edit the relationships right in the data model. This "logical" model is now the default, but you can get to the "physical" model (the old way) through a couple of clicks. So, if you are uncomfortable with the new data model, you can use the old way. I capture the SQL by placing a trace on my SQL Server database. There are other ways to do this, but this is the method I personally use.
And I agree--a "Show Query" option would be great, particularly with this new setup.
-- Ken Flerlage (https://www.blogger.com/profile/03698843288892226027), 2020-05-07

Thanks for this post, Ken - it's a good first look into this pretty dramatic change. I haven't had a chance to play with 2020.2 yet - how or where do you define or edit the 'relationships' that Tableau uses to create this on-the-fly SQL for each worksheet or view? It seems cool, but also a little worrying. My admittedly somewhat pessimistic concern is that when an automated system tries to 'help', it will quite often do the exact thing you don't want. I'm hoping they've left in the manual override switch and you can still expressly define how you want the data to be linked if the automated query is returning incorrect results. Also, how do you see the SQL that's being generated? Do you still have to generate a performance recording or pull it from the logs? I've been hoping for years that Tableau would add a 'view query' option - this seems like a necessity now in order to verify results are accurate. Thanks!
-- Mike C, 2020-05-07

Hi Ken, thanks for the "early bird" blog which analyzes and discusses the new data model. I would rank this improvement as the second most important piece of progress, just after the "Level of Detail Expressions" function, which was added in Tableau 9. You are right: more tests and practice are needed to see whether there will be any interference with LODs, filters, and other Tableau concepts. I have a concern.
I think this new model will not work with custom SQL data sources, in which the joins are performed by SQL instead of diagram links in Tableau. In real projects, especially when we need to join more than two tables, we mostly use custom SQL as the data source instead of table diagram links. That might still be a limitation. Anyway, I am happy to see that Tableau has started to think about this important thing. In my opinion, since the LOD introduction five years ago, there has been no very big improvement (there have still been some very good but not key improvements, for example, cross joins and set and parameter actions).

Thanks for all your blogs (with Kevin).

Michael Ye
-- Anonymous (https://www.blogger.com/profile/04826941847040967259), 2020-05-07