Kirk Munroe: 14 Scenarios and When to Use Tableau Prep vs Desktop?

 

Kevin and I are pleased to welcome back Kirk Munroe for the second blog in his series about data modeling with Tableau. Kirk lives in Halifax, Nova Scotia, Canada and is a business analytics and performance management expert. He is currently one of the owners and principal consultants at Paint with Data, a visual analytics consulting firm, along with his business partner and wife, Candice Munroe, a Tableau User Group Ambassador and former board member for Viz for Social Good.

 

Kirk is also the author of Data Modeling with Tableau, an extensive guide, complete with step-by-step explanations of essential concepts, practical examples, and hands-on exercises. The book details the role that Tableau Prep Builder and Tableau Desktop each play in data modeling. It also explores the components of Tableau Server and Cloud that make data modeling more robust, secure, and performant. This blog will give you a taste of what’s available in the book, so if you find this useful, go buy the book! Also be sure to check out Kirk’s previous blog, 4 Common Tableau Data Model Problems…and How to Fix Them.

 

Desktop vs Prep

When modeling data in Tableau, one of the most common questions I hear is, “When should I use Tableau Prep and when should I use Desktop?” In this post, I am going to do my best to answer this question in a pragmatic manner.

 

Why pragmatic? Data modeling often ends up in a very theoretical, almost dogmatic, conversation. In the context of Tableau, this often means that many of the items discussed here would ideally be done earlier in the data pipeline. I acknowledge that many of these items might be in the vein of “once the horse is out of the barn” but, in my experience, data engineering teams are regularly overtaxed and often have strict development processes that mean change requests take weeks (or months). Business users don’t want to wait that long. So, we are going to make three assumptions before diving into our scenarios:

 

1) It’s In Your Hands - You, the reader, are tasked with coming up with an analysis for decision-makers in your organization and you don’t have the ability, funding, or time to go back to an earlier point in the data pipeline.

 

2) You have a Tableau Creator License (which gives you access to both Desktop and Prep) and don’t have access to other data prep tools. In other words, we are going to solve these use cases using only Tableau products.

 

3) Your organization has Tableau Prep Conductor on Tableau Server or Cloud. Prep Conductor is part of the Data Management Add-On and, without it, you cannot easily schedule Prep flows to refresh automatically. In that case, the choice of Prep vs Desktop becomes more difficult: you can easily refresh data models created in Desktop, but Prep would require you to either refresh them manually or use unwieldy batch files.

 

We are going to use the language of Prep and Desktop for brevity. Prep represents the Tableau Prep Builder client software or New Flow capability on the web. Desktop represents the Tableau Desktop client software or the New Workbook and New Published Data Source capabilities on the web.

 

One other important concept to always keep in mind is the trade-off between the cost of a query and the user experience (UX) for your end users. What is meant by the cost of a query? The simplest way to think of it is how long the query takes to run. A “cheap” query runs quickly (e.g., sub-second) and an “expensive” query takes longer to run. In the cloud computing era, this can map directly to the monetary metaphor it represents if you are paying by server utilization. The concept is still important with on-premises database servers, as a longer query means longer queues for subsequent queries. We want our queries to be as quick (cheap) as possible, but that isn’t always achievable. In those cases, we want to pay the cost of these queries as infrequently as possible.

 

What does all this mean in the context of Prep vs Desktop? Prep will typically pay the cost of the query once per output–commonly once a day. In the case of a live connection, Desktop will pay the cost of the query every time someone interacts with the viz. This means that the more expensive the query, the better suited it is to Prep.

 

When we’re working with extracts, though, that’s not exactly the case. In our scenario, we’re most likely going to output our Prep flows to a Hyper extract. You can, of course, do the same thing with Desktop. By doing this, we’re paying the cost of the query only once, just like in Prep. That said, an extract is, in fact, a database, so Tableau is always executing queries against it. But, at that point, the question of Prep vs Desktop becomes less about how many times a query is run and more about the complexity of those queries, which we’ll touch on in these scenarios.

 

What about the end-user experience? In many situations, this is what you have to trade off against query cost. Let’s take a common use case that we won’t cover in this post. In the Designing Efficient Workbooks whitepaper (which is an amazing resource, so I highly recommend having it handy at all times!), the authors talk about how expensive “only relevant values” filters are in Tableau. This is a trade-off I would make for user experience every time (assuming I couldn’t get rid of the quick filters on the dashboard altogether 😊). When a business user comes across a quick filter with values that, if selected, result in a viz with no data, the experience is frustrating and often appears to be a bug. It can result in them not trusting, or even not using, your viz. Worth the trade-off in performance! Of course, a slow-running query is not good for user experience either, so there’s a delicate balance for us to strike.

 

Scenarios

Now let’s jump into our scenarios. For each, I’ll provide one of the following recommendations:

 

Use Prep - Only Prep has the feature(s) needed.

 

Lean Prep - This is a case where it can be done in either but it is better done in Prep. If you’re already building a Prep flow for other purposes, then I’d highly recommend pushing this particular functionality into that flow.

 

Either - Both products work just fine so you’ll need to consider the cost of the query and user experience, then pick the tool that makes the most sense for your use case.

 

Lean Desktop - It can be done in either but it is better done in Desktop. If you’re not already building a Prep flow for other purposes, then it’s probably not worth creating one specifically for this scenario.

 

Use Desktop - Only Desktop has the feature(s) needed.

 

I’ve sorted these scenarios from Use Prep to Use Desktop, as that is the order in which we would typically encounter them in the data pipeline.

 

1) Pivot Rows to Columns - Use Prep

Why: While you can sort of force Desktop to do something similar to this, the Pivot Rows to Columns feature only exists in Prep.

 

When might you encounter it: When your data is too narrow as described in 4 Common Tableau Data Model Problems…and How to Fix Them.

 

Exceptions: None. Trying to deal with this in Desktop results in bad UX and costly queries.

 

2) Cleaning Data - Use Prep

Why: Data cleaning capabilities are somewhat limited in Desktop. Prep has robust machine learning algorithms to help you clean and profile data as you work.

 

When might you encounter it: When the source systems that produce your data don’t properly enforce text field input validation. Or, in general, when you’re working with some sort of messy data set. In other words, a lot of the time!

 

Exceptions: There are a few scenarios where, if you have only a little bit of data cleaning to do, you might be able to knock that out in Desktop. But I’d recommend against that, as data profiling is built into Prep, allowing you to catch cleanliness problems you would never have noticed otherwise.

 

3) Date Scaffolding - Use Prep

Why: While it’s possible to do this in Desktop, it’s very complicated. Prep, on the other hand, includes easy-to-use native functionality for date scaffolding.

 

When might you encounter it: When your data is too short as described in 4 Common Tableau Data Model Problems…and How to Fix Them.

 

Exceptions: None

 

4) Row-Level Calculations - Lean Prep

Why: A row-level calculation is always an option in Prep because Prep has access to every row and the calculation is not dependent on the viz level of detail. If you do these calculations in Desktop, you might pay the price for the same query multiple times. (As noted earlier, in the case of extracts, Desktop will materialize these calculations. In other words, it will only pay the cost of the query when it rebuilds the extract, the same as Prep. However, it is not always easy to predict when Hyper materializes a calculation and when it doesn’t.)

 

When might you encounter it: It’s pretty rare to not have any need for row-level calculations, so it’s likely you’ll have to deal with this almost every time.

 

Exceptions: If you aren’t otherwise using Prep to build your data model, and testing in Desktop (or with the Performance Recorder) shows that the calculations run quickly.
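As a concrete illustration of this scenario, here is a simple row-level calculation written in Tableau’s calculation language, using Superstore field names (just a sketch; any row-level field from your own data works the same way):

Days to Ship = DATEDIFF('day', [Order Date], [Ship Date])

The result depends only on the values in each row, so it can be computed once in a Prep flow and materialized in the output, rather than being recomputed by the queries Desktop issues.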

 

5) Integration with Data Science Models - Either

Why: If the data science model is intended to run once per refresh of the data model, use Prep. If it is being used interactively based on user input, use Desktop. For example, you might need to score a large amount of data for customer sentiment; in this case, since it’s being done once, you can use Prep. Another example is interactively deciding on a customer's propensity to churn based on parameter inputs; in this case, the output is dependent upon user interaction, so you must use Desktop.

 

When might you encounter it: Any time you’re using an analytics extension (R, Python, etc.) to process data.

 

Exceptions: N/A
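To make the interactive (Desktop) side of this concrete, here is a sketch of a scoring calculation that calls a Python model through a TabPy analytics extension. The model name (churn_model) and its inputs are hypothetical and assume a model has already been deployed to TabPy; note that SCRIPT_ functions evaluate as table calculations, so the model is re-queried as users change filters or parameters:

Churn Score =
SCRIPT_REAL(
"return tabpy.query('churn_model', _arg1, _arg2)['response']",
SUM([Sales]), AVG([Discount])
)

The Prep equivalent would be a script step in the flow that scores the entire table once per flow run using the same model.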

 

6) Filtering Data - Either

Why: Both Prep and Desktop make it easy to filter both rows and columns of data. When filtering is needed to create your data model (for example, maybe you only need to show the last 3 years), then Prep is best. When filtering needs to be interactive for your user, then you must use Desktop.

 

When might you encounter it: Filters are used in almost everything I create, so you’ll encounter this in most scenarios. When deciding whether to use Prep or Desktop, you should focus on whether or not those filters need to be available to users.

 

Exceptions: N/A

 

7) Union Data - Either

Why: Both Prep and Desktop make it easy to union data. My preference is to do this in Prep if possible as it will greatly simplify your resulting Tableau Desktop data model and could help with query performance.

 

When might you encounter it: Fairly often, especially when legacy systems dump data to flat files for ingestion into analytics tools like Tableau.

 

Exceptions: N/A

 

8) Joining Tables that Don’t Cause Aggregation Issues - Either

Why: Both Prep and Desktop make it easy to join tables. Most joins do not cause aggregation issues as you will be joining dimension tables to fact tables. An example would be a Superstore-like scenario in the “real world.” Superstore is already a highly denormalized table. It would likely have been created by joining a fact table of sales with a few dimension tables. For instance, in terms of product analysis, the fact (sales) table would have had only a Product ID. This data would have been joined with a product table containing fields for Category, Subcategory, and Product Name. This is a pretty common type of join and does not cause aggregation issues because the result of the join leaves the level of aggregation the same as it was before the join–individual sales at the product level.

 

When might you encounter it: Frequently. It’s very common to need to join multiple tables together.

 

Exceptions: Not an exception, but this one might lean slightly toward “Lean Prep” because the Prep UI makes the results of your join immediately obvious in a visual manner. However, the nature of relationships might cause you to rethink this, as I’ll address later in this post.

 

9) Pivot Columns to Rows - Either

Why: Both Prep and Desktop make it easy to pivot columns to rows. As with unioning, I prefer pushing this to Prep in order to simplify your Desktop data model. Prep also has a little more functionality here, particularly the ability to use wildcards when pivoting.

 

When might you encounter it: When your data is too wide as described in 4 Common Tableau Data Model Problems…and How to Fix Them.

 

Exceptions: N/A

 

10) Aggregate Calculations - Use Desktop

Why: Aggregate calculations are typically dependent on the fields on the viz, so they should be done in Desktop. For example, consider profitability. You can create a SUM(Profit)/SUM(Sales) calculation in Desktop then use it to find profitability at the region level, customer level, or any level you choose–all using a single calculation. In Prep, you’d have to know those levels of detail ahead of time, then create individual fields for each one.

 

When might you encounter it: Aggregate calculations are very common and are typically used in most Tableau workbooks.

 

Exceptions: MIN and MAX calculations could make sense in Prep. Imagine that you want to know when a customer first bought a product, using Superstore as an example. The calculation { FIXED [Customer Name] : MIN([Order Date])} is not dependent on the level of the viz. Why not ensure you only pay the cost of that query once and put it in Prep?
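To make the contrast concrete, here is a sketch using Superstore fields. In Desktop, a single aggregate calculation adapts to whatever level of detail is on the viz:

Profit Ratio = SUM([Profit]) / SUM([Sales])

In Prep, you would need to pre-compute a separate field for every level you expect to need, for example (recent versions of Prep support FIXED level-of-detail expressions):

Region Profit Ratio = { FIXED [Region] : SUM([Profit]) } / { FIXED [Region] : SUM([Sales]) }

Customer Profit Ratio = { FIXED [Customer Name] : SUM([Profit]) } / { FIXED [Customer Name] : SUM([Sales]) }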

 

11) “Conditional” Calculations - Use Desktop

Why: These are calculations that are “conditional” depending on what is placed on the view or based on some data input. Generally speaking, these tend to be table calculations (which will always compute differently based on the dimensions on the view) and some types of level-of-detail (LOD) calculations, particularly INCLUDE and EXCLUDE. In such cases, Desktop must be used.

 

When might you encounter it: You’ll need these types of calculations fairly often in your work.

 

Exceptions: In some simple cases where a calculation is conditional on a single value, Prep could be used (particularly using a parameter). There are also other scenarios where the only option in Desktop is the use of a table calculation, but Prep could be used to limit such a need. For a good example of such a scenario, see the Get Data from a Prior Row/Partition section of Ken’s blog, 3 Creative Uses for Relationships/Joins in Tableau. The blog shows how to address the problem using Desktop, but we think it would be more effective to handle this upfront in Prep. That solution pushes the work to the data model, eliminating the need for complicated table calculations.
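As a sketch of the “prior row” case using Superstore field names: in Desktop, the table-calculation approach looks something like the following, computed along the date axis of the view.

Previous Month Sales = LOOKUP(SUM([Sales]), -1)

With the Prep approach described above, you would instead materialize a previous-value column in the flow (for example, via a self-join that shifts the date by one period), so the viz only needs a simple row-level field and no table calculation at all.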

 

12) Extending the Data Model (Sets, Folders, Hierarchies, Custom Formatting, etc.) - Use Desktop

Why: There is so much you can do with your data model in Desktop that you cannot create in Prep, including creating Sets and Hierarchies, organizing data into folders, applying default formatting (and other settings) to data items, etc.

 

When might you encounter it: All the time! This doesn’t mean Prep can’t be your go-to modeling tool; just enhance your model in Desktop after Prep runs the flow. The good news is that all your changes are maintained when the flow is run in the future.

 

Exceptions: N/A

 

13) “Joining” Tables that Cause Aggregation Issues - Use Desktop

Why: This isn’t possible in Prep. It is done in Desktop with relationships rather than joins, hence the quotation marks around “joining” in the scenario title. Let’s take the case of getting sales targets for each subcategory at the monthly level. Your consumers want to see how sales performed against targets. If you join these data sets, you are going to end up with issues because the sales data is at the transactional level but the targets are at the monthly level; a traditional join would repeat each monthly target on every individual sale in that month, inflating any sum of targets. This means you would either have to: (i) pre-aggregate your sales data before making the join (which limits detailed analysis of sales) or (ii) be really careful with aggregation levels when creating sheets in Tableau. Data at different levels of aggregation is handled pretty seamlessly by relationships, which we will talk about shortly.

 

When might you encounter it: Somewhat regularly. While this is not needed in all analyses, it is not an uncommon requirement. When it occurs, it can be difficult to determine the best approach.

 

Exceptions: N/A

 

14) Live Connections to Your Data - Use Desktop

Why: If you need a live connection to your data, then you cannot run it through a Prep flow ahead of time, as Prep always outputs the data from flows into new tables or files, which are essentially point-in-time snapshots.

 

When might you encounter it: There are many use cases for live connections, especially with the emergence of high-performance cloud data warehouses. However, it’s important to use live connections only when you really need them, because they can be a big performance hit.

 

Exceptions: There are no true exceptions, as only Desktop can connect live. But there may be cases where you create flows in Prep, output to a database instead of a published data source, and have Desktop connect live to that data. Or you may be able to schedule Prep flows to run frequently enough to create a data source that is close enough to real time to meet your users’ requirements.

 

The Impact of Relationships

Before closing out this blog, I think it’s important to briefly touch on the impact of relationships, a feature added to Tableau Desktop in version 2020.2. As discussed by Ken in his blog, Tableau's New Data Model, relationships provide a more flexible way to join your data that is dependent upon the dimensions and measures used in your view. Essentially, relationships cause Tableau to perform what Ken calls “Smart SQL”, only joining the tables needed for a specific view. When using a live connection, this can improve query performance significantly by “culling” unneeded joins and tables. And, since an extract is also a type of database, relationships have a similar impact when you’re connected to an extract.

 

Obviously, since relationships are largely viz-dependent, they don’t make sense in Tableau Prep. But their existence may cause you to reconsider how you build your data model in Prep. Let’s say, for example, you have two tables that you want to combine. One has 10 thousand rows and will be used in every worksheet you create in Tableau. The other has 10 million rows and will only be used on one sheet. You could join the tables in Prep, but your resulting data model will have at least 10 million rows, and once you’re working with that data in Tableau Desktop, it will need to process all those rows for all of your sheets. With relationships, on the other hand, Tableau only queries the 10-million-row table on the one sheet that actually uses it; this, of course, is the beauty of relationships. But this does not mean you should dispense with Prep. Instead, you should just rethink your Prep workflow slightly. Rather than joining these tables within the flow, work with them separately: clean up and extend each table in the flow, but output them to their own tables/files. Then, in Desktop, connect those outputs using relationships so that Tableau will cull the large table when it’s not in use.


Summary

There you have it–14 Tableau data modeling scenarios and which Tableau product to use for each of them. As a companion to the blog, I’ve created a Tableau Public workbook to act as a cheat sheet.

 


 

My book, Data Modeling with Tableau, covers these and many other use cases and scenarios. In addition to modeling in Tableau Desktop and Prep Builder, it covers virtual connections, securing and sharing data models, creating models in the Tableau web interfaces, data model considerations for Ask and Explain Data, and data catalog and lineage. It is now available on Amazon in e-book and print formats.


 

Thanks for reading. If you have any questions or thoughts, feel free to drop a comment below and we will be glad to address them.

 

Kirk Munroe

February 20, 2023
