Whether you’re specifying physical joins or leveraging relationships and letting Tableau’s new data model determine which joins to create, joins are a powerful and vital tool in Tableau. They allow you to bring together data from multiple tables and analyze them as one singular result set. In fact, it’s pretty rare that I ever create a Tableau data model that doesn’t bring together multiple tables using joins/relationships.

While typical joins are quite common, I’ve found that we can use joins in some very creative ways in order to better prepare our data for analysis and visualization. In this blog, I’m going to share three creative uses of joins which can be incredibly valuable and prevent the need for complex calculations.

Get Data from a Prior Row/Partition

One common Tableau problem I see is when you need to perform some sort of calculation across multiple rows or partitions of data. For example, the following table shows sales, by month, for three customers.

 Period Customer Sales 01/01/2020 Søren Kierkegaard 83,689 02/01/2020 Søren Kierkegaard 67,590 03/01/2020 Søren Kierkegaard 60,582 04/01/2020 Søren Kierkegaard 75,482 05/01/2020 Søren Kierkegaard 72,009 06/01/2020 Søren Kierkegaard 64,886 01/01/2020 Albert Camus 76,481 02/01/2020 Albert Camus 75,519 03/01/2020 Albert Camus 49,479 04/01/2020 Albert Camus 15,754 05/01/2020 Albert Camus 50,847 06/01/2020 Albert Camus 26,576 01/01/2020 Jean Paul Sartre 47,474 02/01/2020 Jean Paul Sartre 89,102 03/01/2020 Jean Paul Sartre 62,544 04/01/2020 Jean Paul Sartre 47,535 05/01/2020 Jean Paul Sartre 83,919 06/01/2020 Jean Paul Sartre 49,783

Let’s say that I want to show the increase in sales from month to month for each customer. The typical approach to this is using LOOKUP to get the previous row’s value.

Difference

// Difference between this month and previous month.

SUM([Sales]) - LOOKUP(SUM([Sales]), -1)

This works pretty well in when the data is in a table, but this solution can start to break down pretty quicky. LOOKUP is a table calculation so it’s extremely sensitive to the dimensions on the view. Additionally, since this field is a table calculation, any subsequent calculations that use this field must also be table calcs and will have the same dependency upon the dimensions on the view. Anyone who’s found themselves in this situation knows that it can get pretty complex.

Ultimately, this would be much easier if we could make these row-level calculations, as we could then completely eliminate table calculations, providing us with much greater flexibility. To do that, we’d need a data structure like this:

 Period Customer Sales Previous 01/01/2020 Søren Kierkegaard 83,689 02/01/2020 Søren Kierkegaard 67,590 83,689 03/01/2020 Søren Kierkegaard 60,582 67,590 04/01/2020 Søren Kierkegaard 75,482 60,582 05/01/2020 Søren Kierkegaard 72,009 75,482 06/01/2020 Søren Kierkegaard 64,886 72,009 01/01/2020 Albert Camus 76,481 02/01/2020 Albert Camus 75,519 76,481 03/01/2020 Albert Camus 49,479 75,519 04/01/2020 Albert Camus 15,754 49,479 05/01/2020 Albert Camus 50,847 15,754 06/01/2020 Albert Camus 26,576 50,847 01/01/2020 Jean Paul Sartre 47,474 02/01/2020 Jean Paul Sartre 89,102 47,474 03/01/2020 Jean Paul Sartre 62,544 89,102 04/01/2020 Jean Paul Sartre 47,535 62,544 05/01/2020 Jean Paul Sartre 83,919 47,535 06/01/2020 Jean Paul Sartre 49,783 83,919

Here’s where we can get a bit creative with our joins. We’ll drag our table into the data pane again.

To help distinguish the purpose of the two logical tables, I’m going to rename the first “Current” and the second “Previous”. Now I’ll create a relationship between the tables using Customer Name and Period. For Period, we’ll create a relationship calculation on the left side:

And we’ll relate this to the normal Period field on the right side.

This basically instructs Tableau to pull the previous month’s data, for each customer, in the second table. Now we’ll see the following fields:

I’ll rename Sales (Sales by Month1) to Sales – Previous so that it’s clearer what this represents.

We now have both the current and previous sales amounts on a single row, so we can perform simple row-level calculations to get the difference.

Difference

// Difference between this month and previous month.

[Sales] - [Sales - Previous]

Because this is not a table calculation, it is not dependent upon the view. We can remove dimensions, perform other types of aggregations, add totals, etc. without the complexity of table calculations. In the end, this is a much more flexible solution than using LOOKUP.

Running Totals

Another common issue I see is the need to show a running total of some measure. The typical solution to this is the use of a running sum table calculation.

Running Total

// Running total of sales.

RUNNING_SUM(SUM([Sales]))

This has a similar problem to the first use case—RUNNING_SUM is a table calculation, which will cause additional complexity if you need to filter data, remove dimensions from the view, or perform additional calculations. For a example, let’s say you want your users to be able to see the running total for April only. You add a filter on Period and filter down to April.

The running sum requires the previous rows to be visible on the view in order to accurately compute the result. Because we’ve filtered out those previous rows, the table calculation only sums April. At this point, to solve this problem, we have to trick the Order of Operations by creating a LAST() = 0 calculation and using it as a filter. Yuck!!

Once again, it would be nice if we could eliminate the use of table calculations and convert these into row-level calculations. As you may have already guess, we can do this using a join. Like the first use case, we will drag the table into the data pane a second time. We’ll relate on Customer Name and Period, but we’ll set up the Period relationship with a >= comparison.

This will join all previous rows to the current row meaning that Sales from the second table will effectively be our running total—we don’t even need a calculation. I’ll rename this field Running Total and add it to my view.

And, since this is now just another field on each row, we can filter to April without having any impact on the running sum.

There are a few of things I should note about this solution. First of all, if your starting point for a running sum needs to be flexible, then this solution may not work well. For example, if you wanted your user to be able to choose the start and end data of the running total, then the running sum table calculation might be a better approach.

Second, if you perform this technique using a physical join, instead of relationships, then you will be duplicating your records quite a bit. For example, this 18 row data set would become 63 rows. So, if you’re dealing with millions of rows, you’ll need to be careful. Another issue with physical joins will be that your current Sales measure will be duplicated multiple times, so when you sum the measure, the values will also be duplicated. To address this, an LOD could be used. However, if this is done via relationships, Tableau will generate SQL in such as way that you will not get duplicate records, making relationships the better approach.

The final thing I want to note is that this solution can be very useful if you intend to animate a chart using Pages. I’m not sure of a way to show running sums using pages, but since the join produces a row-level measure, we can easily show the running sum over time using Pages.

Cross-Joins

My final creative use for joins is one I’ve talked about previously, cross-joins. As I discussed in SQL for Tableau Users Part 2: Combining Data, a cross-join will join each record from one table to each record from another. For example, let’s say we have two tables, Shapes and Colors and we wish to get all possible combinations of shapes and colors.

 Shape Circle Octagon Square Triangle

 Color Blue Red Yellow

To perform a cross join in Tableau, we’ll drag both tables to the data pane.

Then we’ll create relationship calculations on both sides with just the value 1 (or any value, for that matter, as long as both sides are the same). This will essentially join each record from the first table to each in the second.

Now we can list each combination of shapes and colors.

This has a ton of use cases. It’s used frequently when you need to densify your data, in order to do things such as draw curves. It’s also used in cases where you wish to create a data scaffold. I’ve written about both of these topics in the past, so I won’t go into any further detail here. If you’re interested, please see An Introduction to Data Densification and Creating a Date Scaffold in Tableau.

Wrap-Up

I personally love these three creative uses for joins and I leverage them all the time. They can address many common problems and can often eliminate a lot of complexity in your Tableau workbooks. I hope you find them useful as well.

I wish to note one last thing before I wrap up this blog. In all the examples I’ve shared above, I used Tableau Desktop to perform the joins via relationships. You can, of course, perform all of these using physical joins or within Tableau Prep (which uses physical joins). Just keep in mind that these approaches will produce duplicate records, which could lead to other complexities. Whenever possible, I think it’s best to leverage relationships, but there are times when you might prefer to perform your data prep up front via Tableau Prep or you may find that you want to force the use of physical joins. In those cases, these same techniques can be applied—just be aware of some of the drawbacks that I noted previously.

Ken Flerlage, August 2, 2021