Ads Top

Creating a Sankey Funnel in Tableau


I’ve become fairy well-known in the Tableau community for my sankey templates, but everything I’ve created was based on the work of others ahead of me. In particular, Jeffrey Shaffer, who was the first to build a sankey in Tableau, and Olivier Catherin, who built upon Jeff’s work to create the polygon-based sankey that most of us use today. Of course, there are many others who have innovated with sankeys—Chris Love, Alexander Mou, Chris DeMartini, Ian Baldwin, and Merlijn Buit—but the work of Jeff and Olivier has been the basis of most of my templates. That being the case, I was thrilled to finally meet Olivier at the 2019 Tableau Conference and for all three of us to take a photo together.


Left to Right: Jeff, Olivier, Ken (Yes, it does appear that I have my eyes closed)

Olivier was even kind enough to bring us t-shirts and laptop stickers!


So, fresh off of spending some time with two fellow sankey-lovers, what better blog to write than one about a new type of sankey—what I’m calling a sankey funnel.

Sankey Funnel
So what is a sankey funnel? The idea first came from Terry Dehart and the Progress Bible team. He had used my multi-level sankey template to build something like this (I’m using some sample lead generation data, but the concept was the same).


While this wasn’t bad, what he really wanted was for the flows to end at appropriate spots. For example, after Rejection Reasons A, B, and C, the lead is dead, so there is no need to flow to the next step. So, ideally, we’d have something more like this:


The removes the visual clutter of the first version, making it much easier for us to focus on what’s happening as we flow from left to right. I’ve chosen the name “Sankey Funnel” because this sankey works in a similar fashion to a funnel chart, while allowing for a bit more detail about each phase.

After working with Terry a bit, we were able to develop a method to hide the unnecessary flows and both he and I were happy. But since that time, I’ve gotten a number of additional requests for sankeys with this capability, so I decided to provide a new template, as well as explain how you can add this functionality to sankeys built using my prior templates.

Sankey Funnel Template
If you’d just like to plug-and-play your data, you can use my template. The Tableau template can be found here: Sankey Funnel Template. And the Excel data file is the same as used in my multi-level sankey template (the link above takes you to a list of files—you’ll want the one called Sankey Template Multi Level.xlsx). The key difference in how you use this template is that some of your “Step” values will be NULL. For instance, the data for the sankey shown above looks like this:


Notice that there are numerous blanks in the file. So, when you want the flow to stop (for example, after Rejection Reasons A, B, and C), you just leave the rest of the steps NULL (blank in Excel). Calculations within the Tableau template will then take care of the rest for you. Other than this, the process for using the template is exactly the same as the multi-level template (See the How to Use The Templates section of the multi-level sankey blog).

I should also note that your NULLs can appear at the beginning as well. By doing this, you can have new values added into the flow in the middle of the process. For example, you could create something like this:



Adjusting an Existing Multi-Level Sankey
If you’ve already used my multi-level sankey template but want to apply this technique to it, you have a couple of options. First, you could put your data into the template as explained above. If your sankey is relatively simple and you haven’t built up a lot of extra functionality around it, then this might be the best approach. But, if you feel it will be too much work to create from scratch, the following steps will guide you through updating your calculated fields to hide the NULL flows. Before doing that, I need to share some of the underlying details of the calcs used in the workbook:

  • Each “Bar” sheet is driven by a calculated field called N1 Bar Position(for the first bar), N2 Bar Position (for the second), etc.
  • These calculations are based on additional “helper” calculations, which are all then grouped together in one of the Bar folders.
  • Each “Curve” sheet is driven by a calculated field called Curve 1-2 Polygon, Curve 2-3 Polygon, and so on.
  • Like the bar calculations, these also have helper calculations and are grouped together in one of the Curve folders.
  • Since the N Bar Position and Curve Polygon calculations drive the plotting of the bars and curves, our goal is to modify these such that, when the value is NULL, we return NULL so that nothing is plotted.
  • Because the curves connect two bars, we need to check both the starting point and the ending point for NULLs.
  • Finally, my approach accounts for both NULL values and empty strings (“”). Technically, these are two different things, but I’ve included both to reduce any confusion.


With the above out of the way, here’s how you can modify your calculations to account for NULLs:

1) Modify all N Bar Position calculations, adding an IF statement to check for NULLs. For example, N2 Bar Position will look like this:


Note: The statement after the ELSE is the same as what is already there in the template, so no changes are required except wrapping it in the IF/ELSE statement.

The above changes will need to be made to all N Bar Position calculated fields, swapping out Step 2 for the appropriate step.

2) Make similar changes to the Curve Polygon calculated fields. For example, Curve 2-3 Polygon should become.


Like the Bar Position calcs, the only change here is to add the IF/ELSE statement. It is, however, critical that the IF statement check for NULLs/blanks in both the source step (e.g. Step 2) and the target step (e.g. Step 3). Similar changes must be made to all of the Curve Polygon calcs.

As always, thanks so much for reading! If you have any thoughts or comments, please leave them in the comments section below.


Note: On January 6, I posted a blog which details a new approach to drawing sankey curves which ensures that the width of flows remain constant from start to finish (rather than narrowing in the middle). I suggest transitioning to this new method to ensure greater analytical integrity. Equal-Width Sankey: A New Approach to Drawing Sankey Curves


Update February 4, 2020: One question I get pretty regularly is how you can extend the template so that it has additional flows. I won't go into a lot of detail here, but the process basically entails the following:

1) Add a new "Step" to the spreadsheet--you can just call it Step 6 for consistency. 
2) In Tableau, copy the "Bar" calculations. For instance, copy the calcs in the "Bar 5" folder to create "Bar 6" calculations. Then edit each of those new calculations to refer to the "6" version. 
3) Similarly, create copies of the calcs in the "Curve 4-5" folder to create "Curve 5-6" calculations. 
4) Copy the Bar 5 sheet to one called Bar 6 and Curve 4-5 to Curve 5-6, then edit the fields used on those sheets to use the Bar 6 and Curve 5-6 calculated fields.  

A note of warning: Often, when you copy the sheets then switch the calcs to use the new ones you've created, you may find that one of the table calcs turns red, indicating an error. If you hover over the calc, it will say that a field is missing. Despite all your efforts, you won't find a way to fix it. I've found that advanced table calculations, especially those with lots of nested calcs, sometimes get errors like this even though everything looks good under the covers. My best guess is that there is some missing pointer somewhere that causes it to get confused. The only way I've found to deal with this is to right-click on the red pill then set it to compute using some field (any field will do). You then have to go back and edit the table calculation, setting all the nested calcs to compute as desired. It can be a bit of a painful process, but it works.


Ken Flerlage, December 1, 2019


21 comments:

  1. Hi Ken, thanks a lot for this post--it's tremendously helpful! I am trying (with moderate success) to adapt your approach and your templates to my own dataset. I have 3 problems in particular, and would appreciate any tips you may have:

    1) I have 7 Steps in my Sankey Funnel, and have successfully created sheets for Bars 6 and 7. I can't do the same for the Curve 5-6 and Curve 6-7 sheets, as 'The table calculation requires a field that is missing'. I have updated the Colour pills and the Curve Polygon table calculation to compute using Steps 5 and 6, and 6 and 7 respectively, as I did in the Bars sheets, but this doesn't fix the problem.

    2) In a similar way to how there are rows exiting the funnel at each step (like 'Rejection Reason A' and 'Not Qualified Reason A' in your example), I am also trying to have rows joining midway along the funnel (i.e., new customers being introduced in Step 3). In my dataset, these rows currently have nulls for Steps 1 and 2 and non-null values from Step 3 onward.

    3) Is there a way to have one box at the final stage of the funnel 'holding' or aggregating all the customers at each stage lost through the funnel by bypassing the intermediary stages? Put differently, if I have 60 customers dropping out at Step 2, 40 at Step 3 and lose no further customers in the funnel, I'd like a box with those 100 customers without them having to move through red boxes at Step 4, Step 5, Step 6 and Step 7.

    Thanks in advance!
    Keith

    ReplyDelete
    Replies
    1. Hey Keith. Any way you could shoot me an email? These comments just don't allow enough flexibility. My email address is flerlagekr@gmail.com

      Delete
    2. Ken,

      I'm getting the same error that Ken is getting in #1 above. I keep getting the error that the curve #-# Polygon field is missing a field that is required. I even tried to re-create your exact tabs for Curve 1-2 and Curve 2-3, but continue to receive the same error. Is there something we are missing? Maybe a setting that isn't obvious? Thanks.

      Delete
    3. I've found that advanced table calculations, especially those with lots of nested calcs, sometimes get errors like this even though everything looks good under the covers. My best guess is that there is some missing pointer somewhere that causes it to get confused. The only way I've found to deal with this is to right-click on the red pill then set it to compute using some field (any field will do). You then have to go back and edit the table calculation, setting all the nested calcs to compute as desired. It can be a bit of a painful process, but it works.

      Delete
    4. Ken - Thanks for the feedback. I tried an individual field, but it still wouldn't work. I finally tried at the table level, which turned the pill from red to green. I was then able to go back and reset each of the nested calcs again with success.

      Delete
  2. Hi Ken, I have downloaded the Sankey funnel template and the multi-level dataset to see the funnel in action. But the multi-level dataset is not the same as your screenshot above. The dataset I downloaded has values A-Z across all steps, rather than Lead, Opportunity, Rejection, etc. Can you put the Sankey funnel dataset out on the share drive?

    ReplyDelete
    Replies
    1. Sure. I uploaded the file here: https://www.amazon.com/clouddrive/share/0pb9r9EVnY0sFoqDZeqPXAHVGbBPNJt08YFphBpeTnQ

      Delete
  3. Hi Ken, thanks for your guidance to build a Sankey funnel! Im interested in plotting a session funnel and there are optional steps that might be skipped by some users, how can I plot this situation in a multiple Sankey funnel? In your example: Connecting 'Leado' directly to 'Customer Retained' without intermediate steps. Thank you so much!

    ReplyDelete
    Replies
    1. I don't have a great solution for that, unfortunately. One option would be to have "Customer Retained" in multiple "steps". That wouldn't be great, but might be a workable solution.

      Delete
    2. Thank you Ken! Is there any way to calculate the % customers at step 1 that reached step 2 in order to show it as a label in each bar? Thanks again!

      Delete
    3. Yes, that would definitely be possible. Any chance you could send me a sample workbook? flerlagekr@gmail.com.

      Delete
  4. Hi Ken,

    Thank you for this amazing visualization, it is quite useful for plotting streamlines/pathways. I have a 5-Step Sankey, with data refreshing every day. However with every refresh as the data changes, color coding of the bars and curves also goes back to the default tableau color scheme. This seems to be more of a tableau problem -https://kb.tableau.com/articles/issue/colors-change-after-replacing-data-source?lang=de-de. Would you have any recommendations to deal with it ?

    Thanks,
    Vishakha

    ReplyDelete
    Replies
    1. That should not happen when you simply refresh the data. Are you just refreshing or replacing the data source? Any chance you could email me at flerlagekr@gmail.com?

      Delete
  5. Thank you Ken, just refreshing data source works perfectly fine.

    ReplyDelete
  6. Hi, if I want to add more levels, do I have to change anything in Model sheet in datasource?
    Thanks in advance

    ReplyDelete
    Replies
    1. Nope. That will stay the same. You just need to create the calcs and sheets as detailed above.

      Delete
  7. Hi Ken - Is there an order to follow when adjusting the curve table calculations? When I come across the 'missing field' issue and select a random "compute using" then edit table calculation to specific dimensions step x, step y the end result varies depending on what initial "compute using" I selected, none of which actually give the correct result for that curve. I tested this on one that was already working in the workbook and got the same result. No idea how to correct it :(

    ReplyDelete
    Replies
    1. The initial "compute using" that you choose should have no impact on the results as you're just going to be changing all of the table calcs. My guess is that you're not changing all of the nested calcs. Be sure to select each nested calc in the drop down, then change the way those compute as well.

      Delete
  8. Hi Ken - Do you have a template of your workbook with NULLs at the beginning of the funnel? The one pictured with Step 2A, Step 3A, Step 3B. When I have NULLs at the beginning I lose the entire first bar. I'd love to see how you built that one!

    ReplyDelete
    Replies
    1. Why would you have NULLs at the beginning? Would you be able to email me with further details? flerlagekr@gmail.com

      Delete

Powered by Blogger.