Note: On January 6, I posted a blog which details a new approach to drawing sankey curves which ensures that the width of flows remain constant from start to finish (rather than narrowing in the middle). I suggest transitioning to this new method to ensure greater analytical integrity. Equal-Width Sankey: A New Approach to Drawing Sankey Curves

Recently, Rodrigo Calloni mentioned to me that he wanted to create a visualization for the upcoming 2018 FIFA World Cup. His idea was to create a sankey diagram showing the top 10 countries and the number of goals scored in each World Cup. But he hadn’t previously created a sankey and wondered if I could help. Fortunately, I had previously built a nice template for sankeys from the work by Olivier Catherin and Jeffrey Shaffer which I provided to him and resulted in the following visualization (click on the image to see the interactive version).

Templatizing Complex Charts
Unfortunately, some chart types, including sankeys, can pretty difficult to create, even when you have a detailed tutorial. They tend to include data scaffolding, data densification, unorthodox joins, table calculations, trigonometry, etc. So when I create one of these, I always templatize them as much as possible. These templates typically consist of a relatively simple Excel spreadsheet into which some data can be plugged and a Tableau workbook, which simply connects to the spreadsheet. In the end, the process becomes relatively plug-and-play.

The complexity of these charts can often be a deterrent for some people. So, in this series of blog posts, I’ll be sharing some Excel/Tableau templates for creating three different charts—Sankeys, Sunbursts, and Joy Plots. In each post, I’ll start by describing the chart, detailing its potential use cases, and providing a high-level technical description of how it’s built in Tableau. I’ll then provide as step-by-step tutorial on how to use my templates.

Before I move onto my first chart type, I want to take a moment to encourage you to drill into these chart types further. Part of the reason I am providing these templates is in hope that it will pique your interest in better understanding how they’re built. Taking apart complex charts is a really good way to learn some of the more advanced features of Tableau, so I highly encourage you to take some time to understand how these work under the covers.

Sankey Diagrams
We’re going to start with sankey diagrams. Datavizcatalogue.comdescribes these as such: “Sankey Diagrams display flows and their quantities in proportion to one another. The width of the arrows or lines are used to show their magnitude, so the bigger the arrow [or line], the larger the quantity of flow.” These charts are named after Matthew Henry Phineas Riall Sankey (that's a lot of names!!) who first used this chart type to show energy efficiency of a steam engine as shown below. (Wikipedia).

So, the key use case for sankeys is for showing flow from one thing to another. One of the best use cases I’ve seen is colleges using sankeys to show graduating students’ majors and the proportion of those students who have gone into various professions. Unfortunately, sankeys are often misused in situations where there would be a much better alternative. Sure, sankeys look cool, but it simply isn’t always the best chart type. So, before I go any further, I just want to stress the importance of considering the best chart for your use case before using a sankey. Most of the time, a sankey will not be the best chart and that, in all honesty, is a really good thing because your better choices are most likely going to be much easier to create.

Implementation in Tableau
A number of different people have implemented sankey charts in Tableau. Some approaches consist of lines drawn from one point to another, but the one I’ll be discussing is the polygon sankey developed by Olivier Catherin and Jeffrey Shaffer. Since this method was developed, it’s become one of the most commonly used methods. The chart consists of three primary components. The first two are gantt bars which represent the source and target (from and to). The third is a set of polygons, sized by proportion, which show the flow from one thing to the next. These polygons tend to take the shape of a sigmoid curve—a type of curve which takes on an S-like shape—which are created by leveraging parametric equations. Building these polygons also requires data scaffolding and some complex table calculations, which I won’t go into here.

The Template
Before I share my template, I quickly want to mention the amazing work that Ian Baldwin, of the Information Lab, recently did which shows how to build a Sankey diagram in Tableau without any data prep beforehand. It's a brilliant post and well worth the read.

This sankey template is based directly on the method detailed by Olivier Catherin—I have not added any new innovations to the method. Rather, I’ve simply tried to turn it into a plug-and-play template. So, all the credit for this work goes directly to Olivier and Jeffrey.  I'd also be remiss if I didn't credit my colleagues at Bucknell University. I first came to understand the inner workings of this chart about a year ago when I took apart some of the sankeys used internally.

The template includes two components—an Excel spreadsheet and a Tableau workbook. The goal in developing these templates was to make it as easy as possible to plug in your own data. That being said, it does make certain assumptions. Primarily, it assumes that you already have aggregated your data. Modifying the template to allow for non-aggregated data would not necessarily be difficult, but I won’t be dealing with that here.

The Excel spreadsheet (you can find it here) has two sheets, Data and Model. Modelis used to handle the data densification and parameter values needed to build the sigmoid polygons using parametric equations. If you didn’t understand that sentence, don’t worry—you don’t need to modify this sheet at all; just make sure it’s in your spreadsheet. The Datasheet will be used to populate your pre-aggregated data. It contains just four columns. You can add more if needed, but these are the four that are required by the Tableau template. The columns are as follows:

Link – The purpose of this column is simply to join each row in the Modelworksheet to each row in Dataworksheet for the purpose of creating the polygon curves. But don’t worry too much about this. You simply need to make sure that every row has a value of “link” in this column. Note: Strictly speaking, we could use a join calculation in Tableau to join these sheets together, but for simplicity sake, I often like to include a separate column in my data set.

Step 1 – This column defines the starting point of the flow (i.e. the “from”). So, to use my example of college graduates, this would be student majors.

Step 2 – This column defines the end point of the flow (i.e. the “To”). This would be the college graduates’ professions.

Size – This column contains the aggregated value for the flow. Again using the college student example, this would be the total number of students that majored in one subject (for example, Chemistry) and ended up working in a particular field (for example, Food & Drug).

Each flow from one thing to the other will contain a separate record. Here’s how the sheet looks with some sample data:

Once you have populated the Data sheet, then you need to connect it to Tableau. Start by downloading the template Tableau workbooks, which are on my Tableau Public page. I’ve actually created two Tableau workbooks—one for horizontal sankeys and one for vertical sankeys. Then edit the data source and connect it to your Excel template. The workbook should update automatically to reflect your data. From here, you can do whatever you like with the chart—change the colors, add filters, update tooltips, etc. just as you normally would. Here’s how it looks with the sample data shown above.

And that’s all there is to it. If you use this template to create your own sankey, I’d love to see it. But please don’t create one simply because you can. Be sure to closely scrutinize your use case and make sure it’s the right chart type first.

Update April 14, 2019: I've just posted a set of six new variations of this sankey template. For more, check out the post: More Sankey Templates: Multi-Level, Traceable, Gradient, and More!!

Ken Flerlage, April 13, 2018

1. Hello,

I can't open the twbx file in Tableau 10.2. Do you have an idea to help me please?
Thanks a lot.
Fred

1. Here's a version in 10.2 format: https://www.amazon.com/clouddrive/share/kzUoHhPFFHmDQzQbvY4DcaCCrBzhfbud62psloc4Zwt

2. Thank you for this! Is there a way to add another flow to this diagram? I was able to get a sheet for the 3rd bar working but not for a second curve. Any help would be much appreciated!

1. Yes, but you'd have to duplicate all of the calculated fields used to create the curve.

2. If I am to understand this correctly - one would need to duplicate the calculated fields. Is recreating the bars (bar3, bar4, bar5,... ) also necessary?

3. Yes, unfortunately, you'd need to duplicate just about all of it.

4. Not a problem. Would I be able to email you the work I am doing to confirm I am on the correct path?

5. Would you be willing to share how you modified the excel & tableau templates to accomodate additional flows? I'm trying to add 3 more flows & am having a hard time figuring out exactly what i need to duplicate. Thank you!

6. I actually just created a new set of templates, some of which are multi-level, so my advice would be to check those out. You can find them here: https://www.kenflerlage.com/2019/04/more-sankey-templates.html

7. Wow Ken, thank you SO MUCH. This is incredible & exactly what I needed!!

8. My pleasure!

3. This saved me so much time! Thank you!

4. Looking forward to utilize this template!!

5. Thank you so much for this! I was able to show pre and post merger account locations very easily with this tool!

1. Great. Glad to hear it!

6. When I load my data in, t(Model) is colored red, which makes Sigmoid invalid, along with 5 other dependencies on Sigmoid. Path(Model) is also marked red and says it cannot be found in the database, the same as t(Model). How can I fix this issue?

Path(Model) is also marked red and says it cannot be found in the database, the same as t(Model).

1. Would you be able to send me an email with the details and perhaps a sample workbook? flerlagekr@gmail.com

2. Hi Ken, thanks for your response. I ended up having to right click on t(Model) and use Replace References and then did that for the Curve A Polygon as well, I think. I can't remember, but I was able to get it to work somehow.

Alternatively, I downloaded the horizontal Sankey template again and made a separate Excel file to load in. This time there was no need for any weird fixing. It worked perfectly. I must have done something wrong the first time.

Thank you so much for your response and this awesome template! It could not be any more convenient and does exactly everything I need, especially considering how I had no luck with any of the other tutorials I saw online.

3. Great. Glad you figured it out! And glad you found this useful.

7. This template was the most useful thing I've found yet when attempting to create a simple Sankey flow chart. I haven't got it working quite the way I want yet, but I have a very rough prototype that I can show to stakeholders to get their buy-in for more development work to be done. Thank you very much for posting these templates!

8. Just let you know that your templates are not working now (tableau 2019.1) - something to do with extract??? Thanks for supplying updated templates.

1. What version of Tableau are you using?

9. Ken, this article seems to be interesting but I did follow your advice by checking that informationlab article thingy. It's flawed as First step 9 seems to work but STep 10 is not working and is not producing sankey chart. Rather, it is just producing stacked bar charts. Do you know why or how? I would love to know why or how so I can progress to your article as your article is a bit advanced for me to learn first unless I try with informationlab and see how they tackle it before I take your version. Thanks for advice.

1. Might be more than we can address here. Could you email me at flerlagekr@gmail.com

10. Hi Ken, how can i change your tableau horizontal template from polygon to just lines? I am basically looking to create a multi level decision tree and so i am not really bothered about the flow size...Will really appreciate your inputs on this..

1. Hmmm, that's a tricky question. I'd probably need to know more about what you're trying to do. Any chance you could email me at flerlagekr@gmail.com?

2. emailed you

11. Hi Ken,

This is great- thank you! I was able to create a horizontal Sankey with one issue: the text in my "target" column is not displaying, no matter how large I make the display area. It is completely grey. Do you know what might be causing this?

1. Can you resize the sheet to make it wider? Or change the text color? It could be a lot of things. Perhaps you could email me with further details? flerlagekr@gmail.com

12. Hi Ken,

My colleague sent me this link and it seems your approach is an great starting point to learn Sankey.

The table "[TableauTemp].[Mode]\$" does not exist.

any idea what is the cause? This is after initial error where it ask for a newer version, which I download a trial 2019.1

Harry

1. Were you able to get it working then? If you'd like, I could downgrade it to a previous version. Could you email me at flerlagekr@gmail.com?

13. Nice blog.
Really helpful template, used it and tweeted it:

1. Glad it was useful, Melissa. FYI. I just posted a new set of sankey templates today, so be sure to check them out: https://www.kenflerlage.com/2019/04/more-sankey-templates.html

14. Hey Ken,
The tool is very helpful to edit and get it up and running. Thank you so much.

I want to sort my step 1 and step 2 values. I'm trying to understand the Curve A polygon field in rows on the Sankey chart but it gives me an error after clicking on edit table calculation saying "Invalid set function". Could you please help me on how to see the nested calculations under the rows "curve A polygon"?

1. This kind of troubleshooting is always difficult without seeing a workbook. Any chance you could send me an email with the workbook? flerlagekr@gmail.com

15. Hi! I'm having some trouble attempting to modify the calculations for the polygons and then re-rendering the charts (the idea is to allow for each "arm" to have multiple parts to add an extra layer of complexity for flows from the same source:target, e.g. to differentiate between fruits/vegetables compared to total produce). I think I have an idea of how to approach this (it will require making copies of the calculated fields and calculating the maxes/mins based on an "intermediate" step as opposed to the normal step1/2). However before I get there I'm having trouble just replicating the sankey flow itself.

Even from the original horizontal example template (I'm on 2018.3), if I try to create a new sheet and then input the same pills+marks you have on the example, I get errors in the "N* position min" and "curve a min" and "curve a polygon" calculations when I add them to the viz (says that they are missing required fields, but in the table calc options nothing is red, and I've tried adding every mark I could think of to detail to no avail). Any idea what could be wrong?

The steps I'm following are:
* new sheet
* add step 1, step 2 (detail) (with sorting configuration Field/Asc/Size/Sum)
* add min or max (detail)
* set Marks type to Polygon
* add t (model) as dimension to columns
* add Curve a max and curve a min (pill is red)
* add curve a polygon to rows (pill is red)

Any help would be greatly appreciated! Thanks for the awesome work.

1. What you're trying to do sounds pretty similar to the traceable (or aggregate traceable) sankey template I documented here: https://www.kenflerlage.com/2019/04/more-sankey-templates.html. If so, check that out and see if it helps.

I'm happy to help you get through these other issues, though it'll be difficult via these comments, so I'd suggest you email me at flerlagekr@gmail.com. Sankeys are difficult to troubleshoot, so I'll probably need you to include a packaged workbook.

2. Hi Ken,

I think I'll shoot you an e-mail with some questions about re-working the calcs. In terms of recreating a chart I managed to do so only after I did the following (posting here in case anyone else faces the same issue):
* Re-attempted the above, but still got the same errors for anything related to "*min" fields
* However I noted that your "*min" calculations in the new template actually just reference the normal calc (e.g. "N1 position min" = "N1 position"), yet "N1 Position" would show up green as a mark and "N1 position min" as red (no idea what's going on there).
* Replacing references to "N1 position min" to "N1 position" and likewise for other "*min" references seems to have solved it.

16. Calvin FriedrichMay 1, 2019 at 12:00 PM

Ken,

I have your template working with my own data and I love it! I have one question, though. Is there a way to customize the order of the Targets and Sources?

1. Yes, but it is, unfortunately, a little involved because of the complex table calculations. Any chance you could send me an email with a mockup of what you're trying to do? flerlagekr@gmail.com

17. Hello,

Just curious if there was a work around to the number of records as a measure - this seems to be an issue with the Ian, Olivier and Jeffrey models.

1. Not sure I understand the issue. What problem are you running into?

2. Hello,

Thank you for the quick response.

On this "The Information Lab" blog (https://www.theinformationlab.co.uk/2018/03/09/build-sankey-diagram-tableau-without-data-prep-beforehand/) there were a couple of folks who were having issues with the Sankey given that the "Chosen Measure" they selected was the Number of Units or CountD of IDs. Read every thread and there was no solution presented and this is the same issue I am having. I work for a hospital and basically we just want to use Sankey to chart the patient mix movement or change of a patient's insurance coverage year over year. i.e. Patient X has AETNA in 2016 and switched to MEDICARE in 2017. My dimensions will be the YEAR of coverage. Maybe Sankey is not the best approach?

3. I'd need to know more about the use case to be sure if sankeys are a good option or not, but I see no reason why you wouldn't be able to use Number of Records. In fact, I just tried it with my template and it worked. Any chance you could send me an email at flerlagekr@gmail.com? I realize you can't share real data with me, but if you could mock up some fake data, then I'd be happy to show you how to do this.

18. Hi Ken,

Awesome stuff! I'm really excited about using this for my data. I have a 200k row data set that this would be perfect for. How can I modify this to work with that dataset? I have 2 dimensions and a measure in mind to use, but need that level of detail for other parts of the report.

Any advice would be greatly appreciated!

1. Are you able to put the data in a structure like the Excel template?

2. Hi Ken,

The data is in that same format, 2 dimension columns and a measure column. Essentially, one column is an employee level column, like vp, director, etc. and the 2nd dimension column is a 'result' column. The measure column is a count. I've added a "Link" column as well. I'm just worried that joining on link to 200k rows will inflate the data too much and cause performance issues.

3. Yes, that will cause the data to grow significantly. Is it possible to pre-aggregate it for the purpose of creating this specific chart? You could then create a second data source with all the detail.

4. The data was actually around 600k rows and I just went with it...ended up with around 60 million rows of data, but it wasn't actually that bad from a performance perspective!

Thanks again!

5. Great. Glad to hear it!

19. Instead of replacing the data source, I believe it'd be possible to just click on the 'data source' tab at the bottom left corner and when it prompts to find the file you could just navigate to the updated excel sheet with new metadata right? Saves a bit of time making all the calculated fields again.

This is so great and although I haven't dived in and played around with the template fully yet I do plan on testing it out with my own dataset soon. Looking forward to it, thanks Ken.

1. Yes, if you have data in Excel, then you're correct. But, if your data is in a different format such as a database, then you'll have to do the replace thing.

20. Hi Ken,

Thanks for this amazing template - it is exactly what I need. I have recreated everything using my data and checked all the calcs and the sorts etc. a few times but my second bar doesn't line up with the curves (the first does). Any inkling as to why that may be? Thanks, Patrycja

21. Never mind about my last comment - I figured out the problem. One of my table calcs for the curve polygon was directed at the wrong dimensions.

22. Ken,
Can you provide details on the values portion of the template? In my data set I have assigned a percentage of contribution to each step. It is more of measure of impact at each step with the parts within each step totaling to 100%. I ended up adding up the percentages for each ID and using this for the size.

Thanks,
Luke

1. Not sure I follow you exactly, but the Size field is meant to be used for whatever measure you're wanting to visualize. Ultimately, the workbook will sum everything up and break into part-to-whole relationships automatically, so it's not exactly necessary for you to do that yourself. But if your numbers are percentages, that should work perfectly fine.

23. Ken, I'm super unfamiliar with Tableu but have become familiar with your template at the top of this page, and I wanted to ask if you might be able to provide some guidance with regard to how I might be able to tie this to a graphic?

I'm trying to show how involvement of levels of the spine change with respect to an exposure event. Have you ever seen the sankey diagram used in something like this? Thanks again!

24. How to highlight only the relevant sankey polygons. eg on the left there are 2 columns and on the left there are 5, the first left column only connects with 4 column on the right. So how to highlight only the 4 columns which are relevant on the right if I select the first column on the left?

1. Might be difficult to address via these comments, but please feel free to send me an email at flerlagekr@gmail.com

25. Brilliant! Thank you!

26. Hi Ken--great post and this template has been a life saver [and provided some kudos to me with my stakeholders--thanks for that:)] but i have one question. When i use volume or number of say deals this works great as it is one to one--always the same deal, but if for instance I wanted to look at a Sales Open Pipeline from date 1 to date N , the values of the deals will change--Date 1 Deal A was worth 100K but by Date N it had changed to 200k etc--can the Sankey take this into account if the template is using one value field?
Thanks again
Greg

1. This is an interesting request. Might be able to do something like that. Any chance you could email me so we can take it offline? flerlagekr@gmail.com

27. Hi Ken,
Will do with an example or 2.
Cheers
Greg

28. Hi Ken,

I went through your website on Sankey dashboard using Tableau. I would like to say thank you for your great work.

I have implemented all the calculated fields from the template workbook in my main dashboard and used the excel sheet (model) to join with my master data set based on the common field named 'Link'.

All works well but there is a problem as each record is getting multiplied 98 times and inflating my dataset. I already have million of records in my original dataset and with this Link join it's making the extract refresh impossible to work.

Is there any way I can avoid multiplying my data. Please advise.

Piyush

1. Just addressed this with you via email, but I'll post an answer here as well, for the benefit of others. You'll need to find a way to aggregate your data before bringing it into Tableau. Sankeys, by their nature, are meant to be aggregated. Even if you wanted to show each individual record in the sankey, it wouldn't really be usable because the flows would be so small. So I'd advise you to aggregate it using Tableau Prep, SQL, or whatever you have available. That should reduce the size of the data enough to make it manageable.

2. In order to limit the number of flows would it be possible to set a top N filter for both the left and right hand sides of the sankey, showing N flows and grouping all the remaining flows into an 'other' category?

3. That would probably be doable. Happy to help if you could send me an email. flerlagekr@gmail.com

29. Hi Ken – Thanks for putting this together. I was able to rebuild this from scratch with just 1 issues: Whitespace is being added between each "arm" instead of just between the "Step 1" values. Is that a common problem? Seems like I'm doing something wrong with the table calculations, but they seem to match your template. I can send my workbook if you don't have any quick suggestions. Thanks!

1. Definitely sounds like a problem with how the table calculations are being computed. Make sure you check all the table calcs (not just on the rows shelf, but also on the marks card--size, for example). And check all of the nested calcs to make sure they are all good as well. On each nested calc, also make sure the "At the level" selection matches the original. If none of those fix the problem, then feel free to email me at flerlagekr@gmail.com.

30. Hi Ken,

Thanks for putting this one for us , but i am unable to download the raw data from the amazon dive.

Thanks

1. Send me an email and I'll send you the file. flerlagekr@gmail.com

31. Hi Ken,

I used your template as a proof of concept and it worked out great! We are replicating the excel template in our SQL Warehouse and I tried connecting the workbook to our tables and it didn't work. It opens a file explorer option to search for a new excel file and I can't use the replace option either. Do I need to connect to the tables in SQL in a new data source and copy over the calculations or is there an easier way to do this?

Thanks!

Rebecca

1. I think that would be the best approach. Just structure your SQL data so that it matches the "Data" sheet (field names, etc.). Then join that to the Model sheet from Excel (or you could put that table into your SQL database if you like). Create the joins between the two. Then just copy all the calculated fields from my data source to yours. Then use the "Replace Data Source" option to swap them out. Give that a try and, if you run into any problems, feel free to contact me at flerlagekr@gmail.com.

32. Hey Ken, really appreciate the work you've done.

I have a question. In the dashboard i'm working on I want to add a sheet with a sankey where a user can select what dimensions to compare on the left and right sides based on the values they select in the respective parameter filters. I see above that you wrote that data should be pre-aggregated, but how should I go about doing this when there are many dimensions to choose from?

1. It's hard to answer these questions in this comments section. Would you be able to send me an email? flerlagekr@gmail.com

33. Fantastic article, really opened up my eyes to Sankeys.

I have a situation where a value in the source doesn't have a value for the target - i.e it should be shown in the source but not the target. Currently these values end up in a non defined target column. What should be the approach so these flow lines are not plotted but the source value is still shown. Thanks

1. Check out my funnel sankey blog post. I think this should address your requirement: https://www.flerlagetwins.com/2019/11/sankey-funnel.html

34. Hi Ken - I am in love with the Sankey Funnel and am trying to adapt it to detect process flow fallout. This blog and the templates have been a god send!! The multi-level sankey template has worked great for the bars and curves that were already established, however, when I try to duplicate the measures and/or try to recreate the curves I'm getting an error with the polygon pills. Each time I add the "Curve #-# Polygon" pill, it turns red and says that there is a missing field in the view. I've gone through all of the marks, calculations, and nested calculations. I've also tried to replicate the curve sheets that were already in the worksheet to see where I went wrong, but I can't seem to figure it out. Any advice as to what I'm missing? Thank you!!!

1. That's a common problem. See the very last paragraph on the following where I explain how to address this issue: https://www.flerlagetwins.com/2019/04/more-sankey-templates.html

35. Hello Ken,
Thank you so much for this template, it really helps a lot!
I'm trying to make the same sankey by myself, but I do not understand how the lines from each different source/target can be combined together, while in my diagram, all the lines are equally separated. I also do not know what is the function of the set action Exclusions(Node0, Node1). Could you give me any advice? Thank you so much!

1. Might be best to handle this offline. Can you email me at flerlagekr@gmail.com and share what you have so far? Those Exclusions sets are remnants of some old filter actions and are not needed.

36. Hi,

I can't seem to replicate the Curve a Polygon measure on the rows shelf. When I click into "Edit Table Calculation" on the shelf, for the original measure, it shows Step 1 and Step 2 in the drop down twice. How can I replicate this?

Thanks,
Ryan

1. I'd probably need to see the workbook. Can you email me at flerlagekr@gmail.com?

37. Dear Ken, Thank you so much! I have been struggling to understand the theory, on the other hand, I needed something like this plug and play because of the time constraint!