Split & Pivot Comma-Separated Values


In my time teaching and helping people with Tableau, I often come across situations where some key piece of data is locked in a comma-separated value (it could also be any other kind of delimited field). Take, for example, the following event data. We have two tables. The first—People—contains a list of people who could be invited to an event:



The second—Events—contains a list of events.


We have a single record for each event, but the list of invitees is contained in a comma-separated field called Invites. Ideally, we’d like to break our Event table into separate rows—one for each invitee:



This would allow us to then join Events to People, resulting in a table that improve our ability to analyze the data.


To do this, we need to first split the Invites column, then pivot the results. And that’s what I’ll be showing you in this blog.

Split & Pivot with Desktop
Splitting and pivoting data with Tableau Desktop is a bit hacky and has a few drawbacks. So, if Tableau Prep is available to you, I strongly suggest going that route as it will eliminate these drawbacks. That said, let’s start by doing this in Desktop.

A couple of notes before we begin. First, there are, of course, ways you could perform these operations with some manual work, but my goal here is to find an automated methodone that would allow you to publish a data source to Tableau Server and refresh automatically, without the need for any manual intervention. Second, if you’re not familiar with the concept of pivoting, then I’d recommend reading my blog, 3 Ways to Pivot Data for Tableau before reading the remainder of this post.

The basic steps we’ll take are to first split our field into multiple values, then pivot them. Sounds easy enough, so let’s give it a try. I’ve connected to the Events table. I’ll now perform a custom split on the Invites field.


In the Custom Split dialog, I’ll use a comma as the separator and choose to “Split Off” all columns. Tableau will then create three calculated fields.


Why 3 fields? That’s because the maximum number of values in our Invites field is three. Were there more comma-separated values in one of the fields, then Tableau would create additional split calculated fields.

Now let’s try to pivot on our three new fields. We’ll go back to the data pane, select the three fields, right-click and choose the “Pivot” option.


But, the option is not there. This is because you cannot pivot on a calculated field in Desktop. So, how can we perform this pivot? That’s where we need to get a bit hacky. We’ll start by unioning the table to itself three times—once for each of our split fields. Now, we’ll create the following calculated field:

Invite
// A single invite for each row.
CASE [Table Name]
WHEN "Events" THEN [Invites - Split 1]
WHEN "Events1" THEN [Invites - Split 2]
WHEN "Events2" THEN [Invites - Split 3]
END

This is basically just choosing one of the split fields based on the unioned tables, allowing us to have just one invite for each row as can be seen here:


The last step would be to then join this to the People table so we can get each person’s name. Ideally, we’d be able to use our new Invite field, but Tableau doesn’t allow us to use a named calculated field in a join. So, we need to use a join calculation. To do that, we’re going to have to repeat the logic above, combining everything into a single field. We can do that by copying the script for Invite then replacing each reference to the split columns with the actual logic from those calculated fields:

// A single invite for each row.
CASE [Table Name]
WHEN "Events"  THEN TRIM(SPLIT([Invites], ",", 1))
WHEN "Events1" THEN TRIM(SPLIT([Invites], ",", 2))
WHEN "Events2" THEN TRIM(SPLIT([Invites], ",", 3))
END

We then use this in a join calculation:


And, with that, we have our data in the correct structure:


Whew! That was a bit painful. As you can see, there are a number of drawbacks to this method. One is that it requires you to union your table to itself once for the maximum number of comma-separated values. This is okay if that number is known and is relatively low. If the number of values is unknown, then this could be very problematic. Another issue is that you are duplicating the logic of the calculated field. This is not a huge deal, but it’s something I tend to try to avoid. And, generally speaking, this is not a very automated or elegant approach. So, for a better solution to this problem, let’s turn to Tableau Prep.

Split & Pivot with Prep
We’ll start by connecting to our data and bringing in the Events table. Next we’ll add a clean step and perform the same custom split as we did in Desktop:


That will produce very similar results—three split fields:


We now have to pivot the data and this is where Tableau Prep really shines. Unlike Desktop, Prep can perform the same operations on calculated fields as on regular data fields. Add a pivot step and choose the “Use wildcard search to pivot” option. With this option, we can search based on a field name. The advantage here is that it will be automated—if we eventually have 4 comma-separated values in one of the Invites fields, then Tableau will automatically include that fourth split field in the pivot. We’ll search on “Invites - Split” and Prep will automatically bring in matching fields.


Now let’s add a clean step and take a look at the structure of our data.


We have some fields that are no longer required, such as Pivot Field Names and Invites, so we’ll remove them. We’ll also rename the Invites Split field to Invite. Finally, you’ll notice that we have a blank item in the Invite field. This is because some of our rows only had 2 comma-separated values. So, we can just filter these out (Note: This isn’t technically required as they will be removed when we join to the People table since we’ll be using an inner join).


The last thing to do is join this to the People table using a join step.


Then we can output the file:


As you can see, this method is easier, less hacky, and much more automated than the Desktop approach. Thus, if Prep is available to you, I highly recommend that you use it instead of trying to make this work in Desktop. Your future self will thank you!

As always, thanks for reading! I hope you find these techniques useful in your work. And, if you have any questions or comments, feel free to leave them in the comments section below.

Ken Flerlage, May 4, 2020
 

28 comments:

  1. Hi, i really like this website, lots of things to learn. is there anywhere that i can download the data sources which you are working with.....
    not just for this post but for the other one as well

    Thanks
    Pouya

    ReplyDelete
    Replies
    1. I need to start making these publicly available. In the meantime, if you have any data sources you'd like, please email me (flerlagekr@gmail.com) and I'd be happy to send them to you.

      Delete
  2. Hello,

    I was trying to follow along this the process in Tableau prep.

    However, all of my data is in one table, so after I have split and pivoted my data I cannot join the two tables as you have.

    Does this mean I should split my Excel file into multiple sources before I start this process.

    In my table I have 4 columns that need to go through this process, do I do each one individually and does this require splitting these 4 columns off in Excel to have a total of 5 tables?

    ReplyDelete
    Replies
    1. This sounds like a somewhat complex scenario. If you have 4 columns that all need to be split and pivoted individually, then that could be tricky. You'll need to do this process on each of the fields, then join them back together at the end. To do that, you'll need each record to have some sort of unique ID. If you don't have that in your data, you could generate one using ROW_NUMBER(). I'd be happy to help with this further. Feel free to email me at flerlagekr@gmail.com.

      Delete
  3. What is [Table Name] used in Invite calculated field?

    ReplyDelete
    Replies
    1. That is created when you union multiple tables in Tableau.

      Delete
    2. But how and where is that column [Table Name] created?
      First it doesn't appear (after the "custom split") but then you use it when you're creating the calculated "Invite" field

      Delete
    3. When you union tables together, Tableau automatically creates the "Table Name" field, which indicates which of the unioned tables you're working with.

      Delete
    4. which tables did you union to get the column[table name]?

      Delete
    5. Can you please elaborate how you union the Events table in order to get the [Table Name]?

      Delete
    6. See https://help.tableau.com/current/pro/desktop/en-us/union.htm

      Delete
  4. Is there some method to pivot rows to columns (with designating columns as id columns) in Tableau desktop? If yes, please share the methodology.

    ReplyDelete
    Replies
    1. I don't quite understand what you're asking. Can you clarify?

      Delete
  5. Hello, Thank You in advance I am getting an error when trying to do step to have just one invite for each row: the case statement throw an error saying: "Expected type table found string. Comparison in "CASE" expression must betable type.

    I am trying to split a column that has mutilpe string tags attached to each row for e.g:
    Tags
    ['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching']

    ReplyDelete
  6. Hi Ken

    I am working on 4 tables at a time using the relationship method. The Tableau Prep worked well for me until I was asked to have my dashboard connected to a live data source which refreshes the data after a certain interval. In this case, Tableau Prep won't work anymore.

    Will your first suggestion work well with live data source having multiple table joins? To give some background, I have 2 years of ticket data which has a field called department separated by comma. I need individual rows for each of the department - the ticket number will remain the same.

    ReplyDelete
    Replies
    1. It might work, but I'd probably need to see the data to be sure.

      Delete
    2. So I tried your solution above but it isn't working.. the output view looks very similar to what you have except that my "Invite" data is broken down and visible into only one table "Event". It's all null in the "Events 1" Events 2" table. Could you please suggest what else can do? I can send you a piece of data over email but not sure how we can mimic the database connection here.

      Delete
    3. I'd probably need to see the data. Any chance you can mock up some fake data in Excel or csv then email me? flerlagekr@gmail.com

      Delete
    4. Hello, guys.
      First of all, thank you for your great article.

      I have same issue with Manisha.
      I connected to live data which updates everyday.
      After I split and pivot data, that new data source is not live anymore.
      Have you guys figured out this?

      Delete
    5. I don't see what could be causing this. Any chance you could provide a sample packaged workbook and/or steps to reproduce the issue?

      Delete
    6. Hi Manisha, did you ever find a resolution?

      Delete
  7. Excellent article. Saved me a ton of time by convincing me that I should go straight to T Prep rather than messing w custom splits in Desktop. Thanks, Ken!

    ReplyDelete
  8. Thanks for the helpful post! I'm trying to avoid using T Prep so that when I hand this project over, it's not necessary. BUT... there are 130+ IDs and some of the 450 items have 30+ IDs to split so I'm not sure about the T Desktop method. Any guidance on weighing the pros and cons for a specific case?

    ReplyDelete
    Replies
    1. Yikes! I wonder if you'd be able to email me and share a sample of your data? I'd like to take a look and see what other options might be available. flerlagekr@gmail.com

      Delete
  9. Thanks Ken, thats very helpful. I am very close to the solution but I got the same problem as the guy above had, which I couldn't figure out how to create the "table name" for the case function. You mentioned it can be done by performing union to itself, but doesn't doing union require specific condition setting? I am a bit confused

    ReplyDelete
    Replies
    1. Union is different from a join. Union just stacks the two data sets on top of each other. No condition is required. See https://help.tableau.com/current/pro/desktop/en-us/union.htm

      Delete

Powered by Blogger.