Ads Top

Split & Pivot Comma-Separated Values


In my time teaching and helping people with Tableau, I often come across situations where some key piece of data is locked in a comma-separated value (it could also be any other kind of delimited field). Take, for example, the following event data. We have two tables. The first—People—contains a list of people who could be invited to an event:



The second—Events—contains a list of events.


We have a single record for each event, but the list of invitees is contained in a comma-separated field called Invites. Ideally, we’d like to break our Event table into separate rows—one for each invitee:



This would allow us to then join Events to People, resulting in a table that improve our ability to analyze the data.


To do this, we need to first split the Invites column, then pivot the results. And that’s what I’ll be showing you in this blog.

Split & Pivot with Desktop
Splitting and pivoting data with Tableau Desktop is a bit hacky and has a few drawbacks. So, if Tableau Prep is available to you, I strongly suggest going that route as it will eliminate these drawbacks. That said, let’s start by doing this in Desktop.

A couple of notes before we begin. First, there are, of course, ways you could perform these operations with some manual work, but my goal here is to find an automated methodone that would allow you to publish a data source to Tableau Server and refresh automatically, without the need for any manual intervention. Second, if you’re not familiar with the concept of pivoting, then I’d recommend reading my blog, 3 Ways to Pivot Data for Tableau before reading the remainder of this post.

The basic steps we’ll take are to first split our field into multiple values, then pivot them. Sounds easy enough, so let’s give it a try. I’ve connected to the Events table. I’ll now perform a custom split on the Invites field.


In the Custom Split dialog, I’ll use a comma as the separator and choose to “Split Off” all columns. Tableau will then create three calculated fields.


Why 3 fields? That’s because the maximum number of values in our Invites field is three. Were there more comma-separated values in one of the fields, then Tableau would create additional split calculated fields.

Now let’s try to pivot on our three new fields. We’ll go back to the data pane, select the three fields, right-click and choose the “Pivot” option.


But, the option is not there. This is because you cannot pivot on a calculated field in Desktop. So, how can we perform this pivot? That’s where we need to get a bit hacky. We’ll start by unioning the table to itself three times—once for each of our split fields. Now, we’ll create the following calculated field:

Invite
// A single invite for each row.
CASE [Table Name]
WHEN "Events" THEN [Invites - Split 1]
WHEN "Events1" THEN [Invites - Split 2]
WHEN "Events2" THEN [Invites - Split 3]
END

This is basically just choosing one of the split fields based on the unioned tables, allowing us to have just one invite for each row as can be seen here:


The last step would be to then join this to the People table so we can get each person’s name. Ideally, we’d be able to use our new Invite field, but Tableau doesn’t allow us to use a named calculated field in a join. So, we need to use a join calculation. To do that, we’re going to have to repeat the logic above, combining everything into a single field. We can do that by copying the script for Invite then replacing each reference to the split columns with the actual logic from those calculated fields:

// A single invite for each row.
CASE [Table Name]
WHEN "Events"  THEN TRIM(SPLIT([Invites], ",", 1))
WHEN "Events1" THEN TRIM(SPLIT([Invites], ",", 2))
WHEN "Events2" THEN TRIM(SPLIT([Invites], ",", 3))
END

We then use this in a join calculation:


And, with that, we have our data in the correct structure:


Whew! That was a bit painful. As you can see, there are a number of drawbacks to this method. One is that it requires you to union your table to itself once for the maximum number of comma-separated values. This is okay if that number is known and is relatively low. If the number of values is unknown, then this could be very problematic. Another issue is that you are duplicating the logic of the calculated field. This is not a huge deal, but it’s something I tend to try to avoid. And, generally speaking, this is not a very automated or elegant approach. So, for a better solution to this problem, let’s turn to Tableau Prep.

Split & Pivot with Prep
We’ll start by connecting to our data and bringing in the Events table. Next we’ll add a clean step and perform the same custom split as we did in Desktop:


That will produce very similar results—three split fields:


We now have to pivot the data and this is where Tableau Prep really shines. Unlike Desktop, Prep can perform the same operations on calculated fields as on regular data fields. Add a pivot step and choose the “Use wildcard search to pivot” option. With this option, we can search based on a field name. The advantage here is that it will be automated—if we eventually have 4 comma-separated values in one of the Invites fields, then Tableau will automatically include that fourth split field in the pivot. We’ll search on “Invites - Split” and Prep will automatically bring in matching fields.


Now let’s add a clean step and take a look at the structure of our data.


We have some fields that are no longer required, such as Pivot Field Names and Invites, so we’ll remove them. We’ll also rename the Invites Split field to Invite. Finally, you’ll notice that we have a blank item in the Invite field. This is because some of our rows only had 2 comma-separated values. So, we can just filter these out (Note: This isn’t technically required as they will be removed when we join to the People table since we’ll be using an inner join).


The last thing to do is join this to the People table using a join step.


Then we can output the file:


As you can see, this method is easier, less hacky, and much more automated than the Desktop approach. Thus, if Prep is available to you, I highly recommend that you use it instead of trying to make this work in Desktop. Your future self will thank you!

As always, thanks for reading! I hope you find these techniques useful in your work. And, if you have any questions or comments, feel free to leave them in the comments section below.

Ken Flerlage, May 4, 2020


2 comments:

  1. Hi, i really like this website, lots of things to learn. is there anywhere that i can download the data sources which you are working with.....
    not just for this post but for the other one as well

    Thanks
    Pouya

    ReplyDelete
    Replies
    1. I need to start making these publicly available. In the meantime, if you have any data sources you'd like, please email me (flerlagekr@gmail.com) and I'd be happy to send them to you.

      Delete

Powered by Blogger.