# Creating a Basic Beeswarm Plot in Tableau

In this blog, I’m going
to show you a method of creating a simplified beeswarm plot. I use the term
“simplified” (or “basic”) because this method won’t provide some of the more
advanced functionality provided by other methods of creating this chart. My
goal, however, is to provide you with a method that allows you to create this
chart with just a handful of relatively simple calculations. I’ll also discuss
some of the limitations of this approach and provide you with some alternatives
for creating a more full-featured beeswarm.

**What is a Beeswarm?**

Before we get into the
details of how to build this chart, I think it’s important to understand what
it is and how it works. A beeswarm plot is a way of showing the distribution of
a given variable while also showing each individual data point. To help
demonstrate the chart, let's start with a simple histogram. The following shows
ages of Premier League players. The y-axis shows the number of players and the
x-axis shows the ages. With this chart, we can easily see the distribution of
the ages.

But what if we wish to
show each player individually, while still seeing the distribution? One option
would be to create separate marks for each, making it a unit histogram.

We still retain the
ability to see the overall distribution of ages, but we are also able to see
each player by hovering over the marks. If we now take the unit histogram and
align each "stack" along a common center point, we start to see
something like a beeswarm.

In fact, we might
consider this a beeswarm that uses rectangular shapes. Like the unit histogram,
we can still see the distribution of ages based on the various bulges within
the plot. But beeswarms typically use dots rather than rectangles, as shown
below.

The problem with this beeswarm,
however, is that we see some overlapping of dots. One of the main goals of a
beeswarm is to avoid this overlapping. To correct this, we could make this
chart much taller or we could make the dots much smaller. But, in this case, we
might be better off just rotating the chart.

This rotation gives the
beeswarm a relatively tight packing of the dots.

In addition to a beeswarm
showing all players in the Premier League, we can also break them down into
small multiples. For example, below I’ve created separate plots for the top 10
Premier League teams of 2020.

This allows us to compare
the distributions of player ages across various teams. The general shape of
each plot immediately gives us some good information, such as the fact that
Arsenal’s player ages are fairly evenly distributed between ages 17 and 33, while
Burnley has a much higher proportion of players who are 27+.

A logical next step from
here might be to turn this into a violin plot. Violin plots show the
probability density of data at different values and smooth them through use of
a kernel density estimator. Because of the need for such a smoothing algorithm,
these aren’t easily built in Tableau, but just to demonstrate these, I used an
online tool called BoxPlotR to create one:

Like the beeswarm, we can
see the relatively even distribution of ages for Arsenal and the aging players
of Burnley. While violins have some advantages over beeswarms (and other plots
such as box plots), we do lose visibility of individual players. So, if that’s
something we need to see, a beeswarm may be a good option.
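To make the smoothing step a little more concrete, here is a minimal Python sketch of a Gaussian kernel density estimate, the kind of calculation a violin plot relies on. The function name `gaussian_kde`, the bandwidth, and the sample ages are all illustrative assumptions; they are not taken from the Premier League data or from BoxPlotR.

```python
import math

def gaussian_kde(samples, x, bandwidth=1.0):
    """Estimate the density at x by averaging a Gaussian bump centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

# Hypothetical ages, for illustration only.
ages = [19, 22, 22, 23, 27, 27, 28, 28, 28, 31]

# The width of the violin at each age would be proportional to this density.
for age in range(18, 34, 3):
    print(age, round(gaussian_kde(ages, age), 3))
```

A larger bandwidth gives a smoother, wider violin, while a smaller one follows the individual ages more closely.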

**How to Build One**

So, let’s talk about how
to build this chart. As I’ve previously noted, this is not a true beeswarm
plot, but rather a simplified version that bins the data. Later in this blog, I’ll
share links to techniques for creating true beeswarms in Tableau.

The data we’re using is
pretty simple—it looks like this:

We’ll start by creating
our bins. We can do this using Tableau’s bins functionality, but I generally prefer
the BYOB technique
courtesy of Joe Mako and Jonathan Drummey, as it provides more flexibility. We
don’t really need that flexibility here, but we’ll use it anyway, just in case.

__1. Bin__
// Custom bins. Allows more flexibility than default bins.

**INT([Age] / [Bin Size]) * [Bin Size] - IIF([Age] < 0, [Bin Size], 0)**
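For readers who find it easier to check the arithmetic outside Tableau, the Bin calculation can be mirrored in a few lines of Python. This is only a sketch: `bin_value` and its arguments are hypothetical names, and it relies on Python's `int()` truncating toward zero in the same way as Tableau's `INT()`.

```python
def bin_value(age, bin_size=1):
    """Mirror of the Bin calculation, with bin_size standing in for the Bin Size parameter."""
    # int() truncates toward zero, so 5.52 becomes 5 and -0.6 becomes 0.
    truncated = int(age / bin_size) * bin_size
    # Truncation moves negative values up toward zero, so shift them down one bin.
    return truncated - (bin_size if age < 0 else 0)

print(bin_value(27))       # 27
print(bin_value(27.6, 5))  # 25
print(bin_value(-3, 5))    # -5
```

With a bin size of 1 and whole-number ages, as in this example, each age simply becomes its own bin.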

In the **1. Bin** calculation, **Bin Size** is a parameter that allows us to set the bin size (in this example, we’re just using 1).

Next, we create a
calculated field to get the number of points in each strip (each line of dots).
This will allow us to then figure out the positioning of each dot so that they
are center aligned.

__2. Points in Strip__
// Number of points in the "strip".
// We could fix on Bin, but excluding ID gives...
// ...us more flexibility to add dimensions to our view.

**{EXCLUDE [ID] : COUNTD([ID])}**

Then we create a
calculated field to get us the maximum coordinate for each strip. It’s
important to note here that our shared center axis will be 0 so half of the
dots will have a positive coordinate and the other half will be negative.

__3. Max Coordinate__
// Highest positive coordinate.

**[2. Points in Strip] / 2 - 0.5**
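The Points in Strip and Max Coordinate steps can be checked with a small Python sketch. The `players` list below is made-up data, and counting rows stands in for `COUNTD` only because each hypothetical ID appears once.

```python
from collections import Counter

# Hypothetical (ID, bin) pairs, for illustration only.
players = [(1, 24), (2, 24), (3, 24), (4, 27), (5, 27), (6, 30)]

# Points in Strip: how many players share each bin.
points_in_strip = Counter(bin_ for _, bin_ in players)

# Max Coordinate: the highest positive position, so each strip is centred on 0.
max_coordinate = {b: n / 2 - 0.5 for b, n in points_in_strip.items()}

print(max_coordinate)  # {24: 1.0, 27: 0.5, 30: 0.0}
```

A strip of three dots therefore runs from 1 down to -1, and a strip of two runs from 0.5 to -0.5, which is what keeps every strip balanced around the shared center axis.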

Finally, we use INDEX() to
calculate the placement of each individual dot.

__4. Coordinate__
// Coordinate for each point.

**MAX([3. Max Coordinate]) - INDEX() + 1**

We wish to create a
beeswarm with age on the vertical (y) axis, so we’ll drop **1. Bin** on the rows shelf. It’s possible, at this point, that Tableau will interpret this as a measure and try to aggregate it. If so, right-click the pill and change it to a dimension. Then we drop **ID** on the detail card so we get a mark for each player. Finally, we drop **4. Coordinate** on the columns shelf; once there, we right-click the pill and edit the table calculation, setting it to compute like this:

One of the keys here is
that we set it to restart on every bin, essentially causing it to restart
counting at 1 each time.
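The effect of restarting on every bin can be sketched in Python. This is an approximation of the behaviour rather than a description of how Tableau evaluates table calculations: `enumerate(..., start=1)` plays the role of `INDEX()`, grouping by bin plays the role of the restart, and the `players` data and `coordinates` function are hypothetical.

```python
from itertools import groupby

# Hypothetical (ID, bin) pairs, for illustration only.
players = [(1, 24), (2, 24), (3, 24), (4, 27), (5, 27), (6, 30)]

def coordinates(rows):
    """Return {ID: (coordinate, bin)} with every strip centred on 0."""
    placed = {}
    ordered = sorted(rows, key=lambda row: row[1])
    for bin_, strip in groupby(ordered, key=lambda row: row[1]):
        strip = list(strip)
        max_coordinate = len(strip) / 2 - 0.5
        # The index restarts at 1 for each bin, like INDEX() restarting every bin.
        for index, (player_id, _) in enumerate(strip, start=1):
            placed[player_id] = (max_coordinate - index + 1, bin_)
    return placed

for player_id, (x, y) in coordinates(players).items():
    print(player_id, x, y)
```

If the index did not restart, it would keep counting across bins and the later strips would drift away from the center axis instead of lining up on 0.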

If you’ve done everything
correctly, you’ll have a simplified beeswarm that looks exactly like the one I
shared earlier.

**Why & When to Use a Beeswarm**

One question you might be
asking is why and when might we use this type of chart. It really isn’t much
different than our unit histogram, which is much easier to build, so why choose
a beeswarm? That’s a great question and it reveals one of the key flaws of the
simplified approach I’ve shared here. The key difference between a unit
histogram and a beeswarm (other than the alignment of the data points) is that
beeswarms typically do not bin the data, as I’ve done here. They typically show
the *actual value* of the data point. So, the result of a beeswarm is generally not a set of straight horizontal lines, but lines that curve slightly as you move towards the ends.

Example beeswarm from EDU Pristine’s How to create a Beeswarm Plot in R
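To illustrate the difference, here is a minimal Python sketch of one simple greedy way to place unbinned points: each dot keeps its actual value and is pushed sideways only as far as needed to avoid its neighbours. This is an assumption-laden toy, not the algorithm used by the R packages or by the Tableau techniques linked later. The names `swarm` and `candidate_offsets` are hypothetical, and the dot diameter is assumed to be expressed in the same units as the values.

```python
import math

def candidate_offsets(step):
    """Yield 0, +step, -step, +2*step, -2*step, ... moving outward from the centre."""
    yield 0.0
    k = 1
    while True:
        yield k * step
        yield -k * step
        k += 1

def swarm(values, diameter=1.0, step=0.25):
    """Return (offset, value) pairs in which no two dots are closer than one diameter."""
    placed = []
    for value in sorted(values):
        for offset in candidate_offsets(step):
            # Accept the first offset that clears every dot placed so far.
            if all(math.hypot(offset - o, value - v) >= diameter for o, v in placed):
                placed.append((offset, value))
                break
    return placed

# Hypothetical values, for illustration only.
for offset, value in swarm([1.0, 1.2, 1.4, 3.0]):
    print(offset, value)
```

Because the values are not rounded to a bin, neighbouring dots sit at slightly different heights, which is where the curved rows in a true beeswarm come from.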

That said, I personally
find this type of beeswarm to be difficult to understand. The lack of
consistency in the placement of the dots, to me, creates a sort of visual
clutter that makes it hard for me to tell what’s going on. And I question
whether or not showing the actual value gives us much more
information than what we get from binning the data. In many cases, I think a
binned version of the chart is more effective.

All this being said, the
question remains—when and why would we use this simplified beeswarm? And that’s
a great question. In most cases, I think that a unit histogram is probably
easier to read and understand, but the beeswarm does provide a viable
alternative that could be very effective in some situations. The key, as
always, is to understand your data, the target audience, and the questions
you’re trying to answer. If, after carefully considering these three things,
you find that this simplified beeswarm is a viable chart, then now you know how
to create it!

**Alternative Approaches**

There may come a time
when you need a more full-fledged beeswarm plot without the limitations of this
simplified approach. In that case, you are in luck! There are a few different
methods for creating full-blown beeswarm plots in Tableau.

**R + Tableau**

R has a few different
packages that allow you to create beeswarms. In his blog, Beeswarm Chart in Tableau … via R,
Dorian Banutoiu details how he leveraged these packages to create the
coordinates needed to plot a beeswarm in Tableau.

**Pure Tableau**

If you’ve ever seen the
work of Rody Zakovich, you know that he’s a genius. He is able to make Tableau
do things that I would’ve never thought possible. In his blog, Automatic Bins and Beeswarms in
Tableau, Rody details a method for creating beeswarm plots 100% in Tableau.
This approach is pure genius, so it’s definitely worth checking out.

**Extensions**

One final method is extensions.
Zen Master Chris DeMartini has done some amazing
work leveraging Elijah Meeks’s Semiotic to create a variety of
charts, including beeswarms. His extensions can be found on the Tableau Community Extensions Page.

Ken Flerlage, November 9, 2020

Ken

Thanks for this easy to understand guide. I've looked to use this as part of a project I'm working on but I'm having trouble when I add another dimension - say to colour some of the plot points.

What seems to happen is that the points start to overlay rather than remaining separate. I've tried an alternative in the 2. Points in Strip calc ({FIXED [1. Bin] : COUNTD([ID])}) but it still results in the same effect.

It looks like the final 4. Coord calc is the one that shifts with the introduction of the additional dimension (for colour). Any ideas?

Thanks

Simon

Check the table calculation and make sure it's computing using that other dimension as well.

I had a look at that but still couldn't deal with the overlapping. Can I send you my basic Tableau working file? You may be able to quickly see where I've gone wrong. Or should I post to the Tableau forum?

Ta!

Simon

Sure! You can email me using flerlagekr@gmail.com
