Do I Truncate the Axis on this Line Chart or Not?

  


Have you ever built a line chart that simply looks like a flat line and provides very little value? Take the following line chart for example:



This chart shows values over time with very few variances...and this chart provides very little value. I bet I've run into this scenario at least hundred times and every time I do, I have to stop and think about the right way to handle it. I've done quite a bit of research on this topic and although there were some good resources, some were theoretical, many were complicated, and none of them really got me to where I needed to be, so I decided to write it myself. Once I walk through the problems, my thought process, my proposed solution (as well as how to do it technically within Tableau), I'd love to hear your feedback. 



So, we have the chart above. There are very few variances in the data and because of that, it just looks like a flat line - it provides very little value. We know that the cardinal sin with a bar chart is to truncate the axis, but this is not considered a sin with a line chart. So in Tableau (and I'm sure it works similarly in any tool), we can edit the axis and uncheck the "Include Zero" checkbox as shown below:





And in doing so, we get this as the result:





Do you like this?  I don't. In fact, I absolutely hate it. 


Instead of a range of 0% to 100% like before, we now show only a range of 91% to a little over 95%. The problem is that the drop in July. The change is only 94.5% to 91.2%, but it looks like it plummets to zero!  One of the major goals of data visualization is to help people more easily (and more quickly) see insights. Let's say this chart showed results of a customer satisfaction survey. If you were a busy executive and just glanced at this chart, you'd probably have a heart attack wondering what happened in July.  It was good then dropped to near zero!!!!  


Simply not including zero causes this chart to be very deceptive and I would avoid this at all costs.


So what options do we have?  We include zero and you can't see the variances and you don't include zero, and it over accentuates the variances. So how do we resolve this?  


Well, I've done some research and also spoke with Jeff Shaffer. One article that I ran into...that Jeff also shared...was an article from Robert Kosara. In this article, Robert discusses the concept of 45 degree banking. Here is a quote from his article: "In a paper from 1988, Bill Cleveland, Marylyn McGill, and Robert McGill proposed the idea that the average line slope in a line chart should be 45º. This has been dubbed banking to 45º and has turned into one of the bits of common wisdom in visualization as determining the ideal aspect ratio."  


Okay "average line slope in a line chart should be 45 degrees".  I read a pile of articles that reference this and I admit, I don't quite understand what is being suggested.  I even found a website (and Jeff referenced it as well) showing the math: Banking to 45 degrees: Aspect ratios for time series plots.  I'm pretty good at math, but I honestly just didn't get it. Perhaps someone could give me an assist.


So if I don't get the math, what next?


Well, we know that including zero was bad and not including zero was equally bad, just a different kind of bad. So, let's do something in between. (In fact, this is what I've always done in these cases, but I've never really methodically thought through what was best). Let's start by placing the data in the top two thirds of the chart and leaving one third of empty space at the bottom. Let me illustrate what I mean: 




Here, you can see that the top two thirds (blue area) of the chart show the data itself and the bottom third (orange area) of the chart is completely empty. Let's show the chart without the highlighted areas:





What are your thoughts on this?  Well, I still don't like it. I still think that it over-accentuates that drop in July and I believe you, as an executive, may not have a heart attack, but definitely a panic attack!


So instead of showing the data at the top two thirds of the chart (with one third empty space at the bottom), let's try 50/50...basically the data in the top half with empty space at the bottom half:





It's better...way better, but I still don't love it. It still feels like it is making that drop look more dramatic than it really is. So let's iterate again and show the data in the top third and show blank space in the bottom two thirds:





You know, I think we've got it!  In this case, we can see the trend, we can see the variances over time and the July value doesn't seem to drop in such a dramatic fashion. I would, however, recommend one more addition - lets make it very clear to the end user that the axis is, in fact, truncated. 





Now it is very clear to the end user that the axis does not start at zero.



Okay, now that we've "solved" this problem, let's talk about how we would do this in Tableau. Please download this workbook from Tableau Public and follow along.


In the scenario we landed on, our data appeared in the top third of the chart and we had white space in the bottom two thirds. In order to accomplish this, we will have to uncheck "include zero" (as noted way earlier) and we will feed in an invisible reference line to pad out the bottom of the Y axis. 


So lets first calculate the "range" of our data using a window_max and a window_min calculation.


Window Max Value

    // What is the max value in the chart

    WINDOW_MAX(SUM([Value]))


Window Min Value

    // What is the min value in the chart

    WINDOW_MIN(SUM([Value]))


To make sure they are doing what I want them to do, I just add them to the label card. Let's first start with the Max value.







That appears correct to me. (Note that if you are having trouble getting this to calculate properly, check out this blog post). We do the same thing with the Min to make sure it is calculating properly as well.


Now, let's calculate the "range" of data by subtracting the Min from the Max:


Range

    // What is the range between the max and min

    [Window Max Value] - [Window Min Value]



Now this tells us the spread of our data. In my case, it ranges from a Min value of 91.2% to 95.5%, a range of 4.3%. So we want to place our data in the top third and allow for blank space in the bottom two thirds. So we can use this range to do exactly that. So if 4.3% (our range) represents one third, then 4.3% X 2 represents two thirds. That means we want our Y axis to start at the minimum value in our data less two thirds of the range. Here are two illustrations, one to show the ranges and another to show the math to calculate the starting point on the Y axis:






Okay, so let's do that in Tableau. Before doing that, we should keep in mind that if our data fluctuates wildly, then we could introduce negative Y axis values. Perhaps your data allows in, but in my case, the values will always be positive so I'll never want my axis starting point to go lower than 0. For that reason, I'm building that into my calculation. I'll do that by first taking the Window Min less the range X 2 (as we discussed above) then take the Max of that and 0 (essentially yielding the higher of the two):


Y Axis Min to Place Data in Top 1/3

    // Will use a reference line to set the chart in the top 3rd of the area
    // Basically, this will leave two thirds of the chart empty at the bottom
    // But never go below 0

    MAX([Window Min Value] - [Range]*2, 0)


Finally, add the above calculation to the detail card, right-click on the axis, add a reference line using the above calculation, then remove all labels, lines, etc. See the gif for a quick walkthrough:







As our last step, let's add a subtitle to our chart. To do that, just add the red section to your title.


<Sheet Name>

Axis Starts at <AGG(Y Axis Min to Place Data in Top 1/3)>



And there you have it! Just a couple of calculations and you have a chart that shows the variances, but doesn't over-accentuate them to make them deceiving to your end user.


As a side note, in much of my work, I allow users to toggle between the option presented above and simply showing it on a full scale (in this case, 0 - 100). I like to provide them with the option to see the data in both ways. To do this, I create a parameter with two values, one to show the full range and one to "zoom in" (I am typically verbose in the parameter options to make it very clear). I then adjust my Y Axis Min calculation to be zero when "full axis" is selected and to use the calculation when "zoom" is selected. 


As mentioned above, you can download this workbook from Tableau Public. The workbook contains all the iterations discussed above as well as all of the calculations. This includes not only the calculations required for the top third, but also for the top two thirds and top half (should you believe they are better than what I suggested).


As a side note, Jeff Shaffer mentioned to me that Excel applies something called the five sixths rule when determining how to scaled the Y axis. According to this really interesting article from Peltier Tech, "if Ymin is less than 5/6 of the Ymax, the automatic minimum Y axis scale value is zero. If Ymin is 5/6 of Ymax or greater, then the automatic minimum Y axis scale value is the first major unit less than or equal to Ymin – (Ymax – Ymin)/20 (the two MSDN articles have an unfortunate typographical error, showing the divisor equal to 2, not 20)." It seems to me (and to Jeff who shared this with me) that it accomplishes generally the same thing as my technique above. However, the problem is that it will do it on just about any chart, including a bar chart, where truncating the axis is, as mentioned, a cardinal sin. So be careful.


Okay, that's all for now. But I would love to hear your feedback. Do you agree with my assessment of showing the data in the top third? Or do you have a better option? I'd love to hear your feedback!






Need help with anything related to Tableau?  Through Moxy Analytics, Ken and I provide consulting services such as Tableau Lifeline (get us for 1 hour to help solve a sticky problem), Fractional Data Hero (get us on your team for N number of hours a month for whatever you want us for), Tableau Training, and of course, project work. Click the Icon below if you are interested.



Kevin Flerlage, February 12, 20224

Twitter LinkedIn Tableau Public





No comments:

Powered by Blogger.