Good/Satisfactory/Poor ranges on dashboards: Not as effective as they seem

tl;dr: This excerpt from my upcoming book, Beyond Dashboards, is the fifth in a seven-part series on how to determine which metrics to visually flag on a dashboard (e.g., with alert dots, different-colored text, etc.) in order to draw attention to the metrics that need it. In this post, I look at the “Good/Satisfactory/Poor” method used on many dashboards. While not as problematic as the “vs. previous period” or “single-threshold” methods that I discussed in previous posts, this method still has several serious drawbacks that become obvious once they’re pointed out. In the next post in this series, I’ll introduce a more useful approach called “four-threshold” visual flags.

 

One of the most common ways to determine which metrics to visually flag on a dashboard is to define “Good,” “Satisfactory,” and “Poor” ranges for each metric, and then to flag metrics that currently fall into their respective “Good” or “Poor” ranges:

[Image: Good/Satisfactory/Poor small example]
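For readers who think in code, the three-range rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular dashboard tool implements it: the metric is assumed to be higher-is-better, values exactly on a cut-off are assumed to count as “Satisfactory,” and the names `classify`, `is_flagged`, `poor_below`, `good_above`, and the on-time delivery example are all hypothetical.

```python
from enum import Enum


class Range(Enum):
    GOOD = "Good"
    SATISFACTORY = "Satisfactory"
    POOR = "Poor"


def classify(value: float, poor_below: float, good_above: float) -> Range:
    """Place a higher-is-better metric into one of three ranges.

    poor_below and good_above are the two (hypothetical) cut-off points a
    user would have to choose for this metric. Values exactly on a cut-off
    are treated as "Satisfactory" (an assumption; tools differ).
    """
    if poor_below > good_above:
        raise ValueError("poor_below must not exceed good_above")
    if value < poor_below:
        return Range.POOR
    if value > good_above:
        return Range.GOOD
    return Range.SATISFACTORY


def is_flagged(r: Range) -> bool:
    # Only the two outer ranges get a visual flag; "Satisfactory" stays unflagged.
    return r is not Range.SATISFACTORY


# Hypothetical metric: on-time delivery rate (%), with cut-offs at 90 and 97.
for value in (85.0, 93.5, 98.2):
    r = classify(value, poor_below=90.0, good_above=97.0)
    print(value, r.value, "flagged" if is_flagged(r) else "not flagged")
```

The only inputs the user controls in this sketch are the two cut-off points; every flag on the dashboard follows mechanically from where those two points are placed.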

Sometimes, these ranges have other names, such as “Above expectation/Meets expectation/Below expectation,” but my comments here apply to those and other “three-range” methods as well. Although many dashboards use this method to decide which metrics to flag, it has several major drawbacks and limitations that become obvious once they’re pointed out. If you read my last post on single-threshold flags, some of these will sound familiar, since “Good/Satisfactory/Poor” ranges share several drawbacks with single-threshold flags:

  • The points that divide the three ranges are hard to set. When we ask users to, for example, choose the point at which “Satisfactory” becomes “Poor” for a given metric, we’re asking them to pick the single point at which that metric suddenly flips from being perfectly fine to being a problem that needs to be solved. That’s virtually never how things work in reality, though. Instead, the transition from “Satisfactory” to “Poor” is almost always gradual: there’s a point at which the user would start to become vaguely concerned about the metric, and their concern would then grow until the metric reached another, further point that would indicate a crisis. Choosing a single point to represent that gradual transition from “vaguely concerning” to “crisis” isn’t just difficult; it’s impossible. For the same reason, choosing a single point to divide the “Good” and “Satisfactory” ranges is also problematic.

  • “Good” and “Satisfactory” are too similar. When two of the three ranges are specifically called “Good” and “Satisfactory,” it becomes even harder for users to choose a point that divides them. While most people would agree that “good” is better than “satisfactory,” they’d have a hard time explaining exactly what the difference between the two terms is. After all, if we consider a metric’s current value to be “good,” aren’t we necessarily “satisfied” with it as well? At what point, then, would it stop being “satisfactory” and become “good”? This problem can be mitigated by giving these terms more precise definitions within the organization or by using more precise range names such as “Above expectation/Meets expectation/Below expectation,” but this is rarely done in practice, and “Good/Satisfactory/Poor” remains very common.

  • Minor problems look the same as catastrophes. Because a metric is either flagged or not, a metric that’s a little lower than we’d like looks the same as a metric that indicates the worst problem we’ve had in years; both are simply flagged as “Poor.” On a dashboard where multiple metrics are flagged as “Poor,” the user therefore gets no indication of where to focus first or whether any real catastrophes are lurking among the flags. Every time they review the dashboard, users must click through every single flag to figure out what needs to be dealt with first and to see whether there are any actual emergencies.

  • “Kind of good” looks the same as “incredibly fantastic.” “Good” flags suffer from the same problem as “Poor” flags. A metric that’s doing kind of well looks the same as a metric that indicates a stunningly positive development; both are simply flagged as “Good.”
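A short Python sketch can make the first, third, and fourth drawbacks concrete. It uses a compact, hypothetical version of the three-range rule (higher is better, cut-offs assumed at 90 and 97, boundary values treated as “Satisfactory”); the function name and the numbers are invented for illustration and aren’t real dashboard data.

```python
def classify(value: float, poor_below: float = 90.0, good_above: float = 97.0) -> str:
    # Hypothetical three-range rule: higher is better, and values exactly on a
    # cut-off are treated as "Satisfactory".
    if value < poor_below:
        return "Poor"
    if value > good_above:
        return "Good"
    return "Satisfactory"


# A 0.1-point difference across a cut-off flips the flag...
print(classify(90.0), classify(89.9))  # Satisfactory Poor

# ...while a roughly 50-point difference inside the "Poor" range changes nothing.
print(classify(89.9), classify(40.0))  # Poor Poor

# The same collapse happens at the top end: "kind of good" vs. "fantastic".
print(classify(97.1), classify(99.9))  # Good Good
```

Once only the flag is shown, nothing on the dashboard distinguishes the near miss from the disaster; the user has to go back to the raw values to find out.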

These problems are often compounded by the fact that different people in the organization may understand terms such as “Good,” “Satisfactory,” and “Poor” differently. For instance, when a metric’s value crosses from “Satisfactory” to “Poor,” some users might take that to mean that the metric has become a minor concern, others a major concern, and others a crisis. Still others might assume that it means the metric now requires some kind of action in response, or that it’s dropped below a limit set in a service level agreement. Without precise, agreed-upon definitions of what qualitative terms such as “Good,” “Satisfactory,” and “Poor” mean, users will have even more trouble setting these ranges, and different users are likely to set them inconsistently.

Readers who are familiar with bullet graphs may point out that this chart type addresses some of the drawbacks that I’ve listed here, and they’d be right. Bullet graphs are a great innovation, and they do mitigate several of these issues, but not all of them. I’ll save that discussion for another post, though.

In the next installment in this series, I’ll introduce the “four-threshold” flags that I now recommend to my consulting clients, because this type of visual flag doesn’t have any of the drawbacks or limitations that I’ve listed for the “vs. previous period,” “single-threshold,” and “Good/Satisfactory/Poor” methods. I’ll then conclude the series with a post on useful statistics for setting visual flag thresholds automatically.

To be notified of future posts like this, subscribe to the Practical Reporting email list.