Automatically flag metrics that require attention on dashboards using statistics (book excerpt)

tl;dr: In order to gain traction and acceptance among users, dashboards must visually flag metrics that are underperforming, overperforming, or behaving in other ways that warrant attention. If a dashboard doesn’t flag metrics, it becomes very time-consuming for users to review the dashboard and spot metrics that require attention among a potentially large number of metrics, and metrics that urgently require attention risk going unnoticed. In previous blog posts, I discussed several common ways to determine which metrics to flag on a dashboard, including good/satisfactory/poor ranges, % change vs. previous period, % deviation from target, and the “four-threshold” method. Most of these methods, however, require users to manually set alert levels for each metric so that the dashboard can determine when to flag it, but users rarely have the time to set alert levels for all of the metrics on a dashboard. Techniques from the field of Statistical Process Control can be used to automatically generate reasonable default alert levels for metrics that users don’t have time to set levels for manually.

This post is based on a section from my upcoming book, Beyond Dashboards, and is the final installment in an eight-part series of posts on visually flagging metrics that require attention on dashboards.

In previous posts in this series, I looked at different methods for flagging metrics on a dashboard that require attention, and how the most common methods, such as good/satisfactory/poor ranges, % change vs. previous period, and % deviation from target, are all surprisingly problematic. I then recommended the “four-threshold” method as a more effective way to determine which metrics to flag on a dashboard.

Most of these methods, though, require users to manually set alert levels for each metric so that the dashboard knows when to flag or not flag it. Setting alert levels is very time-consuming, however, especially if the dashboard contains a large number of metrics, which, of course, most dashboards do. A common way of dealing with this challenge is to ask users to set alert levels for only a small number of “key” metrics, and simply not flag the other metrics at all. Obviously, this is far from ideal, since “non-key” metrics can—and regularly do—indicate major problems that require attention. Wouldn’t it be great if we could somehow generate default alert levels for metrics automatically, so that even “non-key” metrics with no manually set alert levels would get flagged on a dashboard if they required attention?

Enter SPC

When I first came across the field of Statistical Process Control (SPC), I was amazed that I hadn’t heard of it before. How could something so useful still be so relatively unknown? Yes, there are some small conferences and publications, but when I ask people if they’ve heard of it, the answer is almost always “no”.

So, what, exactly, is SPC? SPC is a set of practices and statistical techniques that was originally proposed in the 1920s for analyzing quality control data from manufacturing processes to identify ways of improving those processes. For example, the daily count of defective products from a factory could be analyzed using SPC techniques to determine if, on any given day, a meaningful change had occurred in the manufacturing process that warranted investigation, or if the defect count was just experiencing normal, random fluctuations that could be safely ignored.

Since it was first introduced for analyzing manufacturing data, SPC has been applied in many other contexts. For example, if our organization is measuring average employee satisfaction every day, a chart of its recent history might look like this:

[Figure: employee satisfaction over time, with no reference lines]

Using simple SPC techniques, we can use this time series of historical values to calculate what’s known as a “natural process range” for this metric:

[Figure: employee satisfaction with its natural process range shown as two lines]

As long as a metric stays within its natural process range, any fluctuations can be safely assumed to be due to normal randomness with no identifiable cause, and therefore require no action:

[Figure: values fluctuating within the natural process range, labeled “don’t panic”]

If, on a given week, employee satisfaction falls outside of its natural process range, though, this suggests that some kind of meaningful change has occurred in the organization that’s impacting employee satisfaction, and that we should investigate and possibly take action on it:

[Figure: a value falling outside the natural process range, labeled “uh-oh”]
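The natural process range described above can be computed with the standard XmR (individuals) chart formulas from SPC: the metric’s mean, plus or minus 2.66 times the average moving range. Here’s a minimal sketch; the satisfaction values are made up for illustration.

```python
# Sketch: compute a metric's "natural process range" using the XmR
# (individuals) chart formulas from classic SPC.
def natural_process_range(values):
    """Return (lower, upper) natural process limits for a series of values."""
    mean = sum(values) / len(values)
    # Average moving range: mean absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard XmR constant (3 / d2, where d2 = 1.128 for n = 2).
    return mean - 2.66 * avg_mr, mean + 2.66 * avg_mr

# Hypothetical weekly employee-satisfaction scores; the last week dips sharply.
weekly_satisfaction = [7.2, 7.4, 7.1, 7.3, 7.5, 7.2, 7.4, 7.3, 6.1]
lower, upper = natural_process_range(weekly_satisfaction[:-1])
latest = weekly_satisfaction[-1]
if not (lower <= latest <= upper):
    print(f"Flag: {latest} is outside the natural range [{lower:.2f}, {upper:.2f}]")
```

Points inside the computed range are treated as noise; only the out-of-range value triggers a flag.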

There are at least four ways that SPC techniques can be used to substantially improve the effectiveness of dashboards.

1. SPC techniques can automatically set default alert levels for all metrics on a dashboard

As I mentioned at the beginning of this post, users generally don’t have time to set alert levels for all metrics on a dashboard, and so many metrics simply don’t get alert levels and therefore never get flagged, even if they’re behaving in ways that definitely require attention.

Using SPC techniques, we can automatically generate reasonable default alert levels for metrics, without any input required from users. Whenever a metric goes outside of its natural process range, it can be automatically flagged on a dashboard as requiring attention, even if no one has manually set alert levels for that metric:

[Figure: a value within the natural process range, labeled “don’t flag”]
[Figure: a value outside the natural process range, labeled “flag this on dashboard”]

SPC-generated alert levels won’t be perfect, though, since they won’t account for externalities such as limits that are set out in service level agreements. This means that metrics that genuinely require attention occasionally won’t get flagged, and that there will be some false alerts. This is far better than having many metrics that never get flagged at all on a dashboard, however, which forces users to sift through large numbers of values to try to spot problems on their own. Also, ideally, users will manually set alert levels for more and more metrics over time, overriding the SPC-generated default levels if necessary.

This technique becomes even more powerful when there’s a need to monitor large datasets of items (transactions, patients, customers, etc.) that can be segmented by a variety of dimensions (regions, ages, activity levels, etc.). For example, a bank might want a dashboard to monitor its two million customers’ level of transaction activity segmented by region, age, gender, mortgage status, credit card status, etc. Using SPC, every segment can be monitored for important changes without asking dashboard users to set dozens or hundreds of alert levels, and even sub-segments (e.g., customers in the Eastern region who also have a mortgage) can be monitored for meaningful changes that require attention.
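The per-segment monitoring described above is just the same natural-process-range check applied in a loop over segments. A minimal sketch, with made-up segment names and transaction counts:

```python
# Sketch: automatically flag any customer segment whose latest value falls
# outside its own SPC-derived natural process range, with no manually set
# alert levels. Segment names and numbers below are invented for illustration.
def flag_segments(history, constant=2.66):
    """history maps segment name -> list of recent values (latest last)."""
    flagged = []
    for segment, values in history.items():
        baseline, latest = values[:-1], values[-1]
        mean = sum(baseline) / len(baseline)
        moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
        avg_mr = sum(moving_ranges) / len(moving_ranges)
        lower, upper = mean - constant * avg_mr, mean + constant * avg_mr
        if not (lower <= latest <= upper):
            flagged.append(segment)
    return flagged

history = {
    "Eastern region / has mortgage": [120, 118, 122, 121, 119, 140],  # jumped
    "Western region / no mortgage": [95, 97, 96, 94, 96, 95],         # stable
}
print(flag_segments(history))  # only the segment that changed is flagged
```

The same loop scales to hundreds of segments and sub-segments without any user input.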

But, don’t we need four alert levels for the “four-threshold” method?

If you read the post in which I recommended the “four-threshold” method of flagging metrics on a dashboard, you’ll know that that method requires four alert levels for each metric (“Crisis”, “Actionably bad”, “Actionably good”, and “Extraordinary”), not two. In what’s almost certainly going to be considered a controversial move, I’ve adapted this century-old technique so that it can automatically generate four levels instead of two:

[Figure: employee satisfaction with four SPC-generated threshold levels]

I’m still tweaking the math for calculating the “Crisis” and “Extraordinary” levels (if you’re an SPC expert, I welcome your thoughts), but that’s the general idea.
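To make the general idea concrete, here’s one possible way to derive four default levels from the same XmR building blocks. The inner pair uses the standard 2.66 constant; the 4.0 multiplier for the outer “Crisis”/“Extraordinary” pair is purely an illustrative assumption, not the settled math mentioned above.

```python
# Sketch: derive four default alert levels from a metric's history.
# The 2.66 inner constant is the standard XmR value; the 4.0 outer
# multiplier is an assumption chosen for illustration only.
def four_default_levels(values, inner=2.66, outer=4.0):
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    return {
        "crisis": mean - outer * avg_mr,
        "actionably_bad": mean - inner * avg_mr,
        "actionably_good": mean + inner * avg_mr,
        "extraordinary": mean + outer * avg_mr,
    }
```

A metric would then be flagged with increasing urgency as it crosses each level, matching the four-threshold method.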

Again, though, this method will generate reasonable default alert levels, not perfect ones. Users should always have the option to override statistically generated alert levels manually since human-set levels will usually be preferable to statistically generated default levels. When users set alert levels manually, though, we have to make sure that they set them intelligently. Specifically, we have to make sure that users avoid setting “overly sensitive” alert levels which will trigger false alerts on dashboards, i.e., metrics that get flagged when they don’t actually require attention. SPC comes to the rescue here, as well, which is the second way that SPC can substantially improve dashboard effectiveness.

2. SPC-generated levels can steer users away from setting “overly sensitive” alert levels

The effectiveness of innumerable dashboards has been compromised by users setting “overly sensitive” alert levels for metrics. When users do this, metrics get flagged on the dashboard when they fall even slightly outside of what the user considers to be an ideal range, so large numbers of metrics get flagged even on normal days when everything is basically O.K. There are at least two reasons why a metric that’s not in what the user considers to be an ideal range shouldn’t get flagged on a dashboard:

  1. The metric could just be experiencing normal, random fluctuations (a.k.a. “noise” or “natural variation”) and will likely drift back into a more desirable range shortly. In these cases, flagging the metric on a dashboard creates an unnecessary distraction and forces the user to waste time investigating the metric even though it’s behaving normally.

  2. The metric might be outside of its ideal range but still not bad enough that something actually needs to be done about it. For instance, if an organization’s headcount is 1% higher than its plan says it should be (i.e., a 1% deviation from the ideal number), that might not be enough of a deviation to warrant any action. It might only be necessary for someone to actually do something about it if headcount drifts more than 5% from plan, so flagging a 1% deviation creates an unnecessary distraction on the dashboard.

When users set overly sensitive alert levels for metrics, dashboards end up suffering from “Christmas tree syndrome”, wherein many metrics get flagged with red and green indicators even when they don’t actually require attention. Genuine problems and opportunities then get buried in the pile of false alerts and users may start ignoring the alert flags altogether. Having a lot of false alerts on a dashboard also dramatically slows down the process of reviewing the information on a dashboard since it forces users to waste time figuring out which alerts genuinely need to be investigated and which are just noise that can be ignored.

We can use SPC to dissuade users from setting overly sensitive alert levels by showing them SPC-generated levels when we ask them to manually set alert levels for a given metric. We can then advise users to set alert levels that are “less sensitive” than the SPC-generated levels in order to avoid flagging metrics that are just experiencing normal, random fluctuations:

[Figure: manual alert-level entry screen showing SPC-generated levels as guidance]

But what if users aren’t very familiar with a given metric and, therefore, not sure where to set alert levels for it in the first place? SPC comes to the rescue here, as well.

3. SPC-generated levels can guide users who are unsure of where to set alert levels for a given metric

When users don’t know where to manually set alert levels for a given metric, SPC-generated alert levels can act as “reasonable default” levels to help guide them. Users can then just accept the SPC-generated default levels or tweak them as necessary.

4. Various SPC techniques can detect a variety of patterns that suggest that a metric requires attention

Consider this metric:

[Figure: employee satisfaction with a sudden dip that stays within the natural process range]

Clearly, something has happened recently that warrants immediate attention but, since the metric remained within its natural range the whole time, it wouldn’t normally get flagged on the dashboard. Fortunately, there are additional SPC techniques that we can use to automatically detect and flag a variety of other patterns that indicate that a metric probably warrants attention, such as:

  • Sudden spikes and dips

  • Changes in volatility (metrics that suddenly become more volatile or less volatile)

  • Changes in steady state (metrics that move from one “typical range” to a higher or lower “typical range”)

  • Other patterns that indicate that a metric probably warrants attention
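As one example of this kind of pattern detection, here’s a sketch of a widely used SPC run rule: flag a change in steady state when a long run of consecutive points falls on the same side of the historical mean, even if every point stays inside the natural process range. The run length of eight is a common convention, though variants exist.

```python
# Sketch: detect a shift in steady state using a common SPC run rule --
# a run of eight or more consecutive points on the same side of the mean.
def shifted_steady_state(values, run_length=8):
    mean = sum(values) / len(values)
    run = 0  # positive = consecutive points above the mean, negative = below
    for v in values:
        if v > mean:
            run = run + 1 if run > 0 else 1
        elif v < mean:
            run = run - 1 if run < 0 else -1
        else:
            run = 0  # a point exactly on the mean breaks the run
        if abs(run) >= run_length:
            return True
    return False

# A metric that moves from one typical range to a higher one is detected,
# while a metric that merely fluctuates around its mean is not.
print(shifted_steady_state([10] * 10 + [12] * 10))  # shifted series
print(shifted_steady_state([10, 11] * 10))          # stable series
```

Similar rules (for spikes, dips, and volatility changes) follow the same pattern: scan the series for a specific shape rather than a single out-of-range point.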

Fortunately, the math behind all of these techniques is straightforward and many sample formulas and spreadsheets are available online. Yes, SPC can get quite complex; however, for most dashboards, only the basic techniques are needed and it’s definitely the kind of capability that can be implemented at a very basic level at first and then become more sophisticated over time. If you’d like to know more about SPC, great places to start are Don Wheeler’s book Understanding Variation, Stephen Few’s Signal, or Stacey Barr’s Practical Performance Measurement.