The biggest misconception in data visualization

tl;dr: When designing a chart, most people try to come up with the ‘best way to visualize the data’. That often produces charts that are unclear or useless to readers. Instead, we should design charts that best answer a specific question or best communicate a specific insight about the data, even though such charts won’t answer every question that readers might have about the data.

Like any field, data visualization has some common misconceptions floating around in it. There’s one, though, that I think has done more damage than any other, which is the assumption that…

“When designing a chart, the goal is to find the overall best way to visualize the data.”

“WTF are you talking about?”

How can that be a misconception? Am I suggesting that your goal should be to find a bad way to visualize the data? Obviously not. What am I saying, then?

Well, have a look at the data in the table below and three potential ways of visualizing it for our company’s CEO. Which of the three graphs do you think is the best way to visualize this data, graph A, B, or C?

The answer, of course, is that any one of these graphs could be ‘the best way to visualize this data’, depending on what, specifically, we need to say about the data:

  • If the CEO needs to know which regions have the highest expenses, then Graph A is ‘the best way to visualize this data’.

  • If the CEO needs to know which regions are doing a better or worse job of sticking to their budget, then Graph B is ‘the best way to visualize this data’.

  • If the CEO needs to know which regions are contributing most to the company’s overall budget overage, then Graph C is ‘the best way to visualize this data’.
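
To make the point concrete, here is a minimal sketch of how the same table answers three different questions depending on which quantity we compute from it. The region names and dollar figures are hypothetical (they are not the numbers from the table above), and the `rank` helper is just an illustrative name; each ranking loosely corresponds to the question behind one of the three graphs.

```python
# Hypothetical budget and actual expenses per region, in $ thousands.
# These values are made up for illustration, not taken from the article's table.
regions = {
    "North": {"budget": 500, "actual": 520},
    "South": {"budget": 200, "actual": 260},
    "East":  {"budget": 800, "actual": 790},
    "West":  {"budget": 100, "actual": 110},
}


def rank(metric):
    """Return region names sorted from highest to lowest value of `metric`."""
    return sorted(regions, key=lambda name: metric(regions[name]), reverse=True)


# Question A: which regions have the highest expenses?
by_expenses = rank(lambda r: r["actual"])

# Question B: which regions stray furthest over budget, relative to that budget?
by_pct_variance = rank(lambda r: (r["actual"] - r["budget"]) / r["budget"])

# Question C: which regions add the most dollars to the company-wide overage?
by_dollar_variance = rank(lambda r: r["actual"] - r["budget"])

print("Highest expenses:        ", by_expenses)
print("Furthest over budget (%):", by_pct_variance)
print("Largest overage ($):     ", by_dollar_variance)
```

With these made-up numbers, the three rankings come out in three different orders, so a chart sorted and scaled for one question would bury the answer to the other two.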

Is any one of these graphs the ‘overall best way to visualize this data’, or the ‘truest representation of this data’? How would we even go about determining that? All three—and many other possible variations—are potentially ‘the best way to visualize this data’, depending on what, specifically, we need to say about the data. None of them is the ‘overall best way to visualize this data’, or ‘the best representation of this data’. In fact, there’s never a single, ‘overall best way’ to visualize any dataset; there are only ‘best ways to say different things about the data’, such as which regions have the highest or lowest expenses, or which regions are doing a better or worse job of sticking to their budgets.

That’s the harsh reality of data visualization that few people seem to realize: Charts never ‘show the data’; they always just say a few specific things about the data. Different ways of visualizing the same dataset make different insights about that data more obvious, less obvious, or not visible at all. Yes, it would be awesome if we could make charts that ‘just show the data’, i.e., that make all possible insights obvious or that answer all possible questions that readers might have about the data, but those charts don’t exist.

“Why not?”

Well, if we try to create a chart that makes all possible insights obvious or that answers all possible questions that readers might have about the data, we’ll always end up with a ‘spaghetti chart’:

Even this doesn’t answer every question that the CEO might have about this data, though. For example, if the CEO wanted to quickly see what fraction of total expenses each region represents, or how these expenses compare to those of the previous year, we’d need to add even more clutter. Indeed, we’d never stop adding clutter to our chart in a quest to ‘just show the data’ because there’s always a virtually unlimited number of things that we could say about any dataset.

“Why don’t we just use a table, then?”

Well, tables do ‘just show the data’ without saying anything about the data. Indeed, tables don’t make any insights obvious at all. For example, based on the table alone in the scenario above, is it obvious which regions are doing a better or worse job of sticking to their budget? Or what fraction of total expenses each region represents? Sure, the reader can get those insights, but they’re going to have to work for them and possibly do some calculations, and they’re far less likely to notice interesting or unexpected patterns or relationships in a table of numbers than in a graph.

Tables are also many times slower to consume than graphs and require a lot more cognitive effort to process, which substantially increases the risk that readers won’t get the insights they need from a table—or will just skip over it altogether—because it requires too much cognitive effort to consume. In most situations, then, saying a few things about the data (i.e., showing a graph) is far more useful than saying nothing about the data (i.e., showing a table).

“So, what does all this mean when it comes to actually designing charts?”

The next time you sit down to create a new chart, instead of asking yourself, “What’s the best way to visualize this data?”, ask yourself, “Do I know why I’m creating this chart?” In other words, do you know what specific insight or answer the chart needs to communicate about the data? If the answer to that question is “no” (which it will be surprisingly often), you need to step away from the charting software and go find out. Perhaps you’ll need to do some exploratory analysis or speak more with the target audience, but one way or another, you need to figure out what, specifically, your chart needs to say about the data. If you don’t, many of your design choices (chart type, color palette, etc.) will be quasi-random guesses, and the chances that the audience will get what they need from your chart will be low.

Once you’ve figured out what, specifically, your chart needs to say about the data, the next step is to accept that whatever design you come up with is going to communicate that specific insight or answer that specific question clearly (hopefully, anyway…), but there will be many other potentially interesting questions and insights that won’t be obvious in your chart, or possibly not visible at all. Not only is that O.K., it’s the only way it can work (unless you give your audience a spaghetti chart).

What happens if, try as you might, you can’t find out specifically why the audience needs to see a particular dataset or needs to see a chart? For example, perhaps the CEO has simply asked for “expenses for each department” and you don’t have the opportunity to ask them why they need that information because they’re too busy to meet with you. These are unpleasant situations to be in, but they do happen. In my Practical Charts course, we discuss strategies for increasing the odds that we end up giving the audience something that will be at least somewhat useful to them, but these strategies will have to be a topic for a future article since this one’s already longer than I’d like it to be. The bottom line, though, is that our chart probably won’t be as useful to the audience as it could be if we design it without knowing specifically what it needs to communicate about the data.

“So, are you also saying that…”

No. I want to be clear about a few things that I’m not saying:

  • I’m not saying that all the ways to visualize a given dataset are ‘potentially best’ ways. For any dataset, there are plenty of ways to visualize it that aren’t useful in any plausible scenario, that are fundamentally confusing, or that are just plain misleading:

Outside of obviously bad ways such as these, though, there are always many ‘best ways’ to visualize any dataset.

  • I’m not saying that, because there’s never a single ‘overall best way to visualize this data’, whether one chart is better than another comes down to personal opinion or preference. For any given scenario (the nature of the data + what we need to say about that data + knowledge of the audience), different chart designs will be objectively better or worse ways to visualize that data for that scenario. How could we know if one chart design is objectively better than another for a given scenario? We could recruit representative members of our target audience and run an experiment that tests the different chart designs to determine which one most effectively answers the question at hand or communicates the insight we need to communicate, and which ultimately best achieves whatever effect we want to have on the target audience.

    Of course, we usually don’t have the time or resources to run such experiments, so part of learning data visualization involves getting good at making educated guesses about which chart designs would perform best, were we to test them experimentally with members of our target audience. Having some knowledge of major findings from data visualization research studies is helpful and can make those guesses more educated, but research findings generally aren’t specific enough to point to the best chart in a specific scenario.

    Whether we have the resources to determine which chart design is objectively better or not, though, the fact remains that one of the designs is always objectively better than the others. It’s not an inherently subjective assessment.

  • I’m not saying that, as long as you know specifically what you need to say about the data, you’ll automatically be able to design an effective chart. It takes a fair amount of skill to take some data, a specific reason why the audience needs to see that data, and knowledge of the target audience (level of dataviz sophistication, current concerns, etc.), and turn all that information into an effective chart. The chart creator has to know how to choose chart types, chart arrangements, color palettes, scale formatting, and how to make many other types of design decisions. These are the skills that I teach in my Practical Charts course, and it’s 14 hours long…

“Umm, this seems kind of obvious…”

The fact that there isn’t a single ‘overall best’ way to visualize a given dataset may seem obvious to some when it’s spelled out like this, but getting out of the mindset of ‘trying to find the best way to visualize this data’ and into the mindset of ‘designing the chart that best communicates a specific insight or best answers a specific question’ requires a fundamental shift in thinking that relatively few people seem to have made. I regularly hear even well-known experts discussing which chart design ‘best represents the data’ without even mentioning what, exactly, the chart is supposed to do. As I see it, though, that’s like arguing about whether a hammer or a screwdriver is ‘the best tool’ without ever mentioning if we need to pound in a nail or tighten a screw.

“But is this really the biggest misconception in data visualization?”

I think so, yes…

  • It’s very widespread. While some people have fully internalized the idea of trying to find the best way to answer a specific question or communicate a specific insight, most still try to find ‘the best way to visualize this data’, without considering the specific reason why the audience needs to see that data in the first place.

  • It’s caused innumerable arguments regarding which of two (or more) chart designs is ‘better’, which could have been instantly resolved if everyone involved had realized that one chart design would be ‘the best chart’ in one scenario, and the other chart design would be ‘the best chart’ in a different scenario.

  • If we design a chart by trying to find ‘the best way to visualize this data’, there’s a dramatically higher risk that the target audience will find the resulting chart hard to interpret, or possibly even useless, because many of our design choices (chart type, color palette, highlighting, etc.) will be guesses rather than choices geared toward communicating a specific insight or answer.

  • Trying to find ‘the best way to visualize this data’ makes designing effective charts a lot harder than it needs to be. Once we realize that all charts just say a few things about the data, it becomes a lot easier to choose chart types, color palettes, scale formats, etc. in light of the specific insight or answer that we need to communicate. We’re no longer trying futilely to design charts that anticipate every possible question that the audience might have about the data, or trying to find some ‘overall best’ representation of the data that doesn’t actually exist.

Let me know your thoughts in the comments, though. Do you have a different take on this idea?
