Stop trying to create “general purpose” charts (because they don’t exist)

tl;dr: I frequently encounter the misconception that, for a given set of data, it’s possible to design a chart that will be useful regardless of the audience or the reason why that audience might need to see that data. Such “general purpose” charts don’t exist, though, since any visualization of a given data set will inevitably serve some audiences and purposes well and others not. In order to create a useful chart, then, the target audience and reason(s) why that audience needs to see that data must be identified beforehand.

A video version of this post is also available for those who prefer watching to reading.

You’ve volunteered to act as a test subject in a psychology experiment. You arrive at the prescribed time and location of the experiment and a researcher asks you to watch a video of a professional tennis match, and then write a description of the match. Sounds simple enough. You watch the video and then turn to your laptop to start writing, but you’re unsure of how to begin.

“What kind of description?” you ask the researcher.

“Just write the most effective description of the match that you can; the text that best describes the match,” comes the singularly unhelpful answer.

“Most effective”? “Best describes”? Those could mean anything, depending on what the description will be used for. You think for a moment about just a few of the potential purposes that such a description might serve:

  • To provide readers of the losing player’s city newspaper with hope for a better outcome next time.
  • To provide readers of the winning player’s city newspaper with reasons to expect him to make it to the finals.
  • To explain tennis to a reader who knows nothing about the game.
  • To get readers to watch future matches.
  • To pitch a movie screenplay about the heartwarming story of the rookie player who overcame insurmountable odds to win his first professional match.
  • To provide a play-by-play of the match that can be read to blind audiences.
  • Etc., etc., etc.

Obviously, you can’t write the “best,” or “most effective” description of the match unless you know for whom you’re writing the description and why you’re writing it for them. If you try to write a “general purpose” description that would serve any purpose for any audience, it will surely fail to fulfill most of the potential purposes that such a description might serve. With an unspecified purpose and audience, you’re forced to randomly pick an audience and invent a purpose in order to even begin writing something, though the chances of those guesses being right are probably pretty low.

For some reason, though, we rarely realize that we face this exact same problem when we’re asked to create the “best, most effective visualization” of a given set of data. Too often, we get a marching order such as this and start building a chart with an inadequate (or non-existent) understanding of the purpose and audience for the chart. Just as it’s impossible to write a useful description of a tennis match without an identified audience and purpose, though, it’s also impossible to design an effective chart without an identified audience and purpose.

For example, “Create a graph of our company’s recent headcount history” might sound like a reasonable request, but here are just a few of the chart designs that could fulfill such a request if it didn’t also specify the intended audience and purpose of the graph:

For board members who want to know if we hit our headcount target for each month since last March’s board meeting:

[Image: headcount graph designed for board members]

For the head of HR, who wants to know if the employee retention program that they launched in mid-July is actually reducing employee churn/turnover:

[Image: headcount graph designed for the head of HR]

For the CEO, who’s wondering where the recent increase in headcount came from:

[Image: headcount graph designed for the CEO]

All of these graphs are based on exactly the same underlying headcount data, and all are honest, potentially useful responses to the request, “Create a graph of our company’s recent headcount history.” As we can see, though, the design varies dramatically when we assume different audiences and purposes. Without a very clear idea of who the target audience is and why they need to see that data, we’d be unable to determine which one of these designs—or a virtually infinite number of other possible designs—would be the “best” or “most effective” way to visualize this particular data.

Certainly, life would be simpler if it were possible to create a general-purpose, one-size-fits-all chart for a given set of data. The messy reality, though, is that any chart that we might create for a given data set will inevitably serve some audiences and purposes well and others not, even if we’re trying to be completely audience- and purpose-agnostic. This will always be the case because creating a data visualization is always an act of summarization (unless our “chart” consists of a raw text dump of all disaggregated values). In order to summarize anything, whether it’s a tennis match or a set of employee headcount data, we must remove detail, filter information, and group and aggregate lower-level information into higher-level information. We must choose what to highlight and what to downplay or leave out, and when to bring in external information, such as a player’s life story or the launch date of the employee retention program. Without a very clear idea of why we’re summarizing the information and for whom, we can’t make any of those decisions. If we power through and try to create a summary anyway, we’ll be targeting a randomly chosen audience and made-up purpose, even if we don’t realize it and (mistakenly) believe that we’re actually creating a general-purpose summary. Either way, the result will be the same: a chart that won’t be as useful as it could be, or that might even be useless, to those who receive it.
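For readers who think in code, here is a minimal sketch of that idea in Python. Everything in it is invented for illustration: the `ROWS` and `TARGETS` values, the three function names, and the simplified churn calculation (departures divided by end-of-month headcount) are all assumptions, not real company data or a recommended HR metric. The point is only that the same rows must be filtered and aggregated differently before anything can be drawn, and that each choice presumes an audience and a question.

```python
from collections import defaultdict

# Hypothetical data, invented purely for illustration.
# Each row: (month, department, end-of-month headcount, departures that month)
ROWS = [
    ("2024-06", "Sales", 40, 4),
    ("2024-06", "Engineering", 60, 5),
    ("2024-07", "Sales", 42, 4),
    ("2024-07", "Engineering", 63, 6),
    ("2024-08", "Sales", 47, 2),
    ("2024-08", "Engineering", 64, 2),
]

# Hypothetical company-wide headcount targets per month.
TARGETS = {"2024-06": 100, "2024-07": 106, "2024-08": 110}


def board_view(rows, targets):
    """For board members: did total headcount meet the target each month?"""
    totals = defaultdict(int)
    for month, _dept, headcount, _departures in rows:
        totals[month] += headcount
    return {month: totals[month] >= targets[month] for month in sorted(totals)}


def hr_view(rows):
    """For the head of HR: monthly departures as a share of headcount.

    Simplifying assumption: churn = departures / end-of-month headcount.
    """
    headcount = defaultdict(int)
    departures = defaultdict(int)
    for month, _dept, count, left in rows:
        headcount[month] += count
        departures[month] += left
    return {m: round(departures[m] / headcount[m], 3) for m in sorted(headcount)}


def ceo_view(rows):
    """For the CEO: net headcount change per department, first month to last."""
    by_dept = defaultdict(dict)
    for month, dept, count, _left in rows:
        by_dept[dept][month] = count
    result = {}
    for dept, series in by_dept.items():
        months = sorted(series)
        result[dept] = series[months[-1]] - series[months[0]]
    return result


if __name__ == "__main__":
    print("Board:", board_view(ROWS, TARGETS))
    print("HR:   ", hr_view(ROWS))
    print("CEO:  ", ceo_view(ROWS))
```

Each function throws away different detail: the board view drops departments and departures, the HR view drops departments and targets, and the CEO view drops departures and the months in between. None of the three outputs could stand in for the others, and there is no obvious way to write a fourth function that serves all three audiences at once.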

Unfortunately, the misconception that general-purpose charts exist is widespread and I see the sad consequences of it all the time in organizations with which I work, where decision-makers are often frustrated by the data visualizations that they receive. Yes, this frustration is often due to a lack of basic data visualization skills among chart creators but, just as often, it arises because no one involved realizes that useful charts can only be created when the people who create them know exactly why and for whom they’re creating those charts.

I also see this problem regularly in data visualization contests, wherein participants are asked to design the “most effective” or “best” visualization of a given set of data, but where no purpose or audience is specified by the contest organizers. Such contests aren’t meaningful, since the effectiveness of different chart designs can’t be judged without first specifying what effect the chart is supposed to have.

So, the next time you’re asked to “create a chart of this data” and you don’t have a clear understanding of the purpose and audience that the chart will serve, dig until you have that understanding. If the purpose is very complicated or the audience highly specialized, start learning fast. If your audience doesn’t understand why you need to know what they’re going to be using the chart for and still asks you to “just design a chart that shows the data,” send them a link to this article <grin>.