Journalists, data scientists, researchers, software developers: We all have to work with data. And we all rely on the default layout of data plots from plotting libraries or programs like Microsoft Excel. Unfortunately, this default layout leaves a lot of room for improvement.
The Conventional Data Plot
A data plot consists of the following components:
- A frame. The frame holds the x-axis and y-axis and defines the borders of the plot.
- Labels. Labels describe the plot. A plot usually has at least the following labels: a description of the x-axis and a description of the y-axis; tick labels that describe the value at certain locations on the x- and y-axis; graph labels that describe the plotted data set (also known as the legend).
- Data. Data can be plotted in different ways, for example as lines, points or bars.
Of these, the default frame of a plot is often poorly designed and can be significantly improved. Plotting libraries and office suites usually create a simple, rectangular frame that holds your data:
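As a stand-in for such a default plot, here is a minimal matplotlib sketch. The data is synthetic and only an assumption for illustration (the data set behind the article's figures is not given), and the names `x`, `y` and `visible_spines` are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (an assumption, not the article's data set).
rng = np.random.default_rng(seed=0)
x = np.arange(50)
y = rng.normal(loc=20, scale=15, size=x.size)

fig, ax = plt.subplots()
ax.plot(x, y, "o")
ax.set_xlabel("x")
ax.set_ylabel("y")

# By default, matplotlib draws all four borders ("spines") of the frame.
visible_spines = sorted(
    name for name, spine in ax.spines.items() if spine.get_visible()
)
print(visible_spines)
```

Calling `fig.savefig("default_frame.png")` afterwards would write the figure to disk.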
This type of frame holds little useful information. Concrete issues:
- The upper and right borders contain no information, so they are unnecessary ink that distracts from the actual data.
- The frame does not tell you the start and end values of each data dimension. It only shows default, well-rounded milestones like 0 and 50, even though the data actually goes below 0 in both dimensions.
- The frame has arbitrary proportions that may skew the data. The distance between the values 0 and 20 looks much larger on the x-axis than on the y-axis, because the plot does not use the same scale for the two axes.
The range frame improves on these issues with easy-to-implement solutions.
The Range Frame
Edward R. Tufte proposed range frames in his book "The Visual Display of Quantitative Information". Range frames provide multiple benefits compared to the traditional four-sided frame, and pose no disadvantage.
The basic idea:
- Avoid chart junk (ink that holds no information).
- Let the frame tell the reader the start and end values of the data.
- Put the data in the right proportion.
Avoid chart junk. We get rid of the upper and right borders. This already makes the plot easier to look at.
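In matplotlib, this step can be sketched by hiding two of the four spines. This is a minimal sketch on synthetic data (an assumption for illustration), not necessarily how the library mentioned at the end of this article does it.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (an assumption, not the article's data set).
rng = np.random.default_rng(seed=0)
x = np.arange(50)
y = rng.normal(loc=20, scale=15, size=x.size)

fig, ax = plt.subplots()
ax.plot(x, y, "o")

# Remove the upper and right borders: they carry no information.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```

The left and bottom spines stay visible, so the plot still has an x-axis and a y-axis.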
Tell the reader the start and end values of the data. Instead of letting the axes span the whole plot from one side to the other, make each axis start at the lowest value and end at the highest value of its dimension. Label these values in the plot.
This immediately shows the value range. It shows the reader that the data on the x-axis starts at 0 and goes to 49, and the data on the y-axis starts at -11.2 and goes to 58.6.
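One way to sketch this in plain matplotlib is to limit each remaining spine to the data range with `Spine.set_bounds` and to place the ticks at the minimum and maximum. The data is again synthetic (an assumption), so the printed range will differ from the values quoted above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter

# Synthetic stand-in data (an assumption, not the article's data set).
rng = np.random.default_rng(seed=0)
x = np.arange(50)
y = rng.normal(loc=20, scale=15, size=x.size)

fig, ax = plt.subplots()
ax.plot(x, y, "o")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

x_min, x_max = float(x.min()), float(x.max())
y_min, y_max = float(y.min()), float(y.max())

# Draw each axis line only from the lowest to the highest data value.
ax.spines["bottom"].set_bounds(x_min, x_max)
ax.spines["left"].set_bounds(y_min, y_max)

# Label exactly the start and end values of each dimension.
ax.set_xticks([x_min, x_max])
ax.set_yticks([y_min, y_max])
ax.yaxis.set_major_formatter(FormatStrFormatter("%.1f"))  # one decimal place
```

Showing only the two extreme ticks is a design choice of this sketch. Intermediate ticks could be kept as well, as long as the axis line itself stops at the data range.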
Put the data in the right proportion. If the data dimensions have the same units (for example when you compare IQ levels or microservice runtimes), the axes should have the same data scale to not skew the presentation:
See how different the data looks now! If the data dimensions have different units, it is the author's responsibility to find proportions between the x- and y-axis that do not create a misleading impression of the data.
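For the same-units case, matplotlib can enforce an identical data scale on both axes with `Axes.set_aspect("equal")`. The sketch below uses two made-up runtime measurements in milliseconds; the names `runtime_a` and `runtime_b` and the numbers are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical runtimes of two services in milliseconds (synthetic data).
rng = np.random.default_rng(seed=1)
runtime_a = rng.uniform(low=0, high=50, size=40)
runtime_b = rng.uniform(low=0, high=20, size=40)

fig, ax = plt.subplots()
ax.plot(runtime_a, runtime_b, "o")
ax.set_xlabel("runtime A (ms)")
ax.set_ylabel("runtime B (ms)")

# One data unit on the x-axis now takes the same length as one on the y-axis.
ax.set_aspect("equal")
```

With equal scaling, the plot becomes wide and flat here because `runtime_a` covers a larger range than `runtime_b`, which is exactly the information an arbitrary aspect ratio would hide.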
Python Library
For matplotlib and Python, I have created the library matplotlib-tufte to turn any default matplotlib plot into a range frame.