Introduction
If you're like me, you go to bed around the 40th tweak to your graph, bleary-eyed and beaten. Formatting tick labels in particular is incredibly frustrating, especially when offset notation ruins an axis.
To make things easier, I've laid out some simple formatting examples for an uncooperative axis using the set_major_formatter method. This is not a comprehensive list of all formatting options but is simple and effective for some of the more obvious cases.
Purpose
This is useful because there are many ways to display numbers. Often it's money ($x,000 for instance), but it could also be percentages, engineering notation, logarithmic scales, decimals, dates, or countless others. Getting your graph to tell a story is vital for any visualization. Hopefully this will save you some time.
Background
If you want to scroll down to the examples below, feel free to skip this section. For those who want more, here's a little background on axis and the set_major_formatter method.
Axis Class
To format our axis, we're going to work in the axis class of matplotlib. axis class, you ask? What's the difference between axis and axes? The axes essentially contains everything in the plot, whereas the axis just pertains to the x- or y-axis ticks themselves, especially tick location and formatting. To access the x-axis, you would type ax.xaxis. For the y-axis, ax.yaxis. See the informal diagram below, courtesy of Matplotlib.
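To make the axes/axis distinction concrete, here is a minimal sketch that checks what kind of object each name refers to. It assumes plain Matplotlib with pyplot; the names fig and ax are just the usual conventions, not anything required by the library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.axis import XAxis, YAxis

fig, ax = plt.subplots()

# ax is the Axes: the whole plotting area, including both axis objects
print(isinstance(ax, Axes))

# ax.xaxis and ax.yaxis are the Axis objects that own the ticks and tick labels
print(isinstance(ax.xaxis, XAxis))
print(isinstance(ax.yaxis, YAxis))
```

All three checks should print True. Everything in the rest of this post is called on ax.xaxis or ax.yaxis rather than on ax itself.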
Formatter
Within the Axis class, two common objects that pertain to tick display are the Locator and the Formatter. I'm going to work with the formatter, which is set through set_major_formatter. This method accepts either a str, a function, or a pre-built formatter instance. We'll discuss each of these three options in the following sections.
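As a quick preview of those three options, here is a rough sketch of what set_major_formatter does with each kind of argument. It assumes Matplotlib 3.3 or later (earlier versions only accepted formatter instances), and the variable names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import EngFormatter, FuncFormatter, StrMethodFormatter

fig, ax = plt.subplots()

# Option 1: a str is wrapped in a StrMethodFormatter
ax.yaxis.set_major_formatter('{x:.1f}')
str_based = ax.yaxis.get_major_formatter()

# Option 2: a function taking (x, pos) is wrapped in a FuncFormatter
ax.yaxis.set_major_formatter(lambda x, pos: f'{x:.1f}')
func_based = ax.yaxis.get_major_formatter()

# Option 3: a pre-built formatter instance is used as-is
eng = EngFormatter()
ax.yaxis.set_major_formatter(eng)
instance_based = ax.yaxis.get_major_formatter()

print(type(str_based).__name__, type(func_based).__name__, type(instance_based).__name__)
```

So the str and function forms are really shortcuts: under the hood you always end up with a formatter instance on the axis.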
Examples
Tick Formatting - String
Intro
Our first example will show how to format using a str argument. When doing this, we pass a typical str.format() template to set_major_formatter. Inside the {}, you write x (the tick value), followed by a colon and a format specification. Let's see how a typical line would look.
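Here is a small sketch of that template syntax outside of any plot. The template and the value 1200 are made up for illustration; the point is that the string is ordinary str.format() with a field named x, and that StrMethodFormatter (the class Matplotlib wraps the string in) produces the same label.

```python
from matplotlib.ticker import StrMethodFormatter

template = '{x:.1f} units'  # x is the tick value, .1f asks for one decimal place

# Plain Python string formatting
label_plain = template.format(x=1200)

# The same template through the formatter; the second argument is the tick position
label_tick = StrMethodFormatter(template)(1200, 0)

print(label_plain)  # 1200.0 units
print(label_tick)   # 1200.0 units
```

If a template works with str.format(x=...), it should work as a tick format string too.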
So how does it work? To highlight the different formatting options, I'll use a simple graph as a template.
With this source code underneath:
import pandas as pd

# Twelve sample values, indexed 1 through 12
df_sample = pd.DataFrame([400, 200, 200, 800, 100, 0, 1200, 700, 800, 700, 200, 400],
                         index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
ax = df_sample.plot.bar()
ax.set_title('Sample')
ax.set_xlabel('X-Axis')
ax.set_ylabel('Y-Axis')
ax.legend().remove()  # the single unnamed column doesn't need a legend
I gave the plot a title and label for the x and y axes but no other major formatting.
Now let's see how we can pretty this up!
Sample 1: Dollars
So, let's say the y-axis represents money and I want to show it in dollars. Specifically, I'd like to add '$' and commas as thousands separators. Let's see what happens when I add one line of code:
ax.yaxis.set_major_formatter('${x:,.0f}')
This is typical str.format() syntax, with the , used as the thousands separator and .0f to set the number of decimal places (0, in this case).
Sample 2: Percentages
What about percentages? Let's show the y-axis with a percent sign and 2 decimal places. I can use our familiar syntax but with a tweak:
ax.yaxis.set_major_formatter('{x:,.2f}%')
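One caveat: the string approach only attaches a percent sign; it does not rescale the values. If the data are raw numbers rather than percentages, Matplotlib's PercentFormatter can do the conversion. The sketch below is my own addition, and it assumes 1200 (the largest value in the sample data) should read as 100%.

```python
from matplotlib.ticker import PercentFormatter

# xmax is the data value that corresponds to 100% (an assumption for this sketch)
pct = PercentFormatter(xmax=1200, decimals=0)

# format_pct(x, display_range) returns the label text for a single value;
# display_range is only consulted when decimals is None
label = pct.format_pct(600, 1200)
print(label)  # 50%

# To apply it to a plot: ax.yaxis.set_major_formatter(pct)
```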
Sample 3: Additional Text
How does it look with a longer string? Let's have a little fun.
ax.yaxis.set_major_formatter('Total Days {x:,.0f} until the Apocalypse')
This is bad form, but you can see how the plot adjusts to show the full sentence, with a comma as the thousands separator.
Tick Formatting - Pre-Built Formatter Instance
Intro
Besides format strings, another option is the pre-built formatter instances that matplotlib provides for more specific number representations. A few common ones handle dates, engineering notation, and so on. We'll explore a few here.
Sample 4a: DateFormatter (Manual)
Let's use the same dataset we used above and see how DateFormatter looks.
As you can see, we got the numbers to appear as dates. How did we do that? We had to import matplotlib.dates and then use DateFormatter.
I used the same dataset to easily see how the formatting works, but it's important to note how int or float values are converted to datetimes. In matplotlib.dates, this is "done by converting date instances into days since an epoch (by default 1970-01-01T00:00:00)" (from Matplotlib.org). Why 1970? It's the Unix epoch, but I agree that everything before 1970 could have been created by a simulation.
Let's look at the source code underneath to see what we added. The first line, import matplotlib.dates as mdates, is new:
import matplotlib.dates as mdates
df_sample = pd.DataFrame([400, 200, 200, 800, 100, 0, 1200, 700, 800, 700, 200, 400],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
ax = df_sample.plot.bar()
ax.set_title('Sample')
ax.set_xlabel('X-Axis')
ax.set_ylabel('Y-Axis')
ax.legend().remove()
ax.yaxis.set_major_formatter(mdates.DateFormatter("%Y-%b"))
Note the last line we used here. I used the mdates.DateFormatter class to show the year and the 3-letter month using "%Y-%b". I could display the date in other ways too, like "%m-%d-%y" (month-day-year), for example. These are both considered manual ways to use DateFormatter.
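To see why plain numbers like 400 turn into dates in the early 1970s, here is a short sketch of the number-to-date conversion on its own. It assumes the default epoch of 1970-01-01 (Matplotlib 3.3 or later, with the date.epoch setting left untouched).

```python
import datetime
import matplotlib.dates as mdates

# Each number is read as "days since the epoch"
d1 = mdates.num2date(1).date()      # one day after 1970-01-01
d400 = mdates.num2date(400).date()  # 400 days after 1970-01-01

print(d1)    # 1970-01-02
print(d400)  # 1971-02-05
```

So a tick at 400 on the sample plot falls in February 1971, which is what the "%Y-%b" format then displays.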
Sample 4b: Date (Automatic)
For a more automated, layout-conscious format, there's a class called ConciseDateFormatter. This automatically configures the date in the most concise way, based on the plot. Unlike the previous examples, it requires a locator for the tick positions. For simplicity's sake, I'll reuse the existing tick locator, which get_major_locator returns. Let's see:
ax.yaxis.set_major_formatter(mdates.ConciseDateFormatter(ax.yaxis.get_major_locator()))
You can see the formatter decided to display just the month at the bottom, the days of the month along the y-axis, and the final year and month at the top. This doesn't always reveal the desired look but it's good to know if you're trying to save space on your figure.
Sample 5: Engineering Notation (Manual)
When displaying scientific units, engineering notation is often the best way. To do this, we need to import EngFormatter and then use set_major_formatter while specifying our units. So, with our same original dataset and code, we would simply add:
from matplotlib.ticker import EngFormatter
and then
ax.yaxis.set_major_formatter(EngFormatter(unit='kg'))
The plot now shows the 'k' representing 1,000 next to the kilogram.
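To preview the labels without drawing anything, here is a small sketch that calls the formatter's format_eng method directly. The sample values are arbitrary, and appending the unit by hand is just my way of approximating what ends up on the axis.

```python
from matplotlib.ticker import EngFormatter

eng = EngFormatter(unit='kg')

# format_eng returns the number scaled to a power of 1000, plus the SI prefix
for value in (400, 1200, 1_500_000):
    print(value, '->', eng.format_eng(value) + eng.unit)

label_1200 = eng.format_eng(1200)  # '1.2 k'
```

The prefix is chosen per value, so larger numbers would pick up 'M', 'G', and so on.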
Sample 6: Logarithm Exponents
And of course, for the ever popular Logarithmic Exponents, there is a formatter that will return said exponents, in this case of log base 10 (remember, our y-axis from the original dataset was 0-1200).
The y-axis shows the exponent of the log. How did we get 3 and 3.08? (Obnoxious reminder: 10^x = 1000, so x = 3. And 10^x = 1200, so x ≈ 3.08.)
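The code for this sample isn't shown above, so here is one way it might be reproduced, using LogFormatterExponent from matplotlib.ticker. Treat it as a sketch: I'm assuming a plain Matplotlib bar chart of the same values, and the exact labels drawn on a linear axis may vary between Matplotlib versions. The last lines simply check the arithmetic from the reminder.

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import LogFormatterExponent

values = [400, 200, 200, 800, 100, 0, 1200, 700, 800, 700, 200, 400]

fig, ax = plt.subplots()
ax.bar(range(1, 13), values)

# Label each y tick with its base-10 exponent instead of the raw value
ax.yaxis.set_major_formatter(LogFormatterExponent(base=10))

# The exponents quoted in the text
exp_1000 = math.log10(1000)
exp_1200 = round(math.log10(1200), 2)
print(exp_1000, exp_1200)  # 3.0 3.08
```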
Other Pre-built Function Formatters
The complete list is available courtesy of Matplotlib.org (see Tick Locating and Formatting under Resources).
Tick Formatting - Custom Function Formatter
The third option for set_major_formatter is to write a custom formatting function. This function has to take in x, for the value of the tick, and pos, for the position of the tick. It then returns a str of what you want displayed.
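As a minimal sketch of that idea, the function below shows tick values in thousands with a 'K' suffix. The name thousands and the 'K' convention are my own choices for illustration, and the plot uses plain Matplotlib with the same sample values. Passing a bare function like this assumes Matplotlib 3.3 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

def thousands(x, pos):
    """Hypothetical helper: format a tick value in thousands, e.g. 1200 -> '1.2K'."""
    # pos (the tick's position index) is required by the signature but unused here
    return f'{x / 1000:.1f}K'

values = [400, 200, 200, 800, 100, 0, 1200, 700, 800, 700, 200, 400]

fig, ax = plt.subplots()
ax.bar(range(1, 13), values)
ax.yaxis.set_major_formatter(thousands)

print(thousands(1200, 0))  # 1.2K
```

Because it's just a Python function, you can put any logic in it, such as switching units above a threshold or hiding every other label.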
Summary
This post should familiarize you with how to use the set_major_formatter method and provide some simple examples. These examples should save you time and also apply to related methods (like set_minor_formatter). I've also provided additional resources on topics not covered here. This is important because every graph tells a story for your audience. The clearer and more concise your storytelling, the better it is for everyone!
Resources
Matplotlib.org - Axis.set_major_formatter(formatter)
https://matplotlib.org/stable/api/_as_gen/matplotlib.axis.Axis.set_major_formatter.html#matplotlib.axis.Axis.set_major_formatter
Anatomy of a Figure
https://matplotlib.org/stable/gallery/showcase/anatomy.html
Matplotlib.org - Tick Locating and Formatting
https://matplotlib.org/stable/api/ticker_api.html#tick-locating-and-formatting
Matplotlib.org - Date Tick Labels
https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html#sphx-glr-gallery-text-labels-and-annotations-date-py