Today, I will show you how to create a lipstick chart for visualizing progress on metrics where the lower the value, the better.
I've prepared a simple dataset about mortality and diseases, so that you can focus on creating the visualization.
The data comes from the World Bank, and if you want to learn more, I've written about the visualization in my new free newsletter Data Wonder.
Let's get started.
Step 1 - Importing libraries
The first and simplest part is to import the required libraries like pandas and matplotlib.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from PIL import Image
from matplotlib.lines import Line2D
Congratulations, you just completed step 1! 🥳
Step 2 - Create a Seaborn style
Next, I want to create a color scheme and select a font. Sites like Coolors and Colorhunt are great resources when searching for beautiful colors.
Here's my code and settings to create the seaborn style for this tutorial.
FONT_FAMILY = "serif"
BACKGROUND_COLOR = "#FAE8E0"
TEXT_COLOR = "#33261D"
BAR_COLOR = "#EF7C8E"
sns.set_style({
    "axes.facecolor": BACKGROUND_COLOR,
    "figure.facecolor": BACKGROUND_COLOR,
    "text.color": TEXT_COLOR,
    "font.family": FONT_FAMILY,
    "xtick.bottom": False,
    "xtick.top": False,
    "ytick.left": False,
    "ytick.right": False,
    "axes.spines.left": False,
    "axes.spines.bottom": False,
    "axes.spines.right": False,
    "axes.spines.top": False,
})
I'm removing all the ticks and axis lines (spines) to keep the visualization clean, since they don't add any valuable information to our lipstick chart.
Step 3 - Reading the data
You can read the CSV directly from the URL, as I do in the code below.
df = pd.read_csv(
    "https://raw.githubusercontent.com/oscarleoo/matplotlib-tutorial-data/main/mortality-and-decease.csv"
)
Here's what the dataframe looks like.
Most values are self-explanatory except for per, which shows each row's scale. For example, the latest "Maternal mortality" value was 223 out of 100,000 births.
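To make the role of per concrete, here is a minimal sketch with a made-up two-row frame. The column names mirror the ones used later in this tutorial (indicator_name, latest_value, per), but the second row and its numbers are hypothetical and only illustrate how the scale works.

```python
import pandas as pd

# Hypothetical sample; only the maternal mortality figure comes from the text above.
sample = pd.DataFrame({
    "indicator_name": ["Maternal mortality", "Example indicator"],
    "latest_value": [223.0, 37.0],
    "per": [100_000, 1_000],
})

# Dividing by `per` turns each value into a plain share, which makes rows comparable.
sample["share"] = sample["latest_value"] / sample["per"]
print(sample[["indicator_name", "share"]])
```

The chart itself never needs this share, since every row is compared with its own 2000 value, but it helps when reading the labels.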
Step 4 - Adding bars
Now it's time to add some data.
I'm adding bars for both 2000 and the latest values. Since my goal is to show the relative decrease for each value, I'm dividing each row by its 2000 value.
That means each bar for 2000 will reach 1, so it's only a visual helper and doesn't add any additional information.
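As a small illustration of that normalization, the sketch below uses made-up numbers (not values from the dataset) with the same column names as the dataframe above.

```python
import pandas as pd

# Made-up numbers for illustration; column names follow the tutorial's dataframe.
sample = pd.DataFrame({"2000": [400.0, 80.0], "latest_value": [200.0, 60.0]})

baseline = sample["2000"] / sample["2000"]        # the 2000 bar, always 1.0
latest = sample["latest_value"] / sample["2000"]  # share of the 2000 value that remains

print(baseline.tolist())  # [1.0, 1.0]
print(latest.tolist())    # [0.5, 0.75]
```

A latest value of 0.5 means the metric has halved since 2000, so shorter bars mean more progress.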
Here's my function to add bars.
def add_bars(ax, x, width, alpha, label):
    sns.barplot(
        ax=ax, x=x, y=list(range(len(x))), label=label,
        width=width, alpha=alpha,
        color=BAR_COLOR,
        edgecolor=TEXT_COLOR,
        orient="h"
    )
I create a figure and run the add_bars() function like this.
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
add_bars(
    ax=ax, x=df["2000"] / df["2000"],
    width=0.55, alpha=0.2, label="2000"
)
add_bars(
    ax=ax, x=df["latest_value"] / df["2000"],
    width=0.7, alpha=1, label="Latest"
)
The result for the code we have so far looks like this.
Let's continue.
Step 5 - Formatting the axes
The names for each row are too long to use without line breaks. That's why I created the following function to add \n to the strings in a few places.
def split_name(name, limit=20):
    split = name.split()
    s = ""
    for s_ in split:
        # Start a new line when adding the next word would push the current line past the limit.
        if len(s.split("\n")[-1] + s_) > limit:
            s += "\n" + s_
        else:
            s += " " + s_
    return s.strip()
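As a quick check of how split_name() behaves, the sketch below repeats the function and runs it on a made-up indicator name. The name is hypothetical and not taken from the dataset.

```python
def split_name(name, limit=20):
    split = name.split()
    s = ""
    for s_ in split:
        # Break before a word that would push the current line past the limit.
        if len(s.split("\n")[-1] + s_) > limit:
            s += "\n" + s_
        else:
            s += " " + s_
    return s.strip()


wrapped = split_name("Mortality rate for children under five", limit=20)
print(repr(wrapped))  # 'Mortality rate for\nchildren under five'

# Short names pass through unchanged.
print(split_name("Tuberculosis"))  # Tuberculosis
```

Note that the first line starts with a leading space while it is being built, so it wraps one character earlier than the following lines.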
I also want to increase the font size and remove unnecessary information to make the chart readable. The code to create the visualization now looks like this.
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...
ax.set(xlabel=None, ylabel=None, xticks=[])
ax.tick_params("y", labelsize=28, pad=32)
ax.tick_params("x", labelsize=20, pad=16)
ax.set_yticks(
    ticks=list(range(len(df))),
    labels=[split_name(n, limit=19) for n in df["indicator_name"]],
    linespacing=1.7, va="center"
)
Here are the updated results.
Let's add some additional information.
Step 5 - Adding useful information
You always want to make sure that the users understand what they are looking at. Right now, we have no such information.
For starters, I want to add the current values, which I do with the following function.
def add_info_text(ax, row, index):
    value = round(row["latest_value"], 1)
    per = row["per"]
    year = row["latest_year"]
    text = "{:,} out of\n{:,} ({})".format(value, per, year)
    ax.annotate(
        text=text,
        xy=(0.02, index),
        color="#fff",
        fontsize=24,
        va="center",
        linespacing=1.7
    )
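To see what the label produced by that format string looks like, here is a minimal sketch that builds the text for a single row without drawing anything. The row is a plain dictionary, and the year 2020 is a hypothetical placeholder, not a value from the dataset.

```python
# Hypothetical row; keys match the columns used in add_info_text().
row = {"latest_value": 223.04, "per": 100000, "latest_year": 2020}

value = round(row["latest_value"], 1)
# "{:,}" inserts thousands separators; "\n" splits the label over two lines.
text = "{:,} out of\n{:,} ({})".format(value, row["per"], row["latest_year"])

print(text)
# 223.0 out of
# 100,000 (2020)
```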
And since the purpose is to show the relative decrease of each metric compared to its value from 2000, I have another function showing the change for each row.
def add_change_text(ax, row, index):
    change = round(100 * row["change"], 1)
    text = "{:,}%".format(change)
    # Place the label just to the right of the end of the "Latest" bar.
    x = row["latest_value"] / row["2000"] + 0.02
    ax.annotate(
        text=text, xy=(x, index), fontsize=22,
        va="center", linespacing=1.7
    )
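The function reads a change column straight from the dataframe. How that column was computed is not shown here, so the sketch below is only an assumption: it treats change as the relative difference between the latest value and the 2000 value, using made-up numbers.

```python
import pandas as pd

# Made-up numbers; column names follow the tutorial's dataframe.
sample = pd.DataFrame({"2000": [400.0, 80.0], "latest_value": [200.0, 60.0]})

# Assumption: change is the relative difference versus 2000 (negative means a decrease).
sample["change"] = sample["latest_value"] / sample["2000"] - 1

# Same formatting as in add_change_text().
labels = ["{:,}%".format(round(100 * c, 1)) for c in sample["change"]]
print(labels)  # ['-50.0%', '-25.0%']
```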
I add both functions under a for-loop.
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...
for index, row in df.reset_index().iterrows():
    add_info_text(ax, row, index)
    add_change_text(ax, row, index)
Here's the output.
It's starting to look good.
Step 6 - Adding a title and legend
In this step, I'm simply using some built-in Matplotlib functions to add a title and legend. Since we defined label in add_bars(), much of the styling is automatic.
Apart from defining a title and legend, I'm also adding a border using Line2D for visual effect.
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...
line = Line2D([-0.33, 1.0], [-0.9, -0.9], color=TEXT_COLOR)
line.set_clip_on(False)
ax.add_artist(line)
title = "Lipstick Chart - Relative\nDecreases Compared\nto 2000"
plt.title(title, x=-0.32, y=1.11, fontsize=58, ha="left", linespacing=1.6)
plt.legend(bbox_to_anchor=(0.75, 1.14), loc='lower center', borderaxespad=0, ncol=1, fontsize=44, edgecolor="#FAE8E0")
Here's what the chart looks like now.
Step 7 - Creating an image and adding padding
The chart looks a bit cramped, so the last step is to add some padding. I'm doing that by turning the figure into a PIL Image with the following function.
def create_image_from_figure(fig):
    plt.tight_layout()
    fig.canvas.draw()
    # buffer_rgba() stands in for tostring_rgb(), which newer Matplotlib releases no longer provide.
    # Drop the alpha channel and copy so the array outlives the closed figure.
    data = np.asarray(fig.canvas.buffer_rgba())[..., :3].copy()
    plt.close()
    return Image.fromarray(data)
And here's the function to add padding.
def add_padding_to_chart(chart, left, top, right, bottom, background):
    size = chart.size
    image = Image.new("RGB", (size[0] + left + right, size[1] + top + bottom), background)
    image.paste(chart, (left, top))
    return image
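If you want to try the padding function on its own, the sketch below repeats it and applies it to a plain white image that stands in for the rendered chart. The 200x100 size is an arbitrary choice for the example.

```python
from PIL import Image


def add_padding_to_chart(chart, left, top, right, bottom, background):
    size = chart.size
    image = Image.new("RGB", (size[0] + left + right, size[1] + top + bottom), background)
    image.paste(chart, (left, top))
    return image


# Stand-in for the rendered chart: a plain 200x100 white image.
chart = Image.new("RGB", (200, 100), "#FFFFFF")
padded = add_padding_to_chart(chart, 20, 50, 10, 50, "#FAE8E0")

print(padded.size)                # (230, 200)
print(padded.getpixel((0, 0)))    # (250, 232, 224), the background color
print(padded.getpixel((20, 50)))  # (255, 255, 255), top-left corner of the chart
```

The new canvas is filled with the background color first, and the chart is pasted at the (left, top) offset, so the padding blends in with the chart's own background.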
We have now written all code required to create the data visualization we aimed for.
Here's the full code snippet that uses all functions to create the final lipstick chart.
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
add_bars(
    ax=ax, x=df["2000"] / df["2000"],
    width=0.55, alpha=0.2, label="2000"
)
add_bars(
    ax=ax, x=df["latest_value"] / df["2000"],
    width=0.7, alpha=1, label="Latest"
)
ax.set(xlabel=None, ylabel=None, xticks=[])
ax.tick_params("y", labelsize=28, pad=32)
ax.tick_params("x", labelsize=20, pad=16)
ax.set_yticks(
    ticks=list(range(len(df))),
    labels=[split_name(n, limit=20) for n in df["indicator_name"]],
    linespacing=1.7, va="center"
)
for index, row in df.reset_index().iterrows():
    add_info_text(ax, row, index)
    add_change_text(ax, row, index)
line = Line2D([-0.35, 1.0], [-0.9, -0.9], color=TEXT_COLOR)
line.set_clip_on(False)
ax.add_artist(line)
title = "Lipstick Chart - Relative\nDecreases Compared\nto 2000"
plt.title(title, x=-0.32, y=1.11, fontsize=58, ha="left", linespacing=1.6)
plt.legend(bbox_to_anchor=(0.75, 1.14), loc='lower center', borderaxespad=0, ncol=1, fontsize=44, edgecolor="#FAE8E0")
image = create_image_from_figure(fig)
image = add_padding_to_chart(image, 20, 50, 10, 50, BACKGROUND_COLOR)
And here's the finished product.
We're done!
Conclusion
Thank you for reading this tutorial; I hope you learned some tricks you can reuse for your data visualization projects.
If you want to see more tutorials and beautiful data visualizations, follow me here, subscribe to Data Wonder, and follow oscarl3o on Twitter.
See you next time.