Oscar Leo

Posted on Aug 31, 2023

How to Create a Lipstick Chart With Matplotlib

#datavisualization #python #matplotlib #tutorial

Today, I will show you how to create a lipstick chart for visualizing progress on metrics where the lower the value, the better.

I've prepared a simple dataset about mortality and diseases, so that you can focus on creating the visualization.

The data comes from the World Bank, and if you want to learn more, I've written about the visualization in my new free newsletter Data Wonder.

Let's get started.

Step 1 - Importing libraries

The first and simplest step is to import the required libraries, such as pandas and Matplotlib.

import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

from PIL import Image
from matplotlib.lines import Line2D

Congratulations, you just completed step 1! 🥳

Step 2 - Create a Seaborn style

Next, I want to create a color scheme and select a font. Sites like Coolors and Colorhunt are great resources when searching for beautiful colors.

Here's my code and settings to create the seaborn style for this tutorial.

FONT_FAMILY = "serif"
BACKGROUND_COLOR = "#FAE8E0"
TEXT_COLOR = "#33261D"
BAR_COLOR = "#EF7C8E"

sns.set_style({
    "axes.facecolor": BACKGROUND_COLOR,
    "figure.facecolor": BACKGROUND_COLOR,

    "text.color": TEXT_COLOR,
    "font.family": FONT_FAMILY,

    "xtick.bottom": False,
    "xtick.top": False,
    "ytick.left": False,
    "ytick.right": False,

    "axes.spines.left": False,
    "axes.spines.bottom": False,
    "axes.spines.right": False,
    "axes.spines.top": False,
})

I'm removing all the ticks and spines (the axis lines) to create a clean visualization. I'm also not using gridlines, since they don't add any valuable information to our lipstick chart.
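seaborn's set_style() writes these keys into Matplotlib's global rcParams, so they apply to every figure you create afterwards. If you'd rather keep the style local, here's a minimal sketch that uses Matplotlib's rc_context with the same keys. LIPSTICK_RC is a hypothetical name, and the barh call is only a placeholder for the real chart code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

BACKGROUND_COLOR = "#FAE8E0"
TEXT_COLOR = "#33261D"

# Hypothetical name: the same keys as the seaborn style, as plain rcParams
LIPSTICK_RC = {
    "axes.facecolor": BACKGROUND_COLOR,
    "figure.facecolor": BACKGROUND_COLOR,
    "text.color": TEXT_COLOR,
    "font.family": "serif",
    "xtick.bottom": False,
    "xtick.top": False,
    "ytick.left": False,
    "ytick.right": False,
    "axes.spines.left": False,
    "axes.spines.bottom": False,
    "axes.spines.right": False,
    "axes.spines.top": False,
}

# The settings only apply inside the with-block, so build the whole chart there
with plt.rc_context(LIPSTICK_RC):
    fig, ax = plt.subplots(figsize=(18, 4))
    ax.barh([0, 1], [1.0, 0.6])  # placeholder for the real chart code
```

Once the with-block exits, the global rcParams go back to their previous values.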

Step 3 - Reading the data

You can read the CSV directly from the URL, as I do in the code below.

df = pd.read_csv(
    "https://raw.githubusercontent.com/oscarleoo/matplotlib-tutorial-data/main/mortality-and-decease.csv"
)

Here's what the dataframe looks like.

Most columns are self-explanatory except for per, which gives each row's scale. For example, the latest "Maternal mortality" value was 223 out of 100,000 births.
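If you want to use your own data instead, the plotting code in this tutorial reads the columns listed below. Here's a small sketch, with a hypothetical helper name, that reports missing columns before you start plotting. Note that pd.read_csv returns the year header as the string "2000", which is why the code indexes df["2000"].

```python
import pandas as pd

# Columns that the plotting code in this tutorial reads from the dataframe
REQUIRED_COLUMNS = ["indicator_name", "2000", "latest_value", "latest_year", "per", "change"]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the required columns that df lacks (an empty list means all are present)."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]
```

Calling missing_columns(df) right after reading the CSV should return an empty list.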

Step 4 - Adding bars

Now it's time to add some data.

I'm adding bars for both 2000 and the latest values. Since my goal is to show the relative decrease for each value, I'm dividing each row by its 2000 value.

That means each bar for 2000 will reach 1, so it's only a visual helper and doesn't add any additional information.
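Here's a quick worked example of the normalization, using made-up placeholder numbers rather than the World Bank data. The last step assumes that the dataset's change column is the relative difference from 2000 (latest / 2000 - 1). That's an assumption, so check it against the change column in the CSV.

```python
import pandas as pd

# Placeholder values for illustration only (not the World Bank data)
df = pd.DataFrame({
    "indicator_name": ["Indicator A", "Indicator B"],
    "2000": [400.0, 80.0],
    "latest_value": [200.0, 60.0],
})

# Every 2000 bar becomes exactly 1.0 after dividing by itself
baseline = df["2000"] / df["2000"]

# Latest bar length: the share of the 2000 value that remains
relative = df["latest_value"] / df["2000"]

# Assumption: "change" is the relative difference from 2000
change = relative - 1

print(relative.tolist())  # [0.5, 0.75]
print(change.tolist())    # [-0.5, -0.25]
```

So a bar that reaches 0.5 means the metric is half of what it was in 2000.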

Here's my function to add bars.

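def add_bars(ax, x, width, alpha, label):
    # Draw one horizontal bar per row at y = 0, 1, 2, ...
    # The label is what the legend shows for this set of bars
    sns.barplot(
        ax=ax, x=x, y=list(range(len(x))), label=label,
        width=width, alpha=alpha,
        color=BAR_COLOR,
        edgecolor=TEXT_COLOR,
        orient="h"
    )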

I create a figure and run the add_bars() function like this.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))

add_bars(
    ax=ax, x=df["2000"] / df["2000"],
    width=0.55, alpha=0.2, label="2000"
)

add_bars(
    ax=ax, x=df["latest_value"] / df["2000"],
    width=0.7, alpha=1, label="Latest"
)

The result for the code we have so far looks like this.
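If you'd rather not depend on seaborn for the bars, you can produce the same layering with Matplotlib's own ax.barh. This is an alternative, not the code used for the charts in this post. add_bars_mpl is a hypothetical name and the values are placeholders. The sketch inverts the y-axis so the first row ends up on top, as in seaborn's horizontal bar plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

BAR_COLOR = "#EF7C8E"
TEXT_COLOR = "#33261D"


def add_bars_mpl(ax, x, width, alpha, label):
    # One horizontal bar per row at y = 0, 1, 2, ...; "height" is the bar thickness
    positions = list(range(len(x)))
    ax.barh(
        positions, list(x), height=width, alpha=alpha, label=label,
        color=BAR_COLOR, edgecolor=TEXT_COLOR,
    )
    # Put the first row at the top, inverting only once
    if not ax.yaxis_inverted():
        ax.invert_yaxis()


fig, ax = plt.subplots(figsize=(8, 3))
add_bars_mpl(ax, [1.0, 1.0, 1.0], width=0.55, alpha=0.2, label="2000")    # placeholders
add_bars_mpl(ax, [0.5, 0.8, 0.3], width=0.7, alpha=1, label="Latest")     # placeholders
```

Each barh call creates one legend entry, so the later legend code works the same way.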

Let's continue.

Step 5 - Formatting the axes

The row names are too long to use without line breaks, so I wrote the following function to insert \n characters at word boundaries.

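def split_name(name, limit=20):
    # Rebuild the name word by word, starting a new line whenever
    # adding the next word would push the current line past `limit`
    split = name.split()
    s = ""

    for s_ in split:
        if len(s.split("\n")[-1] + s_) > limit:
            s += "\n" + s_
        else:
            s += " " + s_

    # Drop the leading space (or newline) added before the first word
    return s.strip()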
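Python's standard library has a similar helper, textwrap.fill. The sketch below is an alternative to split_name, not a drop-in replacement: the two count characters slightly differently, so they may break lines in different places. wrap_label is a hypothetical name, and the label is a placeholder.

```python
import textwrap


def wrap_label(name: str, limit: int = 20) -> str:
    # Break the label at word boundaries so no line exceeds `limit` characters;
    # break_long_words=False keeps a word longer than `limit` whole instead of splitting it
    return textwrap.fill(name, width=limit, break_long_words=False)


label = "An example indicator name that is far too long for one line"
print(wrap_label(label))
```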

I also want to increase the font size and remove unnecessary information, such as the axis labels and x-axis ticks, to make the chart readable. The code to create the visualization now looks like this.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...

ax.set(xlabel=None, ylabel=None, xticks=[])
ax.tick_params("y", labelsize=28, pad=32)
ax.tick_params("x", labelsize=20, pad=16)

ax.set_yticks(
    ticks=[i for i in range(len(df))],
    labels=[split_name(n, limit=19) for n in df["indicator_name"]],
    linespacing=1.7, va="center"
)

Here's the updated result.
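Passing labels, along with text properties like linespacing, directly to set_yticks() requires Matplotlib 3.5 or newer. On older versions, you can use the two-call form sketched below. The labels and bar values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

labels = ["First\nlabel", "Second\nlabel"]  # placeholder tick labels

fig, ax = plt.subplots()
ax.barh([0, 1], [1.0, 0.5])  # placeholder bars

# Same effect as ax.set_yticks(ticks=..., labels=..., linespacing=1.7, va="center")
ax.set_yticks(list(range(len(labels))))
texts = ax.set_yticklabels(labels, linespacing=1.7, va="center")
```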

Let's add some additional information.

Step 6 - Adding useful information

You always want to make sure that readers understand what they are looking at, and right now the chart doesn't show any actual values.

For starters, I want to add the current values, which I do with the following function.

def add_info_text(ax, row, index):
    # Two-line label: "<value> out of" and "<per> (<year>)"
    value = round(row["latest_value"], 1)
    per = row["per"]
    year = row["latest_year"]
    text = "{:,} out of\n{:,} ({})".format(value, per, year)

    # White text near the left edge of the row's bar
    ax.annotate(
        text=text,
        xy=(0.02, index),
        color="#fff",
        fontsize=24,
        va="center",
        linespacing=1.7
    )

And since the purpose is to show the relative decrease of each metric compared to its value from 2000, I have another function showing the change for each row.

def add_change_text(ax, row, index):
    # Relative change since 2000 as a percentage with one decimal
    change = round(100 * row["change"], 1)
    text = "{:,}%".format(change)

    # Place the label just to the right of the end of the latest bar
    x = row["latest_value"] / row["2000"] + 0.02

    ax.annotate(
        text=text, xy=(x, index), fontsize=22,
        va="center", linespacing=1.7
    )

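I call both functions inside a for-loop over the rows. Calling reset_index() ensures the index runs from 0 to n-1, which matches the bar positions.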

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...

for index, row in df.reset_index().iterrows():
    add_info_text(ax, row, index)
    add_change_text(ax, row, index)

Here's the output.

It's starting to look good.
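The text and position logic in add_info_text() and add_change_text() can be pulled out into plain Python, which makes it easy to check the labels without drawing anything. Both helpers below are hypothetical. They mirror the format strings and the 0.02 offset used above.

```python
def info_text(latest_value, per, year):
    # Same format as add_info_text(): "<value> out of\n<per> (<year>)"
    value = round(latest_value, 1)
    return "{:,} out of\n{:,} ({})".format(value, per, year)


def change_label(latest_value, value_2000, change):
    # Same logic as add_change_text(): x sits just past the end of the latest bar
    x = latest_value / value_2000 + 0.02
    text = "{:,}%".format(round(100 * change, 1))
    return x, text
```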

Step 7 - Adding a title and legend

In this step, I'm simply using some built-in Matplotlib functions to add a title and a legend. Since each add_bars() call passes a label, the legend picks up its entries automatically.

Apart from the title and legend, I'm also adding a horizontal line with Line2D above the bars as a visual separator.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...

line = Line2D([-0.33, 1.0], [-0.9, -0.9], color=TEXT_COLOR)
line.set_clip_on(False)
ax.add_artist(line)

title = "Lipstick Chart - Relative\nDecreases Compared\nto 2000"
plt.title(title, x=-0.32, y=1.11, fontsize=58, ha="left", linespacing=1.6)
plt.legend(
    bbox_to_anchor=(0.75, 1.14), loc="lower center", borderaxespad=0,
    ncol=1, fontsize=44, edgecolor=BACKGROUND_COLOR
)

Here's what the chart looks like now.
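The Line2D above uses data coordinates: x runs from -0.33 to 1.0, and y = -0.9 sits just above the first row. If the axis limits change, the line moves with them. As an alternative, here's a sketch that uses axes-fraction coordinates via transform=ax.transAxes, where (0, 0) is the bottom-left of the axes and (1, 1) the top-right. The -0.3 and 1.02 values are placeholders you would tune by eye.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

TEXT_COLOR = "#33261D"

fig, ax = plt.subplots()

# Axes-fraction coordinates: y=1.02 is slightly above the top of the axes and
# x=-0.3 reaches into the tick-label area (placeholder values)
line = Line2D([-0.3, 1.0], [1.02, 1.02], color=TEXT_COLOR, transform=ax.transAxes)
line.set_clip_on(False)  # allow drawing outside the axes area
ax.add_artist(line)
```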

Step 8 - Creating an image and adding padding

The chart looks a bit cramped, so the last step is to add some padding. I'm doing that by turning the figure into a PIL Image with the following function.

def create_image_from_figure(fig):
    fig.tight_layout()

    # Render the figure and copy its RGBA pixel buffer into a NumPy array
    # (canvas.tostring_rgb() is deprecated since Matplotlib 3.8)
    fig.canvas.draw()
    data = np.array(fig.canvas.buffer_rgba())
    plt.close(fig)

    # Convert to a PIL image and drop the alpha channel
    return Image.fromarray(data).convert("RGB")

And here's the function to add padding.

def add_padding_to_chart(chart, left, top, right, bottom, background):
    size = chart.size
    image = Image.new("RGB", (size[0] + left + right, size[1] + top + bottom), background)
    image.paste(chart, (left, top))
    return image
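Pillow also provides a built-in helper for this: ImageOps.expand accepts a (left, top, right, bottom) border tuple and a fill color. The sketch below pads a small placeholder image the same way add_padding_to_chart pads the chart.

```python
from PIL import Image, ImageOps

BACKGROUND_COLOR = "#FAE8E0"

chart = Image.new("RGB", (100, 60), "white")  # placeholder for the rendered chart

# Border order is (left, top, right, bottom), as in add_padding_to_chart()
padded = ImageOps.expand(chart, border=(20, 50, 10, 50), fill=BACKGROUND_COLOR)
print(padded.size)  # (130, 160)
```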

We have now written all the code required to create the data visualization we aimed for.

Here's the full code snippet that uses all functions to create the final lipstick chart.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))

add_bars(
    ax=ax, x=df["2000"] / df["2000"],
    width=0.55, alpha=0.2, label="2000"
)

add_bars(
    ax=ax, x=df["latest_value"] / df["2000"],
    width=0.7, alpha=1, label="Latest"
)

ax.set(xlabel=None, ylabel=None, xticks=[])
ax.tick_params("y", labelsize=28, pad=32)
ax.tick_params("x", labelsize=20, pad=16)

ax.set_yticks(
    ticks=[i for i in range(len(df))],
    labels=[split_name(n, limit=20) for n in df["indicator_name"]],
    linespacing=1.7, va="center"
)

for index, row in df.reset_index().iterrows():
    add_info_text(ax, row, index)
    add_change_text(ax, row, index)

line = Line2D([-0.35, 1.0], [-0.9, -0.9], color=TEXT_COLOR)
line.set_clip_on(False)
ax.add_artist(line)

title = "Lipstick Chart - Relative\nDecreases Compared\nto 2000"
plt.title(title, x=-0.32, y=1.11, fontsize=58, ha="left", linespacing=1.6)
plt.legend(
    bbox_to_anchor=(0.75, 1.14), loc="lower center", borderaxespad=0,
    ncol=1, fontsize=44, edgecolor=BACKGROUND_COLOR
)

image = create_image_from_figure(fig)
image = add_padding_to_chart(image, 20, 50, 10, 50, BACKGROUND_COLOR)

And here's the finished product.

We're done!

Conclusion

Thank you for reading this tutorial; I hope you learned some tricks you can reuse for your data visualization projects.

If you want to see more tutorials and beautiful data visualizations, follow me here, subscribe to Data Wonder, and follow oscarl3o on Twitter.

See you next time.

DEV Community
