
Oscar Leo


How to Create a Lipstick Chart With Matplotlib

Today, I will show you how to create a lipstick chart for visualizing progress on metrics where the lower the value, the better.

I've prepared a simple dataset about mortality and diseases, so that you can focus on creating the visualization.

The data comes from the World Bank, and if you want to learn more, I've written about the visualization in my new free newsletter Data Wonder.

Let's get started.


Step 1 - Importing libraries

The first and simplest step is to import the required libraries: NumPy, pandas, seaborn, Matplotlib, and Pillow.

import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

from PIL import Image
from matplotlib.lines import Line2D

Congratulations, you just completed step 1! 🥳


Step 2 - Create a Seaborn style

Next, I want to create a color scheme and select a font. Sites like Coolors and Colorhunt are great resources when searching for beautiful colors.

Here's my code and settings to create the seaborn style for this tutorial.

FONT_FAMILY = "serif"
BACKGROUND_COLOR = "#FAE8E0"
TEXT_COLOR = "#33261D"
BAR_COLOR = "#EF7C8E"

sns.set_style({
    "axes.facecolor": BACKGROUND_COLOR,
    "figure.facecolor": BACKGROUND_COLOR,

    "text.color": TEXT_COLOR,
    "font.family": FONT_FAMILY,

    "xtick.bottom": False,
    "xtick.top": False,
    "ytick.left": False,
    "ytick.right": False,

    "axes.spines.left": False,
    "axes.spines.bottom": False,
    "axes.spines.right": False,
    "axes.spines.top": False,
})

I'm removing all the ticks and spines to create a clean visualization, since they don't add any valuable information to our lipstick chart.


Step 3 - Reading the data

You can read the CSV directly from the URL, as I do in the code below.

df = pd.read_csv(
    "https://raw.githubusercontent.com/oscarleoo/matplotlib-tutorial-data/main/mortality-and-decease.csv"
)

Here's what the dataframe looks like.

Screenshot of the dataframe

Most columns are self-explanatory except for per, which gives the scale for each row. For example, the latest "Maternal mortality" value was 223 out of 100,000 births.
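
Since the later steps read several columns by name, a quick check can catch a mismatch early. Here's a minimal sketch: the column names are the ones used in this tutorial's code, while the missing_columns() helper and the empty placeholder frame are my own assumptions for illustration.

```python
import pandas as pd

# Column names taken from the code in this tutorial; the helper is hypothetical.
REQUIRED_COLUMNS = [
    "indicator_name", "2000", "latest_value", "latest_year", "per", "change",
]


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [name for name in required if name not in df.columns]


# An empty placeholder frame, used only to exercise the helper offline.
example = pd.DataFrame(columns=["indicator_name", "2000", "latest_value"])
missing = missing_columns(example)
print(missing)
```

With the real dataframe, you would call missing_columns(df) and expect an empty list.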


Step 4 - Adding bars

Now it's time to add some data.

I'm adding bars for both the 2000 values and the latest values. Since my goal is to show the relative decrease for each metric, I'm dividing each row by its 2000 value.

That means every 2000 bar reaches exactly 1, so it's only a visual reference and carries no information of its own.
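
To make the normalization concrete, here's a small sketch of the division. The numbers are made up for illustration and are not from the World Bank data; only the column names match the tutorial.

```python
import pandas as pd

# Made-up numbers for illustration only; they are not from the dataset.
example = pd.DataFrame({
    "2000": [400.0, 50.0],
    "latest_value": [200.0, 40.0],
})

# Every baseline bar becomes 1.0 after dividing by itself.
baseline_ratio = (example["2000"] / example["2000"]).tolist()

# The latest bar shows the share of the 2000 value that remains.
latest_ratio = (example["latest_value"] / example["2000"]).tolist()

print(baseline_ratio)
print(latest_ratio)
```

A latest ratio of 0.5 means the metric has halved since 2000, so the shorter the foreground bar, the larger the improvement.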

Here's my function to add bars.

def add_bars(ax, x, width, alpha, label):
    sns.barplot(
        ax=ax, x=x, y=[i for i in range(len(x))], label=label,
        width=width, alpha=alpha,
        color=BAR_COLOR,
        edgecolor=TEXT_COLOR,
        orient="h"
    )

I create a figure and run the add_bars() function like this.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))

add_bars(
    ax=ax, x=df["2000"] / df["2000"],
    width=0.55, alpha=0.2, label="2000"
)

add_bars(
    ax=ax, x=df["latest_value"] / df["2000"],
    width=0.7, alpha=1, label="Latest"
)

The result for the code we have so far looks like this.

A first bar chart

Let's continue.


Step 5 - Formatting the axes

The names for each row are too long to use without line breaks. That's why I created the following function to add \n to the strings in a few places.

def split_name(name, limit=20):
    split = name.split()
    s = ""

    for s_ in split:
        if len(s.split("\n")[-1] + s_) > limit:
            s += "\n" + s_
        else:
            s += " " + s_

    return s.strip()
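
Here's a quick sketch of what split_name() returns. The indicator name below is a hypothetical example, not necessarily one from the dataset.

```python
def split_name(name, limit=20):
    split = name.split()
    s = ""

    for s_ in split:
        # Start a new line when the current line plus the next word gets too long
        if len(s.split("\n")[-1] + s_) > limit:
            s += "\n" + s_
        else:
            s += " " + s_

    return s.strip()


# Hypothetical indicator name, used only to show the wrapping behavior.
wrapped = split_name("Mortality rate under five per 1,000 live births", limit=20)
print(wrapped)
```

The function only breaks between words, so a single word longer than the limit stays on its own line unbroken.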

I also want to increase the font size and remove unnecessary information to make the chart readable. The code to create the visualization now looks like this.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...

ax.set(xlabel=None, ylabel=None, xticks=[])
ax.tick_params("y", labelsize=28, pad=32)
ax.tick_params("x", labelsize=20, pad=16)

ax.set_yticks(
    ticks=[i for i in range(len(df))],
    labels=[split_name(n, limit=19) for n in df["indicator_name"]],
    linespacing=1.7, va="center"
)


Here are the updated results.

Barchart with formatted axes

Let's add some additional information.


Step 5 - Adding useful information

You always want to make sure that the users understand what they are looking at. Right now, we have no such information.

For starters, I want to add the current values, which I do with the following function.

def add_info_text(ax, row, index):
    value = round(row["latest_value"], 1)
    per = row["per"]
    year = row["latest_year"]
    text = "{:,} out of\n{:,} ({})".format(value, per, year)

    ax.annotate(
        text=text, 
        xy=(0.02, index), 
        color="#fff", 
        fontsize=24,
        va="center", 
        linespacing=1.7
    )
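
To see what the format string produces, here's a small sketch outside of Matplotlib. The 223 and 100,000 come from the "Maternal mortality" example in Step 3, while the year is a hypothetical stand-in.

```python
# 223 and 100,000 come from the example in Step 3; the year is a placeholder.
value = round(223.0, 1)
per = 100000
year = 2020

# "{:,}" inserts thousands separators, and "\n" splits the label over two lines
text = "{:,} out of\n{:,} ({})".format(value, per, year)
print(text)
```

The thousands separator keeps large denominators such as 100,000 easy to read inside the bar.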

And since the purpose is to show the relative decrease of each metric compared to its value from 2000, I have another function showing the change for each row.

def add_change_text(ax, row, index):
    # Scale the change column to a percentage with one decimal
    change = round(100 * row["change"], 1)
    text = "{:,}%".format(change)
    # Place the label just past the end of the latest-value bar
    x = row["latest_value"] / row["2000"] + 0.02

    ax.annotate(
        text=text, xy=(x, index), fontsize=22,
        va="center", linespacing=1.7
    )

I call both functions inside a for loop.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...

for index, row in df.reset_index().iterrows():
    add_info_text(ax, row, index)
    add_change_text(ax, row, index)

Here's the output.

Lipstick chart with added information

It's starting to look good.


Step 6 - Adding a title and legend

In this step, I'm simply using some built-in Matplotlib functions to add a title and legend. Since we defined label in add_bars(), much of the legend styling is automatic.

Apart from defining a title and legend, I'm also adding a border using Line2D for visual effect.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))
...

line = Line2D([-0.33, 1.0], [-0.9, -0.9], color=TEXT_COLOR)
line.set_clip_on(False)
ax.add_artist(line)

title = "Lipstick Chart - Relative\nDecreases Compared\nto 2000"
plt.title(title, x=-0.32, y=1.11, fontsize=58, ha="left", linespacing=1.6)
plt.legend(bbox_to_anchor=(0.75, 1.14), loc='lower center', borderaxespad=0, ncol=1, fontsize=44, edgecolor="#FAE8E0")
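
The border only shows up because clipping is turned off, which lets the line extend beyond the axes area. Here's a minimal standalone sketch of that idea. The figure size and axis limits are assumptions chosen for the demo, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots(figsize=(6, 3))
ax.set_xlim(0, 1)        # demo limits, not the ones from the chart
ax.set_ylim(-0.5, 2.5)

# Both endpoints lie outside the limits above, so the line would normally be clipped
border = Line2D([-0.33, 1.0], [-0.9, -0.9], color="#33261D")
border.set_clip_on(False)
ax.add_artist(border)

fig.canvas.draw()
```

Because the coordinates are in data units, the line moves with the axis limits, so the values usually need a little tuning per chart.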

Here's what the chart looks like now.

Lipstick chart with title and legend


Step 7 - Creating an image and adding padding

The chart looks a bit cramped, so the last step is to add some padding. I'm doing that by turning the figure into a PIL Image with the following function.

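def create_image_from_figure(fig):
    fig.tight_layout()

    # Render the figure, then copy the RGBA pixel buffer into an array.
    # (tostring_rgb() was deprecated in Matplotlib 3.8 and later removed.)
    fig.canvas.draw()
    data = np.array(fig.canvas.buffer_rgba())
    plt.close(fig)

    # Drop the alpha channel so the result is a plain RGB image
    return Image.fromarray(data).convert("RGB")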
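
Canvas buffer methods have changed between Matplotlib releases, so here's an alternative sketch that goes through an in-memory PNG instead. This is a different technique from the function above, and figure_to_image() and its dpi parameter are names I made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere

import matplotlib.pyplot as plt
from PIL import Image


def figure_to_image(fig, dpi=100):
    """Save the figure to an in-memory PNG and load it as an RGB PIL image."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    plt.close(fig)

    buffer.seek(0)
    # convert() forces the lazy PNG loader to read the pixels immediately
    return Image.open(buffer).convert("RGB")


fig, ax = plt.subplots(figsize=(4, 2))
ax.plot([0, 1], [0, 1])
image = figure_to_image(fig, dpi=100)
print(image.size)
```

The PNG round trip is a bit slower than reading the canvas buffer, but it relies only on savefig() and lets you pick the output resolution with dpi.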

And here's the function to add padding.

def add_padding_to_chart(chart, left, top, right, bottom, background):
    size = chart.size
    image = Image.new("RGB", (size[0] + left + right, size[1] + top + bottom), background)
    image.paste(chart, (left, top))
    return image
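
Here's a short sketch showing how the padding changes the image size. A plain white image stands in for the rendered chart, which is an assumption made so the example runs without Matplotlib.

```python
from PIL import Image


def add_padding_to_chart(chart, left, top, right, bottom, background):
    size = chart.size
    image = Image.new("RGB", (size[0] + left + right, size[1] + top + bottom), background)
    image.paste(chart, (left, top))
    return image


# A plain white 100x50 image stands in for the rendered chart.
chart = Image.new("RGB", (100, 50), "#FFFFFF")
padded = add_padding_to_chart(chart, 20, 50, 10, 50, "#FAE8E0")

print(padded.size)                 # width grows by left + right, height by top + bottom
print(padded.getpixel((0, 0)))     # a padding pixel in the background color
print(padded.getpixel((20, 50)))   # the top-left pixel of the pasted chart
```

The padding values are in pixels, so they scale with the resolution of the rendered figure and not with the figsize in inches.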

We have now written all the code required to create the data visualization we aimed for.

Here's the full code snippet that uses all functions to create the final lipstick chart.

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(18, 2.7 * len(df)))

add_bars(
    ax=ax, x=df["2000"] / df["2000"],
    width=0.55, alpha=0.2, label="2000"
)

add_bars(
    ax=ax, x=df["latest_value"] / df["2000"],
    width=0.7, alpha=1, label="Latest"
)

ax.set(xlabel=None, ylabel=None, xticks=[])
ax.tick_params("y", labelsize=28, pad=32)
ax.tick_params("x", labelsize=20, pad=16)

ax.set_yticks(
    ticks=[i for i in range(len(df))],
    labels=[split_name(n, limit=20) for n in df["indicator_name"]],
    linespacing=1.7, va="center"
)

for index, row in df.reset_index().iterrows():
    add_info_text(ax, row, index)
    add_change_text(ax, row, index)

line = Line2D([-0.35, 1.0], [-0.9, -0.9], color=TEXT_COLOR)
line.set_clip_on(False)
ax.add_artist(line)

title = "Lipstick Chart - Relative\nDecreases Compared\nto 2000"
plt.title(title, x=-0.32, y=1.11, fontsize=58, ha="left", linespacing=1.6)
plt.legend(bbox_to_anchor=(0.75, 1.14), loc='lower center', borderaxespad=0, ncol=1, fontsize=44, edgecolor="#FAE8E0")

image = create_image_from_figure(fig)
image = add_padding_to_chart(image, 20, 50, 10, 50, BACKGROUND_COLOR)

And here's the finished product.

Final lipstick chart

We're done!


Conclusion

Thank you for reading this tutorial; I hope you learned some tricks you can reuse for your data visualization projects.

If you want to see more tutorials and beautiful data visualizations, follow me here, subscribe to Data Wonder, and follow oscarl3o on Twitter.

See you next time.
