Oscar Leo

Posted on Aug 29, 2023

How to Create Eye-Catching Country Rankings Using Python and Matplotlib

#python #datascience #tutorial #datavisualization

Hi, and welcome to this tutorial, where I’ll teach you to create a country ranking chart using Python and Matplotlib.

What I like about this visualization is its clean and beautiful way of showing how countries rank compared to each other on a particular metric.

The alternative, a standard line chart showing the actual values, gets messy if some countries are close to each other or if some countries outperform others by a lot.

If you want access to the code for this tutorial, you can find it in this GitHub repository.

If you enjoy this tutorial, make sure to check out my other accounts.

Data Wonder on Substack
oscarl3o on Twitter
oscarleo on Medium

Let’s get started.

About the data

For this tutorial, I’ve created a simple CSV containing GDP values for today’s ten largest economies.

The data comes from the World Bank, and the full name of the indicator is "GDP (constant 2015 US$)".

If you want to know more about different ways of measuring GDP, you can look at this Medium story, where I use the same type of data visualization.

Let’s get on with the tutorial.

Step 1: Creating rankings

Step one is to rank the countries for each year in the dataset, which is easy to do with pandas.



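def create_rankings(df, columns):
    # One rank column per value column: rank_0, rank_1, ...
    rank_columns = ["rank_{}".format(i) for i in range(len(columns))]
    for i, column in enumerate(columns):
        # ascending=False gives rank 1 to the largest value
        df[rank_columns[i]] = df[column].rank(ascending=False)

    # Note: df is modified in place and also returned
    return df, rank_columns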

The resulting columns look like this.

That’s all the preprocessing we need to continue with the data visualization.
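To make the ranking step concrete, here is a minimal sketch on a toy DataFrame. The values are made up for illustration and are not World Bank data, and the extra `rank_1_first` column is my own addition. It also shows one thing to watch out for: by default, pandas gives tied values the average of their ranks, so you can end up with fractional ranks like 2.5.

```python
import pandas as pd

# Made-up GDP values purely for illustration (not World Bank data).
toy = pd.DataFrame({
    "country_name": ["A", "B", "C"],
    "2000": [3.0, 1.0, 2.0],
    "2005": [2.0, 2.0, 5.0],
})

# ascending=False gives rank 1 to the largest value.
toy["rank_0"] = toy["2000"].rank(ascending=False)

# Ties share the average rank by default (A and B both get 2.5 here).
toy["rank_1"] = toy["2005"].rank(ascending=False)

# method="first" breaks ties by row order, so every rank is a distinct integer.
toy["rank_1_first"] = toy["2005"].rank(ascending=False, method="first").astype(int)

print(toy[["country_name", "rank_0", "rank_1", "rank_1_first"]])
```

With real GDP values, exact ties are unlikely, so the default is probably fine for this chart.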

Step 2: Creating and styling a grid

Now that we have prepared our data, it’s time to create a grid where we can draw our lines and flags.

Here’s a function using Seaborn that creates the overall style. It defines things like the background color and font family. I’m also removing spines and ticks.



import seaborn as sns

def set_style(font_family, background_color, grid_color, text_color):
    sns.set_style({
        # Same background for the axes and the figure
        "axes.facecolor": background_color,
        "figure.facecolor": background_color,

        # Draw the grid below the lines and images
        "axes.grid": True,
        "axes.axisbelow": True,

        "grid.color": grid_color,

        "text.color": text_color,
        "font.family": font_family,

        # Hide tick marks (the labels stay)
        "xtick.bottom": False,
        "xtick.top": False,
        "ytick.left": False,
        "ytick.right": False,

        # Hide all four spines
        "axes.spines.left": False,
        "axes.spines.bottom": False,
        "axes.spines.right": False,
        "axes.spines.top": False,
    })

I run the function with the following values.



font_family = "PT Mono"
background_color = "#FAF0F1"
text_color = "#080520"
grid_color = "#E4C9C9"

set_style(font_family, background_color, grid_color, text_color)
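If you would rather not depend on Seaborn, the same kind of settings can be applied with Matplotlib alone. The sketch below is an alternative, not the approach used in this tutorial. It uses `plt.rc_context()` so the style only applies to figures created inside the `with` block, and it leaves out the font so it runs on machines that don't have PT Mono installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

style = {
    "axes.facecolor": "#FAF0F1",
    "figure.facecolor": "#FAF0F1",
    "axes.grid": True,
    "axes.axisbelow": True,
    "grid.color": "#E4C9C9",
    "text.color": "#080520",
    "xtick.bottom": False,
    "ytick.left": False,
    "axes.spines.left": False,
    "axes.spines.bottom": False,
    "axes.spines.right": False,
    "axes.spines.top": False,
}

# rc_context applies the settings only inside the with-block.
with plt.rc_context(style):
    fig, ax = plt.subplots()

# Check that the axes picked up the style when they were created.
face_hex = to_hex(ax.get_facecolor())
spines_hidden = not any(spine.get_visible() for spine in ax.spines.values())
print(face_hex, spines_hidden)
```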

To create the actual grid, I have a function that formats the x- and y-axes. It takes a few parameters that allow me to try different setups, such as the size of the labels.



def format_ticks(ax, years, padx=0.25, pady=0.5, y_label_size=20, x_label_size=24):
    # Note: relies on the global DataFrame `df` for the number of countries.
    # Ranks are plotted as negative values so that rank 1 ends up at the top.
    ax.set(xlim=(-padx, len(years) - 1 + padx), ylim=(-len(df) - pady, -pady))

    # One x tick per year, labeled with the year
    xticks = list(range(len(years)))
    ax.set_xticks(ticks=xticks, labels=years)

    # One y tick per rank, placed at -rank and labeled with the rank
    yticks = [-i for i in range(1, len(df) + 1)]
    ylabels = [str(i) for i in range(1, len(df) + 1)]
    ax.set_yticks(ticks=yticks, labels=ylabels)

    ax.tick_params("y", labelsize=y_label_size, pad=16)
    ax.tick_params("x", labeltop=True, labelsize=x_label_size, pad=8)

Here’s what it looks like when I run everything we have so far.



import matplotlib.pyplot as plt
import pandas as pd

# Load data
years = ["2000", "2005", "2010", "2015", "2020", "2022"]
df = pd.read_csv("rankings.csv", index_col=None)
df, rank_columns = create_rankings(df, years)

# Create chart
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 1.6*len(df)))
format_ticks(ax, years)

And here’s the resulting grid.
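The trick behind the grid is that ranks are plotted at negative y-values, which puts rank 1 at the top without inverting the axis. Here is the same arithmetic as a small standalone sketch. The helper name `rank_axis_layout` is hypothetical and only exists to show the numbers that `format_ticks()` passes to Matplotlib.

```python
def rank_axis_layout(n_countries, n_years, padx=0.25, pady=0.5):
    """Return (xlim, ylim, yticks, ylabels) for a rank grid with rank 1 on top."""
    xlim = (-padx, n_years - 1 + padx)
    # Ranks are plotted as negative numbers: -1 sits above -2, and so on.
    ylim = (-n_countries - pady, -pady)
    yticks = [-rank for rank in range(1, n_countries + 1)]
    ylabels = [str(rank) for rank in range(1, n_countries + 1)]
    return xlim, ylim, yticks, ylabels


xlim, ylim, yticks, ylabels = rank_axis_layout(n_countries=3, n_years=6)
print(xlim)     # (-0.25, 5.25)
print(ylim)     # (-3.5, -0.5)
print(yticks)   # [-1, -2, -3]
print(ylabels)  # ['1', '2', '3']
```

The padding keeps the first and last points from sitting right on the edge of the axes.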

Now we can start to add some data.

Step 3: Adding lines

I want a line showing each country's rank for each year in the dataset—an easy task in Matplotlib.



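from matplotlib.lines import Line2D

def add_line(ax, row, columns, linewidth=3):
    # x is the position of each year, y is the negated rank for that year
    x = [i for i in range(len(columns))]
    y = [-row[rc] for rc in columns]

    # Uses the global text_color defined earlier
    ax.add_artist(
        Line2D(x, y, linewidth=linewidth, color=text_color)
    )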

Then I run the function for each row in the dataset like this.



# Load data
years = ["2000", "2005", "2010", "2015", "2020", "2022"]
df = pd.read_csv("rankings.csv", index_col=None)
df, rank_columns = create_rankings(df, years)

# Create chart
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 1.6*len(df)))
format_ticks(ax, years)

# Draw lines
for i, row in df.iterrows():
    add_line(ax, row, rank_columns)

I’m using the same color for each line because I want to use country flags to guide the eye. Using a unique color for each line makes sense, but it looks messy.
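One detail worth knowing about `add_line()`: as far as I can tell, artists added with `ax.add_artist()` don't affect the axis limits, which is why the limits are set explicitly in `format_ticks()`. The sketch below compares it with `ax.plot()`, which does autoscale. The data points are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Made-up positions and negated ranks for a single country.
x = [0, 1, 2, 3, 4, 5]
y = [-1, -2, -2, -3, -3, -2]

fig, (ax_artist, ax_plot) = plt.subplots(ncols=2)

# add_artist: the line is drawn, but the limits stay at their defaults.
ax_artist.add_artist(Line2D(x, y))

# plot: the limits are autoscaled to fit the data.
ax_plot.plot(x, y)

xlim_artist = ax_artist.get_xlim()
xlim_plot = ax_plot.get_xlim()
print(xlim_artist, xlim_plot)
```

If you skip `format_ticks()`, the lines from `add_line()` would mostly fall outside the visible area.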

Step 4: Drawing pie charts

I want to indicate how a country’s economy grows over time without adding text, so the information has to be purely visual.

My idea is to draw a pie chart on each point showing the size of a country’s economy compared to its best year.

I’m using PIL to create a pie chart image, but you can use Matplotlib directly. I don’t because I had some issues with aspect ratios.



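from PIL import Image, ImageDraw
from matplotlib.offsetbox import AnnotationBbox, OffsetImage

def add_pie(ax, x, y, ratio, size=572, zoom=0.1):
    # Draw the pie slice on a transparent square image
    image = Image.new('RGBA', (size, size))
    draw = ImageDraw.Draw(image)

    # Start at 12 o'clock (-90 degrees) and sweep clockwise by ratio * 360 degrees
    draw.pieslice((0, 0, size, size), start=-90, end=360*ratio-90, fill=text_color, outline=text_color)
    im = OffsetImage(image, zoom=zoom, interpolation="lanczos", resample=True, visible=True)

    # Place the image at the data point (uses the global text_color defined earlier)
    ax.add_artist(AnnotationBbox(
        im, (x, y), frameon=False,
        xycoords="data",
    ))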

The value for the size parameter is slightly larger than the size of my flag images, which are 512x512. Later, I want to paste the flags on the pie charts.
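The angles in `pieslice()` can be confusing at first. Pillow measures them from 3 o'clock and increases them clockwise, so starting at -90 means starting at 12 o'clock. Here is a minimal sketch that draws a quarter pie and checks which pixels end up filled. The helper name `pie_image` and the smaller image size are my own choices for the example.

```python
from PIL import Image, ImageDraw


def pie_image(ratio, size=200, color="#080520"):
    """Draw a pie slice covering `ratio` of a circle, starting at 12 o'clock."""
    image = Image.new("RGBA", (size, size))  # starts fully transparent
    draw = ImageDraw.Draw(image)
    draw.pieslice(
        (0, 0, size - 1, size - 1),
        start=-90,
        end=360 * ratio - 90,
        fill=color,
    )
    return image


quarter = pie_image(0.25)

# A ratio of 0.25 fills the top-right quadrant and leaves the rest transparent.
top_right_alpha = quarter.getpixel((150, 50))[3]
bottom_left_alpha = quarter.getpixel((50, 150))[3]
print(top_right_alpha, bottom_left_alpha)
```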

Here’s the updated code.



# Load data
years = ["2000", "2005", "2010", "2015", "2020", "2022"]
df = pd.read_csv("rankings.csv", index_col=None)
df, rank_columns = create_rankings(df, years)

# Create chart
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 1.6*len(df)))
format_ticks(ax, years)

# Draw lines
for i, row in df.iterrows():
    add_line(ax, row, rank_columns)

    for j, rc in enumerate(rank_columns):
        add_pie(ax, j, -row[rc], ratio=row[years[j]] / row[years].max())

And here’s the result.

It’s starting to look informative, so it’s time to make it beautiful.
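The ratio passed to the pie chart is each year's value divided by the country's best year. In the loop, I calculate it one point at a time, but a vectorized sketch on made-up numbers might make the idea easier to see.

```python
import pandas as pd

years = ["2000", "2010", "2020"]

# Made-up values purely for illustration (not World Bank data).
toy = pd.DataFrame({
    "country_name": ["A", "B"],
    "2000": [1.0, 4.0],
    "2010": [2.0, 8.0],
    "2020": [4.0, 6.0],
})

# Best year per country (row-wise maximum over the year columns).
best_year = toy[years].max(axis=1)

# Divide every year by that country's best year; 1.0 means a full pie.
ratios = toy[years].div(best_year, axis=0)
print(ratios)
```

Country B peaks in 2010 in this toy data, so its 2020 pie is only three-quarters full even though it is the most recent year.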

Step 5: Adding flags

I love using flags in my charts because they are simply beautiful.

Here, the purpose of the flags is to make the chart visually appealing, explain which countries we’re looking at, and guide the eye along the lines.

I’m using these rounded flags. They require a license, so, unfortunately, I can’t share them, but you can find similar flags in other places.

I’ve had some issues getting the pie and flag to align perfectly, so instead of creating a separate function to add a flag, I’m rewriting the add_pie() function.



def add_pie_and_flag(ax, x, y, name, ratio, size=572, zoom=0.1):
    # "<location>" is a placeholder for the folder with the 512x512 flag images
    flag = Image.open("<location>/{}.png".format(name.lower()))
    image = Image.new('RGBA', (size, size))
    draw = ImageDraw.Draw(image)
    # Padding that centers the 512x512 flag on the larger pie image
    pad = (size - 512) // 2

    draw.pieslice((0, 0, size, size), start=-90, end=360*ratio-90, fill=text_color, outline=text_color)
    # Use the flag's alpha channel as the mask so its transparent corners stay transparent
    image.paste(flag, (pad, pad), flag.split()[-1])

    im = OffsetImage(image, zoom=zoom, interpolation="lanczos", resample=True, visible=True)

    ax.add_artist(AnnotationBbox(
        im, (x, y), frameon=False,
        xycoords="data",
    ))

I call it in place of add_pie() in the main loop.
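Since I can't share the flag images, here is a sketch that lets you test the compositing step without them. It generates a plain colored circle as a stand-in flag and pastes it onto the pie using the alpha channel as a mask. The helper names `make_placeholder_flag` and `compose` are hypothetical and only used in this example.

```python
from PIL import Image, ImageDraw


def make_placeholder_flag(color, size=512):
    """Stand-in for a rounded flag: a colored circle on a transparent square."""
    flag = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    ImageDraw.Draw(flag).ellipse((0, 0, size - 1, size - 1), fill=color)
    return flag


def compose(flag, ratio, size=572, pie_color="#080520"):
    """Draw the pie slice, then paste the flag centered on top of it."""
    image = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    draw = ImageDraw.Draw(image)
    draw.pieslice((0, 0, size - 1, size - 1), start=-90, end=360 * ratio - 90, fill=pie_color)
    pad = (size - flag.width) // 2
    # The third argument is the mask: only the opaque parts of the flag are pasted.
    image.paste(flag, (pad, pad), flag.split()[-1])
    return image


composed = compose(make_placeholder_flag("#d62828"), ratio=0.5)

center_pixel = composed.getpixel((286, 286))     # covered by the flag
right_ring_pixel = composed.getpixel((556, 286))  # pie visible outside the flag
left_ring_alpha = composed.getpixel((16, 286))[3]  # outside the half pie
print(center_pixel, right_ring_pixel, left_ring_alpha)
```

With a ratio of 0.5, the pie only shows as a thin ring on the right side of the flag, which is the effect you see around each flag in the chart.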



# Load data
years = ["2000", "2005", "2010", "2015", "2020", "2022"]
df = pd.read_csv("rankings.csv", index_col=None)
df, rank_columns = create_rankings(df, years)

# Create chart
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 1.6*len(df)))
format_ticks(ax, years)

# Draw lines
for i, row in df.iterrows():
    add_line(ax, row, rank_columns)

    for j, rc in enumerate(rank_columns):
        add_pie_and_flag(
            ax, j, -row[rc], 
            name=row.country_name,
            ratio=row[years[j]] / row[years].max()
        )

And now you can behold the visual magic of using flags. It’s a huge difference compared to the previous output.

We suddenly have something that looks nice and is easy to understand. The last thing to do is to add some helpful information.

Step 6: Adding additional information

Since not everyone knows all the flags by heart, I want to add the country’s name to the right.

I also want to show the size of the economy and how each country compares to the top-ranked country.

Here’s my code for doing that.



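def add_text(ax, value, max_value, y):
    # GDP in trillions and as a percentage of the largest economy
    trillions = round(value / 1e12, 1)
    ratio_to_max = round(100 * value / max_value, 1)

    # Note: `row` is not a parameter; this relies on the loop variable
    # from the main code block being in scope when the function is called.
    text = "{}\n${:,}T ({}%)".format(
        row.country_name, 
        trillions,
        ratio_to_max
    )

    # x is in axes fraction (just right of the axes), y is in data coordinates
    ax.annotate(
        text, (1.03, y), 
        fontsize=20,
        linespacing=1.7,
        va="center",
        xycoords=("axes fraction", "data")
    )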
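If you want to check the label text without drawing anything, the formatting can be pulled out into a small pure function. This is a sketch, not part of the original code: the name `format_label` is hypothetical, and it takes the country name as an explicit argument. The numbers are made up.

```python
def format_label(name, value, max_value):
    """Build the two-line label: name, then GDP in trillions and share of the max."""
    trillions = round(value / 1e12, 1)
    ratio_to_max = round(100 * value / max_value, 1)
    return "{}\n${:,}T ({}%)".format(name, trillions, ratio_to_max)


# Made-up numbers purely for illustration (not World Bank data).
label = format_label("Exampleland", value=5e12, max_value=20e12)
print(label)
```

Passing the name in explicitly also makes the function easier to reuse outside the main loop.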

As before, I add the function to the main code block. Note that I’m also adding a title.



years = ["2000", "2005", "2010", "2015", "2020", "2022"]
df = pd.read_csv("rankings.csv", index_col=None)
df, rank_columns = create_rankings(df, years)

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 1.6*len(df)))
format_ticks(ax, years)

for i, row in df.iterrows():
    add_line(ax, row, rank_columns)

    for j, rc in enumerate(rank_columns):
        add_pie_and_flag(
            ax, j, -row[rc], 
            name=row.country_name,
            ratio=row[years[j]] / row[years].max()
        )

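    # Label each country at its final-year rank, relative to the largest economy that year
    add_text(ax, value=row[years[-1]], max_value=df[years[-1]].max(), y=-row[rank_columns[-1]])

# Set the title once, after the loop
plt.title("Comparing Today's Largest Economies\nGDP (constant 2015 US$)", linespacing=1.8, fontsize=32, x=0.58, y=1.12)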