Sean Atukorala

How to create an Age Distribution Graph Using Python, Pandas and Seaborn

Have you ever wondered how to create an age distribution graph using Python, Pandas and Seaborn? If so, keep reading to find out how!

Figure 1: The graph we'll learn to build in this tutorial

Setup

First, here is the GitHub repo for this tutorial: Kaggle Titanic Project

We'll be working with the contents of the file age-distribution-graph.ipynb for this tutorial.

Note: We'll be working with Jupyter Notebook for this tutorial, so if you don't have it installed, you can get it from the official Jupyter website.

Development

After opening up age-distribution-graph.ipynb you'll notice that the code is divided up into blocks that can be run individually.

Let's go through each code block one by one:



import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import os
warnings.filterwarnings("ignore")




Here we are importing all the necessary libraries for constructing the histogram that we're about to build. We'll be using Seaborn to create the histogram using its histplot() function (more on that function in the Seaborn docs).
The warnings.filterwarnings("ignore") line adds an "ignore" entry to Python's list of warning filters. Since no message, category or module is specified, the entry matches every warning, so none of them are printed (more on warnings.filterwarnings() in the official Python docs).
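
To make the effect of that filter concrete, here is a minimal sketch that is not part of the notebook. The noisy() helper is hypothetical and only stands in for library code that emits a warning. The last block shows a narrower alternative that hides a single warning category instead of everything.

```python
import warnings

def noisy():
    # Hypothetical helper that emits a warning, standing in for library code
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42

# catch_warnings restores the previous filters when each block exits,
# and record=True collects the warnings in a list instead of printing them
with warnings.catch_warnings(record=True) as caught_all:
    warnings.simplefilter("always")
    noisy()

with warnings.catch_warnings(record=True) as caught_ignored:
    warnings.filterwarnings("ignore")  # same call as in the tutorial
    noisy()

with warnings.catch_warnings(record=True) as caught_narrow:
    warnings.simplefilter("always")
    # Narrower alternative: hide only DeprecationWarning
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    warnings.warn("something else", UserWarning)
    noisy()

print(len(caught_all), len(caught_ignored), len(caught_narrow))
```

Ignoring everything keeps the notebook output tidy, but it can also hide warnings that point at real problems, so the narrower form may be worth considering outside of a tutorial.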

Next, we add the following code block:



def read_data():
    train_data = pd.read_csv("data/train.csv")
    test_data = pd.read_csv("data/test.csv")
    return train_data, test_data

train_data, test_data = read_data()



Here we're defining the read_data() function, which loads the data contained in data/train.csv and data/test.csv into two Pandas DataFrame objects (more on DataFrame in the official Pandas docs).
Now the train_data variable contains the training data and the test_data variable contains the testing data.
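
Before plotting, it can help to take a quick look at what was loaded. The sketch below is not part of the notebook: it reads a few hypothetical rows from an in-memory string so that it runs without the data folder, and the same checks can be applied to train_data. The Age column in the Kaggle Titanic training data contains missing values, which is why counting them is a useful first step.

```python
import io
import pandas as pd

# Hypothetical rows in the style of the Titanic training file;
# in the notebook you would inspect train_data instead
csv_text = """PassengerId,Survived,Age
1,0,22
2,1,38
3,1,
4,0,35
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                # (rows, columns)
print(sample["Age"].isna().sum())  # number of missing ages
print(sample["Age"].describe())    # summary statistics, NaN values skipped
```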

Next we can add the following code:



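def survived_age_table(feature):
    # Plot the Age distribution, split by the Survived column (0 = died, 1 = survived)
    sns.histplot(data=train_data, x='Age', hue='Survived', palette=['yellow', 'green']).set_title(f"{feature} Vs Survived")
    plt.legend(labels=['Died', 'Survived'])
    plt.show()

# Call the function to render the graph; the argument is only used in the title
survived_age_table('Age')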




This function is responsible for creating the age distribution graph. Here are some more details about it:

  • First we create the histogram by calling the function sns.histplot() (more on this function can be found in the official Seaborn docs).
  • The data parameter takes an input data structure, which is a pandas.DataFrame in our case.
  • The x parameter specifies the variable to be counted, which in this case is the Age column.
  • The hue parameter, set to Survived in our case, splits the data into subsets by that column's values. A separate histogram with its own color is drawn for each subset in the same graph.
  • The palette parameter chooses the colors to use when mapping the hue variable.
  • Next, we set the title of the histogram via set_title(). The feature argument passed to the function is only used here, to build the title text.
  • The plt.legend() function customizes the labels displayed in the legend box located in the top right of the histogram.
  • Lastly, plt.show() displays our histogram.

And here is our finished histogram:

Figure 2: Our Finished Histogram

Thanks for following along and I hope this article was helpful to you.

Conclusion

Well, that's it for this post! If you have any questions or concerns, please feel free to leave a comment on this post and I will get back to you when I find the time.

If you found this article helpful please share it and make sure to follow me on Twitter and GitHub, connect with me on LinkedIn and subscribe to my YouTube channel.
