DEV Community

Joseous Ng'ash


Pandas and Data Visualization Using Matplotlib and Seaborn

This is a new chapter in learning data analytics and data science. The focus now is on Pandas as a Python library, alongside Matplotlib and Seaborn for data visualization.
I am writing this article to guide beginners who are starting out, or already working, in the data analytics and data science profession.

Introduction to Python Data Analysis

In the modern world, data has become one of the most valuable assets. Businesses, healthcare institutions, financial organizations and even social media platforms rely heavily on data to make informed decisions.
Raw data on its own is of little use, and that is where data analysis and visualization become important.

Python is one of the most widely used programming languages for data analysis because of its simplicity and powerful libraries.
The most commonly used libraries are Pandas, Matplotlib and Seaborn:
  • Pandas: helps with cleaning, organizing and analyzing data.
  • Matplotlib and Seaborn: used to create visual representations of data.

This article introduces Pandas and explains how Matplotlib and Seaborn can be used for effective data visualization, in a beginner-friendly way.

What Is Pandas and Why It Matters

Pandas is an open-source Python library used for data manipulation and analysis. It provides simple and efficient tools for working with structured data, e.g. CSV files, spreadsheets and database tables.

The main data structures in Pandas are:

  • Series: a one-dimensional, labeled array-like structure.
import pandas as pd

age = pd.Series([18, 20, 23, 40, 50, 24], name="Age")
age

#Output:
0    18
1    20
2    23
3    40
4    50
5    24
Name: Age, dtype: int64
  • DataFrame: a two-dimensional table, similar to an Excel spreadsheet or SQL table.
students = {"Name":["Mark", "John", "Nancy"],
            "Grade":["A", "B", "C"],
            "Course":["Data Science", "Data Engineering", "Data Analytics"]}

student_grades = pd.DataFrame(students)
student_grades

#Output: 
    Name    Grade   Course
0   Mark    A   Data Science
1   John    B   Data Engineering
2   Nancy   C   Data Analytics
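
To see how the two structures relate, here is a small sketch reusing the students table above: selecting a single column from a DataFrame gives back a Series, while selecting a list of columns gives back a smaller DataFrame.

```python
import pandas as pd

# Same students table as above
students = {"Name": ["Mark", "John", "Nancy"],
            "Grade": ["A", "B", "C"],
            "Course": ["Data Science", "Data Engineering", "Data Analytics"]}
student_grades = pd.DataFrame(students)

# Single brackets with one column name return a Series
names = student_grades["Name"]
print(type(names))   # <class 'pandas.core.series.Series'>

# A list of column names returns a DataFrame
subset = student_grades[["Name", "Grade"]]
print(type(subset))  # <class 'pandas.core.frame.DataFrame'>

# A DataFrame has two dimensions (rows, columns); a Series has one
print(student_grades.shape)  # (3, 3)
print(names.shape)           # (3,)
```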

Pandas matters because it simplifies complex tasks such as:

  • Performing calculations and statistical analysis
  • Filtering and sorting information
  • Cleaning missing or incorrect data
  • Reading datasets from files

Before using Pandas, install it by running the following command:

pip install pandas

Then import it into your Python script:

import pandas as pd
  • Reading data: Pandas can easily read files such as CSV and Excel. Example:
import pandas as pd

data = pd.read_csv("students.csv")
print(data.head())

The head() function displays the first five rows of the dataset.
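
If you do not have a students.csv file yet, the sketch below shows read_csv working on in-memory text via io.StringIO, which behaves like an open file. The CSV contents are hypothetical and made up only for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a real "students.csv" file
csv_text = """Name,Age,Score
Mark,18,85
John,20,72
Nancy,23,64
Mary,40,90
Peter,50,78
James,24,88
"""

# read_csv accepts a file path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # first 5 rows (the default)
print(data.head(2))  # first 2 rows
print(len(data))     # 6
```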

  • Checking data information: To understand the structure of your data, use:
data.info()
print(data.describe())

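info() prints the column names, data types and the number of non-null values in each column (which reveals missing values). It prints directly, so it does not need to be wrapped in print().
describe() provides statistical summaries of the numeric columns, such as count, mean, minimum and maximum.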
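
Here is a small sketch, assuming a hypothetical four-row dataset with one missing age, of what these two methods report. info() prints its summary itself, while describe() returns a DataFrame that you can index.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing age (np.nan)
data = pd.DataFrame({
    "Name": ["Mark", "John", "Nancy", "Mary"],
    "Age": [18, 20, np.nan, 40],
    "Score": [85, 72, 64, 90],
})

# info() prints its report and returns None
data.info()

# describe() returns summary statistics for the numeric columns only
summary = data.describe()
print(summary)

# "count" skips missing values, a quick way to spot them
print(summary.loc["count", "Age"])   # 3.0
print(summary.loc["mean", "Score"])  # 77.75
```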

  • Handling missing values: Missing values can distort analysis results. To check for missing data:
print(data.isnull().sum())

Removing missing values:

data = data.dropna()

Alternatively, filling missing values instead of removing them:

data["Age"] = data["Age"].fillna(data["Age"].mean())

This replaces missing age values with the average age.
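
Since dropna() and fillna() are alternatives, you would normally pick one. The sketch below compares both on a hypothetical DataFrame, without overwriting the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
data = pd.DataFrame({
    "Name": ["Mark", "John", "Nancy", "Mary"],
    "Age": [18, 20, np.nan, 40],
})

# Count missing values per column: Name has 0, Age has 1
print(data.isnull().sum())

# Option 1: drop every row that has a missing value (returns a new DataFrame)
dropped = data.dropna()
print(len(dropped))          # 3

# Option 2: fill missing ages with the mean of the ages that are present
filled = data.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
print(filled.loc[2, "Age"])  # 26.0
```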
  • Filtering and sorting data: Filtering lets you select only the rows that meet a condition.
Example:

high_scores = data[data["Score"] > 70] 
print(high_scores)

Sorting data:

sorted_data = data.sort_values(by="Score", ascending=False) 

print(sorted_data)

These operations help organize data for better understanding and reporting.
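
Conditions can also be combined, and sorting can use more than one column. Here is a minimal sketch with hypothetical student data:

```python
import pandas as pd

# Hypothetical student data
data = pd.DataFrame({
    "Name": ["Mark", "John", "Nancy", "Mary", "Peter"],
    "Age": [18, 20, 23, 40, 50],
    "Score": [85, 72, 64, 90, 72],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
young_high = data[(data["Score"] > 70) & (data["Age"] < 30)]
print(young_high["Name"].tolist())   # ['Mark', 'John']

# Sort by Score (highest first), breaking ties by Age (youngest first)
sorted_data = data.sort_values(by=["Score", "Age"], ascending=[False, True])
print(sorted_data["Name"].tolist())  # ['Mary', 'Mark', 'John', 'Peter', 'Nancy']
```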

Data Visualization Fundamentals

Data visualization is the process of representing data graphically using charts, graphs and plots. Visualization makes it easier to identify patterns, trends and relationships in data.

For example:

  • Scatter plots show relationships between variables
  • Bar charts compare categories
  • Pie charts display proportions
  • Line charts show trends over time

Visualizations help us understand large datasets because humans interpret visuals faster than raw numbers.

Python provides powerful visualization libraries, with Matplotlib and Seaborn being among the most widely used.

Using Matplotlib for Charts

Matplotlib is one of the oldest and most flexible visualization libraries in Python. It provides full control over chart customization.

To install Matplotlib:

pip install matplotlib

Import it:

import matplotlib.pyplot as plt

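Creating a Line Chart
A line chart is used to show trends.
Example:

import pandas as pd
import matplotlib.pyplot as plt

# housing_df: the author's housing dataset, assumed already loaded with pd.read_csv()
# Average number of bedrooms for each number of bathrooms
avg_bedrooms = housing_df.groupby("bathrooms")["bedrooms"].mean()

plt.figure(figsize=(6, 3))
plt.plot(avg_bedrooms.index, avg_bedrooms.values, marker="o")
plt.title("Bathrooms vs Bedrooms")
plt.xlabel("Bathrooms")
plt.ylabel("Average Bedrooms")
plt.show()

The chart shows how the average number of bedrooms changes as the number of bathrooms increases.

[Image: Line chart]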
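
The housing example above plots an average rather than a time series. For the "trends over time" use case, a minimal pure-Matplotlib sketch might look like this; the monthly rent values are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly average rents, purely for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
avg_rent = [25000, 25500, 26200, 26000, 27100, 27800]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, avg_rent, marker="o")  # one line, with a marker at each month
ax.set_title("Average Rent Over Time (hypothetical data)")
ax.set_xlabel("Month")
ax.set_ylabel("Average rent")
fig.tight_layout()
# plt.show()  # uncomment to display the chart when running as a script
```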

Creating a Bar Chart
Bar charts compare categories.
Example:

import matplotlib.pyplot as plt

# Average satisfaction score by property type
# (housing_df is the author's dataset, assumed already loaded)
avg_satisfaction_by_prop = (
    housing_df.groupby("property_type")["satisfaction_score"]
    .mean()
    .sort_values(ascending=False)
    .reset_index()
)

# Plot
plt.figure(figsize=(6, 3))
plt.bar(avg_satisfaction_by_prop["property_type"], avg_satisfaction_by_prop["satisfaction_score"])
plt.title("Average Satisfaction Score by Property Type")
plt.xlabel("Property Type")
plt.ylabel("Average Satisfaction Score")
plt.show()

The chart compares the average satisfaction score for each property type.

[Image: Bar chart]
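
Matplotlib can also print each bar's value on top of the bar with ax.bar_label() (available in Matplotlib 3.4 and later). This sketch uses hypothetical property types and scores in place of the author's avg_satisfaction_by_prop results.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical averages, standing in for avg_satisfaction_by_prop
avg = pd.DataFrame({
    "property_type": ["Apartment", "Bungalow", "Bedsitter"],
    "satisfaction_score": [4.2, 3.8, 3.1],
})

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.bar(avg["property_type"], avg["satisfaction_score"], color="steelblue")
ax.bar_label(bars, fmt="%.1f")  # write each bar's value above it
ax.set_title("Average Satisfaction Score by Property Type (hypothetical data)")
ax.set_xlabel("Property Type")
ax.set_ylabel("Average Satisfaction Score")
fig.tight_layout()
# plt.show()  # display the chart when running as a script
```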

Creating a Pie Chart
Pie charts show each category's share of the whole as a percentage.
Example:

import matplotlib.pyplot as plt

# Number of properties in each furnishing category
furnishing_counts = housing_df["furnishing"].value_counts()

# Pull every slice out slightly; one offset per category, so this works for any number of categories
explode = [0.05] * len(furnishing_counts)

plt.figure(figsize=(6, 6))
plt.pie(furnishing_counts, explode=explode, labels=furnishing_counts.index, autopct="%1.1f%%")
plt.title("Distribution of Furnishing Status")
plt.show()

[Image: Pie chart]
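
The percentages that autopct draws on each slice can be checked numerically with value_counts(normalize=True). The furnishing values below are hypothetical.

```python
import pandas as pd

# Hypothetical furnishing column, for illustration only
furnishing = pd.Series(["Furnished", "Unfurnished", "Unfurnished",
                        "Semi-furnished", "Unfurnished", "Furnished",
                        "Unfurnished", "Semi-furnished"])

counts = furnishing.value_counts()                      # raw count per category
shares = furnishing.value_counts(normalize=True) * 100  # percentage per category

print(counts)
print(shares.round(1))  # Unfurnished 50.0, Furnished 25.0, Semi-furnished 25.0

# One explode offset per slice, whatever the number of categories
explode = [0.05] * len(counts)
```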

Matplotlib is highly customizable and allows users to change colors, labels, chart sizes, and grid styles.
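
As a small illustration of that customization, the sketch below uses Matplotlib's object-oriented interface (fig, ax = plt.subplots()) to set the chart size, line color and style, grid and legend. The data points are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 5, 6]

fig, ax = plt.subplots(figsize=(6, 3))  # chart size in inches
ax.plot(x, y, color="darkorange", linestyle="--", linewidth=2, label="Series A")
ax.set_title("Customized Line Chart")
ax.set_xlabel("X value")
ax.set_ylabel("Y value")
ax.grid(True, linestyle=":", alpha=0.5)  # light dotted grid
ax.legend()                              # uses the label= given to plot()
fig.tight_layout()
```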

Using Seaborn for Statistical Visualizations

Seaborn is a Python library built on top of Matplotlib. It produces attractive, more advanced statistical visualizations with less code.

To install Seaborn:

pip install seaborn

Import it:

import seaborn as sns

Seaborn works smoothly with Pandas DataFrames.

Example dataset:

import pandas as pd 
data = { "Student": ["John", "Mary", "Peter", "James"], "Score": [85, 90, 78, 88] }
df = pd.DataFrame(data)
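
Using the small df defined above, a Seaborn bar plot needs only the DataFrame and the column names. Seaborn labels the axes from those column names automatically. A minimal sketch:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {"Student": ["John", "Mary", "Peter", "James"], "Score": [85, 90, 78, 88]}
df = pd.DataFrame(data)

# Seaborn reads the columns by name straight from the DataFrame
fig, ax = plt.subplots(figsize=(6, 3))
sns.barplot(data=df, x="Student", y="Score", ax=ax)
ax.set_title("Student Scores")
fig.tight_layout()
```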

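Bar Plot

# Average monthly rent by property type (housing_df is the author's dataset, assumed already loaded)
plt.figure(figsize=(6, 3))

# In Seaborn 0.13+, a palette should be paired with hue; mapping hue to the x column
# and setting legend=False gives one color per bar without a deprecation warning
sns.barplot(data=housing_df, x="property_type", y="monthly_rent_kes", estimator="mean",
            hue="property_type", palette="bright", legend=False)
plt.title("Average Monthly Rent by Property Type")
plt.xlabel("Property Type")
plt.ylabel("Average Monthly Rent (KES)")
plt.xticks(rotation=45)
plt.show()

Seaborn applies more polished default styling than plain Matplotlib, and barplot calculates the mean for each category (with error bars) automatically.

[Image: Styled bar plot of average monthly rent by property type]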

Histogram
Histograms show data distribution.
Example:

# What is the distribution of monthly rent
plt.figure(figsize = (6, 3))

sns.histplot(data=housing_df, x = "monthly_rent_kes")
plt.title("Distribution of monthly rent")
plt.xlabel("Monthly rent")
plt.ylabel("Number of properties")
plt.show()

This shows how monthly rent is distributed across properties.

[Image: Histogram of monthly rent]

Scatter Plot
Scatter plots reveal relationships between two numerical variables.
Example:

# Plotting relationship between bedrooms and bathrooms

plt.figure(figsize=(6, 3))

sns.scatterplot(data=housing_df, x="bathrooms", y="bedrooms")
plt.title("Bathrooms vs Bedrooms")
plt.xlabel("Bathrooms")
plt.ylabel("Bedrooms")
plt.show()

This chart helps to show the relationship between bathrooms and bedrooms.

[Image: Scatter plot of bathrooms vs bedrooms]
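
A scatter plot can also color points by a third, categorical column through the hue parameter, and the same relationship can be summarized as a single number with Series.corr() (Pearson correlation by default). The listings below are hypothetical.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical listings, for illustration only
listings = pd.DataFrame({
    "bathrooms": [1, 1, 2, 2, 3, 3],
    "bedrooms": [1, 2, 2, 3, 3, 4],
    "furnishing": ["Furnished", "Unfurnished", "Furnished",
                   "Unfurnished", "Furnished", "Unfurnished"],
})

fig, ax = plt.subplots(figsize=(6, 3))
# hue colors each point by the furnishing column and adds a legend
sns.scatterplot(data=listings, x="bathrooms", y="bedrooms", hue="furnishing", ax=ax)
ax.set_title("Bathrooms vs Bedrooms by Furnishing")
fig.tight_layout()

# The strength of the linear relationship as one number
r = listings["bathrooms"].corr(listings["bedrooms"])
print(round(r, 2))  # 0.85
```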

Heatmap
Heatmaps use color to show the values in a grid; a correlation heatmap shows how strongly each pair of numerical columns is related.
Example:

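# Correlation analysis (housing_df is the author's dataset, assumed already loaded)
# Keep only numeric columns, since correlation is not defined for text columns
numerical_columns = housing_df.select_dtypes(include="number").columns
correlation = housing_df[numerical_columns].corr()

plt.figure(figsize=(6, 6))

sns.heatmap(correlation, annot=True, fmt=".2f")
plt.title("Correlation Heatmap for Numerical Variables")
plt.show()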

They help identify strong or weak relationships in a dataset.

[Image: Correlation heatmap]
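
Because corr() only makes sense for numbers, the numeric columns need to be selected first. Here is a minimal sketch using select_dtypes(), assuming a hypothetical DataFrame with one text column:

```python
import pandas as pd

# Hypothetical housing-style data with one text column
df = pd.DataFrame({
    "bedrooms": [1, 2, 3, 4],
    "bathrooms": [1, 1, 2, 2],
    "monthly_rent_kes": [15000, 20000, 25000, 30000],
    "furnishing": ["Furnished", "Unfurnished", "Furnished", "Unfurnished"],
})

# Keep only numeric columns, then compute pairwise correlations
numerical_columns = df.select_dtypes(include="number").columns
correlation = df[numerical_columns].corr()
print(correlation.round(2))
# Rent rises by exactly 5000 per bedroom here, so that pair correlates at 1.0
```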

Matplotlib is suitable when detailed customization is required, while Seaborn is ideal for creating visually appealing statistical charts quickly.
In the real world, most analysts use both libraries together: because Seaborn is built on top of Matplotlib, a Seaborn chart can be fine-tuned with Matplotlib functions such as plt.title() and plt.xlabel().

Best Practices and Common Mistakes

Best Practices:

  • Always clean data before analysis
  • Choose the chart type that fits the data and the question
  • Add labels and titles to charts
  • Visualization should be simple and easy to read
  • Check for missing or duplicate data
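
Checking for duplicate data can be done with duplicated() and drop_duplicates(). Here is a short sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data where one row is an exact repeat
data = pd.DataFrame({
    "Name": ["Mark", "John", "Mark", "Nancy"],
    "Score": [85, 72, 85, 64],
})

# duplicated() flags rows that repeat an earlier row exactly
print(data.duplicated().sum())  # 1

# drop_duplicates() keeps the first occurrence of each repeated row
deduped = data.drop_duplicates()
print(len(deduped))             # 3
```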

Common Mistakes:

  • Forgetting labels and/or legends
  • Ignoring missing values
  • Using the wrong chart type
  • Overcrowding charts with too much information

Clear visualizations communicate a dataset's information effectively, without confusion.

Conclusion

To create clear and understandable visualizations in Python, analysts commonly combine Pandas, Matplotlib and Seaborn.

Pandas simplifies data cleaning, manipulation, and analysis.
Matplotlib and Seaborn transform raw numbers into meaningful visual insights.

For data analytics and data science beginners, learning these libraries is essential because they are widely used in industries such as finance, healthcare, marketing and business intelligence.
Mastering these tools is an important step toward becoming a skilled analyst or data scientist.
