This is a new chapter in my journey of learning data analytics and data science. The focus now is on Pandas as a Python library, alongside Matplotlib and Seaborn for data visualization.
I am writing this article to guide beginners who are starting out in, or already working in, the data analytics and data science profession.
Introduction to Python Data Analysis
In the modern world, data has become one of the most valuable assets. Businesses, healthcare institutions, financial organizations and even social media platforms rely heavily on data to make informed decisions.
Raw data on its own is of little use, and that is where data analysis and visualization become important.
Python is one of the most widely used programming languages for data analysis because of its simplicity and powerful libraries.
Commonly used libraries are Pandas, Matplotlib and Seaborn.
Pandas: Helps in cleaning, organizing and analyzing data.
Matplotlib and Seaborn: Used to create visual representations of data.
This article introduces Pandas and explains how Matplotlib and Seaborn can be used for effective data visualization in a beginner-friendly way.
What Is Pandas and Why It Matters
Pandas is an open-source Python library used for data manipulation and analysis. It provides simple and efficient tools for working with structured data, e.g. CSV files, spreadsheets and databases.
The main data structures in Pandas are:
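- Series: A one-dimensional, array-like structure with an index.
import pandas as pd
ages = [18, 20, 23, 40, 50, 24]
# pd.Series builds a one-dimensional Series (pd.DataFrame would build a table instead)
series = pd.Series(ages, name="Age")
series
#Output:
0    18
1    20
2    23
3    40
4    50
5    24
Name: Age, dtype: int64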
- DataFrame: A two-dimensional table similar to an Excel spreadsheet or SQL table.
students = {"Name":["Mark", "John", "Nancy"],
"Grade":["A", "B", "C"],
"Course":["Data Science", "Data Engineering", "Data Analytics"]}
student_grades = pd.DataFrame(students)
student_grades
#Output:
Name Grade Course
0 Mark A Data Science
1 John B Data Engineering
2 Nancy C Data Analytics
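To see how the two structures relate, here is a minimal sketch that reuses the student table above. It only illustrates that picking one column from a DataFrame gives a Series, while picking a list of columns gives a smaller DataFrame. The variable names names and name_and_grade are made up for this example.

```python
import pandas as pd

# Same student table as in the example above.
students = {
    "Name": ["Mark", "John", "Nancy"],
    "Grade": ["A", "B", "C"],
    "Course": ["Data Science", "Data Engineering", "Data Analytics"],
}
student_grades = pd.DataFrame(students)

# Selecting one column with single brackets gives a Series.
names = student_grades["Name"]
print(type(names).__name__)

# Selecting with a list of column names gives a smaller DataFrame.
name_and_grade = student_grades[["Name", "Grade"]]
print(name_and_grade.shape)  # (rows, columns)
```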
Pandas matters because it simplifies complex tasks such as:
- Performing calculations and statistical analysis
- Filtering and sorting information
- Cleaning missing or incorrect data
- Reading datasets from files
Before using Pandas, it must be installed. Run the following command:
pip install pandas
Then import it into your Python script:
import pandas as pd
- Reading Data: Pandas can easily read files such as CSV and Excel. Example:
import pandas as pd
data = pd.read_csv("students.csv")
print(data.head())
The head() function displays the first five rows of the dataset by default.
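If you do not have a students.csv file yet, you can still try read_csv. The sketch below is only an illustration: the CSV text and its column names (Name, Age, Score) are invented for this example, and io.StringIO is used so that Pandas reads the text as if it were a file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real students.csv file.
csv_text = """Name,Age,Score
Mark,18,85
John,20,64
Nancy,23,91
Peter,40,72
Mary,50,58
James,24,77
"""

# read_csv accepts a file path or a file-like object such as StringIO.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # first five rows (the default)
print(data.head(2))  # only the first two rows
print(data.shape)    # (number of rows, number of columns)
```

Passing a number to head() is handy when five rows are too many or too few for a quick look.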
- Checking Data Information: When you want to understand the structure of your data:
data.info()
print(data.describe())
info() prints column names, data types and non-null counts, which reveal missing values.
describe() provides statistical summaries such as counts, averages, and minimum and maximum values.
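Here is a small sketch of both functions on a made-up table (the names and numbers are assumptions for illustration only). It also shows that describe() returns a DataFrame, so a single statistic can be picked out of it.

```python
import pandas as pd

# Hypothetical data; one Score is missing on purpose.
data = pd.DataFrame({
    "Name": ["Mark", "John", "Nancy", "Peter"],
    "Age": [18, 20, 23, 40],
    "Score": [85.0, 64.0, None, 72.0],
})

# info() prints its report directly, so it does not need print().
data.info()

# describe() summarizes the numeric columns only (Age and Score here).
summary = data.describe()
print(summary)

# Pick one value out of the summary table: the average age.
print(summary.loc["mean", "Age"])
```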
- Handling Missing Values: Missing values can affect the analysis results.
Checking for missing data:
print(data.isnull().sum())
Removing rows with missing values:
data = data.dropna()
Alternatively, filling missing values:
data["Age"] = data["Age"].fillna(data["Age"].mean())
This replaces missing age values with the average age.
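The sketch below puts the three steps side by side on a tiny made-up table, so you can compare dropping rows with filling values. The data and the variable names dropped and filled are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Age and one missing Score.
data = pd.DataFrame({
    "Name": ["Mark", "John", "Nancy", "Peter"],
    "Age": [18, np.nan, 23, 40],
    "Score": [85, 64, np.nan, 72],
})

# Count the missing values in each column.
missing_per_column = data.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one missing value.
dropped = data.dropna()
print(len(dropped))

# Option 2: keep all rows and fill missing ages with the average age.
filled = data.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
print(filled)
```

Dropping is simple but loses rows, while filling keeps every row at the cost of using an estimated value.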
- Filtering and Sorting: Filtering allows users to select only the rows that meet a condition.
Example:
high_scores = data[data["Score"] > 70]
print(high_scores)
Sorting data:
sorted_data = data.sort_values(by="Score", ascending=False)
print(sorted_data)
These operations help organize data for better understanding and reporting.
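As a quick illustration, the sketch below filters and sorts a small made-up table of scores. It also shows one way to combine two conditions, which needs parentheses around each condition and the & operator.

```python
import pandas as pd

# Hypothetical scores for illustration.
data = pd.DataFrame({
    "Student": ["John", "Mary", "Peter", "James"],
    "Score": [85, 90, 65, 70],
})

# Keep only the rows where Score is greater than 70.
high_scores = data[data["Score"] > 70]
print(high_scores)

# Combine two conditions: Score from 70 up to (but not including) 90.
mid_scores = data[(data["Score"] >= 70) & (data["Score"] < 90)]
print(mid_scores)

# Sort from the highest score to the lowest.
sorted_data = data.sort_values(by="Score", ascending=False)
print(sorted_data)
```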
Data Visualization Fundamentals
Data visualization is the process of representing data graphically using charts, graphs and plots. Visualization makes it easier to identify patterns, trends and relationships in data.
For example:
- Scatter plots show relationships between variables
- Bar charts compare categories
- Pie charts display proportions
- Line charts show trends over time
Visualizations help in understanding large datasets because patterns are often easier to spot in a chart than in raw numbers.
Python provides powerful visualization libraries, with Matplotlib and Seaborn being among the most widely used.
Using Matplotlib for Charts
Matplotlib is one of the oldest and most flexible visualization libraries in Python. It provides full control over chart customization.
To install Matplotlib:
pip install matplotlib
Import it:
import matplotlib.pyplot as plt
Creating a Line Chart
A line chart is used to show trends.
Example:
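import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # sns.lineplot below draws onto the Matplotlib figure
# housing_df is assumed to be a DataFrame with the housing dataset already loaded
plt.figure(figsize=(6, 3))
sns.lineplot(data=housing_df, x="bathrooms", y="bedrooms")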
plt.title("Bathrooms vs Bedrooms")
plt.xlabel("Bathrooms")
plt.ylabel("Bedrooms")
plt.show()
The chart shows the relationship between bathrooms and bedrooms.
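The same kind of chart can also be drawn with Matplotlib alone, using plt.plot. The sketch below does not use the housing dataset. The months and rent figures are invented purely to show a trend over time.

```python
import matplotlib.pyplot as plt

# Hypothetical values, made up for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
average_rent = [42000, 43500, 43000, 45000, 46500, 48000]

plt.figure(figsize=(6, 3))
# plt.plot connects the points with a line; marker="o" also marks each point.
lines = plt.plot(months, average_rent, marker="o")
plt.title("Average Monthly Rent Over Time")
plt.xlabel("Month")
plt.ylabel("Average rent (KES)")
plt.show()
```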

Creating a Bar Chart
Bar charts compare categories.
Example:
# Average satisfaction score by property type
avg_satisfaction_by_prop = housing_df.groupby("property_type")["satisfaction_score"].mean().sort_values(ascending = False).reset_index()
#Plot
plt.figure(figsize = (6,3))
sns.barplot(data = avg_satisfaction_by_prop, x = "property_type", y = "satisfaction_score")
plt.title("Average Satisfaction Score by Property type")
plt.xlabel("Property Type")
plt.ylabel("Average Satisfaction Score")
plt.show()
The chart compares the average satisfaction score for each property type.
Creating a Pie Chart
Pie charts represent percentages.
Example:
furnishing_counts = housing_df["furnishing"].value_counts()
# One offset per slice, so this works for any number of furnishing categories
explode = [0.05] * len(furnishing_counts)
plt.figure(figsize = (6, 6))
plt.pie(furnishing_counts, explode = explode, labels = furnishing_counts.index, autopct="%1.1f%%")
plt.title("Distribution of Furnishing Status")
plt.show()
Matplotlib is highly customizable and allows users to change colors, labels, chart sizes, and grid styles.
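Here is a small sketch of what that customization can look like on a bar chart. The property types and scores are made-up values, and the specific colors, y-axis limit and grid style are just one possible choice.

```python
import matplotlib.pyplot as plt

# Hypothetical values, made up for illustration only.
property_types = ["Apartment", "Bungalow", "Studio"]
avg_scores = [7.2, 8.1, 6.5]

# figsize controls the chart size in inches (width, height).
fig, ax = plt.subplots(figsize=(6, 3))

# color accepts one color per bar.
bars = ax.bar(property_types, avg_scores,
              color=["tab:blue", "tab:orange", "tab:green"])

# Labels and title.
ax.set_title("Average Satisfaction Score by Property Type")
ax.set_xlabel("Property Type")
ax.set_ylabel("Average Satisfaction Score")

# Fix the y-axis range and add dashed horizontal grid lines.
ax.set_ylim(0, 10)
ax.grid(axis="y", linestyle="--", alpha=0.5)

plt.show()
```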
Using Seaborn for Statistical Visualizations
Seaborn is a Python library built on top of Matplotlib. It provides more attractive and advanced statistical visualizations with less code.
Install Seaborn:
pip install seaborn
Import it:
import seaborn as sns
Seaborn works smoothly with Pandas DataFrames.
Example dataset:
import pandas as pd
data = { "Student": ["John", "Mary", "Peter", "James"], "Score": [85, 90, 78, 88] }
df = pd.DataFrame(data)
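As a quick check that Seaborn works with this DataFrame, the sketch below plots the small student dataset directly. The chart title and axis labels are my own choices for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Same small dataset as above.
data = {"Student": ["John", "Mary", "Peter", "James"], "Score": [85, 90, 78, 88]}
df = pd.DataFrame(data)

plt.figure(figsize=(6, 3))
# Seaborn takes the DataFrame plus the column names to use for each axis.
ax = sns.barplot(data=df, x="Student", y="Score")
plt.title("Score by Student")
plt.xlabel("Student")
plt.ylabel("Score")
plt.show()
```

Notice that the columns are passed by name, so there is no need to pull the values out of the DataFrame first.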
Bar Plot
# Average monthly rent by property type
plt.figure(figsize=(6, 3))
sns.barplot(data=housing_df, x = "property_type", y = "monthly_rent_kes", estimator = "mean", palette = "bright")
plt.title("Average monthly rent by property type")
plt.xlabel("Property Type")
plt.ylabel("Average monthly rent")
plt.xticks(rotation=45)
plt.show()
Seaborn automatically applies more polished default styling than Matplotlib.
Histogram
Histograms show data distribution.
Example:
# What is the distribution of monthly rent
plt.figure(figsize = (6, 3))
sns.histplot(data=housing_df, x = "monthly_rent_kes")
plt.title("Distribution of monthly rent")
plt.xlabel("Monthly rent")
plt.ylabel("Number of properties")
plt.show()
This helps reveal how monthly rent is distributed.
Scatter Plot
A scatter plot reveals relationships between two variables.
Example:
# Plotting relationship between bedrooms and bathrooms
plt.figure(figsize=(6, 3))
sns.scatterplot(data=housing_df, x="bathrooms", y="bedrooms")
plt.title("Bathrooms vs Bedrooms")
plt.xlabel("Bathrooms")
plt.ylabel("Bedrooms")
plt.show()
This chart shows the relationship between bathrooms and bedrooms.
Heatmap
Heatmaps show the strength of relationships between numerical variables.
Example:
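# Correlation analysis
# Assumption: numerical_columns was not defined above, so every numeric column is selected here
numerical_columns = housing_df.select_dtypes(include="number").columns
correlation = housing_df[numerical_columns].corr()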
plt.figure(figsize=(6,6))
sns.heatmap(correlation, annot=True, fmt=".2f")
plt.title("Correlation Heatmap for Numerical Variables")
plt.show()
They help identify strong or weak relationships in datasets.
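To make the heatmap example easy to run on its own, here is a sketch with a tiny made-up housing table. The five rows of values are invented, and the vmin, vmax and cmap settings are optional choices that fix the color scale between -1 and 1.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical housing values, made up for illustration only.
housing_sample = pd.DataFrame({
    "bedrooms": [1, 2, 3, 4, 5],
    "bathrooms": [1, 1, 2, 3, 3],
    "monthly_rent_kes": [20000, 35000, 50000, 80000, 95000],
})

# corr() computes the correlation between every pair of numeric columns.
correlation = housing_sample.corr()
print(correlation)

plt.figure(figsize=(4, 4))
# annot=True writes the value inside each cell; fmt=".2f" keeps two decimals.
ax = sns.heatmap(correlation, annot=True, fmt=".2f",
                 vmin=-1, vmax=1, cmap="coolwarm")
plt.title("Correlation Heatmap (sample data)")
plt.show()
```

Values close to 1 or -1 suggest a strong relationship, while values near 0 suggest a weak one.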
Matplotlib is suitable when detailed customization is required, while Seaborn is ideal for creating visually appealing statistical charts quickly.
In the real world, most analysts use both libraries together, which works well because Seaborn is built on top of Matplotlib.
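One common way to combine them is to let Seaborn draw the chart and then fine-tune it with Matplotlib. The sketch below assumes a small made-up table. Seaborn's plotting functions accept an ax argument and draw on that Matplotlib Axes, which can then be customized as usual.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical values, made up for illustration only.
homes = pd.DataFrame({
    "bathrooms": [1, 1, 2, 2, 3, 3],
    "bedrooms": [1, 2, 2, 3, 4, 5],
})

# Matplotlib creates the figure and the Axes.
fig, ax = plt.subplots(figsize=(6, 3))

# Seaborn draws the scatter plot on that Axes.
sns.scatterplot(data=homes, x="bathrooms", y="bedrooms", ax=ax)

# Matplotlib handles the detailed customization.
ax.set_title("Bathrooms vs Bedrooms")
ax.set_xlabel("Bathrooms")
ax.set_ylabel("Bedrooms")
ax.grid(True, linestyle=":")

plt.show()
```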
Best Practices and Common Mistakes
Best Practices:
- Always clean data before analysis
- Use appropriate charts for the data
- Add labels and titles to charts
- Keep visualizations simple and easy to read
- Check for missing or duplicate data
Common Mistakes:
- Forgetting labels or legends
- Ignoring missing values
- Using the wrong chart type
- Overcrowding charts with too much information
Clear visualizations communicate a dataset's information effectively and without confusion.
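Checking for missing values was shown earlier, so here is a matching sketch for the duplicate check mentioned in the best practices. The table is made up, with one row repeated on purpose.

```python
import pandas as pd

# Hypothetical data; the "Mary" row appears twice on purpose.
data = pd.DataFrame({
    "Student": ["John", "Mary", "Mary", "James"],
    "Score": [85, 90, 90, 88],
})

# duplicated() marks rows that repeat an earlier row; sum() counts them.
duplicate_count = data.duplicated().sum()
print(duplicate_count)

# drop_duplicates() keeps the first copy of each repeated row.
deduplicated = data.drop_duplicates()
print(deduplicated)
```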
Conclusion
To create clear and understandable visualizations, analysts can rely on Pandas, Matplotlib, and Seaborn.
Pandas simplifies data cleaning, manipulation, and analysis.
Matplotlib and Seaborn transform raw numbers into meaningful visual insights.
For beginners in data analytics or data science, learning these libraries is essential because they are widely used in industries like finance, healthcare, marketing and business intelligence.
Mastering these tools is an important step toward becoming a skilled analyst or data scientist.





