Tilda Udufo

Posted on Dec 12

How to Analyze CSV Files with Python and Pandas

#datascience #tutorial #beginners #python

CSV files (short for Comma-Separated Values) are one of the simplest and most common formats for storing data. You’ll find them everywhere, from bank statements to app logs.

At first glance, a CSV is just a plain text file with rows and columns. But when the file gets large, or when you want to ask meaningful questions like “How much did I spend last month?”, “Which category do I spend the most money on?” or “What trends can I see over time?” — spreadsheets can feel limiting.

That’s where Python and Pandas shine. Pandas is a Python library that makes it easy to load, clean, analyze, and visualize data.

In this tutorial, we’ll walk step by step through how to:

Load CSV data into Pandas
Summarize totals and averages
Group and filter data
Find key insights (like your biggest expenses)
Visualize results with charts

We’ll use a small expense-tracking dataset as our example, but the techniques you’ll learn apply to any CSV file, whether it’s sales data, student grades, or survey results.

Prerequisites

To follow along, you’ll need:

Python 3.x installed on your machine.

Note: Pandas requires Python 3. If your system still defaults to Python 2 when you type python, you may need to use python3 instead.

The Pandas and Matplotlib libraries. If you don’t have them yet, install them with:

pip install pandas matplotlib

The Sample Dataset

Let’s imagine you want to keep track of your daily spending. You log each transaction into a CSV file with four columns:

Date – when the expense happened
Category – what type of expense it was (Food, Transport, Utilities, etc.)
Description – a short note about the expense
Amount – how much it cost

We’ll use a sample file called expenses.csv as our dataset throughout this tutorial:

Date,Category,Description,Amount
2025-01-03,Food,Coffee,3.50
2025-01-03,Transport,Bus Ticket,2.75
2025-01-04,Entertainment,Movie,12.00
2025-01-05,Food,Lunch,15.20
2025-01-06,Utilities,Electricity Bill,45.00
2025-01-06,Transport,Train Ticket,6.50
2025-01-07,Food,Dinner,22.30

Save this data as expenses.csv in a new folder. We’ll soon create a Python script in that same folder to work with the file.

Loading Your CSV File into Pandas

Once you’ve installed Pandas and created your expenses.csv file, the first thing you’ll want to do is load it into a DataFrame. A DataFrame is Pandas’ main data structure; you can think of it like a spreadsheet inside Python, but one that’s easier to query and much more powerful.

Before we can load the CSV, let’s set up a Python script. In the same folder where you saved expenses.csv, create a new file called analyze_expenses.py. This will be the script we run throughout the tutorial.

Now open analyze_expenses.py in your code editor and add the following code:

import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv("expenses.csv")

# Display the first five rows
print(df.head())

Save the file, then run it from your terminal with:

python analyze_expenses.py

If everything is set up correctly, you should see the first five rows of your CSV printed in the terminal like this:

         Date       Category       Description  Amount
0  2025-01-03           Food            Coffee    3.50
1  2025-01-03      Transport        Bus Ticket    2.75
2  2025-01-04  Entertainment             Movie   12.00
3  2025-01-05           Food             Lunch   15.20
4  2025-01-06      Utilities  Electricity Bill   45.00

Summarizing Your Data with Pandas

With the CSV successfully loaded into a DataFrame, the next thing you’ll probably want to do is get a quick overview of your data. Pandas makes it easy to calculate totals, averages, and other summary statistics with just a few lines of code.

For example, you can calculate the total amount spent and the average transaction like this:

# Total amount spent
total = df["Amount"].sum()
print("Total spent:", total)

# Average transaction size
average = df["Amount"].mean()
print("Average transaction:", average)

Output:

Total spent: 107.25
Average transaction: 15.321428571428571

So across these seven transactions, you’ve spent a little over $107, and the average transaction size was about $15.

If you want a more detailed statistical summary, Pandas also provides the describe() method:

print(df.describe())

Which gives you something like this:

          Amount
count   7.000000
mean   15.321429
std    14.810859
min     2.750000
25%     5.000000
50%    12.000000
75%    18.750000
max    45.000000

This table shows you:

count: how many rows are in the column (7 transactions)
mean: the average amount
std: shows how spread out your expenses are from the average.
min and max: the smallest and largest expense
percentiles (25%, 50%, 75%) to show how the data is distributed

Breaking Down Data by Category

To dig deeper into our data beyond just totals and averages, you can group the data by category and see how much you’ve spent in each one.

Pandas makes this easy with the .groupby() method:

# Total spending by category
by_category = df.groupby("Category")["Amount"].sum()
print(by_category)

Output:

Category
Entertainment    12.00
Food             41.00
Transport         9.25
Utilities        45.00
Name: Amount, dtype: float64

This tells you that:

You spent $12 on Entertainment,
$41 on Food,
$9.25 on Transport,
and $45 on Utilities,

Grouping isn’t limited to totals — you can also calculate averages, counts, or other statistics. For example, to see the average transaction size per category:

avg_by_category = df.groupby("Category")["Amount"].mean()
print(avg_by_category)

Finding the Largest Values in Your Data

Another common question when analyzing expenses is: “What were my biggest transactions?” Pandas makes it easy to sort your data and pick out the largest values.

You can use sort_values() to order the rows by the Amount column, and then use head() to grab the top results:

# Sort by amount and show the top 5 expenses
top_expenses = df.sort_values("Amount", ascending=False).head(5)
print(top_expenses)

Output:

Date       Category       Description  Amount
4  2025-01-06      Utilities  Electricity Bill    45.0
6  2025-01-07           Food            Dinner    22.3
3  2025-01-05           Food             Lunch    15.2
2  2025-01-04  Entertainment             Movie    12.0
5  2025-01-06      Transport      Train Ticket     6.5

From this list, you can immediately see that your electricity bill was the single largest transaction, followed by dining out.

Sorting is one of the simplest but most powerful ways to surface insights. You can apply the same technique to find the smallest transactions by changing ascending=False to ascending=True.

Analyzing Trends Over Time

Totals and categories are helpful, but sometimes the most important insights come from seeing how your spending changes over time. With Pandas, you can group transactions by month (or even by day or year) once you convert the Date column into proper datetime objects.

First, let’s make sure Pandas recognizes the Date column as a date:

# Convert Date column to datetime
df["Date"] = pd.to_datetime(df["Date"])

Now you can group by month using dt.to_period:

# Total spending by month
monthly = df.groupby(df["Date"].dt.to_period("M"))["Amount"].sum()
print(monthly)

Output:

Date
2025-01    107.25
Freq: M, Name: Amount, dtype: float64

This dataset only covers a single month, but if you had multiple months of expenses, you’d see a line for each one.

You can also group by day to spot patterns in daily spending:

# Total spending by day
daily = df.groupby(df["Date"].dt.date)["Amount"].sum()
print(daily)

Visualizing Your Data with Charts

Numbers are useful, but sometimes it’s easier to spot patterns when you can actually see your data. Pandas works seamlessly with Matplotlib, a popular Python library for creating visualizations. Together, they make it easy to turn raw numbers into clear charts.

First, let’s calculate spending by category again:

# Total spending by category
by_category = df.groupby("Category")["Amount"].sum()

Now let’s plot it:

import matplotlib.pyplot as plt

# Plot total spending by category
by_category.plot(kind="bar", color="teal")

plt.title("Expenses by Category")
plt.ylabel("Total Spent ($)")
plt.xlabel("Category")
plt.show()

This produces a bar chart where each category (Food, Transport, Utilities, Entertainment) has its own bar, showing the total spent.

You can also visualize monthly trends. Since we grouped by month earlier, you can reuse the monthly variable to create a line chart:

# Plot monthly spending trend
monthly.plot(kind="line", marker="o", color="purple")

plt.title("Monthly Spending Trend")
plt.ylabel("Total Spent ($)")
plt.xlabel("Month")
plt.show()

You can immediately see which categories dominate your budget or how spending changes month to month.

Taking Your Analysis Further

So far, we’ve covered loading data, summarizing it, grouping by category, and plotting charts, but this is just the beginning of what Pandas can do. Once you’re comfortable with the basics, there are plenty of ways to extend this workflow.

Here are a few ideas to try next:

Save your cleaned data: After you’ve dropped missing values or renamed columns, you can save the results back into a new CSV file:

df.to_csv("cleaned_expenses.csv", index=False)

Calculate new insights: For example, compute the percentage of your budget that each category represents, or compare this month’s spending against last month’s.
Combine multiple CSVs: If your data comes in monthly files (e.g., january.csv, february.csv), you can merge them with pd.concat().
Try different visualizations: Pie charts, line charts, or stacked bar charts can help you see your data from new angles.

If you’d like to go further, the official Pandas documentation has detailed tutorials and guides that cover everything from advanced indexing to time series analysis. You can also explore the Matplotlib docs to learn more ways to create powerful visualizations.

Conclusion

In this tutorial, you saw how easy it is to turn a plain CSV file into meaningful insights with Python and Pandas. Starting from a simple file of expenses, you learned how to:

Load data into a Pandas DataFrame
Calculate totals and averages
Group transactions by category
Identify the largest values in your dataset
Track changes over time
Visualize results with charts using Matplotlib

What makes Pandas so powerful is its flexibility: the same few lines of code you used here can be applied to any CSV file, whether that’s your own finances, a sales report, a dataset from school, or a file you’ve downloaded online.

DEV Community

How to Analyze CSV Files with Python and Pandas

Prerequisites

The Sample Dataset

Loading Your CSV File into Pandas

Summarizing Your Data with Pandas

Breaking Down Data by Category

Finding the Largest Values in Your Data

Analyzing Trends Over Time

Visualizing Your Data with Charts

Taking Your Analysis Further

Conclusion

Top comments (0)