DEV Community

Cover image for How to Analyze CSV Files with Python and Pandas
Tilda Udufo
Tilda Udufo

Posted on

How to Analyze CSV Files with Python and Pandas

CSV files (short for Comma-Separated Values) are one of the simplest and most common formats for storing data. You’ll find them everywhere, from bank statements to app logs.

At first glance, a CSV is just a plain text file with rows and columns. But when the file gets large, or when you want to ask meaningful questions like “How much did I spend last month?”, “Which category do I spend the most money on?” or “What trends can I see over time?” — spreadsheets can feel limiting.

That’s where Python and Pandas shine. Pandas is a Python library that makes it easy to load, clean, analyze, and visualize data.

In this tutorial, we’ll walk step by step through how to:

  • Load CSV data into Pandas
  • Summarize totals and averages
  • Group and filter data
  • Find key insights (like your biggest expenses)
  • Visualize results with charts

We’ll use a small expense-tracking dataset as our example, but the techniques you’ll learn apply to any CSV file, whether it’s sales data, student grades, or survey results.

Prerequisites

To follow along, you’ll need:

  • Python 3.x installed on your machine.

Note: Pandas requires Python 3. If your system still defaults to Python 2 when you type python, you may need to use python3 instead.

  • The Pandas and Matplotlib libraries. If you don’t have them yet, install them with:
pip install pandas matplotlib
Enter fullscreen mode Exit fullscreen mode

The Sample Dataset

Let’s imagine you want to keep track of your daily spending. You log each transaction into a CSV file with four columns:

  • Date – when the expense happened
  • Category – what type of expense it was (Food, Transport, Utilities, etc.)
  • Description – a short note about the expense
  • Amount – how much it cost

We’ll use a sample file called expenses.csv as our dataset throughout this tutorial:

Date,Category,Description,Amount
2025-01-03,Food,Coffee,3.50
2025-01-03,Transport,Bus Ticket,2.75
2025-01-04,Entertainment,Movie,12.00
2025-01-05,Food,Lunch,15.20
2025-01-06,Utilities,Electricity Bill,45.00
2025-01-06,Transport,Train Ticket,6.50
2025-01-07,Food,Dinner,22.30
Enter fullscreen mode Exit fullscreen mode

Save this data as expenses.csv in a new folder. We’ll soon create a Python script in that same folder to work with the file.

Loading Your CSV File into Pandas

Once you’ve installed Pandas and created your expenses.csv file, the first thing you’ll want to do is load it into a DataFrame. A DataFrame is Pandas’ main data structure; you can think of it like a spreadsheet inside Python, but one that’s easier to query and much more powerful.

Before we can load the CSV, let’s set up a Python script. In the same folder where you saved expenses.csv, create a new file called analyze_expenses.py. This will be the script we run throughout the tutorial.

Now open analyze_expenses.py in your code editor and add the following code:

import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv("expenses.csv")

# Display the first five rows
print(df.head())
Enter fullscreen mode Exit fullscreen mode

Save the file, then run it from your terminal with:

python analyze_expenses.py
Enter fullscreen mode Exit fullscreen mode

If everything is set up correctly, you should see the first five rows of your CSV printed in the terminal like this:

         Date       Category       Description  Amount
0  2025-01-03           Food            Coffee    3.50
1  2025-01-03      Transport        Bus Ticket    2.75
2  2025-01-04  Entertainment             Movie   12.00
3  2025-01-05           Food             Lunch   15.20
4  2025-01-06      Utilities  Electricity Bill   45.00
Enter fullscreen mode Exit fullscreen mode

Summarizing Your Data with Pandas

With the CSV successfully loaded into a DataFrame, the next thing you’ll probably want to do is get a quick overview of your data. Pandas makes it easy to calculate totals, averages, and other summary statistics with just a few lines of code.

For example, you can calculate the total amount spent and the average transaction like this:

# Total amount spent
total = df["Amount"].sum()
print("Total spent:", total)

# Average transaction size
average = df["Amount"].mean()
print("Average transaction:", average)
Enter fullscreen mode Exit fullscreen mode

Output:

Total spent: 107.25
Average transaction: 15.321428571428571
Enter fullscreen mode Exit fullscreen mode

So across these seven transactions, you’ve spent a little over $107, and the average transaction size was about $15.

If you want a more detailed statistical summary, Pandas also provides the describe() method:

print(df.describe())
Enter fullscreen mode Exit fullscreen mode

Which gives you something like this:

          Amount
count   7.000000
mean   15.321429
std    14.810859
min     2.750000
25%     5.000000
50%    12.000000
75%    18.750000
max    45.000000
Enter fullscreen mode Exit fullscreen mode

This table shows you:

  • count: how many rows are in the column (7 transactions)
  • mean: the average amount
  • std: shows how spread out your expenses are from the average.
  • min and max: the smallest and largest expense
  • percentiles (25%, 50%, 75%) to show how the data is distributed

Breaking Down Data by Category

To dig deeper into our data beyond just totals and averages, you can group the data by category and see how much you’ve spent in each one.

Pandas makes this easy with the .groupby() method:

# Total spending by category
by_category = df.groupby("Category")["Amount"].sum()
print(by_category)
Enter fullscreen mode Exit fullscreen mode

Output:

Category
Entertainment    12.00
Food             41.00
Transport         9.25
Utilities        45.00
Name: Amount, dtype: float64
Enter fullscreen mode Exit fullscreen mode

This tells you that:

  • You spent $12 on Entertainment,
  • $41 on Food,
  • $9.25 on Transport,
  • and $45 on Utilities,

Grouping isn’t limited to totals — you can also calculate averages, counts, or other statistics. For example, to see the average transaction size per category:

avg_by_category = df.groupby("Category")["Amount"].mean()
print(avg_by_category)
Enter fullscreen mode Exit fullscreen mode

Finding the Largest Values in Your Data

Another common question when analyzing expenses is: “What were my biggest transactions?” Pandas makes it easy to sort your data and pick out the largest values.

You can use sort_values() to order the rows by the Amount column, and then use head() to grab the top results:

# Sort by amount and show the top 5 expenses
top_expenses = df.sort_values("Amount", ascending=False).head(5)
print(top_expenses)
Enter fullscreen mode Exit fullscreen mode

Output:

Date       Category       Description  Amount
4  2025-01-06      Utilities  Electricity Bill    45.0
6  2025-01-07           Food            Dinner    22.3
3  2025-01-05           Food             Lunch    15.2
2  2025-01-04  Entertainment             Movie    12.0
5  2025-01-06      Transport      Train Ticket     6.5
Enter fullscreen mode Exit fullscreen mode

From this list, you can immediately see that your electricity bill was the single largest transaction, followed by dining out.

Sorting is one of the simplest but most powerful ways to surface insights. You can apply the same technique to find the smallest transactions by changing ascending=False to ascending=True.

Analyzing Trends Over Time

Totals and categories are helpful, but sometimes the most important insights come from seeing how your spending changes over time. With Pandas, you can group transactions by month (or even by day or year) once you convert the Date column into proper datetime objects.

First, let’s make sure Pandas recognizes the Date column as a date:

# Convert Date column to datetime
df["Date"] = pd.to_datetime(df["Date"])
Enter fullscreen mode Exit fullscreen mode

Now you can group by month using dt.to_period:

# Total spending by month
monthly = df.groupby(df["Date"].dt.to_period("M"))["Amount"].sum()
print(monthly)
Enter fullscreen mode Exit fullscreen mode

Output:

Date
2025-01    107.25
Freq: M, Name: Amount, dtype: float64
Enter fullscreen mode Exit fullscreen mode

This dataset only covers a single month, but if you had multiple months of expenses, you’d see a line for each one.

You can also group by day to spot patterns in daily spending:

# Total spending by day
daily = df.groupby(df["Date"].dt.date)["Amount"].sum()
print(daily)
Enter fullscreen mode Exit fullscreen mode

Visualizing Your Data with Charts

Numbers are useful, but sometimes it’s easier to spot patterns when you can actually see your data. Pandas works seamlessly with Matplotlib, a popular Python library for creating visualizations. Together, they make it easy to turn raw numbers into clear charts.

First, let’s calculate spending by category again:

# Total spending by category
by_category = df.groupby("Category")["Amount"].sum()
Enter fullscreen mode Exit fullscreen mode

Now let’s plot it:

import matplotlib.pyplot as plt

# Plot total spending by category
by_category.plot(kind="bar", color="teal")

plt.title("Expenses by Category")
plt.ylabel("Total Spent ($)")
plt.xlabel("Category")
plt.show()
Enter fullscreen mode Exit fullscreen mode

This produces a bar chart where each category (Food, Transport, Utilities, Entertainment) has its own bar, showing the total spent.

A bar chart plotted with Matplotlib showing the total spent in each expense category.

You can also visualize monthly trends. Since we grouped by month earlier, you can reuse the monthly variable to create a line chart:

# Plot monthly spending trend
monthly.plot(kind="line", marker="o", color="purple")

plt.title("Monthly Spending Trend")
plt.ylabel("Total Spent ($)")
plt.xlabel("Month")
plt.show()
Enter fullscreen mode Exit fullscreen mode

You can immediately see which categories dominate your budget or how spending changes month to month.

A line chart showing the monthly spending trend.

Taking Your Analysis Further

So far, we’ve covered loading data, summarizing it, grouping by category, and plotting charts, but this is just the beginning of what Pandas can do. Once you’re comfortable with the basics, there are plenty of ways to extend this workflow.

Here are a few ideas to try next:

  • Save your cleaned data: After you’ve dropped missing values or renamed columns, you can save the results back into a new CSV file:
df.to_csv("cleaned_expenses.csv", index=False)
Enter fullscreen mode Exit fullscreen mode
  • Calculate new insights: For example, compute the percentage of your budget that each category represents, or compare this month’s spending against last month’s.
  • Combine multiple CSVs: If your data comes in monthly files (e.g., january.csv, february.csv), you can merge them with pd.concat().
  • Try different visualizations: Pie charts, line charts, or stacked bar charts can help you see your data from new angles.

If you’d like to go further, the official Pandas documentation has detailed tutorials and guides that cover everything from advanced indexing to time series analysis. You can also explore the Matplotlib docs to learn more ways to create powerful visualizations.

Conclusion

In this tutorial, you saw how easy it is to turn a plain CSV file into meaningful insights with Python and Pandas. Starting from a simple file of expenses, you learned how to:

  • Load data into a Pandas DataFrame
  • Calculate totals and averages
  • Group transactions by category
  • Identify the largest values in your dataset
  • Track changes over time
  • Visualize results with charts using Matplotlib

What makes Pandas so powerful is its flexibility: the same few lines of code you used here can be applied to any CSV file, whether that’s your own finances, a sales report, a dataset from school, or a file you’ve downloaded online.

Top comments (0)