DEV Community

Bahman Shadmehr
Bahman Shadmehr

Posted on

Analyzing Financial Data with Pandas: A Step-by-Step Guide

In the world of data analysis, Pandas stands out as a powerful tool for working with structured data. If you're interested in diving into financial data and extracting meaningful insights, this guide will walk you through the process of grouping and aggregating data using Pandas, with a practical example using historical stock data from Yahoo Finance.

Step 1: Install Necessary Libraries

Before we get started, make sure you have Pandas and yfinance installed. If not, you can install them by running:

pip install pandas yfinance
Enter fullscreen mode Exit fullscreen mode

Step 2: Import Libraries

import pandas as pd
import yfinance as yf
Enter fullscreen mode Exit fullscreen mode

Step 3: Download Financial Data

Let's start by downloading historical stock data for a specific ticker. For this example, we'll use Apple Inc. (AAPL).

ticker = "AAPL"
start_date = "2020-01-01"
end_date = "2023-01-01"

# Download data
data = yf.download(ticker, start=start_date, end=end_date)
Enter fullscreen mode Exit fullscreen mode

Step 4: Explore and Clean the Data

It's crucial to understand and clean your data before diving into analysis.

# Display the first few rows of the data
print(data.head())

# Check for missing values
print(data.isnull().sum())

# Drop rows with missing values
data = data.dropna()
Enter fullscreen mode Exit fullscreen mode

Step 5: Group Data and Perform Aggregations

Grouping by Date

# Group by date and calculate the average closing price for each day
daily_average = data.groupby(data.index)['Close'].mean()

# Display the result
print(daily_average)
Enter fullscreen mode Exit fullscreen mode

Grouping by Month

# Extract month from the Date index
data['Month'] = data.index.month

# Group by month and calculate the average closing price for each month
monthly_average = data.groupby('Month')['Close'].mean()

# Display the result
print(monthly_average)
Enter fullscreen mode Exit fullscreen mode

Grouping by Multiple Columns

# Extract month and year from the Date index
data['Month'] = data.index.month
data['Year'] = data.index.year

# Group by year and month, calculate the average closing price for each group
monthly_average_by_year = data.groupby(['Year', 'Month'])['Close'].mean()

# Display the result
print(monthly_average_by_year)
Enter fullscreen mode Exit fullscreen mode

Step 6: Advanced Aggregations

You can perform more advanced aggregations using the agg function.

# Define a dictionary with column-wise aggregation functions
agg_functions = {'Open': 'mean', 'High': 'max', 'Low': 'min', 'Close': 'mean', 'Volume': 'sum'}

# Group by month and apply the aggregation functions
monthly_aggregated = data.groupby('Month').agg(agg_functions)

# Display the result
print(monthly_aggregated)
Enter fullscreen mode Exit fullscreen mode

Conclusion

Congratulations! You've just learned how to download financial data from Yahoo Finance, clean and explore it, and perform various grouping and aggregation operations using Pandas. These skills are fundamental for anyone looking to analyze and derive insights from financial data.

Feel free to adapt and expand upon these examples based on your specific analysis goals. Happy coding!

Top comments (2)

Collapse
 
topofocus profile image
Hartmut B.

May i ask, why pandas is still used. I thought the rust-based polars is the common tool.

Collapse
 
bshadmehr profile image
Bahman Shadmehr

I think Pandas is still a widely used tool. But at the end we just need to remember that all tools like pandas and polars are tools and it's up to us to use them in a way that helps us to achieve our specific goal.