Introduction
This article focuses on building an equal-weight portfolio allocation strategy with Python. If you had $10,000 that you’d like to invest in the fifty top-performing companies in the S&P 500 index, how would you allocate capital across these stocks? In this article, you will learn how to extract value from the top S&P 500 companies by tabulating the Ticker, current trading price, 1-year % return, and calculating the number of shares to buy on the top 50 performing stocks.
Creating a new Jupyter Notebook.
Google Colab is a cloud-based Jupyter notebook that allows you to write and execute python code on the web.
Importing relevant Tickers and libraries.
Follow this link to download the S&P500 ticker symbols. This will make it easy for you to extract the data associated with each ticker.
Installing Yahoo Finance and importing libraries.
Use the following codes to import Yahoo-Finance data.
!pip install "yfinance".
When you are done, import the following libraries.
import pandas as pd
import numpy as np
import os
from datetime import datetime, timedelta
Pandas- present your data as dataframes and series, allowing you to clean, manipulate, and analyze data with in-built functionalities. Numpy is a library used for working with arrays and general mathematical functions. Importing os helps you manipulate file paths that will used further in the project. Finally, the Datetime class helps you work with dates and times and helps us manipulate dates and times in general. Timedelta, as the name implies, is used to find a duration within a time period, beginning and end.
Extracting the 1 year, 6 month, 3 month, and Monthly return on each stock in the s&p500 index.
These periods will help you extract the returns of each stock within each time frame. Comments are added to each chunk of code to explain what is happening.
import os
import pandas as pd
import yfinance as yf
from datetime import datetime
def get_first_last_trading_days(stocks_file, years):
"""
Retrieves and calculates the first and last trading days for a list of stocks over specified years.
Saves the results to CSV files and returns a dictionary containing the data.
Args:
stocks_file (str): Path to the CSV file containing stock tickers.
years (list): List of years to process.
Returns:
dict: A dictionary containing the calculated ratings for each stock.
"""
# Initialize an empty dictionary to store data
data = {}
# Read the stock tickers from the CSV file
stocks = pd.read_csv(stocks_file)['Ticker'].tolist()
# Create a directory to store the stock data if it doesn't exist
if not os.path.exists('stockss_dfs'):
os.makedirs('stockss_dfs')
def rating(df, startdate, enddate, freq):
"""
Calculates the percentage change in stock prices over a specified frequency.
Args:
df (pd.DataFrame): Stock data.
startdate (str): Start date for the analysis.
enddate (str): End date for the analysis.
freq (str): Frequency for grouping data ('Y', 'M', '3M', '6M').
Returns:
pd.Series: Percentage change in stock prices.
"""
# Define offset based on time frequency
if freq == 'Y':
offset = '366 days'
elif freq == 'M':
offset = '31 days'
elif freq == '3M':
offset = '93 days'
elif freq == '6M':
offset = '183 days'
else:
raise ValueError("Frequency not supported. Use 'Y', 'M', '3M', or '6M'.")
# Filter the dataframe and calculate the % change ratio
dff = df.loc[(df.index >= pd.Timestamp(startdate) - pd.Timedelta(offset)) & (df.index <= pd.Timestamp(enddate))]
dfy = dff.groupby(pd.Grouper(level='Date', freq=freq)).tail(1)
ratio = (dfy['Close'] / dfy['Close'].shift() - 1) * 100
return ratio
# Loop through each year
for year in years:
# Define start and end dates for the year
start_date = f"{year}-01-01"
end_date = f"{year}-12-31"
# Loop through each stock ticker
for stock in stocks:
# Define the file path to save/load the stock data
file_path = f'stockss_dfs/{stock}_{year}.csv'
# Check if the file already exists
if not os.path.exists(file_path):
try:
# Download the stock data using yfinance
df = yf.download(stock, start=start_date, end=end_date)
df.index = pd.to_datetime(df.index)
df.index = df.index.tz_localize(None)
# Proceed if the dataframe is not empty
if not df.empty:
# Calculate ratings for different frequencies
period_rating = rating(df, start_date, end_date, freq='Y')
period_rating_monthly = rating(df, start_date, end_date, freq='M')
period_rating_3months = rating(df, start_date, end_date, freq='3M')
period_rating_6months = rating(df, start_date, end_date, freq='6M')
# Store the ratings in the data dictionary
data[stock] = {
'Yearly': period_rating,
'Monthly': period_rating_monthly,
'3 Months': period_rating_3months,
'6 Months': period_rating_6months
}
# Save the results to a CSV file
df_results = pd.DataFrame({
'Yearly': period_rating,
'Monthly': period_rating_monthly,
'3 Months': period_rating_3months,
'6 Months': period_rating_6months
})
df_results.to_csv(file_path, index=True)
except Exception as e:
# Handle any errors that occur during processing
print(f"Error processing {stock} for year {year}: {e}")
continue
return data
# Get the current year and the previous year
current_year = datetime.now().year
years = [current_year - i for i in range(1, 2)]
# Retrieve the data
stocks_file = '/content/sp_500_stocks.csv'
data = get_first_last_trading_days(stocks_file, years)
When you run the code, you will get the following:
In the image above, you will observe that the 1-year column is empty. This is because we need the difference at the end of two years to get the return on a year. Alternatively, we can creatively add up the values of individual months for a 12 month cycle, and we trust that will give us the result of the 1 year result. The following code does just that for us.
import os
import pandas as pd
import yfinance as yf
def extract_sum_of_1_year_return(directory):
"""
Extracts the sum of the 'Monthly' column from each CSV file in the given directory.
Also extracts the ticker name from the file name.
Args:
directory (str): Path to the directory containing CSV files.
Returns:
list: A list of dictionaries containing the ticker name and the yearly sum.
"""
all_instruments = []
# List all the files in the directory
files = os.listdir(directory)
# Iterate over the files
for file in files:
file_path = os.path.join(directory, file)
# Check if the path is a file (not a directory)
if os.path.isfile(file_path):
# Read the CSV file into a DataFrame
df = pd.read_csv(file_path)
# Calculate the sum of the 'Monthly' column
df_sum = df['Monthly'].sum()
# Extract the ticker name from the file name
file_full_path = file_path.split('/')
real_file_path = file_full_path[3].split('_')
ticker_name = real_file_path[0]
# Append the ticker name and yearly sum to the list
all_instruments.append({
"ticker": ticker_name,
"yearly_sum": df_sum
})
return all_instruments
# Directory containing the CSV files
directory = "/content/stockss_dfs"
# Call the function to extract yearly sums and ticker names
results = extract_sum_of_1_year_return(directory)
def get_stocks(results):
"""
Fetches the current price for each stock ticker using Yahoo Finance API.
Only includes stocks where the current price is successfully retrieved.
Args:
results (list): List of dictionaries containing ticker names and yearly sums.
Returns:
list: A list of dictionaries containing ticker names, current prices, and yearly sums.
"""
# Initialize a list to hold the stock data that was successfully processed
successful_stocks = []
# Loop through the results
for stock in results:
try:
# Fetch stock information using Yahoo Finance API
api_url = yf.Ticker(stock['ticker'])
stock_instrument = api_url.info
# Get the current price of the stock
current_price = stock_instrument.get('currentPrice', None)
# Only add to successful_stocks if the current price is not None
if current_price is not None:
successful_stocks.append({
'ticker': stock['ticker'],
'current_price': current_price,
'yearly_sum': stock['yearly_sum']
})
except Exception as e:
# Skip the stock if an error occurs (e.g., invalid ticker or API issue)
continue
return successful_stocks
# Call the function to get stock data with current prices
final_stocks = get_stocks(results)
Output :
Selecting the top 50 performing stocks.
You will calculate the number of shares per stock you can buy with a certain amount in capital. First you have to select the first 50 stocks with the highest return within a one-year time frame.
# Ensure the 'yearly_sum' column is numeric
# Convert the 'yearly_sum' column to numeric, coercing any non-numeric values to NaN
final_stocks_df['yearly_sum'] = pd.to_numeric(final_stocks_df['yearly_sum'], errors='coerce')
# Drop rows with NaN values in the 'yearly_sum' column
# This removes any rows where the 'yearly_sum' value could not be converted to a number
final_stocks_df.dropna(subset=['yearly_sum'], inplace=True)
# Sort the dataframe by the 'yearly_sum' column in descending order
# This arranges the rows from highest to lowest based on the 'yearly_sum' values
final_stocks_df.sort_values('yearly_sum', ascending=False, inplace=True)
# Select the top 50 rows
# Keep only the first 50 rows after sorting
final_stocks_df = final_stocks_df[:50]
# Drop the 'level_0' column
# Remove the 'level_0' column as it is no longer needed
final_stocks_df.drop(columns=['level_0'], inplace=True)
# Display the dataframe
# Show the final processed dataframe
final_stocks_df
Output :
Calculating portfolio amount
Here, you choose an initial starting balance for your portfolio, this amount will be split in equal weights across all the stocks.
def portfolio_input():
"""
Prompts the user to enter the value of their portfolio.
Validates the input to ensure it is a number.
"""
global portfolio_size # Declare portfolio_size as a global variable
# Prompt the user to enter the portfolio value
portfolio_size = input('Enter the value of your portfolio: ')
try:
# Try to convert the input to a float
val = float(portfolio_size)
except ValueError:
# Handle the case where the input is not a number
print("That's not a number! \nPlease try again:")
# Prompt the user again for the portfolio value
portfolio_size = input('Enter the value of your portfolio: ')
val = float(portfolio_size) # Convert the new input to a float
# Call the portfolio_input function
portfolio_input()
# Print the portfolio size entered by the user
print(portfolio_size)
Output:
Calculating the number of shares to buy
Divide the portfolio size by the total number of stocks in the s&p500 index to get average amount of investable capital, then calculate the number of shares to buy by dividing the value you got by the current price the stock is trading at.
# Find the mean of the portfolio size.
position_size = float(portfolio_size) / len(final_stocks_df.index)
# Insert the result of 'Enterprise value' / 'Stock Price' into the column of 'Number of Shares to Buy'.
final_stocks_df['Number of Shares to Buy'] = np.floor(position_size / final_stocks_df['current_price']).astype(int)
final_stocks_df
Output:
Conclusion.
In this article, you learned how to allocate capital among the top 50 performing stocks in the S&P 500. You cleaned the data to drop NAN(not a number) data that would have messed with results. This article was inspired by freecode camp’s tutorial freecodecamp, but since much original thought went into writing the code, I decided to write and publish. I hope you learned a thing or two, see you next time.
Top comments (0)