
Pluri45


Building an Equal weight portfolio allocation strategy with Python

Introduction

This article focuses on building an equal-weight portfolio allocation strategy with Python. If you had $10,000 to invest in the fifty top-performing companies in the S&P 500 index, how would you allocate capital across these stocks? In this article, you will learn how to tabulate the ticker, current trading price, and 1-year % return of S&P 500 companies, and how to calculate the number of shares to buy for each of the top 50 performing stocks.

Creating a new Jupyter Notebook.

Google Colab is a cloud-based Jupyter notebook that allows you to write and execute Python code on the web.


Importing relevant Tickers and libraries.

Follow this link to download the S&P 500 ticker symbols. The code in this article expects them in a CSV file named sp_500_stocks.csv with a Ticker column, which makes it easy to extract the data associated with each ticker.
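Before downloading any prices, it can help to check that the ticker file loads cleanly. The sketch below is only an illustration: load_tickers is a hypothetical helper, and the in-memory CSV stands in for the real sp_500_stocks.csv file. It assumes the file has a Ticker column, as the code later in this article does. Yahoo Finance writes share classes with a dash (BRK-B) rather than a dot (BRK.B), so the helper converts dots to dashes.

```python
import io

import pandas as pd


def load_tickers(csv_source):
    """Read a CSV with a 'Ticker' column and return cleaned ticker symbols."""
    tickers = pd.read_csv(csv_source)["Ticker"].dropna().astype(str).str.strip()
    # Yahoo Finance uses a dash for share classes (BRK-B), not a dot (BRK.B).
    tickers = tickers.str.replace(".", "-", regex=False)
    # Drop duplicates while keeping the original order.
    return tickers.drop_duplicates().tolist()


# Hypothetical in-memory file standing in for sp_500_stocks.csv.
sample_csv = io.StringIO("Ticker\nAAPL\nBRK.B\n MSFT \nAAPL\n")
tickers = load_tickers(sample_csv)
print(tickers)
```

In the notebook, you would pass the path of the uploaded file instead of the in-memory sample.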

Installing Yahoo Finance and importing libraries.

Use the following command to install the yfinance package, which retrieves data from Yahoo Finance.

!pip install yfinance

When you are done, import the following libraries.

import pandas as pd
import numpy as np
import os
import yfinance as yf
from datetime import datetime, timedelta


Pandas presents your data as DataFrames and Series, letting you clean, manipulate, and analyze it with built-in functionality. NumPy is a library for working with arrays and general mathematical functions. The os module helps you work with the file paths used later in the project. The yfinance package installed above, referred to as yf in the code, downloads the price data from Yahoo Finance. Finally, the datetime class lets you work with dates and times, and timedelta, as the name implies, represents a duration: the difference between a beginning and an end.
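As a small illustration of datetime and timedelta, the sketch below builds the start and end date strings for the previous calendar year, in the same "YYYY-MM-DD" form the download code uses later, and shows that subtracting two datetimes yields a timedelta. The helper names previous_year_window and days_between are hypothetical and not part of the article's code.

```python
from datetime import datetime, timedelta


def previous_year_window(today):
    """Return (start, end) date strings for the calendar year before `today`."""
    year = today.year - 1
    start = datetime(year, 1, 1)
    end = datetime(year, 12, 31)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


def days_between(start, end):
    """Subtracting two datetimes gives a timedelta; return its length in days."""
    delta = end - start
    return delta.days


start_str, end_str = previous_year_window(datetime(2024, 6, 15))
span = days_between(datetime(2023, 1, 1), datetime(2023, 12, 31))

# A timedelta can also shift a date, e.g. to look 31 days further back.
lookback_start = datetime(2023, 1, 1) - timedelta(days=31)
print(start_str, end_str, span, lookback_start.date())
```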

Extracting the 1-year, 6-month, 3-month, and monthly return of each stock in the S&P 500 index.

The function below computes the return of each stock over each of these periods. Comments are added to each chunk of code to explain what is happening.


def get_first_last_trading_days(stocks_file, years):
    # Initialize an empty dictionary to store data
    data = {}

    # Read the stock tickers from the CSV file.
    stocks = pd.read_csv(stocks_file)['Ticker'].tolist()

    if not os.path.exists('stockss_dfs'):
        os.makedirs('stockss_dfs')

    def rating(df, startdate, enddate, freq):
        # Define the look-back offset based on the time frequency
        if freq == 'Y':
            offset = '366 days'
        elif freq == 'M':
            offset = '31 days'
        elif freq == '3M':
            offset = '93 days'
        elif freq == '6M':
            offset = '183 days'
        else:
            raise ValueError("Frequency not supported. Use 'Y', 'M', '3M', or '6M'.")

        # Filter the dataframe to the date window, keep the last close of each period,
        # and compute the % change between consecutive periods
        dff = df.loc[(df.index >= pd.Timestamp(startdate) - pd.Timedelta(offset)) & (df.index <= pd.Timestamp(enddate))]
        dfy = dff.groupby(pd.Grouper(level='Date', freq=freq)).tail(1)
        ratio = (dfy['Close'] / dfy['Close'].shift() - 1) * 100

        return ratio

    # For sake of scalability, we avoid hardcoding years and try to insert the specified year as a parameter.
    for year in years:
        # start and end dates for the year
        start_date = f"{year}-01-01"
        end_date = f"{year}-12-31"

        # Loop through each stock ticker
        for stock in stocks:
            # Download the data for each s&p stock and create a file for each stock if it's not already available.
            file_path = f'stockss_dfs/{stock}_{year}.csv'

            if not os.path.exists(file_path):
                try:
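                    df = yf.download(stock, start=start_date, end=end_date)
                    # Some yfinance versions return MultiIndex columns even for a
                    # single ticker; keep only the price-field level ('Close', ...)
                    if isinstance(df.columns, pd.MultiIndex):
                        df.columns = df.columns.get_level_values(0)
                    df.index = pd.to_datetime(df.index)
                    df.index = df.index.tz_localize(None)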

                    if not df.empty:
                        period_rating = rating(df, start_date, end_date, freq='Y')
                        period_rating_monthly = rating(df, start_date, end_date, freq='M')
                        period_rating_3months = rating(df, start_date, end_date, freq='3M')
                        period_rating_6months = rating(df, start_date, end_date, freq='6M')

                        # Store the ratings in the data dictionary
                        data[stock] = {
                            'Yearly': period_rating,
                            'Monthly': period_rating_monthly,
                            '3 Months': period_rating_3months,
                            '6 Months': period_rating_6months
                        }

                        # Save the results to CSV
                        df_results = pd.DataFrame({
                            'Yearly': period_rating,
                            'Monthly': period_rating_monthly,
                            '3 Months': period_rating_3months,
                            '6 Months': period_rating_6months
                        })

                        df_results.to_csv(file_path, index=True)

                except Exception as e:
                    print(f"Error processing {stock} for year {year}: {e}")
                    continue

    return data

# Build the list of years to process: here, only the previous calendar year
current_year = datetime.now().year
years = [current_year - i for i in range(1, 2)]

# Retrieve the data
stocks_file = '/content/sp_500_stocks.csv'
data = get_first_last_trading_days(stocks_file, years)

When you run the code, you will get the following:

[Image: get_first_last_trading_days]

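In the image above, you will observe that the Yearly (1-year) column is empty. A yearly return compares the closing price at the end of one year with the closing price at the end of the previous year, and only one year of data was downloaded. As a workaround, you can add up the monthly returns across the 12-month cycle and use that sum as the 1-year figure. Keep in mind that a sum of monthly percentage returns ignores compounding, so it approximates the true 1-year return rather than matching it exactly. The following code computes that sum for each stock.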


def extract_sum_of_1_year_return(directory):
    all_instruments = []
    # List all the files in the directory
    files = os.listdir(directory)
    # Iterate over the files
    for file in files:
        file_path = os.path.join(directory, file)
        if os.path.isfile(file_path):
            df = pd.read_csv(file_path)
            df_sum = df['Monthly'].sum()

            # Take the ticker from the file name: '<TICKER>_<YEAR>.csv' -> '<TICKER>'
            file_name = os.path.basename(file_path)
            ticker_name = file_name.split('_')[0]

            all_instruments.append({
                "ticker": ticker_name,
                "yearly_sum": df_sum
            })

    return all_instruments

# Directory containing the CSV files
directory = "/content/stockss_dfs"
# Call the function and print the result
results = extract_sum_of_1_year_return(directory)

# Read the stock tickers from the CSV file
# stocks = pd.read_csv('/content/sp_500_stocks.csv')['Ticker'].tolist()
def get_stocks(results):
    # Initialize a list to hold the stock data that was successfully processed
    successful_stocks = []
    # Loop through every stock returned by extract_sum_of_1_year_return
    for stock in results:
        try:
            ticker_obj = yf.Ticker(stock['ticker'])
            stock_instrument = ticker_obj.info
            current_price = stock_instrument.get('currentPrice', None)
            # Only add to successful_stocks if a current price is available
            if current_price is not None:
                successful_stocks.append({
                    'ticker': stock['ticker'],
                    'current_price': current_price,
                    'yearly_sum': stock['yearly_sum']
                })
        except Exception as e:
            continue
    return successful_stocks

final_stocks = get_stocks(results)



Output :

[Image: extract_sum_of_1_year_return]
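The yearly figure computed above is a sum of monthly percentage returns, which is not the same as the compounded return over the whole period. The sketch below illustrates the gap on a short, made-up series of month-end closing prices. The helper names summed_return and compounded_return are hypothetical, and the per-period formula mirrors the Close / Close.shift() - 1 calculation in the rating function.

```python
import pandas as pd


def summed_return(closes):
    """Sum of period-over-period % returns (the shortcut used in this article)."""
    periodic = (closes / closes.shift() - 1) * 100
    # The first value has no previous close, so it is NaN and skipped by sum().
    return periodic.sum()


def compounded_return(closes):
    """% return from the first close to the last close."""
    return (closes.iloc[-1] / closes.iloc[0] - 1) * 100


# Made-up month-end closes: +10%, then -10%, then +10%.
closes = pd.Series([100.0, 110.0, 99.0, 108.9])

summed = summed_return(closes)
compounded = compounded_return(closes)
print(f"summed: {summed:.2f}%  compounded: {compounded:.2f}%")
```

The two numbers differ (roughly 10% versus 8.9% here), and the gap tends to grow with volatility. For ranking stocks the sum may still be a workable proxy, but it should not be read as the exact 1-year return.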

Selecting the top 50 performing stocks.

You will calculate the number of shares of each stock you can buy with a given amount of capital. First, you have to select the 50 stocks with the highest return over the one-year time frame.


# Build a dataframe from the list of dictionaries returned by get_stocks
final_stocks_df = pd.DataFrame(final_stocks)

# Ensure the 'yearly_sum' column is numeric
final_stocks_df['yearly_sum'] = pd.to_numeric(final_stocks_df['yearly_sum'], errors='coerce')

# Drop rows with NaN values in 'yearly_sum'
final_stocks_df.dropna(subset=['yearly_sum'], inplace=True)

# Sort the dataframe by 'yearly_sum' in descending order
final_stocks_df.sort_values('yearly_sum', ascending=False, inplace=True)

# Select the top 50 rows; .copy() avoids a SettingWithCopyWarning later on
final_stocks_df = final_stocks_df[:50].copy()

# Drop the 'level_0' column if an earlier reset_index() created it
final_stocks_df.drop(columns=['level_0'], inplace=True, errors='ignore')

# Display the dataframe
final_stocks_df



Output :

[Image: Removelowmomentumstocks]
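The clean, sort, and slice steps above can also be written more compactly with pandas' nlargest. The sketch below is an alternative under stated assumptions, not the code used in this article: top_n_by_return is a hypothetical helper, and the toy dataframe only imitates the ticker, current_price, and yearly_sum columns.

```python
import pandas as pd


def top_n_by_return(df, n, column="yearly_sum"):
    """Return the n rows with the highest numeric value in `column`."""
    cleaned = df.copy()
    # Non-numeric entries become NaN and are dropped before ranking.
    cleaned[column] = pd.to_numeric(cleaned[column], errors="coerce")
    cleaned = cleaned.dropna(subset=[column])
    return cleaned.nlargest(n, column).reset_index(drop=True)


# Toy data with one unusable entry in 'yearly_sum'.
toy = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "current_price": [10.0, 20.0, 30.0, 40.0],
    "yearly_sum": ["12.5", "not available", "40.1", "-3.0"],
})

top2 = top_n_by_return(toy, 2)
print(top2)
```

Because the helper works on a copy and returns a new dataframe, the input is left untouched, which sidesteps the chained-assignment warnings that inplace operations on slices can trigger.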

Calculating portfolio amount

Here, you choose an initial starting balance for your portfolio. This amount will be split in equal weights across all the selected stocks.


def portfolio_input():
    global portfolio_size
    portfolio_size = input('Enter the value of your portfolio: ')
    # Keep asking until the input can be converted to a number
    while True:
        try:
            float(portfolio_size)
            break
        except ValueError:
            print("That's not a number! \nPlease try again:")
            portfolio_size = input('Enter the value of your portfolio: ')

portfolio_input()
print(portfolio_size)




Output:

[Image: Portfolio_amount]
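Validation of the portfolio value is easier to test when the parsing is separated from the input() call. The sketch below shows one possible way to do that. parse_portfolio_size is a hypothetical helper, and accepting inputs such as "$10,000" or rejecting non-positive amounts are assumptions that go beyond what the function above does.

```python
import math


def parse_portfolio_size(text):
    """Return the portfolio value as a float, or None if the text is unusable."""
    # Strip common formatting so inputs like "$10,000" are accepted.
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        return None
    # Reject NaN, infinity, zero, and negative amounts.
    if not math.isfinite(value) or value <= 0:
        return None
    return value


for raw in ["$10,000", "abc", "-5", "1e4"]:
    print(repr(raw), "->", parse_portfolio_size(raw))
```

An interactive loop could then keep calling input() until parse_portfolio_size returns something other than None.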

Calculating the number of shares to buy

Divide the portfolio size by the number of selected stocks (the top 50) to get the amount of capital to invest in each stock. Then divide that amount by the stock's current trading price and round down to get the number of whole shares to buy.


# Split the portfolio equally across the selected stocks.
position_size = float(portfolio_size) / len(final_stocks_df.index)


# Divide the position size by each stock's current price and round down
# to get the whole number of shares to buy.

final_stocks_df['Number of Shares to Buy'] = np.floor(position_size / final_stocks_df['current_price']).astype(int)

final_stocks_df




Output:

[Image: Calculating the number of shares to buy]
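To make the arithmetic concrete: with the $10,000 from the introduction and 50 stocks, each position gets $200, so a stock trading at $150 receives 1 share and $50 of that position stays uninvested. The sketch below applies the same floor-division logic to made-up prices and also reports the leftover cash. equal_weight_allocation and the tickers AAA to DDD are hypothetical.

```python
import numpy as np
import pandas as pd


def equal_weight_allocation(prices, portfolio_size):
    """Split the portfolio equally and buy whole shares only."""
    position_size = portfolio_size / len(prices)
    # Round down: fractional shares are not bought.
    shares = np.floor(position_size / prices).astype(int)
    invested = shares * prices
    return pd.DataFrame({"price": prices, "shares": shares, "invested": invested})


# Made-up prices for four hypothetical tickers.
prices = pd.Series([150.0, 40.0, 250.0, 12.5], index=["AAA", "BBB", "CCC", "DDD"])

allocation = equal_weight_allocation(prices, 1000.0)
leftover = 1000.0 - allocation["invested"].sum()
print(allocation)
print("Uninvested cash:", leftover)
```

Rounding down means the invested amounts are only approximately equal, and higher-priced stocks tend to leave more cash unused.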

Conclusion.

In this article, you learned how to allocate capital equally among the top 50 performing stocks in the S&P 500. You also cleaned the data by dropping NaN (not a number) values that would have distorted the results. This article was inspired by freeCodeCamp's tutorial (https://www.youtube.com/watch?v=xfzGZB4HhEE), but since much original thought went into writing the code, I decided to write it up and publish it. I hope you learned a thing or two. See you next time.
