Factor Investing in Python: Implementing the Fama-French Models From Scratch

#investing #finance #beginners #productivity

I first encountered factor investing through a footnote in an academic paper. Eugene Fama and Kenneth French published their three-factor model in the early 1990s, essentially arguing that stock returns are explained by three things: market exposure (beta), company size (small vs. large), and valuation (cheap vs. expensive). In 2015 they added profitability and investment factors to make it five. The insight was not that these factors exist (investors have known about small-cap and value premiums for decades) but that they are systematic, measurable, and implementable in a regression model with publicly available data.

As a developer, the appeal was immediate: if returns can be decomposed into factor exposures, I can quantify why my portfolio performed the way it did. Am I beating the market because I am good at stock picking, or because I accidentally loaded up on small-cap value stocks during a small-cap value rally? The Fama-French model answers that question with coefficients and p-values. Here is how to implement it in Python without a Bloomberg terminal.

The Data: Where to Get It (Free)

The beauty of Fama-French: the factor returns are published for free by Ken French's data library at Dartmouth. You do not need an API key or a data subscription. The raw data is available as CSV files updated monthly:

https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip

Inside the zip, the CSV contains a date column followed by four data columns: Mkt-RF (market return minus risk-free rate), SMB (small minus big), HML (high minus low book-to-market), and RF (risk-free rate). The first three are the factors; RF is what you subtract from your own returns. If you want all five factors, grab the F-F_Research_Data_5_Factors_2x3_CSV.zip file, which adds RMW (robust minus weak profitability) and CMA (conservative minus aggressive investment).

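Loading these into Python requires some cleanup: the CSV starts with a header block, follows the monthly table with an annual one, and ends with a copyright notice. Filtering on the six-digit YYYYMM index keeps only the monthly rows:

import pandas as pd

url = ("https://mba.tuck.dartmouth.edu/pages/faculty/"
       "ken.french/ftp/F-F_Research_Data_Factors_CSV.zip")
# skiprows=3 skips the descriptive header; adjust if the file layout changes
raw = pd.read_csv(url, skiprows=3, index_col=0)

# Keep only monthly rows (index like 202401); this drops the annual table
# and the copyright line
labels = raw.index.astype(str).str.strip()
monthly = labels.str.fullmatch(r"\d{6}")
ff3 = raw[monthly].astype(float)
ff3.index = pd.to_datetime(labels[monthly], format='%Y%m')
ff3 = ff3 / 100      # Convert from percentage to decimal
print(ff3.tail())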

The division by 100 is important. The raw data stores returns as percentages (e.g., 0.56 means 0.56%), and your regression will produce nonsense coefficients if you mix percentage-scale factors with decimal-scale stock returns. Always convert to decimal before modeling.
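To make the scale problem concrete, here is a small sketch on synthetic data. The returns are randomly generated rather than taken from the French library, and `ols_slope` is a helper written only for this illustration. The same one-factor regression is run twice, once with matching scales and once with the factor left in percent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly market excess returns in decimal form (made-up data)
mkt_decimal = rng.normal(0.008, 0.04, size=120)
# A hypothetical stock with a true beta of 1.2 plus noise, also in decimals
stock_decimal = 1.2 * mkt_decimal + rng.normal(0.0, 0.02, size=120)

def ols_slope(x, y):
    """Slope of a one-variable OLS regression of y on x (with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

beta_matched = ols_slope(mkt_decimal, stock_decimal)           # both in decimals
beta_mismatched = ols_slope(mkt_decimal * 100, stock_decimal)  # factor left in percent

print(f"matched scales:    beta = {beta_matched:.3f}")
print(f"mismatched scales: beta = {beta_mismatched:.5f}")
```

The mismatched slope comes out exactly 100 times smaller, so a stock with a beta near 1.2 would appear to have a beta near 0.012.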

Note: For stock-level return data to use as your dependent variable, Yahoo Finance via yfinance works for individual tickers. For portfolio-level analysis, calculate your own weighted returns from your holdings. The left-hand side of the model is an excess return, and Mkt-RF is already net of the risk-free rate, so subtract RF from your stock or portfolio return before regressing.
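For the portfolio case, a minimal sketch of the weighted-return calculation might look like the following. The tickers, weights, and returns are all hypothetical, and the sketch assumes the weights are reset to their targets every month, which real holdings will not do unless you rebalance:

```python
import pandas as pd

# Hypothetical monthly returns for three holdings, in decimals (made-up numbers)
returns = pd.DataFrame(
    {"AAA": [0.02, -0.01, 0.03], "BBB": [0.01, 0.00, -0.02], "CCC": [0.04, 0.02, 0.01]},
    index=pd.period_range("2023-01", periods=3, freq="M"),
)
# Hypothetical portfolio weights; they should sum to 1
weights = pd.Series({"AAA": 0.5, "BBB": 0.3, "CCC": 0.2})
# Hypothetical monthly risk-free rate, on the same decimal scale as the returns
rf = pd.Series([0.003, 0.003, 0.004], index=returns.index)

# Weighted portfolio return per month, assuming weights are reset each month
portfolio_ret = returns.mul(weights, axis=1).sum(axis=1)
# Excess return: the dependent variable for the factor regression
excess_ret = portfolio_ret - rf
print(excess_ret)
```

In practice the `rf` series would be the RF column from the French file, aligned to the same months as your returns.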

Running the 3-Factor Regression

The statistical core is an ordinary least squares regression using statsmodels. If you have built linear models before, the equation will look familiar:

R_it - R_ft = α_i + β_i(R_mt - R_ft) + s_i(SMB_t) + h_i(HML_t) + ε_it

Implementing this in Python:

import statsmodels.api as sm
import yfinance as yf

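# Get stock returns (dependent variable)
# auto_adjust=True folds dividends and splits into 'Close'; recent yfinance
# releases no longer return an 'Adj Close' column by default
stock = yf.download('AAPL', start='2018-01-01', progress=False, auto_adjust=True)
daily = stock['Close'].squeeze().pct_change()  # squeeze: 1-column frame -> Series

# Compound daily returns into monthly returns, keyed by calendar month
stock_returns = daily.groupby(daily.index.to_period('M')).agg(
    lambda x: (1 + x).prod() - 1
)

# Align data by month (the French factors are already monthly)
ff3_monthly = ff3.copy()
ff3_monthly.index = ff3_monthly.index.to_period('M')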

# Merge and compute excess returns
data = pd.concat([stock_returns, ff3_monthly], axis=1).dropna()
data.columns = ['stock_ret', 'mkt_rf', 'smb', 'hml', 'rf']
data['excess_ret'] = data['stock_ret'] - data['rf']

# Regression
X = sm.add_constant(data[['mkt_rf', 'smb', 'hml']])
y = data['excess_ret']
model = sm.OLS(y, X).fit()
print(model.summary())

The output you care about: α (alpha) is the intercept coefficient, shown as the const row in the summary. Because the inputs are monthly, it is a monthly figure. A positive, statistically significant alpha means the stock generated returns beyond what its factor exposures would predict. A negative alpha means it underperformed that prediction. The mkt_rf coefficient is your market beta: if it is 1.2, the stock's excess return moves about 1.2% for every 1% move in the market's excess return, on average. The smb and hml loadings tell you the stock's size and value tilt.

Most individual stocks will show low R-squared values (0.2 to 0.4 is normal). Factor models are designed for diversified portfolios, not individual securities. The alpha on a single stock is mostly noise; the factor loadings are the useful signal.

Extending to the 5-Factor Model

Adding profitability and investment factors is a mechanical extension — add two more columns to the regression:

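ff5_raw = pd.read_csv(
    "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/"
    "ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip",
    skiprows=3, index_col=0  # adjust skiprows if the header layout changes
)
# Keep only monthly (YYYYMM) rows, then convert to decimal
labels5 = ff5_raw.index.astype(str).str.strip()
monthly5 = labels5.str.fullmatch(r"\d{6}")
ff5 = ff5_raw[monthly5].astype(float) / 100
ff5.index = pd.to_datetime(labels5[monthly5], format='%Y%m').to_period('M')

# Rebuild the merged frame so the rmw and cma columns exist
data5 = pd.concat([stock_returns, ff5], axis=1).dropna()
data5.columns = ['stock_ret', 'mkt_rf', 'smb', 'hml', 'rmw', 'cma', 'rf']
y5 = data5['stock_ret'] - data5['rf']

# Regression with 5 factors
X5 = sm.add_constant(data5[['mkt_rf', 'smb', 'hml', 'rmw', 'cma']])
model5 = sm.OLS(y5, X5).fit()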

The practical question is whether five factors meaningfully improve on three. For US large-cap stocks, the answer is usually no: the incremental R-squared from adding RMW and CMA tends to be small. For small-cap portfolios and international equities, the five-factor model can explain noticeably more variance. The right model depends on your portfolio composition. Run both and compare the adjusted R-squared and AIC.

What Factor Loadings Actually Tell You

The coefficients are not abstract statistics — they describe your portfolio's behavior in language that maps directly to market regimes:

High market beta (>1.1): Your portfolio will outperform in bull markets and get wrecked in corrections. This is the dominant source of returns for most portfolios. If your alpha disappears when you add market beta to the model, you are not generating alpha — you are using leverage.
Positive SMB loading: You own smaller companies. Historically, small caps outperform large caps over long periods, but they also suffer deeper drawdowns in recessions. A portfolio with SMB > 0.5 is making a size bet whether you intended to or not.
Positive HML loading: Value tilt. You own stocks with low price-to-book ratios. This factor has underperformed since roughly 2007, a 15-year drought that led some to declare value investing dead. Over the long run it has also been one of the larger premiums among the classic factors. Whether you believe the value premium still exists is a philosophical question; the HML loading tells you whether your portfolio is exposed to it.

Warning: Factor loadings are backward-looking. A portfolio's historical SMB loading does not predict future small-cap exposure; it describes what already happened. If you changed your strategy six months ago, the regression on five years of data will obscure the regime shift. Re-run the model on rolling 36-month windows to detect changes in factor exposures over time.

The Factor Zoo Problem

One thing the academic literature does not tell you when you first read Fama-French: there are now over 400 published factors in the finance literature. Researchers have documented premiums for everything from share issuance to brand visibility to air quality. Most of these are statistical artifacts — data-mined patterns that disappear out of sample. This is the "factor zoo" critique.

For a developer building an investment process, the practical takeaway is to stick with the factors that have both a statistical premium and a behavioral or risk-based explanation. Market beta (risk), size (liquidity risk), value (behavioral overreaction), profitability (quality persistence), and momentum (underreaction to news) survive replication tests. The 12-month industry-adjusted gross profitability factor published in a 2016 working paper probably does not.

Start with Fama-French 3-factor on your portfolio. If the alpha is near zero and the R-squared is above 0.8, your returns are adequately explained by market, size, and value exposures. You are not a stock-picking genius — you are a factor allocator, which is a perfectly respectable thing to be and substantially easier to maintain than an edge in individual security selection.

Originally published at pickuma.com.