Berkan Sesen

Posted on • Originally published at sesen.ai

Changepoint Detection: Finding Regime Shifts in Financial Data

Markets do not stay in one regime. The S&P 500 can cruise at 10% annualised volatility for months, then a crisis hits and volatility doubles overnight. A model calibrated on the calm period can be badly miscalibrated in the turbulent one, and you often do not realise the regime changed until the losses tell you.

The question isn't whether markets change regimes. They obviously do. The question is: can you detect when the shift happened, automatically and after the fact, so you can segment your data into homogeneous periods for backtesting, risk management, or model selection? That's what changepoint detection does. It finds the points in a time series where the statistical properties (mean, variance, or both) shift abruptly.

By the end of this post, you'll detect volatility regime shifts in 40 years of S&P 500 data using the PELT algorithm, compare it with Binary Segmentation, understand how the penalty parameter controls the trade-off between sensitivity and false positives, and see how detected changepoints align with real financial crises.

The Data: 40 Years of S&P 500 Returns

We use daily S&P 500 data from January 1985 to May 2025: over 10,000 trading days covering Black Monday, the dot-com bubble, the Global Financial Crisis, COVID, and the 2022 rate hike sell-off. This is the same index used in the original R analysis we're translating, extended to cover the full modern era.

S&P 500 price chart with colour-coded volatility regimes detected by PELT, showing green for calm periods and red for turbulent ones

The colour-coded background shows volatility regimes detected by PELT: green for calm periods, red/yellow for turbulent ones. The algorithm picks up every major crisis without being told when they occurred.

Quick Win: Detecting Regime Shifts

import numpy as np
import pandas as pd
import yfinance as yf
import ruptures as rpt

# Download 40 years of S&P 500 data
sp = yf.download("^GSPC", start="1984-12-25", end="2025-06-01",
                 auto_adjust=True, progress=False)
sp = sp.loc["1985-01-02":]
close = sp["Close"].squeeze()

# Compute log returns
log_ret = np.log(close / close.shift(1)).dropna()
signal = log_ret.values.reshape(-1)
print(f"{len(signal)} trading days, {log_ret.index[0].date()} to {log_ret.index[-1].date()}")
# Output: 10181 trading days, 1985-01-03 to 2025-05-30

Now the key part: fit PELT and detect changepoints.

# PELT with normal model (detects mean + variance shifts)
algo = rpt.Pelt(model="normal", min_size=10, jump=5).fit(signal)
breakpoints = algo.predict(pen=20)
n_changepoints = len(breakpoints) - 1
print(f"Detected {n_changepoints} changepoints")
# Output: Detected 45 changepoints

PELT found 45 points where the statistical character of daily returns shifted. Each one marks the boundary between two volatility regimes.

What Just Happened?

The Changepoint Problem

A changepoint is a point in time where the data-generating process changes. Before the changepoint, the data follows one distribution (say, low-volatility returns). After it, the data follows a different distribution (high-volatility returns). The goal is to find all such points, given only the data.

Formally, we're partitioning the time series $y_1, \ldots, y_n$ into segments $[t_{k-1}+1, \ldots, t_k]$ such that the data within each segment is statistically homogeneous, but adjacent segments differ.

The Cost Function Approach

Changepoint detection frames this as an optimisation problem. Define a cost function $C(y_{a:b})$ that measures how well a single distribution fits the data from time $a$ to time $b$. For the normal model, this is twice the negative Gaussian log-likelihood evaluated at the fitted mean and variance, up to additive constants that do not affect the segmentation:

$$C(y_{a:b}) = (b - a + 1)\,\log \hat{\sigma}^2_{a:b}$$

where $\hat{\sigma}^2_{a:b}$ is the sample variance of the segment. Low-variance segments have low cost; segments that mix two regimes have high cost because the combined variance is inflated.
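To make the "mixed segments cost more" point concrete, here is a minimal pure-Python sketch of a Gaussian segment cost (segment length times the log of the sample variance). The function name `normal_cost` and the toy return series are illustrative assumptions, not code from the original analysis.

```python
import math

def normal_cost(segment):
    """Gaussian segment cost: length * log(sample variance), constants dropped."""
    n = len(segment)
    mean = sum(segment) / n
    var = sum((x - mean) ** 2 for x in segment) / n  # maximum-likelihood variance
    return n * math.log(max(var, 1e-12))  # floor guards against log(0)

# Hypothetical returns: a calm stretch followed by a turbulent one
calm = [0.01 * (-1) ** i for i in range(50)]       # standard deviation 0.01
turbulent = [0.03 * (-1) ** i for i in range(50)]  # standard deviation 0.03

cost_split = normal_cost(calm) + normal_cost(turbulent)
cost_merged = normal_cost(calm + turbulent)
print(f"two segments: {cost_split:.1f}, one merged segment: {cost_merged:.1f}")
```

Fitting one variance to both stretches gives a higher cost than fitting each stretch separately, which is the signal a changepoint algorithm exploits.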

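The total cost for a segmentation with $K$ changepoints $t_1 < \cdots < t_K$ (taking $t_0 = 0$ and $t_{K+1} = n$) is:

$$\sum_{k=1}^{K+1} C(y_{t_{k-1}+1:t_k}) + \beta K$$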

where $\beta$ is the penalty per changepoint. Without the penalty, the optimal solution would put a changepoint at every observation. The penalty forces the algorithm to justify each additional changepoint by a sufficient reduction in cost.
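The role of the penalty can be checked numerically. The sketch below evaluates the penalised objective for a few candidate segmentations of a toy series; `penalised_cost`, `normal_cost`, and the data are hypothetical names and values chosen for illustration.

```python
import math

def normal_cost(segment):
    """Gaussian segment cost: length * log(sample variance), constants dropped."""
    n = len(segment)
    mean = sum(segment) / n
    var = sum((x - mean) ** 2 for x in segment) / n
    return n * math.log(max(var, 1e-12))

def penalised_cost(signal, changepoints, beta):
    """Sum of segment costs plus beta per changepoint.

    `changepoints` lists the end index (exclusive) of every segment except the last.
    """
    bounds = [0] + sorted(changepoints) + [len(signal)]
    fit = sum(normal_cost(signal[a:b]) for a, b in zip(bounds[:-1], bounds[1:]))
    return fit + beta * len(changepoints)

# Hypothetical series: volatility triples at index 100
pattern = [0.01, -0.01, 0.005, -0.005]
calm = [pattern[i % 4] for i in range(100)]
turbulent = [3 * x for x in calm]
signal = calm + turbulent

for cps in ([], [100], [50, 100]):
    print(cps, round(penalised_cost(signal, cps, beta=20), 2))
```

With `beta=20` the real boundary at 100 pays for itself, while the extra boundary at 50 does not. With `beta=0` the extra boundary always looks at least as good, which is why an unpenalised search over-segments.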

Reading the Output

The predict() call returns a list of indices where the data shifts. Each index marks the end of a segment. The last element is always $n$ (the end of the series). So 45 changepoints means 46 segments, each with its own statistical character.
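As a small illustration of that convention, the helper below turns a breakpoint list of this shape into half-open `(start, end)` index pairs. The function name `to_segments` and the example breakpoints are made up for the sketch.

```python
def to_segments(breakpoints):
    """Turn segment end indices (last one equal to n) into (start, end) pairs."""
    starts = [0] + breakpoints[:-1]
    return list(zip(starts, breakpoints))

# Hypothetical result for a series of length 400 with two changepoints
breakpoints = [100, 250, 400]
segments = to_segments(breakpoints)
n_changepoints = len(breakpoints) - 1

print(segments)        # [(0, 100), (100, 250), (250, 400)]
print(n_changepoints)  # 2
```

Each pair can be used directly as a slice, for example `signal[start:end]`, to compute per-regime statistics.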

Daily log returns colour-coded by volatility regime, with high-volatility periods in red and calm periods in green

The regime colouring makes the structure visible at a glance. The 2008 financial crisis, COVID crash, and Black Monday are obvious high-volatility regimes (red). The long calm periods of the mid-1990s and 2013-2017 are green. The algorithm finds this structure purely from the data, with no knowledge of financial history.

Why Not Just Use a Rolling Window?

A rolling standard deviation can show volatility trends, but it has two problems. First, it smears the boundary: the window gradually incorporates new-regime data, so the transition is blurred over the window length. Second, you have to choose the window size, which is itself a regime-dependent parameter. Changepoint detection gives you sharp boundaries and determines the regime structure from the data itself.
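The smearing effect is easy to reproduce on a toy series. In this sketch the volatility triples at index 100 and a 20-observation rolling standard deviation is evaluated around the break; the data, the window length, and the helper `std_ending_at` are assumptions for illustration.

```python
import statistics

# Hypothetical series: volatility triples at index 100
calm = [0.01 * (-1) ** i for i in range(100)]
turbulent = [0.03 * (-1) ** i for i in range(100)]
signal = calm + turbulent

window = 20
# rolling_std[k] is the standard deviation of signal[k : k + window]
rolling_std = [
    statistics.pstdev(signal[t - window:t]) for t in range(window, len(signal) + 1)
]

def std_ending_at(t):
    """Rolling std of the window that ends just before index t."""
    return rolling_std[t - window]

print(std_ending_at(100))  # last window entirely in the calm regime
print(std_ending_at(110))  # window straddles the break
print(std_ending_at(120))  # first window entirely in the turbulent regime
```

The estimate only reaches the new level once the whole window has moved past the break, so the transition is spread over `window` observations even though the underlying shift is instantaneous.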

Going Deeper

PELT: Linear-Time Exact Changepoint Detection

The classical exact approach, dynamic programming over every possible segmentation (segment neighbourhood search), costs $O(Qn^2)$, where $Q$ is the maximum number of changepoints considered and $n$ is the series length. For 10,000 data points, that quadratic term becomes slow.

Killick, Fearnhead, and Eckley (2012) introduced PELT (Pruned Exact Linear Time), which returns the same exact optimum at an expected cost that is linear in $n$ under certain conditions. The trick is a pruning rule: at each step, PELT checks whether a candidate changepoint can ever be part of the optimal solution. If not, it's discarded permanently. When changepoints are spread throughout the series, so that their number grows roughly in proportion to $n$, most candidates are pruned early, giving linear runtime. When little pruning is possible, such as a long series with very few changepoints, the cost can approach $O(n^2)$.

# PELT is the default and best choice for most problems
algo = rpt.Pelt(model="normal", min_size=10, jump=5).fit(signal)
breakpoints = algo.predict(pen=20)

The jump=5 parameter means PELT considers only every 5th index as a candidate changepoint. Detected boundaries are therefore restricted to that grid, which trades a small amount of precision for roughly a fivefold reduction in candidates. For 10,000+ points, this keeps the runtime practical.
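A quick count shows what the grid buys. The sketch assumes that candidates are the multiples of `jump` that are at least `min_size`; that is my reading of how the subsampling behaves, so treat the exact rule as an assumption rather than a statement about the library internals.

```python
n, min_size, jump = 10181, 10, 5

# Every index that could end a first segment of at least min_size points
all_candidates = list(range(min_size, n))

# Assumed grid: multiples of `jump` that are at least `min_size`
grid_candidates = [t for t in range(0, n, jump) if t >= min_size]

# Distance from each index to its nearest grid point, worst case
worst_offset = max(min(t % jump, jump - t % jump) for t in all_candidates)

print(len(all_candidates), len(grid_candidates), worst_offset)
```

Under this assumption the grid is never more than two observations away from any index, which is why the loss of precision is small relative to regimes that last months.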

Binary Segmentation: The Fast Approximation

Binary Segmentation (Scott and Knott, 1974) takes a different approach: find the single best changepoint, split the data there, then repeat the search within each resulting segment. It's $O(n \log n)$ but only approximate: it can miss changepoints that are only visible when nearby changepoints are already accounted for.

algo_bs = rpt.Binseg(model="normal", min_size=10, jump=5).fit(signal)
bkps_bs = algo_bs.predict(n_bkps=45)  # fix the number of changepoints

Comparison of PELT and Binary Segmentation on the same data, both with 45 changepoints, showing similar but not identical regime boundaries

With the same number of changepoints, PELT and Binary Segmentation produce similar but not identical segmentations. The key differences show up in transition periods (like 2001-2003), where PELT returns the boundaries that minimise the penalised cost while BinSeg, being greedy, may place them a few weeks away.
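The greedy procedure behind Binary Segmentation can be sketched in a few lines of pure Python. This is not the ruptures implementation: `make_cost`, `binary_segmentation`, and the three-regime toy series are illustrative assumptions, and the cost is the Gaussian one (length times log variance).

```python
import math

def make_cost(signal):
    """Return cost(a, b) = (b - a) * log(variance of signal[a:b]) via prefix sums."""
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        n = b - a
        mean = (s1[b] - s1[a]) / n
        var = (s2[b] - s2[a]) / n - mean * mean
        return n * math.log(max(var, 1e-12))  # floor guards against log(0)

    return cost

def binary_segmentation(signal, n_bkps, min_size=10):
    """Greedily add the split with the largest cost reduction, n_bkps times."""
    cost = make_cost(signal)
    segments = [(0, len(signal))]
    for _ in range(n_bkps):
        best = None  # (gain, segment, split index)
        for a, b in segments:
            for t in range(a + min_size, b - min_size + 1):
                gain = cost(a, b) - cost(a, t) - cost(t, b)
                if best is None or gain > best[0]:
                    best = (gain, (a, b), t)
        if best is None:  # no segment is long enough to split
            break
        _, (a, b), t = best
        segments.remove((a, b))
        segments += [(a, t), (t, b)]
    return sorted(end for _, end in segments)  # segment ends, last one is n

# Hypothetical series: calm, turbulent (3x volatility), calm again
pattern = [0.01, -0.01, 0.005, -0.005]
calm = [pattern[i % 4] for i in range(60)]
signal = calm + [3 * x for x in calm] + calm

print(binary_segmentation(signal, n_bkps=2))  # [60, 120, 180]
```

Each split is chosen without revisiting earlier ones, which is what makes the method fast and also what makes it approximate.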

The Penalty: Balancing Sensitivity and Specificity

The penalty parameter $\beta$ is the single most important choice. Too low, and you detect hundreds of spurious changepoints (noise). Too high, and you miss real regime shifts.

Penalty sensitivity curve showing how the number of changepoints drops from over 700 to near zero as the penalty increases from 1 to 1000 on a log scale

At $\beta = 1$, PELT finds over 700 changepoints (on average fewer than 15 trading days apart, so a new "regime" every two to three weeks). At $\beta = 200$, it finds only 8. Our choice of $\beta = 20$ gives 45 changepoints, which aligns well with known financial events.
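The same qualitative behaviour shows up on a toy problem. The sketch below solves the penalised problem exactly with plain $O(n^2)$ dynamic programming (no pruning) and counts changepoints at three penalties; `optimal_partition`, `make_cost`, and the data are assumptions made for illustration, not the code behind the chart.

```python
import math

def make_cost(signal):
    """Return cost(a, b) = (b - a) * log(variance of signal[a:b]) via prefix sums."""
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        n = b - a
        mean = (s1[b] - s1[a]) / n
        var = (s2[b] - s2[a]) / n - mean * mean
        return n * math.log(max(var, 1e-12))

    return cost

def optimal_partition(signal, beta, min_size=10):
    """Exact minimiser of segment costs plus beta per changepoint."""
    n = len(signal)
    cost = make_cost(signal)
    best = [math.inf] * (n + 1)  # best[t]: optimal penalised cost of signal[:t]
    best[0] = -beta              # cancels the penalty added for the first segment
    last = [0] * (n + 1)         # last[t]: start of the final segment in that optimum
    for t in range(min_size, n + 1):
        for s in range(0, t - min_size + 1):
            if best[s] == math.inf:
                continue  # signal[:s] cannot be segmented with this min_size
            total = best[s] + cost(s, t) + beta
            if total < best[t]:
                best[t], last[t] = total, s
    bkps, t = [], n
    while t > 0:  # walk back through the stored segment starts
        bkps.append(t)
        t = last[t]
    return sorted(bkps)

# Hypothetical series: calm, turbulent (3x volatility), calm again
pattern = [0.01, -0.01, 0.005, -0.005]
calm = [pattern[i % 4] for i in range(60)]
signal = calm + [3 * x for x in calm] + calm

counts = {beta: len(optimal_partition(signal, beta)) - 1 for beta in (0.01, 20, 1000)}
print(counts)
```

A near-zero penalty accepts splits that barely improve the fit, a moderate one recovers the two real boundaries, and a very large one returns no changepoints at all.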

Common penalty selection methods:

| Method | Formula | When to use |
|---|---|---|
| BIC | $\beta = p \cdot \log n$ | Default for most problems; $p$ is the number of parameters per segment |
| AIC | $\beta = 2p$ | More liberal (more changepoints) |
| Manual | Domain-specific | When you know roughly how many regimes to expect |

For 10,181 data points with a normal model ($p = 2$: mean and variance), BIC gives $\beta = 2 \times \ln(10181) \approx 18.5$, which is close to our $\beta = 20$.
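The arithmetic is short enough to check directly. The variable names below are illustrative.

```python
import math

n = 10181  # number of daily returns
p = 2      # parameters per segment in the normal model: mean and variance

bic_penalty = p * math.log(n)  # natural logarithm
aic_penalty = 2 * p

print(f"BIC penalty: {bic_penalty:.2f}")  # BIC penalty: 18.46
print(f"AIC penalty: {aic_penalty}")      # AIC penalty: 4
```

The AIC value is far smaller, which is why it tends to produce many more changepoints on a series of this length.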

How Long Do Regimes Last?

The distribution of regime durations tells us how persistent volatility regimes are:

Histogram of regime durations showing a right-skewed distribution with median 152 days and mean 221 days

The median regime lasts about 152 trading days (~7 months), but the distribution is heavily right-skewed: a few calm regimes persist for years (the longest is 1,000 trading days, roughly 4 years), while crisis regimes tend to be short and sharp. The mean (221 days) exceeds the median because of these long calm tails.
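Duration statistics like these follow directly from the breakpoint list. The sketch below uses a made-up list of breakpoints (not the detected ones) and a hypothetical helper `regime_durations`.

```python
import statistics

def regime_durations(breakpoints):
    """Length of each segment, given segment end indices (last one equal to n)."""
    starts = [0] + breakpoints[:-1]
    return [end - start for start, end in zip(starts, breakpoints)]

# Hypothetical breakpoints: short turbulent spells and one long calm stretch
breakpoints = [120, 180, 600, 650, 1650]
durations = regime_durations(breakpoints)

median_days = statistics.median(durations)
mean_days = statistics.mean(durations)

print(durations)               # [120, 60, 420, 50, 1000]
print(median_days, mean_days)  # 120 330
```

The mean is always the series length divided by the number of segments, which matches the figures quoted above: 10,181 observations over 46 segments is about 221 days.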

Changepoints Aligned with Financial Events

The real test of a changepoint method is whether it finds meaningful structure. Here's PELT with a slightly higher penalty ($\beta = 50$, 23 changepoints) overlaid with known financial events:

Dual-panel chart showing S&P 500 price with changepoint lines and annotated financial events in the top panel, and colour-coded absolute returns in the bottom panel

The algorithm detects Black Monday (7 days early), the Lehman collapse (11 days early), the COVID crash (20 days early), and the 2011 US downgrade (8 days early). It "detects" these events early because the volatility regime often shifts before the headline event. Markets can become turbulent in the days before the crash itself, plausibly as informed participants reposition.

The Exponential Distribution Connection

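The original R code used squared returns (a proxy for variance) and treated them as exponentially distributed. This makes statistical sense as an approximation: if log returns are normally distributed with mean zero and variance $\sigma^2$, then squared returns follow $\sigma^2$ times a chi-squared distribution with one degree of freedom. That is a gamma distribution with shape 1/2 rather than an exact exponential (which corresponds to two degrees of freedom), but both densities decrease monotonically and have a mean set by the variance, so the exponential serves as a convenient working model.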
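A short simulation shows both why the exponential is usable and where it is only approximate. The volatility level, sample size, and seed below are arbitrary assumptions.

```python
import random
import statistics

random.seed(42)
sigma = 0.01  # hypothetical daily volatility
squared = [random.gauss(0.0, sigma) ** 2 for _ in range(200_000)]

mean_sq = statistics.fmean(squared)
var_sq = statistics.fmean((x - mean_sq) ** 2 for x in squared)

# Maximum-likelihood rate of an exponential fitted to the squared returns
rate = 1.0 / mean_sq

# Variance divided by squared mean: 1 for an exponential, 2 for scaled chi-squared(1)
dispersion = var_sq / mean_sq ** 2

print(f"fitted rate: {rate:.0f} (1 / sigma^2 = {1 / sigma ** 2:.0f})")
print(f"dispersion: {dispersion:.2f}")
```

The fitted rate tracks $1/\sigma^2$, so a shift in variance appears as a shift in rate. The dispersion near 2 rather than 1 is the price of the approximation: squared normal returns have a heavier tail than a true exponential.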

Histogram of squared daily returns overlaid with an exponential distribution fit, showing the characteristic L-shape

The exponential fit captures the overall shape. The histogram is dominated by near-zero values (calm days), with a long right tail of extreme moves (crises). Changepoints in the rate parameter of this exponential correspond to shifts between low-variance and high-variance regimes.

Changepoints vs HMMs: Hard Boundaries vs Soft Probabilities

Our HMM post tackled regime detection from a different angle: hidden Markov models assign each time step a probability of being in each regime, with smooth transitions. Changepoint detection gives hard boundaries with no transition period.

| Feature | Changepoint (PELT) | HMM |
|---|---|---|
| Boundaries | Hard (exact day) | Soft (probability of each regime) |
| Regime count | Penalty-controlled | Fixed by model specification |
| Stationarity | Assumes segments are stationary | Allows within-regime dynamics |
| Speed | $O(n)$ with PELT | $O(nK^2)$ where $K$ = states |
| Best for | Post-hoc analysis, segmentation | Online filtering, forecasting |

Use PELT when you want to segment historical data into clean periods (for backtesting, risk measurement, or training separate models per regime). Use HMMs when you need real-time regime probabilities or when transitions are gradual.

Hyperparameter Choices

| Parameter | Value | Why |
|---|---|---|
| Data | S&P 500, 1985-2025 | Original R code index; 40 years covering major crises |
| Returns | Log returns | Standard in academic finance; additive over time, symmetric |
| Model | normal | Detects changes in both mean and variance |
| Penalty | 20 | Close to BIC for n=10181; gives 45 changepoints aligned with events |
| min_size | 10 | Minimum of 10 trading days (about 2 weeks) between changepoints; avoids spurious detections |
| jump | 5 | Checks every 5th candidate; roughly 5x fewer candidates with little precision loss |

Where This Comes From

Killick, Fearnhead, and Eckley (2012): The PELT Paper

Rebecca Killick, Paul Fearnhead, and Idris Eckley introduced PELT in their 2012 paper "Optimal Detection of Changepoints with a Linear Computational Cost", published in the Journal of the American Statistical Association.

The key insight is a pruning inequality. At time $t$, if adding a changepoint at some earlier time $s$ can never improve on the best segmentation up to $t$, then $s$ can be permanently removed from consideration. Formally, if:

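$$F(s) + C(y_{s+1:t}) \geq F(t)$$

where $F(t)$ is the optimal cost up to time $t$, then $s$ can never be an optimal last changepoint for any future time $t' > t$. This relies on the cost function having the property that splitting a segment in two never increases the total cost, which holds for negative log-likelihood costs. If, in addition, changepoints occur regularly throughout the series, the expected number of candidates that survive pruning at each step stays bounded, so the total work grows linearly with $n$.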
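The pruning idea can be sketched in pure Python. This is an illustrative implementation, not the ruptures code: the names `pelt`, `make_cost`, and `dead_from` are hypothetical, the Gaussian cost is assumed, and because a minimum segment length is enforced, a pruned candidate is only dropped `min_size` steps later so that the result stays exact.

```python
import math

def make_cost(signal):
    """Return cost(a, b) = (b - a) * log(variance of signal[a:b]) via prefix sums."""
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        n = b - a
        mean = (s1[b] - s1[a]) / n
        var = (s2[b] - s2[a]) / n - mean * mean
        return n * math.log(max(var, 1e-12))

    return cost

def pelt(signal, beta, min_size=10, prune=True):
    """Penalised optimal segmentation; returns (segment ends, cost evaluations)."""
    n = len(signal)
    cost = make_cost(signal)
    F = [math.inf] * (n + 1)  # F[t]: optimal penalised cost of signal[:t]
    F[0] = -beta              # cancels the penalty added for the first segment
    last = [0] * (n + 1)      # last[t]: start of the final segment in that optimum
    candidates, dead_from, n_evals = [], {}, 0
    for t in range(min_size, n + 1):
        candidates.append(t - min_size)  # newest admissible last changepoint
        candidates = [s for s in candidates
                      if F[s] < math.inf and dead_from.get(s, n + 1) > t]
        scores = [F[s] + cost(s, t) + beta for s in candidates]
        n_evals += len(scores)
        F[t] = min(scores)
        last[t] = candidates[scores.index(F[t])]
        if prune:
            for s, score in zip(candidates, scores):
                # F[s] + C(s, t) > F[t]: s cannot be optimal once t is admissible
                if score - beta > F[t]:
                    dead_from.setdefault(s, t + min_size)
    bkps, t = [], n
    while t > 0:  # walk back through the stored segment starts
        bkps.append(t)
        t = last[t]
    return sorted(bkps), n_evals

# Hypothetical series: calm, turbulent (3x volatility), calm again
pattern = [0.01, -0.01, 0.005, -0.005]
calm = [pattern[i % 4] for i in range(60)]
signal = calm + [3 * x for x in calm] + calm

bkps_pruned, evals_pruned = pelt(signal, beta=20, prune=True)
bkps_full, evals_full = pelt(signal, beta=20, prune=False)
print(bkps_pruned, evals_pruned, evals_full)
```

Both runs return the same breakpoints, but the pruned run evaluates fewer candidate segments. On a long series with regularly spaced changepoints that saving is what turns quadratic work into roughly linear work.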

"The resulting method, which we call PELT, is shown through simulation studies to be substantially more accurate than existing methods while being computationally more efficient."
-- Killick, Fearnhead & Eckley (2012)

The Broader Landscape

Changepoint detection has a long history. Scott and Knott (1974) introduced Binary Segmentation for grouping means in ANOVA. Bai and Perron (2003) developed efficient algorithms for multiple structural breaks in econometric models. Truong, Oudre, and Vayatis (2020) provide a comprehensive review of offline changepoint methods in their paper accompanying the ruptures library.

The field continues to evolve. Other directions include Bayesian online changepoint detection (Adams and MacKay, 2007) for streaming data, and kernel-based methods that detect distributional changes beyond mean and variance.

Frequently Asked Questions

What is a changepoint in a time series?

A changepoint is a point in time where the statistical properties of the data shift abruptly. Before the changepoint, the data follows one distribution (for example, low-volatility returns), and after it, the data follows a different distribution (high-volatility returns). Changepoint detection algorithms find all such boundaries automatically, partitioning the series into statistically homogeneous segments.

How does PELT differ from Binary Segmentation?

PELT finds the globally optimal set of changepoints by considering all possible segmentations, using a pruning rule to achieve linear average runtime. Binary Segmentation is a greedy approximation that finds one changepoint at a time and recursively splits the data. PELT is exact and generally more accurate, while Binary Segmentation is faster in the worst case but can miss changepoints that are only visible when nearby boundaries are already accounted for.

How do I choose the penalty parameter?

The penalty controls the trade-off between sensitivity and false positives. A common default is the BIC penalty, which equals p times ln(n) where p is the number of parameters per segment and n is the series length. Lower penalties produce more changepoints (higher sensitivity), while higher penalties produce fewer (higher specificity). Domain knowledge about the expected number of regime shifts can also guide the choice.

Can PELT detect changes in mean, variance, or both?

Yes, depending on the cost model you specify. The "normal" model detects changes in both mean and variance simultaneously. You can also use models that target only the mean ("l2"), or supply a custom cost that targets only the variance. For financial data, the normal model is often a sensible default because crises tend to shift both the average return and the volatility at the same time.

What is the difference between changepoint detection and Hidden Markov Models?

Changepoint detection produces hard boundaries at specific time points and is best suited for post-hoc analysis where you want to segment historical data into clean periods. Hidden Markov Models assign each time step a probability of belonging to each regime, allowing for soft transitions and real-time filtering. Use changepoint detection for backtesting and retrospective segmentation, and HMMs when you need online regime probabilities.

Why does PELT sometimes detect regime shifts before the actual crisis event?

PELT detects the point where the statistical character of returns changes, not the headline event itself. Markets often become turbulent in the days before a crash as informed participants reposition and implied volatility rises. The algorithm places the boundary at this pre-crisis volatility expansion, which is where the data say the new regime begins, even though the most dramatic price move comes later.
