Berkan Sesen

Posted on May 24 • Originally published at sesen.ai

Cointegration and Pairs Trading: When Time Series Move Together

#timeseries #quantfinance #statistics

Pairs trading rests on a simple idea: find two assets that move together, wait for them to diverge, and bet on convergence. The hard part is defining "move together." Take two country ETFs, EWA (Australia) and EWC (Canada), which show a 0.95 correlation over a multi-year window. A mean-reversion trader sees that number and assumes the spread will keep snapping back. But then the spread drifts apart and stays apart for months. The correlation was real; the strategy still bled money. The problem is that correlation only tells you whether two series tend to move in the same direction. Cointegration tells you whether they are bound together by a long-run equilibrium, so that any deviation is temporary and will correct itself.

The distinction matters because most financial time series are non-stationary (they wander without a fixed mean). Two non-stationary series can be highly correlated by pure coincidence (the "spurious regression" problem identified by Granger and Newbold, 1974). Cointegration is the property that their difference (or some other linear combination) is stationary, meaning it genuinely reverts to a mean; a cointegration test checks formally whether that property holds.

By the end of this post, you'll test for cointegration using both the Engle-Granger and Johansen methods, understand when and why they disagree, and build a simple pairs trading strategy on real ETF data.

The Data: Country ETF Pairs

We use two iShares country ETFs: EWA (Australia) and EWC (Canada). Both countries are commodity exporters with similar economic drivers (mining, energy, agriculture), so there's a fundamental reason to expect a long-run relationship. This is the same pair used in the original R analysis we're translating.

For comparison, we also test GLD (gold) and GDX (gold miners). Despite the obvious connection, gold miners have idiosyncratic risks (management, costs, leverage) that can break cointegration.

The two ETFs clearly track each other over 17 years. They crash together in 2008, recover together, and diverge temporarily during COVID before reconverging. But visual similarity isn't proof of cointegration. We need a formal test.

Quick Win: Testing for Cointegration


import numpy as np
import pandas as pd
import yfinance as yf
from statsmodels.tsa.stattools import adfuller

# Download EWA and EWC adjusted close prices
ewa = yf.download("EWA", start="2007-01-01", end="2023-12-31",
                   auto_adjust=True, progress=False)["Close"]
ewc = yf.download("EWC", start="2007-01-01", end="2023-12-31",
                   auto_adjust=True, progress=False)["Close"]

# Align on common trading days
common = ewa.index.intersection(ewc.index)
ewa, ewc = ewa.loc[common], ewc.loc[common]
print(f"{len(ewa)} trading days, {ewa.index[0].date()} to {ewa.index[-1].date()}")

4278 trading days, 2007-01-03 to 2023-12-29

The Engle-Granger test is two steps: regress one series on the other, then test whether the residuals are stationary.

from statsmodels.regression.linear_model import OLS

# Regress EWC on EWA (no intercept, following the original R code)
model = OLS(ewc.values, ewa.values).fit()
spread = model.resid
print(f"Hedge ratio: {model.params[0]:.4f}")

# ADF test on the residuals
adf_stat, adf_pval, _, _, crit_vals, _ = adfuller(spread, regression="n")
print(f"ADF statistic: {adf_stat:.4f}")
print(f"p-value: {adf_pval:.4f}")

Hedge ratio: 1.5674
ADF statistic: -3.1704
p-value: 0.0015

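The ADF test rejects the unit root null hypothesis at the 1% level (p = 0.0015). The spread between EWC and 1.57 times EWA looks stationary, which means these two ETFs appear cointegrated: deviations from the long-run relationship tend to correct themselves. One caveat: this p-value comes from the standard ADF tables, which treat the spread as observed data. Because the hedge ratio was itself estimated, a strict Engle-Granger test uses more negative critical values (statsmodels applies them in its `coint` function), so read 0.0015 as somewhat optimistic.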

Plotted over time, the spread wanders but always returns to the mean. It doesn't drift permanently in one direction like a random walk would. This mean-reverting property is exactly what makes cointegration useful for trading.

What Just Happened?

Stationarity: The Key Idea

A stationary time series has a constant mean and variance over time, and its autocorrelation depends only on the lag between two observations, not on the date. If you pick any window, the statistics look roughly the same. Stock prices are almost never stationary (they behave like random walks, often with drift), but the spread between two cointegrated stocks can be.

The Augmented Dickey-Fuller (ADF) test checks whether a series has a unit root (non-stationary). The null hypothesis is "this series has a unit root" (bad for us). A small p-value means we can reject the null and conclude the series is stationary (good for us).

The Engle-Granger Two-Step Method

Engle and Granger (1987) proposed a beautifully simple procedure:

Regress one time series on the other: $\text{EWC}_t = \beta \cdot \text{EWA}_t + \varepsilon_t$
Test the residuals $\varepsilon_t$ for stationarity using the ADF test

If the residuals are stationary, the two series are cointegrated with cointegrating vector $[1, -\beta]$. The coefficient $\beta = 1.57$ is the hedge ratio. Because the regression is run on price levels, it is expressed in shares: for every share of EWC you hold long, you short about 1.57 shares of EWA to neutralise the common trend.

One subtlety: which series is the dependent variable matters. The original R code runs both directions (EWC on EWA and EWA on EWC) and picks the regression with the most negative ADF statistic. In our case, both directions give similar results.
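To see why the direction matters, here is a minimal sketch on synthetic data (the series and the `hedge_ratio` helper are hypothetical, not taken from the original R code). It shows only that the two regressions do not give reciprocal hedge ratios; the ADF comparison used to pick a direction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1500

# Hypothetical price-like series: x is a random walk, y tracks 1.5 * x with noise.
x = 20 + np.cumsum(rng.normal(scale=0.2, size=n))
y = 1.5 * x + rng.normal(scale=0.5, size=n)

def hedge_ratio(dep, indep):
    """Slope of a no-intercept OLS regression of dep on indep."""
    return float(np.dot(indep, dep) / np.dot(indep, indep))

beta_yx = hedge_ratio(y, x)  # y = beta_yx * x + residual
beta_xy = hedge_ratio(x, y)  # x = beta_xy * y + residual

print(f"y on x: beta = {beta_yx:.4f}")
print(f"x on y: implied beta = 1 / {beta_xy:.4f} = {1 / beta_xy:.4f}")
```

The implied ratio from the reverse regression is never smaller than the direct one (their product is at most 1 by the Cauchy-Schwarz inequality), and the gap widens as the residual noise grows, which is why the two spreads can test differently.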

Why Not Just Use Correlation?

Two series can have a correlation of 0.99 and still not be cointegrated. Imagine two random walks that happen to trend upward over the same period. Their correlation will be high, but their spread will drift without bound. Conversely, two cointegrated series can have low short-term correlation if they temporarily diverge before snapping back. Correlation measures co-movement; cointegration measures co-wandering with a leash.
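A small simulation makes the contrast concrete. This is a sketch under assumed parameters (drift, noise scale and AR coefficient are arbitrary choices): two independent random walks with a shared drift are compared against a pair tied together by a stationary error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Case 1: two independent random walks that merely share an upward drift.
a = np.cumsum(0.5 + rng.normal(size=n))
b = np.cumsum(0.5 + rng.normal(size=n))

# Case 2: y is tied to the random walk x by a stationary AR(1) error (the leash).
x = np.cumsum(rng.normal(size=n))
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.9 * u[t - 1] + rng.normal()
y = 1.5 * x + u

def ols_spread(p, q):
    """Residual of a no-intercept OLS regression of p on q."""
    beta = np.dot(q, p) / np.dot(q, q)
    return p - beta * q

corr_walks = np.corrcoef(a, b)[0, 1]
corr_leash = np.corrcoef(x, y)[0, 1]
std_walks = ols_spread(a, b).std()
std_leash = ols_spread(y, x).std()

print(f"Independent walks: correlation {corr_walks:.3f}, spread std {std_walks:.1f}")
print(f"Cointegrated pair: correlation {corr_leash:.3f}, spread std {std_leash:.1f}")
```

Both pairs can show a high level correlation, but only the second has a spread that stays within a narrow band.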

The Johansen Test: A Multivariate Approach

The Engle-Granger method estimates a single cointegrating relationship, which in practice limits it to pairs. The Johansen test, introduced by Johansen (1991), handles any number of time series simultaneously. It works through a vector autoregression (VAR) framework and estimates the cointegration rank: how many independent cointegrating relationships exist among the series.

from statsmodels.tsa.vector_ar.vecm import coint_johansen

data = np.column_stack([ewa.values, ewc.values])
result = coint_johansen(data, det_order=0, k_ar_diff=1)

print(f"Trace statistic (r=0): {result.lr1[0]:.2f}")
print(f"95% critical value:    {result.cvt[0, 1]:.2f}")

Trace statistic (r=0): 16.66
95% critical value:    15.49

The trace statistic (16.66) exceeds the 95% critical value (15.49), so Johansen also rejects the null of no cointegration. Both methods agree for EWA/EWC.

When Tests Disagree

The original R code used a shorter date range where Engle-Granger found marginal cointegration (p = 7%) but Johansen did not. This highlights an important practical point: cointegration tests are sensitive to sample period, structural breaks, and lag selection. The 2008 financial crisis, for instance, can distort the relationship. When the two tests disagree, it's usually a sign that cointegration is weak or regime-dependent, not a reason to pick the more favourable result.

Going Deeper

Pairs Trading: Exploiting Mean Reversion

If the spread is stationary, we can trade its mean-reversion. The strategy is simple:

Compute a rolling z-score of the spread: $z_t = \frac{s_t - \bar{s}_{60}}{\sigma_{60}}$
Buy the spread when $z < -2$ (spread is unusually cheap)
Sell the spread when $z > +2$ (spread is unusually expensive)
Close when $z$ crosses zero (spread has reverted)

"Buying the spread" means going long EWC and short EWA (proportional to the hedge ratio). "Selling the spread" means the opposite.

The z-score oscillates between roughly -4 and +4, regularly crossing the trading thresholds. Each crossing is a potential trade entry or exit.
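The trading rules above can be sketched as a small state machine. This is one possible reading of the rules, not necessarily the exact logic behind the post's backtest; the function name `zscore_positions` and the synthetic spread are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def zscore_positions(spread, window=60, entry=2.0):
    """Return (z, position): +1 is long the spread, -1 short, 0 flat."""
    z = (spread - spread.rolling(window).mean()) / spread.rolling(window).std()
    position = np.zeros(len(spread))
    current = 0
    for i, zi in enumerate(z.to_numpy()):
        if np.isnan(zi):
            pass                    # not enough history yet: stay flat
        elif current == 0:
            if zi < -entry:
                current = 1         # spread unusually cheap: buy it
            elif zi > entry:
                current = -1        # spread unusually expensive: sell it
        elif (current == 1 and zi >= 0) or (current == -1 and zi <= 0):
            current = 0             # z-score crossed zero: close the trade
        position[i] = current
    return z, pd.Series(position, index=spread.index)

# Synthetic mean-reverting spread (AR(1)), standing in for the real one.
rng = np.random.default_rng(7)
values = np.zeros(1000)
for t in range(1, 1000):
    values[t] = 0.95 * values[t - 1] + rng.normal()
spread = pd.Series(values)

z, position = zscore_positions(spread)
print(position.value_counts())
```

A position of +1 corresponds to long EWC and short EWA in hedge-ratio proportion; -1 is the reverse.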

Backtest Results

Run over 17 years of EWA/EWC data, this simple strategy generates a cumulative PnL of about $19 per unit of spread, with 135 trades and an annualised Sharpe ratio of 0.69. The equity curve is mostly upward-sloping, with a significant drawdown during 2012-2014 when the spread drifted for an extended period.

This is a toy backtest (no transaction costs, slippage, or financing costs). Real implementation requires careful attention to execution, but the core signal (mean-reverting spread) is genuine.
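For reference, here is a minimal sketch of how PnL, trade count and Sharpe ratio could be computed from a spread and a position series. The conventions are assumptions (positions are held from one close to the next, zero costs, 252 trading days per year) and may differ from the ones behind the numbers above; `backtest_spread` is a hypothetical helper.

```python
import numpy as np

def backtest_spread(spread, position, periods_per_year=252):
    """Cumulative PnL per unit of spread, annualised Sharpe ratio, and trade count."""
    spread = np.asarray(spread, dtype=float)
    position = np.asarray(position, dtype=float)
    # The position held at the close of day t earns the spread change into day t+1.
    daily_pnl = position[:-1] * np.diff(spread)
    sharpe = np.sqrt(periods_per_year) * daily_pnl.mean() / daily_pnl.std(ddof=1)
    # A trade is counted each time a new non-zero position is opened after day 0.
    n_trades = int(np.sum((position[1:] != 0) & (position[1:] != position[:-1])))
    return daily_pnl.cumsum(), sharpe, n_trades

# Tiny hand-checkable example: buy at -3, close at 0, sell at +2, close at 0.
spread = [0.0, -3.0, -1.0, 0.0, 2.0, 0.0]
position = [0, 1, 1, 0, -1, 0]

cum_pnl, sharpe, n_trades = backtest_spread(spread, position)
print(cum_pnl[-1], n_trades)  # 5.0 2
print(round(sharpe, 2))
```

The long trade earns 3 (from -3 back to 0) and the short trade earns 2 (from +2 back to 0), which gives the total of 5.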

A Pair That Fails: GLD vs GDX

To see what non-cointegration looks like, consider GLD (gold) and GDX (gold miners). Despite the intuitive connection, gold miners have company-specific risks that break the long-run equilibrium.

The ADF test statistic for EWA/EWC (-3.17) is well past all critical values. For GLD/GDX (-1.64), it fails even the 10% level. The Johansen test confirms: GLD/GDX shows no evidence of cointegration (trace stat 13.38 < 15.49 critical value).

This is why fundamental reasoning alone isn't enough. You need the statistical test.

Autocorrelation: Visual Evidence of Stationarity

The autocorrelation function (ACF) of the spread provides visual confirmation. It decays slowly from 1.0, which is typical for a stationary but highly persistent process. A truly non-stationary series would show autocorrelations that barely decay at all. The gradual decline confirms that the spread reverts, but slowly: the decay rate implies a mean-reversion half-life of roughly 3 months.
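One common way to put a number on the decay is to fit an AR(1) model and convert its coefficient into a half-life, $-\ln 2 / \ln \phi$. The sketch below assumes that approach (the post does not say how its estimate was derived) and uses a synthetic series; a half-life of about 63 trading days would correspond to $\phi \approx 0.989$.

```python
import numpy as np

def half_life(spread):
    """Half-life in observations from the AR(1) fit s_t = c + phi * s_{t-1} + e_t."""
    s = np.asarray(spread, dtype=float)
    lagged, current = s[:-1], s[1:]
    X = np.column_stack([np.ones_like(lagged), lagged])
    (c, phi), *_ = np.linalg.lstsq(X, current, rcond=None)
    if not 0 < phi < 1:
        return float("inf")  # no mean reversion detected in this sample
    return float(-np.log(2) / np.log(phi))

# Synthetic AR(1) spread with phi = 0.95 (true half-life about 13.5 observations).
rng = np.random.default_rng(3)
s = np.zeros(5000)
for t in range(1, 5000):
    s[t] = 0.95 * s[t - 1] + rng.normal()

estimated = half_life(s)
exact_decay = half_life(100 * 0.9 ** np.arange(50))  # noiseless decay, phi = 0.9
print(f"Estimated half-life: {estimated:.1f} observations")
print(f"Noiseless check:     {exact_decay:.2f} observations")
```

The half-life gives a rough sense of how long a trade may need to be held, which is useful when choosing the z-score window.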

Hyperparameter Choices

| Parameter | Value | Why |
|---|---|---|
| ETF pair | EWA/EWC | Original R code pair; commodity exporters with fundamental economic link |
| Date range | 2007-2023 | 17 years covering multiple market regimes (GFC, COVID) |
| ADF regression | No intercept | Matches original R code (`type="nc"`); spread should be zero-mean |
| Johansen settings | `det_order=0, k_ar_diff=1` | Matches R `ecdet="none", K=2` |
| Z-score window | 60 days | ~3 months; balances responsiveness with stability |
| Entry threshold | ±2σ | Standard for pairs trading; ~5% of observations in tails |
| Exit threshold | 0 | Close when spread returns to rolling mean |

Where This Comes From

Engle and Granger (1987): The Nobel Prize Paper

Robert Engle and Clive Granger introduced cointegration in their 1987 paper "Co-Integration and Error Correction: Representation, Estimation, and Testing", published in Econometrica. The work earned Granger the Nobel Prize in Economics in 2003 (shared with Engle, who was recognised for ARCH models).

Their key insight was that while individual economic time series may be non-stationary (integrated of order 1, or I(1)), linear combinations of them can be stationary (I(0)). This formalised the intuition that certain economic variables are "tied together" by equilibrium forces, even though each variable wanders on its own.
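A minimal worked example may make this concrete. The symbols below ($x_t$, $y_t$, $u_t$, $\phi$, $\eta_t$) are illustrative assumptions, not notation taken from the paper:

```latex
\begin{aligned}
x_t &= x_{t-1} + \eta_t && \text{(random walk, } I(1)\text{)} \\
y_t &= \beta x_t + u_t, \quad u_t = \phi u_{t-1} + \varepsilon_t, \quad |\phi| < 1 && \text{(also } I(1)\text{)} \\
y_t - \beta x_t &= u_t && \text{(stationary, } I(0)\text{)}
\end{aligned}
```

With $x_0 = 0$ and independent shocks, $\operatorname{Var}(x_t) = t\,\sigma_\eta^2$ grows without bound, while $\operatorname{Var}(u_t) = \sigma_\varepsilon^2 / (1 - \phi^2)$ stays fixed. The combination with vector $[1, -\beta]$ removes the shared stochastic trend and leaves only the stationary error.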

"A test for cointegration can be thought of as a pre-test to avoid 'spurious regression' situations."
-- Engle & Granger (1987)

The two-step procedure we implemented (regress, then test residuals) is their original method. It's simple, intuitive, and remains a standard cointegration test for pairs.

Johansen (1991): The Multivariate Extension

Søren Johansen's 1991 paper "Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models" extended cointegration testing to any number of variables. Instead of running pairwise regressions, Johansen's trace test estimates the rank of the cointegration matrix directly using eigenvalue decomposition.

For two variables, the Johansen test and Engle-Granger usually agree. For three or more (e.g., a basket of commodity ETFs), where there may be more than one cointegrating relationship, Johansen is the standard choice.

The Dickey-Fuller Foundation

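The Engle-Granger method relies on the Augmented Dickey-Fuller test, which builds on Dickey & Fuller (1979), to detect a unit root in the residuals. The Johansen test does not run an ADF regression; it uses likelihood-ratio statistics on the eigenvalues of a VAR and can be viewed as a multivariate generalisation of the same idea. The ADF test fits the model $\Delta y_t = \alpha y_{t-1} + \sum \gamma_i \Delta y_{t-i} + \varepsilon_t$ and tests whether $\alpha = 0$ (unit root) vs $\alpha < 0$ (stationary). The test statistic doesn't follow a standard t-distribution, so special critical values (tabulated by Dickey and Fuller) are needed.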
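To show what the ADF regression actually computes, here is a bare-bones NumPy sketch of the no-constant form of the model above. It is for illustration only: the helper `adf_t_stat` is hypothetical, the lag count is fixed rather than selected automatically, and no p-value is produced, since the statistic has to be compared against Dickey-Fuller critical values (such as those returned by `adfuller`).

```python
import numpy as np

def adf_t_stat(y, lags=1):
    """t-statistic on alpha in: dy_t = alpha * y_{t-1} + sum_i gamma_i * dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    # Each row holds y_{t-1} followed by dy_{t-1}, ..., dy_{t-lags}.
    cols = [y[lags:-1]] + [dy[lags - i:-i] for i in range(1, lags + 1)]
    X = np.column_stack(cols)
    target = dy[lags:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[0] / np.sqrt(cov[0, 0]))

rng = np.random.default_rng(11)
n = 1000
random_walk = np.cumsum(rng.normal(size=n))   # unit root: alpha = 0
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.9 * ar1[t - 1] + rng.normal()  # stationary: alpha = -0.1

stat_rw = adf_t_stat(random_walk)
stat_ar = adf_t_stat(ar1)
print(f"Random walk t-statistic:    {stat_rw:.2f}")
print(f"Stationary AR(1) statistic: {stat_ar:.2f}")
```

The stationary series produces a strongly negative statistic, while the random walk's statistic stays much closer to zero.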

Pairs Trading in Practice

The academic foundation for pairs trading was established by Gatev, Goetzmann, and Rouwenhorst (2006) in "Pairs Trading: Performance of a Relative-Value Arbitrage Rule". They analysed pairs trading on US equities from 1962 to 2002 and found average annualised excess returns of up to about 11% for portfolios of the best pairs.

For a comprehensive treatment, Vidyamurthy (2004) Pairs Trading: Quantitative Methods and Analysis covers the full pipeline from pair selection to execution.

Interactive Tools

Black-Scholes Calculator — Price options on the assets in your pairs trades
Kelly Criterion Calculator — Determine optimal position sizing for your trading strategy
Drawdown Calculator — Analyse portfolio drawdowns and risk metrics

Hidden Markov Models: When Clusters Have Memory (regime detection in time series)
MCMC for Mixture Models: Inferring Earthquake Regimes (detecting hidden regimes in count data)
Linear Regression Five Ways (the regression foundation that Engle-Granger builds on)
Maximum Likelihood Estimation from Scratch (the estimation framework underlying ADF tests)

Frequently Asked Questions

What is the difference between correlation and cointegration?

Correlation measures whether two series tend to move in the same direction over short periods. Cointegration tests whether a linear combination of two series is stationary, meaning deviations from their long-run relationship are temporary and self-correcting. Two highly correlated series can drift apart permanently, while two cointegrated series are bound by an equilibrium that pulls them back together.

Can cointegration break down over time?

Yes. Cointegration is not permanent. Structural changes in the economy, shifts in industry dynamics, or regulatory events can destroy a previously stable relationship. This is why practitioners re-test cointegration periodically using rolling windows and monitor spread behaviour for signs of regime change.
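One simple monitoring device is a rolling hedge ratio: if the estimate shifts persistently, the old spread definition is no longer the right one. The sketch below is an assumption-laden illustration (synthetic data with a deliberate regime change, a hypothetical `rolling_hedge_ratio` helper, and a 250-day window chosen arbitrarily); it does not replace re-running the cointegration test itself on each window.

```python
import numpy as np

def rolling_hedge_ratio(y, x, window=250):
    """No-intercept OLS slope of y on x over each trailing window."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    betas = np.full(len(y), np.nan)
    for end in range(window, len(y) + 1):
        xs, ys = x[end - window:end], y[end - window:end]
        betas[end - 1] = np.dot(xs, ys) / np.dot(xs, xs)
    return betas

rng = np.random.default_rng(5)
n = 2000
x = 30 + np.cumsum(rng.normal(scale=0.1, size=n))

# The true hedge ratio jumps from 1.5 to 2.0 halfway through the sample.
true_beta = np.where(np.arange(n) < 1000, 1.5, 2.0)
y = true_beta * x + rng.normal(scale=0.3, size=n)

betas = rolling_hedge_ratio(y, x)
print(f"Hedge ratio just before the break: {betas[999]:.3f}")
print(f"Hedge ratio at the end of sample:  {betas[-1]:.3f}")
```

In practice the same loop could call a cointegration test on each window, at the cost of noisier results on short samples.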

Why does the Engle-Granger test sometimes disagree with the Johansen test?

The two tests use different methodologies. Engle-Granger runs a single regression and tests the residuals, while Johansen uses a vector autoregression framework. They can disagree when cointegration is weak, when the sample period includes structural breaks, or when lag selection differs. Disagreement is usually a warning sign that the relationship is fragile rather than robust.

What is a hedge ratio and why does it matter for pairs trading?

The hedge ratio is the coefficient from the cointegrating regression. It tells you how many units of one asset to hold against the other so that the combined position has a stationary spread. Getting the hedge ratio wrong means your spread will drift rather than revert, defeating the purpose of the strategy.

Is pairs trading still profitable in modern markets?

Academic evidence suggests that pairs trading returns have declined since the strategy became widely known in the early 2000s. However, it can still be profitable when applied to less liquid markets, when combined with fundamental analysis to select pairs, or when enhanced with more sophisticated signal generation. Transaction costs and execution quality are critical factors.

Do I need to difference the price series before testing for cointegration?

No. Cointegration testing specifically requires the original (undifferenced) price series. The whole point is to find a linear combination of non-stationary I(1) series that produces a stationary I(0) result. If you difference first, you remove the very relationship you are trying to detect.
