The Loop That Took 47 Seconds
Running cointegration tests across 500 stock pairs shouldn't take 47 seconds. But there I was, staring at a progress bar that moved like it was stuck in molasses. The bottleneck? A nested Python loop computing the Engle-Granger test for every pair in my watchlist.
The fix took the runtime from 47 seconds to 5.8 seconds. No fancy libraries, no Cython, no multiprocessing — just NumPy vectorization done properly.
Here's what the slow version looked like. This is representative of code I've seen in dozens of pairs trading implementations:
```python
import numpy as np
from statsmodels.tsa.stattools import coint
import time


def slow_cointegration_matrix(price_matrix):
    """price_matrix: shape (n_days, n_assets)"""
    n_assets = price_matrix.shape[1]
    pvalues = np.zeros((n_assets, n_assets))
    for i in range(n_assets):
        for j in range(i + 1, n_assets):
            # statsmodels coint returns (t-stat, pvalue, crit_values)
            _, pval, _ = coint(price_matrix[:, i], price_matrix[:, j])
            pvalues[i, j] = pval
            pvalues[j, i] = pval
    return pvalues
```
For 500 assets, that's 124,750 pairs. Each coint() call runs an OLS regression, computes residuals, then performs an ADF test on those residuals. The Python interpreter overhead on 124,750 iterations adds up fast.
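The first stage of Engle-Granger, the pairwise OLS regression, is the part that vectorizes cleanly: every hedge ratio falls out of one covariance matrix, with no per-pair Python calls. Here's a minimal sketch of that idea (my own illustration of the general technique, not necessarily the article's exact fix; `pairwise_hedge_ratios` is a hypothetical helper name):

```python
import numpy as np


def pairwise_hedge_ratios(price_matrix):
    """OLS slope beta[i, j] from regressing asset j on asset i
    (with intercept), computed for all pairs at once.

    price_matrix: shape (n_days, n_assets)
    """
    # Demeaning the columns absorbs the intercept term
    demeaned = price_matrix - price_matrix.mean(axis=0)
    # Full pairwise covariance matrix in one matrix product
    cov = demeaned.T @ demeaned / len(demeaned)
    var = np.diag(cov)
    # beta[i, j] = cov(i, j) / var(i): slope of the j-on-i regression
    return cov / var[:, None]
```

The residual series for any pair is then `demeaned[:, j] - beta[i, j] * demeaned[:, i]`, so the expensive per-pair ADF test only needs to run on pairs whose residuals already look promising.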