DEV Community

TildAlice

Posted on • Originally published at tildalice.io

NumPy Vectorization Cuts Cointegration Test Time by 8x

The Loop That Took 47 Seconds

Running cointegration tests across 500 stock pairs shouldn't take 47 seconds. But there I was, staring at a progress bar that moved like it was stuck in molasses. The bottleneck? A nested Python loop computing the Engle-Granger test for every pair in my watchlist.

The fix took the runtime from 47 seconds to 5.8 seconds. No fancy libraries, no Cython, no multiprocessing — just NumPy vectorization done properly.

Here's what the slow version looked like. This is representative of code I've seen in dozens of pairs trading implementations:

import numpy as np
from statsmodels.tsa.stattools import coint

def slow_cointegration_matrix(price_matrix):
    """price_matrix: shape (n_days, n_assets)"""
    n_assets = price_matrix.shape[1]
    # Default to p=1 so the untested diagonal doesn't read as "significant"
    pvalues = np.ones((n_assets, n_assets))

    for i in range(n_assets):
        for j in range(i + 1, n_assets):
            # statsmodels coint returns (t-stat, pvalue, crit_values)
            _, pval, _ = coint(price_matrix[:, i], price_matrix[:, j])
            pvalues[i, j] = pval
            pvalues[j, i] = pval

    return pvalues

For 500 assets, that's 124,750 pairs. Each coint() call runs an OLS regression, computes residuals, then performs an ADF test on those residuals. The Python interpreter overhead on 124,750 iterations adds up fast.
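The heavy part of each iteration is the pairwise OLS regression, and that step batches well. As a rough sketch of the idea (my own illustration, not necessarily the article's exact code): demean the price matrix once, derive every pairwise slope from the covariance matrix, and form all residual series in a single broadcast expression. The ADF test on those residuals would still need its own treatment, and the `(n_days, n_assets, n_assets)` residual array gets memory-hungry at 500 assets, so real code would process it in chunks.

```python
import numpy as np

def batch_ols_residuals(price_matrix):
    """Compute OLS residuals for every ordered pair of columns at once.

    price_matrix: shape (n_days, n_assets).
    Returns resid of shape (n_days, n_assets, n_assets), where
    resid[:, i, j] holds the residuals of regressing asset j on
    asset i (intercept included via demeaning).
    """
    X = price_matrix - price_matrix.mean(axis=0)  # demeaning absorbs the intercept

    # Pairwise slope beta[i, j] = cov(i, j) / var(i)
    cov = X.T @ X / len(X)          # (n_assets, n_assets)
    var = np.diag(cov)              # (n_assets,)
    beta = cov / var[:, None]       # beta[i, j]

    # resid[t, i, j] = X[t, j] - beta[i, j] * X[t, i], via broadcasting
    resid = X[:, None, :] - beta[None, :, :] * X[:, :, None]
    return resid
```

A quick sanity check against `np.polyfit` on one pair confirms the broadcast version matches a plain per-pair regression, which is the point: the same arithmetic, minus 124,750 trips through the interpreter.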


Continue reading the full article on TildAlice
