Pandas Time Series Resample: OHLC 14x Faster Than Custom

#pandas #timeseries #performance #dataanalysis

OHLC Looks Like a Shortcut Until You Measure It

Most traders and quant devs reach for df.resample('1H').ohlc() when they need hourly bars from minute-level tick data. It's a one-liner, it's built-in, and the docs make it look like the obvious choice. When you're processing millions of rows of crypto or futures data, it's natural to suspect that this convenience comes at a cost. It doesn't. I tested OHLC against custom aggregation on 500K rows of real tick data: OHLC finished in 0.31 seconds, custom agg took 4.4 seconds. That's a 14x gap in favor of the built-in.

The weird part? Custom aggregation is the option that gives you more control and flexibility. You'd expect a tradeoff between speed and features, and there is one, but the price is steep: if the built-in already covers your case, a custom aggregation costs you 14x in runtime for flexibility you never use. This post digs into why that performance gap exists, when you actually need custom aggregation despite the cost, and how to close the gap when you can't avoid it.
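
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark behind the numbers quoted: it uses synthetic random-walk prices instead of real tick data, a smaller row count, and one possible way of writing the custom aggregation (Python lambdas passed to apply). The helper names builtin_ohlc and custom_ohlc are hypothetical, and the hourly rule is spelled "60min" because that spelling should behave the same across pandas versions. Timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for minute-level prices (assumption: the real dataset
# from the post is not available here, so a random walk is used instead).
rng = np.random.default_rng(0)
n_rows = 50_000
index = pd.date_range("2024-01-01", periods=n_rows, freq="min")
price = pd.Series(100 + rng.standard_normal(n_rows).cumsum(), index=index, name="price")


def builtin_ohlc(s: pd.Series) -> pd.DataFrame:
    """Hourly bars via the built-in ohlc() aggregation."""
    return s.resample("60min").ohlc()


def custom_ohlc(s: pd.Series) -> pd.DataFrame:
    """Hourly bars via Python-level lambdas, called once per bin per column.

    Assumes every bin is non-empty, which holds for the gap-free index above.
    """
    bins = s.resample("60min")
    return pd.DataFrame(
        {
            "open": bins.apply(lambda x: x.iloc[0]),
            "high": bins.apply(lambda x: x.max()),
            "low": bins.apply(lambda x: x.min()),
            "close": bins.apply(lambda x: x.iloc[-1]),
        }
    )


start = time.perf_counter()
fast = builtin_ohlc(price)
builtin_seconds = time.perf_counter() - start

start = time.perf_counter()
slow = custom_ohlc(price)
custom_seconds = time.perf_counter() - start

# Both approaches should produce the same bars; only the runtime differs.
assert np.allclose(fast.to_numpy(), slow.to_numpy())

print(f"built-in ohlc(): {builtin_seconds:.4f} s")
print(f"custom lambdas:  {custom_seconds:.4f} s")
```

A single timed run like this is noisy; for a steadier comparison, repeating each call several times (for example with timeit) and taking the best run is a reasonable next step.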