I still remember the chart.
Perfect equity curve, tiny drawdowns, beautiful Sharpe. The kind of backtest that makes you want to quit your job tomorrow.
I’ve built and backtested dozens of strategies — most of them failed quietly, but one almost fooled me publicly.
And that was the problem.
If you spend any time on r/algotrading, you see the same warning on every “too good” backtest:
- Watch out for lookahead bias.
- Watch out for indicators that effectively repaint.
- Make sure you are not using information that didn't exist at decision time.
So instead of celebrating, I got suspicious.
The Moment It Felt Wrong
The strategy itself was boringly simple: daily bars, a couple of indicators, nothing exotic.
Yet the equity curve looked like a brochure for a quant hedge fund — smooth, relentless, almost no pain. In real trading, even good strategies spend a lot of time hurting.
So I started asking a few uncomfortable questions:
- Am I resampling data and then trading on a lower timeframe?
- Am I merging datasets assuming perfect timestamp alignment?
- Is a future `close` or `high` sneaking into my decision logic?
Spoiler: yes.
Here’s the Trap in One Image
→ A bar labeled at 10:00
→ But filled with data up to 10:59
→ Your strategy decides at 10:00
→ Using info that only exists at 11:00.
That’s not forecasting — it’s time travel.
Once you realize this, the “too good” curve suddenly looks less like alpha and more like cheating.
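Here's a minimal sketch of that trap in pandas (assuming left-labeled hourly bars, which is the resample default). The bar stamped 10:00 holds the hour's high, but that value is only knowable at 10:59:

```python
import pandas as pd

# One hour of minute bars with steadily rising prices
idx = pd.date_range("2025-01-01 10:00", periods=60, freq="min")
prices = pd.Series(range(60), index=idx, dtype=float)

# The resampled bar is labeled 10:00 but contains data through 10:59
hourly_high = prices.resample("1h").max()
print(hourly_high.loc["2025-01-01 10:00"])  # 59.0, known only at 10:59

# The fix: shift so a decision "at 10:00" only sees the previous,
# fully completed hour
safe_high = hourly_high.shift(1)
```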
backtest-guard: A 12-Line Linter for Backtest Honesty
After fixing this once, I wrote a minimal checker — think pylint, but for backtest integrity.
| Check | What It Catches | Example |
|---|---|---|
| Timestamp sanity | Future data in same-row features | `close_5min` used at t=10:00 but filled at 10:05 |
| Merge integrity | Joins that leak future values | Merging daily OHLC into 1-min bars without `shift(1)` |
| Signal causality | Decisions using non-available data | Signal based on `resample('H').max()` without `shift(1)` |
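To make the merge-integrity row concrete, here's a hedged sketch (the frames and the `close_daily` column are hypothetical, not from the tool): joining daily data into intraday bars without shifting hands every morning bar a closing price that won't print until the end of the day.

```python
import pandas as pd

# Hypothetical frames: daily closes and 1-minute bars
daily = pd.DataFrame(
    {"close_daily": [100.0, 101.5]},
    index=pd.to_datetime(["2025-01-01", "2025-01-02"]),
)
minute = pd.DataFrame(
    {"close": [100.2, 100.4]},
    index=pd.to_datetime(["2025-01-02 10:00", "2025-01-02 10:01"]),
)

# Leaky: every minute bar on Jan 2 "knows" Jan 2's close (101.5),
# hours before the session actually closes
leaky = minute.join(daily["close_daily"].reindex(minute.index, method="ffill"))

# Safer: shift the daily series first, so each minute bar only sees
# yesterday's completed close (100.0)
safe = minute.join(
    daily["close_daily"].shift(1).reindex(minute.index, method="ffill")
)
```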
It doesn’t prove your backtest is perfect — but it catches the 80% of mistakes that make equity curves “too good to be true.”
From the outside, it’s deliberately minimal:
- Input: a `.py` strategy file or a `.csv` backtest log
- Output: a plain-text report with actionable flags:
  - ⚠️ Rolling window without `.shift(1)` — likely lookahead
  - ⚠️ Non-chronological timestamps. Example: row 42 (2025-01-01 10:00) < row 41 (2025-01-01 10:01)
No installation. No dependencies beyond pandas.
Just truth.
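For flavor, here's roughly what the non-chronological-timestamps check could look like. This is a sketch of the idea, not backtest-guard's actual source:

```python
import pandas as pd

def check_chronological(ts: pd.Series) -> list[str]:
    """Flag every row whose timestamp precedes the previous row's."""
    ts = pd.to_datetime(ts).reset_index(drop=True)
    return [
        f"⚠️ Non-chronological timestamps. Example: "
        f"row {i} ({ts[i]}) < row {i - 1} ({ts[i - 1]})"
        for i in range(1, len(ts))
        if ts[i] < ts[i - 1]
    ]
```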
Why This Works
- Catches most disasters in <1 sec (they stem from timestamps, resampling, merges — not models)
- Works on code and data — framework-agnostic
- Pinpoints exact columns/rows for fast debugging or CI integration
Now, this runs before every serious backtest — like pytest, but for honesty.
Try It
```bash
python backtest-guard.py my_strategy.py
python backtest-guard.py backtest_trades.csv
# Or pipe: cat strategy.py | python backtest-guard.py -
```
If your equity curve drops 20% after running this — great.
You just saved weeks of chasing ghosts.
Breathe easier — or fix the leak before it’s too late.