
LogicDev-tools

My Backtest Was Too Good — Here’s How I Caught the Lie

I still remember the chart.

Perfect equity curve, tiny drawdowns, beautiful Sharpe. The kind of backtest that makes you want to quit your job tomorrow.

I’ve built and backtested dozens of strategies — most of them failed quietly, but one almost fooled me publicly.

And that was the problem.

If you spend any time on r/algotrading, you see the same warning on every “too good” backtest:

  • Watch out for lookahead bias.
  • Watch out for indicators that effectively repaint.
  • Make sure you are not using information that didn’t exist at decision time.

So instead of celebrating, I got suspicious.

The Moment It Felt Wrong

The strategy itself was boringly simple: daily bars, a couple of indicators, nothing exotic.

Yet the equity curve looked like a brochure for a quant hedge fund — smooth, relentless, almost no pain. In real trading, even good strategies spend a lot of time hurting.

So I started asking a few uncomfortable questions:

  • Am I resampling data and then trading on a lower timeframe?
  • Am I merging datasets assuming perfect timestamp alignment?
  • Is future close or high sneaking into my decision logic?

Spoiler: yes.
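The third question was the guilty one. Here is a minimal sketch of that leak with hypothetical daily bars (the column names, numbers, and threshold are mine, not from the original strategy):

```python
import pandas as pd

# Hypothetical daily bars -- purely illustrative data
df = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0, 103.0],
        "high": [105.0, 106.0, 101.0, 108.0],
    },
    index=pd.date_range("2025-01-01", periods=4, freq="D"),
)

# LEAKY: the entry decision at the open compares against the same day's
# high, a value that is only known after the session ends
leaky = df["high"] > df["open"] * 1.02

# HONEST: shift by one bar, so today's decision sees only
# yesterday's completed bar
honest = (df["high"] > df["open"] * 1.02).shift(1, fill_value=False)
```

The leaky version looks clairvoyant in a backtest for exactly the reason described below: it trades on numbers that did not exist at decision time.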

Here’s the Trap in One Image

→ A bar labeled at 10:00

→ But filled with data up to 10:59

→ Your strategy decides at 10:00

→ Using info that only exists at 11:00.

That’s not forecasting — it’s time travel.

Once you realize this, the “too good” curve suddenly looks less like alpha and more like cheating.
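In pandas terms, the trap falls straight out of the default resample labeling, which stamps each bar at the window's left edge. A small sketch with made-up minute prices:

```python
import pandas as pd

# Sixty fake minute prices covering 10:00-10:59 (illustrative data)
idx = pd.date_range("2025-01-01 10:00", periods=60, freq="min")
prices = pd.Series(range(60), index=idx, dtype=float)

# The hourly bar is *labeled* 10:00 by default...
hourly_high = prices.resample("1h").max()

# ...but its value (59.0) only becomes known at 10:59.
# A strategy that "decides at 10:00" using this bar is time-traveling.
# The fix: act only on the previous, completed bar.
safe_high = hourly_high.shift(1)
```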

backtest-guard: A 12-Line Linter for Backtest Honesty

After fixing this once, I wrote a minimal checker — think pylint, but for backtest integrity.

  • Timestamp Sanity — catches future data in same-row features. Example: close_5min used at t=10:00 but filled at 10:05.
  • Merge Integrity — catches joins that leak future values. Example: merging daily OHLC into 1-min bars without shift(1).
  • Signal Causality — catches decisions using non-available data. Example: a signal based on resample('H').max() without shift(1).
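To make the merge-leak case concrete, here is a minimal sketch with invented minute and daily data (this is my illustration, not the tool's code):

```python
import pandas as pd

# 1-minute bars for one session (illustrative)
minute = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(
        ["2025-01-02 09:30", "2025-01-02 09:31", "2025-01-02 09:32"]
    ),
)

# Daily closes stamped at the session date (illustrative)
daily_close = pd.Series(
    [10.0, 20.0],
    index=pd.to_datetime(["2025-01-01", "2025-01-02"]),
)

dates = minute.index.normalize()

# LEAKY: a naive date join hands every minute bar *today's* close,
# a number that does not exist until the session ends
minute["leaky_close"] = dates.map(daily_close)

# HONEST: shift the daily series first, so each minute bar
# sees only the previous completed session
minute["honest_close"] = dates.map(daily_close.shift(1))
```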

It doesn’t prove your backtest is perfect — but it catches the 80% of mistakes that make equity curves “too good to be true.”

From the outside, it’s deliberately minimal:

  • Input: .py strategy or .csv backtest log
  • Output: Plain-text report with actionable flags, e.g.:
      ⚠️ Rolling window without .shift(1) — likely lookahead
      ⚠️ Non-chronological timestamps. Example: row 42 (2025-01-01 10:00) < row 41 (2025-01-01 10:01)

No installation. No dependencies beyond pandas.

Just truth.
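For a sense of what one of those flags involves, here is my own illustrative sketch of a non-chronological-timestamp check (the actual gist code may differ; the column name is an assumption):

```python
import pandas as pd

def flag_nonchronological(df: pd.DataFrame, ts_col: str = "timestamp") -> list[str]:
    """Return one warning per row whose timestamp precedes the row before it.

    Assumes a default integer index; `ts_col` is an illustrative name.
    """
    ts = pd.to_datetime(df[ts_col])
    out_of_order = ts.diff() < pd.Timedelta(0)
    return [
        f"⚠️ Non-chronological timestamps: row {i} ({ts[i]}) < row {i - 1} ({ts[i - 1]})"
        for i in ts.index[out_of_order]
    ]

# Usage: row 1 jumps backwards in time, so it gets flagged
trades = pd.DataFrame(
    {"timestamp": ["2025-01-01 10:01", "2025-01-01 10:00", "2025-01-01 10:02"]}
)
print(flag_nonchronological(trades))
```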

Why This Works

  • Catches most disasters in <1 sec (they stem from timestamps, resampling, merges — not models)
  • Works on code and data — framework-agnostic
  • Pinpoints exact columns/rows for fast debugging or CI integration

This now runs before every serious backtest — like pytest, but for honesty.


Try It

📥 Gist: backtest-guard.py


```bash
python backtest-guard.py my_strategy.py
python backtest-guard.py backtest_trades.csv
# Or pipe: cat strategy.py | python backtest-guard.py -
```

If your equity curve drops 20% after running this — great.
You just saved weeks of chasing ghosts.

Breathe easier — or fix the leak before it’s too late.