
LogicDev-tools

My Backtest Was Too Good — Here’s How I Caught the Lie

I still remember the chart.

Perfect equity curve, tiny drawdowns, beautiful Sharpe. The kind of backtest that makes you want to quit your job tomorrow.

I’ve built and backtested dozens of strategies — most of them failed quietly, but one almost fooled me publicly.

And that was the problem.

If you spend any time on r/algotrading, you see the same warning on every “too good” backtest:

  • Watch out for lookahead bias.
  • Watch out for indicators that effectively repaint.
  • Make sure you are not using information that didn’t exist at decision time.

So instead of celebrating, I got suspicious.

The Moment It Felt Wrong

The strategy itself was boringly simple: daily bars, a couple of indicators, nothing exotic.

Yet the equity curve looked like a brochure for a quant hedge fund — smooth, relentless, almost no pain. In real trading, even good strategies spend a lot of time hurting.

So I started asking a few uncomfortable questions:

  • Am I resampling data and then trading on a lower timeframe?
  • Am I merging datasets assuming perfect timestamp alignment?
  • Is future close or high sneaking into my decision logic?

Spoiler: yes.
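The third question was the guilty one. Here is a minimal sketch of that leak with hypothetical daily bars (the column names, numbers, and threshold are mine, not from the original strategy):

```python
import pandas as pd

# Hypothetical daily bars -- purely illustrative data
df = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0, 103.0],
        "high": [105.0, 106.0, 101.0, 108.0],
    },
    index=pd.date_range("2025-01-01", periods=4, freq="D"),
)

# LEAKY: the entry decision at the open compares against the same day's
# high, a value that is only known after the session ends
leaky = df["high"] > df["open"] * 1.02

# HONEST: shift by one bar, so today's decision sees only
# yesterday's completed bar
honest = (df["high"] > df["open"] * 1.02).shift(1, fill_value=False)
```

The leaky version looks clairvoyant in a backtest for exactly the reason described below: it trades on numbers that did not exist at decision time.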

Here’s the Trap in One Image

→ A bar labeled at 10:00

→ But filled with data up to 10:59

→ Your strategy decides at 10:00

→ Using info that only exists at 11:00.

That’s not forecasting — it’s time travel.

Once you realize this, the “too good” curve suddenly looks less like alpha and more like cheating.
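In pandas terms, the trap falls straight out of the default resample labeling, which stamps each bar at the window's left edge. A small sketch with made-up minute prices:

```python
import pandas as pd

# Sixty fake minute prices covering 10:00-10:59 (illustrative data)
idx = pd.date_range("2025-01-01 10:00", periods=60, freq="min")
prices = pd.Series(range(60), index=idx, dtype=float)

# The hourly bar is *labeled* 10:00 by default...
hourly_high = prices.resample("1h").max()

# ...but its value (59.0) only becomes known at 10:59.
# A strategy that "decides at 10:00" using this bar is time-traveling.
# The fix: act only on the previous, completed bar.
safe_high = hourly_high.shift(1)
```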

backtest-guard: A 12-Line Linter for Backtest Honesty

After fixing this once, I wrote a minimal checker — think pylint, but for backtest integrity.

  • Timestamp Sanity — catches future data in same-row features. Example: close_5min used at t=10:00 but filled at 10:05.
  • Merge Integrity — catches joins that leak future values. Example: merging daily OHLC into 1-min bars without shift(1).
  • Signal Causality — catches decisions using non-available data. Example: a signal based on resample('H').max() without shift(1).
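To make the merge-leak case concrete, here is a minimal sketch with invented minute and daily data (this is my illustration, not the tool's code):

```python
import pandas as pd

# 1-minute bars for one session (illustrative)
minute = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(
        ["2025-01-02 09:30", "2025-01-02 09:31", "2025-01-02 09:32"]
    ),
)

# Daily closes stamped at the session date (illustrative)
daily_close = pd.Series(
    [10.0, 20.0],
    index=pd.to_datetime(["2025-01-01", "2025-01-02"]),
)

dates = minute.index.normalize()

# LEAKY: a naive date join hands every minute bar *today's* close,
# a number that does not exist until the session ends
minute["leaky_close"] = dates.map(daily_close)

# HONEST: shift the daily series first, so each minute bar
# sees only the previous completed session
minute["honest_close"] = dates.map(daily_close.shift(1))
```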

It doesn’t prove your backtest is perfect — but it catches the 80% of mistakes that make equity curves “too good to be true.”

From the outside, it’s deliberately minimal:

  • Input: .py strategy or .csv backtest log
  • Output: Plain-text report with actionable flags, e.g.:
      ⚠️ Rolling window without .shift(1) — likely lookahead
      ⚠️ Non-chronological timestamps. Example: row 42 (2025-01-01 10:00) < row 41 (2025-01-01 10:01)

No installation. No dependencies beyond pandas.

Just truth.
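For a sense of what one of those flags involves, here is my own illustrative sketch of a non-chronological-timestamp check (the actual gist code may differ; the column name is an assumption):

```python
import pandas as pd

def flag_nonchronological(df: pd.DataFrame, ts_col: str = "timestamp") -> list[str]:
    """Return one warning per row whose timestamp precedes the row before it.

    Assumes a default integer index; `ts_col` is an illustrative name.
    """
    ts = pd.to_datetime(df[ts_col])
    out_of_order = ts.diff() < pd.Timedelta(0)
    return [
        f"⚠️ Non-chronological timestamps: row {i} ({ts[i]}) < row {i - 1} ({ts[i - 1]})"
        for i in ts.index[out_of_order]
    ]

# Usage: row 1 jumps backwards in time, so it gets flagged
trades = pd.DataFrame(
    {"timestamp": ["2025-01-01 10:01", "2025-01-01 10:00", "2025-01-01 10:02"]}
)
print(flag_nonchronological(trades))
```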

Why This Works

  • Catches most disasters in <1 sec (they stem from timestamps, resampling, merges — not models)
  • Works on code and data — framework-agnostic
  • Pinpoints exact columns/rows for fast debugging or CI integration

This now runs before every serious backtest — like pytest, but for honesty.


Try It

📥 Gist: backtest-guard.py


```bash
python backtest-guard.py my_strategy.py
python backtest-guard.py backtest_trades.csv
# Or pipe: cat strategy.py | python backtest-guard.py -
```

If your equity curve drops 20% after running this — great.
You just saved weeks of chasing ghosts.

Breathe easier — or fix the leak before it’s too late.