My AI trading bot was showing 83% annualized returns in backtesting. I knew something was wrong.
No strategy consistently returns 83% annualized. So I audited my own backtest engine and found 4 distinct biases inflating results.
## Bias 1: Signal-at-Close, Entry-at-Close (Lookahead Bias)
My original entry logic:
```python
# WRONG
if signals[i] == 'BUY':
    entry_price = bars[i].close  # using bar i's close to fill bar i's own signal
```
A buy signal at the close of bar i cannot be acted on until the open of bar i+1.
```python
# CORRECT
if signals[i] == 'BUY':
    entry_price = bars[i + 1].open  # enter at the next bar's open
```
This single fix knocked returns down substantially.
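To make the execution rule concrete, here is a minimal, self-contained sketch of a backtest loop that acts on close-of-bar signals at the next bar's open. The `Bar` type and the same-bar exit are simplifications for illustration, not my production logic:

```python
from collections import namedtuple

Bar = namedtuple("Bar", ["open", "close"])  # simplified OHLC bar

def backtest_next_open(bars, signals):
    """Enter at the NEXT bar's open after a close-of-bar signal."""
    pnls = []
    for i in range(len(bars) - 1):        # stop one bar early: we need bar i+1
        if signals[i] == 'BUY':
            entry = bars[i + 1].open      # act on the signal at the next open
            exit_ = bars[i + 1].close     # toy exit: same bar's close
            pnls.append(exit_ - entry)
    return pnls
```

The key detail is the loop bound: a signal on the final bar is simply unfillable, so it never produces a trade.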
## Bias 2: Monte Carlo Over Bars Instead of Trades
Shuffling bars destroys time-series properties like autocorrelation and volatility clustering, so any statistics computed on the shuffled series are meaningless. The fix: shuffle trade P&L outcomes, not bars.
```python
import random
from itertools import accumulate

# CORRECT: shuffle trade outcomes, not bars
trade_pnls = run_backtest(bars)
random.shuffle(trade_pnls)                   # total P&L is order-invariant,
equity_curve = list(accumulate(trade_pnls))  # but the equity path (and drawdown) is not
```
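Note that a single shuffle only gives you one alternative equity path. A fuller sketch resamples trade P&Ls with replacement many times to estimate a distribution of total outcomes. The function name and the sample P&Ls below are hypothetical, stand-ins for whatever `run_backtest` produces:

```python
import random

def bootstrap_pnl(trade_pnls, n_sims=1000, seed=42):
    """Resample trade outcomes with replacement to estimate a P&L distribution."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        sample = rng.choices(trade_pnls, k=len(trade_pnls))  # with replacement
        totals.append(sum(sample))
    return totals

pnls = [120, -80, 45, -30, 200, -150, 60, -20]  # illustrative trade outcomes
dist = bootstrap_pnl(pnls)
dist.sort()
low, high = dist[50], dist[949]  # rough 5th-95th percentile band
```

If your realized backtest P&L sits in the far tail of this distribution, that itself is a warning sign.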
## Bias 3: Survivorship Bias in Universe Selection
Testing on today's S&P 500 composition means testing on today's winners. Fix: point-in-time universe construction, or exclude stocks added in the last 18 months from historical backtests.
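A point-in-time universe can be as simple as membership records with add/remove dates. The records below are illustrative (the Apple, Enron, and Tesla dates are approximate index-membership dates, included only to show the shape of the data):

```python
from datetime import date

# Hypothetical point-in-time membership records: (ticker, added, removed)
MEMBERSHIP = [
    ("AAPL", date(1982, 11, 30), None),
    ("ENRNQ", date(1997, 1, 2), date(2001, 11, 29)),  # delisted after collapse
    ("TSLA", date(2020, 12, 21), None),
]

def universe_on(as_of: date) -> set:
    """Tickers that were index members on the given date."""
    return {t for t, added, removed in MEMBERSHIP
            if added <= as_of and (removed is None or as_of < removed)}
```

The point: a backtest dated 2000 should see Enron and not Tesla, which is exactly what testing on today's composition gets wrong.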
## Bias 4: Parameter Fitting on Out-of-Sample Data
```python
# CORRECT: fit on the first 70%, evaluate on the held-out 30%
train_bars = bars[:int(len(bars) * 0.7)]
test_bars = bars[int(len(bars) * 0.7):]

best_params = optimize(strategy, train_bars)                   # optimize on train only
test_returns = run_backtest(strategy, test_bars, best_params)  # evaluate on test
```
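A stricter variant is walk-forward validation: re-optimize on a rolling training window and evaluate only on the window that follows it. This is a sketch, with `optimize` and `run_backtest` passed in as callables standing in for my actual functions:

```python
def walk_forward(bars, train_len, test_len, optimize, run_backtest):
    """Collect out-of-sample results from rolling re-optimization."""
    results = []
    start = 0
    while start + train_len + test_len <= len(bars):
        train = bars[start:start + train_len]
        test = bars[start + train_len:start + train_len + test_len]
        params = optimize(train)            # fit on past data only
        results.append(run_backtest(test, params))
        start += test_len                   # slide forward by one test window
    return results
```

Every evaluated bar is strictly after the bars its parameters were fitted on, which a single 70/30 split only guarantees once.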
## Results After Fixing All 4
| Metric | Before | After |
|---|---|---|
| Annualized Return | 83% | 12-18% |
| Sharpe Ratio | 4.2 | 0.9-1.3 |
| Max Drawdown | 4% | 18-22% |
Less exciting, but actually believable.
## The Meta-Lesson
If your backtest shows exceptional returns, it's almost certainly wrong:
- Returns >30% annualized: lookahead bias likely
- Returns >50% annualized: multiple biases, audit everything
- Sharpe >2.0: suspicious
- Max drawdown <5%: almost certainly wrong
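Those rules of thumb are easy to encode as an automatic smell test. The function below is a hypothetical helper, not part of any library, using the thresholds from the list above (metrics as fractions, e.g. 0.83 = 83%):

```python
def sanity_flags(annual_return, sharpe, max_drawdown):
    """Flag suspicious backtest metrics; returns a list of warnings."""
    flags = []
    if annual_return > 0.50:
        flags.append("returns >50% annualized: multiple biases, audit everything")
    elif annual_return > 0.30:
        flags.append("returns >30% annualized: lookahead bias likely")
    if sharpe > 2.0:
        flags.append("Sharpe >2.0: suspicious")
    if max_drawdown < 0.05:
        flags.append("max drawdown <5%: almost certainly wrong")
    return flags
```

My original backtest (83%, Sharpe 4.2, 4% drawdown) trips all three checks; the fixed one trips none.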
I built TradeSight (github.com/rmbell09-lang/tradesight) to run AI strategy tournaments and surface these issues automatically. Self-hosted, runs locally, no cloud needed.
Have you hit backtesting biases in your own projects? What did you find?