My AI trading bot was showing 83% annualized returns in backtesting. I knew something was wrong.
No strategy consistently returns 83% annualized. So I audited my own backtest engine and found 4 distinct biases inflating results.
## Bias 1: Signal-at-Close, Entry-at-Close (Lookahead Bias)
My original entry logic:
```python
# WRONG
if signals[i] == 'BUY':
    entry_price = bars[i].close  # using bar i's close to fill bar i's own signal
```
A buy signal at the close of bar i cannot be acted on until the open of bar i+1.
```python
# CORRECT
if signals[i] == 'BUY':
    entry_price = bars[i + 1].open  # enter at the next bar's open
```
This single fix knocked returns down substantially.
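To make the execution rule concrete, here is a minimal, self-contained sketch of a backtest loop that acts on close-of-bar signals at the next bar's open. The `Bar` type and the same-bar exit are simplifications for illustration, not my production logic:

```python
from collections import namedtuple

Bar = namedtuple("Bar", ["open", "close"])  # simplified OHLC bar

def backtest_next_open(bars, signals):
    """Enter at the NEXT bar's open after a close-of-bar signal."""
    pnls = []
    for i in range(len(bars) - 1):        # stop one bar early: we need bar i+1
        if signals[i] == 'BUY':
            entry = bars[i + 1].open      # act on the signal at the next open
            exit_ = bars[i + 1].close     # toy exit: same bar's close
            pnls.append(exit_ - entry)
    return pnls
```

The key detail is the loop bound: a signal on the final bar is simply unfillable, so it never produces a trade.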
## Bias 2: Monte Carlo Over Bars Instead of Trades
Shuffling bars destroys time-series properties like autocorrelation and volatility clustering, so any statistics computed on the shuffled series are meaningless. The fix: shuffle trade P&L outcomes, not bars.
```python
import random
from itertools import accumulate

# CORRECT: shuffle trade outcomes, not bars
trade_pnls = run_backtest(bars)
random.shuffle(trade_pnls)                   # total P&L is order-invariant,
equity_curve = list(accumulate(trade_pnls))  # but the equity path (and drawdown) is not
```
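Note that a single shuffle only gives you one alternative equity path. A fuller sketch resamples trade P&Ls with replacement many times to estimate a distribution of total outcomes. The function name and the sample P&Ls below are hypothetical, stand-ins for whatever `run_backtest` produces:

```python
import random

def bootstrap_pnl(trade_pnls, n_sims=1000, seed=42):
    """Resample trade outcomes with replacement to estimate a P&L distribution."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        sample = rng.choices(trade_pnls, k=len(trade_pnls))  # with replacement
        totals.append(sum(sample))
    return totals

pnls = [120, -80, 45, -30, 200, -150, 60, -20]  # illustrative trade outcomes
dist = bootstrap_pnl(pnls)
dist.sort()
low, high = dist[50], dist[949]  # rough 5th-95th percentile band
```

If your realized backtest P&L sits in the far tail of this distribution, that itself is a warning sign.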
## Bias 3: Survivorship Bias in Universe Selection
Testing on today's S&P 500 composition means testing on today's winners. Fix: point-in-time universe construction, or exclude stocks added in the last 18 months from historical backtests.
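A point-in-time universe can be as simple as membership records with add/remove dates. The records below are illustrative (the Apple, Enron, and Tesla dates are approximate index-membership dates, included only to show the shape of the data):

```python
from datetime import date

# Hypothetical point-in-time membership records: (ticker, added, removed)
MEMBERSHIP = [
    ("AAPL", date(1982, 11, 30), None),
    ("ENRNQ", date(1997, 1, 2), date(2001, 11, 29)),  # delisted after collapse
    ("TSLA", date(2020, 12, 21), None),
]

def universe_on(as_of: date) -> set:
    """Tickers that were index members on the given date."""
    return {t for t, added, removed in MEMBERSHIP
            if added <= as_of and (removed is None or as_of < removed)}
```

The point: a backtest dated 2000 should see Enron and not Tesla, which is exactly what testing on today's composition gets wrong.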
## Bias 4: Parameter Fitting on Out-of-Sample Data
```python
# CORRECT: fit on the first 70%, evaluate on the held-out 30%
train_bars = bars[:int(len(bars) * 0.7)]
test_bars = bars[int(len(bars) * 0.7):]

best_params = optimize(strategy, train_bars)                   # optimize on train only
test_returns = run_backtest(strategy, test_bars, best_params)  # evaluate on test
```
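A stricter variant is walk-forward validation: re-optimize on a rolling training window and evaluate only on the window that follows it. This is a sketch, with `optimize` and `run_backtest` passed in as callables standing in for my actual functions:

```python
def walk_forward(bars, train_len, test_len, optimize, run_backtest):
    """Collect out-of-sample results from rolling re-optimization."""
    results = []
    start = 0
    while start + train_len + test_len <= len(bars):
        train = bars[start:start + train_len]
        test = bars[start + train_len:start + train_len + test_len]
        params = optimize(train)            # fit on past data only
        results.append(run_backtest(test, params))
        start += test_len                   # slide forward by one test window
    return results
```

Every evaluated bar is strictly after the bars its parameters were fitted on, which a single 70/30 split only guarantees once.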
## Results After Fixing All 4
| Metric | Before | After |
|---|---|---|
| Annualized Return | 83% | 12-18% |
| Sharpe Ratio | 4.2 | 0.9-1.3 |
| Max Drawdown | 4% | 18-22% |
Less exciting, but actually believable.
## The Meta-Lesson
If your backtest shows exceptional returns, it's almost certainly wrong:
- Returns >30% annualized: lookahead bias likely
- Returns >50% annualized: multiple biases, audit everything
- Sharpe >2.0: suspicious
- Max drawdown <5%: almost certainly wrong
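Those rules of thumb are easy to encode as an automatic smell test. The function below is a hypothetical helper, not part of any library, using the thresholds from the list above (metrics as fractions, e.g. 0.83 = 83%):

```python
def sanity_flags(annual_return, sharpe, max_drawdown):
    """Flag suspicious backtest metrics; returns a list of warnings."""
    flags = []
    if annual_return > 0.50:
        flags.append("returns >50% annualized: multiple biases, audit everything")
    elif annual_return > 0.30:
        flags.append("returns >30% annualized: lookahead bias likely")
    if sharpe > 2.0:
        flags.append("Sharpe >2.0: suspicious")
    if max_drawdown < 0.05:
        flags.append("max drawdown <5%: almost certainly wrong")
    return flags
```

My original backtest (83%, Sharpe 4.2, 4% drawdown) trips all three checks; the fixed one trips none.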
I built TradeSight (github.com/rmbell09-lang/tradesight) to run AI strategy tournaments and surface these issues automatically. Self-hosted, runs locally, no cloud needed.
Have you hit backtesting biases in your own projects? What did you find?