DEV Community

Ray

I Found 4 Backtesting Biases in My AI Trading Bot (83% Returns to Realistic)

My AI trading bot was showing 83% annualized returns in backtesting. I knew something was wrong.

No strategy consistently returns 83% annualized. So I audited my own backtest engine and found 4 distinct biases inflating results.

Bias 1: Signal-at-Close, Entry-at-Close (Lookahead Bias)

My original entry logic:

# WRONG
if signals[i] == 'BUY':
    entry_price = bars[i].close  # Using bar i's close for bar i's signal

A buy signal at the close of bar i cannot be acted on until the open of bar i+1.

# CORRECT
if signals[i] == 'BUY' and i + 1 < len(bars):  # a signal on the last bar cannot be filled
    entry_price = bars[i + 1].open  # Enter at next bar's open

This single fix knocked returns down substantially.
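
To make the off-by-one concrete, here is a minimal sketch of a next-bar-open entry loop. The `Bar` dataclass and the `next_bar_entries` helper are hypothetical names for illustration, not TradeSight's actual API.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    close: float


def next_bar_entries(bars, signals):
    """Return (entry_index, entry_price) pairs, filling at the next bar's open."""
    entries = []
    # Stop one bar early: a signal on the final bar has no next open to fill at.
    for i in range(len(bars) - 1):
        if signals[i] == 'BUY':
            entries.append((i + 1, bars[i + 1].open))
    return entries


# Hypothetical bars and signals, for illustration only.
bars = [Bar(100.0, 101.0), Bar(102.0, 103.0), Bar(104.0, 105.0)]
signals = ['BUY', 'HOLD', 'BUY']
print(next_bar_entries(bars, signals))  # [(1, 102.0)]
```

The BUY on the last bar is dropped rather than filled at its own close, which is the conservative choice.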

Bias 2: Monte Carlo Over Bars Instead of Trades

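Shuffling bars destroys the time-series structure (trends, autocorrelation, volatility clustering) that the signals depend on. The fix: resample trade P&L outcomes, not bars. A plain shuffle is not enough if you only sum the result, because a sum ignores order, so draw trades with replacement instead.

# CORRECT
import random

trade_pnls = run_backtest(bars)
# Bootstrap trades, not bars: sample with replacement to build one simulated path
resampled = random.choices(trade_pnls, k=len(trade_pnls))
portfolio_return = sum(resampled)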
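
A fuller sketch of a trade-level Monte Carlo might look like the following. It assumes per-trade fractional returns (0.02 means +2%), resamples them with replacement, and reports max drawdown, a statistic that actually depends on the order of trades. The function names and the sample numbers are hypothetical.

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough drop of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def bootstrap_drawdowns(trade_returns, n_paths=1000, seed=0):
    """Resample trades with replacement; return sorted max drawdowns, one per path."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    n = len(trade_returns)
    paths = (rng.choices(trade_returns, k=n) for _ in range(n_paths))
    return sorted(max_drawdown(p) for p in paths)


# Hypothetical per-trade fractional returns, for illustration only.
trade_returns = [0.02, -0.01, 0.03, -0.02, 0.01, -0.015, 0.025]
dds = bootstrap_drawdowns(trade_returns)
print(f"median max drawdown: {dds[len(dds) // 2]:.2%}")
print(f"95th percentile:     {dds[int(0.95 * len(dds))]:.2%}")
```

Looking at the upper percentiles of the drawdown distribution, rather than the single historical path, gives a more honest picture of how bad a run of losing trades could get.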

Bias 3: Survivorship Bias in Universe Selection

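Testing on today's S&P 500 composition means testing on today's winners: companies that were delisted, acquired, or dropped from the index never appear in the sample. Fix: build the universe point-in-time, using the index membership as of each backtest date. A cruder fallback is to exclude stocks added in the last 18 months from historical backtests.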
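
As a rough sketch, point-in-time construction can be as simple as filtering membership records by date. The `MEMBERSHIP` table and `universe_on` helper below are hypothetical, and the tickers and dates are made up rather than real index history.

```python
from datetime import date

# Hypothetical membership records: (ticker, date_added, date_removed or None).
MEMBERSHIP = [
    ("AAA", date(2010, 1, 4), None),
    ("BBB", date(2012, 6, 1), date(2019, 3, 15)),
    ("CCC", date(2021, 9, 20), None),
]


def universe_on(as_of, membership=MEMBERSHIP):
    """Tickers that were index members on `as_of`, including ones removed later."""
    return sorted(
        ticker
        for ticker, added, removed in membership
        if added <= as_of and (removed is None or as_of < removed)
    )


print(universe_on(date(2015, 1, 2)))  # ['AAA', 'BBB']
print(universe_on(date(2022, 1, 3)))  # ['AAA', 'CCC']
```

A 2015 backtest bar sees BBB even though it later left the index, and does not see CCC, which had not been added yet.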

Bias 4: Parameter Fitting on Out-of-Sample Data

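Tuning parameters on the same bars you evaluate on means the test set is no longer out-of-sample. Split chronologically, optimize on the training slice only, and touch the test slice once:

split = int(len(bars) * 0.7)  # chronological split, no shuffling
train_bars = bars[:split]
test_bars = bars[split:]
best_params = optimize(strategy, train_bars)  # Optimize on train only
test_returns = run_backtest(strategy, test_bars, best_params)  # Evaluate on test, once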
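
Here is a self-contained sketch of the same idea. The toy momentum rule, the `backtest` and `optimize` helpers, and the price series are all hypothetical stand-ins for the `optimize` and `run_backtest` calls above, not the real engine.

```python
def backtest(prices, lookback):
    """Toy momentum rule on one price per bar.

    A signal computed on bar i is filled at bar i+1 and closed at bar i+2,
    so the signal bar's own price is never used as the entry price.
    """
    total = 0.0
    for i in range(lookback, len(prices) - 2):
        if prices[i] > prices[i - lookback]:
            total += prices[i + 2] / prices[i + 1] - 1.0
    return total


def optimize(prices, candidates):
    """Pick the lookback with the best summed return on the given slice."""
    return max(candidates, key=lambda lb: backtest(prices, lb))


def train_test_split(series, train_frac=0.7):
    """Chronological split: no shuffling, test data is strictly later than train."""
    split = int(len(series) * train_frac)
    return series[:split], series[split:]


# Hypothetical price series, for illustration only.
prices = [100, 101, 103, 102, 104, 107, 106, 108, 111, 110,
          109, 112, 115, 113, 116, 118, 117, 120, 119, 122]
train, test = train_test_split(prices)
best = optimize(train, candidates=[1, 2, 3, 5])  # fit on train only
print("best lookback:", best)
print("out-of-sample return:", round(backtest(test, best), 4))
```

The key property is that `optimize` never sees `test`. If you rerun the optimizer after looking at the test result, the test slice has become training data.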

Results After Fixing All 4

Metric             | Before | After
Annualized Return  | 83%    | 12-18%
Sharpe Ratio       | 4.2    | 0.9-1.3
Max Drawdown       | 4%     | 18-22%

Less exciting, but actually believable.
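
For reference, here is one common way to compute the first two metrics from a series of daily returns. This is a sketch under stated assumptions (252 trading days per year, zero risk-free rate), not necessarily how the numbers above were produced.

```python
import math
import statistics


def annualized_return(daily_returns, periods_per_year=252):
    """Geometric annualized return from per-period fractional returns."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (periods_per_year / len(daily_returns)) - 1.0


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)  # sample standard deviation
    return mean / stdev * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
daily = [0.01, -0.01, 0.02, 0.0]
print(f"annualized return: {annualized_return(daily):.2%}")
print(f"Sharpe ratio:      {sharpe_ratio(daily):.2f}")
```

With only a handful of observations both numbers are wildly noisy, which is itself a reason to distrust a high Sharpe from a short backtest.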

The Meta-Lesson

If your backtest shows exceptional returns, it's almost certainly wrong:

  • Returns >30% annualized: lookahead bias likely
  • Returns >50% annualized: multiple biases, audit everything
  • Sharpe >2.0: suspicious
  • Max drawdown <5%: almost certainly wrong
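
These rules of thumb are easy to turn into an automated check. The sketch below uses a hypothetical `audit_flags` helper with the thresholds from the list above hard-coded; inputs are fractions, so 0.83 means 83%.

```python
def audit_flags(annual_return, sharpe, max_drawdown):
    """Return warnings for metrics that cross the rule-of-thumb thresholds."""
    flags = []
    if annual_return > 0.50:
        flags.append("return >50%: multiple biases, audit everything")
    elif annual_return > 0.30:
        flags.append("return >30%: lookahead bias likely")
    if sharpe > 2.0:
        flags.append("Sharpe >2.0: suspicious")
    if max_drawdown < 0.05:
        flags.append("max drawdown <5%: almost certainly wrong")
    return flags


# The "before" numbers from the results table trip three checks.
for flag in audit_flags(annual_return=0.83, sharpe=4.2, max_drawdown=0.04):
    print(flag)
```

A clean result here does not prove the backtest is sound; it only means none of the obvious alarms went off.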

I built TradeSight (github.com/rmbell09-lang/tradesight) to run AI strategy tournaments and surface these issues automatically. Self-hosted, runs locally, no cloud needed.

Have you hit backtesting biases in your own projects? What did you find?
