What 180 Generations of Genetic Algorithm Trading Taught Me About Overfitting
I've been building an open-source genetic algorithm engine that evolves trading strategies. The idea is simple: instead of manually picking indicators and thresholds, let evolution find the optimal combination from 484 technical factors.
After 180 generations of evolution, here's what I learned.
The Setup
- 484 factors: RSI variants, MACD, volume patterns, order flow proxies, Bollinger derivatives, candlestick patterns, and more
- Walk-forward validation: train/test split per generation — no peeking at future data
- Multi-objective optimization: NSGA-III balancing return, drawdown, and turnover
- Running on: 500 A-share stocks and 17 crypto pairs simultaneously
Each generation takes about 2 hours. Three engines run in parallel on a single machine, pure Python, zero cloud cost.
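To make the walk-forward setup concrete, here's a minimal sketch of per-generation rolling train/test windows. The function name, window lengths, and indexing are my own illustration, not the engine's actual code:

```python
# Hypothetical sketch of walk-forward splitting over daily bars 0..n-1.
# Each test window starts strictly after its train window ends, so the
# fitness evaluation on `test` never sees data the strategy trained on.

def walk_forward_splits(n_days: int, train_len: int, test_len: int):
    """Yield (train, test) index ranges that roll forward through time."""
    start = 0
    while start + train_len + test_len <= n_days:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len  # roll forward by one test window

# e.g. 1000 days, 500-day train, 100-day test -> 5 non-peeking windows
splits = list(walk_forward_splits(n_days=1000, train_len=500, test_len=100))
```

The key property is that every test range begins exactly where its train range ends, so "no peeking at future data" holds by construction.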
The Bug That Showed 34,000% Returns
Around generation 69, the engine produced a strategy claiming 34,889% annual returns with only 7.38% max drawdown. Sharpe ratio of 8.36.
If you've done any backtesting, you know these numbers are insane. I was skeptical but excited.
Then I found the bug: look-ahead bias.
My backtest engine was using the closing price of day T to make buy decisions on day T. In reality, you can't know the closing price until the market closes — you'd have to buy at next day's open.
After fixing it, the same strategy dropped to ~1,000% annualized. That's still suspicious, and I'm continuing to investigate.
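The bug and the fix are easiest to see in a toy example. This is a deliberately simplified "buy when close > threshold" rule; the prices and function names are made up for illustration:

```python
# Toy illustration of the look-ahead bug: trading on day t using day t's
# close, which is only knowable after the market has already closed.

def buggy_signal(closes, threshold):
    # BUG: the day-t decision uses the day-t close itself.
    return [c > threshold for c in closes]

def fixed_signal(closes, threshold):
    # Fix: shift the signal by one day, so a decision computed from
    # day t's close is only acted on at day t+1 (e.g. the next open).
    raw = [c > threshold for c in closes]
    return [False] + raw[:-1]

closes = [100, 105, 101, 99, 110]
buggy_signal(closes, 104)  # fires on days 1 and 4, same day as the close
fixed_signal(closes, 104)  # fires on day 2 only; the day-4 signal would
                           # execute on day 5, outside this sample
```

The one-bar shift is cheap insurance: any signal computed from bar t must not be tradeable before bar t+1.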
Three Invariant Guards I Built
After that experience, I added runtime assertions that catch logical errors during evolution:
1. No Look-Ahead (assert_no_lookahead)
```python
def assert_no_lookahead(decision_day: int, data_day: int):
    """You can't use data from the future."""
    if data_day > decision_day:
        raise InvariantViolation(
            f"Look-ahead: decision at day {decision_day} "
            f"uses data from day {data_day}"
        )
```
2. Return Sanity Check (assert_return_reasonable)
```python
def assert_return_reasonable(annual_return_pct: float):
    """No strategy returns 10,000% per year in real markets."""
    if annual_return_pct > 500:
        raise InvariantViolation(
            f"Unreasonable return: {annual_return_pct:.1f}%"
        )
```
3. Factor Output Range (assert_factor_output_range)
```python
import math

def assert_factor_output_range(values, name="factor"):
    """All factor outputs must be in [0, 1] or NaN."""
    if isinstance(values, (int, float)):
        if not (0.0 <= values <= 1.0) and not math.isnan(values):
            raise InvariantViolation(f"{name} out of range: {values}")
        return
    for i, v in enumerate(values):
        if not math.isnan(v) and not (0.0 <= v <= 1.0):
            raise InvariantViolation(f"{name}[{i}] out of range: {v}")
```
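As a usage sketch, here's one way guards like these can be wired into a fitness evaluation loop, so a violated invariant invalidates the individual instead of silently inflating its score. The `InvariantViolation` class and the `evaluate` function below are my own illustration, not the engine's actual code:

```python
import math

class InvariantViolation(AssertionError):
    """Raised when a runtime invariant of the backtest is broken."""

def assert_factor_output_range(values, name="factor"):
    """All factor outputs must be in [0, 1] or NaN."""
    for i, v in enumerate(values):
        if not math.isnan(v) and not (0.0 <= v <= 1.0):
            raise InvariantViolation(f"{name}[{i}] out of range: {v}")

def evaluate(name, factor_values):
    """Toy fitness: mean factor value, guarded by the range invariant."""
    try:
        assert_factor_output_range(factor_values, name=name)
    except InvariantViolation as e:
        # Mark the individual as invalid rather than rewarding a bug.
        return float("-inf"), str(e)
    return sum(factor_values) / len(factor_values), None

fitness, err = evaluate("rsi_14", [0.2, 0.8, 1.5])  # 1.5 violates the range
```

The point is that a guard failure is a fitness of negative infinity, so evolution routes around buggy genomes instead of amplifying them.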
What Actually Matters in Factor-Based Evolution
After running 180 generations across multiple markets:
Walk-forward kills 90% of "good" strategies. Most GA-discovered strategies are overfitted garbage that only works on training data.
Volume factors consistently outperform price momentum. Across both crypto and equities, volume-based signals (accumulation/distribution, volume breakout, OBV divergence) persist through walk-forward better than pure price patterns.
Factor count doesn't equal edge. Going from 50 to 484 factors improved fitness by maybe 15%. The GA quickly learns to zero out irrelevant factors. Most runs converge to using 30-60 active factors.
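One way to picture "zeroing out irrelevant factors" is a genome that carries a weight per factor, where mutation driving a weight to zero effectively prunes that factor. This is a hypothetical representation for illustration, not the engine's actual genome encoding:

```python
# Hypothetical genome-as-weight-vector sketch: 484 slots, most driven
# to zero by evolution, leaving a small set of active factors.

def active_factors(weights, eps=1e-6):
    """Count factors whose weight is meaningfully non-zero."""
    return sum(1 for w in weights if abs(w) > eps)

genome = [0.0] * 484
for i in (3, 41, 127, 300):  # a handful of surviving factor slots
    genome[i] = 0.5

n_active = active_factors(genome)  # 4 of 484 slots carry signal
```

Under this picture, adding factors mostly adds slots that evolution will zero out anyway, which is consistent with runs converging to 30-60 active factors.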
The biggest risk is trusting your own backtest. Every non-trivial backtest has bugs. The question is whether you've found them yet.
Current Status
The engine is currently paper-trading on 16 crypto pairs via OKX (dry-run mode). Three evolution engines run 24/7 searching for better strategies.
I'm not publishing any DNA (strategy parameters) — those stay private. But the engine itself is fully open source: github.com/NeuZhou/finclaw
If you're doing evolutionary trading, I'd love to hear:
- Do you use walk-forward or just train/test split?
- How do you handle the "promising backtest, terrible live" gap?
- What's your overfitting detection approach?
This is part of an ongoing experiment. I'll post updates as paper trading results come in — good or bad.