What 180 Generations of Genetic Algorithm Trading Taught Me About Overfitting
I've been building an open-source genetic algorithm engine that evolves trading strategies. The idea is simple: instead of manually picking indicators and thresholds, let evolution find the optimal combination from 484 technical factors.
After 180 generations of evolution, here's what I learned.
The Setup
- 484 factors: RSI variants, MACD, volume patterns, order flow proxies, Bollinger derivatives, candlestick patterns, and more
- Walk-forward validation: train/test split per generation — no peeking at future data
- Multi-objective optimization: NSGA-III balancing return, drawdown, and turnover
- Running on: 500 A-share stocks and 17 crypto pairs simultaneously
Each generation takes about 2 hours. Three engines run in parallel on a single machine, pure Python, zero cloud cost.
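To make the walk-forward setup concrete, here's a minimal sketch of per-generation rolling train/test windows. The function name, window lengths, and indexing are my own illustration, not the engine's actual code:

```python
# Hypothetical sketch of walk-forward splitting over daily bars 0..n-1.
# Each test window starts strictly after its train window ends, so the
# fitness evaluation on `test` never sees data the strategy trained on.

def walk_forward_splits(n_days: int, train_len: int, test_len: int):
    """Yield (train, test) index ranges that roll forward through time."""
    start = 0
    while start + train_len + test_len <= n_days:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len  # roll forward by one test window

# e.g. 1000 days, 500-day train, 100-day test -> 5 non-peeking windows
splits = list(walk_forward_splits(n_days=1000, train_len=500, test_len=100))
```

The key property is that every test range begins exactly where its train range ends, so "no peeking at future data" holds by construction.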
The Bug That Showed 34,000% Returns
Around generation 69, the engine produced a strategy claiming 34,889% annual returns with only 7.38% max drawdown. Sharpe ratio of 8.36.
If you've done any backtesting, you know these numbers are insane. I was skeptical but excited.
Then I found the bug: look-ahead bias.
My backtest engine was using the closing price of day T to make buy decisions on day T. In reality, you can't know the closing price until the market closes — you'd have to buy at next day's open.
After fixing it, the same strategy dropped to ~1,000% annualized. That's still suspicious, and I'm continuing to investigate.
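The bug and the fix are easiest to see in a toy example. This is a deliberately simplified "buy when close > threshold" rule; the prices and function names are made up for illustration:

```python
# Toy illustration of the look-ahead bug: trading on day t using day t's
# close, which is only knowable after the market has already closed.

def buggy_signal(closes, threshold):
    # BUG: the day-t decision uses the day-t close itself.
    return [c > threshold for c in closes]

def fixed_signal(closes, threshold):
    # Fix: shift the signal by one day, so a decision computed from
    # day t's close is only acted on at day t+1 (e.g. the next open).
    raw = [c > threshold for c in closes]
    return [False] + raw[:-1]

closes = [100, 105, 101, 99, 110]
buggy_signal(closes, 104)  # fires on days 1 and 4, same day as the close
fixed_signal(closes, 104)  # fires on day 2 only; the day-4 signal would
                           # execute on day 5, outside this sample
```

The one-bar shift is cheap insurance: any signal computed from bar t must not be tradeable before bar t+1.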
Three Invariant Guards I Built
After that experience, I added runtime assertions that catch logical errors during evolution:
1. No Look-Ahead (assert_no_lookahead)
```python
def assert_no_lookahead(decision_day: int, data_day: int):
    """You can't use data from the future."""
    if data_day > decision_day:
        raise InvariantViolation(
            f"Look-ahead: decision at day {decision_day} "
            f"uses data from day {data_day}"
        )
```
2. Return Sanity Check (assert_return_reasonable)
```python
def assert_return_reasonable(annual_return_pct: float):
    """No strategy returns 10,000% per year in real markets."""
    if annual_return_pct > 500:
        raise InvariantViolation(
            f"Unreasonable return: {annual_return_pct:.1f}%"
        )
```
3. Factor Output Range (assert_factor_output_range)
```python
import math

def assert_factor_output_range(values, name="factor"):
    """All factor outputs must be in [0, 1] or NaN."""
    if isinstance(values, (int, float)):
        if not (0.0 <= values <= 1.0) and not math.isnan(values):
            raise InvariantViolation(f"{name} out of range: {values}")
        return
    for i, v in enumerate(values):
        if not math.isnan(v) and not (0.0 <= v <= 1.0):
            raise InvariantViolation(f"{name}[{i}] out of range: {v}")
```
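As a usage sketch, here's one way guards like these can be wired into a fitness evaluation loop, so a violated invariant invalidates the individual instead of silently inflating its score. The `InvariantViolation` class and the `evaluate` function below are my own illustration, not the engine's actual code:

```python
import math

class InvariantViolation(AssertionError):
    """Raised when a runtime invariant of the backtest is broken."""

def assert_factor_output_range(values, name="factor"):
    """All factor outputs must be in [0, 1] or NaN."""
    for i, v in enumerate(values):
        if not math.isnan(v) and not (0.0 <= v <= 1.0):
            raise InvariantViolation(f"{name}[{i}] out of range: {v}")

def evaluate(name, factor_values):
    """Toy fitness: mean factor value, guarded by the range invariant."""
    try:
        assert_factor_output_range(factor_values, name=name)
    except InvariantViolation as e:
        # Mark the individual as invalid rather than rewarding a bug.
        return float("-inf"), str(e)
    return sum(factor_values) / len(factor_values), None

fitness, err = evaluate("rsi_14", [0.2, 0.8, 1.5])  # 1.5 violates the range
```

The point is that a guard failure is a fitness of negative infinity, so evolution routes around buggy genomes instead of amplifying them.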
What Actually Matters in Factor-Based Evolution
After running 180 generations across multiple markets:
Walk-forward kills 90% of "good" strategies. Most GA-discovered strategies are overfitted garbage that only works on training data.
Volume factors consistently outperform price momentum. Across both crypto and equities, volume-based signals (accumulation/distribution, volume breakout, OBV divergence) persist through walk-forward better than pure price patterns.
Factor count doesn't equal edge. Going from 50 to 484 factors improved fitness by maybe 15%. The GA quickly learns to zero out irrelevant factors. Most runs converge to using 30-60 active factors.
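One way to picture "zeroing out irrelevant factors" is a genome that carries a weight per factor, where mutation driving a weight to zero effectively prunes that factor. This is a hypothetical representation for illustration, not the engine's actual genome encoding:

```python
# Hypothetical genome-as-weight-vector sketch: 484 slots, most driven
# to zero by evolution, leaving a small set of active factors.

def active_factors(weights, eps=1e-6):
    """Count factors whose weight is meaningfully non-zero."""
    return sum(1 for w in weights if abs(w) > eps)

genome = [0.0] * 484
for i in (3, 41, 127, 300):  # a handful of surviving factor slots
    genome[i] = 0.5

n_active = active_factors(genome)  # 4 of 484 slots carry signal
```

Under this picture, adding factors mostly adds slots that evolution will zero out anyway, which is consistent with runs converging to 30-60 active factors.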
The biggest risk is trusting your own backtest. Every non-trivial backtest has bugs. The question is whether you've found them yet.
Current Status
The engine is currently paper-trading on 16 crypto pairs via OKX (dry-run mode). Three evolution engines run 24/7 searching for better strategies.
I'm not publishing any DNA (strategy parameters) — those stay private. But the engine itself is fully open source: github.com/NeuZhou/finclaw
If you're doing evolutionary trading, I'd love to hear:
- Do you use walk-forward or just train/test split?
- How do you handle the "promising backtest, terrible live" gap?
- What's your overfitting detection approach?
This is part of an ongoing experiment. I'll post updates as paper trading results come in — good or bad.