Kang

Posted on Mar 26

I Let an Algorithm Evolve Trading Strategies for 127 Generations — Here's What Happened

#python #trading #algorithms #datascience

Generation 0 was garbage.

Thirty randomly generated trading strategies, each a different combination of RSI thresholds, MACD weights, stop-loss percentages, and holding periods. Most of them lost money. A few managed to break even by pure luck. The best one had a fitness score of 12.

I hit enter and went to make coffee.

What I Was Running

The setup: a genetic algorithm that breeds trading strategies. Each strategy is a "DNA": a vector of ~40 numerical parameters that control how stocks are scored, when to buy, when to sell, and how much risk to take. The algorithm mutates these numbers, backtests the mutated strategies against market data for 500 Chinese A-share stocks, keeps the top performers, and repeats.

Every parameter is readable. Here's what the scoring weights look like — each one controls how much a particular signal matters:

# From actual StrategyDNA — 40+ built-in weights
w_momentum = 0.0842       # RSI + slope
w_mean_reversion = 0.1205  # RSI oversold signals
w_bollinger = 0.0974       # Bollinger Band position
w_macd = 0.0631            # MACD golden/death cross
w_kdj = 0.0518             # KDJ golden cross
w_obv = 0.0293             # On-Balance Volume trend
w_mfi = 0.0334             # Money Flow Index
w_cci = 0.0688             # Commodity Channel Index
w_atr = 0.0412             # Volatility (Average True Range)
# ... plus ~30 more dimensions
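To make "how much a signal matters" concrete, here is a minimal sketch of a weighted score. It assumes each signal is already normalized to the 0..1 range and that the score is a plain weighted sum. The `score_stock` function and the signal readings are hypothetical illustrations, not FinClaw's actual scoring code.

```python
# Hypothetical sketch: combine normalized signals into one score.
# The weights are three of the values listed above; the signal readings are made up.
weights = {
    "momentum": 0.0842,
    "mean_reversion": 0.1205,
    "bollinger": 0.0974,
}

def score_stock(signals, weights):
    """Weighted sum of signals; a missing signal counts as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())

# A stock that looks oversold (strong mean-reversion and Bollinger readings).
oversold = {"momentum": 0.2, "mean_reversion": 0.9, "bollinger": 0.8}
# A stock that is running hot.
extended = {"momentum": 0.9, "mean_reversion": 0.1, "bollinger": 0.1}

print(score_stock(oversold, weights) > score_stock(extended, weights))
```

With these weights the oversold stock outranks the extended one, because the two largest weights sit on the signals where it reads high.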

Each generation: mutate these numbers by ±10-30%, backtest on a 70/30 walk-forward split, compute fitness, keep the top 5, breed the next 30.
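The loop described above can be sketched in a few lines. This is a toy version under stated assumptions: the DNA has only two made-up parameters, `toy_fitness` stands in for a real backtest, children come from mutation only (no crossover), and the five survivors are carried into the next generation unchanged. None of these names come from FinClaw.

```python
import random

POP_SIZE = 30   # strategies per generation (from the post)
ELITE = 5       # survivors per generation (from the post)

def mutate(dna, rng):
    """Scale each parameter up or down by a random 10-30%."""
    child = {}
    for name, value in dna.items():
        pct = rng.uniform(0.10, 0.30) * rng.choice([-1, 1])
        child[name] = value * (1 + pct)
    return child

def toy_fitness(dna):
    """Stand-in for a backtest: peaks at stop_loss=3.0, hold_days=5.0."""
    return -((dna["stop_loss"] - 3.0) ** 2 + (dna["hold_days"] - 5.0) ** 2)

def next_generation(population, rng):
    """Keep the top ELITE strategies and refill the rest with mutated copies."""
    elites = sorted(population, key=toy_fitness, reverse=True)[:ELITE]
    children = [mutate(rng.choice(elites), rng) for _ in range(POP_SIZE - ELITE)]
    return elites + children

rng = random.Random(42)
population = [
    {"stop_loss": rng.uniform(0.5, 10.0), "hold_days": rng.uniform(1.0, 20.0)}
    for _ in range(POP_SIZE)
]
start_best = max(toy_fitness(d) for d in population)
for _ in range(20):
    population = next_generation(population, rng)
end_best = max(toy_fitness(d) for d in population)
print(f"best fitness: {start_best:.3f} -> {end_best:.3f}")
```

Because the elites survive unchanged, the best fitness can never get worse from one generation to the next. That is a property of this sketch, not a claim about the real engine.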

Simple in concept. What happened over 127 generations was not simple.

Gen 0 → Gen 20: The Drunk Walk Phase

The first 20 generations were the algorithm flailing. Strategies would get lucky with one parameter combination, then their children would mutate in the wrong direction and die. Fitness scores bounced between 10 and 80 with no clear trend.

The algorithm was doing what evolution always does at the start: exploring. Trying wild combinations. RSI buy threshold at 10? Sure, let's try it. Hold period of 19 days? Why not. Stop-loss at 0.5%? That'll get stopped out every single day, but the algorithm doesn't know that yet.

Around generation 15, something shifted. A strategy stumbled onto a combination that worked: short holding periods (3-5 days), moderate stop-losses (2-3%), and heavy weighting on mean reversion signals. The fitness jumped to 200.

Its children immediately started inheriting those traits.

Gen 20 → Gen 50: Convergence

This is where it got interesting. The population started converging on a "body plan" — like how mammals all share the same basic skeleton even though a mouse and a whale look nothing alike.

Every top strategy had:

Holding periods between 3 and 7 days
Stop-losses between 2% and 4%
w_mean_reversion consistently in the top 3 weights
w_momentum as a secondary signal
Most fundamental factor weights (PE, PB, ROE) near zero

That last point surprised me. I'd given the algorithm access to fundamental data (price-to-earnings, return on equity, revenue growth) and it basically ignored all of it. After thinking about it, this makes sense. Fundamental data changes quarterly. If you're holding stocks for 3-7 days, PE ratios are noise.

The algorithm figured this out by itself. It didn't "know" that fundamentals are slow-moving. It just found that giving them weight didn't improve fitness, so evolution zeroed them out.

Fitness climbed to around 1,500 by generation 50.

Gen 50 → Gen 89: The Plateau and Breakthrough

Generations 50 through 75 were painful to watch. Fitness barely moved. The algorithm was stuck — trapped in what optimization people call a local optimum. Every small mutation made things slightly worse.

I'd built in a stagnation detector for exactly this situation. After 15 generations of no improvement, the engine replaces the two worst strategies in the population with completely random DNA. Fresh genetic material. It's the algorithmic equivalent of a new species arriving on an island.
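A minimal sketch of that escape hatch, assuming the population is held as (fitness, dna) pairs and that fresh DNA is marked as not yet backtested. The function names and the two-parameter DNA are hypothetical; only the 15-generation trigger and the two replacements come from the description above.

```python
import random

STAGNATION_LIMIT = 15  # generations without improvement (from the post)
N_REPLACED = 2         # worst strategies swapped out (from the post)

def random_dna(rng):
    """Hypothetical two-parameter DNA; the real one has ~40 parameters."""
    return {"stop_loss": rng.uniform(0.5, 10.0), "hold_days": rng.uniform(1.0, 20.0)}

def inject_if_stagnant(scored, gens_since_improvement, rng):
    """scored: list of (fitness, dna) pairs. Returns a new list of pairs.

    When the run has stalled, the N_REPLACED lowest-fitness entries are
    replaced with random DNA whose fitness is None (not yet backtested).
    """
    if gens_since_improvement < STAGNATION_LIMIT:
        return list(scored)
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    survivors = ranked[:-N_REPLACED]
    fresh = [(None, random_dna(rng)) for _ in range(N_REPLACED)]
    return survivors + fresh

rng = random.Random(0)
scored = [(float(i), random_dna(rng)) for i in range(30)]
stalled = inject_if_stagnant(scored, gens_since_improvement=15, rng=rng)
print(sum(1 for fitness, _ in stalled if fitness is None))
```

Replacing only the worst entries keeps the proven strategies intact, so a useless injection costs two population slots for one generation and nothing more.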

This kicked in twice. The first injection did nothing useful — the random strategies were terrible and got eliminated quickly. The second injection, around generation 72, brought in a DNA with an unusually high weight on w_bollinger (Bollinger Bands position) and w_support (support/resistance proximity).

One of its children combined that high Bollinger weight with the existing mean-reversion body plan.

Generation 89 hit fitness 3,253.

Annual return: 3,060%. Sharpe ratio: 6.36. Max drawdown: 18.4%.

I stared at the numbers for a while.

What Gen 89 Actually Discovered

Here's what the best strategy looked like once I decoded the DNA:

The scoring was dominated by mean reversion + volatility signals. Bollinger Band position (how close to the lower band), CCI (oversold readings), and mean reversion together made up about 35% of the scoring weight. The strategy was primarily a "buy the oversold dip in trending stocks" system.
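For readers who haven't worked with Bollinger Band position, here is one common way to turn "how close to the lower band" into a number. It is a sketch that assumes the textbook 20-bar window and 2 standard deviations; `bollinger_position` and `oversold_score` are illustrative names, and the engine's real factor may be defined differently.

```python
from statistics import mean, pstdev

def bollinger_position(closes, window=20, num_std=2.0):
    """Where the last close sits in the bands: 0.0 = lower band, 1.0 = upper band."""
    recent = closes[-window:]
    mid = mean(recent)
    width = num_std * pstdev(recent)
    if width == 0:
        return 0.5  # flat prices: treat as mid-band
    lower, upper = mid - width, mid + width
    return (closes[-1] - lower) / (upper - lower)

def oversold_score(closes):
    """Higher when price is nearer the lower band; clipped to [0, 1]."""
    pos = bollinger_position(closes)
    return min(1.0, max(0.0, 1.0 - pos))

dip = [10.0] * 19 + [9.0]     # sudden drop below the lower band
rally = [10.0] * 19 + [11.0]  # sudden jump above the upper band
print(oversold_score(dip), oversold_score(rally))
```

A score like this can then be multiplied by a weight such as w_bollinger, so a bigger weight makes dips toward the lower band count for more.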

Trend confirmation was a gatekeeper. The w_trend and w_r_squared weights were moderate — not dominant, but consistently non-zero. The strategy wouldn't buy a dip unless there was evidence of an underlying uptrend (high R² on the regression line, positive slope). It was buying temporary pullbacks in structurally bullish stocks.
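The gatekeeper idea can be sketched with an ordinary least-squares line through recent closes. This assumes the slope and R² come from regressing price on bar index and that the gate is a hard cutoff. The 0.6 threshold and the function names are my assumptions for illustration; in the evolved strategy these signals enter as weights, not as a fixed rule.

```python
def trend_stats(closes):
    """Least-squares line through closes vs. bar index. Returns (slope, r_squared)."""
    n = len(closes)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(closes) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, closes))
    syy = sum((y - y_mean) ** 2 for y in closes)
    slope = sxy / sxx
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared

def passes_trend_gate(closes, min_r_squared=0.6):
    """True only for a rising, reasonably clean trend (0.6 cutoff is an assumption)."""
    slope, r_squared = trend_stats(closes)
    return slope > 0 and r_squared >= min_r_squared

uptrend = [10 + 0.5 * i for i in range(30)]    # steady climb
downtrend = [25 - 0.5 * i for i in range(30)]  # steady decline
choppy = [10.0, 11.0] * 15                     # sideways zigzag
print(passes_trend_gate(uptrend), passes_trend_gate(downtrend), passes_trend_gate(choppy))
```

The choppy series has a slightly positive slope but an R² near zero, which is exactly the case the gate is meant to reject: a dip in a stock with no real trend behind it.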

Volume didn't matter much. w_volume and w_volume_profile were both below 0.02 — nearly zero. The algorithm decided that volume was mostly noise for short-term A-share trading. This goes against a lot of conventional technical analysis wisdom.

The stop-loss was tight: 2.8%. Get out fast if it goes against you. But the take-profit was wide: 18.7%, roughly 6.7 times the stop. Asymmetric risk-reward. It's the classic "cut losses short, let winners run" profile, but the algorithm found the exact numbers through brute-force exploration.
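A quick way to see what that asymmetry buys is to work out the break-even win rate. The sketch below assumes every trade exits at exactly the stop or the target and ignores fees and slippage. Real trades can also exit when the holding period runs out, so treat this as an idealized bound and not a description of the backtest.

```python
def breakeven_win_rate(stop_loss_pct, take_profit_pct):
    """Win rate at which expected profit per trade is zero (costs ignored)."""
    return stop_loss_pct / (stop_loss_pct + take_profit_pct)

def expected_return_pct(win_rate, stop_loss_pct, take_profit_pct):
    """Expected percent return per trade for a given win rate."""
    return win_rate * take_profit_pct - (1 - win_rate) * stop_loss_pct

stop_loss, take_profit = 2.8, 18.7  # Gen 89 values from the post
print(f"reward/risk: {take_profit / stop_loss:.2f}")
print(f"break-even win rate: {breakeven_win_rate(stop_loss, take_profit):.1%}")
```

Under those assumptions the break-even win rate is about 13%, so the strategy could be wrong on most trades and still come out ahead.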

Gen 89 → Gen 127: Diminishing Returns

After the peak at generation 89, improvements got marginal. The algorithm continued to refine parameters — shaving half a point off the stop-loss, tweaking weight distributions by 0.01 — but the big discoveries were done.

By generation 127, fitness was still around 3,253. The algorithm had essentially converged. Every mutation that improved one metric hurt another.

This is actually a good sign. Fitness holding steady while mutations kept nudging the parameters suggests the strategy had found a robust region in parameter space, not a razor-thin edge that would collapse with any small change. Robustness matters more than raw performance in live trading.

Then I Got Greedy: The Crypto Experiment

Emboldened by the A-share results, I spun up the same engine on cryptocurrency data. Four coins: BTC, ETH, BNB, SOL.

Generation 74. Fitness: 291,623. Annual return: 25,300%. Sharpe: 13.37.

I'm not going to pretend those numbers are real.

Four coins is not a market. It's a hand-picked sample. The algorithm found a strategy that worked beautifully on those four specific assets during the backtest period. It's overfitted. The returns are meaningless without out-of-sample validation on assets the algorithm has never seen.

So I'm restarting the crypto engine with 17 coins — BTC, ETH, BNB, SOL, ADA, XRP, DOGE, AVAX, DOT, LINK, UNI, LTC, ATOM, FIL, OP, ARB, APT. Same algorithm, same DNA structure, but with enough diversity that the results might actually mean something.

I don't have results yet. That's the honest answer.

What Surprised Me

The thing I keep coming back to: the algorithm ignored fundamentals almost entirely. I gave it PE ratios, ROE, revenue growth, debt ratios: 14 different fundamental factors. After 127 generations, every single one had a weight near zero. The algorithm ran 3,800+ backtests (30 strategies per generation across 127 generations) and concluded that for holding periods of a few days, fundamentals are noise. I didn't decide that. The math did.

The other thing was how breakthroughs worked. Gradual mutation found good hills. Random DNA injection, the stagnation escape mechanism, found better hills. The gen 89 breakthrough came from random genetic material crossed with the existing body plan, not from incremental improvement alone. That's humbling if you think about it.

Walk-forward validation turned out to be the single most important design choice. The 70/30 split with 60% validation weighting, plus the harsh 0.3x penalty when validation fitness drops below 30% of training fitness, killed thousands of overfitted strategies before they could contaminate the population. Without it, every result would have been fantasy.
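Here is one way those numbers could fit together. It is a sketch that assumes the "60% validation weighting" means a linear blend of training and validation fitness, and that the 0.3x penalty multiplies the blended score. The function names are hypothetical and the engine's exact formula may differ.

```python
def split_walk_forward(bars, train_pct=70):
    """Chronological split: the first 70% of bars train, the last 30% validate."""
    cut = len(bars) * train_pct // 100
    return bars[:cut], bars[cut:]

def combined_fitness(train_fit, val_fit, val_weight=0.60,
                     collapse_ratio=0.30, penalty=0.30):
    """Blend train and validation fitness, punishing a validation collapse."""
    blended = (1 - val_weight) * train_fit + val_weight * val_fit
    if val_fit < collapse_ratio * train_fit:
        blended *= penalty  # validation fell below 30% of training fitness
    return blended

train_bars, val_bars = split_walk_forward(list(range(10)))
print(len(train_bars), len(val_bars))

print(combined_fitness(1000, 800))  # holds up out of sample
print(combined_fitness(1000, 100))  # collapses out of sample, gets penalized
```

The split is by time, never shuffled, so the validation window only contains bars the strategy could not have seen during training. In the second case the penalty cuts a blended score of about 460 down to about 138, which is enough to push an overfitted strategy out of the top 5.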

And 127 generations? Took about 90 minutes. Each generation is 30 strategies tested across 500 stocks. You could run this overnight and have results with coffee the next morning. The bottleneck isn't compute — it's staring at the output trying to figure out if the numbers are real or if you fooled yourself again.
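The back-of-the-envelope arithmetic behind those figures, using only numbers stated in this post. It assumes the 90 minutes covers exactly 127 generations of 30 strategies each.

```python
GENERATIONS = 127
STRATEGIES_PER_GEN = 30
STOCKS = 500
RUNTIME_MINUTES = 90

backtests = GENERATIONS * STRATEGIES_PER_GEN  # strategy-level backtests
stock_runs = backtests * STOCKS               # one strategy on one stock
seconds_per_generation = RUNTIME_MINUTES * 60 / GENERATIONS
ms_per_stock_run = RUNTIME_MINUTES * 60 * 1000 / stock_runs

print(f"{backtests:,} backtests, {stock_runs:,} stock-level runs")
print(f"{seconds_per_generation:.1f} s per generation")
print(f"{ms_per_stock_run:.2f} ms per stock-level run")
```

That works out to 3,810 strategy backtests and roughly 1.9 million stock-level runs, at under 3 ms each.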

The Code

All of this runs on FinClaw, an open-source Python engine I'm building for exactly this kind of evolutionary strategy development. It has 484 factors across 33 categories, walk-forward validation, and dynamic factor discovery, where the system can even generate new factors during evolution.

It's a work in progress. The A-share side is solid. The crypto side needs the 17-coin rerun before I trust any numbers. Star it if you want to follow along.

If you want to run your own evolution, pip install finclaw and point it at your CSV data. The engine handles the rest.

DEV Community