We usually hear about Backpropagation and Gradient Descent when talking about training Neural Networks. They are the industry standards, the "calculus" way of learning.
But what if we took a step back and looked at how nature learns? 🌿
In this post, we are going to solve the classic XOR problem (a problem that a single neuron cannot solve) using a Genetic Algorithm (GA). We won't calculate a single gradient. Instead, we will simulate "survival of the fittest."
Let's breed some brains! ðŸ§
What is the XOR Problem?
Before we dive into the code, let's remember our target. The XOR (Exclusive OR) gate is a non-linear problem.
A simple linear line cannot separate these outputs. We need a Neural Network.
The Evolutionary Recipe
In a Genetic Algorithm, we treat our Neural Networks as "individuals" in a population.
- Fitness: How well does the network solve XOR?
- Selection: The best networks get to reproduce.
- Crossover: We mix the weights of two parents.
- Mutation: We add random noise to keep genetic diversity.
Let's look at the Python implementation.
(Note: This code assumes you have a basic NeuralNetwork class setup with a .forward() method and weight matrices W1, W2, etc.)
1. Determining "Fitness"
In nature, fitness might mean "speed" or "strength." In our code, fitness is the inverse of error. We want the Mean Squared Error (MSE) to be as low as possible.
import numpy as np
from basic_neural_network import NeuralNetwork
def calculate_fitness(nn):
"""
Measures the XOR performance of the network.
The lower the error, the higher the Fitness (Score).
"""
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_target = np.array([[0], [1], [1], [0]])
predictions = nn.forward(X_xor)
mse = np.mean((y_target - predictions) ** 2) # Mean Squared Error
# We add 1e-9 to avoid dividing by zero if we hit a perfect score
return 1 / (mse + 1e-9)
- Mutation: The Spark of InnovationIf we only bred the best existing networks, we would eventually get stuck. We need random "oopsies" to discover new, better weights. This is Mutation.We iterate through the weights ($W1, W2...$) and occasionally nudge them with random values.
def mutate(nn, mutation_rate=0.1, strength=0.5):
"""
Adds random noise to the network's weights.
"""
parameters = [nn.W1, nn.b1, nn.W2, nn.b2, nn.W3, nn.b3]
for param in parameters:
if np.random.rand() < mutation_rate:
# Add random Gaussian noise
noise = np.random.randn(*param.shape) * strength
param += noise
- Crossover: Breeding This is where the magic happens. We take two parent networks and create a child. In this simple implementation, the child's brain is literally the average of its parents.
def crossover(parent1, parent2):
"""
Produces a child by taking the average of two parents' weights.
"""
child = NeuralNetwork()
# Averaging weights acts as a simple way to combine traits
child.W1 = (parent1.W1 + parent2.W1) / 2
child.b1 = (parent1.b1 + parent2.b1) / 2
child.W2 = (parent1.W2 + parent2.W2) / 2
child.b2 = (parent1.b2 + parent2.b2) / 2
child.W3 = (parent1.W3 + parent2.W3) / 2
child.b3 = (parent1.b3 + parent2.b3) / 2
return child
The Main Loop: Evolution in Action
Now we run the simulation. We are using Elitism here—meaning the top 5 performers move to the next generation automatically. This guarantees our AI never gets "dumber" across generations.
Here is the strategy:
Sort population by score.
Keep the top 5 (Elites).
Fill the rest of the population by breeding the top 20%.
Mutate the children.
Repeat!
# --- MAIN LOOP ---
# 1. Initial Population
population_size = 50
population = [NeuralNetwork() for _ in range(population_size)]
generation_count = 1000
print("Evolution starting...")
for gen in range(generation_count):
# 2. Calculate scores for everyone and sort (Highest score first)
population = sorted(population, key=lambda x: calculate_fitness(x), reverse=True)
best_individual = population[0]
best_error = 1 / calculate_fitness(best_individual)
if gen % 100 == 0:
print(f"Gen {gen}: Lowest Error (MSE): {best_error:.5f}")
# Early exit if error is low enough
if best_error < 0.001:
print("Solution found!")
break
# 3. Create New Generation (Elitism + Crossover)
new_population = []
# Carry over the top 5 individuals directly (Elitism)
new_population.extend(population[:5])
# Fill the remaining 45 by breeding from the top performers
while len(new_population) < population_size:
# Select 2 parents randomly from the top 10 (top 20%)
parent1 = population[np.random.randint(0, 10)]
parent2 = population[np.random.randint(0, 10)]
child = crossover(parent1, parent2)
# High mutation rate helps explore the search space faster
mutate(child, mutation_rate=0.3, strength=0.2)
new_population.append(child)
population = new_population
The Result?
After running this, the "Best Individual" should have weights perfectly tuned to solve XOR without ever calculating a derivative!
# --- FINAL TEST ---
print("\n--- Training Complete ---")
X_test = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
result = population[0].forward(X_test)
print("XOR Inputs:")
print(X_test)
print("Trained Network Predictions:")
print(np.round(result, 3))
Why use this over Backpropagation?
To be honest, for XOR, Backpropagation is faster. However, Genetic Algorithms shine in scenarios where:
The reward is sparse: (e.g., You only know if you won the game at the very end).
The problem is not differentiable: You can't calculate gradients on discrete logic or complex physical simulations.
Conclusion
We just simulated natural selection to teach a computer logic. If that isn't cool, I don't know what is!
Have you experimented with Evolutionary code? Let me know in the comments below! 👇
Happy coding! 🚀
Photo by Warren Umoh on Unsplash


Top comments (0)