DEV Community

Cover image for Solving XOR without Backpropagation: A Genetic Algorithm Approach 🧬
İbrahim SEZER
İbrahim SEZER

Posted on

Solving XOR without Backpropagation: A Genetic Algorithm Approach 🧬

We usually hear about Backpropagation and Gradient Descent when talking about training Neural Networks. They are the industry standards, the "calculus" way of learning.

But what if we took a step back and looked at how nature learns? 🌿

In this post, we are going to solve the classic XOR problem (a problem that a single neuron cannot solve) using a Genetic Algorithm (GA). We won't calculate a single gradient. Instead, we will simulate "survival of the fittest."

Let's breed some brains! 🧠

What is the XOR Problem?

Before we dive into the code, let's remember our target. The XOR (Exclusive OR) gate is a non-linear problem.

Image of XOR logic gate truth table

A simple linear line cannot separate these outputs. We need a Neural Network.

The Evolutionary Recipe

In a Genetic Algorithm, we treat our Neural Networks as "individuals" in a population.

  1. Fitness: How well does the network solve XOR?
  2. Selection: The best networks get to reproduce.
  3. Crossover: We mix the weights of two parents.
  4. Mutation: We add random noise to keep genetic diversity.

Let's look at the Python implementation.

(Note: This code assumes you have a basic NeuralNetwork class setup with a .forward() method and weight matrices W1, W2, etc.)

1. Determining "Fitness"

In nature, fitness might mean "speed" or "strength." In our code, fitness is the inverse of error. We want the Mean Squared Error (MSE) to be as low as possible.

Fitness

import numpy as np
from basic_neural_network import NeuralNetwork

def calculate_fitness(nn):
    """
    Measures the XOR performance of the network.
    The lower the error, the higher the Fitness (Score).
    """
    X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y_target = np.array([[0], [1], [1], [0]])

    predictions = nn.forward(X_xor)
    mse = np.mean((y_target - predictions) ** 2)  # Mean Squared Error

    # We add 1e-9 to avoid dividing by zero if we hit a perfect score
    return 1 / (mse + 1e-9)
Enter fullscreen mode Exit fullscreen mode
  1. Mutation: The Spark of InnovationIf we only bred the best existing networks, we would eventually get stuck. We need random "oopsies" to discover new, better weights. This is Mutation.We iterate through the weights ($W1, W2...$) and occasionally nudge them with random values.
def mutate(nn, mutation_rate=0.1, strength=0.5):
    """
    Adds random noise to the network's weights.
    """
    parameters = [nn.W1, nn.b1, nn.W2, nn.b2, nn.W3, nn.b3]

    for param in parameters:
        if np.random.rand() < mutation_rate:
            # Add random Gaussian noise
            noise = np.random.randn(*param.shape) * strength
            param += noise
Enter fullscreen mode Exit fullscreen mode
  1. Crossover: Breeding This is where the magic happens. We take two parent networks and create a child. In this simple implementation, the child's brain is literally the average of its parents.
def crossover(parent1, parent2):
    """
    Produces a child by taking the average of two parents' weights.
    """
    child = NeuralNetwork()
    # Averaging weights acts as a simple way to combine traits
    child.W1 = (parent1.W1 + parent2.W1) / 2
    child.b1 = (parent1.b1 + parent2.b1) / 2
    child.W2 = (parent1.W2 + parent2.W2) / 2
    child.b2 = (parent1.b2 + parent2.b2) / 2
    child.W3 = (parent1.W3 + parent2.W3) / 2
    child.b3 = (parent1.b3 + parent2.b3) / 2
    return child
Enter fullscreen mode Exit fullscreen mode

The Main Loop: Evolution in Action
Now we run the simulation. We are using Elitism here—meaning the top 5 performers move to the next generation automatically. This guarantees our AI never gets "dumber" across generations.

Here is the strategy:

  • Sort population by score.

  • Keep the top 5 (Elites).

  • Fill the rest of the population by breeding the top 20%.

  • Mutate the children.

  • Repeat!

# --- MAIN LOOP ---

# 1. Initial Population
population_size = 50
population = [NeuralNetwork() for _ in range(population_size)]
generation_count = 1000

print("Evolution starting...")

for gen in range(generation_count):
    # 2. Calculate scores for everyone and sort (Highest score first)
    population = sorted(population, key=lambda x: calculate_fitness(x), reverse=True)

    best_individual = population[0]
    best_error = 1 / calculate_fitness(best_individual)

    if gen % 100 == 0:
        print(f"Gen {gen}: Lowest Error (MSE): {best_error:.5f}")
        # Early exit if error is low enough
        if best_error < 0.001:
            print("Solution found!")
            break

    # 3. Create New Generation (Elitism + Crossover)
    new_population = []

    # Carry over the top 5 individuals directly (Elitism)
    new_population.extend(population[:5])

    # Fill the remaining 45 by breeding from the top performers
    while len(new_population) < population_size:
        # Select 2 parents randomly from the top 10 (top 20%)
        parent1 = population[np.random.randint(0, 10)]
        parent2 = population[np.random.randint(0, 10)]

        child = crossover(parent1, parent2)

        # High mutation rate helps explore the search space faster
        mutate(child, mutation_rate=0.3, strength=0.2)
        new_population.append(child)

    population = new_population
Enter fullscreen mode Exit fullscreen mode

The Result?
After running this, the "Best Individual" should have weights perfectly tuned to solve XOR without ever calculating a derivative!

# --- FINAL TEST ---
print("\n--- Training Complete ---")
X_test = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
result = population[0].forward(X_test)

print("XOR Inputs:")
print(X_test)
print("Trained Network Predictions:")
print(np.round(result, 3))
Enter fullscreen mode Exit fullscreen mode

Why use this over Backpropagation?
To be honest, for XOR, Backpropagation is faster. However, Genetic Algorithms shine in scenarios where:

The reward is sparse: (e.g., You only know if you won the game at the very end).

The problem is not differentiable: You can't calculate gradients on discrete logic or complex physical simulations.

Conclusion
We just simulated natural selection to teach a computer logic. If that isn't cool, I don't know what is!

Have you experimented with Evolutionary code? Let me know in the comments below! 👇

Happy coding! 🚀

Photo by Warren Umoh on Unsplash

Top comments (0)