If you've tried to differentiate through a physics simulation in JAX, you've hit this:
```python
import jax.numpy as jnp

x = jnp.array([1.0, 2.0, 3.0])
x[0] = 10.0  # ERROR: JAX arrays are immutable
```
JAX requires pure functions. No mutation. No in-place updates.
But simulations are mutation. Particles have positions that change. Temperatures evolve. Pressures update. That's how physics works.
I kept hitting this wall. So I built a language that solves it.
## The Wall
I work in computational science. I write simulations — time-stepping loops where state updates every iteration. When I need to optimize parameters, I need gradients through those simulations.
Here's what that looks like with today's tools:
- **JAX** — Rewrite your entire simulation in a functional style. Every in-place update like `x[i] += delta` becomes `x = x.at[i].add(delta)`; every loop becomes `lax.scan`. Months of refactoring, and the code becomes unrecognizable.
- **Zygote (Julia)** — Handles some mutation, but silently produces incorrect gradients on other patterns. You don't get an error; you get a wrong answer. That's worse.
- **Hand-coded adjoints** — Write 10,000 lines of backward-pass code and maintain it in parallel with your forward solver. Every bug exists twice.
- **LAMMPS / OpenFOAM / GROMACS** — Industry-standard simulators, but black boxes: no gradients come out.
None of these options are acceptable. So I built a fifth one.
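To make the JAX option concrete, here is the shape that purity forces on a time-stepping loop, sketched in plain Python so it runs anywhere. `functools.reduce` stands in for `lax.scan`, and the constants `DT` and `K` and the function names are mine, not from any library:

```python
from functools import reduce

DT, K = 0.01, 1.0

# What you want to write: a time-stepping loop with in-place updates.
def simulate_mutating(k):
    x, v = 1.0, 0.0
    for _ in range(200):
        a = -k * x
        v += a * DT
        x += v * DT
    return x

# What the functional style forces: thread all state through a carry
# and express the loop as a scan (reduce stands in for lax.scan).
def step(carry, _):
    x, v = carry
    a = -K * x
    v = v + a * DT
    x = x + v * DT
    return (x, v)

x_final, _ = reduce(step, range(200), (1.0, 0.0))
assert abs(x_final - simulate_mutating(K)) < 1e-12
```

Both versions compute the same trajectory; the second is what every loop in your solver turns into after the rewrite.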
## Meridian
A programming language where grad works through mutation:
```
fn simulate(k: Float64) -> Float64 {
    var x = 1.0
    var v = 0.0
    for step in 0..200 {
        let a = -k * x
        v += a * 0.01   // mutation
        x += v * 0.01   // mutation
    }
    x
}

fn main() -> Float64 {
    grad(simulate, wrt: k)(1.0)
}
```
This compiles. The gradient matches finite differences to 10⁻¹⁰.
No functional rewrite. No tape management. No special syntax for the mutable parts. You write your simulation naturally. You call grad. It works.
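The finite-difference claim is easy to reproduce in plain Python: propagate the tangent dx/dk through the same loop by hand (forward-mode AD) and compare against a central difference. This is a sketch of the check, not of Meridian's machinery:

```python
def simulate(k):
    x, v = 1.0, 0.0
    for _ in range(200):
        a = -k * x
        v += a * 0.01
        x += v * 0.01
    return x

def simulate_tangent(k):
    # Propagate (x, dx/dk) and (v, dv/dk) together through the loop.
    x, v, dx, dv = 1.0, 0.0, 0.0, 0.0
    for _ in range(200):
        a = -k * x
        da = -x - k * dx   # product rule on a = -k * x
        v += a * 0.01
        dv += da * 0.01
        x += v * 0.01
        dx += dv * 0.01
    return x, dx

h = 1e-6
fd = (simulate(1.0 + h) - simulate(1.0 - h)) / (2 * h)
_, dx = simulate_tangent(1.0)
assert abs(fd - dx) < 1e-5
```

Note that the hand-written tangent code mirrors the mutating loop line for line, which is exactly the structure a compiler can generate mechanically.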
## How It Actually Works
The trick is SSA transformation — a standard compiler technique, repurposed for automatic differentiation.
When you write:
```
var x = 1.0
x += delta
x *= 3.0
```
The compiler transforms it to Static Single Assignment form:
```
%x.0 = const 1.0
%x.1 = add %x.0, %delta
%x.2 = mul %x.1, const(3.0)
```
In SSA, there is no mutation. Every "update" creates a new variable. The compiler then applies reverse-mode AD on this SSA code, producing new code that computes gradients.
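For that three-instruction trace, the generated gradient code amounts to one adjoint per SSA name, swept from the output backwards. A hand-written Python sketch of what such generated code computes (the names are mine, mirroring the SSA values above):

```python
def f(x0, delta):
    x1 = x0 + delta    # %x.1 = add %x.0, %delta
    x2 = x1 * 3.0      # %x.2 = mul %x.1, const(3.0)
    return x2

def f_grad(x0, delta):
    # Backward sweep: one adjoint (d out / d name) per SSA name.
    d_x2 = 1.0            # seed at the output
    d_x1 = d_x2 * 3.0     # mul by a constant scales the adjoint
    d_x0 = d_x1           # add passes adjoints through unchanged
    d_delta = d_x1
    return d_x0, d_delta

assert f_grad(1.0, 2.0) == (3.0, 3.0)   # f = 3*(x0 + delta), so both partials are 3
```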
The key insight: because the AD output is code (not a runtime tape), you can differentiate the differentiated code. This means:
- Second derivatives work: `d²/dx²(sin(x))` correctly returns `-sin(x)`
- Cross-variable nested `grad` works: differentiate forces with respect to potential parameters
- Arbitrary nesting: differentiate as many times as you want
These are exactly the patterns that are fragile in tape-based AD systems (the prototype I built first broke on them; PyTorch needs `create_graph=True` to handle them). The compiler approach fixes them by design.
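The second-derivative claim is easy to check numerically: nested central differences should give the same answer the compiler does. This is a verification sketch, not how Meridian computes it:

```python
import math

def d(f, h=1e-5):
    """Return a function approximating f' by a central difference."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

# The output of d is itself a function, so it can be differentiated again,
# the numerical analogue of differentiating the differentiated code.
d2_sin = d(d(math.sin))
assert abs(d2_sin(1.0) - (-math.sin(1.0))) < 1e-4   # ≈ -0.841
```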
## What Broke Along the Way
I built a prototype first — a Python package with tape-based AD. It passed 75 out of 80 tests. But it failed on the most important pattern:
```python
# The NNP force-matching pattern:
#   inner grad: forces = -d(energy)/d(positions)
#   outer grad: d(loss)/d(model_parameters)
def loss(weight):
    forces = grad(energy, wrt=positions)(pos, weight)  # inner
    return mse(forces, target)

grad(loss, wrt=weight)  # outer — BROKE
```
The tape couldn't see through the inner grad. The inner backward pass returned a plain number, disconnected from the outer tape. The gradient was wrong.
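Here is a toy reconstruction of that failure mode (not the prototype's actual code): a minimal tape whose `grad` returns its result as a fresh constant node, so the outer tape never sees the inner computation. It reproduces the wrong 0.540 from the table below:

```python
import math

class Var:
    """A node on a toy gradient tape (only sin is supported)."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.adj = val, parents, 0.0

def vsin(x):
    return Var(math.sin(x.val), [(x, math.cos(x.val))])

def grad(f):
    def df(x):
        x = x if isinstance(x, Var) else Var(x)
        out = f(x)
        stack = [(out, 1.0)]            # reverse sweep over the tape
        while stack:
            node, adj = stack.pop()
            node.adj += adj
            for parent, local in node.parents:
                stack.append((parent, adj * local))
        return Var(x.adj)   # a fresh constant: the link to any outer tape is lost
    return df

first = grad(vsin)(1.0).val                       # cos(1) ≈ 0.540, correct
second = grad(lambda x: grad(vsin)(x))(1.0).val   # should be -sin(1) ≈ -0.841
# second is 0.540 again: the outer sweep sees only a constant, and what
# comes back is the inner adjoint, silently wrong with no error raised
```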
The compiler fixes this. The inner grad produces SSA code. The outer grad transforms that code. No information is lost. The result:
| Test | Prototype | Compiler |
|---|---|---|
| d²/dx²(sin(x)) at x=1 | 0.540 (wrong) | -0.841 (correct) |
| d(force)/d(weight) | wrong | correct |
| MD loss through 10 steps | not testable | correct |
## 93 Tests, 8 Domains
I validated the language specification with 8 simulated domain experts:
- Neural network potentials — optimize force field parameters through MD
- Topology optimization — structural design with density gradients
- Climate science — calibrate cloud physics against satellite data
- Robotics — differentiate through contact-rich simulation
- Protein design — end-to-end differentiable sequence optimization
- Inverse rendering — material estimation from photographs
- CFD — shape optimization through Navier-Stokes
- Teaching — a professor who wants to assign `grad` as homework
Each expert reviewed the spec twice. 16 reviews total. The compiler passes domain-specific tests for all 8.
## Try It Right Now
```bash
git clone https://github.com/alfredopalconit/meridian.git
cd meridian

# Hello, Derivative: f(x) = x² + 3x + 1, f'(2) = 7
python3 compiler/main.py programs/01_hello_derivative.mer

# Grad through 100 timesteps of a pendulum simulation
python3 compiler/main.py programs/02_pendulum.mer

# Second derivative (broke the prototype, works in compiler)
python3 compiler/main.py programs/03_second_derivative.mer

# Cross-variable nested grad (the NNP pattern)
python3 compiler/main.py programs/04_cross_variable.mer

# Run all tests
python3 tests/test_mer_files.py
```
## What's Next
This is Phase 0. Scalars only. Interpreted. No GPU. The architecture is proven — now it needs to become useful.
| Milestone | Target |
|---|---|
| Tensor operations | Month 3 |
| Native binary via Cranelift | Month 4 |
| GPU via MLIR | Month 6 |
| First external user runs a real simulation | Month 12 |
## What I'm Looking For
I need computational scientists who have a simulation they can't differentiate. Not people who want a faster NumPy. People who have tried JAX and hit the mutation wall. People who maintain hand-coded adjoint solvers. People who gave up on getting gradients through their time-stepping loop.
If that's you — or you know someone — I'd love to hear what you'd try to differentiate first.
GitHub: github.com/alfredopalconit/meridian
Meridian is open source (MIT). Built from a blank page to a working compiler in one very long conversation with an AI. The spec, the prototype, the compiler, and all 93 tests are in the repo.