Mr. 0x1

Posted on Dec 24, 2025

VOPR: The Multiverse Machine That Kills Production Bugs

#testing #distributedsystems #zig #programming

How deterministic simulation testing lets you control time, fork reality, and hunt heisenbugs like a cosmic bounty hunter.

TL;DR: Your tests pass because they live in one boring universe. VOPR-style testing explores thousands of universes (with simulated time + deterministic replay + fault injection), so those "only happens in prod" bugs have far fewer places to hide. We'll prove it by building a test harness that needs only 17 universes to catch a bug that normal tests miss.

What You'll Learn

By the end of this article, you'll understand:

Why many production bugs are actually schedule bugs
What deterministic simulation testing (DST) is
How VOPR (from TigerBeetle) "forks reality"
How "alternative universes" make bugs reproducible
A working proof-of-concept you can run yourself

Cold Open: The Bug That Only Happens When You're Not Looking

There's a certain kind of production bug that isn't a bug.

It's a cryptid.

It appears only in production, usually at 3:07 AM, and it feeds exclusively on the emotional stability of whoever's on-call.

Unit tests pass ✅
Integration tests pass ✅
CI is green like it's powered by pure optimism ✅

Then production hits you with:

"lol no."

You try to reproduce it and the bug disappears like it heard you bought a camera.

This phenomenon has a name: the Heisenbug — a bug that seems to disappear or change behavior when you try to observe it. As the Ministry of Testing notes: "any attempt to observe or debug the issue could potentially change the behaviour of the application code."

So you do the traditional debugging ritual:

add logs
add retries
add timeouts
add more logs
sacrifice a Kubernetes pod

And the elders gather around the glowing terminal and say the ancient words:

"Works on my machine."

But what if your machine is the problem?

What if your test suite is green… because it only explores one timeline?

What if the bug lives in a different universe?

Welcome to VOPR: the multiverse machine that kills production bugs.

Why Time Is a Trash Goblin (And Your Tests Are Lying)

Many production-only bugs aren't logic bugs.

They're:

timing bugs
ordering bugs
concurrency bugs
retry bugs
timeout bugs
race conditions
scheduler gremlins

According to Wikipedia's article on race conditions:

"A race condition can be difficult to reproduce and debug because the end result is nondeterministic and depends on the relative timing between interfering threads. Problems of this nature can therefore disappear when running in debug mode, adding extra logging, or attaching a debugger."

Distributed systems don't just run your code. They run your code plus whatever order reality decides to deliver events in:

messages arrive late
packets reorder
disks stall
clocks drift
a leader election happens at the worst possible moment
your thread gets scheduled slightly differently

That means a lot of "random prod bugs" are actually:

Schedule bugs — they occur only when events line up in a rare order.

Your test suite runs one nice, stable ordering.

Production runs the entire multiverse.

Deterministic Simulation Testing (DST): The Big Reveal

Deterministic simulation testing is what happens when you stop begging reality to behave and start owning it.

Instead of running your system in the real world, you run it in a controlled simulation where:

time is simulated ✅
randomness is seeded ✅
events are scheduled deterministically ✅
faults are injected intentionally ✅
failures are replayable ✅

As Antithesis explains:

"In DST, some or all layers of the testing stack are made deterministic, including sources of non-determinism like clocks, thread interleaving, and system-provided sources of randomness."

If it fails once, it fails again the same way.

Already a game changer.

But DST gets weirder…

DST lets you fork reality.

The "Alternative Universes" Part (aka Forking Timelines)

At any moment in the simulation, you can:

Checkpoint the entire system state
Choose a different event ordering or fault
Run forward again

That creates alternate timelines:

Universe	What Happens
A	packet arrives before timeout
B	packet arrives after timeout
C	node crashes mid-commit
D	disk returns partial write
E	leader election happens during retry
F	scheduler wakes up cranky

Every universe is plausible.

And if a prod bug hides in one rare timeline…

DST gives you a way to find it: explore timelines deliberately, thousands at a time.

Instead of waiting months for cosmic rays to align, you generate the cursed timeline on purpose.
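Here's a minimal sketch of what "checkpoint, reorder, run forward" can look like. Everything in it (the `World` struct, its fields, `runUniverse`) is a made-up illustration, not VOPR's API. It assumes the whole system state fits in a plain value, so a copy is a checkpoint.

```zig
const std = @import("std");

// Hypothetical system state: a client waiting for a reply.
const World = struct {
    now_ms: u64 = 0,
    reply_received: bool = false,
    timed_out: bool = false,
    retries: u32 = 0,

    fn deliverReply(self: *World, at_ms: u64) void {
        self.now_ms = at_ms;
        // A reply that lands after the timeout is stale and gets ignored.
        if (!self.timed_out) self.reply_received = true;
    }

    fn fireTimeout(self: *World, at_ms: u64) void {
        self.now_ms = at_ms;
        if (!self.reply_received) {
            self.timed_out = true;
            self.retries += 1;
        }
    }
};

// Replay the same two events from a checkpoint, in either order.
fn runUniverse(checkpoint: World, reply_first: bool) World {
    var world = checkpoint; // value copy: the checkpoint itself is untouched
    if (reply_first) {
        world.deliverReply(40);
        world.fireTimeout(50);
    } else {
        world.fireTimeout(50);
        world.deliverReply(60);
    }
    return world;
}

pub fn main() void {
    const checkpoint = World{ .now_ms = 10 }; // shared history up to the fork
    const universe_a = runUniverse(checkpoint, true); // packet before timeout
    const universe_b = runUniverse(checkpoint, false); // packet after timeout

    std.debug.print("A: retries={d} reply={}\n", .{ universe_a.retries, universe_a.reply_received });
    std.debug.print("B: retries={d} reply={}\n", .{ universe_b.retries, universe_b.reply_received });
}
```

Same starting state, same two events, two different endings. Universe B is the one that retries and drops the late reply, which is exactly the kind of path a single-timeline test never walks.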

Enter VOPR: The Multiverse Machine

VOPR is TigerBeetle's deterministic simulator for distributed system testing.

Think: Chaos engineering + time travel + a microscope… but deterministic.

From TigerBeetle's safety documentation:

"TigerBeetle is tested in the VOPR — a simulated environment where an entire cluster, running real code, is subjected to all kinds of network, storage and process faults, at 1000x speed."

The name stands for Viewstamped Operation Replicator, and according to TigerBeetle's architecture docs, it was inspired by:

The movie WarGames
Years of fuzzing experience
Dropbox's Nucleus testing
FoundationDB's deterministic simulation testing

VOPR can:

delay messages
reorder messages
drop packets
crash nodes
restart mid-operation
corrupt disk writes
truncate writes
skew time
pause time
accelerate time

And crucially…

VOPR makes failures reproducible.

As Jepsen's analysis of TigerBeetle confirms:

"Tests which perform reproducible, pseudo-random operations against the system and ensure that some property holds."

It records the seed and the exact event schedule, so you can replay the failure perfectly.

It catches the bug. Puts it in a jar. Then shakes the jar.

Let's Prove It: Building Gremlin-DST

Enough theory. Let's build a DST harness that actually catches a bug that normal tests would miss.

We'll call it Gremlin-DST — a bank simulation where chaos gremlins inject faults, and we catch bugs by exploring thousands of universes.

copyleftdev / gremlin-dst

🧟 Deterministic Simulation Testing demo - A VOPR-style test harness that hunts bugs across 10,000 universes

🧟 Gremlin-DST: The Multiverse Bug Hunter

A Deterministic Simulation Testing (DST) proof-of-concept that demonstrates how VOPR-style testing finds bugs that hide in rare schedules.

"Normal tests run 1 universe. VOPR runs thousands."

"So those rare universes stop being rare. They become inevitable."

What This Demonstrates

This project is a companion to the article "VOPR: The Multiverse Machine That Kills Production Bugs" and showcases all the key DST concepts:

Concept	Implementation
Simulated Time	`clock.zig` - Deterministic clock that we control
Seeded Randomness	`prng.zig` - Reproducible RNG from a single seed
Deterministic Scheduling	`scheduler.zig` - Event queue with controlled ordering
Fault Injection	`gremlins.zig` - Chaos agents that attack the system
Invariant Checking	`bank.zig` - Laws that must hold across ALL universes
Multiverse Exploration	`main.zig` - Run 10,000 universes to find rare bugs

The Bug

The GremlinBank contains an intentional bug: a race condition in the transfer() function where a gremlin…


The Setup: Gremlin Financial Services

Our "system under test" is a simple bank with:

5 accounts, each with $10,000
A transfer() function that moves money between accounts
An intentional bug: transfer() isn't atomic, so a crash between the debit and the credit loses money (the "race" in this story)

// The bug: If a "gremlin" (simulated crash) attacks between 
// debiting the source and crediting the destination...
// money vanishes into the void.

// Debit source account
from_ptr.?.balance -= amount;

// THE BUG: Gremlin attacks here = money lost forever
if (self.bug_enabled and self.gremlins.maybeCrashNode()) {
    return .gremlin_interference;
}

// Credit destination account  
to_ptr.?.balance += amount;

This bug:

Passes unit tests (they don't trigger the race) ✅
Passes integration tests (same problem) ✅
Fails in production when the rare schedule occurs ❌
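To see why a happy-path test stays green, here's a stripped-down sketch of the same failure. `TinyBank` and the `crash_mid_transfer` flag are stand-ins invented for this example. In the real repo the crash decision comes from the gremlins, not a boolean argument.

```zig
const std = @import("std");

const TransferResult = enum { ok, insufficient_funds, gremlin_interference };

// Hypothetical two-account bank; `crash_mid_transfer` stands in for
// the gremlin check in the excerpt above.
const TinyBank = struct {
    balances: [2]i64 = .{ 10_000, 10_000 },

    fn transfer(self: *TinyBank, from: usize, to: usize, amount: i64, crash_mid_transfer: bool) TransferResult {
        if (self.balances[from] < amount) return .insufficient_funds;
        self.balances[from] -= amount;
        // Debit applied, credit never happens: the money is gone.
        if (crash_mid_transfer) return .gremlin_interference;
        self.balances[to] += amount;
        return .ok;
    }

    fn sum(self: *const TinyBank) i64 {
        return self.balances[0] + self.balances[1];
    }
};

pub fn main() void {
    var happy = TinyBank{};
    _ = happy.transfer(0, 1, 250, false);

    var cursed = TinyBank{};
    _ = cursed.transfer(0, 1, 250, true);

    std.debug.print("happy path total: {d}\n", .{happy.sum()});
    std.debug.print("crash path total: {d}\n", .{cursed.sum()});
}
```

A test that only ever passes `false` sees 20,000 before and 20,000 after. The crash path ends at 19,750, and nothing in the happy-path suite will ever tell you.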

The Four Pillars of Our DST Harness

1. Simulated Clock (clock.zig)

As the Javadoc for Java's Clock.fixed() puts it: "The main use case for this is in testing, where the fixed clock ensures tests are not dependent on the current clock."

pub const SimulatedClock = struct {
    current_time_ns: u64,

    pub fn advance(self: *SimulatedClock, duration_ns: u64) void {
        self.current_time_ns += duration_ns;
    }

    pub fn now(self: *const SimulatedClock) u64 {
        return self.current_time_ns;  // Deterministic!
    }
};

We control time like cosmic overlords. No more time.Now() — we decide when "now" is.
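As a usage sketch, here's a 30-second timeout tested in zero seconds of wall-clock time. The `Request` type and its `isExpired` helper are hypothetical and exist only for this example. The clock is the one from the excerpt, with a default start time of 0 added for convenience.

```zig
const std = @import("std");

const SimulatedClock = struct {
    current_time_ns: u64 = 0,

    pub fn advance(self: *SimulatedClock, duration_ns: u64) void {
        self.current_time_ns += duration_ns;
    }

    pub fn now(self: *const SimulatedClock) u64 {
        return self.current_time_ns;
    }
};

// Hypothetical request with a deadline, checked against the injected clock.
const Request = struct {
    deadline_ns: u64,

    fn isExpired(self: Request, clock: *const SimulatedClock) bool {
        return clock.now() >= self.deadline_ns;
    }
};

pub fn main() void {
    var clock = SimulatedClock{};
    const timeout_ns: u64 = 30 * std.time.ns_per_s;
    const request = Request{ .deadline_ns = clock.now() + timeout_ns };

    std.debug.print("expired at t=0s?  {}\n", .{request.isExpired(&clock)});
    clock.advance(29 * std.time.ns_per_s);
    std.debug.print("expired at t=29s? {}\n", .{request.isExpired(&clock)});
    clock.advance(1 * std.time.ns_per_s);
    std.debug.print("expired at t=30s? {}\n", .{request.isExpired(&clock)});
}
```

No sleeping, no flaky "give it an extra 500 ms" margins. The boundary between t=29s and t=30s is tested exactly.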

2. Seeded Randomness (prng.zig)

All chaos flows from a single seed. Same seed = same chaos = reproducible bugs.

const std = @import("std");

pub const ReplayableRng = struct {
    seed: u64,  // Write this down when bugs appear!
    rng: std.Random.Xoshiro256,

    pub fn init(seed: u64) ReplayableRng {
        return .{
            .seed = seed,
            .rng = std.Random.Xoshiro256.init(seed),
        };
    }
};

If you can't replay a failing test, you don't have a failing test. You have a ghost story.
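A quick way to convince yourself, assuming a Zig version where the PRNG lives at `std.Random.Xoshiro256` as in the excerpt above. The `drawSequence` helper is made up for this sketch.

```zig
const std = @import("std");

// Draw `out.len` values from a PRNG seeded with `seed`.
fn drawSequence(seed: u64, out: []u64) void {
    var prng = std.Random.Xoshiro256.init(seed);
    for (out) |*slot| slot.* = prng.next();
}

pub fn main() void {
    var first: [4]u64 = undefined;
    var replay: [4]u64 = undefined;
    var other: [4]u64 = undefined;

    drawSequence(42, &first);
    drawSequence(42, &replay);
    drawSequence(43, &other);

    std.debug.print("seed 42 replays identically:  {}\n", .{std.mem.eql(u64, &first, &replay)});
    std.debug.print("seed 43 differs from seed 42: {}\n", .{!std.mem.eql(u64, &first, &other)});
}
```

The catch: this only holds if every random decision in the system draws from that one generator, in the same order. One stray call to an unseeded source and the replay drifts.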

3. The Gremlin Horde (gremlins.zig)

Our fault injection engine. These little chaos agents represent everything that goes wrong in distributed systems:

Gremlin	What It Does
Network Delay	"Your packet? Stuck in the tubes."
Network Drop	"Your packet? I ate it."
Disk Fail	"Storage said 'no'."
Disk Corrupt	"Bits got scrambled."
Clock Skew	"Time is relative anyway."
Process Crash	"Segfault surprise!"

pub fn maybeCrashNode(self: *GremlinHorde) bool {
    return self.rng.chance(self.chaos_level * 0.02); // 2% at max chaos
}

In production, these gremlins attack randomly at 3 AM. In DST, we summon them deliberately.

4. Deterministic Scheduler (scheduler.zig)

Instead of letting the OS schedule things in whatever order it feels like, we control exactly when every event happens:

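pub fn scheduleAt(self: *DeterministicScheduler, time_ns: u64, payload: EventPayload) !void {
    const event = Event{
        .id = self.next_event_id,
        .scheduled_time_ns = time_ns,
        .payload = payload,
    };
    try self.event_queue.add(event);
    // Advance the id only after a successful add, so ids stay unique and
    // events scheduled for the same time can be tie-broken by insertion order.
    self.next_event_id += 1;
}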

No thread timing. No vibes. Just deterministic ordering.
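The part that's easy to miss is tie-breaking: two events at the same simulated time still need a fixed order. Here's a self-contained sketch using a fixed-size array instead of a priority queue. `TinyScheduler`, `popNext`, and the `label` field are assumptions for illustration, not the repo's code.

```zig
const std = @import("std");

const Event = struct { id: u64, scheduled_time_ns: u64, label: []const u8 };

// Hypothetical fixed-capacity scheduler; a real one would use a priority queue.
const TinyScheduler = struct {
    events: [8]Event = undefined,
    len: usize = 0,
    next_event_id: u64 = 0,

    fn scheduleAt(self: *TinyScheduler, time_ns: u64, label: []const u8) void {
        std.debug.assert(self.len < self.events.len);
        self.events[self.len] = .{ .id = self.next_event_id, .scheduled_time_ns = time_ns, .label = label };
        self.len += 1;
        self.next_event_id += 1;
    }

    // Pop the earliest event; ties on time are broken by insertion id.
    fn popNext(self: *TinyScheduler) ?Event {
        if (self.len == 0) return null;
        var best: usize = 0;
        for (self.events[0..self.len], 0..) |candidate, i| {
            const current = self.events[best];
            const earlier = candidate.scheduled_time_ns < current.scheduled_time_ns;
            const tie_won = candidate.scheduled_time_ns == current.scheduled_time_ns and candidate.id < current.id;
            if (earlier or tie_won) best = i;
        }
        const chosen = self.events[best];
        self.len -= 1;
        self.events[best] = self.events[self.len]; // swap-remove
        return chosen;
    }
};

pub fn main() void {
    var scheduler = TinyScheduler{};
    scheduler.scheduleAt(200, "timeout");
    scheduler.scheduleAt(100, "packet A");
    scheduler.scheduleAt(100, "packet B");

    while (scheduler.popNext()) |event| {
        std.debug.print("t={d} id={d} {s}\n", .{ event.scheduled_time_ns, event.id, event.label });
    }
}
```

Packet A and packet B share a timestamp, yet A always comes out first because it was scheduled first. If you want the other order, you schedule it that way (or let the seeded gremlins do it), and it happens on purpose.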

The Invariant: Laws of Physics

Instead of testing scenarios, we test physics — laws that must hold across ALL universes:

pub fn checkInvariants(self: *const GremlinBank) !void {
    var current_total: i64 = 0;

    // Invariant 1: No negative balances
    for (self.accounts) |account| {
        if (account.balance < 0) {
            return error.NegativeBalance;
        }
        current_total += account.balance;
    }

    // Invariant 2: Conservation of money
    // (total should NEVER change)
    if (current_total != self.initial_total_balance) {
        return error.MoneyConservationViolated;  // BUG FOUND!
    }
}

Running the Multiverse

Now we explore 10,000 universes, each with a different seed:

var seed: u64 = 42;
while (seed < 42 + 10_000) : (seed += 1) {
    const result = exploreUniverse(allocator, seed);

    if (!result.success) {
        // BUG FOUND! The seed makes it reproducible.
        print("Bug in Universe #{d}, Seed: {d}\n", .{seed, result.seed});
    }
}
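If you'd rather not clone the repo yet, here's a compact stand-in you can paste into one file. It is a sketch, not the repo's `exploreUniverse`: the workload size, the 2% crash chance, the 1,000-seed range, and the return value (money lost) are all assumptions chosen for the example.

```zig
const std = @import("std");

const accounts = 5;
const initial_balance: i64 = 10_000;
const transfers_per_universe = 20; // assumed workload size
const crash_chance = 0.02; // assumed per-transfer crash probability

// Run one universe; returns the money lost (0 means the invariant held).
fn exploreUniverse(seed: u64, bug_enabled: bool) i64 {
    var prng = std.Random.Xoshiro256.init(seed);
    const random = prng.random();
    var balances = [_]i64{initial_balance} ** accounts;

    var step: usize = 0;
    while (step < transfers_per_universe) : (step += 1) {
        // Draw every random decision up front so the stream stays aligned.
        const from = random.uintLessThan(usize, accounts);
        const to = random.uintLessThan(usize, accounts);
        const amount = random.intRangeAtMost(i64, 1, 1_000);
        const crash = random.float(f64) < crash_chance;
        if (from == to or balances[from] < amount) continue;

        balances[from] -= amount;
        if (bug_enabled and crash) continue; // crash between debit and credit
        balances[to] += amount;
    }

    var total: i64 = 0;
    for (balances) |balance| total += balance;
    return initial_balance * accounts - total;
}

pub fn main() void {
    var failing: u32 = 0;
    var seed: u64 = 42;
    while (seed < 42 + 1_000) : (seed += 1) {
        const lost = exploreUniverse(seed, true);
        if (lost != 0) {
            failing += 1;
            if (failing <= 3) std.debug.print("seed {d}: lost {d}\n", .{ seed, lost });
        }
    }
    std.debug.print("{d} of 1000 universes violated conservation\n", .{failing});
}
```

Flip `bug_enabled` to `false` and every universe comes back clean. Flip it back, pick any failing seed, and that universe loses the same amount every single time you run it.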

The Results

GREMLIN-DST: The Multiverse Bug Hunter
Exploring 10000 universes to find bugs...

Exploring universes...

BUG FOUND in Universe #42!
   Seed: 42 (save this for replay!)
   Error: money_conservation_violated
   Expected balance: 50000
   Actual balance:   49298
   Money lost:       702

BUG FOUND in Universe #47!
   Seed: 47 (save this for replay!)
   Money lost: 594

BUG FOUND in Universe #50!
BUG FOUND in Universe #54!
BUG FOUND in Universe #58!

SIMULATION RESULTS

  Universes explored:    17
  Bugs found:            5
  Bug rate:              29.41%

  SEEDS FOR REPLAY:
     - 42
     - 47
     - 50
     - 54
     - 58

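5 failing universes in just 17. That's one bug caught five times, not five different bugs: the run shown here ends at the fifth failure (seed 58), long before it gets through all 10,000 seeds.

In this small sample the bug shows up in 5 of 17 universes (~29%) at our chaos level. But a normal test suite? It runs one universe, probably a "nice" one where the gremlin doesn't attack mid-transfer.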

That's why the bug survives to production.

Why This Kills Production Bugs

Many production bugs live in rare schedules. They only happen when:

A happens before B
a timeout fires during leadership change
a retry overlaps a stale response
a crash happens between two "uninterruptible" steps
a message is delayed just long enough

These bugs are basically:

"Only happens in 1 out of 10,000 universes."

Normal tests run 1 universe.

VOPR runs thousands.

So those rare universes stop being rare. They become inevitable.

And that's the point.
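How inevitable, exactly? A back-of-the-envelope sketch, assuming each universe is an independent draw and the bug fires with a fixed per-universe probability (real seeds and fault models are messier). The chance of hitting it at least once in N universes is 1 - (1 - p)^N, and `hitProbability` below is just that formula.

```zig
const std = @import("std");

// Probability of hitting a bug at least once in `universes` independent runs,
// when each run triggers it with probability `per_universe`.
fn hitProbability(per_universe: f64, universes: u32) f64 {
    var miss_all: f64 = 1.0;
    var i: u32 = 0;
    while (i < universes) : (i += 1) miss_all *= 1.0 - per_universe;
    return 1.0 - miss_all;
}

pub fn main() void {
    const rare = 1.0 / 10_000.0;
    std.debug.print("1 universe:       {d:.4}\n", .{hitProbability(rare, 1)});
    std.debug.print("10,000 universes: {d:.4}\n", .{hitProbability(rare, 10_000)});
    std.debug.print("50,000 universes: {d:.4}\n", .{hitProbability(rare, 50_000)});
}
```

For a 1-in-10,000 bug, 10,000 universes gets you to roughly 63%, and 50,000 gets you past 99%. So "inevitable" is a function of how many universes you can afford, which is why simulation speed matters so much.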

The 3 Superpowers of VOPR-Style Testing

1) Reproducibility

No more:

"can't reproduce"
"it's flaky"
"it passed when I re-ran it"
"CI is haunted"

With DST: Seed 42 always produces the same bug.

2) Controllability

Time becomes a lever
Randomness becomes an input
Scheduling becomes observable

3) Coverage Through Universes

Instead of testing a few happy paths, you test the system across a massive range of plausible realities.

As Amplify Partners notes:

"TigerBeetle is one of the most pioneering startups on the planet when it comes to DST."

What You Can Steal Today (Even Without VOPR)

You don't need to build TigerBeetle to use this mindset.

WarpStream reports that FoundationDB "spent 18 months building a deterministic simulation framework for their database before ever letting it write or read data from an actual physical disk."

You don't need 18 months. Here's a progression:

Level 1: Inject a Clock

Stop doing this in core logic:

time.Now()
time.Sleep(...)

Instead, create:

a Clock interface
a real clock in prod
a fake clock in tests

As one engineer notes: "Inject dependencies that introduce non-determinism (time, randomness, I/O). Benefit: 100% deterministic time-based tests."
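In Zig, one lightweight way to do this is compile-time duck typing: core logic accepts anything with the right method. The names here (`RealClock`, `FakeClock`, `nowMs`, `sessionExpired`) are invented for the sketch, and `std.time.milliTimestamp()` is assumed to be available in your Zig version.

```zig
const std = @import("std");

// Production clock: reads the OS wall clock.
const RealClock = struct {
    pub fn nowMs(_: RealClock) i64 {
        return std.time.milliTimestamp();
    }
};

// Test clock: returns whatever the test sets.
const FakeClock = struct {
    now_ms: i64 = 0,

    pub fn nowMs(self: FakeClock) i64 {
        return self.now_ms;
    }
};

// Core logic takes any clock with a `nowMs()` method.
fn sessionExpired(clock: anytype, started_ms: i64, ttl_ms: i64) bool {
    return clock.nowMs() - started_ms >= ttl_ms;
}

pub fn main() void {
    var fake = FakeClock{ .now_ms = 1_000 };
    const started = fake.now_ms;
    std.debug.print("fresh:   {}\n", .{sessionExpired(fake, started, 60_000)});
    fake.now_ms += 60_000; // jump one minute without sleeping
    std.debug.print("expired: {}\n", .{sessionExpired(fake, started, 60_000)});

    const real = RealClock{};
    std.debug.print("real clock, just started: {}\n", .{sessionExpired(real, real.nowMs(), 60_000)});
}
```

`sessionExpired` never learns which clock it got. Production passes the real one, tests pass the fake one, and the only code that touches the OS clock is three lines you can read in one glance.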

Level 2: Seed and Log Randomness

If you use randomness anywhere:

seed it
log it
replay it

If you can't replay a failing test, you have a ghost story.

Level 3: Use a Deterministic Event Loop

Instead of thread timing, drive your system with:

an event queue
a deterministic scheduler
a simulated clock

Now your tests don't depend on "vibes."

Level 4: Multiverse Lite

Checkpoint state, then explore permutations:

different message orderings
different fault timings
different retry timing

This is the beginning of "forking reality."
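A tiny version of that idea, assuming just three in-flight messages so all six delivery orders can be brute-forced. The `Op` type and `exploreOrderings` are hypothetical. With more messages you'd sample orderings from a seed instead of enumerating them.

```zig
const std = @import("std");

const Op = union(enum) { set: i64, add: i64 };

fn apply(state: i64, op: Op) i64 {
    return switch (op) {
        .set => |value| value,
        .add => |delta| state + delta,
    };
}

// Replay three in-flight messages in every possible delivery order,
// always restarting from the same checkpointed state.
fn exploreOrderings(checkpoint: i64, ops: [3]Op, finals: *[6]i64) void {
    var n: usize = 0;
    for (0..3) |a| {
        for (0..3) |b| {
            for (0..3) |c| {
                if (a == b or b == c or a == c) continue;
                var state = checkpoint;
                for ([_]usize{ a, b, c }) |index| state = apply(state, ops[index]);
                finals[n] = state;
                n += 1;
            }
        }
    }
}

pub fn main() void {
    const ops = [3]Op{ .{ .set = 5 }, .{ .add = 1 }, .{ .add = 10 } };
    var finals: [6]i64 = undefined;
    exploreOrderings(0, ops, &finals);
    for (finals) |final_state| std.debug.print("final state: {d}\n", .{final_state});
}
```

Six orderings, four different final states. If your replicas are supposed to converge no matter how messages arrive, this is the cheapest possible test that tells you they don't.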

The Mindset Shift: Stop Testing Scenarios, Start Testing Physics

This is where testing levels up.

Instead of writing tests like:

"Does it work in this situation?"

You write invariants like:

committed data is never lost
operations are idempotent
balances never go negative
total money in system never changes
state machines never enter invalid states
replicas converge
recovery never violates correctness

Then you throw the system into 10,000 universes.

If the laws hold: you have real evidence (not proof) that your system is robust against the faults you simulated.

If the laws break: you found a production bug before production did.
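Invariants aren't limited to sums. Here's a sketch of "operations are idempotent" written as a checkable law. The `Ledger` type, its request-id dedup table, and `checkIdempotent` are all made up for illustration.

```zig
const std = @import("std");

// Hypothetical ledger that deduplicates deposits by request id.
const Ledger = struct {
    balance: i64 = 0,
    seen: [16]u64 = undefined,
    seen_len: usize = 0,

    fn deposit(self: *Ledger, request_id: u64, amount: i64) void {
        for (self.seen[0..self.seen_len]) |id| {
            if (id == request_id) return; // duplicate delivery: ignore
        }
        std.debug.assert(self.seen_len < self.seen.len);
        self.seen[self.seen_len] = request_id;
        self.seen_len += 1;
        self.balance += amount;
    }
};

// Invariant: delivering any request a second time must not change state.
fn checkIdempotent(ledger: Ledger, request_id: u64, amount: i64) !void {
    var once = ledger;
    once.deposit(request_id, amount);
    var twice = once;
    twice.deposit(request_id, amount);
    if (twice.balance != once.balance) return error.NotIdempotent;
}

pub fn main() !void {
    var ledger = Ledger{};
    ledger.deposit(1, 500);
    try checkIdempotent(ledger, 2, 250); // fresh request, retried once
    try checkIdempotent(ledger, 1, 500); // request that was already applied
    std.debug.print("idempotency invariant held, balance={d}\n", .{ledger.balance});
}
```

The check works on copies, so you can run it after every simulated step without disturbing the universe you're in. Delete the dedup loop and the law breaks on the first retry.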

Try It Yourself

The complete Gremlin-DST proof-of-concept is available on GitHub:

# Clone and run with Zig
git clone https://github.com/copyleftdev/gremlin-dst.git
cd gremlin-dst
zig build run

# Or with Docker
docker build -t gremlin-dst .
docker run --rm gremlin-dst

The code demonstrates:

File	Concept
`clock.zig`	Simulated time
`prng.zig`	Seeded randomness
`scheduler.zig`	Deterministic event scheduling
`gremlins.zig`	Fault injection
`bank.zig`	System under test + invariants
`main.zig`	Multiverse explorer

Final Words from the Graybeard Cave

Production bugs survive in the dark.

They survive in rare schedules.

They survive in weird timing.

VOPR-style deterministic simulation testing turns on the lights.

And once you can fork reality…

The bug has nowhere left to hide.

Let's Discuss

Have you ever had a bug that:

only appeared in production?
vanished when you added logging?
depended on weird timing or retries?

What's the worst "scheduler gremlin" story you've got?
