How deterministic simulation testing lets you control time, fork reality, and hunt heisenbugs like a cosmic bounty hunter.
TL;DR: Your tests pass because they live in one boring universe. VOPR-style testing explores thousands of universes (with simulated time + deterministic replay + fault injection), so those "only happens in prod" bugs can't hide anymore. We'll prove it by building a test harness that catches the same bug in 5 of the first 17 universes it explores, a bug that a single-universe test run can easily miss.
What You'll Learn
By the end of this article, you'll understand:
- Why many production bugs are actually schedule bugs
- What deterministic simulation testing (DST) is
- How VOPR (from TigerBeetle) "forks reality"
- How "alternative universes" make bugs reproducible
- How to run a working proof-of-concept yourself
Cold Open: The Bug That Only Happens When You're Not Looking
There's a certain kind of production bug that isn't a bug.
It's a cryptid.
It appears only in production, usually at 3:07 AM, and it feeds exclusively on the emotional stability of whoever's on-call.
- Unit tests pass ✅
- Integration tests pass ✅
- CI is green like it's powered by pure optimism ✅
Then production hits you with:
"lol no."
You try to reproduce it and the bug disappears like it heard you bought a camera.
This phenomenon has a name: the Heisenbug — a bug that seems to disappear or change behavior when you try to observe it. As the Ministry of Testing notes: "any attempt to observe or debug the issue could potentially change the behaviour of the application code."
So you do the traditional debugging ritual:
- add logs
- add retries
- add timeouts
- add more logs
- sacrifice a Kubernetes pod
And the elders gather around the glowing terminal and say the ancient words:
"Works on my machine."
But what if your machine is the problem?
What if your test suite is green… because it only explores one timeline?
What if the bug lives in a different universe?
Welcome to VOPR: the multiverse machine that kills production bugs.
Why Time Is a Trash Goblin (And Your Tests Are Lying)
Many production-only bugs aren't logic bugs.
They're:
- timing bugs
- ordering bugs
- concurrency bugs
- retry bugs
- timeout bugs
- race conditions
- scheduler gremlins
According to Wikipedia's article on race conditions:
"A race condition can be difficult to reproduce and debug because the end result is nondeterministic and depends on the relative timing between interfering threads. Problems of this nature can therefore disappear when running in debug mode, adding extra logging, or attaching a debugger."
Distributed systems don't just run your code. They run your code plus whatever order reality decides to deliver events in:
- messages arrive late
- packets reorder
- disks stall
- clocks drift
- a leader election happens at the worst possible moment
- your thread gets scheduled slightly differently
That means a lot of "random prod bugs" are actually:
Schedule bugs — they occur only when events line up in a rare order.
Your test suite runs one nice, stable ordering.
Production runs the entire multiverse.
Deterministic Simulation Testing (DST): The Big Reveal
Deterministic simulation testing is what happens when you stop begging reality to behave and start owning it.
Instead of running your system in the real world, you run it in a controlled simulation where:
- time is simulated ✅
- randomness is seeded ✅
- events are scheduled deterministically ✅
- faults are injected intentionally ✅
- failures are replayable ✅
"In DST, some or all layers of the testing stack are made deterministic, including sources of non-determinism like clocks, thread interleaving, and system-provided sources of randomness."
If it fails once, it fails again the same way.
Already a game changer.
But DST gets weirder…
DST lets you fork reality.
The "Alternative Universes" Part (aka Forking Timelines)
At any moment in the simulation, you can:
- Checkpoint the entire system state
- Choose a different event ordering or fault
- Run forward again
That creates alternate timelines:
| Universe | What Happens |
|---|---|
| A | packet arrives before timeout |
| B | packet arrives after timeout |
| C | node crashes mid-commit |
| D | disk returns partial write |
| E | leader election happens during retry |
| F | scheduler wakes up cranky |
Every universe is plausible.
And if a prod bug hides in one rare timeline…
DST gives you a real chance of finding it, because it explores timelines systematically instead of waiting for one to happen.
Instead of waiting months for cosmic rays to align, you generate the cursed timeline on purpose.
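The checkpoint, choose, run-forward loop can be sketched in a few lines of Go. Everything here (`State`, `Event`, `runFrom`) is a hypothetical toy model of universes A and B from the table, not VOPR's actual mechanism.

```go
package main

import "fmt"

// State is a tiny, hypothetical system state: one in-flight request.
type State struct {
	Pending   bool // request still waiting for a reply
	Retries   int  // retries triggered by timeouts
	Delivered int  // replies applied
}

type Event string

const (
	Reply   Event = "reply"
	Timeout Event = "timeout"
)

// apply advances the state by one event and returns the new state.
func apply(s State, e Event) State {
	switch e {
	case Reply:
		if s.Pending {
			s.Pending = false
			s.Delivered++
		}
	case Timeout:
		if s.Pending {
			s.Retries++
		}
	}
	return s
}

// runFrom forks a universe: it copies the checkpoint and replays one ordering.
// The copy is safe because State holds only plain values, no pointers.
func runFrom(checkpoint State, events []Event) State {
	s := checkpoint
	for _, e := range events {
		s = apply(s, e)
	}
	return s
}

func main() {
	checkpoint := State{Pending: true}
	a := runFrom(checkpoint, []Event{Reply, Timeout}) // packet before timeout
	b := runFrom(checkpoint, []Event{Timeout, Reply}) // packet after timeout
	fmt.Printf("universe A: %+v\n", a)
	fmt.Printf("universe B: %+v\n", b)
	fmt.Printf("checkpoint: %+v\n", checkpoint)
}
```

Both universes start from the identical checkpoint, yet only universe B ends with a retry on the books. That difference is the kind of thing an invariant can then judge.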
Enter VOPR: The Multiverse Machine
VOPR is TigerBeetle's deterministic simulator for distributed system testing.
Think: Chaos engineering + time travel + a microscope… but deterministic.
From TigerBeetle's safety documentation:
"TigerBeetle is tested in the VOPR — a simulated environment where an entire cluster, running real code, is subjected to all kinds of network, storage and process faults, at 1000x speed."
The name stands for Viewstamped Operation Replicator, and according to TigerBeetle's architecture docs, it was inspired by:
- The movie WarGames
- Years of fuzzing experience
- Dropbox's Nucleus testing
- FoundationDB's deterministic simulation testing
VOPR can:
- delay messages
- reorder messages
- drop packets
- crash nodes
- restart mid-operation
- corrupt disk writes
- truncate writes
- skew time
- pause time
- accelerate time
And crucially…
VOPR makes failures reproducible.
As Jepsen's analysis of TigerBeetle confirms:
"Tests which perform reproducible, pseudo-random operations against the system and ensure that some property holds."
It records the seed and the exact event schedule, so you can replay the failure perfectly.
It catches the bug. Puts it in a jar. Then shakes the jar.
Let's Prove It: Building Gremlin-DST
Enough theory. Let's build a DST harness that actually catches a bug that normal tests would miss.
We'll call it Gremlin-DST — a bank simulation where chaos gremlins inject faults, and we catch bugs by exploring thousands of universes.
copyleftdev/gremlin-dst
🧟 Deterministic Simulation Testing demo - A VOPR-style test harness that hunts bugs across 10,000 universes
🧟 Gremlin-DST: The Multiverse Bug Hunter
A Deterministic Simulation Testing (DST) proof-of-concept that demonstrates how VOPR-style testing finds bugs that hide in rare schedules.
"Normal tests run 1 universe. VOPR runs thousands."
"So those rare universes stop being rare. They become inevitable."
What This Demonstrates
This project is a companion to the article "VOPR: The Multiverse Machine That Kills Production Bugs" and showcases all the key DST concepts:
| Concept | Implementation |
|---|---|
| Simulated Time | clock.zig - Deterministic clock that we control |
| Seeded Randomness | prng.zig - Reproducible RNG from a single seed |
| Deterministic Scheduling | scheduler.zig - Event queue with controlled ordering |
| Fault Injection | gremlins.zig - Chaos agents that attack the system |
| Invariant Checking | bank.zig - Laws that must hold across ALL universes |
| Multiverse Exploration | main.zig - Run 10,000 universes to find rare bugs |
The Bug
The GremlinBank contains an intentional bug: a race condition in the transfer() function where a gremlin…
The Setup: Gremlin Financial Services
Our "system under test" is a simple bank with:
- 5 accounts, each with $10,000
- A transfer() function that moves money between accounts
- An intentional bug: a race condition between debit and credit
// The bug: If a "gremlin" (simulated crash) attacks between
// debiting the source and crediting the destination...
// money vanishes into the void.
// Debit source account
from_ptr.?.balance -= amount;
// THE BUG: Gremlin attacks here = money lost forever
if (self.bug_enabled and self.gremlins.maybeCrashNode()) {
    return .gremlin_interference;
}
// Credit destination account
to_ptr.?.balance += amount;
This bug:
- Passes unit tests (they don't trigger the race) ✅
- Passes integration tests (same problem) ✅
- Fails in production when the rare schedule occurs ❌
The Four Pillars of Our DST Harness
1. Simulated Clock (clock.zig)
As the Java documentation for Clock.fixed states: "The main use case for this is in testing, where the fixed clock ensures tests are not dependent on the current clock."
pub const SimulatedClock = struct {
    current_time_ns: u64,

    pub fn advance(self: *SimulatedClock, duration_ns: u64) void {
        self.current_time_ns += duration_ns;
    }

    pub fn now(self: *const SimulatedClock) u64 {
        return self.current_time_ns; // Deterministic!
    }
};
We control time like cosmic overlords. No more time.Now() — we decide when "now" is.
2. Seeded Randomness (prng.zig)
All chaos flows from a single seed. Same seed = same chaos = reproducible bugs.
const std = @import("std");

pub const ReplayableRng = struct {
    seed: u64, // Write this down when bugs appear!
    rng: std.Random.Xoshiro256,

    pub fn init(seed: u64) ReplayableRng {
        return .{
            .seed = seed,
            .rng = std.Random.Xoshiro256.init(seed),
        };
    }
};
If you can't replay a failing test, you don't have a failing test. You have a ghost story.
3. The Gremlin Horde (gremlins.zig)
Our fault injection engine. These little chaos agents represent everything that goes wrong in distributed systems:
| Gremlin | What It Does |
|---|---|
| Network Delay | "Your packet? Stuck in the tubes." |
| Network Drop | "Your packet? I ate it." |
| Disk Fail | "Storage said 'no'." |
| Disk Corrupt | "Bits got scrambled." |
| Clock Skew | "Time is relative anyway." |
| Process Crash | "Segfault surprise!" |
pub fn maybeCrashNode(self: *GremlinHorde) bool {
    return self.rng.chance(self.chaos_level * 0.02); // 2% at max chaos
}
In production, these gremlins attack randomly at 3 AM. In DST, we summon them deliberately.
4. Deterministic Scheduler (scheduler.zig)
Instead of letting the OS schedule things randomly, we control exactly when every event happens:
pub fn scheduleAt(self: *DeterministicScheduler, time_ns: u64, payload: EventPayload) !void {
    const event = Event{
        .id = self.next_event_id,
        .scheduled_time_ns = time_ns,
        .payload = payload,
    };
    // Advance the counter so every event gets a unique id.
    self.next_event_id += 1;
    try self.event_queue.add(event);
}
No thread timing. No vibes. Just deterministic ordering.
The Invariant: Laws of Physics
Instead of testing scenarios, we test physics — laws that must hold across ALL universes:
pub fn checkInvariants(self: *const GremlinBank) !void {
    var current_total: i64 = 0;

    // Invariant 1: No negative balances
    for (self.accounts) |account| {
        if (account.balance < 0) {
            return error.NegativeBalance;
        }
        current_total += account.balance;
    }

    // Invariant 2: Conservation of money
    // (total should NEVER change)
    if (current_total != self.initial_total_balance) {
        return error.MoneyConservationViolated; // BUG FOUND!
    }
}
Running the Multiverse
Now we explore 10,000 universes, each with a different seed:
var seed: u64 = 42;
while (seed < 42 + 10_000) : (seed += 1) {
    const result = exploreUniverse(allocator, seed);
    if (!result.success) {
        // BUG FOUND! The seed makes it reproducible.
        print("Bug in Universe #{d}, Seed: {d}\n", .{seed, result.seed});
    }
}
The Results
GREMLIN-DST: The Multiverse Bug Hunter
Exploring 10000 universes to find bugs...
Exploring universes...
BUG FOUND in Universe #42!
Seed: 42 (save this for replay!)
Error: money_conservation_violated
Expected balance: 50000
Actual balance: 49298
Money lost: 702
BUG FOUND in Universe #47!
Seed: 47 (save this for replay!)
Money lost: 594
BUG FOUND in Universe #50!
BUG FOUND in Universe #54!
BUG FOUND in Universe #58!
SIMULATION RESULTS
Universes explored: 17
Bugs found: 5
Bug rate: 29.41%
SEEDS FOR REPLAY:
- 42
- 47
- 50
- 54
- 58
The same bug surfaced in 5 of the first 17 universes (seeds 42 through 58).
That is a hit rate of about 29% at our chaos level, measured on a small sample. But a normal test suite? It runs one universe, probably a "nice" one where the gremlin doesn't attack mid-transfer.
That's why the bug survives to production.
Why This Kills Production Bugs
Many production bugs live in rare schedules. They only happen when:
- A happens before B
- a timeout fires during leadership change
- a retry overlaps a stale response
- a crash happens between two "uninterruptible" steps
- a message is delayed just long enough
These bugs are basically:
"Only happens in 1 out of 10,000 universes."
Normal tests run 1 universe.
VOPR runs thousands.
So those rare universes stop being rare. They become inevitable.
And that's the point.
The 3 Superpowers of VOPR-Style Testing
1) Reproducibility
No more:
- "can't reproduce"
- "it's flaky"
- "it passed when I re-ran it"
- "CI is haunted"
With DST: Seed 42 always produces the same bug.
2) Controllability
- Time becomes a lever
- Randomness becomes an input
- Scheduling becomes observable
3) Coverage Through Universes
Instead of testing a few happy paths, you test the system across a massive range of plausible realities.
"TigerBeetle is one of the most pioneering startups on the planet when it comes to DST."
What You Can Steal Today (Even Without VOPR)
You don't need to build TigerBeetle to use this mindset.
WarpStream reports that FoundationDB "spent 18 months building a deterministic simulation framework for their database before ever letting it write or read data from an actual physical disk."
You don't need 18 months. Here's a progression:
Level 1: Inject a Clock
Stop doing this in core logic:
time.Now()
time.Sleep(...)
Instead, create:
- a Clock interface
- a real clock in prod
- a fake clock in tests
As one engineer notes: "Inject dependencies that introduce non-determinism (time, randomness, I/O). Benefit: 100% deterministic time-based tests."
Level 2: Seed and Log Randomness
If you use randomness anywhere:
- seed it
- log it
- replay it
If you can't replay a failing test, you have a ghost story.
Level 3: Use a Deterministic Event Loop
Instead of thread timing, drive your system with:
- an event queue
- a deterministic scheduler
- a simulated clock
Now your tests don't depend on "vibes."
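A sketch of those three pieces together in Go, using the standard `container/heap` package as the event queue. The `Sim` type and its methods are hypothetical; the key design choice is the tie-break on insertion id, so two events at the same simulated time always run in the same order.

```go
package main

import (
	"container/heap"
	"fmt"
)

// event is one scheduled action in simulated time.
type event struct {
	at   int64  // simulated nanoseconds
	id   uint64 // insertion order; breaks ties so the ordering is total
	name string
}

// eventQueue implements heap.Interface as a min-heap on (at, id).
type eventQueue []event

func (q eventQueue) Len() int { return len(q) }
func (q eventQueue) Less(i, j int) bool {
	if q[i].at != q[j].at {
		return q[i].at < q[j].at
	}
	return q[i].id < q[j].id
}
func (q eventQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *eventQueue) Push(x any)   { *q = append(*q, x.(event)) }
func (q *eventQueue) Pop() any {
	old := *q
	e := old[len(old)-1]
	*q = old[:len(old)-1]
	return e
}

// Sim owns the clock and the queue; nothing else decides what runs next.
type Sim struct {
	now    int64
	nextID uint64
	queue  eventQueue
}

func (s *Sim) ScheduleAt(at int64, name string) {
	heap.Push(&s.queue, event{at: at, id: s.nextID, name: name})
	s.nextID++
}

// Run drains the queue, jumping simulated time straight to each event.
func (s *Sim) Run() []string {
	var order []string
	for s.queue.Len() > 0 {
		e := heap.Pop(&s.queue).(event)
		s.now = e.at
		order = append(order, fmt.Sprintf("t=%d %s", s.now, e.name))
	}
	return order
}

func main() {
	var s Sim
	s.ScheduleAt(200, "timeout")
	s.ScheduleAt(100, "send")
	s.ScheduleAt(200, "reply") // same time as "timeout": the id decides
	for _, line := range s.Run() {
		fmt.Println(line)
	}
}
```

Because the loop jumps the clock instead of sleeping, an hour of simulated timeouts costs no wall-clock time at all.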
Level 4: Multiverse Lite
Checkpoint state, then explore permutations:
- different message orderings
- different fault timings
- different retry timing
This is the beginning of "forking reality."
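For small cases you can go past random sampling and enumerate every ordering. A rough Go sketch, using a made-up `replica` with a deliberately planted bug (it ignores message versions) and a hypothetical `explore` helper that forks one universe per permutation:

```go
package main

import "fmt"

// msg is a hypothetical versioned write delivered over the network.
type msg struct {
	version int
	value   string
}

// replica applies writes in arrival order.
// Deliberate bug: it ignores msg.version, so a stale write can win.
type replica struct {
	value string
}

func (r *replica) deliver(m msg) { r.value = m.value }

// permutations returns every ordering of msgs.
func permutations(msgs []msg) [][]msg {
	if len(msgs) <= 1 {
		return [][]msg{append([]msg(nil), msgs...)}
	}
	var out [][]msg
	for i := range msgs {
		rest := make([]msg, 0, len(msgs)-1)
		rest = append(rest, msgs[:i]...)
		rest = append(rest, msgs[i+1:]...)
		for _, p := range permutations(rest) {
			out = append(out, append([]msg{msgs[i]}, p...))
		}
	}
	return out
}

// explore forks one universe per ordering from the same checkpoint and
// returns the orderings that break the invariant "newest version wins".
func explore(checkpoint replica, msgs []msg, want string) [][]msg {
	var failures [][]msg
	for _, order := range permutations(msgs) {
		r := checkpoint // copy: every universe starts from identical state
		for _, m := range order {
			r.deliver(m)
		}
		if r.value != want {
			failures = append(failures, order)
		}
	}
	return failures
}

func main() {
	msgs := []msg{{1, "a"}, {2, "b"}, {3, "c"}}
	failures := explore(replica{}, msgs, "c")
	fmt.Printf("%d of 6 orderings violate the invariant\n", len(failures))
	for _, f := range failures {
		fmt.Println(f)
	}
}
```

Exhaustive enumeration grows factorially, so it only works for a handful of events; beyond that, seeded random sampling of orderings takes over.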
The Mindset Shift: Stop Testing Scenarios, Start Testing Physics
This is where testing levels up.
Instead of writing tests like:
"Does it work in this situation?"
You write invariants like:
- committed data is never lost
- operations are idempotent
- balances never go negative
- total money in system never changes
- state machines never enter invalid states
- replicas converge
- recovery never violates correctness
Then you throw the system into 10,000 universes.
If the laws hold: you have evidence your system is robust against the faults you simulated.
If the laws break: you found a production bug before production did.
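Put together, "laws plus many universes" can be sketched in Go as a miniature of the bank demo. This is a loose re-imagining under our own assumptions (transfer amounts, 20 transfers per universe, a 2% crash chance), not a port of the Zig code, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Bank is a hypothetical in-memory ledger used only for this sketch.
type Bank struct {
	balances []int64
	initial  int64
}

func NewBank(accounts int, each int64) *Bank {
	b := &Bank{balances: make([]int64, accounts), initial: int64(accounts) * each}
	for i := range b.balances {
		b.balances[i] = each
	}
	return b
}

// Transfer debits, then credits. A simulated crash between the two steps
// loses the money: the same kind of planted bug as in the demo bank.
func (b *Bank) Transfer(from, to int, amount int64, crash bool) {
	if from == to || b.balances[from] < amount {
		return
	}
	b.balances[from] -= amount
	if crash {
		return
	}
	b.balances[to] += amount
}

// CheckInvariants enforces two laws: no negative balance, total conserved.
func (b *Bank) CheckInvariants() error {
	var total int64
	for i, bal := range b.balances {
		if bal < 0 {
			return fmt.Errorf("account %d is negative: %d", i, bal)
		}
		total += bal
	}
	if total != b.initial {
		return fmt.Errorf("money not conserved: want %d, got %d", b.initial, total)
	}
	return nil
}

// exploreUniverse runs one seeded universe and returns the first violation.
func exploreUniverse(seed int64, transfers int, crashProb float64) error {
	rng := rand.New(rand.NewSource(seed))
	bank := NewBank(5, 10_000)
	for i := 0; i < transfers; i++ {
		from, to := rng.Intn(5), rng.Intn(5)
		amount := int64(rng.Intn(500) + 1)
		crash := rng.Float64() < crashProb
		bank.Transfer(from, to, amount, crash)
		if err := bank.CheckInvariants(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	for seed := int64(1); seed <= 1000; seed++ {
		if err := exploreUniverse(seed, 20, 0.02); err != nil {
			fmt.Printf("seed %d: %v\n", seed, err)
		}
	}
}
```

Note that the test never describes a scenario. It states two laws, hands the seed to the gremlins, and reports whichever universes break physics.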
Try It Yourself
The complete Gremlin-DST proof-of-concept is available on GitHub:
# Clone and run with Zig
git clone https://github.com/copyleftdev/gremlin-dst.git
cd gremlin-dst
zig build run
# Or with Docker
docker build -t gremlin-dst .
docker run --rm gremlin-dst
The code demonstrates:
| File | Concept |
|---|---|
| clock.zig | Simulated time |
| prng.zig | Seeded randomness |
| scheduler.zig | Deterministic event scheduling |
| gremlins.zig | Fault injection |
| bank.zig | System under test + invariants |
| main.zig | Multiverse explorer |
Final Words from the Graybeard Cave
Production bugs survive in the dark.
They survive in rare schedules.
They survive in weird timing.
VOPR-style deterministic simulation testing turns on the lights.
And once you can fork reality…
The bug has nowhere left to hide.
Further Reading
- TigerBeetle Safety Documentation — How VOPR works
- Jepsen Analysis of TigerBeetle — Independent verification
- Antithesis: What is DST? — DST explained
- FoundationDB Testing — The OG DST implementation
- Phil Eaton: What's the Big Deal About DST? — Excellent overview
- Amplify Partners: DST Primer — Technical deep-dive
- Awesome DST Repository — Curated resources
Let's Discuss
Have you ever had a bug that:
- only appeared in production?
- vanished when you added logging?
- depended on weird timing or retries?
What's the worst "scheduler gremlin" story you've got?
