Static diagrams don't fail. Systems do.
That was the problem I kept running into when practicing system design. I'd draw boxes and arrows, convince myself the architecture was solid, and then an interviewer would ask "what happens to your read traffic if the cache goes down?" - and I'd be narrating from memory, not from evidence.
I wanted to watch my architecture break. So I built SysSimulator - a free browser-based tool that lets you simulate real traffic, inject chaos scenarios, and watch cascade failures in real time. No install. No signup. No backend required.
The interesting engineering decision was the foundation: the simulation engine is written in Rust, compiled to WebAssembly, and runs entirely in your browser.
Here's why, and what I learned.
Why not just use JavaScript?
The obvious choice for a browser-based simulator is JavaScript. It's already in the browser. You don't need a compilation step. Every tutorial on "build a simulation in the browser" uses it.
But simulation engines have a specific performance profile that JavaScript handles badly.
A discrete-event simulation (DES) processes thousands of events per second - request arrivals, processing completions, timeout triggers, state transitions. Each event modifies shared state (component queues, error counts, latency distributions) and may produce new events. At 100,000 RPS with 10 components, you're processing hundreds of thousands of state mutations per second.
JavaScript's garbage collector will pause the world mid-simulation. At high event rates, those pauses become visible - the particle animation stutters, the metrics bar freezes, the simulation loses time fidelity. It's not fatal for a toy, but it breaks the sense of "real" that makes the tool actually useful for building intuition.
Rust gives you:
- Deterministic memory management - no GC pauses, no stop-the-world
- Predictable performance - the simulation advances at wall-clock speed without hitches
- Zero-cost abstractions - rich type system and pattern matching with no runtime overhead
- Direct WASM compilation via wasm-pack with minimal boilerplate

The tradeoff is compile time and toolchain complexity. It's worth it.
How the DES engine works
A discrete-event simulation has three core concepts:
1. Events - things that happen at a specific simulated time. In SysSimulator, events are things like:
- RequestArrived { component_id, timestamp, request_id }
- ProcessingComplete { component_id, timestamp, latency_ms }
- ChaosInjected { scenario, target_component, severity }
2. The event queue - a priority queue ordered by timestamp. The engine always processes the earliest event first. This is what makes DES "discrete" - time jumps forward in steps, not continuously.
3. State - the current condition of every component. Queue depth, active connections, error rates, latency distributions. Each event reads and writes state.
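One detail worth calling out: Rust's standard BinaryHeap is a max-heap, so the event type's Ord implementation has to be reversed on timestamp for pop() to yield the earliest event first. A minimal sketch of that ordering trick (the struct here carries only a timestamp; the real events carry payloads):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Minimal event carrying only a timestamp, for illustration.
#[derive(PartialEq)]
struct SimEvent {
    timestamp: f64,
}

impl Eq for SimEvent {}

impl Ord for SimEvent {
    // Compare in reverse so the max-heap yields the EARLIEST timestamp first.
    fn cmp(&self, other: &Self) -> Ordering {
        other
            .timestamp
            .partial_cmp(&self.timestamp)
            .unwrap_or(Ordering::Equal)
    }
}

impl PartialOrd for SimEvent {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Push events in arbitrary order, pop them in timestamp order.
fn pop_order(timestamps: &[f64]) -> Vec<f64> {
    let mut queue: BinaryHeap<SimEvent> = timestamps
        .iter()
        .map(|&t| SimEvent { timestamp: t })
        .collect();
    std::iter::from_fn(|| queue.pop()).map(|e| e.timestamp).collect()
}

fn main() {
    println!("{:?}", pop_order(&[5.0, 1.0, 3.0])); // earliest first
}
```

An alternative is wrapping the timestamp in std::cmp::Reverse; either way, getting this ordering wrong makes the simulation run backwards.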
The core loop in Rust is approximately:
pub fn step(&mut self) -> SimulationResult {
    let horizon = self.clock + self.tick_duration;
    // Peek before popping so an event beyond this tick's horizon
    // stays in the queue instead of being silently dropped.
    while let Some(next) = self.event_queue.peek() {
        if next.timestamp > horizon {
            break;
        }
        let event = self.event_queue.pop().unwrap();
        self.clock = event.timestamp;
        let new_events = self.process_event(&event);
        for e in new_events {
            self.event_queue.push(e);
        }
        self.update_metrics(&event);
    }
    self.collect_metrics()
}
The real complexity is in process_event - each component type (load balancer, cache, database, message queue) has its own behaviour model. A cache hit generates a fast response event. A cache miss cascades to a database read. A database under memory pressure starts dropping connections. The interactions are what make simulation genuinely useful.
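To give a flavour of that dispatch, here's a stripped-down sketch of the cache-miss cascade: a hit completes locally, a miss schedules a downstream database event. The enum and field names are illustrative, not the real engine's:

```rust
// Illustrative event type, not the real engine's.
#[derive(Debug, PartialEq)]
enum Event {
    RequestArrived { component: &'static str, at: f64 },
    ProcessingComplete { component: &'static str, at: f64 },
}

// A cache hit completes locally after the hit latency; a miss
// schedules a database read at a later simulated time.
fn process_cache_event(hit: bool, now: f64) -> Vec<Event> {
    if hit {
        vec![Event::ProcessingComplete { component: "cache", at: now + 0.8 }]
    } else {
        vec![Event::RequestArrived { component: "database", at: now + 1.5 }]
    }
}

fn main() {
    println!("{:?}", process_cache_event(true, 10.0));
    println!("{:?}", process_cache_event(false, 10.0));
}
```

The important property is that process_event returns new events rather than mutating downstream components directly: the cascade stays ordered by the event queue.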
Compiling Rust to WASM with wasm-pack
The compilation pipeline is simpler than I expected.
Cargo.toml:
[lib]
crate-type = ["cdylib"]
[dependencies]
wasm-bindgen = "0.2"
js-sys = "0.3"
serde = { version = "1.0", features = ["derive"] }
serde-wasm-bindgen = "0.6"
[profile.release]
opt-level = "s" # optimise for size, not speed
The opt-level = "s" is important - WASM bundles transferred over the network should be small. Size optimisation also tends to reduce instruction count, which helps in the WASM runtime.
Exposing functions to JavaScript:
use wasm_bindgen::prelude::*;
#[wasm_bindgen]
pub struct SimEngine {
state: SimulationState,
event_queue: BinaryHeap<SimEvent>,
clock: f64,
}
#[wasm_bindgen]
impl SimEngine {
#[wasm_bindgen(constructor)]
pub fn new(topology: JsValue) -> SimEngine {
let parsed: Topology = serde_wasm_bindgen::from_value(topology).unwrap();
SimEngine::from_topology(parsed)
}
pub fn step(&mut self, tick_ms: f64) -> JsValue {
let result = self.advance(tick_ms);
serde_wasm_bindgen::to_value(&result).unwrap()
}
pub fn inject_chaos(&mut self, scenario: JsValue) {
let chaos: ChaosScenario = serde_wasm_bindgen::from_value(scenario).unwrap();
self.apply_chaos(chaos);
}
}
Build command:
wasm-pack build --target bundler --release
This produces a pkg/ directory with:
- syssimulator_bg.wasm - the compiled WASM binary
- syssimulator.js - the JS glue code generated by wasm-bindgen
- TypeScript type definitions

The generated JS handles the memory bridge between JavaScript and WASM automatically. You call Rust functions like they're regular JS functions. The serde-wasm-bindgen crate handles serialisation of complex types (the topology JSON, metrics output) across the boundary.
The WASM loading strategy
First load time is the main UX risk with WASM. The binary needs to be fetched, compiled, and instantiated before the simulation can run. On a slow connection this can be several seconds.
My approach:
1. Async loading with a visible loading state. The UI renders immediately from static HTML. The simulation controls are shown but disabled. A loading indicator shows "Initialising simulation engine..." so users know what's happening.
2. Streaming compilation. Modern browsers can compile WASM while it's still downloading via WebAssembly.instantiateStreaming. This is enabled automatically when you serve WASM with the correct Content-Type: application/wasm header. On Vercel, this is handled automatically.
3. Persistent caching. The WASM binary is served with a content-hash filename and long Cache-Control headers. After the first visit, subsequent loads are instant - the binary comes from the browser cache.
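If your platform doesn't set these headers for you, a vercel.json along these lines does the job (the filename pattern is illustrative - adjust it to where your bundler emits the binary):

```json
{
  "headers": [
    {
      "source": "/pkg/(.*)\\.wasm",
      "headers": [
        { "key": "Content-Type", "value": "application/wasm" },
        { "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }
      ]
    }
  ]
}
```

The immutable directive is safe here precisely because the filename is content-hashed: a new build produces a new URL.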
// The generated wasm-bindgen glue handles this, but conceptually:
const { instance } = await WebAssembly.instantiateStreaming(
fetch('/pkg/syssimulator_bg.wasm'),
importObject
);
Modelling the 18 component types
Every component in the simulator has a behaviour model that determines:
- Processing latency - a latency distribution (P50, P95, P99), derived from real-world measurements for that component type
- Concurrency limits - max simultaneous requests before queuing begins
- Failure modes - what happens under chaos injection (node crash, memory pressure, etc.)

Here are a few interesting ones:
Cache (Redis model)
fn process_cache_request(&mut self, req: &Request) -> ProcessResult {
let hit = self.rng.gen::<f64>() < self.hit_rate;
if hit {
// Cache hit: fast response, no downstream call
ProcessResult::complete(req, self.hit_latency_dist.sample())
} else {
// Cache miss: forward to origin with cache miss overhead
let miss_latency = self.miss_overhead_dist.sample();
ProcessResult::forward_to_origin(req, miss_latency)
}
}
Under a cache stampede chaos scenario, the hit rate drops to near zero and hundreds of requests simultaneously hit the origin - which is where the cascade failure becomes visible in the simulation.
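The arithmetic behind the stampede is stark: origin load is total traffic times the miss rate, so a hit-rate collapse from 98% to 2% multiplies origin traffic 49x. A quick sketch of that relationship (helper name hypothetical):

```rust
// Requests per second that fall through the cache to the origin.
fn origin_rps(total_rps: f64, hit_rate: f64) -> f64 {
    total_rps * (1.0 - hit_rate)
}

fn main() {
    let healthy = origin_rps(10_000.0, 0.98);  // healthy cache
    let stampede = origin_rps(10_000.0, 0.02); // stampede scenario
    println!("{healthy:.0} -> {stampede:.0} RPS at the origin");
}
```

A database provisioned for a couple of hundred RPS of misses has no chance against that multiplier, which is why the cascade in the simulation is so abrupt.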
Load balancer (round-robin model)
fn route_request(&mut self, req: Request) -> Option<ComponentId> {
// Round-robin with health check
let healthy_backends: Vec<_> = self.backends
.iter()
.filter(|b| b.is_healthy())
.collect();
if healthy_backends.is_empty() {
return None; // All backends unhealthy - request fails
}
let idx = self.counter % healthy_backends.len();
self.counter += 1;
Some(healthy_backends[idx].id)
}
When you inject a node failure on one of the app servers, the load balancer's health check detects it and routes around it - but if enough backends fail, capacity drops and latency rises. This is the exact behaviour pattern that shows up in production incidents.
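The capacity math is worth making explicit: the surviving backends split the same total traffic, and once per-backend load exceeds a backend's throughput limit, the survivors start queueing too. A hypothetical helper:

```rust
// Load on each surviving backend after some number have failed.
// Returns None when no healthy backends remain (every request fails).
fn per_backend_rps(total_rps: f64, backends: usize, failed: usize) -> Option<f64> {
    let healthy = backends.checked_sub(failed)?;
    if healthy == 0 {
        return None;
    }
    Some(total_rps / healthy as f64)
}

fn main() {
    // 10,000 RPS over 10 backends vs. the same traffic after 4 failures.
    println!("{:?}", per_backend_rps(10_000.0, 10, 0));
    println!("{:?}", per_backend_rps(10_000.0, 10, 4));
}
```

This is why "route around the failure" only postpones the problem: the load balancer hides the first failure, but every failure raises the load on the rest.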
The chaos engine - 28 scenarios
The chaos system is separate from the simulation engine. Each scenario is a function that modifies component state:
pub fn apply_chaos(&mut self, scenario: ChaosScenario) {
match scenario.kind {
ChaosKind::NetworkPartition => {
// Drop all requests between two components
self.add_connection_filter(
scenario.source,
scenario.target,
ConnectionFilter::DropAll
);
},
ChaosKind::LatencyInjection { p50_ms, p99_ms } => {
// Add artificial latency distribution to a component
self.components[scenario.target]
.add_latency_overhead(LatencyDist::new(p50_ms, p99_ms));
},
ChaosKind::CacheStampede => {
// Force cache hit rate to near zero
if let Component::Cache(ref mut cache) = self.components[scenario.target] {
cache.override_hit_rate(0.02);
}
},
ChaosKind::NodeFailure => {
// Take component offline - load balancers detect and route around
self.components[scenario.target].set_health(ComponentHealth::Down);
},
// ... 24 more scenarios
}
}
The interesting design decision was making chaos composable. You can inject a network partition AND a memory pressure event simultaneously and watch the compounding failure. In production, incidents are rarely single-cause - this teaches engineers to think in terms of failure combinations.
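Composability falls out naturally if scenarios are just a list of independent state mutations applied in order. A reduced sketch with hypothetical types, not the engine's real ones:

```rust
// Reduced component state: just the knobs chaos can turn.
#[derive(Debug, PartialEq)]
struct ComponentState {
    healthy: bool,
    hit_rate: f64,
    extra_latency_ms: f64,
}

enum Chaos {
    NodeFailure,
    CacheStampede,
    LatencyInjection { p50_ms: f64 },
}

// Each scenario is an independent mutation, so a Vec of them composes.
fn apply_all(state: &mut ComponentState, scenarios: &[Chaos]) {
    for s in scenarios {
        match s {
            Chaos::NodeFailure => state.healthy = false,
            Chaos::CacheStampede => state.hit_rate = 0.02,
            Chaos::LatencyInjection { p50_ms } => state.extra_latency_ms += p50_ms,
        }
    }
}

fn main() {
    let mut state = ComponentState { healthy: true, hit_rate: 0.98, extra_latency_ms: 0.0 };
    // Inject a stampede AND extra latency simultaneously.
    apply_all(&mut state, &[Chaos::CacheStampede, Chaos::LatencyInjection { p50_ms: 50.0 }]);
    println!("{state:?}");
}
```

The one subtlety is that mutations must commute sensibly, or at least be applied in a documented order - latency overheads add, while health overrides are last-writer-wins.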
AWS cost estimation
This was the feature I was most uncertain about including, and it turned out to be one of the most useful.
Every component maps to an AWS service and a pricing model:
pub fn estimate_monthly_cost(&self, topology: &Topology, rps: f64) -> CostBreakdown {
let mut compute = 0.0;
let mut storage = 0.0;
let mut networking = 0.0;
let mut requests = 0.0;
for component in &topology.components {
match component.kind {
ComponentKind::WebServer => {
// EC2 t3.medium equivalent based on configured throughput
let instance_count = (rps / component.throughput_limit).ceil();
compute += instance_count * EC2_T3_MEDIUM_HOURLY * 730.0;
},
ComponentKind::Serverless => {
// Lambda pricing: per-request + duration
let monthly_requests = rps * 86400.0 * 30.0;
requests += monthly_requests * LAMBDA_PER_REQUEST;
requests += monthly_requests * (component.avg_duration_ms / 1000.0)
* LAMBDA_PER_GB_SECOND * component.memory_gb;
},
ComponentKind::Database => {
// RDS db.t3.medium for the configured storage tier
compute += RDS_T3_MEDIUM_HOURLY * 730.0;
storage += component.storage_gb * RDS_STORAGE_PER_GB;
},
// ... other component types
}
}
CostBreakdown { compute, storage, networking, requests }
}
The numbers are rough-order estimates, not exact billing. But "adding 3 more app servers costs approximately $280/month" is the right answer to "why not just scale horizontally indefinitely?" - which is exactly the kind of cost-awareness question that separates senior engineers from mid-level in system design interviews.
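As a worked example of the request-cost term above: Lambda's published request price is $0.20 per million requests (treat the constant as illustrative - check current pricing), so at a steady 100 RPS the request charges alone come to roughly $52/month:

```rust
// Mirrors the $0.20-per-million-requests published Lambda price;
// illustrative, not a live pricing feed.
const LAMBDA_PER_REQUEST: f64 = 0.20 / 1_000_000.0;

// Monthly Lambda request cost: RPS -> requests/month -> dollars.
fn lambda_request_cost(rps: f64) -> f64 {
    let monthly_requests = rps * 86_400.0 * 30.0;
    monthly_requests * LAMBDA_PER_REQUEST
}

fn main() {
    // 100 RPS is ~259M requests/month.
    println!("${:.2}/month", lambda_request_cost(100.0));
}
```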
What surprised me about WASM in production
The good:
- Performance exceeded expectations. At 100,000 simulated RPS with 10+ components, the engine advances simulation time faster than wall clock time - there's headroom to spare.
- Debugging is better than expected. wasm-pack test --chrome runs your Rust unit tests in an actual browser. Source maps work reasonably well with the right setup.
- The memory model forced better design. Rust's ownership rules pushed me toward an architecture where simulation state is clearly separated from UI state. The resulting code is more correct.
The hard parts:
- Serialisation overhead is real. Every call across the JS/WASM boundary that involves complex types goes through serialisation. Calling step() 60 times per second is fine. Passing large topology objects on every frame would not be.
- Error handling across the boundary is awkward. Rust's Result<T, E> doesn't cross the boundary cleanly. I ended up encoding errors as optional fields in the return value rather than using WASM exceptions.
- Bundle size management is ongoing. The WASM binary is currently ~280KB gzipped. Acceptable, but I'm tracking it.
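The error-encoding workaround looks roughly like this on the Rust side (types and field names illustrative) - the JS caller checks result.error instead of catching an exception:

```rust
// Illustrative shape of what crosses the boundary: exactly one of
// `metrics` or `error` is set, so the JS side checks `error` first.
#[derive(Debug)]
struct StepResult {
    metrics: Option<f64>, // stand-in for the real metrics struct
    error: Option<String>,
}

// Flatten a Rust Result into the optional-fields encoding.
fn encode(result: Result<f64, String>) -> StepResult {
    match result {
        Ok(m) => StepResult { metrics: Some(m), error: None },
        Err(e) => StepResult { metrics: None, error: Some(e) },
    }
}

fn main() {
    println!("{:?}", encode(Ok(42.0)));
    println!("{:?}", encode(Err("topology has a cycle".to_string())));
}
```

Newer wasm-bindgen does let you return Result and have the Err surface as a thrown JS exception, but the optional-field encoding keeps the happy path and the error path in one typed value, which I found easier to handle in the UI code.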
Results — what the simulator shows that a whiteboard can't
When you inject a cache stampede on a 10,000 RPS e-commerce architecture, you see:
- Cache hit rate drops from 98% → 2%
- Database connections saturate within 400ms
- App server queue depth climbs until requests start timing out
- Error rate spikes from 0.1% → 34%
- P99 latency goes from 48ms → 2,400ms

That sequence - and the ability to narrate exactly what happened and why - is what interviewers at FAANG are evaluating when they ask "what happens to your read traffic if the cache goes down?"
A static diagram cannot show you that. A simulator built on a proper DES engine can.
Try it
SysSimulator is free, runs in your browser, no account required.
57 architecture blueprints (e-commerce, chat, payment systems, Kafka pipelines, MCP AI agents), 28 chaos scenarios, real-time AWS cost estimation.
The source of the WASM simulation engine is something I'm considering open-sourcing - leave a comment if that's interesting to you.
What questions do you have about the Rust/WASM approach? Specifically curious if others have tackled the serialisation overhead problem differently - would love to compare notes.

