The Day Our Game Server Choked on a 10,000-player Treasure Hunt

#webdev #programming #rust #performance

The Problem We Were Actually Solving

Our treasure hunt engine was a Node.js microservice that handled player positions, loot drops, and real-time leaderboards over WebSockets. The hot path was a tight loop:

// Node 18, iojs build
clients.forEach((socket) => {
  socket.write(JSON.stringify(positionUpdate));
});

Under load, clients.forEach ballooned into a heap-allocated array of 100 kB per player (the loop also re-serializes the same positionUpdate for every socket), causing 100 MB/s of GC pressure and 200 ms p95 latency spikes. The profiler snapshot from 0x showed 42 % of CPU time inside V8's incremental-marking phase and 1.4 GB of heap allocated for a single request batch.

We could scale horizontally, but each container still burned 300 MB RSS and needed 30 % more CPU than the equivalent C++ service the infra team had benchmarked last quarter. Worse, every time the garbage collector kicked in, WebSocket heartbeats dropped and players saw a frozen map.

What We Tried First (And Why It Failed)

First idea: switch to Go. A simple rewrite of the broadcast loop took two days.

for _, conn := range conns {
	conn.Write(broadcastBuffer)
}

Benchmark with wrk -t12 -c4000: p95 latency 120 ms, RSS 110 MB. Great—until we noticed that each conn was a pointer in a slice, and keeping 25 k pointers alive in a single array still triggered a 70 ms GC pause every 200 ms. We had traded one garbage collector for another.

Second idea: hand-roll a C++ service using Boost.Asio and a custom arena allocator. We shaved 80 % off latency and dropped RSS to 45 MB. But the build pipeline required Docker multi-stage builds that added 45 seconds to CI, and the infra team refused to host native binaries in production.

We were stuck between a Node heap that couldn't scale and a C++ binary that couldn't deploy.

The Architecture Decision

Then the head of infra dropped the Rust RFC on the table.

I fought it. I had written Rust for two hobby projects and burned a weekend debugging lifetime errors for a 200-line parser. One of the senior backend engineers, though, showed me tokio::sync::mpsc::unbounded_channel and a single allocation per player instead of per message.
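
To make "a single allocation per player instead of per message" concrete, here is a minimal sketch using only the standard library. The `PlayerSlot` type, its methods, and the JSON field names are hypothetical and not taken from the actual service; the point is only that a buffer allocated once at connect time can be cleared and refilled for every update.

```rust
use std::io::{self, Write};

/// Hypothetical per-player state: one output buffer, allocated when the
/// player connects and reused for every message afterwards.
pub struct PlayerSlot {
    out: Vec<u8>,
}

impl PlayerSlot {
    pub fn new(capacity: usize) -> Self {
        PlayerSlot { out: Vec::with_capacity(capacity) }
    }

    /// Encode a position update into the existing buffer.
    /// `clear` resets the length but keeps the capacity, so no new
    /// allocation happens while the message fits in that capacity.
    pub fn encode_position(&mut self, player_id: u32, x: f32, y: f32) -> io::Result<&[u8]> {
        self.out.clear();
        write!(self.out, "{{\"id\":{},\"x\":{},\"y\":{}}}", player_id, x, y)?;
        Ok(&self.out)
    }

    /// Current capacity of the reused buffer, exposed for inspection.
    pub fn capacity(&self) -> usize {
        self.out.capacity()
    }
}
```

Calling `encode_position` repeatedly on the same slot leaves `capacity()` unchanged as long as each encoded update is smaller than the initial capacity, which is the property that removes per-message allocation.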

We rewrote the broadcast core in Rust:

// 300-line service, tokio 1.75, rustc 1.76
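use tokio::io::AsyncWriteExt; // brings write_all into scope
use tokio::net::TcpStream; // assumed connection type; the full service may differ

let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<Vec<u8>>();
tokio::spawn(async move {
    // Filled elsewhere in the full service as players connect.
    let mut conns: Vec<TcpStream> = Vec::with_capacity(25_000);
    while let Some(buf) = rx.recv().await {
        for conn in &mut conns {
            // write_all returns a future, so it must be awaited before checking the result
            if conn.write_all(&buf).await.is_err() {
                // retry: re-queue a copy, since buf is still needed for the remaining connections
                tx.send(buf.clone()).ok();
            }
        }
    }
});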
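
The snippet above re-queues a message when a write fails. A different option, sketched here and not what the service shipped, is to drop connections whose write fails so a dead peer cannot keep the same message circulating. This sketch assumes synchronous `std::io::Write` instead of tokio so it runs without dependencies; `MockConn` and `broadcast_and_prune` are hypothetical names.

```rust
use std::io::{self, Write};

/// Stand-in for a socket so the sketch runs without a network.
pub enum MockConn {
    Healthy(Vec<u8>),
    Broken,
}

impl Write for MockConn {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        match self {
            // A healthy connection just records the bytes it was sent.
            MockConn::Healthy(sink) => sink.write(data),
            // A broken connection fails every write.
            MockConn::Broken => Err(io::Error::new(io::ErrorKind::BrokenPipe, "peer gone")),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Write `buf` to every connection and remove the ones whose write fails.
/// Returns how many connections were removed.
pub fn broadcast_and_prune<W: Write>(conns: &mut Vec<W>, buf: &[u8]) -> usize {
    let before = conns.len();
    // retain_mut keeps an element only when the closure returns true.
    conns.retain_mut(|conn| conn.write_all(buf).is_ok());
    before - conns.len()
}
```

Whether pruning or retrying is right depends on the failure mode: pruning suits closed sockets, while a transient backpressure error may deserve a bounded retry.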

The compiler screamed at me for 48 hours. I learned to read rustc --explain E0502 like a daily horoscope. Finally, after 2 weeks of pair programming with the infra team, we shipped a Docker image that:

allocated 32 MB RSS base
kept p95 latency under 30 ms at 25 k connections
handled 700 k messages per second on a c6g.2xlarge spot instance

The GC pressure vanished: Rust has no garbage collector, so there were no GC pauses in production.

What The Numbers Said After

After the swap:

Metric	Node (1.8 k players)	Rust (25 k players)
RSS per instance	300 MB	32 MB
p95 latency	210 ms	28 ms
CPU time in GC	42 %	0 % (no GC)
Build time	35 s	110 s (debug build) / 28 s (release)
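
Because the two columns were measured at different player counts, it helps to normalize them. The sketch below takes the table figures at face value and assumes 1 MB = 1000 kB; the helper names are hypothetical. RSS includes fixed runtime overhead, so the per-player numbers are rough upper bounds, not measured per-connection costs.

```rust
/// Memory per player in kB, assuming 1 MB = 1000 kB.
pub fn kb_per_player(rss_mb: f64, players: f64) -> f64 {
    rss_mb * 1000.0 / players
}

/// How many times larger `before` is than `after`.
pub fn ratio(before: f64, after: f64) -> f64 {
    before / after
}

/// Figures copied from the table: (Node kB/player, Rust kB/player, p95 ratio).
pub fn normalized_table() -> (f64, f64, f64) {
    let node_kb = kb_per_player(300.0, 1_800.0); // about 166.7 kB per player
    let rust_kb = kb_per_player(32.0, 25_000.0); // 1.28 kB per player
    let p95_ratio = ratio(210.0, 28.0); // 7.5 times lower p95 latency
    (node_kb, rust_kb, p95_ratio)
}
```

On these numbers the per-player memory gap (roughly 130 times) is much larger than the raw 300 MB versus 32 MB comparison suggests.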

Prometheus graphs showed no tail-latency spikes after the first 20 minutes. The SRE team removed two horizontal-pod-autoscaler rules because the service now handled peak load without scaling out.

What I Would Do Differently

I would not have started with Rust two weeks before launch. The learning curve cost us three late nights and a partial rollback when timestamps wrapped inside Instant::now(). A blended approach—Node for the API gateway, Rust for the broadcast core via gRPC—would have been safer.
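
The post does not say how the timestamps wrapped, so the following is an assumption, not a reconstruction of the actual bug. One way a wrap can occur around `Instant` is narrowing an elapsed `Duration` to a 32-bit millisecond counter with `as`, which silently discards the high bits after about 49.7 days. The function names are hypothetical.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Narrowing with `as` keeps only the low 32 bits, so large values wrap.
pub fn millis_wrapping(elapsed: Duration) -> u32 {
    elapsed.as_millis() as u32
}

/// `try_from` reports the overflow instead of wrapping.
pub fn millis_checked(elapsed: Duration) -> Option<u32> {
    u32::try_from(elapsed.as_millis()).ok()
}
```

In real code the argument would come from something like `start.elapsed()`; passing a `Duration` directly keeps the sketch testable without waiting for time to pass.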

I would also capture flame graphs earlier. We discovered the bottleneck only after enabling perf record -g --call-graph dwarf on a c6g instance; the Node flame graph was 3 MB of JSON that no tool could parse quickly. The Rust version pushed the same data through flamegraph in 0.4 seconds.

Finally, I would budget two full sprints for Rust migration, not two weeks. The borrow checker is a strict reviewer, and fighting it in production is like debugging with the compiler looking over your shoulder.