The Problem We Were Actually Solving
Our treasure hunt engine was a Node.js microservice that handled player positions, loot drops, and real-time leaderboards over WebSockets. The hot path was a tight loop:
// Node 18, iojs build
clients.forEach((socket) => {
  socket.write(JSON.stringify(positionUpdate));
});
Under load, the clients.forEach loop allocated 100 kB of heap per player, causing 100 MB/s of GC pressure and 200 ms p95 latency spikes. Part of the cost is visible in the loop itself: JSON.stringify runs once per socket, so the same update is serialized again for every connected client. The profiler snapshot from 0x showed 42 % of CPU time inside V8's incremental-marking phase and 1.4 GB of heap allocated for a single request batch.
We could scale horizontally, but each container still burned 300 MB RSS and needed 30 % more CPU than the equivalent C++ service the infra team had benchmarked last quarter. Worse, every time the garbage collector kicked in, WebSocket heartbeats dropped and players saw a frozen map.
What We Tried First (And Why It Failed)
First idea: switch to Go. A simple rewrite of the broadcast loop took two days.
for _, conn := range conns {
	conn.Write(broadcastBuffer)
}
Benchmarked with wrk -t12 -c4000, it showed a p95 latency of 120 ms and 110 MB RSS. That looked great until we noticed that each conn was a pointer in a slice, and keeping 25 k pointers alive in a single slice still triggered a 70 ms GC pause every 200 ms. We had traded one garbage collector for another.
Second idea: hand-roll a C++ service using Boost.Asio and a custom arena allocator. We shaved 80 % off latency and dropped RSS to 45 MB. But the build pipeline required Docker multi-stage builds that added 45 seconds to CI, and the infra team refused to host native binaries in production.
We were stuck between a Node heap that couldn't scale and a C++ binary that couldn't deploy.
The Architecture Decision
Then the head of infra dropped the Rust RFC on the table.
I fought it. I had written Rust for two hobby projects and burned a weekend debugging lifetime errors for a 200-line parser. One of the senior backend engineers, though, showed me tokio::sync::mpsc::unbounded_channel and how to make a single allocation per player instead of one per message.
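To make "a single allocation per player instead of per message" concrete, here is a minimal std-only sketch, not the code we shipped. The PlayerBuffer type, its method names, and the wire format (a player id plus two little-endian f32 coordinates) are all hypothetical and exist only for illustration.

```rust
/// Hypothetical per-player scratch buffer: allocated once when the player
/// connects, then cleared and refilled for every outgoing message.
pub struct PlayerBuffer {
    bytes: Vec<u8>,
}

impl PlayerBuffer {
    /// The single allocation for this player happens here.
    pub fn with_capacity(capacity: usize) -> Self {
        Self { bytes: Vec::with_capacity(capacity) }
    }

    /// Encodes a position update into the existing allocation.
    /// The layout (id, x, y) is made up for this example.
    pub fn encode_position(&mut self, player_id: u32, x: f32, y: f32) -> &[u8] {
        // clear() drops the old contents but keeps the capacity,
        // so the writes below reuse the same heap block.
        self.bytes.clear();
        self.bytes.extend_from_slice(&player_id.to_le_bytes());
        self.bytes.extend_from_slice(&x.to_le_bytes());
        self.bytes.extend_from_slice(&y.to_le_bytes());
        &self.bytes
    }

    /// Current capacity of the backing allocation, in bytes.
    pub fn capacity(&self) -> usize {
        self.bytes.capacity()
    }
}
```

As long as a message fits in the initial capacity, calling encode_position repeatedly returns a slice backed by the same pointer each time, so the allocator is only touched once per player.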
We rewrote the broadcast core in Rust:
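// 300-line service, tokio 1.75, rustc 1.76
use tokio::io::AsyncWriteExt; // brings write_all into scope
let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<Vec<u8>>();
let retry_tx = tx.clone(); // the task gets its own sender; tx stays with the producers
tokio::spawn(async move {
    // Filled by the accept loop (omitted here); assumed to hold tokio TcpStream values.
    let mut conns: Vec<tokio::net::TcpStream> = Vec::with_capacity(25_000);
    while let Some(buf) = rx.recv().await {
        let mut failed = false;
        for conn in &mut conns {
            // write_all is async in tokio and must be awaited.
            if conn.write_all(&buf).await.is_err() {
                failed = true;
            }
        }
        if failed {
            retry_tx.send(buf).ok(); // retry, once the loop's borrow of buf has ended
        }
    }
});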
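One caveat about the retry in the loop above: it re-queues the whole message, so every connection receives it again, not just the one that failed. A different approach (an alternative, not what we shipped) is to drop connections whose write fails and let the client reconnect. The sketch below uses only the standard library's blocking Write trait so it runs without tokio; broadcast and FakeConn are hypothetical names, and FakeConn merely stands in for a socket.

```rust
use std::io::{self, Write};

/// Writes `buf` to every connection and removes the ones whose write fails.
/// Returns how many connections were dropped.
pub fn broadcast<W: Write>(conns: &mut Vec<W>, buf: &[u8]) -> usize {
    let before = conns.len();
    // retain_mut keeps only the connections for which the closure returns true.
    conns.retain_mut(|conn| conn.write_all(buf).is_ok());
    before - conns.len()
}

/// Hypothetical stand-in for a socket: either records what it receives
/// or fails every write, like a peer that has gone away.
pub enum FakeConn {
    Healthy(Vec<u8>),
    Broken,
}

impl Write for FakeConn {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        match self {
            FakeConn::Healthy(sink) => sink.write(data),
            FakeConn::Broken => Err(io::Error::new(io::ErrorKind::BrokenPipe, "peer gone")),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

With two healthy connections and one broken one, a single call to broadcast delivers the bytes to the healthy pair and reports one dropped connection. Whether dropping or retrying is right depends on how the client handles reconnects, which this sketch does not model.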
The compiler screamed at me for 48 hours. I learned to read rustc --explain E0502 like a daily horoscope. Finally, after two weeks of pair programming with the infra team, we shipped a Docker image that:
- used 32 MB RSS at baseline
- kept p95 latency under 30 ms at 25 k connections
- handled 700 k messages per second on a c6g.2xlarge spot instance
The GC pressure vanished: Rust has no garbage collector, so there were no GC pauses in production.
What The Numbers Said After
After the swap:
| Metric | Node (1.8 k players) | Rust (25 k players) |
|---|---|---|
| RSS per instance | 300 MB | 32 MB |
| p95 latency | 210 ms | 28 ms |
| CPU time in GC | 42 % | 0 % |
| Build time | 35 s | 110 s (debug build) / 28 s (release) |
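Because the two columns were measured at different player counts, a per-player figure makes the memory row easier to compare. The helper below is a small sketch of that arithmetic; the function names are hypothetical, and it assumes 1 MB = 1000 kB. It is only a rough comparison, since RSS includes a fixed base cost that does not scale with players.

```rust
/// Resident memory per connected player, in kilobytes (1 MB = 1000 kB here).
pub fn rss_per_player_kb(rss_mb: f64, players: u32) -> f64 {
    rss_mb * 1000.0 / f64::from(players)
}

/// Average message rate per connection, in messages per second.
pub fn messages_per_connection(total_per_second: f64, connections: u32) -> f64 {
    total_per_second / f64::from(connections)
}
```

Plugging in the table's numbers, rss_per_player_kb(300.0, 1_800) comes to about 167 kB per player for Node and rss_per_player_kb(32.0, 25_000) to 1.28 kB for Rust. Likewise, 700 k messages per second across 25 k connections works out to 28 messages per second per connection.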
Prometheus graphs showed no tail-latency spikes after 20 minutes. The SRE team removed two horizontal-pod-autoscaler rules because the service now handled peak load without scaling.
What I Would Do Differently
I would not have started with Rust two weeks before launch. The learning curve cost us three late nights and a partial rollback when timestamps wrapped inside Instant::now(). A blended approach (Node for the API gateway, Rust for the broadcast core via gRPC) would have been safer.
I would also capture flame graphs earlier. We discovered the bottleneck only after running perf record -g --call-graph dwarf on a c6g instance; the Node flame graph was 3 MB of JSON that no tool could parse quickly. The Rust version pushed the same data through flamegraph in 0.4 seconds.
Finally, I would budget two full sprints for Rust migration, not two weeks. The borrow checker is a strict reviewer, and fighting it in production is like debugging with the compiler looking over your shoulder.