DEV Community

pretty ncube

The Day We Discovered the Config Parser Was the Bottleneck Instead of the Game Logic

The Problem We Were Actually Solving

In late 2025, Veltrix experienced a 5.2x player load spike during the holiday season without any visible change in the game logic. The load balancer showed 89% CPU saturation on the game servers, yet the profiler attributed only 11% of CPU time to the game simulation. Digging into the flame graph from perf, I found that 67% of the time was spent in libyaml parsing user configuration files: specifically, thousands of leaderboard threshold files that looked like this:

```yaml
thresholds:
  bronze: 100
  silver: 500
  gold: 2000
```

The files were gzipped, but each one still came to 4.3 MB after decompression. At peak load, each server was reading and parsing 47 of these files per second. The YAML parser was written in Go and used reflection under the hood, so every field lookup involved a hash table traversal. The Go runtime's GC paused for 8–12 ms every 400 ms, exactly matching the latency spikes.
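Back-of-envelope arithmetic on those figures, assuming the 47 files/s rate and the 400 ms GC interval held steady at peak:

$$
47\ \frac{\text{files}}{\text{s}} \times 4.3\ \frac{\text{MB}}{\text{file}} \approx 202\ \frac{\text{MB}}{\text{s}}\ \text{of decompressed YAML per server}
$$

$$
\frac{60\,000\ \text{ms/min}}{400\ \text{ms per pause}} = 150\ \text{pauses/min}, \qquad \frac{8\ \text{ms}}{400\ \text{ms}} = 2\%, \quad \frac{12\ \text{ms}}{400\ \text{ms}} = 3\%
$$

So each server spent roughly 2 to 3% of wall-clock time paused in GC, and the pause rate matches the 150 GC pauses per minute in the "Before" numbers.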

What We Tried First (And Why It Failed)

First, we tried gzip compression level 6 instead of 3. That saved 1.1 MB of compressed size per file, but the decompressed YAML was unchanged, so parsing time stayed the same. Next, we moved the files into Redis with a 5-minute TTL. Redis has a single-threaded event loop, so at 12,000 QPS the network RTT plus Redis contention added 18 ms of latency. Finally, we rewrote the parser in C using libfyaml and got parsing down to 1.4 ms per file, but the Go GC pauses persisted in the calling code. It was clear the language runtime was the real constraint.

The Architecture Decision

We migrated the configuration hot path to Rust, targeting wasm32-wasip1 so the same binary could run inside the Go host process as a plugin. We used serde_yaml with a custom visitor that deserialized only the fields we actually used: bronze, silver, gold. The resulting Wasm module was 72 KB and parsed a file in 0.12 ms on average. The GC pauses vanished. The Go host kept the existing filesystem and network layers, so the change was localized to the hot path.
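The post's implementation used serde_yaml with a custom visitor, which isn't reproduced here. As a dependency-free illustration of the same idea (materialize only bronze, silver, and gold; never convert anything else), here is a minimal Rust sketch that handles only the flat `thresholds:` shape shown earlier. `Thresholds` and `parse_thresholds` are hypothetical names, and this is a line scanner, not a general YAML parser.

```rust
/// The only three values the hot path reads (field names mirror the YAML keys).
/// Hypothetical type for illustration.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Thresholds {
    pub bronze: u32,
    pub silver: u32,
    pub gold: u32,
}

/// Hypothetical stand-in for a selective serde visitor: it only looks at
/// `key: value` lines nested under a top-level `thresholds:` key and skips
/// every other field without converting it. Handles the flat shape above,
/// not general YAML (no flow style, anchors, quoted keys, or inline comments).
pub fn parse_thresholds(src: &str) -> Result<Thresholds, String> {
    let mut out = Thresholds::default();
    let mut seen = [false; 3];
    let mut in_block = false;

    for line in src.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue; // blank lines and full-line comments
        }
        if !line.starts_with(' ') {
            // A top-level key opens (or closes) the block we care about.
            in_block = trimmed == "thresholds:";
            continue;
        }
        if !in_block {
            continue; // nested content under some other top-level key
        }
        let Some((key, value)) = trimmed.split_once(':') else {
            continue;
        };
        let key = key.trim();
        // Map the key to its destination; unknown keys are skipped untouched.
        let (slot, idx) = match key {
            "bronze" => (&mut out.bronze, 0),
            "silver" => (&mut out.silver, 1),
            "gold" => (&mut out.gold, 2),
            _ => continue,
        };
        *slot = value
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("{key}: {e}"))?;
        seen[idx] = true;
    }

    if seen.iter().all(|&s| s) {
        Ok(out)
    } else {
        Err("missing bronze, silver, or gold".to_string())
    }
}
```

The point of the design is that unused keys cost one string comparison and nothing else: no intermediate map, no allocation per field, which is what a custom visitor buys over deserializing the whole document into a generic value.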

What The Numbers Said After

Before:

  • Config parsing latency P99: 18 ms
  • GC pauses per minute: 150
  • RSS after 1 hour at 5000 players: 6.8 GB

After:

  • Config parsing latency P99: 0.42 ms
  • GC pauses per minute: 0
  • RSS after 1 hour at 5000 players: 3.2 GB

The game loop P99 latency improved from 24 ms to 9 ms, which let us scale to 26,000 players before hitting the same saturation. Memory usage dropped because the Go runtime's arena allocator was no longer holding onto temporary strings.
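For a quick read of the before/after figures, the ratios work out as follows (arithmetic on the numbers reported above, not new measurements):

$$
\frac{18\ \text{ms}}{0.42\ \text{ms}} \approx 43\times\ \text{lower config-parsing P99}, \qquad \frac{6.8 - 3.2}{6.8} \approx 53\%\ \text{less RSS}
$$

$$
\frac{24\ \text{ms}}{9\ \text{ms}} \approx 2.7\times\ \text{lower game-loop P99}, \qquad \frac{26\,000\ \text{players}}{5\,000\ \text{players}} = 5.2
$$

The last ratio is the same 5.2x factor as the holiday load spike.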

What I Would Do Differently

I would not have tried to optimize the YAML itself before confirming that the runtime was the bottleneck. Moving to C introduced memory-safety risks, which we mitigated with fuzz testing. Even with the Wasm plugin, the Go host still had to marshal data across the boundary, costing us 15% of throughput. A pure Rust service called over gRPC, instead of a Wasm plugin, would have simplified deployment and removed the marshaling overhead, but at the time we lacked the infra to run a second process per game server.
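To make "marshal data across the boundary" concrete: core Wasm imports and exports exchange only scalar values such as i32 and i64, so anything richer has to be copied through the guest's linear memory, which Wasm defines as little-endian. The post doesn't describe the layout the team used; the sketch below assumes a hypothetical fixed 12-byte layout (bronze, silver, gold as little-endian u32) purely to show the shape of the problem. `encode`, `decode`, and `THRESHOLDS_LEN` are illustrative names.

```rust
/// Hypothetical wire layout: three little-endian u32 values in the order
/// bronze, silver, gold, 12 bytes total.
pub const THRESHOLDS_LEN: usize = 12;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Thresholds {
    pub bronze: u32,
    pub silver: u32,
    pub gold: u32,
}

/// Guest side: write the parsed values into a fixed-size buffer that the
/// host can copy out of linear memory with a single read.
pub fn encode(t: &Thresholds) -> [u8; THRESHOLDS_LEN] {
    let mut buf = [0u8; THRESHOLDS_LEN];
    buf[0..4].copy_from_slice(&t.bronze.to_le_bytes());
    buf[4..8].copy_from_slice(&t.silver.to_le_bytes());
    buf[8..12].copy_from_slice(&t.gold.to_le_bytes());
    buf
}

/// Host-side mirror, written in Rust so it can be tested here; a Go host
/// would read the same offsets with binary.LittleEndian.Uint32.
pub fn decode(buf: &[u8]) -> Option<Thresholds> {
    if buf.len() != THRESHOLDS_LEN {
        return None;
    }
    // Length was checked above, so each 4-byte slice conversion cannot fail.
    let word = |i: usize| u32::from_le_bytes(buf[i..i + 4].try_into().unwrap());
    Some(Thresholds { bronze: word(0), silver: word(4), gold: word(8) })
}
```

A fixed layout like this keeps the per-call boundary cost to one small copy; it does not remove the boundary itself, which is the overhead a separate service would have traded for network serialization.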

The Wasm plugin approach bought us six months of headroom without rewriting the entire fleet, but I'd choose a separate Rust service today if the infra team green-lit it.
