The Moment the JSON Config Parser Became the Enemy

#webdev #programming #rust #performance

The Problem We Were Actually Solving

The treasure-hunt server receives 50 MB/s of dynamic map events—player moves, loot spawns, fog-of-war reveals—and must broadcast deltas to 100 k sockets without re-serializing the entire world every tick.
The public docs show a simple YAML snippet under config.yaml:

world:
 width: 1024
 height: 1024
 chunk_size: 32

What they do not mention is the hidden oltp_workers: 4 knob that the YAML parser silently casts to a u16 and then divides by the core count.
Our profile at 28 k sessions, captured with perf record -F99 -g -p <pid>, showed 42 % of CPU time burned in serde_yaml::from_reader, waiting on the lock around the global IndexMap.
The real constraint was never the simulation's own CPU work or memory management; it was the JSON/YAML config path, which blocked on every config reload even though the server never changed those values at runtime.
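To make the oltp_workers trap concrete, here is a minimal sketch of what a silent u16 cast followed by a divide-by-cores can do. The function names are hypothetical, and I am assuming plain integer division; the real parser may behave differently.

```rust
use std::convert::TryFrom;

/// Hypothetical reconstruction of the silent narrowing described above.
/// `raw` is the value as parsed from YAML; `cores` is the detected core count.
fn effective_workers_lossy(raw: u64, cores: u16) -> u16 {
    // `as` truncates without any error: 65_540 becomes 4.
    let narrowed = raw as u16;
    // Integer division rounds toward zero: 4 workers on 8 cores becomes 0.
    narrowed / cores.max(1)
}

/// A stricter variant (also an assumption, not the project's code):
/// reject values that do not fit and never return zero workers.
fn effective_workers_checked(raw: u64, cores: u16) -> Result<u16, String> {
    let narrowed = u16::try_from(raw)
        .map_err(|_| format!("oltp_workers = {} does not fit in u16", raw))?;
    Ok((narrowed / cores.max(1)).max(1))
}
```

Under those assumptions, oltp_workers: 4 on an 8-core box yields zero workers in the lossy version, which is the kind of surprise a checked conversion would surface at startup instead of in a profile.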

What We Tried First (And Why It Failed)

We started with serde_yaml because the Helm chart shipped a ConfigMap volume.
Profiling with flamegraph-rs showed 1.8 μs per config reload. That looks harmless, but multiplied across 28 k sessions on every Kubernetes watch event it added roughly 50 ms of tail latency each time the ConfigMap updated, even when the file content was identical.
The stack trace was:

serde_yaml::indexmap::IndexMap<K,V>::entry
└── _raw_vec::RawVec<T,A>::reserve

The IndexMap kept reallocating the backing array on every watch trigger.
We tried serde_json with the same file; the parser was 2× faster, but the blocking I/O still destroyed tail latency.
The benchmark at 10 k players showed p99 = 34 ms against a load-test gate of < 50 ms, so the smaller run passed; the 28 k session run with config updates did not.
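The 50 ms figure in this section is just the per-reload cost multiplied by the session count. A small sketch of that arithmetic, assuming the reloads are fully serialized behind the lock (the function name is mine, not from the codebase):

```rust
use std::time::Duration;

/// Worst-case added latency if every session pays the reload cost in turn.
/// Assumes the reloads are serialized, e.g. behind a single global lock.
fn serialized_reload_cost(per_reload: Duration, sessions: u32) -> Duration {
    per_reload * sessions
}
```

With 1.8 μs per reload and 28 000 sessions this comes out at 50.4 ms, which matches the tail-latency bump seen on each ConfigMap update.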

The Architecture Decision

We ripped out the whole config layer and replaced it with a two-part system:

A compile-time constants module generated from a tiny TOML file (constants.toml) with build.rs.
A sidecar gRPC service that only accepts runtime state diffs and streams them to the main process over a Unix domain socket.
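The first part can be sketched roughly as follows. This is a simplified illustration, not our actual build script: the parser is hand-rolled and only understands flat key = integer lines, the key names in the test are borrowed from the YAML example, and a real build.rs would more likely use the toml crate.

```rust
use std::fs;
use std::path::Path;

/// Turn flat `key = <integer>` lines into Rust `pub const` items.
/// Hand-rolled for illustration only; it ignores strings, floats,
/// nested tables, and malformed lines.
fn render_constants(toml_src: &str) -> String {
    let mut out = String::from("// @generated by build.rs; do not edit.\n");
    for line in toml_src.lines() {
        let line = line.trim();
        // Skip blanks, comments, and table headers such as `[world]`.
        if line.is_empty() || line.starts_with('#') || line.starts_with('[') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if let Ok(n) = value.trim().parse::<u64>() {
                out.push_str(&format!(
                    "pub const {}: u64 = {};\n",
                    key.trim().to_uppercase(),
                    n
                ));
            }
        }
    }
    out
}

/// What a build script's `main` would call: read constants.toml from the
/// crate root and write constants.rs into Cargo's output directory.
fn generate(manifest_dir: &Path, out_dir: &Path) -> std::io::Result<()> {
    let src = fs::read_to_string(manifest_dir.join("constants.toml"))?;
    fs::write(out_dir.join("constants.rs"), render_constants(&src))
}
```

In a real crate, build.rs would call generate with the CARGO_MANIFEST_DIR and OUT_DIR environment variables and print cargo:rerun-if-changed=constants.toml, and the server would pull the result in with include!(concat!(env!("OUT_DIR"), "/constants.rs")).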

The constants are embedded in the binary, so the treasure-hunt server never parses anything at runtime.
We moved the dynamic knobs—collision radius, loot table seed, rate limits—into a separate protobuf schema served by the sidecar.
The protobuf schema is versioned and delta-encoded, and the sidecar is built on tonic, which runs on the tokio async runtime, so the config change path is lock-free and non-blocking.
The sidecar is also written in Rust, and the main server now spends zero CPU on config parsing and zero wall time on file I/O.
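For a feel of what "versioned, delta-encoded, lock-free" can mean on the receiving side, here is a minimal sketch using only std atomics. The knob names come from the list above; the field types, the struct names, and the single-writer assumption (one task draining the sidecar stream) are mine, and the real protobuf messages are not shown.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Runtime knobs named in the text; the field types are assumptions.
struct RuntimeKnobs {
    version: AtomicU64,
    collision_radius_bits: AtomicU32, // f32 stored as its raw bit pattern
    loot_table_seed: AtomicU64,
    rate_limit_per_sec: AtomicU32,
}

/// One delta from the sidecar: only the fields that changed are `Some`.
struct KnobDiff {
    version: u64,
    collision_radius: Option<f32>,
    loot_table_seed: Option<u64>,
    rate_limit_per_sec: Option<u32>,
}

impl RuntimeKnobs {
    fn new(collision_radius: f32, loot_table_seed: u64, rate_limit_per_sec: u32) -> Self {
        RuntimeKnobs {
            version: AtomicU64::new(0),
            collision_radius_bits: AtomicU32::new(collision_radius.to_bits()),
            loot_table_seed: AtomicU64::new(loot_table_seed),
            rate_limit_per_sec: AtomicU32::new(rate_limit_per_sec),
        }
    }

    /// Apply a diff. Returns false and changes nothing if the diff is stale.
    /// Assumes a single writer; readers never block, but they may briefly
    /// see a mix of old and new fields while a diff is being applied.
    fn apply(&self, diff: &KnobDiff) -> bool {
        if diff.version <= self.version.load(Ordering::Acquire) {
            return false;
        }
        if let Some(r) = diff.collision_radius {
            self.collision_radius_bits.store(r.to_bits(), Ordering::Relaxed);
        }
        if let Some(seed) = diff.loot_table_seed {
            self.loot_table_seed.store(seed, Ordering::Relaxed);
        }
        if let Some(limit) = diff.rate_limit_per_sec {
            self.rate_limit_per_sec.store(limit, Ordering::Relaxed);
        }
        self.version.store(diff.version, Ordering::Release);
        true
    }

    fn collision_radius(&self) -> f32 {
        f32::from_bits(self.collision_radius_bits.load(Ordering::Relaxed))
    }

    fn loot_table_seed(&self) -> u64 {
        self.loot_table_seed.load(Ordering::Relaxed)
    }

    fn rate_limit_per_sec(&self) -> u32 {
        self.rate_limit_per_sec.load(Ordering::Relaxed)
    }
}
```

The point of the design is that the hot path only ever does atomic loads, so a config change cannot stall a tick the way the file reload did. If knobs must change together atomically, swapping a whole immutable snapshot would be a better fit than per-field atomics.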

What The Numbers Said After

After the change we re-ran the 28 k session test with perf stat -e cache-misses,instructions -d and saw:

Before:
 42.1 % cache misses
 1.3 s p99 w/ config updates
 2 RTS (runtime scaling stalls)
After:
 11.8 % cache misses
 29 ms p99
 8 RTS (no stalls)

Tail latency at 1 ms granularity (collected with tokio-console) dropped from 48 ms to 6 ms.
The sidecar measured 120 B/s of traffic even under load, so the diff protocol is effectively free.
We also removed the jemalloc dependency in the main process because the config hot path was gone; RSS dropped from 1.4 GB to 920 MB.

What I Would Do Differently

We should have asked on day one: Which subsystems are actually dynamic?
The docs hint at a combined.yaml that mixes compile-time constants with runtime overrides; that hint is a footgun.
Next time I see a YAML file in the critical path I will pre-process it with serde during the build, emit a generated Rust source file, and include! it: no runtime parsing, no locks, no surprises.
The only runtime configuration that survives will be the gRPC diff service, and that path is already async and lock-free by design.

The moment the JSON config parser became the enemy was the moment we stopped reading the docs and started profiling the real bottleneck.