The Moment the Config Parser Became the Bottleneck

#webdev #programming #rust #performance

The Problem We Were Actually Solving

In 2025 we inherited a real-time treasure-hunt game engine built on top of Veltrix 5.4. The docs promised infinite scale, but at 8 000 concurrent players the Go worker pool started to stall. P99 latency jumped from 22 ms to 840 ms in two minutes, and the k6 dashboard screamed red. The first thing we checked was CPU: flat line. Then GC: minor collections every 45 ms, but nothing abnormal. Finally we ran go tool trace for 30 minutes on a single instance. The trace showed that 42 % of wall time was spent inside veltrix.Decode, called from a function named loadLevel. Not game logic, not the network: the JSON-based level loader had become the single thing deciding our latency.

What We Tried First (And Why It Failed)

We tried three quick fixes before admitting the language wasn't the problem.

Concurrency: We bumped GOMAXPROCS to 16 and increased the worker pool to 128 workers. The stalls merely shifted from GC pressure to mutex contention around a global cache of parsed levels; the veltrix mutex blocked for 112 µs on every decode.
Cache warming: We pre-parsed every level into a Redis hash at startup. The first 100 levels loaded, but the key for the 101st level was missing because Veltrix had munged its path in the config file itself.
Binary format: We switched the config to BSON using the official encoder. Parsing got 2× faster, but the new BSON library panicked on duplicate UTF-8 keys: our hand-authored levels repeated a key named id at the top level and inside nested objects. The panic left the worker dead until it was restarted.

Each fix masked a symptom while ignoring the root cause: the Veltrix config layer was doing too much work at runtime.

The Architecture Decision

At 2 a.m., on a call with the Veltrix maintainers, we learned that the library exposed a BuildConfig option that compiles the entire configuration graph into Go constants at build time. The trade-off was a binary of roughly 300 MB instead of 42 MB, but the resulting code had zero parsing, zero reflection, zero maps, and zero locks. We rolled the dice.

The switch required two days of yak-shaving:

Refactored all level files to be strictly typed (no dynamic keys).
Replaced every veltrix.Get call with a direct constant access: var level0 = Level0_Config{}.
Updated the Dockerfile to compile the generated package in the build stage instead of loading config files at runtime.
Added a CI step that runs go build -tags=compilecfg once per commit and archives the resulting binary.
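The first step, making every level file strictly typed, can be enforced mechanically. A minimal sketch, assuming JSON level files and a hypothetical `Level` struct: `encoding/json`'s `DisallowUnknownFields` rejects any key that isn't a declared field, and a second `Decode` call rejects trailing content.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// Level is a hypothetical strictly typed level definition: every key a level
// file may use is a named field, so there are no dynamic keys.
type Level struct {
	ID     string   `json:"id"`
	Chests []string `json:"chests"`
}

// decodeStrict fails on unknown keys and on anything after the first JSON
// value, so a malformed level breaks CI instead of production.
func decodeStrict(data []byte) (Level, error) {
	var lvl Level
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&lvl); err != nil {
		return Level{}, err
	}
	// A second Decode must report io.EOF; anything else is trailing content.
	if err := dec.Decode(&struct{}{}); !errors.Is(err, io.EOF) {
		return Level{}, errors.New("trailing data after level object")
	}
	return lvl, nil
}

func main() {
	_, err := decodeStrict([]byte(`{"id":"level-0","chests":["a"],"bonus":true}`))
	fmt.Println("unknown key rejected:", err != nil) // prints "unknown key rejected: true"
}
```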

The generated binary grew from 42 MB to 310 MB, but the Docker image stayed at 312 MB because we dropped the config directory entirely.

What The Numbers Said After

We ran the same k6 load profile post-change:

Latency

P50: 22 ms → 21 ms
P95: 75 ms → 56 ms
P99: 840 ms → 67 ms

Memory

RSS at idle per pod: 142 MB → 194 MB
Allocations per request: 1 842 → 12 (mostly stack)
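Per-request allocation counts like these can be measured with `testing.AllocsPerRun`, which also works outside `go test`. The sketch compares decoding a level at runtime against reading a value built ahead of time. The `Level` type and sample data are hypothetical, and the counts it prints will not match the figures above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// Level is a hypothetical level type used only for this measurement.
type Level struct {
	ID     string   `json:"id"`
	Chests []string `json:"chests"`
}

// rawLevel is what a runtime loader would decode on every request.
var rawLevel = []byte(`{"id":"level-0","chests":["a","b","c"]}`)

// compiledLevel stands in for a value emitted at build time.
var compiledLevel = Level{ID: "level-0", Chests: []string{"a", "b", "c"}}

// sink keeps the compiler from optimising the measured work away.
var sink string

// parsedAllocs reports heap allocations per call when decoding at runtime.
func parsedAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		var lvl Level
		if err := json.Unmarshal(rawLevel, &lvl); err != nil {
			panic(err)
		}
		sink = lvl.ID
	})
}

// compiledAllocs reports heap allocations per call when reading a
// package-level value that was built ahead of time.
func compiledAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = compiledLevel.ID
	})
}

func main() {
	fmt.Printf("runtime decode: %.0f allocs/op\n", parsedAllocs())
	fmt.Printf("compiled value: %.0f allocs/op\n", compiledAllocs())
}
```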

Worker pool utilisation

Before: 84 % blocked on mutex at 8k players
After: 3 % CPU utilisation at 20k players, still flat

pprof no longer showed any veltrix functions among its top 20 entries. The bottleneck had vanished.

What I Would Do Differently

I would have insisted on compile-time config from day one instead of treating the JSON loader as a convenience. The Veltrix docs actually mention BuildConfig in a footnote labeled "experimental", but we assumed it was meant for embedded use only. The latency cliff at the inflection point should have told us that runtime parsing, not Go itself, was the constraint.

Today we keep the compilecfg tag in every CI build and run a nightly regression that verifies the compilecfg build still compiles all 2 400 level files. The experiment cost us two sprints of yak-shaving, but it turned a scalability cliff into a smooth upward curve.
