Treasure Hunt Engine Blew Up When We Asked It To Grow

#webdev #programming #rust #performance

The Problem We Were Actually Solving

Hytale's treasure-hunt system is a micro-game that can run inside any public or private server shard. The twist is that maps can change while a hunt is live: operators can drop new treasure chests through the Veltrix admin UI, and the engine has to recompute paths without dropping updates. That recomputation turned out to be the bottleneck.

Our first prototype, written in Go 1.21 with generics, hit a 98th-percentile latency of 142 ms once the number of active hunters exceeded 1,000. A 50 ms GC pause spike, averaged across every update, was enough to make the client-side event renderer start skipping frames. The worst moment came when the telemetry scrape itself triggered an 8 ms stop-the-world pause, exposing us to cascading timeouts across the shard network.

What We Tried First (And Why It Failed)

We rewrote the path-recomputation core in C++ using a custom spatial hash and a work-stealing thread pool. The latency numbers looked good on synthetic benchmarks: p99 dropped to 18 ms and allocation rate fell from 2.3 GB/s to 420 MB/s. But when we pushed the build to the shadow environment we discovered two silent killers.
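For readers who have not built one, the core idea of a spatial hash fits in a few lines: bucket world positions into square grid cells so a query only scans the cells around a hunter. This is a minimal Rust sketch under stated assumptions, not the C++ implementation described above; SpatialHash, cell_size and nearby are hypothetical names, and it uses std's HashMap rather than a custom open-addressing table.

```rust
use std::collections::HashMap;

/// Hypothetical uniform-grid spatial hash: positions are bucketed into square
/// cells, so a neighbourhood query only touches a handful of buckets.
struct SpatialHash {
    cell_size: f32,
    cells: HashMap<(i32, i32), Vec<u32>>, // cell coordinate -> chest IDs
}

impl SpatialHash {
    fn new(cell_size: f32) -> Self {
        Self { cell_size, cells: HashMap::new() }
    }

    /// Maps a world position to the integer coordinate of its grid cell.
    fn cell_of(&self, x: f32, y: f32) -> (i32, i32) {
        ((x / self.cell_size).floor() as i32, (y / self.cell_size).floor() as i32)
    }

    fn insert(&mut self, chest_id: u32, x: f32, y: f32) {
        let cell = self.cell_of(x, y);
        self.cells.entry(cell).or_default().push(chest_id);
    }

    /// Returns chest IDs in the 3x3 block of cells around (x, y).
    fn nearby(&self, x: f32, y: f32) -> Vec<u32> {
        let (cx, cy) = self.cell_of(x, y);
        let mut out = Vec::new();
        for dx in -1..=1 {
            for dy in -1..=1 {
                if let Some(ids) = self.cells.get(&(cx + dx, cy + dy)) {
                    out.extend_from_slice(ids);
                }
            }
        }
        out
    }
}
```

With a 10-unit cell, a chest at (15, 5) is found from (0, 0) because it sits in the adjacent cell, while a chest at (95, 95) is never scanned.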

First, the spatial hash used open addressing with quadratic probing. When two maps collided on the same bucket index, the probe chain grew from three steps to seven hundred during a map reload, and worst-case latency jumped to 1.2 s. Second, std::unordered_map rehashes as soon as its load factor crosses max_load_factor(), which made its growth points hard to predict from the outside: after the eleventh map reload in three minutes it grew the table by 2×, blocking the mutator thread for 47 ms. The profiler showed that 32% of all latency spikes lined up with these rehashes.
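To see why a probe chain can explode, here is a minimal Rust sketch of open addressing with quadratic (triangular) probing. The ProbeTable type and its deliberately weak identity hash are illustrative assumptions, not the production C++ table: when every key shares one home bucket, the k-th colliding insert needs k probes, while well-spread keys land on the first try.

```rust
/// Hypothetical open-addressing table with triangular (quadratic) probing.
/// Capacity must be a power of two so the probe sequence visits every slot.
struct ProbeTable {
    slots: Vec<Option<u64>>,
    mask: usize,
}

impl ProbeTable {
    fn new(capacity_pow2: usize) -> Self {
        assert!(capacity_pow2.is_power_of_two());
        Self { slots: vec![None; capacity_pow2], mask: capacity_pow2 - 1 }
    }

    /// Deliberately weak hash (identity): keys that differ by a multiple of
    /// the capacity share the same home bucket.
    fn home(&self, key: u64) -> usize {
        (key as usize) & self.mask
    }

    /// Inserts `key` and returns how many slots were probed, or None if full.
    fn insert(&mut self, key: u64) -> Option<usize> {
        let home = self.home(key);
        for i in 0..self.slots.len() {
            // Triangular offsets 0, 1, 3, 6, 10, ... (i * (i + 1) / 2).
            let idx = (home + i * (i + 1) / 2) & self.mask;
            if self.slots[idx].is_none() {
                self.slots[idx] = Some(key);
                return Some(i + 1);
            }
        }
        None
    }
}
```

Inserting map IDs 0, 1024, 2048, ... into a 1024-slot table costs 1, 2, 3, ... probes; the hundredth colliding ID needs 100 probes before it finds a free slot.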

The Architecture Decision

We needed a data structure that could grow without moving existing entries and could recompute paths in place without extra allocations. I dusted off a 2019 paper by the V8 team on incremental trie hashing. The implementation looked simple: a lock-free, arena-allocated trie keyed by 64-bit map IDs.
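The shape of an arena-allocated trie keyed by 64-bit IDs can be sketched as follows. This is a single-threaded Rust sketch under stated assumptions: it leaves out the lock-free part entirely, uses a Vec as the arena with u32 indices instead of bumpalo, and splits each map ID into sixteen 4-bit nibbles. ArenaTrie and the u32 payload are hypothetical. Existing nodes keep their index when the arena grows, which is what lets updates happen in place; the Vec's backing memory can still be reallocated, which a true bump arena avoids.

```rust
const FANOUT: usize = 16; // one nibble (4 bits) per level
const DEPTH: usize = 64 / 4; // 16 levels for a 64-bit key
const NONE: u32 = u32::MAX; // sentinel for "no child"

struct Node {
    children: [u32; FANOUT], // indices into the arena, or NONE
    value: Option<u32>,      // hypothetical payload, e.g. a chest-table index
}

struct ArenaTrie {
    nodes: Vec<Node>, // arena: nodes are appended, never removed or reordered
}

impl ArenaTrie {
    fn new() -> Self {
        Self { nodes: vec![Node { children: [NONE; FANOUT], value: None }] }
    }

    /// Extracts the nibble for `level`, most significant nibble first.
    fn nibble(key: u64, level: usize) -> usize {
        ((key >> (60 - 4 * level)) & 0xF) as usize
    }

    fn insert(&mut self, key: u64, value: u32) {
        let mut cur = 0usize;
        for level in 0..DEPTH {
            let n = Self::nibble(key, level);
            let next = self.nodes[cur].children[n];
            cur = if next == NONE {
                // Append a fresh node; earlier nodes keep their indices.
                let idx = self.nodes.len() as u32;
                self.nodes.push(Node { children: [NONE; FANOUT], value: None });
                self.nodes[cur].children[n] = idx;
                idx as usize
            } else {
                next as usize
            };
        }
        self.nodes[cur].value = Some(value);
    }

    fn get(&self, key: u64) -> Option<u32> {
        let mut cur = 0usize;
        for level in 0..DEPTH {
            let next = self.nodes[cur].children[Self::nibble(key, level)];
            if next == NONE {
                return None;
            }
            cur = next as usize;
        }
        self.nodes[cur].value
    }
}
```

Updating an existing key walks the same path and overwrites the leaf, so a map reload that only changes payloads allocates no new nodes.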

The decision to switch languages came down to one question: should we implement the arena and the trie in C++ with manual memory management, or in Rust with the bumpalo and dashmap crates?

I chose Rust. The reasons were blunt: no reallocations during growth, no GC pauses, compile-time guarantees against data races, and the ability to iterate on the trie without touching the heap allocator. The tradeoff was the steepest learning curve I had faced in eight years: lifetimes, borrowing, and the borrow checker cost us three weeks.

What The Numbers Said After

Four weeks after the rewrite we ran the same load test: 1.2 million sessions with dynamic map reloads every 3.7 s. The p99 latency dropped to 23 ms, and the worst-case spike never exceeded 92 ms. The allocation rate fell to 120 MB/s, the profiler showed zero GC pauses (Rust has no garbage collector), and the bump arena reused the same 32 MB buffer for the entire run.
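Reusing one buffer across reloads boils down to "allocate once, reset instead of free". The sketch below is a std-only stand-in for that pattern, not bumpalo itself (where Bump::reset plays the equivalent role); ScratchArena and its methods are hypothetical, and unlike a real bump arena it holds values of a single type.

```rust
/// Hypothetical per-reload scratch arena: a typed buffer allocated once and
/// cleared (not freed) between map reloads, so steady-state reloads do not
/// go back to the global allocator.
struct ScratchArena<T> {
    buf: Vec<T>,
}

impl<T> ScratchArena<T> {
    fn with_capacity(cap: usize) -> Self {
        Self { buf: Vec::with_capacity(cap) }
    }

    /// Appends a value and returns its index (valid until the next reset).
    fn alloc(&mut self, value: T) -> usize {
        self.buf.push(value);
        self.buf.len() - 1
    }

    fn get(&self, idx: usize) -> &T {
        &self.buf[idx]
    }

    /// Drops every value but keeps the backing allocation for the next reload.
    fn reset(&mut self) {
        self.buf.clear();
    }

    fn capacity(&self) -> usize {
        self.buf.capacity()
    }

    fn base_ptr(&self) -> *const T {
        self.buf.as_ptr()
    }
}
```

As long as each reload stays within the reserved capacity, the base pointer never changes, which is the property that keeps the allocation rate flat.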

Here is the concrete metric we watched every day:

flamegraph.svg (after the Rust rewrite)
 Function              Self time (98th percentile)
 path_recompute        18 ms
 spatial_trie_insert   2 ms
 arena_alloc           0.4 ms

The hardest bug was not a segfault but a silent memory leak in the external FFI layer where we wrapped the map parser. Rust's #[non_exhaustive] attribute on the parser's public API saved us: once we opted into the correct error variant, the leak stopped and the resident set size stabilized at 147 MB instead of climbing to 512 MB over twelve hours.
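The post does not show the parser wrapper, so the following is only a hedged illustration of one common way to stop an FFI layer from leaking on error paths, not the author's fix: wrap the raw handle in an owning type whose Drop releases it on every return path. parser_alloc, parser_free, MapHandle and ParseError are hypothetical stand-ins, simulated with Box::into_raw and Box::from_raw so the example runs without a C library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts live "C-side" allocations so a leak shows up as a non-zero value.
static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

struct RawMap {
    bytes: Vec<u8>,
}

// Hypothetical FFI-style allocation: returns a raw pointer the caller must free.
fn parser_alloc(input: &[u8]) -> *mut RawMap {
    LIVE_HANDLES.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(RawMap { bytes: input.to_vec() }))
}

// Hypothetical FFI-style free; must be called exactly once per allocation.
unsafe fn parser_free(ptr: *mut RawMap) {
    unsafe { drop(Box::from_raw(ptr)) };
    LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
}

/// Owning wrapper: the raw handle is released in Drop, including on early returns.
struct MapHandle(*mut RawMap);

impl Drop for MapHandle {
    fn drop(&mut self) {
        // SAFETY: self.0 came from parser_alloc and is freed only here.
        unsafe { parser_free(self.0) }
    }
}

#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    Malformed,
}

fn parse_map(input: &[u8]) -> Result<usize, ParseError> {
    let handle = MapHandle(parser_alloc(input));
    // SAFETY: the pointer stays valid for as long as `handle` is alive.
    let raw = unsafe { &*handle.0 };
    if raw.bytes.is_empty() {
        return Err(ParseError::Empty); // `handle` is dropped here as well
    }
    if raw.bytes[0] != b'{' {
        return Err(ParseError::Malformed);
    }
    Ok(raw.bytes.len())
}
```

Because cleanup lives in Drop rather than at each call site, adding a new error variant cannot introduce a path that forgets to free the handle.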

What I Would Do Differently

If I had to do it again, I would not start with the language switch. I would first validate the data structure in Go with a pure slice-based trie to measure its asymptotic behavior on map reloads. The moment the slice started copying more than 32 KB per reload, we would have known that arena allocation was the answer, not Rust.
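The measurement proposed above works for any growable array. Here is a Rust stand-in (the post suggests doing it in Go; bytes_copied_on_growth is a hypothetical helper) that estimates how many bytes a buffer may move across reallocations: whenever length reaches capacity, the next push may copy every existing element. Allocators can sometimes extend a block in place, so treat the result as a worst-case upper bound.

```rust
/// Pushes `items` one at a time and returns the worst-case number of bytes
/// moved by reallocations (existing length * element size at each growth).
fn bytes_copied_on_growth<T: Copy>(v: &mut Vec<T>, items: &[T]) -> usize {
    let mut copied = 0;
    for &item in items {
        if v.len() == v.capacity() {
            // The next push must grow the buffer, moving every existing element.
            copied += v.len() * std::mem::size_of::<T>();
        }
        v.push(item);
    }
    copied
}
```

Comparing the per-reload figure against a budget such as 32 KB is then a single `if`; reserving the expected capacity up front drives the figure to zero.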

I would also avoid dashmap; its default sharding strategy gave us false sharing on the hot path. We ended up replacing it with a single HashMap behind a single parking_lot::Mutex (a lock_api::Mutex over parking_lot::RawMutex). Lock contention showed up as 0.7% CPU on the flame graph, but the single lock removed the risk of iterator invalidation bugs that would have taken weeks to debug in production.
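A minimal sketch of the single-lock layout, under stated assumptions: it uses std::sync::Mutex so it compiles without dependencies (parking_lot::Mutex behaves similarly, except that its lock() returns the guard directly because it has no poisoning), and ChestIndex, record_chest and run_workers are hypothetical names. Any iteration happens while the one guard is held, so no other thread can mutate the map underneath an iterator.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared index from map ID to chest count, guarded by one lock.
type ChestIndex = Arc<Mutex<HashMap<u64, u32>>>;

fn record_chest(index: &ChestIndex, map_id: u64) {
    // One short critical section per update; no shard selection involved.
    let mut guard = index.lock().expect("chest index lock poisoned");
    *guard.entry(map_id).or_insert(0) += 1;
}

fn run_workers(threads: u64, per_thread: u64) -> ChestIndex {
    let index: ChestIndex = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let index = Arc::clone(&index);
            thread::spawn(move || {
                for i in 0..per_thread {
                    record_chest(&index, t * per_thread + i); // disjoint IDs per thread
                }
            })
        })
        .collect();
    for h in handles {
        h.join().expect("worker panicked");
    }
    index
}
```

The price is that every writer serializes on one lock, which is the contention the flame graph surfaced; the gain is a single, simple consistency rule.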

Finally, I would insist on a nightly canary pipeline that runs the treasure-hunt engine against real Veltrix map reloads. The first canary build blew up when the parser panicked on a malformed JSON treasure definition. Rust's catch_unwind let us log the error without crashing the process, but the panic unwind still cost 34 ms, enough to trigger a client-side disconnect. We fixed that by switching to serde_derive with custom error handling, a reminder that language safety is only as good as the error handling you write around it.
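The difference between catching a panic and returning an error can be shown without any crates. This dependency-free Rust sketch hand-parses a hypothetical "x,y" treasure format instead of JSON (the post's fix used serde_derive), and all names are illustrative. Both paths keep the process alive, but only the Result-based parser avoids unwinding the stack on bad input.

```rust
use std::num::ParseIntError;
use std::panic;

/// Hypothetical, heavily simplified treasure definition: "x,y".
#[derive(Debug, PartialEq)]
struct Treasure {
    x: i32,
    y: i32,
}

#[derive(Debug, PartialEq)]
enum TreasureError {
    MissingComma,
    BadNumber(ParseIntError),
}

/// Panicking parser: malformed input unwinds the stack.
fn parse_or_panic(s: &str) -> Treasure {
    let (x, y) = s.split_once(',').expect("missing comma");
    Treasure {
        x: x.trim().parse::<i32>().expect("bad x"),
        y: y.trim().parse::<i32>().expect("bad y"),
    }
}

/// Keeps the process alive, but each malformed input still pays for an unwind.
fn parse_with_catch(s: &str) -> Option<Treasure> {
    panic::catch_unwind(|| parse_or_panic(s)).ok()
}

/// Result-based parser: malformed input is an ordinary return value.
fn parse_treasure(s: &str) -> Result<Treasure, TreasureError> {
    let (x, y) = s.split_once(',').ok_or(TreasureError::MissingComma)?;
    let x = x.trim().parse::<i32>().map_err(TreasureError::BadNumber)?;
    let y = y.trim().parse::<i32>().map_err(TreasureError::BadNumber)?;
    Ok(Treasure { x, y })
}
```

With serde the same shape comes for free: serde_json::from_str returns a Result, and the fix is to match on the error instead of unwrapping it.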
