We Broke the Hytale Treasure Hunt Engine (And How We Fixed It Without Losing Our Minds)

#webdev #programming #rust #performance

The Problem We Were Actually Solving

Our Hytale server fork was running Wynncraft-style treasure hunts: every time a player opened a chest we spawned 8–12 loot entities in a stack, scheduled a one-second particle burst, then waited for physics to settle before sending the final packet to the client. On paper it was fine—chests were rare, players were spaced, and our Redis cluster had plenty of headroom.

Then we turned profiling on.

Pprof showed a 34 ms median GC pause under G1 in the JVM, but the real killer was per-packet allocation. Each treasure burst allocated 4 KB on the hot path just to hold the particle-effect metadata. With 80 players opening 6 chests each, that was 1.9 MB/s of tiny, short-lived objects. The JVM's nursery filled and emptied 42 times per second, triggering concurrent-mark cycles that added 16–22 ms of latency every 2.3 seconds. Players reported stutter not during combat, but when a chest was opened three rooms away.
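The 1.9 MB figure follows from straight multiplication. A small check, assuming 1 KB = 1,000 bytes and that all 480 chest openings fall inside the same one-second window (the text does not say how the openings were spread over time); the function names are illustrative only:

```rust
/// Bytes allocated on the hot path per chest burst, as reported in the text
/// (assuming 1 KB = 1,000 bytes).
const BYTES_PER_BURST: u64 = 4_000;

/// Total short-lived allocation volume when `players` each open `chests_each` chests.
fn burst_bytes(players: u64, chests_each: u64) -> u64 {
    players * chests_each * BYTES_PER_BURST
}

/// The same volume in decimal megabytes (1 MB = 1,000,000 bytes).
fn burst_megabytes(players: u64, chests_each: u64) -> f64 {
    burst_bytes(players, chests_each) as f64 / 1_000_000.0
}
```

`burst_bytes(80, 6)` is 1,920,000 bytes, about 1.92 MB. With 1 KB = 1,024 bytes the product is 1,966,080 bytes (about 1.97 MB), so the choice of unit does not change the picture.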

We cross-checked with JFR on the same sample. It attributed 62 % of the latency to Eden-space evacuation, and the longest single pause was 108 ms. Our SLA said 95th percentile < 100 ms.
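A p95 SLA is judged on the 95th-percentile sample, not the worst one. A minimal sketch using the nearest-rank method (one of several percentile definitions; the text does not say which one the team's tooling used), with hypothetical function names:

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns None for an empty slice or a `p` outside 0..=100.
fn percentile_nearest_rank(samples_ms: &[f64], p: f64) -> Option<f64> {
    if samples_ms.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    // total_cmp gives a total order on f64, so sorting cannot panic.
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank: 1-based rank = ceil(p * n / 100).
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    // p = 0 yields rank 0; clamp to the first element.
    Some(sorted[rank.max(1) - 1])
}

/// True when the nearest-rank p95 is strictly below the budget.
fn meets_p95_sla(samples_ms: &[f64], budget_ms: f64) -> bool {
    percentile_nearest_rank(samples_ms, 95.0).map_or(false, |p95| p95 < budget_ms)
}
```

For example, 19 samples at 10 ms plus one 108 ms outlier give a nearest-rank p95 of 10 ms, because with 20 samples the p95 rank is 19. A single long pause only moves p95 once outliers make up more than 5 % of the samples.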

What We Tried First (And Why It Failed)

I rewrote the packet serializer to reuse byte buffers. That cut allocations in half, but GC pauses only dropped to 28 ms because the particle system still used Java's ArrayList<Particle> per chest. Next I switched the chest loot table from YAML to FlatBuffers so we could stream deserialization directly into off-heap memory. GC pauses fell to 19 ms, but players were still complaining about rubber-band teleports when the server saved the world after a hunt.
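The buffer-reuse fix is a general pattern, and it carries straight over to Rust. A minimal sketch with a hypothetical `PacketWriter` and a made-up encoding (big-endian `u32` fields); it is not the team's Java serializer, only the same idea of clearing one buffer instead of allocating a new one per packet:

```rust
/// Hypothetical serializer that owns one scratch buffer and reuses it
/// for every packet instead of allocating a fresh Vec per call.
struct PacketWriter {
    buf: Vec<u8>,
}

impl PacketWriter {
    fn with_capacity(cap: usize) -> Self {
        PacketWriter { buf: Vec::with_capacity(cap) }
    }

    /// Encodes `fields` as big-endian u32s into the reused buffer and
    /// returns a borrowed view of the encoded bytes.
    fn encode(&mut self, fields: &[u32]) -> &[u8] {
        // clear() resets the length to zero but keeps the allocation.
        self.buf.clear();
        for f in fields {
            self.buf.extend_from_slice(&f.to_be_bytes());
        }
        &self.buf
    }

    fn capacity(&self) -> usize {
        self.buf.capacity()
    }
}
```

Once the buffer has grown to the largest packet it sees, later packets reuse that allocation and the serializer stops touching the allocator.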

Finally we tried GraalVM native-image. The build took 35 minutes and 18 GB of RAM, and the resulting binary crashed with a SIGSEGV inside memcpy when it tried to write to a memory-mapped chunk file. Shuffling the code to fix the segfault actually regressed inlining and added 12 ms of JIT warmup latency.

By this point the treasure hunt timings were dominated by physics tick synchronization, not memory. The runtime was the constraint.

The Architecture Decision

We shut down the JVM and rewrote the treasure hunt engine in Rust. It took two weeks.

Key decisions:

We used bevy_ecs, whose archetype-based storage let us iterate loot queries in place without cloning entities.
Loot spawns were pre-batched into a single Vec<Particle> per 4×4 chunk area; the buffer was reused until the next physics tick.
We switched from the tokio runtime to async-io on io-uring to avoid the 4 µs per-tick cost of epoll wake-ups when Redis published a chest event.
We stored the entire 3200×3200 block world in a single memory-mapped Arena allocator so we never copy chunk data during treasure spawn.
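The per-area batching in the second point can be sketched as follows, under stated assumptions: `Particle` and `ParticleBatches` are hypothetical names, chunks are assumed to be 32 blocks wide (the text does not give a chunk size), and a plain `HashMap` stands in for whatever storage the real engine used; this sketch does not use bevy_ecs.

```rust
use std::collections::HashMap;

/// Hypothetical particle record; the real fields are not given in the text.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Particle {
    x: f32,
    y: f32,
    z: f32,
}

/// Assumed chunk edge length in blocks (illustrative only).
const CHUNK_SIZE: i32 = 32;
/// Chunks per batch area along each horizontal axis (the 4×4 area).
const AREA_CHUNKS: i32 = 4;

/// One reusable particle buffer per 4×4-chunk area.
#[derive(Default)]
struct ParticleBatches {
    areas: HashMap<(i32, i32), Vec<Particle>>,
}

impl ParticleBatches {
    /// Maps a horizontal block position (x, z) to its 4×4-chunk area key.
    fn area_key(block_x: i32, block_z: i32) -> (i32, i32) {
        let span = CHUNK_SIZE * AREA_CHUNKS;
        // div_euclid rounds toward negative infinity, so block -1 lands in area -1.
        (block_x.div_euclid(span), block_z.div_euclid(span))
    }

    /// Appends a particle to the batch for the area containing it.
    fn push(&mut self, p: Particle) {
        let key = Self::area_key(p.x.floor() as i32, p.z.floor() as i32);
        self.areas.entry(key).or_default().push(p);
    }

    /// Called once per physics tick: hands each non-empty batch to `flush`,
    /// then clears it. clear() keeps the capacity, so the next tick
    /// reuses the same allocation.
    fn flush_tick(&mut self, mut flush: impl FnMut((i32, i32), &[Particle])) {
        for (key, batch) in self.areas.iter_mut() {
            if !batch.is_empty() {
                flush(*key, batch.as_slice());
                batch.clear();
            }
        }
    }
}
```

Using `div_euclid` rather than plain `/` matters for negative coordinates: integer division rounds toward zero, which would put block -1 in area 0 instead of area -1.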

The Rust compiler immediately caught the same memcpy overflow that had crashed GraalVM, this time at compile time. The nightly build held up with 60 players on a 32 GB EC2 c6i.16xlarge.

What The Numbers Said After

After migration:

Eden-space GC time dropped to zero, since there is no garbage collector; the process RSS was 1.2 GB vs. 3.4 GB under HotSpot.
95th percentile latency to spawn a chest and send the final packet dropped from 98 ms to 6 ms on the same hardware.
Allocations per chest fell from 4 KB to 24 bytes (just the final packet size).
perf record --call-graph dwarf showed 0.3 % CPU overhead in allocator paths vs. 8 % under G1.
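To make the 24-byte figure concrete, here is a sketch of a fixed-size chest packet. The layout (a `u64` chest id, three `i32` block coordinates and a `u32` loot count, little-endian) and the `ChestPacket` name are hypothetical assumptions chosen to add up to 24 bytes; the text does not describe the real wire format.

```rust
use std::convert::TryInto;

/// Hypothetical layout: 8 (chest id) + 12 (3 × i32 position) + 4 (loot count) = 24 bytes.
const PACKET_LEN: usize = 24;

#[derive(Debug, PartialEq)]
struct ChestPacket {
    chest_id: u64,
    pos: [i32; 3],
    loot_count: u32,
}

impl ChestPacket {
    /// Encodes into a fixed-size stack array; no heap allocation on this path.
    fn encode(&self) -> [u8; PACKET_LEN] {
        let mut out = [0u8; PACKET_LEN];
        out[0..8].copy_from_slice(&self.chest_id.to_le_bytes());
        for (i, c) in self.pos.iter().enumerate() {
            let start = 8 + i * 4;
            out[start..start + 4].copy_from_slice(&c.to_le_bytes());
        }
        out[20..24].copy_from_slice(&self.loot_count.to_le_bytes());
        out
    }

    /// Decodes the same hypothetical layout back into a packet.
    fn decode(bytes: &[u8; PACKET_LEN]) -> Self {
        // Slices have a fixed length here, so try_into() cannot fail.
        let i32_at = |s: usize| i32::from_le_bytes(bytes[s..s + 4].try_into().unwrap());
        ChestPacket {
            chest_id: u64::from_le_bytes(bytes[0..8].try_into().unwrap()),
            pos: [i32_at(8), i32_at(12), i32_at(16)],
            loot_count: u32::from_le_bytes(bytes[20..24].try_into().unwrap()),
        }
    }
}
```

Because the packet size is a compile-time constant, the encoder in this sketch needs no heap allocation of its own; the round trip through `decode` is there only to show the layout is self-consistent.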

We bumped the player count to 180 and opened 12 simultaneous hunts. The server still sat at only 22 % CPU in System; most of that was the physics tick loop and Redis pub/sub.

What I Would Do Differently

We should have measured sooner.

The JVM gave us a narrative: stutter is I/O or CPU. The profiler told us it was GC, but no one correlated the allocation rate with packet frequency until we saw the stutter map peak at the same time as the Eden fills.

If I had to do it again, I'd run the Rust version in a 10-player sneak-peek closed beta for 48 hours and profile the transition from a single chest to a full hunt before rewriting the whole system. Rust taught us that the runtime is the constraint only after we removed the runtime from the equation.

The treasure hunt is now boring. Players still open chests, loot spawns, and the server doesn't break a sweat. That's when you know the language was the problem all along.