The Problem We Were Actually Solving
In early 2025 the Hytale ops team noticed p99 latency for Veltrix queries spiking to a median of 6 seconds whenever the JVM heap exceeded 4 GB. Profiling with async-profiler showed 38 % of wall-clock time spent inside biased-lock revocations, triggered by thread contention once the search worker pool expanded past 128 threads. Users were hitting timeouts on the zone-discovery endpoint because the JVM's default G1GC turned into a stop-the-world nightmare at 3 GB RSS. We were not optimizing for correctness anymore; we were fighting the runtime.
What We Tried First (And Why It Failed)
Our first move was to tune G1GC: -XX:MaxGCPauseMillis=50, -XX:InitiatingHeapOccupancyPercent=35, -XX:+AlwaysPreTouch. The result was a 12 % latency improvement but a 200 % increase in RSS, because G1 started evacuating 500 MB humongous objects every 200 ms. Next we swapped in ZGC with -XX:+UseZGC and -Xmx8g. Latency dropped to 1.8 s median p99, but a worst-case pause of 100 ms or more every 250 ms violated the 100 ms SLA we'd promised for hot-path queries. The engineering chat logs from that week are full of messages like "Ops are still seeing 120 ms pauses at 75 % heap usage."
The Architecture Decision
We prototyped the search indexing pipeline in Rust on a nightly toolchain with tokio and the tantivy crate. A single 256 MB arena allocation replaced the JVM's 3.8 GB heap. The share of wall-clock time lost to lock contention dropped from 38 % to 0.2 %. We faced two critical trade-offs: (1) moving to rustc 1.76-nightly meant shipping with an unstable feature flag (-Z polonius) to compile the zone-indexer in under 40 seconds, and (2) we had to rewrite the fuzzy tokenizer from Java's ICU4J to the unicode-segmentation crate, which cost two weeks of CI pipeline debugging because the crate's default collation ignored the BCP 47 tag for en-US-posix. We accepted the risk because the alternative was a 10-minute GC cycle every time a player joined a freshly generated zone.
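For readers who have not worked with arenas, here is a minimal sketch of the idea behind that single up-front allocation: reserve one buffer, hand out ranges from it by bumping an offset, and free everything at once with a reset. This is not the zone-indexer's actual allocator. The Arena type and its methods are hypothetical names made up for illustration, and it uses only the standard library.

```rust
use std::ops::Range;

/// Hypothetical bump arena: one buffer reserved up front, never regrown.
struct Arena {
    buf: Vec<u8>,
    capacity: usize,
}

impl Arena {
    /// Reserves `capacity` bytes once; later allocations never reallocate.
    fn with_capacity(capacity: usize) -> Self {
        Arena { buf: Vec::with_capacity(capacity), capacity }
    }

    /// Copies `bytes` into the arena and returns where they live,
    /// or None if the arena does not have enough room left.
    fn alloc(&mut self, bytes: &[u8]) -> Option<Range<usize>> {
        let start = self.buf.len();
        let end = start.checked_add(bytes.len())?;
        if end > self.capacity {
            return None;
        }
        self.buf.extend_from_slice(bytes);
        Some(start..end)
    }

    /// Reads back a previously allocated range.
    fn get(&self, range: Range<usize>) -> &[u8] {
        &self.buf[range]
    }

    /// Number of bytes handed out so far.
    fn used(&self) -> usize {
        self.buf.len()
    }

    /// Frees every allocation at once; the reserved buffer is kept.
    fn reset(&mut self) {
        self.buf.clear();
    }
}
```

The point of the design is that there is nothing for a collector to trace or compact: allocation is an offset bump, and cleanup is a single reset at a boundary the application chooses.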
What The Numbers Said After
- Memory: RSS dropped from 7.2 GB to 890 MB at 500 k concurrent queries, measured with /usr/bin/time -v.
- Latency: p99 query time went from 6.1 s to 82 ms, measured by HdrHistogram over 15 minutes of synthetic load.
- Allocations: tantivy's docstore now allocates 1.4 kB per document instead of the JVM's 670 B but avoids compaction pauses; jemalloc's tcache reduced malloc calls from 12 M/s to 2.1 M/s.
- Compile time: rustc 1.76-nightly with LTO and thinLTO increased CI build time from 9 minutes to 27 minutes, but we shrank the Docker layer by 600 MB, trimming image push time from 34 s to 11 s.
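To make the p99 figure above concrete, here is a minimal sketch of what a 99th-percentile latency means, using the nearest-rank method over a sorted copy of the samples. It is not how HdrHistogram works internally (that library uses bucketed counts rather than sorting), and the function name and the per-mille parameter are assumptions chosen for this example so the rank can be computed with integer arithmetic.

```rust
/// Nearest-rank percentile over latency samples (any unit, e.g. microseconds).
/// `permille` is the percentile in tenths of a percent: 990 means p99,
/// 999 means p99.9. Returns None for empty input or an out-of-range percentile.
fn percentile_permille(samples: &[u64], permille: usize) -> Option<u64> {
    if samples.is_empty() || permille == 0 || permille > 1000 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // 1-based rank = ceil(permille / 1000 * n), done in integers.
    let rank = (permille * sorted.len() + 999) / 1000;
    Some(sorted[rank - 1])
}
```

Calling percentile_permille(&latencies, 990) returns the value that 99 % of samples do not exceed. Sorting is fine for a quick offline check, but for 15 minutes of load a histogram-based recorder is the more practical choice.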
What I Would Do Differently
We should have started with the stable rustc toolchain instead of nightly. The polonius flag caused two P0 rollbacks when a subtle lifetime interaction broke the zone-indexer in production. Next time I'd budget a full sprint for the ICU migration and write a compatibility test suite that runs against both Java ICU4J and Rust unicode-segmentation to avoid silent collation mismatches in fuzzy queries. Finally, I'd insist on jemalloc for every Rust service until mimalloc's TLS issues in Kubernetes are resolved; we lost two days debugging thread-local allocator contention that didn't show up in local benchmarks.
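A compatibility suite of that kind can be sketched as a small differential harness: run a reference tokenizer and a candidate tokenizer over the same corpus and collect every input where the token streams differ. Everything below is a hypothetical illustration using only the standard library. The two tokenizer functions are stand-ins, not ICU4J or unicode-segmentation; in a real suite the reference side would more likely be golden output exported from the Java implementation.

```rust
/// A tokenizer is modeled as any plain function from text to owned tokens.
type Tokenizer = fn(&str) -> Vec<String>;

/// One input on which the two tokenizers disagreed.
#[derive(Debug, PartialEq)]
struct Mismatch {
    input: String,
    reference: Vec<String>,
    candidate: Vec<String>,
}

/// Runs both tokenizers over the corpus and returns every disagreement.
fn compare_tokenizers(
    reference: Tokenizer,
    candidate: Tokenizer,
    corpus: &[&str],
) -> Vec<Mismatch> {
    corpus
        .iter()
        .filter_map(|&input| {
            let r = reference(input);
            let c = candidate(input);
            if r == c {
                None
            } else {
                Some(Mismatch { input: input.to_string(), reference: r, candidate: c })
            }
        })
        .collect()
}

/// Hypothetical stand-in for the old tokenizer: splits on whitespace only.
fn reference_tokenizer(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// Hypothetical stand-in for the new tokenizer: splits on any
/// non-alphanumeric character, so hyphenated words break apart.
fn candidate_tokenizer(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}
```

With these stand-ins, an input like "north-gate" is reported as a mismatch while "Zone Alpha" passes. Failing CI on a non-empty result is what turns a silent divergence into a visible one.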