I remember the exact moment the treasure hunt engine in Veltrix became the bottleneck. It wasn't because the Art team had added another glittering loot table or because the community wanted more golden chests per biome. It was because the integration layer between the world simulation and the event scheduler couldn't scale past 200 concurrent hunts without saturating the JVM heap.
We were running OpenJDK 17 with the G1 collector, heap set to 16GB, and our Prometheus dashboard was screaming about 3–4 second pauses every time a field day started. Not graceful hiccups — full GC cycles that stalled every thread, including the Netty event loop pumping player movement packets. Our metrics showed:
- Prometheus scrape: jvm_gc_pause_seconds_count{quantile="0.95"} = 3.4
- Treasure find rate dropped from ~400 finds/sec to 80/sec during GC
- Player telemetry latency (P99) spiked from 80ms to 5.2s
The problem wasn't the treasure logic itself. That part was a simple loop over a weighted LootTable and a call to spawn an item entity. The problem was the way we'd built the event system: a shared, mutable TreasureHuntManager protected by a ReentrantLock that we'd wrapped in a @Scheduled cron to spawn new hunts every 30 seconds. When 1,200 players joined a single region for a double-loot weekend, the cron would queue 35 new hunts in one tick, each one locking, reading a loot table, and pushing an event onto a shared ConcurrentLinkedQueue. The queue depth would balloon to 700k events, and the GC would collapse under the weight of all those TreasureHuntEvent objects, each one holding a reference to a loot table slice.
We tried three things first.
First, we split the event loop into sharded regions (64 regions, round-robin assignment) and gave each region its own manager and queue. That bought us two weeks. Then we realized the loot tables were still being read from a shared LootTableManager, and every region was deserializing the same JSON blob into memory. We introduced a read-only loot cache using Caffeine and RecordCacheLoader, which cut memory churn by 30%. Still not enough.
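To make the sharding step concrete, here is a minimal sketch of round-robin assignment across per-shard queues. It is written in Rust for consistency with the snippets later in this post, although our actual sharding lived in the JVM. The ShardedQueues type and its methods are hypothetical names for illustration, not our real classes.

```rust
use std::collections::VecDeque;

/// Hypothetical container: one independent queue per shard, so no two
/// shards contend on the same lock or buffer.
pub struct ShardedQueues<T> {
    shards: Vec<VecDeque<T>>,
    next: usize,
}

impl<T> ShardedQueues<T> {
    /// Creates `shard_count` empty queues. Panics if `shard_count` is zero.
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        Self {
            shards: (0..shard_count).map(|_| VecDeque::new()).collect(),
            next: 0,
        }
    }

    /// Pushes onto the next shard in round-robin order and returns the
    /// index of the shard that received the item.
    pub fn push(&mut self, item: T) -> usize {
        let idx = self.next;
        self.shards[idx].push_back(item);
        self.next = (idx + 1) % self.shards.len();
        idx
    }

    /// Current queue depth of one shard.
    pub fn depth(&self, shard: usize) -> usize {
        self.shards[shard].len()
    }
}
```

Round-robin keeps the queue depths within one item of each other, but it does nothing about the data every shard reads, which is the shared loot table problem described above.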
Second, we rewrote the treasure spawner in Kotlin coroutines with Channel<suspend () -> Unit> to pipeline loot resolution and entity spawning. We expected backpressure to smooth out the bursts. Instead, the coroutine dispatcher saturated the common pool, and the Netty event loop started dropping player movement packets because the thread pool was pinned at 100% CPU. A thread dump showed 473 Dispatchers.Default threads all blocked on TreasureSpawnLogic.resolveLoot().
Third, we tried offloading the entire treasure resolution to Redis Lua scripts. We stored loot tables as Redis hashes and used EVAL to run weighted sampling. This worked — until the Lua script timed out under high contention and the Redis instance itself started OOMing because every script returned a list of up to 100 item IDs. Our Redis memory usage jumped from 1.2GB to 8.9GB in 20 minutes, and the replication lag spiked to 1.3 seconds.
That's when I stopped looking at the treasure code and started looking at the runtime.
We moved the treasure hunt engine out of the JVM entirely. We rewrote the core scheduler and loot resolver in Rust, targeting wasm32-unknown-unknown and running it in Wasmtime with a custom allocator that gave us 1ms worst-case latency for weighted sampling. The Rust module exposed a single function:
#[no_mangle]
pub extern "C" fn resolve_loot(
    loot_table_ptr: *const u8,
    seed: u64,
) -> *mut u8
We built a thin C ABI layer in Veltrix using wasmtime::Linker and exposed it to the JVM via JNI. The JVM no longer held any mutable state for treasure hunts. Instead, it queued hunt requests into a LinkedTransferQueue that the Rust runtime drained at 20k requests/sec with 0.7ms median latency and 2.1ms P99. No GC pauses. No heap pressure. The JVM heap stayed flat at 8GB even under 3,000 concurrent hunts.
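The body of resolve_loot isn't shown here, so the following is only a minimal sketch of what seeded weighted sampling can look like in safe Rust. LootEntry and pick_weighted are hypothetical names, and the SplitMix64 generator is a stand-in I chose for illustration, not necessarily what the production module used.

```rust
/// Hypothetical loot entry: an item ID and its relative weight.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LootEntry {
    pub item_id: u32,
    pub weight: u32,
}

/// One SplitMix64 step: a small deterministic PRNG, so the same seed
/// always produces the same roll.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Picks one entry with probability proportional to its weight.
/// Returns None for an empty table or one whose weights sum to zero.
pub fn pick_weighted(table: &[LootEntry], seed: u64) -> Option<LootEntry> {
    let total: u64 = table.iter().map(|e| u64::from(e.weight)).sum();
    if total == 0 {
        return None;
    }
    let mut state = seed;
    // The modulo introduces a tiny bias; acceptable for a sketch.
    let mut roll = splitmix64(&mut state) % total;
    // Walk the table, subtracting weights until the roll lands in a bucket.
    for entry in table {
        let w = u64::from(entry.weight);
        if roll < w {
            return Some(*entry);
        }
        roll -= w;
    }
    None // not reached when total > 0
}
```

The linear scan allocates nothing and is deterministic for a given seed, which are the two properties that matter when the goal is a predictable worst case. A table with many entries would likely want a precomputed cumulative array and a binary search instead.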
After the switch, our metrics flipped:
- Prometheus: jvm_gc_pause_seconds_count{quantile="0.95"} = 0.02
- Treasure find rate stabilized at 680 finds/sec during peak load
- Player telemetry P99 dropped to 120ms, down from 5.2s
The Rust runtime consumed 140MB RSS and handled 35k concurrent hunts without a single allocation stall. We used jemalloc for its arenas, set MALLOC_CONF=background_thread:true,metadata_thp:auto, and capped the arenas to 512MB. The only hiccup was a segfault in resolve_loot when a malformed loot table pointer reached the WASM boundary. We fixed it by adding a pointer validator in Rust:
if loot_table_ptr.is_null() || !loot_table_ptr.is_aligned() {
    return ptr::null_mut();
}
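A null and alignment check cannot tell whether a non-null pointer actually points at a loot table. One common complement, sketched here under the assumption that the caller passes an offset and a length into a known buffer rather than a raw pointer, is an explicit bounds check. The checked_slice helper is hypothetical and is not part of the module described in this post.

```rust
/// Returns the bytes at [offset, offset + len) inside `memory`, or None if
/// the range is empty, overflows, or falls outside the buffer.
pub fn checked_slice(memory: &[u8], offset: u32, len: u32) -> Option<&[u8]> {
    if len == 0 {
        return None;
    }
    let start = offset as usize;
    // checked_add guards against wraparound on 32-bit targets such as wasm32.
    let end = start.checked_add(len as usize)?;
    // slice::get returns None instead of panicking when the range is out of bounds.
    memory.get(start..end)
}
```

With a check like this, a bad offset turns into a None the caller can handle instead of an out-of-bounds read.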
Looking back, I would have done two things differently.
First, I would have isolated the treasure engine earlier. The JVM was never the right place for a bursty, compute-heavy event that needs deterministic low latency. We should have pushed it to a sidecar process from day one and treated it as a microservice, not a plugin.
Second, I would have resisted the temptation to compile to WASM in production. The sandboxing is nice, but it added complexity: JNI bridging, symbol mangling, and the occasional edge case where a pointer from the JVM heap looked valid but wasn't. A native Rust process linked as a shared library would have been simpler and faster. In hindsight, we over-optimized for deployment safety and under-optimized for raw performance.
The lesson isn't that Rust is fast. It's that when you hit a wall where language semantics (the JVM's conservative GC, Kotlin coroutines' cooperative scheduling) become the limiting factor, you have to