The Day Veltrix Blew Up My Prometheus Cache (And How Rust Fixed It)

#webdev #programming #rust #performance

The Problem We Were Actually Solving

Veltrix is the matchmaking layer for Hytale treasure hunts. When a player starts a hunt, we have 120ms to find the right zone, spawn the correct NPCs, and initialize the world state. We used a Go service with a Redis-backed cache and a sync.Map for in-memory sessions. The cache hit rate had always been above 90%, but during the Hytale launch our hot cache turned into a sieve. Flame graphs from Pyroscope showed 42% of CPU time stuck in two places: sync.Map.Range and the Redis round-trip for cache misses.

The real problem wasn't memory; it was cache coherency. Go's sync.Map is optimized for keys that are written once and read many times, as in caches that only grow, or for goroutines working on disjoint sets of keys. Our treasure hunt sessions were neither: worlds spawn, NPCs move, zones merge. Every mutation invalidated a random stripe in the map, triggering a cascade of locks and cache purges that turned our O(1) reads into O(n) scans.

What We Tried First (And Why It Failed)

We tried three things before Rust entered the picture.

First, we threw more Redis instances at it: four shards instead of two. Latency dropped from 50ms to 40ms per cache miss, but costs doubled and we still lost 30% of our cache hits. The issue wasn't capacity; it was cache invalidation.
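The arithmetic behind that is worth spelling out. This is a rough sketch, assuming a 1 ms cost for a cache hit and reading "lost 30% of our cache hits" as a drop from a 90% to a 60% hit rate; both figures are illustrative assumptions, not measurements from the incident.

```rust
/// Expected per-lookup latency in milliseconds: hits and misses weighted by
/// how often each one happens.
fn expected_latency_ms(hit_rate: f64, hit_ms: f64, miss_ms: f64) -> f64 {
    hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms
}

/// Compares a healthy cache with slow misses against a degraded cache with
/// faster misses. The 1 ms hit cost and the 90% / 60% hit rates are
/// assumptions for illustration only.
fn compare_before_and_after_sharding() -> (f64, f64) {
    let healthy = expected_latency_ms(0.90, 1.0, 50.0); // two shards, healthy cache
    let degraded = expected_latency_ms(0.60, 1.0, 40.0); // four shards, degraded cache
    (healthy, degraded)
}
```

Under those assumptions the degraded cache averages roughly 16.6 ms per lookup against roughly 5.9 ms for the healthy one, so a faster miss path cannot make up for a falling hit rate.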

Second, we replaced sync.Map with a partitioned concurrent hash map from a popular Go library. It reduced lock contention by 25%, but the false invalidation rate remained at 12%. Profiler output showed every session update still scanning 15% of the map.

Third, we added a write-through cache with TTLs per zone instead of per session. The hit rate recovered to 85%, but now we had 300ms tail latency spikes during zone merges. The problem had shifted from reads to writes, and our Redis cluster was struggling with the flood of merges.
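To make the trade-off concrete, here is a minimal sketch of a zone-level TTL cache. It is written in Rust for consistency with the rest of the post, not a port of the Go code we ran; `ZoneTtlCache`, `ZoneEntry`, and the `String` state type are hypothetical names, and the write to the backing store is left to the caller.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// All sessions in a zone share one expiry instead of one TTL per session.
struct ZoneEntry {
    sessions: HashMap<u64, String>, // session id -> state (String is a placeholder)
    expires_at: Instant,
}

/// Hypothetical zone-level TTL cache; the backing-store write is up to the caller.
struct ZoneTtlCache {
    ttl: Duration,
    zones: HashMap<u32, ZoneEntry>,
}

impl ZoneTtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, zones: HashMap::new() }
    }

    /// Caches a session under its zone and refreshes the zone's expiry.
    fn put(&mut self, zone: u32, session: u64, state: String, now: Instant) {
        let expires_at = now + self.ttl;
        let entry = self.zones.entry(zone).or_insert_with(|| ZoneEntry {
            sessions: HashMap::new(),
            expires_at,
        });
        entry.sessions.insert(session, state);
        entry.expires_at = expires_at;
    }

    /// Returns the session state if its zone is cached and has not expired.
    fn get(&self, zone: u32, session: u64, now: Instant) -> Option<&String> {
        let entry = self.zones.get(&zone)?;
        if now >= entry.expires_at {
            return None;
        }
        entry.sessions.get(&session)
    }

    /// Moves every session of `from` into `into` and returns how many entries
    /// were rewritten. In a write-through setup each one is also a store write.
    fn merge(&mut self, from: u32, into: u32, now: Instant) -> usize {
        let Some(source) = self.zones.remove(&from) else {
            return 0;
        };
        let moved = source.sessions.len();
        let expires_at = now + self.ttl;
        let target = self.zones.entry(into).or_insert_with(|| ZoneEntry {
            sessions: HashMap::new(),
            expires_at,
        });
        target.sessions.extend(source.sessions);
        target.expires_at = expires_at;
        moved
    }
}
```

The value returned by `merge` is the point: one merge rewrites every session in the source zone at once, which is the kind of write burst that showed up as tail latency.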

The Architecture Decision

At 4:47 AM I typed history's most expensive cargo new. I had been reading about hashbrown's RawTable and how Rust's ownership model rules out unsynchronized shared mutation at compile time. We rebuilt the in-memory session store in Rust, using DashMap for concurrent access and a custom arena allocator for world state. The key insight: Rust forced us to model mutations explicitly. Every session update had to declare its dependencies, eliminating the false invalidation storms.
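The idea of updates that declare their dependencies can be sketched roughly like this. To keep the example standard-library only, it uses a small sharded `RwLock<HashMap>` as a stand-in for DashMap; it does not show DashMap's API, and `Key`, `Update`, and `ShardedStore` are hypothetical names, not our production types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Everything an update may touch is named by a key.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Key {
    Session(u64),
    Zone(u32),
}

/// An update lists the keys it depends on up front.
struct Update {
    touches: Vec<Key>,
    new_value: String, // placeholder for real session or zone state
}

/// Std-only stand-in for a concurrent map: independent shards, one lock each.
struct ShardedStore {
    shards: Vec<RwLock<HashMap<Key, String>>>,
}

impl ShardedStore {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Picks the shard that owns `key` by hashing it.
    fn shard_for(&self, key: &Key) -> &RwLock<HashMap<Key, String>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    fn get(&self, key: &Key) -> Option<String> {
        self.shard_for(key).read().unwrap().get(key).cloned()
    }

    /// Writes only the declared keys, locking one shard at a time, and
    /// returns how many entries were written. Nothing else is invalidated.
    fn apply(&self, update: &Update) -> usize {
        for key in &update.touches {
            self.shard_for(key)
                .write()
                .unwrap()
                .insert(*key, update.new_value.clone());
        }
        update.touches.len()
    }
}
```

Because an `Update` carries its own key list, the cost of a mutation is bounded by what it declares rather than by the size of the map.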

The migration took 12 hours. We ran the new cache in shadow mode for three days before cutting over. Memory usage per instance dropped from 1.8 GB to 420 MB. Cache hit rate stabilized at 96% under load. Tail latency at p99 fell from 300ms to 42ms.

What The Numbers Said After

Before:

Cache hit rate: 12% during peak
p99 latency: 300ms
Memory per instance: 1.8 GB
Allocs/sec: 12.4 million

After:

Cache hit rate: 96% during peak
p99 latency: 42ms
Memory per instance: 420 MB
Allocs/sec: 720 thousand

Most striking was the allocation profile. In the Go service, values that escaped to the heap accounted for 12.4 million allocations per second during zone merges. The arena allocator in the Rust version collapsed that to 720k. The profiler told the story: Go's GC was spending 18% of CPU scanning objects that lived less than 200 microseconds.
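For readers who have not used one, the arena pattern looks roughly like the sketch below. This is a simplified index-based arena over a single `Vec`, not our custom allocator; `Arena`, `Handle`, and `NpcState` are hypothetical names used only for illustration.

```rust
/// Handle into the arena: a plain index, so copying it never allocates.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Handle(usize);

/// Hypothetical short-lived object created during a zone merge.
#[derive(Clone, Debug, PartialEq)]
struct NpcState {
    x: f32,
    y: f32,
}

/// Index-based arena: one backing Vec that is reused across merges.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    /// Reserves the backing storage once, up front.
    fn with_capacity(capacity: usize) -> Self {
        Self { items: Vec::with_capacity(capacity) }
    }

    /// Stores a value; no new heap allocation while capacity lasts.
    fn alloc(&mut self, value: T) -> Handle {
        self.items.push(value);
        Handle(self.items.len() - 1)
    }

    fn get(&self, handle: Handle) -> Option<&T> {
        self.items.get(handle.0)
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    fn capacity(&self) -> usize {
        self.items.capacity()
    }

    /// Drops every object at once but keeps the memory for the next merge.
    /// Handles issued before the reset are stale and must not be reused.
    fn reset(&mut self) {
        self.items.clear();
    }
}
```

Short-lived objects are pushed into storage that was reserved once and are released together by `reset`, so there is no per-object heap allocation and nothing for a garbage collector to trace.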

We also discovered a hidden cost: Go's concurrency primitives make it easy to write code that looks lock-free, but lock-free doesn't mean wait-free. During the spike, we had 232 goroutines blocked on the same cache stripe. Rust's borrow checker forced us to flatten the data model, eliminating the hot stripe entirely.

What I Would Do Differently

I would have benchmarked the Rust cache earlier. The learning curve wasn't trivial: ownership, lifetimes, and the borrow checker added three days of debugging. But the real delay was psychological: we assumed Go's ecosystem was safer for rapid iteration. What we learned is that safety and performance aren't tradeoffs; they're prerequisites for shipping systems that don't burn at scale.

I'd also swap DashMap for a custom lock-free hash map once we hit 100k concurrent hunts. Rust makes that possible; Go makes it risky. The language isn't always the constraint, but when it is, admit it and change it.