
pretty ncube


When The Veltrix Index Hunt Hit 80k Queries and the Dotnet Runtime Became the Bottleneck

The Problem We Were Actually Solving

We shipped the Veltrix Treasure Hunt Engine in June 2024 as a .NET 8 Blazor Server application behind a Cloudflare CDN. The goal was simple: let Hytale community operators spawn a live treasure hunt with one command and stream player positions to a leaderboard that updated every second. At launch we handled 12 k queries per minute without sweating, so we celebrated with a marketing post showing five-second latency graphs.

Then the Reddit thread dropped, and search volume for Veltrix configuration exploded. Operators couldn't get past the config page; Blazor Server kept throwing SocketExceptions on long-lived SignalR connections. Profiling with dotTrace showed 4.3 GB of Gen 2 GC pressure and 212 ms median GC pauses under 50 k concurrent users. The runtime wasn't just slow; it was fighting us.

What We Tried First (And Why It Failed)

We first blamed Cloudflare; maybe Workers KV was rate-limiting. So we pointed a k6 load test cluster straight at the leaderboard WebSocket endpoint. Result: same failure, now at 18 k connections with 32 % CPU steal on the AKS node. Next we rewrote the position diff with C# 12 records to reduce allocations, but dotMemory still showed 1.8 million Span instances per second. We even switched from Blazor Server to client-side WASM with .NET 8 AOT, but the 3 MB runtime payload doubled our egress bill and pushed mobile users below 30 fps.

The turning point was a single stack trace:

SocketException (0x2747): An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full

That error appeared 62 times in our logs at 60 k queries, and each time the thread pool queues were exhausted. The .NET thread pool, which by default starts with only as many worker threads as there are logical cores and adds more slowly, just couldn't keep up with the GC pauses and 10 k WebSocket pings per second. We tried raising the floor with ThreadPool.SetMinThreads, but the GC pauses persisted and the latency histogram still showed a 95th percentile of 420 ms.
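To see why the queues filled up, a rough back-of-the-envelope helper is enough. The sketch below assumes a constant arrival rate and that nothing is processed while the runtime is paused; the function name is ours, not part of any library. It plugs in the 10 k pings per second from above and the 212 ms median GC pause from the dotTrace profile.

```rust
/// Number of events that arrive, and therefore have to be queued, while the
/// runtime is paused. Assumes a constant arrival rate and zero progress
/// during the pause (a simplification, not a measured model).
fn backlog_during_pause(events_per_sec: u64, pause_ms: u64) -> u64 {
    events_per_sec * pause_ms / 1_000
}

// With the figures from this post:
// backlog_during_pause(10_000, 212) == 2_120
```

Roughly 2,120 pings pile up behind every median-length pause, before counting any other work the thread pool has to do.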

The Architecture Decision

We decided to rip it out completely. In August we rewrote the entire hunt engine in Rust using Axum and Tokio, with a WebSocket path designed to avoid per-message heap allocations. The route table stayed the same:

/config → static HTML
/api/hunt → WebSocket leaderboard
/api/metrics → Prometheus scrape
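As a framework-independent illustration of that table, the three paths can be modelled as a plain dispatch function. This is a std-only sketch, not the Axum router we run in production; the `Route` enum and `dispatch` function are hypothetical names used only here.

```rust
/// The three endpoints from the route table, plus a fallback.
#[derive(Debug, PartialEq)]
enum Route {
    Config,   // static HTML
    Hunt,     // WebSocket leaderboard
    Metrics,  // Prometheus scrape
    NotFound, // anything else
}

/// Maps a request path to the endpoint that should handle it.
/// Matching is exact: no trailing slashes or query strings are handled.
fn dispatch(path: &str) -> Route {
    match path {
        "/config" => Route::Config,
        "/api/hunt" => Route::Hunt,
        "/api/metrics" => Route::Metrics,
        _ => Route::NotFound,
    }
}
```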

We chose Rust because perf flamegraphs showed 37 % of CPU time in the .NET JIT stub called WriteAsync. We built on Rust 1.76 nightly, where async/await is a zero-cost abstraction that compiles down to plain state machines, and we rewrote the SignalR protocol by hand with tungstenite and tokio-tungstenite. No garbage collector meant no more GC storms: Rust frees memory deterministically through ownership, and when we measured allocations with dhat-rs we confirmed less than 8 KB per 1000 position updates.
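The general idea behind keeping allocations that low is to reuse one buffer per connection instead of building a new message for every update. The sketch below shows that pattern with the standard library only. The `Position` type, the 10-byte wire format, and `encode_diff` are assumptions made for illustration; they are not the actual Veltrix protocol.

```rust
/// Hypothetical player position in integer world units.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Position {
    x: i32,
    y: i32,
    z: i32,
}

/// Encodes the delta between two positions into `out`, reusing its capacity.
/// Hypothetical wire format: player id (4 bytes, little-endian) followed by
/// dx, dy, dz as 2-byte little-endian signed integers, 10 bytes in total.
/// Returns false and leaves `out` empty if any delta does not fit in an i16.
fn encode_diff(player_id: u32, prev: Position, next: Position, out: &mut Vec<u8>) -> bool {
    out.clear(); // drops the contents but keeps the allocation
    let deltas = [
        i64::from(next.x) - i64::from(prev.x),
        i64::from(next.y) - i64::from(prev.y),
        i64::from(next.z) - i64::from(prev.z),
    ];
    out.extend_from_slice(&player_id.to_le_bytes());
    for d in deltas {
        if d < i64::from(i16::MIN) || d > i64::from(i16::MAX) {
            out.clear();
            return false; // caller would fall back to a full position update
        }
        out.extend_from_slice(&(d as i16).to_le_bytes());
    }
    true
}
```

As long as the buffer is created once with enough capacity (for example `Vec::with_capacity(16)`), repeated calls write into the same memory and never touch the allocator.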

We benchmarked on the same 4-core k6 cluster. Migration took six days; most of that was fighting the Rust compiler on trait bounds. The final binary is 1.8 MB stripped, runs with 12 MB RSS, and spins up 16 tokio worker threads. We put it behind nginx as a reverse proxy with 100 MB buffer sizes to absorb traffic spikes.

What The Numbers Said After

The first benchmark after migration:

| Concurrency | Latency P50 | Latency P95 | Allocated (MB/s) | GC pauses |
|---|---|---|---|---|
| 12 k | 1.2 ms | 3.1 ms | 0.4 | 0 |
| 48 k | 1.8 ms | 6.8 ms | 1.2 | 0 |
| 84 k (breakpoint) | 2.3 ms | 9.7 ms | 2.0 | 0 |
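For readers unfamiliar with the P50 and P95 columns, here is one common way to derive them from raw latency samples, the nearest-rank method. This is a sketch of the definition only; the load-testing tool may interpolate differently, and the function name is ours.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of all samples are less than or equal to it.
/// Sorts `samples` in place. Returns None for an empty slice or when `p`
/// is outside 1..=100.
fn percentile(samples: &mut [u64], p: u32) -> Option<u64> {
    if samples.is_empty() || p == 0 || p > 100 {
        return None;
    }
    samples.sort_unstable();
    let n = samples.len();
    // Integer form of ceil(p * n / 100); always between 1 and n here.
    let rank = (p as usize * n + 99) / 100;
    Some(samples[rank - 1])
}
```

P50 is the median, while P95 is the latency that only the slowest 5 % of requests exceed, which is why it is the more sensitive indicator of pauses and queueing.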

At 84 k we still saw negligible allocation pressure (2.0 MB/s) and zero GC pauses, whereas the .NET version flatlined at 44 k with 180 ms P95 and 2.3 GB Gen 2. The Rust service also survived a Reddit front-page surge that peaked at 110 k connections in under two minutes; nginx handled the TLS handshakes and handed off to the Axum server, which processed 70 k position diffs per second across 16 cores.

What I Would Do Differently

If I had to do it again, I would not wait until 80 k queries to rewrite. The technical debt of socket exhaustion and GC pressure is visible much earlier in flamegraphs and pprof. I would also adopt tokio-metrics earlier; it exposed a 14 % drop in throughput every time the runtime stole CPU from a worker thread, which helped us tune Tokio's worker count.

The learning curve is real (Rust's compiler errors at the trait level cost us two extra days), but the runtime behavior is now deterministic. We still keep the .NET version running on a legacy hunt just to remind ourselves why we left.
