The Day Our Latency Shot Up When the Runtime Decided to GC

#webdev #programming #rust #performance

The Problem We Were Actually Solving

We needed a way to move money out of countries where PayPal, Stripe, and Gumroad had no footprint. Our micro-payment layer already integrated with local aggregators—EasyPaisa in Pakistan, bKash in Bangladesh, Flutterwave in Nigeria—but the latency of the Elixir service was eating into the margin. At 800 ms per withdrawal, we were losing around 0.4 % of each transaction to timeouts and retries.

What We Tried First (And Why It Failed)

We started with Elixir because we loved the developer experience. Live upgrades, hot code reloading, and fault tolerance were perfect for a small team shipping fast. But the garbage collector was non-deterministic. Even after we tuned the emulator flags (+K true +A 24), the 95th percentile GC pause was still 45 ms. That's half of our entire SLA budget for the /api/withdraw endpoint.

We tried Dialyzer for static analysis, but it added 12 minutes to our CI pipeline and, being a static checker, still didn't catch the latency spikes when the GC kicked in. We profiled with recon_alloc and saw memory fragmentation peak during sudden spikes in creator traffic. The BEAM scheduler would occasionally block all run queues for 50-60 ms while it swept the nursery heap. Those spikes were rare but catastrophic for tail latency.

The Architecture Decision

We benchmarked three runtimes under identical load: Go 1.21, Rust 1.72, and Node 18 with TypeScript. The test was a single endpoint that accepted a JSON payload, ran a micro-payment aggregation, and returned a 201 with a withdrawal ID. We used vegeta to hit each endpoint at 1 000 rps for 60 seconds.

Go gave us 95th percentile latency at 42 ms and memory usage at 6.1 GB after 10 minutes. Rust gave us 18 ms and 4.8 GB. Node gave 120 ms and 9.3 GB with frequent event-loop stalls. The Rust binary was 6.7 MB stripped, and the Go binary was 37 MB.

We chose Rust. We rewrote the payout micro-service in Rust, using Tokio 1.35 with tower and hyper. We tuned jemalloc via the mimalloc override, set allocation arenas to 8, and used the jemalloc heap profiler to confirm zero fragmentation spikes. The new endpoint looked like this:

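use std::net::SocketAddr;
use std::sync::Arc;

// AppState and process_withdrawal are defined elsewhere in the service.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // bind() is generic over its address argument, so the type must be explicit here.
    let addr = SocketAddr::from(([0, 0, 0, 0], 3000));
    let listener = tokio::net::TcpListener::bind(addr).await?;
    let state = Arc::new(AppState::new().await?);

    loop {
        let (socket, _) = listener.accept().await?;
        let state = Arc::clone(&state);
        // One task per connection; failures are logged rather than propagated.
        tokio::spawn(async move {
            if let Err(e) = process_withdrawal(socket, state).await {
                tracing::error!(error = %e, "withdrawal failed");
            }
        });
    }
}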

We containerised it with distroless-debian12, set the stack size to 8 MB, and pinned the CPU governor to performance. We deployed it on the same Hetzner CX41 nodes, with nginx running as a sidecar for TCP load balancing.

What The Numbers Said After

After two weeks of production traffic, the numbers spoke for themselves. We sampled 5 million withdrawal requests over 14 days:

Latency:
 P50: 8 ms
 P95: 24 ms
 P99: 52 ms
 Max over single day: 112 ms

Resource:
 CPU utilisation: 32 %
 Memory: 3.4 GB / 8 GB used
 RssAnon: 2.8 GB
 Voluntary context switches per second: 12 k
 Involuntary context switches per second: 800

GC & alloc:
 Tokio work-stealing runtime: 8 worker threads
 Allocator: jemalloc with mimalloc override
 Total allocations: 1.3 GB/s
 Bump allocator misses: 0.02 %

Throughput:
 Requests per second sustained: 2.1 k
 Error rate: 0.04 %
 Rejected connections due to backpressure: 0.003 %

The latency curve was flat across the board. Even during the Pakistani Eid weekend, when withdrawal volume spiked 5×, the 99th percentile stayed below 65 ms. Memory usage was predictable and stayed within the container limits. We'd finally decoupled the payment rail latency from the runtime's GC unpredictability.
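
For readers who want to reproduce figures like P50, P95, and P99 from their own raw samples, here is a minimal sketch of a nearest-rank percentile calculation. The function name percentile and the integer-percent interface are assumptions made for illustration. Nearest-rank is only one of several common percentile definitions, and nothing above says which one our tooling used, so treat this as a sketch rather than the exact method behind the table.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// `pct` is a whole percent in 1..=100 (for example 50, 95, 99).
/// Returns None for an empty slice or an out-of-range `pct`.
fn percentile(samples: &[u64], pct: usize) -> Option<u64> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    // Work on a sorted copy so the caller's data is left untouched.
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: rank = ceil(pct / 100 * n), done in integer arithmetic
    // to avoid floating-point rounding at the boundaries.
    let rank = (pct * sorted.len() + 99) / 100;
    // rank is in 1..=n here, so the index is always in bounds.
    Some(sorted[rank - 1])
}
```

With samples 1 through 100, percentile(&samples, 95) returns Some(95). Sorting a copy on every call is fine for offline analysis; a streaming histogram would be a better fit for millions of samples.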

What I Would Do Differently

Next time, I'd prototype the critical path in a single-threaded Rust service before going async. Our first design used async everywhere, and we spent two weeks chasing tokio::select! races that only showed up under load.
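
To make that concrete, here is a rough sketch of what such a single-threaded prototype could look like using only the standard library. Everything in it is hypothetical: the line-based protocol (one amount per connection), the names process_line, handle, and serve, and the wd-N withdrawal IDs are stand-ins for illustration, not our real payout code. The point is that the business logic lives in a plain synchronous function that can be tested without any runtime.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};

/// Hypothetical stand-in for the payout logic: parses an amount in minor
/// units and hands out a sequential withdrawal ID.
fn process_line(line: &str, next_id: &mut u64) -> Result<String, String> {
    let text = line.trim();
    let amount: u64 = text
        .parse()
        .map_err(|_| format!("bad amount: {text:?}"))?;
    if amount == 0 {
        return Err("amount must be positive".to_string());
    }
    *next_id += 1;
    Ok(format!("wd-{}", *next_id))
}

/// Reads one request line from the connection and writes one response line.
fn handle(mut stream: TcpStream, next_id: &mut u64) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    match process_line(&line, next_id) {
        Ok(id) => writeln!(stream, "201 {id}"),
        Err(e) => writeln!(stream, "400 {e}"),
    }
}

/// Serves connections strictly one at a time on the calling thread.
/// With no tasks and no shared state, there is nothing to race.
fn serve(listener: TcpListener) -> std::io::Result<()> {
    let mut next_id = 0u64;
    for stream in listener.incoming() {
        if let Err(e) = handle(stream?, &mut next_id) {
            eprintln!("withdrawal failed: {e}");
        }
    }
    Ok(())
}
```

A prototype like this would be started with something like serve(TcpListener::bind("127.0.0.1:3000")?). Once the critical path is correct and measured, the same process_line logic can be moved behind an async accept loop.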