We Cut API Latency 55% with Rust 1.88 and Tokio 1.40 for Our Core Service

Our core API service handles 12M daily requests across 40+ endpoints, powering everything from user auth to real-time analytics. For months, we’d been stuck with P99 latencies hovering around 180ms, well above our 100ms SLA. After evaluating multiple optimization paths, we migrated the service from a Go-based stack to Rust 1.88 paired with Tokio 1.40, cutting P99 latency by 55% to 81ms while slightly improving throughput.

Why We Chose Rust + Tokio

Our original Go service relied on goroutines and the standard net/http stack. While goroutines are lightweight, we hit two key bottlenecks: frequent GC pauses (up to 12ms each) and inefficient async I/O handling for our highly concurrent workload (8k+ connections per node). We evaluated Rust 1.88 because of its stable async/await support, zero-cost abstractions, and the maturity of the Tokio 1.40 runtime, which offers a work-stealing scheduler and optimized I/O drivers tailored for high-throughput workloads.
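
For context, the work-stealing scheduler is what Tokio's multi-threaded runtime gives you out of the box; here is a rough sketch of pinning the worker count per node. The thread count and the serve() function are illustrative placeholders, not our production values.

```rust
use tokio::runtime::Builder;

fn main() -> std::io::Result<()> {
    // Multi-threaded, work-stealing runtime. 8 workers is an illustrative
    // value, not our actual per-node setting.
    let runtime = Builder::new_multi_thread()
        .worker_threads(8)
        .enable_all()
        .build()?;

    runtime.block_on(async {
        // serve() stands in for the real accept loop and router setup.
        serve().await;
    });
    Ok(())
}

async fn serve() {
    // Placeholder for the actual server.
}
```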

Key Optimizations That Drove Results

We didn’t just rewrite the service — we leveraged Rust and Tokio’s unique features to optimize hot paths:

  • Zero-copy request parsing: We replaced allocation-heavy JSON parsing with a custom zero-copy parser for our internal protobuf payloads, reducing per-request memory allocations by 72% (a simplified sketch follows this list).
  • Tokio 1.40’s new I/O primitives: The updated tokio::net::TcpStream with vectored I/O support let us batch small requests into single kernel writes, cutting syscall overhead by 40% (see the second sketch after the list).
  • Compile-time routing: We used Rust 1.88’s const generics to implement compile-time route matching, eliminating runtime hash map lookups for our 40+ endpoints.
  • GC-free memory management: Rust’s ownership model eliminated GC pauses entirely — we saw 0ms pause times post-migration, compared to 12ms average in Go.
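
To make the zero-copy idea concrete, here is a simplified sketch. It is not our actual parser, and the frame layout (an 8-byte user id, a 4-byte length, then the payload) is a toy stand-in for our internal format; the point is that the parsed fields borrow from the input buffer instead of allocating.

```rust
/// A parsed record whose fields borrow from the input buffer.
/// Nothing is copied; `payload` is just a slice into `buf`.
struct Record<'a> {
    user_id: u64,
    payload: &'a [u8],
}

/// Toy layout (illustrative only): 8-byte big-endian user id,
/// 4-byte big-endian payload length, then the payload bytes.
/// Returns the record plus the unconsumed tail of the buffer.
fn parse_record(buf: &[u8]) -> Option<(Record<'_>, &[u8])> {
    let user_id = u64::from_be_bytes(buf.get(..8)?.try_into().ok()?);
    let len = u32::from_be_bytes(buf.get(8..12)?.try_into().ok()?) as usize;
    let payload = buf.get(12..12 + len)?;
    Some((Record { user_id, payload }, &buf[12 + len..]))
}
```

Downstream handlers work on these borrowed slices rather than owned copies, which is where the allocation savings come from.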

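And here is a minimal sketch of the vectored-write pattern on a Tokio TcpStream. It is deliberately simplified: write_vectored can perform a short write, so real code needs a loop that retries the remaining buffers.

```rust
use std::io::IoSlice;
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

/// Push several small frames to the socket with one syscall instead of
/// one write per frame. Simplified: a short write leaves some buffers
/// unsent, so production code should loop until everything is flushed.
async fn write_batch(stream: &mut TcpStream, frames: &[Vec<u8>]) -> std::io::Result<usize> {
    let slices: Vec<IoSlice<'_>> = frames.iter().map(|f| IoSlice::new(f)).collect();
    stream.write_vectored(&slices).await
}
```
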
Benchmark Results

We ran 72 hours of load testing with our production traffic replay, comparing the old Go service to the new Rust + Tokio stack on identical AWS c6g.2xlarge nodes:

| Metric | Go (Pre-Migration) | Rust + Tokio (Post-Migration) | Improvement |
| --- | --- | --- | --- |
| P50 Latency | 42ms | 22ms | 48% lower |
| P99 Latency | 180ms | 81ms | 55% lower |
| Max Throughput (req/s per node) | 14k | 15.2k | 8% higher |
| Memory Usage (idle) | 210MB | 48MB | 77% lower |
| GC Pause Time (avg) | 12ms | 0ms | 100% elimination |

Migration Lessons Learned

Migrating to Rust wasn’t without challenges. We hit a learning curve with Rust’s borrow checker early on, which added roughly three weeks to our initial timeline. We also had to rewrite our observability stack around Rust-native instrumentation (the metrics and tracing crates) instead of our old Go Prometheus client setup; a minimal sketch of the tracing side follows. To de-risk the rollout, we used a canary deployment: 5% of traffic first, then 20%, then 100% over two weeks, with automatic rollback if error rates exceeded 0.1%.
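
As a rough illustration only (the exact setup depends on crate versions and your exporter; the JSON formatter below requires tracing-subscriber's json feature, and the metrics exporter is wired up separately):

```rust
use tracing::{info, instrument};

/// Install a JSON-formatted tracing subscriber.
/// Assumes tracing-subscriber with the `json` feature enabled.
fn init_telemetry() {
    tracing_subscriber::fmt()
        .json()
        .with_target(false)
        .init();
}

/// `#[instrument]` opens a span per call and records `route` as a field;
/// the request body is skipped so payloads never end up in logs.
#[instrument(skip(body))]
async fn handle(route: &str, body: &[u8]) {
    info!(bytes = body.len(), "handled request");
}
```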

Conclusion

The 55% latency cut let us meet our SLA for the first time in 6 months, while reducing our node count by 30% (from 20 to 14 nodes) due to lower resource usage. For teams running high-concurrency, latency-sensitive services, Rust 1.88 and Tokio 1.40 offer a compelling, production-ready stack. We’re now expanding our Rust adoption to our edge caching layer next quarter.
