Saksham Kapoor
I Built a Redis Server in Rust — and Found Where It Breaks

Most developers use Redis like this:

SET key value
GET key

It feels instant. Effortless.

But once you try building Redis yourself, you realize:
• concurrency is the real problem
• locks kill performance faster than logic
• observability itself can become a bottleneck

So I built RustRedis — a Redis-compatible server in Rust — to understand what actually happens under load. 

This wasn’t about features.

It was about answering one question:

Where does performance actually break under concurrency?

🔗 Full Project

Code + benchmarks: GitHub Link

1. System Design

The server follows a task-per-connection model:

Client → TCP → Tokio Task → Command Execution → Shared DB

Each connection:
• parses RESP protocol
• executes commands
• returns responses

All tasks share a central database.

Two implementations were tested:
• Mutex (global lock)
• DashMap (sharded locks)

This allows direct comparison of locking strategies.

2. The Real Problem: Concurrency

At low load, everything works fine.

At high load, everything changes.

The bottleneck is not:

❌ parsing
❌ networking

It is:

👉 shared state contention

3. Lock Contention (Where It Breaks)

With a global Mutex:
• all writes serialize
• threads queue behind each other
• throughput collapses

At high concurrency:
• p99 latency explodes
• throughput drops significantly

This is called:

👉 lock convoy effect

Even short critical sections become slow under contention.

4. DashMap vs Mutex

Replacing the global lock with DashMap (sharded locking):
• reduces contention
• allows parallel writes
• improves throughput significantly

At high concurrency:
• ~60% higher throughput
• ~40% lower latency

But:

👉 not free

Trade-offs:
• more overhead per operation
• complexity for full-scan operations

5. Observability Became a Bottleneck

This was unexpected.

Tracking metrics per command introduced:

👉 another shared structure

Three approaches were tested:

Global Mutex
• simple
• but severe contention

Sharded Metrics
• better scalability
• reduced lock contention

Thread-Local Batching
• no locks on hot path
• near-zero overhead

Key insight:

Observability can become a primary bottleneck under load.

At high concurrency:
• telemetry alone caused ~30% performance drop

6. Persistence Trade-offs (AOF)

Three persistence modes:

| Mode | Behavior |
| --- | --- |
| Always | fsync every write |
| EverySecond | background flush |
| No | OS-managed |

Results:
• Always → ~80% throughput drop
• EverySecond → minimal overhead
• No → fastest but unsafe

Insight:

👉 durability always costs performance

7. Throughput Scaling

Performance peaks early:
• ~10–100 clients → optimal
• 500+ clients → contention dominates
• 1000 clients → system becomes unstable

Why?

👉 lock contention grows non-linearly

8. Redis vs RustRedis

Comparing RustRedis against Redis:

At low concurrency:
• Redis is faster (no locking overhead)

At high concurrency:
• RustRedis shows more stable behavior
• lower tail latency

Reason:

👉 multi-threaded I/O vs single-threaded event loop

9. The Most Important Insight

This project changed how I think about systems:

👉 performance is not about code speed
👉 it’s about contention management

Key takeaways:
• shared state is the real bottleneck
• locks don’t scale linearly
• batching removes contention
• observability must be designed carefully

10. What I’d Improve Next

  • actor-based architecture (no shared state)
  • lock-free structures
  • better persistence batching
  • distributed sharding

Conclusion

Building a Redis-like system reveals something important:

The hardest part of systems design is not correctness — it’s managing contention under load.

Most systems don’t fail because they are wrong.

They fail because they don’t scale under pressure.
