Most developers use Redis like this:
```
SET key value
GET key
```
It feels instant. Effortless.
But once you try building Redis yourself, you realize:
• concurrency is the real problem
• locks kill performance faster than logic
• observability itself can become a bottleneck
So I built RustRedis — a Redis-compatible server in Rust — to understand what actually happens under load. 
This wasn’t about features.
It was about answering one question:
Where does performance actually break under concurrency?
🔗 Full Project
Code + benchmarks: GitHub Link
1. System Design
The server follows a task-per-connection model:
Client → TCP → Tokio Task → Command Execution → Shared DB
Each connection:
• parses RESP protocol
• executes commands
• returns responses
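RustRedis's real parser isn't shown here, but the RESP step above can be sketched. A minimal parser for a complete RESP array-of-bulk-strings frame (the format clients use for commands) might look like this; the function name and the "whole frame in one buffer" assumption are mine:

```rust
use std::str;

// Parse one RESP command frame, e.g. b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n",
// into its parts. Returns None on malformed or incomplete input.
// Assumes the whole frame is already buffered and UTF-8.
fn parse_resp_command(buf: &[u8]) -> Option<Vec<String>> {
    let text = str::from_utf8(buf).ok()?;
    let mut lines = text.split("\r\n");
    // Header: "*<argc>" gives the number of bulk strings that follow.
    let argc: usize = lines.next()?.strip_prefix('*')?.parse().ok()?;
    let mut parts = Vec::with_capacity(argc);
    for _ in 0..argc {
        // Each argument is "$<len>" followed by exactly <len> bytes.
        let len: usize = lines.next()?.strip_prefix('$')?.parse().ok()?;
        let data = lines.next()?;
        if data.len() != len {
            return None;
        }
        parts.push(data.to_string());
    }
    Some(parts)
}

fn main() {
    let cmd = parse_resp_command(b"*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n").unwrap();
    println!("{:?}", cmd); // ["SET", "foo", "bar"]
}
```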
All tasks share a central database.
Two implementations were tested:
• Mutex (global lock)
• DashMap (sharded locks)
This allows direct comparison of locking strategies.
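The real server runs on Tokio tasks, but the shared-state shape of the Mutex variant can be sketched with plain std threads (one thread standing in for one connection; the `set`/`get` names are mine):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Global-lock variant: every command, from every connection,
// serializes through this single Mutex.
type Db = Arc<Mutex<HashMap<String, String>>>;

fn set(db: &Db, key: &str, value: &str) {
    db.lock().unwrap().insert(key.to_string(), value.to_string());
}

fn get(db: &Db, key: &str) -> Option<String> {
    db.lock().unwrap().get(key).cloned()
}

fn main() {
    let db: Db = Arc::new(Mutex::new(HashMap::new()));
    // One thread per "connection", all sharing the same database.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let db = db.clone();
            thread::spawn(move || set(&db, &format!("key{i}"), "value"))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(get(&db, "key0").as_deref(), Some("value"));
}
```

The DashMap variant keeps exactly this interface but swaps the `Mutex<HashMap>` for a sharded map, which is what makes the two locking strategies directly comparable.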
2. The Real Problem: Concurrency
At low load, everything works fine.
At high load, everything changes.
The bottleneck is not:
❌ parsing
❌ networking
It is:
👉 shared state contention
3. Lock Contention (Where It Breaks)
With a global Mutex:
• all writes serialize
• threads queue behind each other
• throughput collapses
At high concurrency:
• p99 latency explodes
• throughput drops significantly
This is called:
👉 lock convoy effect
Even short critical sections become slow under contention.
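You can see the convoy shape in a toy benchmark: the result is always correct, but every increment funnels through one lock, so adding threads adds queueing, not parallelism. A std-only sketch (function name mine):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Under a global Mutex the work stays correct, but the critical
// section runs strictly one thread at a time -- threads convoy
// behind the lock instead of running in parallel.
fn contended_increments(threads: usize, per_thread: usize) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = counter.clone();
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Tiny critical section -- still a serialization point.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    // 8 threads x 10_000 increments all queue on one lock.
    assert_eq!(contended_increments(8, 10_000), 80_000);
}
```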
4. DashMap vs Mutex
Replacing the global lock with DashMap (sharded locking):
• reduces contention
• allows parallel writes
• improves throughput significantly
At high concurrency:
• ~60% higher throughput
• ~40% lower latency
But:
👉 not free
Trade-offs:
• more overhead per operation
• added complexity for full-scan operations (every shard must be visited)
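Roughly, sharded locking works like this simplified std-only sketch (DashMap's real implementation is more sophisticated; the `ShardedDb` type and its methods are mine): keys hash to one of N independent locks, so writes to different shards proceed in parallel.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

// Sharded store: each key maps to one of N independent Mutexes,
// so writes to different shards no longer queue on a single lock.
struct ShardedDb {
    shards: Vec<Mutex<HashMap<String, String>>>,
}

impl ShardedDb {
    fn new(num_shards: usize) -> Self {
        Self {
            shards: (0..num_shards).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, String>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    fn set(&self, key: &str, value: &str) {
        self.shard_for(key)
            .lock()
            .unwrap()
            .insert(key.to_string(), value.to_string());
    }

    fn get(&self, key: &str) -> Option<String> {
        self.shard_for(key).lock().unwrap().get(key).cloned()
    }

    // The trade-off: full-scan operations must lock every shard in turn.
    fn len(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}

fn main() {
    let db = ShardedDb::new(16);
    db.set("foo", "bar");
    assert_eq!(db.get("foo").as_deref(), Some("bar"));
    assert_eq!(db.len(), 1);
}
```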
5. Observability Became a Bottleneck
This was unexpected.
Tracking metrics per command introduced:
👉 another shared structure
Three approaches were tested:
Global Mutex
• simple
• but severe contention
Sharded Metrics
• better scalability
• reduced lock contention
Thread-Local Batching
• no locks on hot path
• near-zero overhead
Key insight:
Observability can become a primary bottleneck under load.
At high concurrency:
• telemetry alone caused ~30% performance drop
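The winning approach, thread-local batching, can be sketched like this (the names and the flush trigger are mine; the real server would flush on a timer or every N commands): the hot path touches only a thread-local `Cell`, and the shared atomic is hit once per batch.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

// Shared counter, touched only when a thread flushes its local batch.
static TOTAL_COMMANDS: AtomicU64 = AtomicU64::new(0);

thread_local! {
    // Hot-path counter: plain Cell, no lock, no atomic operation.
    static LOCAL_COMMANDS: Cell<u64> = Cell::new(0);
}

// Called on every command: a single thread-local increment, zero contention.
fn record_command() {
    LOCAL_COMMANDS.with(|c| c.set(c.get() + 1));
}

// Called occasionally: drain the local batch into the shared
// atomic with one fetch_add instead of one per command.
fn flush_metrics() {
    LOCAL_COMMANDS.with(|c| {
        TOTAL_COMMANDS.fetch_add(c.replace(0), Ordering::Relaxed);
    });
}

fn main() {
    for _ in 0..100 {
        record_command();
    }
    flush_metrics();
    assert_eq!(TOTAL_COMMANDS.load(Ordering::Relaxed), 100);
}
```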
6. Persistence Trade-offs (AOF)
Three persistence modes:
| Mode | Behavior |
|---|---|
| Always | fsync every write |
| EverySecond | background flush |
| No | OS-managed |
Results:
• Always → ~80% throughput drop
• EverySecond → minimal overhead
• No → fastest but unsafe
Insight:
👉 durability always costs performance
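The three modes boil down to when the AOF writer calls fsync. A std-only sketch of the write path (the `Aof` type, names, and single-threaded `EverySecond` handling are mine; a real server flushes `EverySecond` from a background task):

```rust
use std::fs::{File, OpenOptions};
use std::io::{BufWriter, Write};
use std::path::Path;

// How often the append-only file is forced to disk.
enum FsyncPolicy {
    Always,      // fsync after every write: durable, slow
    EverySecond, // background flush: tiny loss window, cheap
    No,          // OS decides: fastest, unsafe on crash
}

struct Aof {
    writer: BufWriter<File>,
    policy: FsyncPolicy,
}

impl Aof {
    fn open(path: &Path, policy: FsyncPolicy) -> std::io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { writer: BufWriter::new(file), policy })
    }

    fn append(&mut self, entry: &str) -> std::io::Result<()> {
        self.writer.write_all(entry.as_bytes())?;
        self.writer.write_all(b"\n")?;
        match self.policy {
            FsyncPolicy::Always => {
                // Pay a flush + fsync on every command:
                // this is where the ~80% throughput drop comes from.
                self.writer.flush()?;
                self.writer.get_ref().sync_all()?;
            }
            // EverySecond would flush from a background timer;
            // No leaves flushing entirely to the OS page cache.
            FsyncPolicy::EverySecond | FsyncPolicy::No => {}
        }
        Ok(())
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("rustredis_demo.aof");
    let mut aof = Aof::open(&path, FsyncPolicy::Always)?;
    aof.append("SET foo bar")?;
    std::fs::remove_file(&path)?;
    Ok(())
}
```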
7. Throughput Scaling
Performance peaks early:
• ~10–100 clients → optimal
• 500+ clients → contention dominates
• 1000 clients → system becomes unstable
Why?
👉 lock contention grows non-linearly
8. Redis vs RustRedis
Benchmarked against Redis itself:
At low concurrency:
• Redis is faster (no locking overhead)
At high concurrency:
• RustRedis shows more stable behavior
• lower tail latency
Reason:
👉 multi-threaded I/O vs single-threaded event loop
9. The Most Important Insight
This project changed how I think about systems:
👉 performance is not about code speed
👉 it’s about contention management
Key takeaways:
• shared state is the real bottleneck
• locks don’t scale linearly
• batching removes contention
• observability must be designed carefully
10. What I’d Improve Next
• actor-based architecture (no shared state)
• lock-free structures
• better persistence batching
• distributed sharding
Conclusion
Building a Redis-like system reveals something important:
The hardest part of systems design is not correctness — it’s managing contention under load.
Most systems don’t fail because they are wrong.
They fail because they don’t scale under pressure.