
Vivian Voss

Posted on • Originally published at vivianvoss.net

Redis Runs on One Thread

Cover image: a young female developer with long dark brown hair and a pink cat-ear headset, standing arms crossed beside a single wide glowing pipe pulsing with light and speed. To her right, a chaotic tangle of rusted, leaking narrow pipes: knotted, dripping, blocked. One clean pipe outperforms the mess. The limitation is the architecture.

By Design — Episode 04

In 2009, Salvatore Sanfilippo had a problem.

His startup, LLOOGG, was a real-time web analytics tool. This was before Google Analytics offered real-time visitor data (that would not arrive until 2011). LLOOGG let website owners see who was visiting their site, right now. The business case was real. The technical problem was equally real: MySQL could not keep pace with the workload of hundreds of list pushes and pops per second, each needing an immediate response.

Sanfilippo prototyped a memory-based database in Tcl at his home in Sicily. He called it LMDB: LLOOGG Memory Database. The prototype was 300 lines. It worked. He rewrote it in C. He named the result Redis: Remote Dictionary Server.

He made it single-threaded. Deliberately.

The Complaint

"Redis runs on one thread. Modern servers have 64, 96, even 128 cores. Redis uses exactly one. That is a serious architectural limitation."

One does hear this. Usually from someone who has just provisioned a 64-core server, pointed a single Redis instance at it, and noticed that the other 63 cores are contributing nothing to the effort.

The complaint is coherent. The conclusion it implies (that Redis should be multi-threaded) is not.

The Design

The choice was explicit. Sanfilippo built Redis around an event loop (kqueue on BSD/macOS, epoll on Linux, select as a portable fallback) that processes commands sequentially. One command at a time. When a command arrives, it executes. When it completes, the next begins. No locking. No mutexes. No context switches between executions. No possibility of two operations corrupting shared state.
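The shape of that loop is worth seeing. Here is a minimal sketch in Python, using the standard library's selectors module (which wraps epoll or kqueue, just as Redis's own event library does). The toy SET/GET protocol and function names are illustrative, not the actual Redis protocol or source:

```python
import selectors
import socket

# One selector, one loop, one command at a time. The shared store needs
# no lock because only this loop ever touches it.
sel = selectors.DefaultSelector()
store = {}

def handle(conn):
    # Execute exactly one client's command, start to finish.
    data = conn.recv(4096)
    if not data:
        sel.unregister(conn)
        conn.close()
        return
    parts = data.decode().split()
    if parts[0] == "SET":
        store[parts[1]] = parts[2]
        conn.sendall(b"+OK\n")
    elif parts[0] == "GET":
        conn.sendall((store.get(parts[1], "") + "\n").encode())

def serve_once():
    # One iteration of the event loop: wait for ready sockets, then run
    # their commands strictly in sequence.
    for key, _ in sel.select(timeout=1):
        handle(key.fileobj)
```

Every connection multiplexes onto the same loop; between `recv` and `sendall` there is no interleaving, so `store` can never be observed half-updated.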

This is not a limitation of ambition. It is a consequence of understanding where the bottleneck actually is.

In a memory-first system, the bottleneck is not computation. It is contention. Consider what multi-threading actually costs:

  • An uncontended mutex: 100 to 1,000 CPU cycles to acquire and release
  • A contested mutex: 10,000 CPU cycles or more, plus the cost of putting a thread to sleep and waking it
  • Cache line invalidation: when thread A writes to a value and thread B reads it, the cache line must travel between CPU cores (this takes 40 to 100 nanoseconds, orders of magnitude slower than an L1 cache hit at 1.2 nanoseconds)
  • Context switching: 2,000 to 8,000 cycles per switch, plus TLB flushes
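You can feel the first of these costs even from Python, where interpreter overhead inflates the absolute numbers far beyond raw CPU cycles. This rough micro-benchmark compares a plain increment against the same increment behind an uncontended lock; the point is the relative tax, not the figures:

```python
import time
import threading

def bench(fn, n=200_000):
    # Time n calls of fn; return nanoseconds per call.
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n

counter = 0
lock = threading.Lock()

def plain_increment():
    global counter
    counter += 1

def locked_increment():
    global counter
    with lock:          # acquire + release, completely uncontended
        counter += 1

ns_plain = bench(plain_increment)
ns_locked = bench(locked_increment)
print(f"plain:  {ns_plain:.0f} ns/op")
print(f"locked: {ns_locked:.0f} ns/op")
```

And this is the cheap case: no contention, no sleeping threads, no cache lines bouncing between cores.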

Redis operates entirely in memory. Operations are fast: a GET or SET completes in microseconds. At that speed, the overhead of coordinating multiple threads is not a rounding error. It is a significant fraction of the operation itself.

By removing threads from command execution entirely, Redis removes the cost of coordinating them. The sequential model eliminates lock contention by making contention structurally impossible. The event loop provides I/O multiplexing without the overhead of one thread per connection. The simplicity is the performance.
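The guarantee is easy to demonstrate. In this sketch, eight threads submit increments concurrently, but a single loop drains the queue and applies them one at a time. The read-modify-write on the store uses no lock, and the final count is still exact:

```python
import threading
import queue

commands = queue.Queue()
store = {"hits": 0}

def client(n):
    # Many clients submit commands concurrently...
    for _ in range(n):
        commands.put(("INCR", "hits"))

def event_loop():
    # ...but one loop executes them sequentially, so the unlocked
    # read-modify-write below can never race.
    while True:
        cmd = commands.get()
        if cmd is None:
            break
        op, key = cmd
        if op == "INCR":
            store[key] = store[key] + 1

clients = [threading.Thread(target=client, args=(10_000,)) for _ in range(8)]
loop = threading.Thread(target=event_loop)
loop.start()
for t in clients:
    t.start()
for t in clients:
    t.join()
commands.put(None)  # sentinel: no more commands
loop.join()
print(store["hits"])  # exactly 80000, every run
```

Run the same eight threads against the dictionary directly, without the serialising loop, and exactness is no longer guaranteed.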

The Trade-Off

Let us be honest. The design has real costs.

A slow command blocks everything. KEYS on a production database containing one million entries stops the entire server for the duration of the scan. No other command executes. No other client receives a response. The server is, to a first approximation, unavailable.

This is not a bug. It is the honest cost of sequential execution. The documentation is clear: use SCAN for incremental iteration. Production incidents are equally clear: someone uses KEYS anyway, usually at 3 AM, usually because they forgot to read the documentation, usually in a way that is not discovered until the monitoring alerts.
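The difference between the two is the difference between one long blocking pass and many short ones. This toy model mimics only SCAN's cursor contract (the real implementation walks hash-table buckets with a reverse-binary cursor; the list slicing here is a stand-in):

```python
def keys_all(store):
    # KEYS: one blocking pass over everything. The event loop can do
    # nothing else until the full list is built.
    return list(store)

def scan(store, cursor=0, count=100):
    # Toy SCAN: return one batch plus a cursor, so the event loop can
    # run other commands between batches.
    snapshot = list(store)
    batch = snapshot[cursor:cursor + count]
    next_cursor = cursor + count
    return (0 if next_cursor >= len(snapshot) else next_cursor), batch

store = {f"user:{i}": i for i in range(1_000)}

seen = []
cursor = 0
while True:
    cursor, batch = scan(store, cursor)
    seen.extend(batch)  # other clients get served between these calls
    if cursor == 0:     # cursor 0 signals the iteration is complete
        break
```

Same keys, same total work; SCAN just slices it into pieces small enough that nobody notices.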

The other cost: Redis cannot use multiple CPU cores from a single instance. Scaling across cores requires running multiple Redis instances and distributing keys across them using Redis Cluster or application-level sharding. The model is horizontal, not vertical. One instance per core, not one instance for all cores. This is operational complexity that a multi-threaded design would not require.
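The distribution itself is mechanical. Redis Cluster maps every key to one of 16,384 hash slots using CRC16 (the XMODEM variant), hashing only the text inside a `{...}` hash tag when one is present so related keys can be pinned to the same slot. A sketch of that routing, written from the published cluster specification:

```python
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM variant), the checksum Redis Cluster uses.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # Hash only the first non-empty {...} tag, if any, so e.g.
    # {user:1}.name and {user:1}.age land on the same slot.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key) % 16384
```

Each instance owns a range of slots; the client (or a proxy) computes the slot and routes the command. That is the operational complexity in one function.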

These are real trade-offs. Sanfilippo made them knowing they were real. The design documentation has never pretended otherwise.

The Proof

Without pipelining: 180,000 SET operations per second on commodity hardware, with p50 latency under 0.15 milliseconds. With pipelining of 16 commands: over 1.5 million SET operations per second, p50 latency under 0.5 milliseconds. Consistently. Under load. Without a garbage collector pause, because there is no garbage collector.
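The pipelining gap is mostly arithmetic. Without it, every command pays a full network round trip; with it, a batch shares one. A back-of-envelope model (the RTT and execution-time figures below are assumptions for illustration, not measurements):

```python
RTT_US = 100    # assumed client <-> server round trip, microseconds
EXEC_US = 0.5   # assumed in-memory SET execution time, microseconds

def ops_per_second(batch_size: int) -> float:
    # One round trip carries batch_size commands; execution is serial.
    batch_time_us = RTT_US + batch_size * EXEC_US
    return batch_size / batch_time_us * 1_000_000

print(f"no pipeline:  {ops_per_second(1):,.0f} ops/s")
print(f"pipeline(16): {ops_per_second(16):,.0f} ops/s")
```

With these assumptions, batching sixteen commands buys roughly a fifteen-fold improvement, which is why the measured numbers above jump the way they do: the wire, not the CPU, was the constraint.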

Twitter adopted Redis in 2010 for timeline fan-out. Instagram adopted it in 2010 for media metadata. GitHub adopted it for job queues. Stack Overflow runs it as their primary cache, serving billions of requests per month from commodity hardware.

In 2020, Redis 6.0 added I/O threading: network reads and writes now happen in parallel, but command execution remains single-threaded. The 37–112% throughput improvement this delivered suggests that network I/O was the actual bottleneck in many workloads, not command execution. The single-threaded model, it turns out, was not the limiting factor.

In 2024, Redis Ltd changed the project's licence from BSD to a dual RSALv2/SSPLv1 licence, effectively removing it from open source. The community forked it as Valkey within weeks, backed by the Linux Foundation, Amazon, Google, Oracle, and Ericsson. The fork started from Redis 7.2.4. The single-threaded architecture came with it, unchanged, because nobody wanted to change it.

Redis 8 reduced per-command latency by 5.4% to 87.4% across 90 commands, through algorithmic improvements, not threading. Sixteen years after the first commit, the design is still being optimised within its own constraints rather than abandoned.

The Principle

Contention is not a problem you solve with more threads. It is a problem you design out.

Sanfilippo looked at the bottleneck in a memory-first system and identified it correctly: not computation, but coordination. He removed the coordination by making it impossible. The result was a system that processes millions of commands per second on a single core and delivers sub-millisecond latency under load.

The engineering instinct is to add parallelism when performance matters. Redis demonstrates that the opposite instinct (removing the overhead of coordination) can be more effective. The constraint is not a limitation to be overcome. It is the source of the guarantee.

Every application that has ever cached a session token, enforced a rate limit, maintained a leaderboard, or coordinated a distributed lock has relied on the fact that Redis will not corrupt its own data structures under concurrent access, because concurrent command execution is, by design, impossible.

The limitation is the architecture. The architecture is the feature.

Read the full article on vivianvoss.net →


By Vivian Voss — System Architect & Software Developer. Follow me on LinkedIn for daily technical writing.
