From a frustrated Sicilian hacker in 2009 to the backbone of every scaled system you've ever used — here's everything Redis does and why it works.
Table of Contents
- The World Before Redis — and Why It Was Broken
- Inside Redis: The Engine That Shouldn't Work (But Does)
- Persistence: RDB vs AOF — How Redis Survives a Crash
- Replication & Sentinel — Redis Gets Serious About Availability
- Redis Cluster & Sharding — Going Horizontal
- Pub/Sub & Streams — Redis as a Message Bus
- The Verdict: When Redis Wins and When It Doesn't
1. The World Before Redis — and Why It Was Broken
It's 2009. Salvatore Sanfilippo — antirez on the internet — is trying to build a real-time web analytics tool called LLOOGG. Every page view needs to be recorded. Every user session needs a capped log. The data structure? A list. The operation? Append to end, pop from front when it exceeds N entries.
He tries PostgreSQL. He tries MySQL. They work — for a few requests per second. Then they don't. Disk seeks kill him. Row locking kills him. The impedance mismatch between "I need a capped list" and "here's your B-tree index" kills him.
So he does what engineers do when they're desperate enough: he writes his own database. In C. In a weekend. That database is Redis.
The Timeline
Pre-2009 — The Dark Ages
Every team runs RDBMS for everything. Session storage in MySQL. Cache in MySQL. Rate limiting in MySQL. The database is both the source of truth and the punching bag.
~2003 — Memcached Arrives
Brad Fitzpatrick at LiveJournal builds Memcached to solve the read-heavy problem. Key-value, in-memory, fast. But it's a dumb cache — no persistence, no data structures, no atomicity. You can't atomically say "create this counter if it doesn't exist, then increment it."
2009 — Redis Ships
antirez open-sources Redis. It's Memcached with a brain — data structures, atomic operations, optional persistence. Hacker News loses its mind. Within months, Twitter and GitHub are running it in production.
2010–2015 — The Takeover
Redis Sentinel ships for high availability. Redis Cluster ships for horizontal scaling. Redis goes from "interesting toy" to "required infrastructure." Salvatore joins VMware, then Pivotal, to work on it full-time.
2020–Present — Redis Ltd. & Forks
Redis Labs (now Redis Ltd.) introduces commercial licensing for modules. The community forks into Valkey (Linux Foundation) and Redict. The core is still C. The architecture is still single-threaded. The principles haven't changed.
The real insight: The world before Redis wasn't lacking a faster database. It was lacking a database that matched how engineers actually think about data — as lists, sets, counters, and queues — not just rows and columns.
2. Inside Redis: The Engine That Shouldn't Work (But Does)
Redis is single-threaded. In an era of 64-core servers, this sounds insane. Every database textbook tells you concurrency is how you scale. Redis ignores the textbook — and it's faster than most multi-threaded systems at their own game.
Here's why: the bottleneck was never the CPU. It was the disk. Redis lives in RAM. When your data fits in memory, the CPU processes commands in nanoseconds. Context switching, mutex contention, and lock overhead from multi-threading cost more than the work itself. One thread, zero contention, pure throughput.
The Event Loop (epoll)
Redis uses an event-driven, non-blocking I/O model — a reactor pattern powered by epoll on Linux. Here's what happens on every tick:
┌─────────────────────────────────────────────┐
│ Redis Event Loop (ae.c) │
└─────────────────────────────────────────────┘
epoll_wait() ← blocks until ≥1 fd is ready
│
▼
for each ready fd:
├─ read event? → parse command → execute → write response
├─ write event? → flush output buffer
└─ timer event? → run background jobs (TTL expiry, etc.)
repeat forever
No threads. No locks. No context switches.
10,000 connections → same single loop handles them all.
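The loop above can be sketched in a few lines of Python with the standard selectors module (which wraps epoll on Linux). This is a minimal reactor in the spirit of ae.c, not Redis's actual code; serve_one_tick and handle_client are our own names:

```python
import selectors
import socket

# One selector, one loop, no locks: the single-threaded reactor pattern.
sel = selectors.DefaultSelector()

def handle_client(conn):
    """Read event: 'parse' the command and write the response inline."""
    data = conn.recv(4096)
    if data:
        conn.sendall(b"+PONG\r\n")   # respond on the same thread, no handoff
    else:
        sel.unregister(conn)          # peer closed: clean up the fd
        conn.close()

def serve_one_tick():
    """One pass of the event loop: block until at least one fd is
    ready, then run each ready fd's callback."""
    for key, _ in sel.select():
        key.data(key.fileobj)
```

Each tick blocks in sel.select() until something is ready, then runs callbacks sequentially on the single thread: no mutexes are needed because nothing ever runs concurrently.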
Data Structures — The Real Differentiator
Redis doesn't store "strings." It stores typed objects, each with an encoding chosen at runtime for memory efficiency:
| Type | Internal Encoding | Use Case |
|---|---|---|
| String | SDS (Simple Dynamic String) | Cache values, counters, session tokens |
| List | QuickList (linked list of ListPack nodes) | Message queues, activity feeds |
| Hash | ListPack → Hashtable (auto-promoted) | User profiles, object storage |
| Set | ListPack → Hashtable | Tags, unique visitors, membership |
| Sorted Set | Skip List + Hash Map | Leaderboards, rate limiters, trending topics |
The Sorted Set is the most interesting. It maintains a skip list for ordered range queries and a hash map for O(1) score lookups — two data structures kept in sync on every write, giving you O(log n) inserts and range queries alongside O(1) score reads.
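A toy version of that dual-structure design, using a dict for the hash-map side and a sorted list (via bisect) as a stand-in for the skip list. MiniZSet and its method names are illustrative, not a real Redis client API; a real skip list would make the ordered updates O(log n) rather than bisect's O(n) inserts:

```python
import bisect

class MiniZSet:
    """Toy sorted set: both structures are updated on every write,
    mirroring how Redis keeps the skip list and dict in sync."""
    def __init__(self):
        self.scores = {}    # member -> score        (hash-map side)
        self.sorted = []    # [(score, member), ...] (skip-list side)

    def zadd(self, member, score):
        if member in self.scores:
            # Remove the old (score, member) entry before re-inserting.
            old = (self.scores[member], member)
            self.sorted.pop(bisect.bisect_left(self.sorted, old))
        self.scores[member] = score
        bisect.insort(self.sorted, (score, member))

    def zscore(self, member):
        # O(1) lookup via the dict, like ZSCORE.
        return self.scores.get(member)

    def zrange(self, start, stop):
        # Ordered range query via the sorted side, like ZRANGE.
        return [m for _, m in self.sorted[start:stop + 1]]
```

A leaderboard is just zadd on every score change plus zrange for the top N.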
TTL Mechanics — Two-Phase Expiry
When you call EXPIRE key 60, Redis doesn't set a timer. It stores the expiry timestamp in a separate hash. Actual deletion happens in two ways:
Lazy Expiry: On every read, Redis checks if the key is expired before returning it. Expired? Delete it, return nil. Zero background overhead.
Active Expiry: Every 100ms, Redis samples 20 random keys from the expiry hash. If more than 25% are expired, it samples again — loops until the expired ratio drops below 25% or the time budget runs out. This caps CPU usage while still reclaiming memory proactively.
ON every READ command:
if key exists in expires_dict:
if now() > expires_dict[key]:
delete(key)
return NIL
EVERY 100ms (activeExpireCycle):
sample 20 keys from expires_dict
delete all expired ones
if expired_count / 20 > 0.25:
repeat // keep going until clean enough
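Both phases can be sketched in Python. ExpiringDict is a hypothetical stand-in, using time.monotonic() in place of Redis's clock:

```python
import random
import time

class ExpiringDict:
    """Sketch of Redis's two-phase expiry: lazy checks on read plus
    an active sampling cycle that loops while >25% of samples are stale."""
    def __init__(self):
        self.data = {}
        self.expires = {}   # key -> absolute expiry deadline

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is not None:
            self.expires[key] = time.monotonic() + ttl

    def get(self, key):
        # Lazy expiry: evict on read if past the deadline, return None.
        if key in self.expires and time.monotonic() > self.expires[key]:
            del self.data[key]
            del self.expires[key]
            return None
        return self.data.get(key)

    def active_expire_cycle(self, sample_size=20, threshold=0.25):
        # Active expiry: sample, evict, repeat while the expired ratio
        # stays above the threshold (time budget omitted for brevity).
        while self.expires:
            sample = random.sample(list(self.expires),
                                   min(sample_size, len(self.expires)))
            now = time.monotonic()
            expired = [k for k in sample if now > self.expires[k]]
            for k in expired:
                del self.data[k]
                del self.expires[k]
            if len(expired) / len(sample) <= threshold:
                break
```

The real cycle also enforces a CPU time budget per run, which the sketch omits.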
3. Persistence: RDB vs AOF — How Redis Survives a Crash
Redis is in-memory. If the process dies, the data dies with it — unless you've configured persistence. Redis gives you two mechanisms. They solve different problems. Most production systems use both.
RDB — The Snapshot
RDB (Redis Database) takes a point-in-time snapshot of your entire dataset and writes it to disk as a compact binary file. Think of it as a photograph of your memory at a moment in time.
The magic: Redis uses fork(). The parent process keeps serving requests. The child process inherits the memory, writes the snapshot, exits. The OS handles copy-on-write — pages only get duplicated if the parent modifies them. Zero downtime, low overhead.
RDB Snapshot Flow
Parent Process Child Process
────────────── ─────────────
serving requests (forked)
│ │
user writes page A ──CoW──→ page A copied for child
│ │
continues serving writes RDB to dump.rdb
│ │
│ exits cleanly
│
atomic rename: dump.rdb.tmp → dump.rdb
Trigger it manually with BGSAVE, or configure automatic snapshots: save 900 1 means snapshot if ≥1 key changed in 900 seconds.
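The fork-and-snapshot trick can be demonstrated directly with os.fork on a POSIX system. This is a sketch, not the RDB format: bgsave below is our own helper that writes JSON instead of the binary dump, and the parent deliberately mutates the dataset after forking to show that the child's copy-on-write view is unaffected:

```python
import json
import os

def bgsave(dataset: dict, path: str) -> None:
    """Sketch of BGSAVE: fork, let the child write the snapshot
    while the parent keeps mutating its own copy of memory."""
    pid = os.fork()
    if pid == 0:
        # Child: sees the dataset exactly as it was at fork time.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(dataset, f)
        os.rename(tmp, path)   # atomic rename, like dump.rdb.tmp -> dump.rdb
        os._exit(0)
    else:
        # Parent: this write never reaches the child's snapshot.
        dataset["written_after_fork"] = True
        os.waitpid(pid, 0)
```

The atomic rename at the end mirrors Redis's behavior: readers never see a half-written snapshot file.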
RDB Pros: Compact binary format, fast to load on restart, minimal performance impact, perfect for backups.
RDB Cons: You lose all data since the last snapshot (could be minutes). Not suitable when you need near-zero data loss.
AOF — The Append-Only Log
AOF (Append-Only File) logs every write command as it happens. On crash, Redis replays the log to reconstruct state — same concept as PostgreSQL's WAL.
Three fsync policies control the durability vs performance tradeoff:
| Policy | Behavior | Data Loss Risk |
|---|---|---|
| `always` | fsync after every command | Zero — slowest |
| `everysec` (default) | fsync once per second | At most 1 second |
| `no` | OS decides when to fsync | Up to OS buffer size |
AOF Rewrite: AOF files grow forever. Redis periodically rewrites the AOF — replacing the log of SET x 1, INCR x, INCR x, INCR x with just SET x 4. Done in the background via fork. File size collapses, replay time shrinks.
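The compaction idea can be sketched with a tiny replay function. rewrite_aof is our own name and handles only SET and INCR, but the principle is the document's example exactly: replay the log into final state, then emit the minimal commands that reproduce it:

```python
def rewrite_aof(commands: list[str]) -> list[str]:
    """Sketch of AOF rewrite: replay the command log into final
    state, then emit one SET per surviving key."""
    state = {}
    for cmd in commands:
        op, key, *args = cmd.split()
        if op == "SET":
            state[key] = int(args[0])
        elif op == "INCR":
            state[key] = state.get(key, 0) + 1
    # The rewritten file reproduces the same state in fewer commands.
    return [f"SET {k} {v}" for k, v in state.items()]
```

Redis does this in a forked child, the same copy-on-write trick as RDB, so the rewrite never blocks the serving process.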
The Hybrid: RDB + AOF
Production best practice: enable both. RDB gives you fast restarts and clean backups. AOF gives you durability between snapshots. Redis on restart prefers AOF (more complete), falls back to RDB if AOF is missing.
Rule of thumb: Tolerating minutes of data loss? → RDB only. Storing primary data that can't be replayed? → AOF with everysec. Running something financial on Redis? → AOF with always. And always test your recovery path. A backup you've never restored is just a file.
4. Replication & Sentinel — Redis Gets Serious About Availability
A single Redis node is a single point of failure. If it goes down, every cache miss hits your database simultaneously — the thundering herd. Replication is how Redis spreads read load and survives node failures.
How Replication Works
Redis uses asynchronous leader-follower replication. One primary accepts writes. One or more replicas mirror the primary's data and serve reads.
┌──────────────┐
│ Primary │ ← ALL writes go here
└──────┬───────┘
│ replication stream (async)
┌────────┴────────┐
▼ ▼
┌────────────┐ ┌────────────┐
│ Replica 1 │ │ Replica 2 │
│ (read-only)│ │ (read-only)│
└────────────┘ └────────────┘
Client reads → any replica
Client writes → primary only
Initial Sync: Primary runs BGSAVE, sends the RDB snapshot, then streams all commands that happened during the snapshot. Replica loads RDB, applies the delta — fully caught up.
Partial Resync: If a replica briefly disconnects, it sends its replication offset on reconnect. The primary checks its replication backlog (a circular buffer of recent commands). If the offset is still in the buffer, only the missed commands are replayed — no full RDB needed.
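That full-versus-partial decision can be sketched with a bounded buffer. ReplicationBacklog is a hypothetical stand-in for the primary's circular buffer, with one offset per command for simplicity:

```python
from collections import deque

class ReplicationBacklog:
    """Sketch of the primary's replication backlog: a bounded buffer
    of recent commands indexed by a monotonically increasing offset."""
    def __init__(self, capacity: int = 1000):
        self.buf = deque(maxlen=capacity)   # (offset, command) pairs
        self.next_offset = 0

    def append(self, command: str) -> None:
        self.buf.append((self.next_offset, command))
        self.next_offset += 1

    def psync(self, replica_offset: int):
        """Return the commands the replica missed if its offset is
        still buffered; None means a full RDB resync is required."""
        if self.buf and self.buf[0][0] <= replica_offset <= self.next_offset:
            return [cmd for off, cmd in self.buf if off >= replica_offset]
        return None
```

The tradeoff is the backlog size: a bigger buffer tolerates longer replica outages at the cost of primary memory.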
The catch: Replication is asynchronous. Writes acknowledged by the primary may not yet be on replicas. If the primary crashes before replication completes, that data is gone. For critical data, pair with AOF always, or use WAIT to block until a given number of replicas have acknowledged the write.
Sentinel — Automated Failover
Sentinel monitors your Redis topology and promotes a replica to primary when the leader fails.
Failure scenario:
1. Primary goes silent
2. Sentinel marks it SDOWN (subjectively down)
3. Quorum of Sentinels agree → ODOWN (objectively down)
4. Election: one Sentinel leads the failover
5. Best replica promoted to primary
6. Other replicas repoint to new primary
7. Clients get new primary address via Sentinel API
Run at least 3 Sentinel instances (odd number for quorum). Your client talks to Sentinel first — asks "who is the current primary?" — then connects. Libraries like ioredis and redis-py handle this transparently.
Sentinel is not a silver bullet. Failover takes 30–60 seconds by default. Design your application to handle degraded writes gracefully — queue them, circuit-break, or serve stale reads. Never assume failover is instant.
5. Redis Cluster & Sharding — Going Horizontal
Sentinel solves availability. It doesn't solve capacity. If your dataset is 500GB, no single machine runs it in RAM. That's the problem Redis Cluster solves.
Hash Slots — The Foundation of Sharding
Redis Cluster divides the keyspace into 16,384 hash slots. Every key maps to a slot: slot = CRC16(key) % 16384. Slots are distributed across nodes.
Node A (Primary) ── Slots 0–5460 + Node A’ (Replica)
Node B (Primary) ── Slots 5461–10922 + Node B’ (Replica)
Node C (Primary) ── Slots 10923–16383 + Node C’ (Replica)
Key “user:1234” → CRC16 % 16384 = 8976 → Node B
Key “user:5678” → CRC16 % 16384 = 1204 → Node A
Request Routing — MOVED & ASK
Cluster-aware clients cache the slot→node mapping and route directly. If the map is stale, the node replies with a MOVED error. During live resharding, a transitional ASK redirect is used.
The Multi-Key Trap
In Cluster mode, all keys in a single command must map to the same slot. MGET user:1 user:2 fails if those keys land on different nodes.
Solution: hash tags. {user}.1 and {user}.2 both hash on user — guaranteed same slot.
```bash
# Without hash tags — might fail in cluster mode
MGET user:1 user:2

# With hash tags — guaranteed same slot
MGET {user}.1 {user}.2
```
This is not optional trivia. It’s a design constraint you’ll hit on day one of cluster migration.
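The slot math is easy to verify yourself. The sketch below implements CRC16-CCITT (XMODEM), the variant Redis Cluster uses, plus the hash-tag rule; key_slot is our own helper name:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """slot = CRC16(key) % 16384, honoring {hash tag} extraction:
    if the key contains a non-empty {tag}, only the tag is hashed."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

This is why {user}.1 and {user}.2 always land together: both reduce to crc16(b"user") before the modulo.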
Cluster vs Sentinel
| | Sentinel | Cluster |
|---|---|---|
| Use when | Dataset fits in one node’s RAM | Dataset exceeds single-node RAM |
| Write scale | Single primary | Horizontal across nodes |
| Multi-key commands | Works freely | Requires hash tags |
| Ops complexity | Lower | Higher |
6. Pub/Sub & Streams — Redis as a Message Bus
Redis can act as a lightweight message broker — with two very different mechanisms: Pub/Sub and Streams. They solve different problems. Use the wrong one and you'll regret it at 2am.
Pub/Sub — Fire and Forget
Publishers send to channels, subscribers receive. No history. No persistence. If a subscriber is offline when a message is published, that message is gone forever.
```bash
# Publisher
PUBLISH notifications "user:42 completed checkout"

# Subscriber
SUBSCRIBE notifications

# Pattern subscribe
PSUBSCRIBE notifications:*
```
Use Pub/Sub for: real-time events where loss is acceptable — typing indicators, presence updates, cache invalidation signals.
Never use Pub/Sub for: anything requiring guaranteed delivery. If a subscriber is offline, the message is gone. Full stop.
The invisible problem: Pub/Sub is a telephone call, not a voicemail. If no one picks up, nothing is recorded.
Redis Streams — Persistent, Ordered Message Log
Streams are a durable, append-only log introduced in Redis 5.0 — conceptually similar to Kafka topics but living inside Redis. Messages persist until explicitly deleted.
```bash
# Producer
XADD orders * user_id 42 item "laptop" amount 1299

# Consumer (blocking)
XREAD COUNT 10 BLOCK 0 STREAMS orders $

# Consumer group — each message delivered to one worker
XGROUP CREATE orders order-processors $ MKSTREAM
XREADGROUP GROUP order-processors worker-1 COUNT 5 STREAMS orders >

# Acknowledge — removes from the Pending Entry List
XACK orders order-processors 1686123456789-0
```
The Pending Entry List (PEL) is the critical piece: every delivered-but-not-acknowledged message sits here. If a worker crashes, XCLAIM reassigns its pending messages to another worker. This gives you at-least-once delivery semantics.
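The PEL mechanics can be sketched in memory. MiniStreamGroup is illustrative only, with integer ids standing in for Redis's millisecond-sequence ids:

```python
class MiniStreamGroup:
    """Toy consumer group: delivered-but-unacked entries sit in the
    Pending Entry List until acked or claimed, giving at-least-once
    delivery."""
    def __init__(self, entries):
        self.entries = dict(enumerate(entries))  # id -> payload
        self.next_id = 0                         # group's read cursor
        self.pel = {}                            # id -> owning consumer

    def xreadgroup(self, consumer, count=1):
        # Each new entry goes to exactly one consumer and enters the PEL.
        out = []
        while len(out) < count and self.next_id in self.entries:
            self.pel[self.next_id] = consumer
            out.append((self.next_id, self.entries[self.next_id]))
            self.next_id += 1
        return out

    def xack(self, entry_id):
        # Processing finished: drop the entry from the PEL.
        self.pel.pop(entry_id, None)

    def xclaim(self, entry_id, new_consumer):
        # Reassign a crashed worker's pending entry to another consumer.
        if entry_id in self.pel:
            self.pel[entry_id] = new_consumer
            return (entry_id, self.entries[entry_id])
        return None
```

Note what "at-least-once" implies: if a worker processes an entry but dies before acking, the claim hands the same entry to someone else, so consumers must be idempotent.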
Streams vs Pub/Sub
| | Pub/Sub | Streams |
|---|---|---|
| Persistence | None | Yes |
| Delivery guarantee | None | At-least-once (PEL + XACK) |
| Consumer groups | No | Yes |
| Replay | Impossible | From any point |
| Best for | Ephemeral signals | Reliable event workflows |
Streams vs Kafka: Under 100K events/sec and already running Redis? Streams probably suffice. Above that? Kafka earns its complexity.
7. The Verdict: When Redis Wins and When It Doesn't
Use Redis for:
- Session storage and auth tokens
- Rate limiting (INCR + EXPIRE is atomic)
- Leaderboards (Sorted Sets)
- Distributed locks (SET NX PX)
- Real-time feed aggregation (fan-out on write)
- Job queues (Lists or Streams)
- Caching hot database rows
- Cache invalidation across nodes (Pub/Sub)
Don't use Redis for:
- Primary data store for large datasets (RAM is expensive)
- Complex queries and JOINs
- Full-text search → use Elasticsearch
- Heavy analytics → use ClickHouse or BigQuery
- High-volume audit logs → use Kafka
antirez built Redis to solve one problem: fast, structured, in-memory operations. Fifteen years later, it still solves exactly that — and everything built on top of it is just clever applications of the same primitives he wrote in C over a weekend in Sicily.
The lesson isn't "use Redis." The lesson is: know what your database is optimized for, and stop asking it to do everything else.
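As one concrete example from the list above, the INCR + EXPIRE rate-limiting pattern can be sketched in pure Python. FixedWindowLimiter is an in-memory stand-in, not a Redis API: the first hit in a window creates the counter with a TTL (EXPIRE), later hits increment it (INCR):

```python
import time

class FixedWindowLimiter:
    """Sketch of fixed-window rate limiting in the INCR + EXPIRE style."""
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}   # key -> (count, window_expiry)

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        count, expiry = self.counters.get(key, (0, 0.0))
        if now >= expiry:
            # Window rolled over: reset the counter, set a new TTL.
            count, expiry = 0, now + self.window
        count += 1           # the INCR step
        self.counters[key] = (count, expiry)
        return count <= self.limit
```

In real Redis the whole check must be atomic across clients, which is exactly why INCR's single-threaded atomicity (or a small Lua script) matters here.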
If this was useful — share it. If you disagree — the comments exist for a reason.