Database replication is the process of keeping a copy of the same data on multiple nodes. Whether you are aiming for high availability, reduced latency, or horizontal scalability, choosing the right replication algorithm is critical.
In this guide, we will explore the three primary algorithms used in modern distributed systems: Single Leader, Multi-Leader, and Leaderless.
Table of Contents
- Single Leader Replication
- Multi-Leader Replication
- Leaderless Replication
- The Replication Lag Problem
- Summary Comparison
1. Single Leader Replication
This is the most common approach (used by MySQL, PostgreSQL, and MongoDB). One node is designated as the leader (master), and all other nodes are followers (read replicas).
How it Works
- Writes: All write requests must be sent to the leader. The leader writes the data locally and sends the change to all followers.
- Reads: Clients can read from the leader or any follower.
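The write and read paths above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular database implements it; the `Node` and `Leader` classes and their methods are hypothetical names chosen for this example.

```python
class Node:
    """A replica holding a key-value store and an ordered change log."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.log = []  # every change this node has applied, in order

    def apply(self, key, value):
        self.log.append((key, value))
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Leader(Node):
    """The only node that accepts writes; it forwards each change to followers."""

    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def write(self, key, value):
        self.apply(key, value)          # 1. write locally
        for follower in self.followers:  # 2. send the change to every follower
            follower.apply(key, value)


followers = [Node("follower-1"), Node("follower-2")]
leader = Leader("leader", followers)
leader.write("user:1", "Ada")

# Reads can be served by the leader or by any follower.
value_from_follower = followers[1].read("user:1")
```

Because followers only ever apply changes that came from the leader, their logs stay in the same order as the leader's log.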
Synchronous vs. Asynchronous
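- Synchronous: The leader waits for followers to confirm the write before reporting success.
  - Pros: A confirmed write is guaranteed to exist on the followers, so acknowledged data survives a leader failure.
  - Cons: Higher write latency; if a synchronous follower becomes unavailable, writes block until it recovers.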
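- Asynchronous: The leader confirms the write immediately, without waiting for followers.
  - Pros: Low write latency; the leader keeps accepting writes even when followers fall behind.
  - Cons: Writes that have not yet been replicated are lost if the leader fails before followers sync.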
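The difference between the two modes comes down to when the leader acknowledges the write. The sketch below is a simplified single-process model, assuming one follower and a hypothetical `pending` queue that stands in for the network; real systems ship changes over a replication stream.

```python
from collections import deque


class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Leader:
    def __init__(self, follower, synchronous):
        self.data = {}
        self.follower = follower
        self.synchronous = synchronous
        self.pending = deque()  # changes acknowledged but not yet shipped

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Wait for the follower before acknowledging the client.
            self.follower.apply(key, value)
        else:
            # Acknowledge now; the change is shipped later by flush().
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        """Ship queued changes to the follower (asynchronous mode only)."""
        while self.pending:
            self.follower.apply(*self.pending.popleft())


sync_follower = Follower()
sync_leader = Leader(sync_follower, synchronous=True)
sync_leader.write("k", 1)

async_follower = Follower()
async_leader = Leader(async_follower, synchronous=False)
async_leader.write("k", 1)

# If the async leader crashed here, these acknowledged writes would be lost.
at_risk = list(async_leader.pending)
```

In the asynchronous case the client already holds an acknowledgement while the follower is still empty, which is exactly the window in which a leader failure loses data.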
Handling Failures
- Follower Failure: A follower "catches up" by finding the last change it processed in its local log and requesting everything after that point from the leader.
- Leader Failure (Failover): Requires detecting failure via timeouts, electing a new leader, and reconfiguring the system.
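Both recovery paths can be illustrated with three small functions. This is a sketch under simplifying assumptions: logs are plain Python lists, a follower's log is always a prefix of the leader's log, and the new leader is simply the follower with the longest log. The function names and the 30-second timeout are illustrative; production systems typically rely on a consensus protocol for the election step.

```python
def catch_up(leader_log, follower_log):
    """Copy the entries the follower missed, using its own log length as the offset."""
    missing = leader_log[len(follower_log):]
    follower_log.extend(missing)
    return len(missing)


def leader_is_dead(last_heartbeat, now, timeout_s=30.0):
    """Declare the leader failed if no heartbeat arrived within the timeout."""
    return now - last_heartbeat > timeout_s


def elect_new_leader(follower_logs):
    """Pick the follower with the most entries, to minimise lost writes."""
    return max(follower_logs, key=lambda name: len(follower_logs[name]))


leader_log = ["set a=1", "set b=2", "set c=3", "set d=4"]
recovering_follower = ["set a=1", "set b=2"]  # was offline for two writes
copied = catch_up(leader_log, recovering_follower)

promoted = elect_new_leader({
    "follower-1": ["set a=1"],
    "follower-2": ["set a=1", "set b=2"],
})
```

The timeout is a trade-off: a short value triggers unnecessary failovers during brief network hiccups, while a long value lengthens the outage after a real failure.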
2. Multi-Leader Replication
In this setup, more than one node can accept writes. This is typically used for applications spread across multiple geographic data centers.
Use Cases
- Multi-Data Center Operation: Users write to the nearest data center to reduce latency.
- Offline Operation: Apps like calendars or note-taking tools act as local "leaders" that sync with a server later.
The Challenge: Conflict Resolution
If two users edit the same data in different data centers simultaneously, a conflict occurs.
- Conflict Avoidance: Routing all writes for a specific record to the same leader.
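- Convergence: Using Last Write Wins (LWW), which keeps the write with the highest timestamp and discards the rest, or Conflict-free Replicated Data Types (CRDTs), which merge concurrent changes automatically.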
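The two convergence strategies behave quite differently, which a short example makes visible. The sketch below assumes each LWW version is a `(timestamp, node_id, value)` tuple, with `node_id` used as a tie-breaker, and uses a grow-only counter (G-Counter) as one of the simplest CRDTs. Both are illustrative shapes, not the format of any specific database.

```python
def lww_merge(a, b):
    """Last Write Wins: keep the version with the larger (timestamp, node_id, value)."""
    return max(a, b)


class GCounter:
    """Grow-only counter CRDT: one slot per node, merged by taking the maximum."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


# LWW: the earlier write is silently dropped.
winner = lww_merge((1, "eu", "title: Draft"), (2, "us", "title: Final"))

# CRDT: increments made in two data centers are both preserved.
eu, us = GCounter("eu"), GCounter("us")
eu.increment(2)
us.increment(3)
eu.merge(us)
us.merge(eu)
```

LWW is simple but loses one of the concurrent writes and depends on reasonably synchronized clocks. The counter keeps every increment because merging is commutative and idempotent, so replicas reach the same value regardless of the order in which they exchange state.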
3. Leaderless Replication
Popularized by Amazon’s Dynamo, this approach allows any node to accept writes and reads. Systems like Cassandra and Riak use this model.
Quorums ($n, w, r$)
To maintain consistency without a leader, these systems use quorums:
- $n$: Total number of replicas.
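- $w$: Number of nodes that must confirm a write before it counts as successful.
- $r$: Number of nodes that must be queried for each read.
- The Rule: If $w + r > n$, every set of $r$ nodes overlaps every set of $w$ nodes in at least one node, so a read is expected to include the latest successful write.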
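To see why the quorum condition works, it helps to check the overlap directly. The sketch below enumerates every possible write set and read set for small clusters and confirms that they always share a node exactly when $w + r > n$. The function names are made up for this example.

```python
from itertools import combinations


def satisfies_quorum(n, w, r):
    """The quorum condition: read and write sets must be large enough to overlap."""
    return w + r > n


def sets_always_overlap(n, w, r):
    """Brute force: does every w-node write set intersect every r-node read set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


# A common configuration: n=3, w=2, r=2 tolerates one unavailable node.
typical = satisfies_quorum(3, 2, 2)

# With w=1 and r=1 a read can land on a node that never saw the write.
weak = satisfies_quorum(3, 1, 1)
```

With `n=3, w=2, r=2`, any two nodes chosen for a read must include at least one of the two nodes that accepted the write. With `w=1, r=1` the write may sit on one node while the read goes to another.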
Fixing Stale Data
Since nodes can go down, systems fix stale data via:
- Read Repair: When a client detects an old version during a read, it pushes the newer value back to that node.
- Anti-Entropy: A background process that constantly syncs data between replicas.
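Read repair can be sketched as follows. The example assumes each replica is a plain dictionary mapping a key to a `(version, value)` pair and that a higher version number means newer data; real systems use timestamps or version vectors, so treat this purely as an illustration.

```python
def read_with_repair(replicas, key):
    """Read a key from several replicas, return the newest value, fix stale copies."""
    missing = (0, None)  # version 0 stands for "this replica has nothing"
    responses = [(replica, replica.get(key, missing)) for replica in replicas]

    # The response with the highest version number is the newest.
    newest = max((versioned for _, versioned in responses), key=lambda v: v[0])

    # Push the newest version back to every replica that returned older data.
    for replica, versioned in responses:
        if versioned[0] < newest[0]:
            replica[key] = newest

    return newest[1]


replica_a = {"cart": (2, ["book", "pen"])}
replica_b = {"cart": (1, ["book"])}  # missed the latest write
replica_c = {}                        # was down for both writes

result = read_with_repair([replica_a, replica_b, replica_c], "cart")
```

Read repair only fixes keys that are actually read, which is why rarely accessed data also needs the background anti-entropy process.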
The Replication Lag Problem
Regardless of the algorithm, asynchronous replication often results in "replication lag": followers temporarily serve older data than the leader holds. To maintain a good user experience, developers should aim to provide these guarantees:
- Read-Your-Own-Writes: Ensures a user always sees the updates they just made.
- Monotonic Reads: Ensures that once a user has seen a value, later reads never return an older state, so data doesn't appear and then "disappear" when successive queries hit different replicas.
- Consistent Prefix Reads: Guarantees that if writes happen in a specific order, they are read in that same order.
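Two of these guarantees are often approximated with simple read routing. The sketch below assumes the application tracks each user's last write time, treats 60 seconds as a safe upper bound on replication lag, and pins each user to one replica by hashing their ID. All of these choices, and the function names, are assumptions for illustration.

```python
import hashlib


def replica_for_user(user_id, replicas):
    """Monotonic reads: always send the same user to the same replica."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


def choose_read_target(user_id, last_write_at, now, replicas, lag_window_s=60.0):
    """Read-your-own-writes: read from the leader shortly after the user wrote."""
    wrote_at = last_write_at.get(user_id)
    if wrote_at is not None and now - wrote_at < lag_window_s:
        return "leader"
    return replica_for_user(user_id, replicas)


replicas = ["replica-1", "replica-2", "replica-3"]
last_write_at = {"alice": 100.0}  # timestamps in seconds

just_wrote = choose_read_target("alice", last_write_at, now=110.0, replicas=replicas)
much_later = choose_read_target("alice", last_write_at, now=500.0, replicas=replicas)
never_wrote = choose_read_target("bob", last_write_at, now=110.0, replicas=replicas)
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings and would route the same user differently after a restart. If the pinned replica fails, the user has to be rerouted, and the monotonic guarantee can be briefly violated.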
Summary Comparison
| Algorithm | Best For | Main Downside |
|---|---|---|
| Single Leader | Read-heavy apps, general simplicity | Leader is a single point of failure for writes |
| Multi-Leader | Multi-region apps, offline capabilities | Extremely complex conflict resolution |
| Leaderless | High write throughput, high availability | Eventual consistency: reads can be stale and concurrent writes need conflict handling |