One-liner: Copies of your primary database that handle read queries, freeing the primary to focus on writes.
The Problem
Most web applications are read-heavy (80-90% reads, 10-20% writes).
A single database server that handles both reads and writes eventually becomes a bottleneck.
All traffic → [Primary DB]
CPU: 95%
Connections: maxed out
The Solution: Read Replicas
Add secondary copies of the database:
- Primary (Master): handles all writes
- Replicas (Slaves): handle most reads (reads that must see the latest write can still go to the primary)
[App Server 1] ──write──► [Primary DB] ──replicates──► [Replica 1], [Replica 2]
[App Server 2] ──read───► [Replica 1]
[App Server 3] ──read───► [Replica 2]
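The routing in this diagram can be sketched in a few lines. This is a minimal illustration, not a real driver API: `createRouter` and the server names are hypothetical, and the read check is deliberately naive (anything that does not start with `SELECT` is treated as a write and sent to the primary).

```javascript
// Hypothetical router: names and shape are illustrative, not a real driver API.
function createRouter(primary, replicas) {
  let next = 0; // index of the replica that receives the next read

  return {
    // Decide which server should run a statement.
    route(sql) {
      const isRead = /^\s*select\b/i.test(sql);
      // Writes, and reads when no replica exists, go to the primary.
      if (!isRead || replicas.length === 0) return primary;
      const target = replicas[next];
      next = (next + 1) % replicas.length; // round-robin across replicas
      return target;
    },
  };
}

const router = createRouter("primary-db", ["replica-1", "replica-2"]);
console.log(router.route("INSERT INTO posts (title) VALUES ('hi')")); // primary-db
console.log(router.route("SELECT * FROM posts")); // replica-1
console.log(router.route("SELECT * FROM posts")); // replica-2
console.log(router.route("SELECT * FROM posts")); // replica-1
```

Sending anything ambiguous to the primary is the safe default: a misrouted read only costs some primary capacity, while a misrouted write would fail on a read-only replica.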
How Replication Works
Synchronous Replication
Write → Primary → waits for Replica ACK → confirms to client
- ✅ Zero data loss: the replica always has the latest data
- ❌ Higher write latency: the primary must wait for replica confirmation
Asynchronous Replication
Write → Primary → confirms to client → replicates to Replica in background
- ✅ Lower write latency
- ❌ Replica lag: reads might return slightly stale data
- ❌ If the primary crashes before replication, data loss is possible
Semi-Synchronous
Write → Primary → waits for ACK from at least 1 replica → confirms to client
- Balances safety and latency (available in MySQL through a plugin; MySQL's default replication is asynchronous)
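To make the trade-off concrete, here is a rough latency model of the three modes. It is a sketch under stated assumptions: all numbers are made up, replicas acknowledge in parallel, at least one replica exists, and "sync" is taken to mean waiting for every replica. `writeLatencyMs` is a hypothetical helper, not part of any database.

```javascript
// Illustrative model only: latencies are in milliseconds and invented.
function writeLatencyMs(mode, primaryMs, replicaAckMs) {
  switch (mode) {
    case "async":
      return primaryMs; // client is confirmed before any replica responds
    case "semi-sync":
      return primaryMs + Math.min(...replicaAckMs); // wait for the fastest ACK
    case "sync":
      return primaryMs + Math.max(...replicaAckMs); // wait for the slowest ACK
    default:
      throw new Error(`unknown replication mode: ${mode}`);
  }
}

const acks = [4, 9, 30]; // ACK round-trip per replica
for (const mode of ["async", "semi-sync", "sync"]) {
  console.log(mode, writeLatencyMs(mode, 2, acks));
}
// async 2
// semi-sync 6
// sync 32
```

In this model one slow replica (30 ms) dominates the synchronous case, while semi-synchronous only pays for the fastest one.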
Replication Lag
Replica lag is the delay between a write committing on the primary and that write becoming visible on a replica.
| Lag | Impact |
|---|---|
| < 100 ms | Usually fine; imperceptible to users |
| 100 ms to 1 s | Acceptable for most use cases |
| > 1 s | Noticeable: the "I just posted but I can't see it" bug |
| Minutes | Data consistency issue; investigate urgently |
Handling Lag in Code
// Read-after-write consistency: after a user writes, route that user's
// next read to the primary, because a replica may not have the change yet.
// primaryDB and replicaDB are assumed to be connection pools created elsewhere.
async function getUserProfile(userId, justUpdated = false) {
  // Recent writers read from the primary; everyone else reads from a replica.
  const db = justUpdated ? primaryDB : replicaDB;
  return db.query("SELECT * FROM users WHERE id = ?", [userId]);
}
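The `justUpdated` flag has to come from somewhere. One common approach is to remember when each user last wrote and pin that user's reads to the primary for a short window. The sketch below assumes an in-memory map in a single app process; `createReadPinner` and its parameters are hypothetical, and with several app servers the timestamp would need to live in a shared place such as the session.

```javascript
// Hypothetical helper: remembers each user's last write and pins that user's
// reads to the primary for a short window afterwards.
function createReadPinner(windowMs, now = Date.now) {
  const lastWriteAt = new Map(); // userId -> timestamp of last write

  return {
    recordWrite(userId) {
      lastWriteAt.set(userId, now());
    },
    // Returns "primary" or "replica" for this user's next read.
    targetFor(userId) {
      const writtenAt = lastWriteAt.get(userId);
      if (writtenAt !== undefined && now() - writtenAt < windowMs) return "primary";
      lastWriteAt.delete(userId); // window expired (or never written): forget it
      return "replica";
    },
  };
}

// Usage with a fake clock so the behaviour is easy to follow.
let clock = 0;
const pinner = createReadPinner(1000, () => clock);
pinner.recordWrite("u1");
clock = 500;
console.log(pinner.targetFor("u1")); // primary
clock = 1500;
console.log(pinner.targetFor("u1")); // replica
console.log(pinner.targetFor("u2")); // replica
```

The window should be comfortably larger than the replica lag you normally observe; a window that is too short brings back the stale-read bug, and one that is too long pushes extra reads onto the primary.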
Architecture Patterns
Single Replica (Simple)
Writes → [Primary] → [Replica 1] ← Reads
Multiple Replicas (Common)
Writes → [Primary] ──► [Replica 1] \
             ├──────► [Replica 2]  ──► Read traffic distributed
             └──────► [Replica 3] /
Cascading Replicas
[Primary] → [Replica 1] → [Replica 2] → [Replica 3]
This reduces replication load on the primary, but lag grows with each hop down the chain.
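A quick way to see the lag cost of cascading is to add up the per-hop delays. This is simple arithmetic under the assumption that each hop adds a fixed delay; the numbers and the `cascadedLagMs` helper are illustrative only.

```javascript
// Illustrative arithmetic: per-hop delays in ms are made-up numbers.
function cascadedLagMs(hopDelaysMs) {
  let total = 0;
  // Entry i is the lag of the i-th replica in the chain relative to the primary.
  return hopDelaysMs.map((delay) => (total += delay));
}

console.log(cascadedLagMs([50, 50, 50])); // [ 50, 100, 150 ]
```

With three replicas fed directly by the primary, each would sit roughly 50 ms behind in this example; chained, the last one is about 150 ms behind.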
Regional Replicas (Multi-region)
[Primary - Mumbai] ──► [Replica - Singapore]
                   └──► [Replica - US-East]
Serve reads from the region closest to the user.
Failover: When Primary Dies
- Detect failure: a health check on the primary fails
- Elect new primary: the replica with the least lag is promoted
- Redirect writes: the app or load balancer routes writes to the new primary
- Old primary rejoins: it becomes a replica when it comes back
Before: App → Primary(A) → Replica(B)
Failure: Primary(A) dies
After: App → Primary(B) [B promoted]
                 ↓
             Replica(A) [A rejoins as replica]
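The election and promotion steps can be sketched as plain functions. This assumes each replica is described by a hypothetical `{ name, healthy, lagMs }` object; real failover tooling also has to fence the old primary and avoid split-brain, which is out of scope for this sketch.

```javascript
// Hypothetical shape: { name, healthy, lagMs }. Not tied to any specific tool.
function electNewPrimary(replicas) {
  const candidates = replicas.filter((r) => r.healthy);
  if (candidates.length === 0) return null; // nothing safe to promote
  // The replica with the least lag is missing the fewest writes.
  return candidates.reduce((best, r) => (r.lagMs < best.lagMs ? r : best));
}

function failover(cluster) {
  const promoted = electNewPrimary(cluster.replicas);
  if (promoted === null) throw new Error("no healthy replica to promote");
  return {
    primary: promoted.name,
    // The old primary is left out until it comes back and rejoins as a replica.
    replicas: cluster.replicas.filter((r) => r !== promoted),
  };
}

const cluster = {
  primary: "A",
  replicas: [
    { name: "B", healthy: true, lagMs: 40 },
    { name: "C", healthy: true, lagMs: 250 },
    { name: "D", healthy: false, lagMs: 5 },
  ],
};
console.log(failover(cluster).primary); // B
```

Replica D has the lowest lag but is skipped because it is unhealthy, so B is promoted.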
Read Replica Caveats
| Caveat | Detail |
|---|---|
| Eventual consistency | Reads may return stale data |
| Write bottleneck | Replicas help reads; writes still limited to primary |
| Complex routing | Need to distinguish read vs write queries |
| Cost | Each replica = another database server |
When to Add Read Replicas
DB CPU > 70% sustained → add replica
Read:Write ratio > 5:1 → add replica
Reporting queries slowing down app → add dedicated analytics replica
Multi-region users → add geo-replica
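These rules of thumb translate directly into a small check you could run against your own metrics. The thresholds are the ones listed above; the metric field names and the `replicaRecommendations` function are hypothetical.

```javascript
// Thresholds come from the rules above; the metric names are hypothetical.
function replicaRecommendations(m) {
  const out = [];
  if (m.sustainedCpuPercent > 70) out.push("add replica (CPU)");
  if (m.writesPerSec > 0 && m.readsPerSec / m.writesPerSec > 5) {
    out.push("add replica (read-heavy)");
  }
  if (m.reportingSlowsApp) out.push("add dedicated analytics replica");
  if (m.multiRegionUsers) out.push("add geo-replica");
  return out;
}

console.log(
  replicaRecommendations({
    sustainedCpuPercent: 85,
    readsPerSec: 900,
    writesPerSec: 100,
    reportingSlowsApp: false,
    multiRegionUsers: true,
  })
);
// [ 'add replica (CPU)', 'add replica (read-heavy)', 'add geo-replica' ]
```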
Diagram
The diagram shows:
- Primary DB receiving writes
- Replication arrows to 2-3 replicas
- App servers routing reads to replicas
- Failover promotion flow
Key Takeaways
- Read replicas solve read scalability; writes still bottleneck on the primary
- Use async replication for performance, sync for zero data loss
- Always handle replica lag in your application code
- Read replicas are the first step; sharding is the next step if writes become the bottleneck