Gouranga Das Samrat

Read Replicas

One-liner: Copies of your primary database that handle read queries, freeing the primary to focus on writes.


📌 The Problem

Most web applications are read-heavy: reads commonly make up around 80-90% of queries, and writes the remaining 10-20%.

When a single database server handles both reads and writes, it becomes a bottleneck.

All traffic → [Primary DB]
              CPU: 95% 🔥
              Connections: maxed out 🔥

💡 The Solution: Read Replicas

Add secondary copies of the database:

  • Primary (Master) → handles all writes
  • Replicas (Slaves) → handle all reads

[App Server 1] ──write──► [Primary DB] ──replicates──► [Replica 1]
[App Server 2] ──read──►  [Replica 1]                 [Replica 2]
[App Server 3] ──read──►  [Replica 2]
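
The read/write split above can be sketched as a small router. This is a minimal illustration, not a production router: `isWriteQuery` and `routeQuery` are hypothetical helper names, and classifying a statement by its leading SQL keyword is an assumption. It sends anything that is not a plain `SELECT` to the primary, and it does not detect locking reads such as `SELECT ... FOR UPDATE`, which would also need the primary.

```javascript
// Hypothetical helper: treat anything that is not a plain SELECT as a write.
const READ_PREFIX = /^\s*select\b/i;

function isWriteQuery(sql) {
  return !READ_PREFIX.test(sql);
}

// Returns the name of the pool a statement should be sent to.
function routeQuery(sql) {
  return isWriteQuery(sql) ? "primary" : "replica";
}

console.log(routeQuery("SELECT * FROM users WHERE id = 1"));          // replica
console.log(routeQuery("UPDATE users SET name = 'a' WHERE id = 1"));  // primary
```

Defaulting unknown statements to the primary is the safer choice here: a read sent to the primary only costs capacity, while a write sent to a replica fails or is lost.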

🔄 How Replication Works

Synchronous Replication

Write → Primary → waits for Replica ACK → confirms to client

  • ✅ Zero data loss: the replica always has the latest data
  • ❌ Higher write latency: the primary must wait for replica confirmation

Asynchronous Replication

Write → Primary → confirms to client → replicates to Replica in background

  • ✅ Lower write latency
  • ❌ Replica lag: reads might return slightly stale data
  • ❌ If the primary crashes before replication completes, data loss is possible

Semi-Synchronous

Write → Primary → waits for ACK from at least 1 replica → confirms

  • Balance between safety and latency (MySQL offers this as an optional semisynchronous replication plugin; its default replication is asynchronous)
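
To make the latency trade-off between the three modes concrete, the sketch below computes how long a client waits for a commit under each one. The function name `commitLatencyMs`, the mode strings, and the sample timings are illustrative assumptions, not values from any particular database. It also assumes that "sync" means waiting for every replica.

```javascript
// primaryMs: time for the primary to apply the write locally.
// replicaAckMs: time for each replica to acknowledge the write.
function commitLatencyMs(mode, primaryMs, replicaAckMs) {
  switch (mode) {
    case "async":
      return primaryMs; // confirm immediately, replicate in the background
    case "sync":
      return primaryMs + Math.max(0, ...replicaAckMs); // wait for the slowest replica
    case "semi-sync":
      // wait for the fastest replica only
      return primaryMs + (replicaAckMs.length ? Math.min(...replicaAckMs) : 0);
    default:
      throw new Error(`unknown replication mode: ${mode}`);
  }
}

const acks = [4, 12, 90]; // hypothetical ack times in ms
console.log(commitLatencyMs("sync", 2, acks));      // 92
console.log(commitLatencyMs("semi-sync", 2, acks)); // 6
console.log(commitLatencyMs("async", 2, acks));     // 2
```

With these made-up numbers, a single slow replica dominates synchronous commits, while semi-synchronous commits only pay for the fastest acknowledgement.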

📊 Replication Lag

Replica lag is the delay between a write committing on the primary and that write becoming visible on a replica.

| Lag | Impact |
| --- | --- |
| < 100ms | Usually fine; imperceptible to users |
| 100ms – 1s | Acceptable for most use cases |
| > 1s | Noticeable: the "I just posted but I can't see it" bug |
| Minutes | Data consistency issue; investigate urgently |
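
The lag bands above can be turned into a simple check for monitoring or alerting. This is only a sketch: `classifyLag` and its return labels are hypothetical, and treating 60 seconds as the start of the "minutes" band is an assumption, since the bands do not give an exact boundary.

```javascript
// Maps a measured replica lag (in milliseconds) to one of the bands above.
function classifyLag(lagMs) {
  if (lagMs < 100) return "fine";
  if (lagMs <= 1000) return "acceptable";
  if (lagMs < 60000) return "noticeable"; // assumed cutoff for "minutes"
  return "investigate";
}

console.log(classifyLag(40));     // fine
console.log(classifyLag(2500));   // noticeable
console.log(classifyLag(180000)); // investigate
```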

Handling Lag in Code

// Read-after-write consistency:
// after a user writes, route that user's next read to the primary.
// primaryDB and replicaDB are assumed to be already-configured connection
// pools exposing a promise-based query(sql, params) method.

async function getUserProfile(userId, justUpdated = false) {
  // A replica may not have received the user's own write yet.
  const db = justUpdated ? primaryDB : replicaDB;
  return db.query("SELECT * FROM users WHERE id = ?", [userId]);
}
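
A boolean flag has to be passed through every caller. One alternative, sketched here under assumptions, is to remember when each user last wrote and send that user's reads to the primary for a short window afterwards. `recordWrite`, `chooseDatabase`, and the 5-second window are hypothetical; the window should be longer than the replica lag you actually observe.

```javascript
const STICKY_WINDOW_MS = 5000; // assumed value; tune to observed replica lag
const lastWriteAt = new Map(); // userId -> timestamp (ms) of the most recent write

// Call this whenever a user's write succeeds on the primary.
function recordWrite(userId, now = Date.now()) {
  lastWriteAt.set(userId, now);
}

// Decide where this user's next read should go.
function chooseDatabase(userId, now = Date.now()) {
  const wroteAt = lastWriteAt.get(userId);
  const recentlyWrote = wroteAt !== undefined && now - wroteAt < STICKY_WINDOW_MS;
  return recentlyWrote ? "primary" : "replica";
}

recordWrite(42, 1000);
console.log(chooseDatabase(42, 2000)); // primary (1s after the write)
console.log(chooseDatabase(42, 9000)); // replica (window has passed)
console.log(chooseDatabase(7, 2000));  // replica (user 7 never wrote)
```

The in-memory `Map` only works when a user's requests always reach the same app server. With several app servers, the timestamp would need to live somewhere shared, such as a session cookie or a cache.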

🏗️ Architecture Patterns

Single Replica (Simple)

Writes → [Primary] → [Replica 1] ← Reads

Multiple Replicas (Common)

         [Primary] ──► [Replica 1]  \
Writes →      │    ──► [Replica 2]  ──► Read traffic distributed
              └────► [Replica 3]  /
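
Distributing read traffic can be as simple as round-robin over the replicas. The sketch below also skips replicas whose lag exceeds a threshold and falls back to the primary if none qualify. The `createReplicaPicker` name, the shape of the replica objects, and the 1000 ms cutoff are assumptions for illustration.

```javascript
// Returns a function that picks the next usable replica in round-robin order.
function createReplicaPicker(replicas, maxLagMs = 1000) {
  let next = 0; // index where the next search starts
  return function pick() {
    for (let i = 0; i < replicas.length; i++) {
      const candidate = replicas[(next + i) % replicas.length];
      if (candidate.lagMs <= maxLagMs) {
        next = (next + i + 1) % replicas.length;
        return candidate.name;
      }
    }
    return "primary"; // every replica is too far behind; fall back
  };
}

const pick = createReplicaPicker([
  { name: "replica-1", lagMs: 20 },
  { name: "replica-2", lagMs: 5000 }, // lagging, will be skipped
  { name: "replica-3", lagMs: 80 },
]);
console.log(pick(), pick(), pick()); // replica-1 replica-3 replica-1
```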

Cascading Replicas

[Primary] → [Replica 1] → [Replica 2] → [Replica 3]

Cascading reduces replication load on the primary, but lag accumulates at every hop, so replicas further down the chain are further behind.
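
As a rough model of why lag grows along the chain, assume each hop adds its own delay and that the delays simply add up (real systems vary). `cumulativeLagMs` and the 50 ms per-hop figure are hypothetical.

```javascript
// hopLagMs[i] is the assumed delay of the i-th hop in the chain.
// Returns the total lag behind the primary at each replica, in chain order.
function cumulativeLagMs(hopLagMs) {
  let total = 0;
  return hopLagMs.map((hop) => (total += hop));
}

console.log(cumulativeLagMs([50, 50, 50])); // [ 50, 100, 150 ]
```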

Regional Replicas (Multi-region)

[Primary - Mumbai] ──► [Replica - Singapore]
                   ──► [Replica - US-East]

Serve reads from the closest region to the user.
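
One way to choose the closest region is to compare measured latencies from the user (or from the app server handling the request) to each endpoint. This is a sketch: `closestEndpoint`, the endpoint names, and the latency numbers are made up for illustration.

```javascript
// latencyMsByEndpoint: hypothetical measured latency (ms) to each database endpoint.
// Returns the endpoint with the lowest latency, or null if none are given.
function closestEndpoint(latencyMsByEndpoint) {
  let best = null;
  for (const [endpoint, ms] of Object.entries(latencyMsByEndpoint)) {
    if (best === null || ms < best.ms) best = { endpoint, ms };
  }
  return best ? best.endpoint : null;
}

console.log(
  closestEndpoint({
    "primary-mumbai": 60,
    "replica-singapore": 35,
    "replica-us-east": 210,
  })
); // replica-singapore
```

Only reads should follow this choice; writes still go to the primary regardless of where the user is.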


🔄 Failover: When Primary Dies

  1. Detect failure: a health check fails
  2. Elect new primary: the replica with the least lag is promoted
  3. Redirect writes: the app or load balancer routes writes to the new primary
  4. Old primary rejoins: it becomes a replica when it comes back

Before: App → Primary(A) → Replica(B)
Failure: Primary(A) dies
After:  App → Primary(B)   [B promoted]
              ↑
           Replica(A) [A rejoins as replica]
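
Steps 2 to 4 above can be sketched as plain functions over a description of the cluster. This is a simplified illustration with hypothetical names (`electNewPrimary`, `failover`) and a made-up cluster shape. It assumes failure has already been detected and ignores concerns that real failover tooling has to handle, such as preventing two nodes from accepting writes at once.

```javascript
// Step 2: pick the healthy replica with the least lag.
function electNewPrimary(replicas) {
  const healthy = replicas.filter((r) => r.healthy);
  if (healthy.length === 0) return null;
  return healthy.reduce((best, r) => (r.lagMs < best.lagMs ? r : best)).name;
}

// Steps 3-4: return the new topology after the primary has failed.
function failover(cluster) {
  const promoted = electNewPrimary(cluster.replicas);
  if (promoted === null) throw new Error("no healthy replica to promote");
  return {
    primary: promoted, // writes are redirected here
    replicas: [
      ...cluster.replicas.filter((r) => r.name !== promoted),
      // The old primary rejoins as a replica once it is back; until then it is unhealthy.
      { name: cluster.primary, healthy: false, lagMs: Infinity },
    ],
  };
}

const after = failover({
  primary: "A",
  replicas: [
    { name: "B", healthy: true, lagMs: 120 },
    { name: "C", healthy: true, lagMs: 30 },
  ],
});
console.log(after.primary);                      // C
console.log(after.replicas.map((r) => r.name));  // [ 'B', 'A' ]
```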

⚠️ Read Replica Caveats

| Caveat | Detail |
| --- | --- |
| Eventual consistency | Reads may return stale data |
| Write bottleneck | Replicas help reads; writes are still limited to the primary |
| Complex routing | Need to distinguish read vs write queries |
| Cost | Each replica = another database server |

🌍 When to Add Read Replicas

DB CPU > 70% sustained → add replica
Read:Write ratio > 5:1 → add replica
Reporting queries slowing down app → add dedicated analytics replica
Multi-region users → add geo-replica
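
These rules of thumb can be encoded as a small checklist function, for example to run against dashboard metrics. The function name `replicaRecommendations` and the field names on the metrics object are hypothetical; the thresholds are the ones listed above.

```javascript
// Returns the list of recommendations that apply to the given metrics.
function replicaRecommendations(m) {
  const advice = [];
  if (m.sustainedCpuPercent > 70) advice.push("add replica (CPU > 70% sustained)");
  // Compare without dividing so that zero writes does not cause a division by zero.
  if (m.reads > 5 * m.writes) advice.push("add replica (read:write ratio > 5:1)");
  if (m.reportingSlowsApp) advice.push("add dedicated analytics replica");
  if (m.multiRegionUsers) advice.push("add geo-replica");
  return advice;
}

console.log(
  replicaRecommendations({
    sustainedCpuPercent: 85,
    reads: 9000,
    writes: 1000,
    reportingSlowsApp: false,
    multiRegionUsers: true,
  })
);
// [ 'add replica (CPU > 70% sustained)',
//   'add replica (read:write ratio > 5:1)',
//   'add geo-replica' ]
```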

🎨 Diagram

The diagram shows:

  • Primary DB receiving writes
  • Replication arrows to 2-3 replicas
  • App servers routing reads to replicas
  • Failover promotion flow

🔑 Key Takeaways

  • Read replicas solve read scalability; writes still bottleneck on the primary
  • Use async replication for performance, sync for zero data loss
  • Always handle replica lag in your application code
  • Read replicas are the first step; sharding is the next if writes become the bottleneck
