Replication & High Availability
🔹 What is Replication?
Replication is the process of copying data from one database server (the primary) to one or more other servers (replicas/secondaries).
- The primary (or master) handles writes (INSERT, UPDATE, DELETE).
- The replicas (or slaves, followers, standbys) receive these changes and apply them.
- This ensures multiple copies of the same data exist across servers.
👉 Think of it like keeping photocopies of your important notebook in multiple places: if one is lost, you still have backups.
🔹 Types of Replication
- Synchronous Replication
- Writes on the primary are confirmed only after replicas also confirm they've written the change.
- No acknowledged write is lost if the primary fails, but every commit is slower because it waits for the replicas.
- Example: PostgreSQL synchronous replication.
- Asynchronous Replication
- Primary writes immediately return success, and replicas catch up later.
- Faster, but risk of data loss if primary crashes before replicas sync.
- Example: MySQL's default replication.
- Semi-synchronous Replication
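- A middle ground: the primary waits until at least one replica acknowledges receiving the change (not necessarily applying it) before reporting success.
- Balance between safety & performance.
- Example: MySQL semi-synchronous replication.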
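The three modes differ only in when the primary tells the client "success". Below is a minimal in-memory sketch of that rule. `Replica`, `Primary`, `required_acks`, and `ship_pending` are hypothetical names. The sketch ignores real-world details such as acknowledging on receipt versus on apply, or blocking while replicas are down.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    online: bool = True
    log: list = field(default_factory=list)

    def receive(self, entry):
        """Store a shipped entry and return True as the acknowledgment."""
        if not self.online:
            return False  # an unreachable replica never acknowledges
        self.log.append(entry)
        return True


@dataclass
class Primary:
    replicas: list
    mode: str = "async"  # "sync", "semi-sync", or "async"
    log: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # async entries not yet shipped

    def required_acks(self):
        # Replica acknowledgments a commit must collect before reporting success.
        return {"sync": len(self.replicas), "semi-sync": 1, "async": 0}[self.mode]

    def write(self, entry):
        """Commit locally; return True if the client may be told "success"."""
        self.log.append(entry)
        if self.mode == "async":
            self.pending.append(entry)  # replicas catch up later
            return True
        # This sketch ships to every replica, then counts acknowledgments.
        acks = sum(r.receive(entry) for r in self.replicas)
        # A real database would block or time out here instead of returning False.
        return acks >= self.required_acks()

    def ship_pending(self):
        """Background catch-up for async mode (offline replicas simply miss out here)."""
        for entry in self.pending:
            for r in self.replicas:
                r.receive(entry)
        self.pending.clear()


replicas = [Replica("r1"), Replica("r2")]
primary = Primary(replicas, mode="async")
primary.write("deposit 100")      # client is told "success" immediately
print([r.log for r in replicas])  # [[], []]  (lost if the primary crashed now)
primary.ship_pending()
print([r.log for r in replicas])  # [['deposit 100'], ['deposit 100']]
```

Switching `mode` to `"sync"` or `"semi-sync"` makes `write` return `True` only once enough online replicas have stored the entry. That is the latency-for-safety trade described above.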
🔹 Why Is Replication Important?
- Read Scaling: Distribute reads across replicas (read-heavy apps benefit).
- High Availability: If primary fails, replicas can be promoted as new primary.
- Disaster Recovery: Data is safe even if one server is lost.
- Geographic Distribution: Users in Asia can read from an Asia replica instead of a US server.
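Read scaling and geographic distribution both come down to routing. Writes must reach the primary, while reads can go to a nearby replica. The sketch below is minimal and assumes a hypothetical `Router` that classifies queries by their first keyword. Real proxies such as ProxySQL use configurable query rules instead.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    region: str
    is_primary: bool = False


class Router:
    """Hypothetical read/write splitter: writes to the primary, reads to a nearby replica."""

    def __init__(self, nodes):
        self.nodes = nodes

    def primary(self):
        return next(n for n in self.nodes if n.is_primary)

    def route(self, query, client_region):
        # Crude classification for illustration: queries starting with SELECT are reads.
        is_read = query.lstrip().upper().startswith("SELECT")
        if not is_read:
            return self.primary()
        replicas = [n for n in self.nodes if not n.is_primary]
        local = [n for n in replicas if n.region == client_region]
        # Prefer a replica in the client's region, then any replica, then the primary.
        candidates = local or replicas or [self.primary()]
        return candidates[0]


nodes = [Node("us-1", "us", is_primary=True), Node("asia-1", "asia"), Node("eu-1", "eu")]
router = Router(nodes)
print(router.route("SELECT * FROM accounts", "asia").name)           # asia-1
print(router.route("UPDATE accounts SET balance = 0", "asia").name)  # us-1
```

One caveat applies because replicas lag behind the primary. A user who just wrote something may not see it if the next read is routed to a replica.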
🔹 High Availability (HA)
HA is about keeping your database always online, even during failures. Replication is a core building block of HA, but HA adds automatic failover and monitoring.
- If primary dies, a replica automatically becomes the new primary.
- Clients reconnect automatically without manual intervention.
- Requires a cluster manager / orchestrator.
🔹 Replication & HA in Popular Databases
- MySQL
- Replication: Asynchronous by default (binlog-based).
- HA: MySQL InnoDB Cluster or Orchestrator handle failover; ProxySQL routes traffic to the current primary.
- PostgreSQL
- Replication: Built-in streaming replication.
- HA: Patroni, repmgr, or Stolon handle failover; PgBouncer, a connection pooler, is often placed in front.
- MongoDB
- Replica sets built-in: one primary, multiple secondaries, automatic failover.
- Cassandra
- Replication is peer-to-peer (no master/replica distinction). Every node can handle writes.
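🔹 Example Scenario
Imagine a banking app:
- Primary DB (Mumbai) → handles all deposits/withdrawals.
- Replicas:
- Delhi replica → handles read queries for North India users.
- London replica → handles reads for European users.
If the Mumbai primary crashes:
- The Delhi replica is promoted to new primary.
- All writes now go to Delhi, so customers see only a brief interruption while failover completes, not a prolonged outage.
- For a bank, Mumbai → Delhi replication should be synchronous or semi-synchronous. With asynchronous replication, any deposits Mumbai had confirmed but not yet shipped to Delhi would be lost.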
✅ Summary
Replication = Copying data across servers.
High Availability = Making sure the database stays alive automatically, even if one server dies.
🔹 How Automatic Failover Works
1. Health Monitoring
- Tools like Patroni (PostgreSQL), Orchestrator (MySQL), or MongoDB's built-in replica sets constantly ping the primary DB.
- They check:
- Is the DB server alive (via TCP/heartbeat)?
- Is replication up-to-date?
- Is there a network partition (the primary is alive but unreachable)?
👉 If the primary doesn't respond within a configured timeout (say 10s), it's marked as failed.
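The timeout rule above can be sketched as a tiny failure detector. This is a hypothetical illustration: `HeartbeatMonitor` is not any tool's API, and it only models the "is the server alive?" check. The clock is injectable so the logic can be exercised without actually waiting.

```python
import time


class HeartbeatMonitor:
    """Marks a node as failed if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout=10.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock   # injectable so tests can simulate time passing
        self.last_seen = {}  # node name -> time of its last successful ping

    def heartbeat(self, node):
        """Record that `node` answered a ping just now."""
        self.last_seen[node] = self.clock()

    def is_failed(self, node):
        seen = self.last_seen.get(node)
        return seen is None or self.clock() - seen > self.timeout


now = [0.0]
monitor = HeartbeatMonitor(timeout=10.0, clock=lambda: now[0])
monitor.heartbeat("primary")
now[0] = 5.0
print(monitor.is_failed("primary"))  # False
now[0] = 11.0
print(monitor.is_failed("primary"))  # True
```

A timeout alone cannot tell a crashed primary from one that is merely unreachable behind a network partition. That ambiguity is why leader election relies on a majority vote.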
2. Leader Election
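- In a cluster, you don't want two primaries (the split-brain issue ⚠️).
- A consensus system like etcd, Consul, or ZooKeeper (or a built-in protocol, as in MongoDB) is used. A new primary is chosen only when a majority of members agree, so an isolated minority cannot elect a second one.
- Cluster members vote:
- "Primary is dead, we need a new one."
- They agree on which replica is most up-to-date.
👉 The most up-to-date replica (the one with the least replication lag) becomes the new Primary.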
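The election rule can be sketched in two steps. First, refuse to act without a majority. Then pick the replica that has replicated furthest. This sketch assumes each replica reports a single increasing log position (a stand-in for something like a PostgreSQL LSN) and that the vote count comes from the consensus system. `elect_new_primary` is a hypothetical helper.

```python
def elect_new_primary(replica_positions, votes_for_failover, cluster_size):
    """Return the replica to promote, or None if no safe choice exists.

    replica_positions: {replica_name: replicated log position}; higher = less lag.
    votes_for_failover: members agreeing that the primary is dead.
    cluster_size: total voting members, including the old primary.
    """
    quorum = cluster_size // 2 + 1
    if votes_for_failover < quorum:
        # A minority (e.g. one side of a partition) must not promote anyone,
        # otherwise both sides could end up with a primary (split-brain).
        return None
    if not replica_positions:
        return None
    # Highest position = least replication lag; ties broken by name for determinism.
    return max(replica_positions, key=lambda name: (replica_positions[name], name))


positions = {"delhi": 1042, "london": 1037}
print(elect_new_primary(positions, votes_for_failover=2, cluster_size=3))  # delhi
print(elect_new_primary(positions, votes_for_failover=1, cluster_size=3))  # None
```

With three members, two votes form a majority. A lone node cut off by a partition has only its own vote, so it cannot promote anyone.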
3. Failover / Promotion
- The chosen replica:
- Promotes itself to Primary.
- Stops being read-only.
- Starts accepting writes.
👉 In Patroni (PostgreSQL), this is done with pg_ctl promote.
👉 In Orchestrator (MySQL), it issues RESET SLAVE ALL and reconfigures replication.
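The promotion bullets above can be modeled as a simple state change. This is a hypothetical sketch: `DbNode` and its fields are illustrative names, not any tool's API. On a real server, commands such as pg_ctl promote or Orchestrator's RESET SLAVE ALL do the actual work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbNode:
    name: str
    role: str = "replica"           # "primary" or "replica"
    read_only: bool = True
    upstream: Optional[str] = None  # node this replica streams changes from

    def promote(self):
        """Toy version of the promotion steps."""
        self.upstream = None    # stop replicating from the old primary
        self.read_only = False  # stop being read-only
        self.role = "primary"   # from now on, accept writes

    def write(self, sql):
        if self.read_only:
            raise PermissionError(f"{self.name} is read-only")
        return f"{self.name} executed: {sql}"


delhi = DbNode("delhi", upstream="mumbai")
delhi.promote()
print(delhi.role, delhi.read_only, delhi.upstream)          # primary False None
print(delhi.write("INSERT INTO deposits VALUES (1, 100)"))  # delhi executed: INSERT INTO deposits VALUES (1, 100)
```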
4. Reconfiguration
- Other replicas now start replicating from the new Primary.
- The cluster updates routing so apps know where to send writes.
- Proxies such as HAProxy or ProxySQL (or a reconfigured PgBouncer) point apps to the current Primary.
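The reconfiguration step can be sketched as two updates. Every surviving replica switches its replication source to the new primary, and the routing table used by the proxy is pointed at it. `reconfigure` and the dict shapes are hypothetical; real tools change each server's replication settings and the proxy's backend list. This sketch also ignores replicas that have diverged from the new primary, which real setups must resynchronize (for example, with PostgreSQL's pg_rewind).

```python
def reconfigure(nodes, new_primary, routing):
    """Repoint surviving replicas at `new_primary` and update write routing.

    nodes: {name: {"upstream": name or None}} for the surviving nodes.
    routing: the table the proxy layer reads, e.g. {"writes": "mumbai"}.
    """
    for name, node in nodes.items():
        if name == new_primary:
            node["upstream"] = None  # the new primary replicates from nobody
        else:
            node["upstream"] = new_primary  # follow the new primary
    routing["writes"] = new_primary  # proxies now send writes here
    return nodes, routing


nodes = {"delhi": {"upstream": "mumbai"}, "london": {"upstream": "mumbai"}}
routing = {"writes": "mumbai"}
reconfigure(nodes, "delhi", routing)
print(nodes)    # {'delhi': {'upstream': None}, 'london': {'upstream': 'delhi'}}
print(routing)  # {'writes': 'delhi'}
```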
5. Application Transparency
- Applications don't have to know which DB is Primary.
- They just connect to a load balancer / proxy.
- If failover happens, the proxy redirects traffic to the new Primary.
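👉 The app sees at most a brief interruption. Transactions in flight during the failover fail and must be retried, but no connection settings need to change.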
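Application transparency can be sketched as a proxy that looks up the current primary on every request, so the app's code and connection settings never change. `WriteProxy` and `discover_primary` are hypothetical. `discover_primary` stands in for whatever a real proxy uses (health checks, a consensus store, or a failover hook), and returning `None` models the short window while failover is still in progress.

```python
import time


class WriteProxy:
    """Hypothetical proxy: the app always talks to this, never to a DB host directly."""

    def __init__(self, discover_primary, retries=3, retry_delay=0.5):
        self.discover_primary = discover_primary  # returns the current primary, or None
        self.retries = retries
        self.retry_delay = retry_delay

    def execute(self, sql):
        for _ in range(self.retries):
            target = self.discover_primary()  # re-resolved on every attempt
            if target is not None:
                # A real proxy would forward `sql` over a connection to `target`.
                return f"{target}: {sql}"
            time.sleep(self.retry_delay)  # failover in progress: wait, then retry
        raise ConnectionError("no primary available")


current = {"primary": "mumbai"}
proxy = WriteProxy(lambda: current["primary"])

# Application code is identical before and after the failover.
print(proxy.execute("INSERT INTO deposits VALUES (1, 100)"))  # mumbai: INSERT INTO deposits VALUES (1, 100)
current["primary"] = "delhi"  # failover happens behind the proxy
print(proxy.execute("INSERT INTO deposits VALUES (2, 50)"))   # delhi: INSERT INTO deposits VALUES (2, 50)
```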
🔹 Example with Tools
- PostgreSQL + Patroni
- Patroni uses etcd/Consul for cluster state.
- Monitors DB health.
- On failure → promotes replica → records the new leader so routing (e.g. HAProxy health checks) can follow.
- MySQL + Orchestrator
- Continuously monitors replication topology.
- Detects primary failure.
- Promotes the best replica automatically.
- Updates HAProxy/ProxySQL (typically via failover hooks).
- MongoDB Replica Set
- Built-in heartbeat detection.
- Automatic election (no external tool needed).
- One secondary becomes new primary.
✅ Summary in Simple Words
Failover tools are like a traffic cop:
- Check if the main road (primary DB) is open.
- If closed, find the best alternative road (replica).
- Redirect all cars (applications) automatically.