Ahmed Raza Idrisi

Posted on Sep 20

Replication & High Availability

#webdev #programming #database

🔹 What is Replication?

Replication is the process of copying data from one database server (the primary) to one or more other servers (replicas/secondaries).

The primary (or master) handles writes (INSERT, UPDATE, DELETE).
The replicas (or slaves, followers, standbys) receive these changes and apply them.
This ensures multiple copies of the same data exist across servers.

👉 Think of it like keeping photocopies of your important notebook in multiple places: if one is lost, you still have the others.

🔹 Types of Replication

Synchronous Replication

Writes on the primary are confirmed only after the replicas also confirm they've written the change.
Protects acknowledged writes from being lost if the primary fails, but slows down writes (since the primary waits).
Example: PostgreSQL streaming replication with synchronous_standby_names set.

Asynchronous Replication

The primary returns success as soon as it commits locally, and replicas catch up later.
Faster, but recent writes can be lost if the primary crashes before the replicas have received them.
Example: MySQL's default replication.

Semi-synchronous Replication

A middle ground: the primary waits for at least one replica to acknowledge receiving the change before reporting success.
A balance between safety and performance.
Example: MySQL's semisynchronous replication plugin.
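To make the difference concrete, here is a minimal sketch of how many replica acknowledgements the primary waits for in each mode. It is a simplified model, not any database's real implementation: the names Mode, acks_needed, and write_confirmed are made up for illustration, and "synchronous" is assumed to mean "wait for every replica".

```python
from enum import Enum
from typing import List


class Mode(Enum):
    SYNC = "synchronous"
    SEMI_SYNC = "semi-synchronous"
    ASYNC = "asynchronous"


def acks_needed(mode: Mode, replica_count: int) -> int:
    """Replica acknowledgements the primary waits for before reporting success."""
    if mode is Mode.SYNC:
        return replica_count          # assumption: wait for every replica
    if mode is Mode.SEMI_SYNC:
        return min(1, replica_count)  # at least one replica, if any exist
    return 0                          # async: do not wait at all


def write_confirmed(mode: Mode, replica_acks: List[bool]) -> bool:
    """True if enough replicas acknowledged for the client to see success."""
    return sum(replica_acks) >= acks_needed(mode, len(replica_acks))


# One of three replicas has acknowledged so far.
acks = [False, True, False]
for mode in Mode:
    print(mode.value, "->", write_confirmed(mode, acks))
```

With one acknowledgement out of three, only the synchronous mode keeps the client waiting.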

🔹 Why Is Replication Important?

Read Scaling: Distribute reads across replicas (read-heavy apps benefit).
High Availability: If primary fails, replicas can be promoted as new primary.
Disaster Recovery: Data is safe even if one server is lost.
Geographic Distribution: Users in Asia can read from an Asia replica instead of a US server.
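Read scaling usually comes down to a routing rule in the application or a proxy. The sketch below is one hedged way to express it: SELECT statements rotate across replicas and everything else goes to the primary. ReadWriteRouter and the host names are hypothetical, and the rule is deliberately naive (for example, it ignores transactions and reads that must see the latest write).

```python
import itertools


class ReadWriteRouter:
    """Sends SELECTs to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() repeats the replica list forever; None means "no replicas".
        self._replica_cycle = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replica_cycle is not None:
            return next(self._replica_cycle)
        # Writes (and anything we are unsure about) go to the primary.
        return self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM accounts"))
print(router.route("UPDATE accounts SET balance = 0 WHERE id = 1"))
```

Defaulting unknown statements to the primary is the safer choice, since a replica would reject a write anyway.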

🔹 High Availability (HA)

HA is about keeping your database always online, even during failures. Replication is a core building block of HA, but HA adds automatic failover and monitoring.

If primary dies, a replica automatically becomes the new primary.
Clients reconnect automatically without manual intervention.
Requires a cluster manager / orchestrator.

🔹 Replication & HA in Popular Databases

MySQL
- Replication: Asynchronous by default (binlog-based).
- HA: Tools like MySQL InnoDB Cluster or Orchestrator handle failover; ProxySQL is a proxy that routes traffic to the current primary.
PostgreSQL
- Built-in streaming replication.
- HA: Tools like Patroni, repmgr, or Stolon manage failover; PgBouncer is a connection pooler that often sits in front of them.
MongoDB
- Replica sets built-in: one primary, multiple secondaries, automatic failover.
Cassandra
- Replication is peer-to-peer (no master/replica distinction). Every node can handle writes.

🔹 Example Scenario

Imagine a banking app:

Primary DB (Mumbai) → handles all deposits/withdrawals.
Replicas:
- Delhi replica → handles read queries for North India users.
- London replica → handles reads for European users.

If the Mumbai primary crashes:

Delhi replica is promoted to new primary.
All writes now go to Delhi, so customers see a brief interruption at most rather than an outage.

✅ Summary
Replication = Copying data across servers.
High Availability = Making sure the database stays alive automatically, even if one server dies.

🔹 How Automatic Failover Works

1. Health Monitoring

Tools like Patroni (PostgreSQL), Orchestrator (MySQL), or MongoDB built-in replica sets constantly ping the primary DB.
They check:
- Is the DB server alive (via TCP/heartbeat)?
- Is replication up-to-date?
- Is there a network partition (primary is alive but unreachable)?

👉 If the primary does not respond within a certain timeout (say 10s), it's marked as failed.
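The timeout rule can be sketched in a few lines. This is an illustration only, not how Patroni or Orchestrator are implemented: HealthMonitor is a made-up class, and timestamps are passed in explicitly (in seconds) so the logic is easy to follow and test.

```python
from dataclasses import dataclass


@dataclass
class HealthMonitor:
    """Marks the primary as failed when no heartbeat arrives within timeout_s."""

    timeout_s: float = 10.0
    last_seen: float = 0.0

    def record_heartbeat(self, now: float) -> None:
        # Called whenever the primary answers a ping.
        self.last_seen = now

    def is_failed(self, now: float) -> bool:
        # Failed only once the silence is longer than the timeout.
        return (now - self.last_seen) > self.timeout_s


monitor = HealthMonitor(timeout_s=10.0)
monitor.record_heartbeat(now=100.0)
print(monitor.is_failed(now=105.0))  # 5s of silence
print(monitor.is_failed(now=111.0))  # 11s of silence
```

A short timeout detects failures faster but risks false alarms on a slow network, which is why real tools make it configurable.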

2. Leader Election

In a cluster, you don’t want two primaries (split-brain issue ⚠️).
A consensus system like etcd, Consul, or ZooKeeper is used.
Cluster members vote:
- "Primary is dead, we need a new one."
- They agree on which replica is most up-to-date.

👉 The most up-to-date replica (the one with the least replication lag) becomes the new Primary.
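Here is a rough sketch of the two decisions in this step: do enough members agree that a failover is needed, and which replica should win? In a real cluster the agreement itself is handled by the consensus store, so treat this as an illustration. Replica, has_quorum, pick_new_primary, and the lag_bytes field are all hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Replica(NamedTuple):
    name: str
    healthy: bool
    lag_bytes: int  # assumption: lag measured as bytes behind the old primary


def has_quorum(votes: int, cluster_size: int) -> bool:
    """A strict majority must agree, which prevents two halves electing two primaries."""
    return votes > cluster_size // 2


def pick_new_primary(replicas: List[Replica]) -> Optional[Replica]:
    """Choose the healthy replica with the least lag (name breaks ties)."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.lag_bytes, r.name))


replicas = [
    Replica("delhi", healthy=True, lag_bytes=120),
    Replica("london", healthy=True, lag_bytes=4096),
]
if has_quorum(votes=2, cluster_size=3):
    print(pick_new_primary(replicas).name)
```

The majority rule is what guards against split-brain: a partitioned minority can never collect enough votes to promote its own primary.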

3. Failover / Promotion

The chosen replica:
- Promotes itself to Primary.
- Stops being read-only.
- Starts accepting writes.

👉 In Patroni (PostgreSQL), this is done with pg_ctl promote.
👉 In Orchestrator (MySQL), it issues RESET SLAVE ALL + reconfigures replication.

4. Reconfiguration

Other replicas now start replicating from the new Primary.
The cluster updates routing so apps know where to send writes.
Tools like HAProxy, ProxySQL, or PgBouncer sit between apps and the database; once they are repointed (by the failover tool or by their own health checks), apps reach the current Primary.
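Steps 3 and 4 together amount to a small state change across the cluster. The sketch below models it with plain Python objects, assuming a made-up Node record and fail_over helper; it does not talk to a real database and does not model fencing the failed node.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    name: str
    read_only: bool = True
    replicates_from: Optional[str] = None


def fail_over(nodes: Dict[str, Node], failed: str, new_primary: str) -> None:
    """Promote new_primary and repoint every surviving replica at it."""
    promoted = nodes[new_primary]
    promoted.read_only = False       # step 3: starts accepting writes
    promoted.replicates_from = None  # it no longer follows anyone
    for node in nodes.values():
        if node.name not in (failed, new_primary):
            node.replicates_from = new_primary  # step 4: follow the new Primary


nodes = {
    "mumbai": Node("mumbai", read_only=False),
    "delhi": Node("delhi", replicates_from="mumbai"),
    "london": Node("london", replicates_from="mumbai"),
}
fail_over(nodes, failed="mumbai", new_primary="delhi")
print(nodes["london"].replicates_from)
```

In practice the old primary also has to be kept from accepting writes if it comes back, which is left out here.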

5. Application Transparency

Applications don’t have to know which DB is Primary.
They just connect to a load balancer / proxy.
If failover happens, the proxy redirects traffic to the new Primary.

👉 This keeps downtime short: the app sees a brief interruption at most and may need to retry in-flight queries.
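The idea of the proxy as a stable address can be sketched like this. WriteProxy is a hypothetical stand-in for something like HAProxy or ProxySQL; instead of opening connections, it simply returns which server a write would be sent to.

```python
class WriteProxy:
    """Apps always talk to the proxy; only the proxy knows the current Primary."""

    def __init__(self, primary):
        self._primary = primary

    def repoint(self, new_primary):
        # Called by the failover tooling after a promotion.
        self._primary = new_primary

    def send_write(self, sql):
        # A real proxy would forward the query; here we just report the target.
        return (self._primary, sql)


proxy = WriteProxy("mumbai")
print(proxy.send_write("UPDATE accounts SET balance = balance - 100 WHERE id = 7"))
proxy.repoint("delhi")  # failover happened
print(proxy.send_write("UPDATE accounts SET balance = balance - 100 WHERE id = 7"))
```

The application code is identical before and after the failover; only the proxy's target changed.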

🔹 Example with Tools

PostgreSQL + Patroni
- Patroni uses etcd/Consul for cluster state.
- Monitors DB health.
- On failure → promotes replica → updates routing.
MySQL + Orchestrator
- Continuously monitors replication topology.
- Detects primary failure.
- Promotes the best replica automatically.
- Updates HAProxy/ProxySQL.
MongoDB Replica Set
- Built-in heartbeat detection.
- Automatic election (no external tool needed).
- One secondary becomes new primary.

✅ Summary in Simple Words
Failover tools are like a traffic cop:

Check if the main road (primary DB) is open.
If closed, find the best alternative road (replica).
Redirect all cars (applications) automatically.

DEV Community
