DEV Community

Ahmed Raza Idrisi

Replication & High Availability

🔹 What is Replication?

Replication is the process of copying data from one database server (the primary) to one or more other servers (replicas/secondaries).

  • The primary (or master) handles writes (INSERT, UPDATE, DELETE).
  • The replicas (or slaves, followers, standbys) receive these changes and apply them.
  • This ensures multiple copies of the same data exist across servers.

👉 Think of it like keeping photocopies of an important notebook in several places: if one copy is lost, you still have the others.


🔹 Types of Replication

  1. Synchronous Replication
  • A write on the primary is confirmed only after the replicas confirm they've written the change.
  • No acknowledged write is lost if the primary fails, but every write waits on the replicas, which adds latency.
  • Example: PostgreSQL synchronous replication.
  2. Asynchronous Replication
  • The primary returns success immediately, and replicas catch up later.
  • Faster, but recent writes can be lost if the primary crashes before the replicas receive them.
  • Example: MySQL's default replication.
  3. Semi-synchronous Replication
  • A middle ground: the primary waits for at least one replica to confirm before reporting success.
  • Balances safety and performance.
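
To make the difference between the three modes concrete, here is a minimal in-memory sketch. It is a toy model, not how any real database implements replication: the `Primary` and `Replica` classes, the `mode` strings, and the `flush()` helper are all names made up for this illustration. The only thing it models is how many replica confirmations a write waits for before it returns.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)

    def apply(self, change: str) -> bool:
        """Store the change and acknowledge it."""
        self.log.append(change)
        return True


@dataclass
class Primary:
    replicas: list
    mode: str = "async"  # hypothetical labels: "sync", "semi-sync", or "async"
    log: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # (replica, change) pairs not yet shipped

    def write(self, change: str) -> int:
        """Commit locally and return how many replica acks were awaited."""
        self.log.append(change)
        # How many replicas must confirm before the client sees success.
        required = {"sync": len(self.replicas), "semi-sync": 1, "async": 0}[self.mode]
        required = min(required, len(self.replicas))
        acks = 0
        for replica in self.replicas:
            if acks < required:
                acks += replica.apply(change)           # wait for this replica
            else:
                self.pending.append((replica, change))  # ship it later
        return acks

    def flush(self) -> None:
        """Background catch-up: deliver everything still pending."""
        for replica, change in self.pending:
            replica.apply(change)
        self.pending.clear()
```

With two replicas in `"semi-sync"` mode, `write()` returns after the first replica has the change, and the second one only catches up when `flush()` runs. If the primary "crashed" before `flush()`, anything still in `pending` would be the data at risk.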

🔹 Why is Replication Important?

  • Read Scaling: Distribute reads across replicas (read-heavy apps benefit).
  • High Availability: If primary fails, replicas can be promoted as new primary.
  • Disaster Recovery: Data is safe even if one server is lost.
  • Geographic Distribution: Users in Asia can read from an Asia replica instead of a US server.
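
Read scaling usually comes down to a routing decision: writes go to the primary, reads are spread over the replicas. The sketch below shows one simple way to do that with round-robin. The `ReplicatedRouter` class and the host names are hypothetical, and real setups normally leave this job to a driver or proxy rather than inspecting SQL text by hand.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the primary and spread reads across replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Fall back to the primary when there are no replicas to read from.
        self._readers = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        """Return the host that should run this statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._readers)  # round-robin over the replicas
```

For example, `ReplicatedRouter("primary", ["replica-a", "replica-b"])` sends every `INSERT` to `primary` and alternates `SELECT` statements between the two replicas. Keep in mind that with asynchronous replication a replica can lag, so a read routed this way may return slightly stale data.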

🔹 High Availability (HA)

HA is about keeping your database online even during failures. Replication is a core building block of HA, but HA adds automatic failover and monitoring on top.

  • If the primary dies, a replica automatically becomes the new primary.
  • Clients reconnect automatically, without manual intervention.
  • This requires a cluster manager / orchestrator.

🔹 Replication & HA in Popular Databases

  • MySQL

    • Replication: Asynchronous by default (binlog-based).
    • HA: Tools like MySQL InnoDB Cluster or Orchestrator handle failover; ProxySQL routes traffic to the current primary.
  • PostgreSQL

    • Built-in streaming replication.
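    • HA: Tools like Patroni, repmgr, or Stolon manage failover; PgBouncer is a connection pooler that is often placed in front of the cluster to route clients.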
  • MongoDB

    • Replica sets built-in: one primary, multiple secondaries, automatic failover.
  • Cassandra

    • Replication is peer-to-peer (no master/replica distinction). Every node can handle writes.

🔹 Example Scenario

Imagine a banking app:

  • Primary DB (Mumbai) → handles all deposits/withdrawals.
  • Replicas:

    • Delhi replica → handles read queries for North India users.
    • London replica → handles reads for European users.

If the Mumbai primary crashes:

  • The Delhi replica is promoted to new primary.
  • All writes now go to Delhi, so customers see only a brief interruption instead of a long outage.

✅ Summary
Replication = Copying data across servers.
High Availability = Making sure the database stays alive automatically, even if one server dies.


🔹 How Automatic Failover Works

1. Health Monitoring

  • Tools like Patroni (PostgreSQL), Orchestrator (MySQL), or MongoDB's built-in replica sets constantly ping the primary DB.
  • They check:

    • Is the DB server alive (via TCP/heartbeat)?
    • Is replication up-to-date?
    • Is there a network partition (primary is alive but unreachable)?

👉 If the primary does not respond within a certain timeout (say 10s), it's marked as failed.
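
The timeout rule above can be sketched in a few lines. This is a simplified stand-in for what tools like Patroni or Orchestrator do internally, not their actual code: `is_reachable` and `FailureDetector` are illustrative names, and a plain TCP connect is only the crudest form of health check. The 10-second default mirrors the example timeout mentioned above.

```python
import socket
import time


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailureDetector:
    """Mark the primary as failed once it has been silent for failure_timeout seconds."""

    def __init__(self, check, failure_timeout: float = 10.0, clock=time.monotonic):
        self.check = check                  # callable returning True when healthy
        self.failure_timeout = failure_timeout
        self.clock = clock                  # injectable so the logic can be tested
        self.last_ok = clock()

    def poll(self) -> bool:
        """Run one health check; return True if the primary counts as failed."""
        now = self.clock()
        if self.check():
            self.last_ok = now
            return False
        return now - self.last_ok >= self.failure_timeout
```

A monitor would call something like `FailureDetector(lambda: is_reachable("db-primary", 5432)).poll()` in a loop, where the host name is a placeholder. Waiting for the full timeout instead of reacting to a single missed check avoids triggering a failover on a momentary network blip.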


2. Leader Election

  • In a cluster, you don't want two primaries (split-brain issue ⚠️).
  • A consensus system like etcd, Consul, or ZooKeeper is used.
  • Cluster members vote:

    • "Primary is dead, we need a new one."
    • They agree on which replica is most up-to-date.

👉 The most up-to-date replica (the one with the least replication lag) becomes the new primary.
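
Here is a rough sketch of the selection rule, under some loud assumptions. Real clusters delegate the voting to the consensus store; this toy function only captures two ideas: a majority of the cluster must be reachable (to avoid split-brain), and the most up-to-date replica wins. `ReplicaStatus`, `replayed_position`, and `elect_new_primary` are hypothetical names, and `cluster_size` is assumed to count the failed primary too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaStatus:
    name: str
    replayed_position: int  # hypothetical log position; higher means more up to date
    reachable: bool = True


def elect_new_primary(replicas: list, cluster_size: int):
    """Pick the most up-to-date reachable replica, or None without a majority."""
    candidates = [r for r in replicas if r.reachable]
    # Majority rule: a minority partition must not promote anyone.
    if len(candidates) <= cluster_size // 2:
        return None
    # Highest position wins; the name breaks ties so every voter picks the same node.
    return max(candidates, key=lambda r: (r.replayed_position, r.name))
```

In a three-node cluster (one primary, two replicas), both replicas must be reachable for a promotion to happen. If only one can be seen, the function returns `None`, because that lone node cannot tell whether it is the one that got cut off.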


3. Failover / Promotion

  • The chosen replica:

    • Promotes itself to Primary.
    • Stops being read-only.
    • Starts accepting writes.

👉 In Patroni (PostgreSQL), this is done with pg_ctl promote.
👉 Orchestrator (MySQL) issues RESET SLAVE ALL (RESET REPLICA ALL in MySQL 8.0.22 and later) on the promoted replica and reconfigures replication.


4. Reconfiguration

  • Other replicas now start replicating from the new Primary.
  • The cluster updates routing so apps know where to send writes.
  • Tools like HAProxy, ProxySQL, or PgBouncer help by pointing apps to the current Primary.
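
The bookkeeping behind this step can be sketched as a small topology map. This is only a model of the state change, assuming a single primary with replicas attached directly to it; the `Topology` class and its methods are invented for this example and do not correspond to any tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    primary: str
    # replica name -> name of the node it replicates from
    replicas: dict = field(default_factory=dict)

    def fail_over(self, new_primary: str) -> str:
        """Promote new_primary, repoint the other replicas, return the old primary."""
        if new_primary not in self.replicas:
            raise ValueError(f"{new_primary} is not a known replica")
        old_primary = self.primary
        del self.replicas[new_primary]         # it no longer replicates from anyone
        self.primary = new_primary
        for name in self.replicas:
            self.replicas[name] = new_primary  # follow the new primary
        return old_primary

    def write_endpoint(self) -> str:
        """What a proxy would hand to applications for writes."""
        return self.primary
```

Using the banking scenario from earlier: starting from `Topology("mumbai", {"delhi": "mumbai", "london": "mumbai"})`, calling `fail_over("delhi")` leaves Delhi as the primary, London replicating from Delhi, and `write_endpoint()` returning `"delhi"`.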

5. Application Transparency

  • Applications don't have to know which DB is Primary.
  • They just connect to a load balancer / proxy.
  • If failover happens, the proxy redirects traffic to the new Primary.

👉 This keeps downtime for the app to a brief interruption while the proxy switches over.
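
Even with a proxy in front, connections that are open at the moment of failover usually break, so the application side typically needs a retry. Below is a minimal sketch of retry with exponential backoff. `run_with_retry` is a made-up helper, and it assumes the database driver surfaces a dropped connection as Python's built-in `ConnectionError`; real drivers raise their own exception types.

```python
import time


def run_with_retry(operation, attempts: int = 5, base_delay: float = 0.2, sleep=time.sleep):
    """Call operation(); on ConnectionError, back off and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 0.2s, 0.4s, 0.8s, ...
```

One caution: blindly retrying a write is only safe if the operation is idempotent. If the first attempt actually committed before the connection dropped, a retry would apply it twice.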


🔹 Example with Tools

  • PostgreSQL + Patroni

    • Patroni uses etcd/Consul for cluster state.
    • Monitors DB health.
    • On failure → promotes replica → updates routing.
  • MySQL + Orchestrator

    • Continuously monitors replication topology.
    • Detects primary failure.
    • Promotes the best replica automatically.
    • Updates HAProxy/ProxySQL.
  • MongoDB Replica Set

    • Built-in heartbeat detection.
    • Automatic election (no external tool needed).
    • One secondary becomes new primary.

✅ Summary in Simple Words
Failover tools are like a traffic cop:

  1. Check if the main road (primary DB) is open.
  2. If closed, find the best alternative road (replica).
  3. Redirect all cars (applications) automatically.
