CAP Theorem Explained: What I Learned Setting Up MySQL Replication

Gauransh Gupta — Fri, 08 May 2026 22:18:16 +0000

I recently started my journey into system design and data-intensive applications. This experience has completely changed how I look at software — especially around the hard tradeoffs of consistency, availability, and what happens when things go wrong at scale.

Let me start with a scenario that surprised me when I first encountered it:

Imagine you just inserted a record in your app. A millisecond later, a read on a different server returns the old data. How? Welcome to distributed systems.

What is CAP Theorem?

CAP stands for Consistency, Availability, and Partition Tolerance. The theorem states that a distributed system can only guarantee 2 out of these 3 properties at any given time.

Consistency — Every read returns the most recent write, or an error. All nodes in the system see the same data at the same time.

Availability — Every request gets a response (not an error), even if some nodes are down. The system is always up.

Partition Tolerance — The system keeps running even when network communication between nodes is lost or delayed.

The Real Choice: CP vs AP

Here is the most important thing to understand: you cannot actually drop Partition Tolerance.

Network partitions are not a design choice — they are a reality. Hardware fails, packets drop, data centers lose connectivity. Any system that runs across multiple machines will eventually face a partition.

So the real decision every distributed system makes is: when a partition happens, do you stay consistent or stay available?

A Simple Analogy: The Bank Branch

Imagine a bank with its HQ in New York and a branch in Boston. The branch always calls HQ before confirming any transaction.

A customer in Boston deposits $500. Right at that moment, the network between Boston and New York goes down.

The Boston branch now has two options:

Option 1 — CP (Consistency + Partition Tolerance):
The branch refuses to process any transactions until it can reach HQ. The data stays consistent, but the service is unavailable during the outage. If you walk up to the counter, you get turned away.

Option 2 — AP (Availability + Partition Tolerance):
The branch processes the deposit locally and syncs with HQ once the network recovers. The service stays available, but if someone checks the account balance at the New York branch before the sync, they see the old amount — temporarily inconsistent data.

Neither option is wrong. The right choice depends entirely on what your application can tolerate.

Hands-On: MySQL Replication as an AP System

To make this concrete, I built a small master-slave replication setup using Docker to see these tradeoffs in action.

You can find the full setup in this repository.

How it works:

One MySQL container acts as the master — it accepts all writes
A second container acts as the slave — it monitors the master and replays every change
The master records every write to a binary log (binlog)
The slave runs two threads: an IO thread that pulls the binlog from the master, and a SQL thread that replays those events locally

What I observed:

When I inserted rows on the master, they showed up on the slave within milliseconds under normal conditions. But when I inserted 10,000 rows rapidly, I could watch the Seconds_Behind_Master value in SHOW SLAVE STATUS grow and then shrink back to zero as the slave caught up.

That gap — replication lag — is the tradeoff in action. During those seconds, a read on the slave returns stale data. The slave is always available to serve reads, but it cannot guarantee it has the latest write.

This makes the setup an AP system: Available and Partition Tolerant, but not strongly Consistent.

If you wanted CP behaviour, you would switch to synchronous replication — the master waits for the slave to confirm every write before acknowledging it to the client. You get consistency, but every write now pays the latency cost of that round-trip.

When to Choose CP vs AP

Prefer CP when a wrong answer is worse than no answer:

Bank balances and financial transactions
Seat reservation systems (cinema, flights)
Distributed locks and coordination

Prefer AP when brief staleness is acceptable:

Social media feeds
Product catalogs and shopping carts
DNS resolution
Caching layers

Beyond CAP: Consistency is a Spectrum

CAP theorem is a useful mental model, but it is a simplification. In practice, consistency is not binary — there is a whole spectrum:

Linearizability — the strongest guarantee; reads always reflect the latest write
Sequential consistency — operations appear in the same order across all nodes, but may lag
Eventual consistency — all nodes will converge to the same value, given enough time (this is what MySQL async replication gives you)

If you want to go deeper, Designing Data-Intensive Applications by Martin Kleppmann covers replication in Chapter 5 and consistency models in Chapter 9. It is genuinely the best resource I have found on this topic.

Conclusion

CAP is not a theorem you apply once and move on — it is a lens for thinking about tradeoffs. Every distributed system you design makes a CAP choice, whether you make it explicitly or not. Understanding that choice, and being able to justify it, is what separates a system that works in production from one that only works in theory.

DEV Community: Gauransh Gupta