Rajkiran

Posted on May 31

System Design - 6.CAP Theorem & PACELC, CAP Theorem & PACELC: The Most Important Trade-off in Distributed Systems

#architecture #database #distributedsystems #systemdesign

The Theorem That Changed How We Think About Databases
In 2000, Eric Brewer stood at a conference and proposed a conjecture that would reshape distributed systems forever:

"You can only guarantee two of these three properties at the same time: Consistency, Availability, and Partition Tolerance."

Two years later, Seth Gilbert and Nancy Lynch proved it mathematically. It became known as the CAP Theorem — and every distributed system architect since has had to wrestle with it.
It sounds abstract. But once you understand it, you'll never look at a database choice the same way again. You'll understand why Amazon DynamoDB and Google Spanner make opposite architectural choices. You'll know why your bank uses PostgreSQL while Twitter uses Cassandra.
Let's break it down from first principles.

The Three Properties
C — Consistency
Every read receives the most recent write, or an error. There's only one version of the truth — all nodes agree.

Not the same consistency as ACID. CAP consistency (linearizability) means every read reflects the latest write across all nodes. ACID consistency means transactions don't violate database constraints. Different concepts, same confusing word.

A — Availability
Every request receives a non-error response — though it might not be the most recent data. The system is always up and answering.

Note: "Available" in CAP doesn't mean "fast." It means "responds without error." A system that always returns a (possibly stale) answer is Available.

P — Partition Tolerance
The system continues operating even when network messages between nodes are lost or delayed. A partition is when part of your distributed system can't communicate with another part.

The Unavoidable Truth: P Is Not Optional
Here's the insight that makes CAP actually useful:
In any real distributed system, partitions will happen. Networks fail. Cables get cut. Data centers lose connectivity. AWS regions go down.
Since you must tolerate partitions (or have a single-server system, which doesn't scale), the real choice is:
When a partition happens, do you choose:
→ Consistency (refuse to respond rather than return stale data)?
→ Availability (respond with potentially stale data rather than error)?
CAP is really CP vs AP.

CP Systems: Consistency + Partition Tolerance
A CP system chooses to return an error (or block) during a partition rather than return potentially inconsistent data.
The behavior:
Node A and Node B are partitioned (can't communicate).

User writes to Node A: balance = $500
User reads from Node B:
→ CP system: "I can't confirm this is current. Returning error."
→ User gets error, but data is never wrong.
Real CP systems:
HBase — used for large-scale analytics (Facebook messages, LinkedIn activity data). During a partition, HBase refuses reads/writes to maintain consistency. Brief unavailability is acceptable because data correctness is critical.
ZooKeeper — the distributed coordination service used by Kafka, Hadoop, and many others for leader election and configuration management. If ZooKeeper can't reach a quorum of nodes, it stops responding. This is intentional — a misconfigured cluster is worse than a temporarily unavailable one.
MongoDB in strong mode — with writeConcern: majority, MongoDB behaves as a CP system. Writes must be acknowledged by a majority of nodes before succeeding. If nodes can't reach quorum, writes fail.
When to choose CP:

Financial transactions (cannot show wrong balance)
Inventory systems (cannot oversell the last seat)
Distributed locks (cannot have two nodes think they hold the same lock)
Any system where incorrect data is worse than no data

AP Systems: Availability + Partition Tolerance
An AP system chooses to always respond — even if the data might be stale — during a partition.
The behavior:
Node A and Node B are partitioned (can't communicate).

User writes to Node A: tweet posted
User reads from Node B:
→ AP system: "I'll return what I have." Returns feed without latest tweet.
→ User gets a response. Data might be slightly out of date.

Real AP systems:
Cassandra — designed from the ground up for AP. During a partition, Cassandra keeps serving reads and writes from available nodes. Data eventually syncs when the partition heals. Facebook built their Inbox search on Cassandra. Instagram's feed uses it. The brief staleness is acceptable because showing a tweet 2 seconds late is better than a 500 error.
CouchDB — designed for offline-first applications. Phones with intermittent connectivity can still read and write. When connectivity resumes, changes sync and conflicts are resolved.
DynamoDB (default mode) — Amazon's flagship NoSQL service defaults to eventual consistency. Writes succeed as long as one node is available. This is by design — Amazon's shopping cart research showed that availability matters more than momentary consistency for e-commerce.

When to choose AP:

Social media feeds (slightly stale is fine)
Shopping cart contents (show something rather than error)
DNS lookups (stale IP for a short time is OK)
Any "best effort" data where brief staleness is acceptable

The Amazon Shopping Cart Study (A Case Study in AP)
Amazon's 2007 Dynamo paper describes a fascinating design decision:
They observed that when customers added items to their cart, the worst experience was an error ("could not add to cart — try again"). The second worst experience was the cart appearing empty.
Temporary inconsistency — adding an item and briefly not seeing it — was actually less bad than an error from the user's perspective.
So DynamoDB was built as an AP system. When two versions of your cart conflict (you added on desktop while your phone was offline), Dynamo keeps both versions and presents them to the application to merge. That's why you sometimes see duplicate items in your Amazon cart — it's the conflict merge surfacing.

_The insight: _ Availability vs Consistency isn't a technical preference. It's a business decision. What does your user experience most when things go wrong?

PACELC: The Extension CAP Forgot
CAP only describes what happens during a partition. But partitions are rare events. What about the other 99.9% of the time, when the network is healthy?
In 2012, Daniel Abadi proposed PACELC to fill this gap:
If Partition → choose between Availability (A) or Consistency (C)
Else (no partition) → choose between Latency (L) or Consistency (C)

Written as: PAC/ELC
The insight: even without a partition, you face a trade-off. To guarantee consistency without a partition, you still need coordination — nodes must agree before returning. That coordination costs latency.

PACELC classifications of real systems:
Reading the table:

1. PA/EL systems (DynamoDB, Cassandra): sacrifice consistency for availability and speed in all situations
2. PC/EC systems (HBase, Spanner): maintain consistency in all situations, paying in latency

How to "Cheat" CAP with Read Replicas
Here's a popular interview question: "Can you have both strong consistency and high availability?"
Technically no, per CAP. But in practice, you can get very close with a common architecture:
Read replicas with routing logic:
Write path: User → Primary DB (strongly consistent write)
Read path: User → Read from Primary (always consistent)
OR
User → Read from Replica (faster, eventually consistent)
You let the application decide per operation:

Payment confirmation: read from Primary → strong consistency
User profile display: read from Replica → eventual consistency, faster

PostgreSQL, MySQL, and most relational databases support this pattern. The application routes sensitive reads to the primary and non-sensitive reads to replicas.

This isn't "beating" CAP — you're still making the CP choice for critical reads (by always hitting the primary). But for non-critical reads you get near-AP performance. It's a practical hybrid.

Choosing CP or AP for an E-Commerce Cart

This is a classic interview scenario. Here's how to think through it:
The question: "Should the shopping cart be CP or AP?"
Wrong answer: "CP, because we need consistency."
Right answer (think through the user scenarios):

Scenario A: User adds item, gets error due to partition
→ User is frustrated, might leave the site
→ Business cost: potential lost sale

Scenario B: User adds item, briefly doesn't see it (AP staleness)
→ User might be confused but the item is actually saved
→ Business cost: minor confusion

Scenario C: User adds item, sees different items on refresh (inconsistency)
→ More confusing, but still recoverable
→ Business cost: confusion, possible support ticket
Amazon's conclusion: AP wins for cart. The cost of availability failures (errors) is higher than the cost of consistency failures (brief staleness).
But: The checkout step must be CP. When you confirm a purchase and deduct inventory, that must be consistent. You can't sell the last item twice.

The pattern: Use AP for browsing and cart management. Use CP for checkout and payment. Different consistency requirements for different operations within the same system.

Why PACELC Matters More Than CAP for Modern Design
CAP is conceptually important but PACELC is more actionable because:

Partitions are rare. Your system spends 99.9% of its time in the "no partition" state. The ELC part (latency vs consistency) is your everyday trade-off.
CAP doesn't tell you about latency. Two CP systems can have wildly different latencies. Spanner achieves global strong consistency in ~10ms. A naive 2-phase commit might take 500ms. PACELC captures this.
PACELC drives database selection more precisely. "I need PA/EL" → Cassandra or DynamoDB. "I need PC/EC" → Spanner or HBase. CAP alone doesn't give you this resolution.

Key Takeaways

CAP theorem: in a distributed system, you can only guarantee 2 of 3 — Consistency, Availability, Partition Tolerance.
P is mandatory in real distributed systems. The real choice is always CP vs AP.
CP systems (HBase, ZooKeeper, Spanner) refuse to answer during partitions rather than return stale data.
AP systems (Cassandra, DynamoDB, CouchDB) always respond, even if data might be stale.
PACELC extends CAP: even without a partition, you trade latency (L) vs consistency (C). Most systems are PA/EL or PC/EC.
The right choice is a business decision, not a technical preference. What hurts your users more — wrong data or no data?
You can build practical hybrids: AP for browsing, CP for payment. Route reads to primary for strong consistency, to replicas for speed.

DEV Community

System Design - 6.CAP Theorem & PACELC, CAP Theorem & PACELC: The Most Important Trade-off in Distributed Systems

Top comments (0)