CAP Theorem in Distributed Systems: Beyond the ‘Pick Two’ Myth

#systemdesign #distributedsystems #architecture #softwareengineering

When most of us first encounter the CAP theorem, it’s introduced with a catchy phrase: “Consistency, Availability, Partition Tolerance: pick two.”

It’s a neat way to remember it, but like most neat sayings, it oversimplifies reality. In truth, CAP is about understanding trade-offs in distributed systems, and appreciating its nuances makes you a better engineer.
In this post, we’ll go beyond the surface-level explanation and see how it applies to the systems we work with, using a relatable example: bank ATMs.

The ATM Analogy

An ATM network is a real-world example of a distributed system. Here:

The bank’s central server acts as the source of truth.
Each ATM machine is a node in the system.

Together, they form a distributed system where your account balance should remain consistent across all ATMs. Now, imagine you walk up to an ATM to deposit or withdraw cash.

In the ideal case, the ATM successfully communicates with the bank’s server.
Your transaction goes through, and the update is reflected across all ATMs in the network.

This is the no-partition scenario: the nodes (ATMs) are fully connected to the server, so consistency and availability are both naturally maintained.
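To make the happy path concrete, here is a minimal Python sketch. `BankServer` and `ATM` are hypothetical in-memory classes (there is no real network here); each ATM simply forwards every request to the one central server, so every ATM reads the same balance.

```python
# Hypothetical in-memory model: no real networking, just method calls.

class BankServer:
    """Central source of truth: one balance per account."""

    def __init__(self):
        self.balances = {}

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount
        return self.balances[account]

    def withdraw(self, account, amount):
        balance = self.balances.get(account, 0)
        if amount > balance:
            raise ValueError("insufficient funds")
        self.balances[account] = balance - amount
        return self.balances[account]


class ATM:
    """A node that forwards every request to the central server."""

    def __init__(self, server):
        self.server = server

    def deposit(self, account, amount):
        return self.server.deposit(account, amount)

    def withdraw(self, account, amount):
        return self.server.withdraw(account, amount)

    def balance(self, account):
        return self.server.balances.get(account, 0)


server = BankServer()
atm_a, atm_b = ATM(server), ATM(server)
atm_a.deposit("alice", 100)
atm_b.withdraw("alice", 30)
print(atm_a.balance("alice"), atm_b.balance("alice"))  # 70 70
```

Because there is only one copy of the balance and every ATM can reach it, there is nothing to get out of sync.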

What is CAP Theorem?

The CAP theorem is a fundamental principle in distributed systems that tells us about the trade-offs we must make when our system is partitioned. To understand it properly, we need to start with the most important piece:

What is a Partition?

Imagine you have multiple servers that should stay in sync. If a network failure prevents one server from talking to another, the system is said to be "partitioned".

This is not a rare edge case. In large-scale distributed systems, partitions are inevitable due to:

Network failures
Geographic distance
Latency spikes
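Why does a latency spike count? A node cannot tell a slow link from a broken one; it only knows whether a reply arrived in time. Here is a minimal sketch, assuming a hypothetical `call_server` function that simulates latency with `time.sleep` and an arbitrary 0.2-second timeout:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_server(latency_s):
    """Hypothetical remote call: sleep to simulate network latency, then reply."""
    time.sleep(latency_s)
    return "ok"


def reachable(latency_s, timeout_s=0.2):
    """Return True only if the reply arrives within timeout_s.

    The caller never learns why a reply is missing: a slow link and a
    broken one look identical, so both are treated as a partition.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(call_server, latency_s)
        try:
            return future.result(timeout=timeout_s) == "ok"
        except FutureTimeout:
            return False


print(reachable(0.01))   # True: normal latency
print(reachable(0.5))    # False: the spike exceeds the timeout
```

Picking the timeout is itself a trade-off: too short and ordinary slowness gets treated as a partition, too long and users wait before the system reacts.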

What does this mean in ATM terms?

A partition happens when an ATM loses connection to the bank’s central server.

The ATM can no longer verify your latest account balance in real time.
Any deposits or withdrawals made at this ATM won’t immediately sync with the rest of the system.

Effectively, this ATM is "cut off" from the rest of the distributed network.

Partition Tolerance means the system continues to operate even when network failures partition it.

Since partitions are unavoidable, a distributed system has to be partition tolerant. So the real-world trade-off is between Consistency (C) and Availability (A), and we are forced to make it only while a partition is in progress.

CP - Consistency & Partition Tolerance

In a CP system, the design choice is to guarantee Consistency and Partition Tolerance even if it means sacrificing Availability during a partition.

ATM Analogy for CP

Imagine you walk up to an ATM during a partition:

The ATM tries to connect to the central bank server, but the connection fails (partition), so the ATM refuses your transaction. Here, the ATM deliberately compromises "Availability" for "Consistency."

Why? Because letting you withdraw without verifying your account risks double withdrawals or invalid balances.
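Here is a minimal sketch of that CP behavior. `CPATM` and `PartitionError` are hypothetical names, a plain dict stands in for the central server’s balances, and a `connected` flag simulates the network link:

```python
# Hypothetical in-memory model: a dict stands in for the central server and
# the `connected` flag simulates the network link.

class PartitionError(Exception):
    """Raised when the ATM cannot reach the central server."""


class CPATM:
    """CP choice: refuse service rather than risk an inconsistent balance."""

    def __init__(self, server_balances):
        self.server_balances = server_balances
        self.connected = True

    def withdraw(self, account, amount):
        if not self.connected:
            # Partition: the balance can't be verified, so refuse (give up A, keep C).
            raise PartitionError("cannot reach bank server, please try later")
        balance = self.server_balances.get(account, 0)
        if amount > balance:
            raise ValueError("insufficient funds")
        self.server_balances[account] = balance - amount
        return self.server_balances[account]


balances = {"alice": 100}
atm = CPATM(balances)
atm.withdraw("alice", 30)      # connected: succeeds, balance is now 70

atm.connected = False          # simulate a partition
try:
    atm.withdraw("alice", 30)
except PartitionError as err:
    print("refused:", err)
print(balances["alice"])       # 70: the refused withdrawal changed nothing
```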

AP - Availability & Partition Tolerance

In an AP system, the design guarantees Availability and Partition Tolerance even if that means sacrificing Consistency during a partition.

ATM Analogy for AP

Imagine you walk up to an ATM during a partition:

The ATM tries to connect to the central bank server but the connection fails (partition). Instead of refusing your transaction, the ATM still lets you withdraw cash. Here, the ATM deliberately compromises "Consistency" in order to preserve "Availability".

Why? Because the system chooses to stay operational, even if your withdrawal is not immediately reflected in your account balance. That risks temporary inconsistencies, such as a negative balance if the same funds are also withdrawn at another ATM.
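The AP version of the same idea accepts the withdrawal offline and replays it when the link comes back. The sketch below is again a hypothetical in-memory model (`APATM` and its methods are invented for illustration) showing how two withdrawals made on opposite sides of a partition can drive the balance negative:

```python
# Hypothetical in-memory model: a shared dict stands in for the central server
# and `connected` simulates each ATM's network link.

class APATM:
    """AP choice: keep serving during a partition and reconcile later."""

    def __init__(self, server_balances):
        self.server_balances = server_balances
        self.connected = True
        self.pending = []  # withdrawals accepted while partitioned

    def withdraw(self, account, amount):
        if not self.connected:
            # Partition: the real balance can't be checked, so accept the
            # withdrawal locally and queue it (keep A, give up C).
            self.pending.append((account, amount))
            return
        balance = self.server_balances.get(account, 0)
        if amount > balance:
            raise ValueError("insufficient funds")
        self.server_balances[account] = balance - amount

    def reconnect(self):
        """Partition healed: replay queued withdrawals against the server."""
        self.connected = True
        for account, amount in self.pending:
            # The cash is already dispensed, so the replay cannot be refused.
            self.server_balances[account] = self.server_balances.get(account, 0) - amount
        self.pending.clear()


balances = {"alice": 100}
atm_a, atm_b = APATM(balances), APATM(balances)

atm_a.connected = False        # atm_a is cut off from the server
atm_a.withdraw("alice", 80)    # accepted offline and queued
atm_b.withdraw("alice", 80)    # atm_b is connected: 100 -> 20
atm_a.reconnect()              # replay the queued withdrawal: 20 -> -60
print(balances["alice"])       # -60
```

Each ATM made a locally sensible decision; the inconsistency only shows up once the partition heals and the two histories are combined.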

Beyond the Simplistic “Pick Two” View

The CAP theorem is often oversimplified into the claim that a system can guarantee only two of the three properties: Consistency, Availability, and Partition Tolerance.

Real systems don’t just fall neatly into CP or AP categories. They often sit somewhere in between, offering "partial consistency" or "partial availability" depending on the situation.

What is Partial Consistency?

Consistency isn’t all or nothing: data isn’t either perfectly up to date or completely wrong. Many systems aim for eventual consistency, where data may be temporarily stale but converges across nodes once the partition is resolved.
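Here is a minimal sketch of how that convergence can work, assuming each update carries a unique ID (the `op_id` strings and the `Replica` class are hypothetical) and that nodes swap their update logs once they can talk again. Because balance changes are additions, the order in which a node learns about them doesn’t matter:

```python
# Hypothetical model: each update carries a unique ID, and nodes exchange
# their logs when they can talk to each other again.

class Replica:
    def __init__(self, opening_balance):
        self.opening_balance = opening_balance
        self.log = {}  # op_id -> signed amount (deposits > 0, withdrawals < 0)

    def apply(self, op_id, delta):
        self.log[op_id] = delta

    def merge(self, other):
        # Union of the two logs. Keying by op_id makes repeated merges harmless,
        # and because balance changes are additions, arrival order doesn't matter.
        self.log.update(other.log)

    @property
    def balance(self):
        return self.opening_balance + sum(self.log.values())


a, b = Replica(100), Replica(100)

# During the partition each node accepts a different update.
a.apply("dep-1", +50)
b.apply("wd-1", -20)
print(a.balance, b.balance)    # 150 80: the nodes temporarily disagree

# The partition heals and the nodes exchange logs.
a.merge(b)
b.merge(a)
print(a.balance, b.balance)    # 130 130: converged
```

Note that convergence alone doesn’t protect invariants: if both updates had been large withdrawals, the replicas would still agree, possibly on a negative balance.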

What is Partial Availability?

Availability isn’t just about a system being fully online or completely offline. In many cases, systems aim for partial availability — they keep some features working while restricting others during a partition. This allows users to get at least something instead of facing a total outage.

Back to our ATM analogy:

Even if the ATM is partitioned, it might still allow deposits. After a deposit, your balance may look incorrect for a short while, but it eventually gets updated once the connection is restored. In this case, the ATM is partially available because it continues to provide some functionality. At the same time, it is also partially consistent because it deliberately restricts withdrawals to avoid issues like negative balances.
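A minimal sketch of this partially available ATM, using the same kind of hypothetical in-memory model (`PartiallyAvailableATM` is an invented name): deposits are queued while offline and applied on reconnect, and withdrawals are refused until then.

```python
# Hypothetical in-memory model: a dict stands in for the central server and
# `connected` simulates the network link.

class PartiallyAvailableATM:
    """During a partition: accept deposits (queued), refuse withdrawals."""

    def __init__(self, server_balances):
        self.server_balances = server_balances
        self.connected = True
        self.queued_deposits = []

    def deposit(self, account, amount):
        if self.connected:
            self.server_balances[account] = self.server_balances.get(account, 0) + amount
        else:
            # A deposit can only raise the balance, so accepting it offline
            # can never push the account negative.
            self.queued_deposits.append((account, amount))
        return "accepted"

    def withdraw(self, account, amount):
        if not self.connected:
            # A withdrawal could overdraw the account; refuse it while partitioned.
            return "refused: offline"
        balance = self.server_balances.get(account, 0)
        if amount > balance:
            return "refused: insufficient funds"
        self.server_balances[account] = balance - amount
        return "accepted"

    def reconnect(self):
        """Partition healed: apply the queued deposits to the server."""
        self.connected = True
        for account, amount in self.queued_deposits:
            self.server_balances[account] = self.server_balances.get(account, 0) + amount
        self.queued_deposits.clear()


balances = {"alice": 100}
atm = PartiallyAvailableATM(balances)
atm.connected = False
print(atm.deposit("alice", 50))    # accepted
print(atm.withdraw("alice", 20))   # refused: offline
print(balances["alice"])           # 100: the deposit is not visible yet
atm.reconnect()
print(balances["alice"])           # 150: converged after the partition heals
```

The split is chosen by asking which operations can break the invariant we care about (no negative balances): deposits can’t, so they stay available; withdrawals can, so they wait.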

Conclusion

The CAP theorem isn’t about picking two properties and discarding the third. It’s about understanding how distributed systems behave under failure and making deliberate trade-offs. In practice, systems rarely offer absolute consistency or absolute availability. Instead, they combine the two: tolerating temporary inconsistency to preserve availability, while restricting risky operations during partitions to maintain safety.