CAP Theorem

#distributedsystems #webdev #programming

In this short article I will talk about an important topic in computer science, the CAP theorem. It is also known as Brewer's theorem and it's a fundamental principle in distributed data store systems, but first, what's a distributive system?

Distributive System

In a few words, a system is considered to be a distributive system when it has multiple computers operating over a network, called nodes, and for the end-user it appears to be just one after all.

What's CAP Theorem?

The theorem states that it's impossible for a distributed data store to simultaneously provide more than two of the following three pillars: Consistency, Availability, and Partition Tolerance (CAP).

Importance

This is indeed important because modern applications rely on distributed systems to be scalable and resilient, with tolerance to faults. Given that these nodes communicate over a network, failures are inevitable, and the CAP theorem provides trade-offs that we can make when a failure happens.

The Three Pillars Explained

C - Consistency: Guarantees that every read operation receives the most recent write or an error. When data is written to one node it will be replicated to the others, ensuring that whenever a read operation is done in any node it will be consistent.

A - Availability: This means that every request receives a response, even if one or more nodes are failing, but it doens't guarantee that the last write will be returned.

P - Partition Tolerance: This is the ability of the system to continue operating, even when there is a network partition. This pillar is a must have since node communications due to newtork failures can happen at any time, so we need to decide in which way we will bend to when having a partition.

How to act in a network partition

Since network partitions are inevitable in a distributed system, the theorem forces a crucial choice: when a partition happens, do you sacrifice Consistency or Availability? This leads to two main system design patterns.

CP - Consistency & Partition Tolerance

A CP system chooses to prioritize consistency over availability when a partition happens.

If a partition happens, a request to a partitioned node will be blocked or return an error until the network communication is restored and the data can be fully synchronized between the nodes. This ensures that no client ever sees an outdated data and this behavior is chosen in banking systems, financial ledgers, e-commerce inventory management.

AP - Availability & Partition Tolerance

An AP system chooses to prioritize availability over consistency when a partition happens.

Even if nodes can't communicate with each other, every client request will still get a response. This response, however, might contain outdated data since the node handling the request may not have received the latest write from another part of the system. These systems often aim for eventual consistency, where the data across all nodes will be synchronized once the partition is resolved.

This is ideal for systems where uptime and responsiveness are more important than having perfectly up-to-date data at all times, for example social media feeds, content delivery networks (CDNs), and databases like Amazon's DynamoDB or Cassandra, which are designed for high availability. MongoDB can also be configured for AP by allowing reads from secondary nodes when the primary is unavailable.

Conclusion

I hope you enjoyed the quick read and if you have something to add to this topic please leave a comment!

DEV Community