System Design (6 Part Series)
CAP theorem stands for Consistency, Availability, and Partition tolerance. CAP theorem also know as Brewer's theorem states that it is impossible for any distributed database system to provide all three of the following properties together, that is to say, a distributed database can only provide almost any two of the following three characteristics: Consistency, Availability, and *Partition-Tolerance. * With the advances in parallel processing and distributed systems, it is more common to expand horizontally or have more machines, and the CAP theorem is the backbone of such architecture. Let's understand the characteristic of the CAP theorem in detail.
A consistent system is one in which all nodes see the same data at the same time. In other words, if we perform read operations after multiple write operations then a consistent system should return the same value for all the read operations and that too the value of the most recent write operation. Note that consistency, as defined in the CAP theorem, is quite different from the consistency guaranteed in ACID database transactions.
A highly available distributed system is one that remains operational 100% of the time. Hence, every request made should be accepted and receives a (non-error) response. Note it should not necessarly that the response contains the most recent write value (i.e. the system needs not to be consistent but it should be available all the time).
It states that a system should continue to run even if the connection between nodes delays or breaks. Note, this doesn't mean a node/nodes goes down, nodes are up, but can’t communicate. Let's say that we a two nodes N1 & N2 and both are connected. Now assume that the network connecting both the nodes goes down (network gets partitioned), but both nodes N1 and N2 are up and running fine but the updates happening at node N1 can no longer reach node N2 and vice-versa Partition tolerance is more of a necessity than an option in modern distributed systems, hence we can't we cannot avoid “P” of CAP. So. we have to choose one among consistency & availability either CP or AP systems.
Data is consistent among all the nodes and the nodes maintain partition tolerance. hence the de-sync node will not accept any request i.e. some request will be dropped instead of returning inconsistent (most recent) information. Let's say we have two nodes and one of the nodes (let's say N2) stops servicing read/write requests on detecting that network connecting the system got partitioned. This means that system treats node N2 as not available anymore, and any request on this node will be rejected as it will end up returning the old/stale copy of the data. Hence, this kind of system provides consistency & Partition Tolerance.
Example - Google BigTable, Hbase, MongoDB, MemcacheDB, Redis
The distributed system is highly available and the system is partition tolerant, hence every read request will not guarantee the most recent information, but every request will be processed.
Example - Voldemort, SimpleDB, CouchDB
This is not possible in any distributed architecture. It can be found only in some non-scalable monolithic architecture.
Google Docs & GMail enforces consistency at the cost of availability hence the result you will see across the devices will be same/consistent.
But Google Search, twitter & youtube focuses on availability and relaxes consistency hence, the results you see will depend on the state of the server that responds to your request, and different servers can have inconsistent states. This is why, sometimes on youtube or twitter, you see inconsistent results in terms of likes or views.
This post was originally published at nlogn. I highly recommend you visit the website.
Posted on May 19 by: