Understanding CAP theorem

After a long session with my study group, I am collecting the major points here.

Notes from discussion:

NoSQL Landscape
- key-value store:
  
  Regular: memcached, Redis, Flare, Amazon SimpleDB, Keyspace, ...
  
  Eventual consistency: Amazon Dynamo
- column-oriented stores:
  
  Google Bigtable, HBase, Cassandra, Hypertable, QBase
- Document Databases:
  
  CouchDB, MongoDB, ...
- Object Datastores:
  
  Amazon S3
- Graph Databases:
  
  Neo4j, VertexDB, Filament ...

Link to the blog post:

https://bardoloi.com/blog/2017/03/06/cap-theorem/#:~:text=In%20other%20words%2C%20the%20CAP,C%20or%20A%20by%20design.&text=there%20can%20be%20several%20levels%20of%20consistency

From the point of view of data-intensive applications:

Consistency:

All clients have the same view of the data at the same time.
Note: this is not the same as consistency in ACID database transactions.

Partition-tolerance:

System continues to operate despite network partitions.
No single point of failure
Infinite scale out

Availability:

Each client can always read and write
Total redundancy

What does the traditional CAP theorem say?

A distributed system can satisfy any two of these guarantees at the same time, but not all three.

1) There is some risk of data becoming unavailable. [MongoDB, HBase, Neo4j, Google Bigtable] - CP

2) A network problem might stop the system. [RDBMS: Oracle DB, MySQL, ...] - CA

3) Clients may read inconsistent data. [e.g. eventual consistency in Cassandra] - AP
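
To make the CP and AP cases concrete, here is a minimal sketch of how each kind of system might react to a write during a network partition. The `TinyCluster` and `Replica` classes are hypothetical toys written for these notes, not the behaviour of any real database listed above.

```python
class Replica:
    """One node holding a single value (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TinyCluster:
    """Two replicas; `partitioned` simulates a broken network link."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: both replicas are updated together.
            replica.value = value
            other.value = value
            return
        if self.mode == "CP":
            # Stay consistent by giving up availability.
            raise RuntimeError("unavailable during partition")
        # AP: stay available, accept that the replicas now diverge.
        replica.value = value

    def read(self, replica):
        return replica.value


cp = TinyCluster("CP")
cp.write(cp.a, "x=1")
cp.partitioned = True
try:
    cp.write(cp.a, "x=2")
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"

ap = TinyCluster("AP")
ap.write(ap.a, "x=1")
ap.partitioned = True
ap.write(ap.a, "x=2")

print(cp_result, cp.read(cp.b))        # rejected x=1
print(ap.read(ap.a), ap.read(ap.b))    # x=2 x=1
```

In the CP run the write is refused and both replicas still agree; in the AP run the write succeeds but a client reading from the other replica sees the old value.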

BUT !!

Can we really compromise on partition tolerance?

The future of databases is primarily based on distribution. (Big Data trends ..)

A proper understanding of CAP theorem is essential to making decisions about the future of distributed database design. Misunderstanding can lead to inappropriate design choices.

The ACID (Atomicity, Consistency, Isolation, Durability) properties of relational databases are powerful, and we would like to have them in our database. But it is unfortunately impossible to have availability, consistency and partition tolerance all at once. ... Well, really? 🤔

The popular misconception is "2 out of 3".

An important observation is that in large-scale distributed systems, network partitions are a given; therefore, consistency and availability cannot both be guaranteed at the same time.

12 years later: Prof. Eric Brewer, father of the CAP theorem

The "2 of 3" formula was always misleading because it tends to oversimplify the tensions among the properties. It's not a 0-or-1 game.

CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare.

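Availability and latency are inversely related: the longer a request takes to answer, the less available the system effectively is, and a request that never gets an answer is simply unavailable.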

Proposition: make partition tolerance a must-have and relax the rigid either-or choice between consistency and availability.

Consistency

Types of Consistency
- Strong Consistency
  
  After an update completes, any subsequent access will return the same updated value.
- Weak Consistency
  
  It is not guaranteed that subsequent accesses will return the updated value.
- Eventual Consistency
  
  Specific form of weak consistency. 0 → 1
  
  It is guaranteed that if no new updates are made to an object, eventually all accesses will return the last updated value (e.g. by propagating updates to replicas in a lazy fashion).
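
The lazy propagation mentioned for eventual consistency can be sketched roughly as follows. `LazyReplicatedStore` is a hypothetical in-memory toy (one primary, a queue of pending updates), assumed here only to show a stale read followed by convergence; real stores use far more involved replication.

```python
from collections import deque


class LazyReplicatedStore:
    """Writes land on replica 0 and reach the others only later."""

    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]
        # Queue of (replica_index, key, value) updates not yet applied.
        self.pending = deque()

    def write(self, key, value):
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        """Apply every queued update in the order it was written."""
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = LazyReplicatedStore()
store.write("tweet", "v1")

stale = store.read("tweet", 2)   # replica 2 has not seen the write yet
store.propagate()                # no new updates arrive; replicas catch up
fresh = [store.read("tweet", i) for i in range(3)]

print(stale)   # None
print(fresh)   # ['v1', 'v1', 'v1']
```

Between `write` and `propagate` the store is only weakly consistent; once updates stop and the queue drains, every replica returns the last written value.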

Eventual Consistency:

Tweet feed
- Lots of users are refreshing the latest tweets, but it's okay to show them stale data for some time, as eventually all of them will be viewing the latest tweets.
Offline money withdrawal allowed from an ATM (with a limit)
- Bank transactions normally need to be highly consistent. But during a partition, the ATM can only perform the transaction against the single node it is talking to (for the time being), so withdrawals are allowed only up to a limit.
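
The ATM case might look something like the sketch below. The `Atm` class, the single per-ATM `offline_limit`, and the reconcile-on-reconnect step are all assumptions made for illustration; they are not taken from how any real bank works.

```python
class Atm:
    """Hypothetical ATM that keeps serving cash while cut off from the bank."""

    def __init__(self, offline_limit):
        self.offline_limit = offline_limit
        self.online = True
        self.offline_total = 0
        self.pending = []  # offline withdrawals to report on reconnect

    def withdraw(self, bank_balances, account, amount):
        if self.online:
            # Normal path: check the real balance at the bank.
            if bank_balances[account] < amount:
                return False
            bank_balances[account] -= amount
            return True
        # Partitioned: the balance is unknown, so bound the risk instead.
        if self.offline_total + amount > self.offline_limit:
            return False
        self.offline_total += amount
        self.pending.append((account, amount))
        return True

    def reconnect(self, bank_balances):
        """Replay offline withdrawals; a balance may end up negative."""
        self.online = True
        for account, amount in self.pending:
            bank_balances[account] -= amount
        self.pending.clear()
        self.offline_total = 0


balances = {"alice": 100}
atm = Atm(offline_limit=50)
atm.online = False

first = atm.withdraw(balances, "alice", 40)    # within the offline limit
second = atm.withdraw(balances, "alice", 20)   # 40 + 20 exceeds the limit
atm.reconnect(balances)

print(first, second, balances["alice"])   # True False 60
```

The limit is what keeps the inconsistency bounded: the ATM stays available during the partition, and the worst-case mismatch with the bank is capped at `offline_limit`.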

Dynamic Consistency: