After a long session with my study group, I am collecting the major points here.
Notes from discussion:
-
NoSQL Landscape
-
key-value store:
Regular: memcached, Redis, Flare, Amazon SimpleDB, Keyspace...
Eventual consistency: Amazon Dynamo
-
column-oriented stores:
Google Bigtable, HBase, Cassandra, Hypertable, QBase
-
Document Databases:
CouchDB, MongoDB, ...
-
Object Datastores:
Amazon S3
-
Graph Databases:
Neo4j, VertexDB, Filament ...
-
-
From the point of view of data-intensive applications:
Consistency:
- All clients have the same view of the data at the same time
- Note: actually ACID transactions in a DB
Partition-tolerance:
- System continues to operate despite network partitions.
- No single point of failure
- Infinite scale out
Availability:
- Each client can always read and write
- Total redundancy
-
What does the traditional CAP theorem say?
A distributed system can satisfy any two of these guarantees at the same time, but not all three.
1) There is some risk of data becoming unavailable. [MongoDB, HBase, Neo4j, Google Bigtable] - CP
2) A network problem might stop the system. E.g.: RDBMS, Oracle DB, MySQL, ... - CA
3) Clients may read inconsistent data. [Like eventual consistency in Cassandra] - AP
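To make the CP vs AP trade-off concrete, here is a minimal sketch of a two-replica store during a network partition. `TwoNodeStore`, its `mode` flag, and the `partitioned` switch are all hypothetical names invented for this illustration; no real database works exactly like this.

```python
class TwoNodeStore:
    """Hypothetical two-replica store, used only to illustrate CP vs AP."""

    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP" (assumed labels)
        self.partitioned = False    # True when the replicas cannot reach each other
        self.a = "v0"               # value held by replica a
        self.b = "v0"               # value held by replica b

    def write(self, value):
        """Write through replica a; return True if the write was accepted."""
        if not self.partitioned:
            self.a = value
            self.b = value          # replication succeeds
            return True
        if self.mode == "CP":
            return False            # refuse the write: consistent but unavailable
        self.a = value              # AP: accept locally, replica b is now stale
        return True

    def read_from_b(self):
        return self.b


cp = TwoNodeStore("CP")
cp.partitioned = True
print(cp.write("v1"))      # False: the CP store gives up availability

ap = TwoNodeStore("AP")
ap.partitioned = True
print(ap.write("v1"))      # True: the AP store stays available
print(ap.read_from_b())    # v0: but a client on replica b reads stale data
```

The CP store maps to point 1 above (data becomes unavailable), and the AP store maps to point 3 (a client may read inconsistent data).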
BUT !!
Can we really compromise on partition tolerance?
The future of databases is primarily based on distribution. (Big Data trends ..)
A proper understanding of CAP theorem is essential to making decisions about the future of distributed database design. Misunderstanding can lead to inappropriate design choices.
The ACID (Atomicity, Consistency, Isolation, Durability) properties of relational databases are powerful, and we would like to have them in our database. Unfortunately, it is impossible to have availability, consistency, and partition tolerance all at once. ... Well, really? 🤔
A popular misconception is the "2 out of 3" rule.
An important observation is that in larger distributed-scale systems, network partitions are a given; therefore, consistency and availability cannot both be achieved at the same time.
- 12 years later: Prof. Eric Brewer, father of the CAP theorem
The "2 of 3" formulation was always misleading because it tended to oversimplify the tensions among properties. It's not a 0 or 1 game.
CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare.
Availability and latency are inversely related: the higher the latency, the lower the effective availability, and vice versa.
Proposition: Make partition tolerance a must-have and relax the rigidity around consistency and availability.
Consistency
-
Types of Consistency
-
Strong Consistency
After an update completes, any subsequent access will return the updated value.
-
Weak Consistency
It is not guaranteed that subsequent accesses will return the updated value.
-
Eventual Consistency
Specific form of weak consistency. 0 → 1
It is guaranteed that if no new updates are made to the object, eventually all accesses will return the last updated value (e.g. updates are propagated to replicas in a lazy fashion).
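The lazy propagation idea can be sketched roughly as follows. This is a toy, in-memory model: `LazyReplicatedValue`, `propagate_one`, and the pending queue are assumptions made for illustration, not how any particular database implements replication.

```python
class LazyReplicatedValue:
    """Toy model of one object replicated lazily across several replicas."""

    def __init__(self, initial, replica_count=3):
        self.replicas = [initial] * replica_count
        self.pending = []   # (replica index, value) updates not yet applied

    def write(self, value):
        # The write lands on replica 0 immediately...
        self.replicas[0] = value
        # ...and is only queued for the other replicas.
        for i in range(1, len(self.replicas)):
            self.pending.append((i, value))

    def read(self, replica_index):
        return self.replicas[replica_index]

    def propagate_one(self):
        """Apply the oldest queued update, if any (the 'lazy' step)."""
        if self.pending:
            i, value = self.pending.pop(0)
            self.replicas[i] = value

    def converged(self):
        return len(set(self.replicas)) == 1


x = LazyReplicatedValue("old")
x.write("new")
print(x.read(0), x.read(2))   # new old: replica 2 is still stale
while x.pending:
    x.propagate_one()
print(x.converged())          # True: no new updates, so all replicas agree
```

A read from replica 2 right after the write is the weak-consistency window; once no new updates arrive and the queue drains, every access returns the last updated value.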
-
Eventual Consistency:
- Tweet feed
- Lots of users are refreshing to see the latest tweets. It is okay to show them stale data for a while, since eventually all of them will see the latest tweets.
- Offline cash withdrawal allowed from an ATM (with a limit)
- Bank transactions need to be highly consistent, but in this case the ACID transaction only has to be performed against the single node you are talking to (for that instant).
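The ATM example can be sketched like this. It is only a rough model under assumptions: `OfflineATM`, `offline_limit`, and `sync` are hypothetical names, and real banking systems reconcile in far more careful ways.

```python
class OfflineATM:
    """Toy ATM that keeps serving withdrawals while cut off from the bank."""

    def __init__(self, offline_limit):
        self.offline_limit = offline_limit   # max total per account while offline
        self.unsynced = []                   # (account, amount) not yet sent to the bank

    def withdraw(self, account, amount):
        """Accept the withdrawal locally if it stays within the offline limit."""
        already = sum(a for acc, a in self.unsynced if acc == account)
        if amount <= 0 or already + amount > self.offline_limit:
            return False
        self.unsynced.append((account, amount))
        return True

    def sync(self, bank_balances):
        """Once the network is back, apply the local withdrawals at the bank."""
        for account, amount in self.unsynced:
            bank_balances[account] -= amount
        self.unsynced.clear()


atm = OfflineATM(offline_limit=200)
print(atm.withdraw("alice", 150))   # True: within the offline limit
print(atm.withdraw("alice", 100))   # False: would exceed the limit

balances = {"alice": 100}
atm.sync(balances)
print(balances["alice"])            # -50: an overdraft, but bounded by the limit
```

The point of the limit is that the ATM stays available during the partition while the inconsistency it can cause (here, an overdraft) has a known upper bound.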
Dynamic Consistency:
- Airlines Booking
Partition
- Data partitioning
- shopping cart data, product data, user data, etc
- Operational partitioning
- customer query, billing desk
- Services/Functional partitioning
- Microservices, DNS, distributed locks
- User Partitioning
- regions / availability zones, geographic areas
Availability
-
High availability = lower latency
- By replicating data
Unavailable things have extremely high latency
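One way to read the latency point: from a client's side, a replica that does not answer within the timeout is indistinguishable from one that is down. Below is a minimal sketch with simulated latencies (no real network calls); `read_with_timeout` and its parameters are made-up names for illustration.

```python
def read_with_timeout(replica_latencies_ms, timeout_ms):
    """Return the index of the first replica that answers in time, else None.

    A replica that is down is modelled as having infinite latency.
    """
    for index, latency in enumerate(replica_latencies_ms):
        if latency <= timeout_ms:
            return index
    return None


DOWN = float("inf")

# One replica, and it is down: the data is unavailable.
print(read_with_timeout([DOWN], timeout_ms=100))           # None

# Same failure, but the data is replicated: the read still succeeds.
print(read_with_timeout([DOWN, 40, 250], timeout_ms=100))  # 1
```

With replication, a slow or dead replica only costs the client a fallback to another copy instead of a failed read.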
-
link for blog post:
https://www.allthingsdistributed.com/2008/12/eventually_consistent.html
-
Resources
💡 Notes
Last Reviewed : 30 Oct, 7am