Apologies for not publishing an article over the past two weeks; I was occupied with preparations for my sister's wedding. We will continue releasing articles in this series every Sunday, as usual.
Why is Atomicity Important?
Imagine you are a database administrator, and your banking database does not guarantee that transactions complete atomically; a transaction may fail midway. For example, consider transferring an amount X from account A to account B. The transaction starts, the money is debited from account A, and then the transaction fails due to a system crash. The amount X has simply vanished from the economy. This is not a consistent state at all, and you will soon be fired for choosing a database that does not ensure atomicity.
To handle the above scenario, there must be a rollback mechanism that deposits amount X back into account A.
WAL and Its Limitation
To guarantee atomicity and durability, all changes can be recorded in a Write-Ahead Log (WAL): log records are flushed to disk before the corresponding writes are applied to the datastore. Logically, each WAL record contains the identifier of the data object being changed, the identifier of the transaction making the change, and the old and new values of that object. During a rollback, this information is sufficient to undo the changes of one or several transactions consistently.
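To make the idea concrete, here is a minimal in-memory sketch of undo logging. The record fields (transaction id, object id, old value, new value) follow the description above; a real WAL is a binary, fsync'd on-disk log, and the class and method names here are purely illustrative.

```python
class Store:
    def __init__(self):
        self.data = {}
        self.wal = []  # append-only log, written BEFORE the data change

    def write(self, txn_id, obj_id, new_value):
        old_value = self.data.get(obj_id)
        # Log first: if we crash after this point, recovery can still undo.
        self.wal.append((txn_id, obj_id, old_value, new_value))
        self.data[obj_id] = new_value

    def rollback(self, txn_id):
        # Undo this transaction's changes in reverse order using old values.
        for t, obj_id, old_value, _ in reversed(self.wal):
            if t == txn_id:
                if old_value is None:
                    self.data.pop(obj_id, None)  # object did not exist before
                else:
                    self.data[obj_id] = old_value

store = Store()
store.write("T1", "A", 50)  # debit account A
store.rollback("T1")        # crash before crediting B: undo the debit
```

After the rollback, account A is restored to its state before the transaction, which is exactly the consistency the banking example above requires.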
However, the WAL-based rollback mechanism only guarantees atomicity within a single datastore.
2PC
2PC, or Two-Phase Commit, is a protocol to implement atomic transaction commits across multiple processes.
To implement 2PC, each process acts as either a coordinator or a participant; typically, the process that starts the transaction acts as the coordinator.
To commit a change, the coordinator sends a prepare request to all participants, and each participant replies with whether it is prepared to commit. If any participant is not prepared, the coordinator sends an abort message to all participants; if all participants are prepared, it sends a commit message to all.
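The two phases can be sketched as follows. This is a single-process illustration under assumed `prepare`/`commit`/`abort` interfaces; a real implementation exchanges these as network messages and persists every vote and decision to a log before acting on it.

```python
class Participant:
    def __init__(self, prepared_ok=True):
        self.prepared_ok = prepared_ok  # whether this participant can commit
        self.state = "init"

    def prepare(self):
        self.state = "prepared" if self.prepared_ok else "abort-voted"
        return self.prepared_ok

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1 (prepare): every participant must vote yes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

For example, `two_phase_commit([Participant(), Participant(prepared_ok=False)])` aborts on all participants, while a unanimous yes vote commits on all.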
In this protocol, there are two points of no return. Once a participant replies that it is prepared, it must wait for the coordinator's decision: if the coordinator crashes after sending the prepare messages, the participants are blocked until it recovers, waiting for either a commit or an abort message. Likewise, once the coordinator decides whether to commit or abort, the decision is final: if a participant crashes, the coordinator must keep retrying the message until that participant recovers.
Since 2PC requires two network round trips, it is slow, and it can block in the two failure scenarios just discussed. These are the main drawbacks of 2PC.
2PC can be refined. For example, the coordinator, the participants, or both can be replicated (e.g., using Raft) to reduce the impact of crashes.
NewSQL
In the late 2000s, NoSQL databases emerged, prioritizing scalability over traditional database guarantees like ACID transactions. Over time, distributed data stores evolved to incorporate these traditional features, leading to NewSQL systems. One prominent example is Google Spanner, which combines scalability with strong transactional guarantees.
How Does Spanner Work?
Data Partitioning & Replication:
- Data is stored as key-value pairs and split into partitions, each of which is replicated across multiple nodes using the Paxos protocol for fault tolerance.
- A leader node in each replication group manages writes, locks, and isolation using 2PL (Two-Phase Locking). You can read about 2PL in my previous article - 2PL
Handling Multi-Partition Transactions:
- Spanner uses 2PC (Two-Phase Commit) to handle transactions across partitions.
- Coordinators and participants log transaction states in replicated logs to recover from node failures.
Consistency & Isolation:
- Spanner employs MVCC (Multi-Version Concurrency Control) for consistent snapshots during read-only transactions.
- Write transactions use 2PL to ensure strict serializability.
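A snapshot read under MVCC can be illustrated with a simple version chain per key: a read-only transaction at timestamp `read_ts` sees the latest version committed at or before `read_ts`, without taking any locks. The key name and timestamps below are illustrative, not Spanner's actual storage format.

```python
# Each key maps to a list of (commit_timestamp, value) versions,
# ordered by commit timestamp.
versions = {
    "balance_A": [(10, 100), (20, 80)],  # written at ts 10, then at ts 20
}

def snapshot_read(key, read_ts):
    # Return the latest version committed at or before read_ts, so the
    # read-only transaction observes a consistent snapshot of the data.
    visible = [v for ts, v in versions.get(key, []) if ts <= read_ts]
    return visible[-1] if visible else None
```

A reader at timestamp 15 sees the value 100, while a reader at timestamp 25 sees 80; neither blocks concurrent writers, which append new versions instead of overwriting old ones.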
Timestamp Management:
- Transactions are assigned unique timestamps, but keeping clocks synchronized across a distributed system is a challenge.
- Spanner resolves this by attaching an uncertainty interval to each timestamp and making transactions wait out that interval before committing.
- To minimize uncertainty, Spanner deploys GPS and atomic clocks in data centers for precise synchronization.
Other systems, like CockroachDB, take inspiration from Spanner but use hybrid-logical clocks instead of physical clock synchronization.
In the example figure below, there are three partitions and three replicas per partition.
Conclusion
Atomicity is a cornerstone of reliable database systems, ensuring that transactions complete fully or not at all. Techniques like WAL and protocols like 2PC address atomicity challenges, each with its limitations. Advanced systems like Spanner and other NewSQL databases build on these principles to provide strong transactional guarantees, scalability, and fault tolerance in distributed environments.
Here are links to my previous posts which I publish every Sunday on distributed systems:
- Building Resilient Applications: Insights into Scalability and Distributed Systems
- Understanding Server Connections in Distributed Systems
- How are your connections with web secure and integral?
- Understanding System Models in Distributed system
- Ping & Heartbeat in distributed systems
- How Real-Time Editing Works: Understanding Event Ordering in Distributed Systems
- High Availability for Social Media Platforms: Leader-Follower Architecture and Leader Election
- ACID Properties in Single Transactions Explained
- How is Concurrency Control Maintained to Ensure Isolation?
Feel free to check them out and share your thoughts!