Back when I was studying Relational Database, I found that database transaction is a very interesting topic. But things soon get complicated when we need to deal with concurrent transactions. Recently I have had a chance to reread my notes about this subject. Therefore, I decided to write a couple of posts to review the knowledge. It will be a series focusing on isolation levels.
So let's start by defining what is a database transaction.
Briefly speaking, transaction is an independent unit of work. It usually consists of a list of statements that attempt to read and write data. Let's take an example, with MySQL, it will be a list of SELECTs, UPDATEs, INSERTs, DELETEs, and so on.
A classic example of a database transaction is to simulate a bank transaction. Suppose Tom wants to transfer $50 from his account to Mary's. We can demonstrate such a transaction in SQL as follows:
START TRANSACTION; UPDATE account SET balance = balance - 50 WHERE owner = 'Tom'; UPDATE account SET balance = balance + 50 WHERE owner = 'Mary'; COMMIT;
This is a very simplified transaction. In fact, there can also be a SELECT statement to read Tom's balance before making the transfer to guarantee that he has enough money to do so. The statements inside a transaction can also access and modify the data at different scales.
Anyway, the database management system (DBMS) should (and they tried to) guarantee that such transactions are ACID compliant. And by ACID compliance, I mean that it should satisfy 4 properties: Atomicity, Consistency, Isolation, and Durability.
In the scope of this overview blog post, I shall try to briefly explain those four concepts.
A transaction is called atomic when it never occurs partially. When Tom transfers $50 to Mary, either both the UPDATE statements above are executed, or none of them is executed at all. There cannot be the case when Tom's account got subtracted $50, but Mary's account does not change.
Remember that a transaction is not guaranteed to be successfully executed. If it fails due to some reason, it must undo all the previous changes (rollback) before exiting. Anything between the BEGIN and COMMIT statements should be atomic.
It's easier to define the consistency of the data. When all the necessary constraints (e.g. uniqueness, range of values, etc.) of our data are satisfied. Specifically, in the beginning, when our database is empty, it can be considered as a consistent state.
Imagine we have a special camera that can capture the data at that time of the database (and let's call it a snapshot instead of a photo). Then, someone begins a transaction, making some operations on our data, and commits it. After that, we take another snapshot. The consistency property simply means that if the initial snapshot is consistent, then the snapshot after the transaction is executed is also consistent.
Up to this point, we can view transactions as ways to carry our data from a consistent state to another consistent state.
The level of isolation measures how much concurrent transactions can interfere with each other. Ideally, two isolated transactions that are running concurrently should not affect each other. However, as we see, this is often not the case.
We know that whenever we have concurrency, we need to deal with race conditions. To a certain extent, DBMS also suffers from race conditions between transactions. Those issues are called read phenomena, and different isolation levels will try to resolve a couple of these phenomena.
This is also the focus of this series. In the next posts, we will cover 4 isolation levels, namely, READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE.
What does it mean durable? It means "last for a long time". Data stored in RAM is not durable, and data stored in the hard disk is more durable. Most DBMSs try to guarantee that successful transactions will survive permanently, and by survive I mean its effects on the data will never be lost, despite system crashes. This is often implemented by certain log and recovery mechanisms.
ACID, in some way, is not separated from each other. They are related and support each other. For example, one of the situations we might encounter with the SERIALIZABLE isolation level is deadlock. And when a deadlock occurs, we expect that the transaction is rolled back properly.
That's it. We will cover dirty read and two first isolation levels next time.
Top comments (0)