Locking Mechanism, Serializable, and Deadlock

#database #beginners

In the post about repeatable read isolation level, I've mentioned the serialization anomaly problem. Despite having consistent read results, the writing operations can introduce unexpected behaviors or even inconsistent data states after our transaction got committed.

The last transaction isolation level is called SERIALIZABLE, and it can be used to protect our transactions from being interfered with by others when running concurrently. But before going through this isolation level, I'd like to describe the approach that MySQL uses to eliminate the interference of UPDATE statements into our read results.

Locking mechanism

First, we might need to go through the basics of shared lock and exclusive lock. Those two locks are respectively similar to read lock and write lock in the famous readers-writers problem.

Simply put, if a transaction is holding a shared lock on a row, other transactions can only acquire the shared lock on that row, but not the exclusive lock. Meanwhile, if a transaction is holding an exclusive lock on a row, then no other transactions can acquire any kind of lock on that row.

Let's continue with the familiar example of Tom's balance in the account table.

> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| REPEATABLE-READ         |
+-------------------------+
1 row in set (0.00 sec)

> describe account;
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| owner   | varchar(50) | NO   | PRI | NULL    |       |
| balance | int         | YES  |     | NULL    |       |
+---------+-------------+------+-----+---------+-------+
2 rows in set (0.03 sec)

> select * from account;
+-------+---------+
| owner | balance |
+-------+---------+
| Mary  |     160 |
| Tom   |     300 |
+-------+---------+
2 rows in set (0.00 sec)

We are in the REPEATABLE READ isolation level. For now, let's just stick to one terminal, and start the transaction that adds $50 to Mary's balance. However, do not commit this transaction at the moment, because we are going to see what happened after we update that row.

> begin;
Query OK, 0 rows affected (0.01 sec)

> update account set balance = balance + 50 where owner = 'Mary';
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Now, let's make a SELECT statement to query what rows have been locked by our engine.

> select * from performance_schema.data_locks\G
*************************** 1. row ***************************
               ...
          OBJECT_NAME: account
               ...
            LOCK_TYPE: TABLE
            LOCK_MODE: IX
          LOCK_STATUS: GRANTED
            LOCK_DATA: NULL
*************************** 2. row ***************************
               ...
          OBJECT_NAME: account
               ...
            LOCK_TYPE: RECORD
            LOCK_MODE: X,REC_NOT_GAP
          LOCK_STATUS: GRANTED
            LOCK_DATA: 'Mary'
2 rows in set (0.00 sec)

I've added the suffix "\G" to the query to make the result is rendered in vertical format. Ignoring some irrelevant fields, I want you to notice that there are currently two locks that are active in our database.

The first lock is the TABLE lock, and it is of mode IX (which stands for intention exclusive). It is used to indicate that our account table will require a record exclusive lock. And that exclusive lock is also the second entry in the retrieved results. It is of mode X (which stands for exclusive), and the level of granularity is RECORD.

Let's commit the transaction and check the performance_schema table again.

> commit;
Query OK, 0 rows affected (0.01 sec)

> SELECT * from performance_schema.data_locks\G
Empty set (0.00 sec)

The locks are now released. From this, we can have a sense that whenever we make an update (or a write) on the data, our transaction will acquire an exclusive lock on the rows that we updated. And the locks are only released when we commit (or rollback) our transaction.

What happened if another transaction also wants to acquire the exclusive lock on that row (say, by trying to subtract $60 from Mary's balance)? The answer is: that transaction will be pending, either until timeout or the exclusive lock is released by our transaction. Feel free to verify my statement.

"But wait," you might say. "We've tried to read the row that got updated in the examples of dirty read, non-repeatable read, and phantom read. However, you said that shared lock is similar to the concept of read lock. Why we can still acquire the shared lock to read the exclusive locked records?"

Well, in fact, our transaction cannot acquire the shared lock on the records that have been locked before by the exclusive lock. What happened is the SELECT statement does not apply the shared lock (by default) to the records in the result. Our transaction only acquires the shared lock when our SELECT statement ends with two words: FOR SHARE.

Let's open two transactions on two separate connections to test what I've just said.

:: TERMINAL A
> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where owner = 'Tom' for share;
+-------+---------+
| owner | balance |
+-------+---------+
| Tom   |     300 |
+-------+---------+
1 row in set (0.00 sec)

> select * from performance_schema.data_locks\G
*************************** 1. row ***************************
               ...
          OBJECT_NAME: account
               ...
            LOCK_TYPE: TABLE
            LOCK_MODE: IS
          LOCK_STATUS: GRANTED
            LOCK_DATA: NULL
*************************** 2. row ***************************
               ...
          OBJECT_NAME: account
               ...
            LOCK_TYPE: RECORD
            LOCK_MODE: S,REC_NOT_GAP
          LOCK_STATUS: GRANTED
            LOCK_DATA: 'Tom'
2 rows in set (0.01 sec)

IS and S stand for intention shared and shared, respectively.

Now, we can try updating Tom's balance in another transaction and see if we can still get the result immediately.

:: TERMINAL B
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> update account set balance = balance + 50 where owner = 'Tom';
...

It's pending. You can rollback (or commit) transaction A and check to see if the update got executed immediately.

There are other modes of lock (say, REC_NOT_GAP). But having a grasp of the X lock and S lock is enough to talk about the last isolation level.

Serializable

For your convenience, I will repeat the anomaly scenario we encountered in the previous post:

I want to subtract $50 from Tom's account if he has enough money
I open a transaction, run a SELECT statement, and realize that he has $100
At this very moment, someone opens a transaction, check and see that Tom has $100, then he subtracts $60 from Tom's balance and commits immediately
Since I've checked that Tom got $100, I proceed to subtract $50 from this account
Now, his on-disk balance value is -$50, which is an inconsistent state

The main cause of this issue is that someone can still update Tom's account after I've read it. Imagine if my SELECT statement is replaced by the SELECT FOR SHARE statement, what would happen?

After step 2, Tom's row got locked (by a shared lock hold by my transaction). Then, as long as my transaction is still in progress, I can be sure that no one can update Tom's row anymore. And we can get rid of the write interferences.

That's exactly what the SERIALIZABLE isolation level does. It implicitly replaces all SELECT statements by SELECT FOR SHARE, and we are free from serialization anomalies.

Once again, we can try to see if the transaction is SERIALIZABLE level actually acquires shared lock on normal SELECT statements (remember to rollback or commit all previous transactions to make sure that Tom's row got free).

> set session transaction isolation level serializable;
Query OK, 0 rows affected (0.01 sec)

> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| SERIALIZABLE            |
+-------------------------+
1 row in set (0.00 sec)

> begin;
Query OK, 0 rows affected (0.00 sec)

> select * from account where owner = 'Tom';
+-------+---------+
| owner | balance |
+-------+---------+
| Tom   |     150 |
+-------+---------+
1 row in set (0.00 sec)

> select * from performance_schema.data_locks\G
*************************** 1. row ***************************
               ...
          OBJECT_NAME: account
               ...
            LOCK_TYPE: TABLE
            LOCK_MODE: IS
          LOCK_STATUS: GRANTED
            LOCK_DATA: NULL
*************************** 2. row ***************************
               ...
          OBJECT_NAME: account
               ...
            LOCK_TYPE: RECORD
            LOCK_MODE: S,REC_NOT_GAP
          LOCK_STATUS: GRANTED
            LOCK_DATA: 'Tom'
2 rows in set (0.00 sec)

As we can see, our transaction did acquire the shared lock on the SELECT statement. We now get rid of dirty read, non-repeatable read, phantom read, and even serialization anomaly. However, it doesn't mean that SERIALIZABLE is the best choice for all situations.

Deadlock

If you have some experience with locking before in the OS courses, or when trying to make a multithread application, you probably have heard of this issue. Whenever there is a lock, there is a chance that we might encounter deadlocks.

Imagine this scenario:

Transaction A reads Tom's account (acquire the shared lock on Tom's row)
Transaction B reads Tom's account (acquire the shared lock on Tom's row, remember shared lock is shareable)
Transaction A updated Tom's account and got pending, waiting for transaction B (since it has been locked by the shared lock of transaction B)
Transaction B updated Tom's account and got pending, waiting for transaction A (for the same reason)

Two transactions wait for each other. Neither transaction can commit or rollback, we are in a deadlock situation. Luckily, our DBMS is smart enough to detect it in advance and resolve the deadlock. Typically, some transaction needs to roll back to get rid of the locks.

You might ask: "If the SERIALIZABLE transaction is free from dirty read, non-repeatable read, phantom read, and serialization anomaly, then why it's not the default isolation level? Why do people still even use other isolation levels?"

One of the reasons I can think of is the deadlock. Once our transaction got rollback due to deadlock, it should be executed again. Having to re-execute transactions decreases our application performance significantly. Therefore, for some scenarios, where read phenomena are not likely to happen or not cause some serious issues, people can tolerate it and choose a lower isolation level to increase application performance.

It's also the end of the isolation levels series. We went through the basics of the ACID properties, four isolation levels, and how they resolve the read phenomena. I also include some examples in MySQL.

Hope you guys find this helpful and continue to explore other interesting topics of DBMS. You can also lookup the documentation of other DB systems to see how they implement the isolation levels (like PostgreSQL and Oracle). There is a slight variation but the fundamental ideas are the same.