Mervyn Lee

Posted on Dec 20, 2019 • Edited on May 2, 2020

Understanding SQL Server Isolation Levels through example

#sql

After reading a lot helpful contents in dev.to, I decided to start contributing while learning and start writing! I am inspired to write my first post when I was reading a topic regarding Isolation Levels in High Performance MySQL by Baron Schwartz. It looks confusing, therefore I decided to mess around in an actual SQL server and justify my understanding. So here it is, my very first blog post. I will explain about these isolations along with SQL queries.

I found it very helpful in one of my previous project. We have a table which serves data to client through API endpoint according to different parameters and filters, and new data comes in daily. With this knowledge, I am able to be more confident to handle different scenarios including data import failure halfway, knowing if part of the new data will be exposed to the client during import and so on. I will revisit this project and write a separate article soon.

The SQL standards defines four isolation levels, with specific rules for which changes are and aren't visible inside and outside a transaction. A transaction is a group of SQL queries that are treated atomically, as a single unit of work. If the database engine can apply the entire group of queries to a database, it does so, but if any of them can't be done because of a crash or other reason, none of them is applied.

READ UNCOMMITTED

In this isolation level, transactions can view the result of uncommitted transactions. This level is rarely used in practice as the performance isn't much better than other levels, which have many advantages.

The example require two separate transactions running at the same time. I did this with two terminals logged in at the same time to have two sessions, I named it Session A and Session B. This example is executed in MYSQL 8.

This is how the sample table looks like:

Table and data generation code is at the end of this post.

SELECT * FROM student;
+----+-----------+-----------+------+
| id | firstname | lastname  | cgpa |
+----+-----------+-----------+------+
|  1 | Wilman    | Kala      |  3.5 |
|  2 | Karl      | Jablonski | 2.67 |
|  3 | Matti     | Karttunen | 3.86 |
|  4 | John      | Doe       | 1.93 |
|  5 | Will      | Smith     | 2.25 |
+----+-----------+-----------+------+

-- Session A:
> SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
> START TRANSACTION;
> SELECT * FROM student WHERE firstname = 'Will';

+----+-----------+----------+------+
| id | firstname | lastname | cgpa |
+----+-----------+----------+------+
|  5 | Will      | Smith    | 2.25 |
+----+-----------+----------+------+

-- Session B:
> START TRANSACTION;
> UPDATE student SET CGPA = 3.1 WHERE firstname = 'WILL';

--Session A:
> SELECT * FROM student WHERE firstname = 'Will';
+----+-----------+----------+------+
| id | firstname | lastname | cgpa |
+----+-----------+----------+------+
|  5 | Will      | Smith    |  3.1 |
+----+-----------+----------+------+
> COMMIT;

As you can see, the transaction in Session A is able to see the uncommitted row. This phenomena is called Dirty Read - situation when a transaction reads a data that has not yet been committed. This isolation prevents Dirty Write only. Let's finish the transaction in Session B.

-- Session B:
> COMMIT;

READ COMMITTED

The default isolation level for most database systems (but not MYSQL). A transaction will see only those changes made by transactions that were already committed when it began, and its changes won't be visible to other until it has committed.

--Session A:
> SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
> SELECT * FROM student WHERE firstname = 'Will';
+----+-----------+----------+------+
| id | firstname | lastname | cgpa |
+----+-----------+----------+------+
|  5 | Will      | Smith    |  3.1 |
+----+-----------+----------+------+

--Session B:
> START TRANSACTION;
> UPDATE student SET CGPA = 3.999 WHERE firstname = 'WILL';

--Session A:
> SELECT * FROM student WHERE firstname = 'Will';
+----+-----------+----------+------+
| id | firstname | lastname | cgpa |
+----+-----------+----------+------+
|  5 | Will      | Smith    |  3.1 |
+----+-----------+----------+------+

As you can see, when the transaction in Session B is not committed yet, the transaction in Session A is not able to see the change. Now we go back to Session B to commit the change.

--Session B:
> COMMIT;

--Session A:
SELECT * FROM student WHERE firstname = 'Will';
+----+-----------+----------+-------+
| id | firstname | lastname | cgpa  |
+----+-----------+----------+-------+
|  5 | Will      | Smith    | 3.999 |
+----+-----------+----------+-------+

Great. Now the transaction in Session A only able to see the change only after commit operation is done by transaction in Session B. However, when your transaction only want to perform some update or delete on the data that it selected in previously queries excluding new data that coming in parallel from other transaction, you are in trouble. This level still allows what's know as a Non Repeatable Read - occurs when a transaction reads same row twice, and get a different value each time.

Confuse? Let me show you an example.

--Session A:
> START TRANSACTION;
> SELECT * FROM student;
+----+-----------+-----------+-------+
| id | firstname | lastname  | cgpa  |
+----+-----------+-----------+-------+
|  1 | Wilman    | Kala      |   3.5 |
|  2 | Karl      | Jablonski |  2.67 |
|  3 | Matti     | Karttunen |  3.86 |
|  4 | John      | Doe       |  1.93 |
|  5 | Will      | Smith     | 3.999 |
+----+-----------+-----------+-------+

--Session B:
> START TRANSACTION;
> INSERT INTO student (firstname, lastname, cgpa) VALUES ('Will', 'Smack', 0.98);
> COMMIT;

--Session A:
> SELECT * FROM student;
+----+-----------+-----------+-------+
| id | firstname | lastname  | cgpa  |
+----+-----------+-----------+-------+
|  1 | Wilman    | Kala      |   3.5 |
|  2 | Karl      | Jablonski |  2.67 |
|  3 | Matti     | Karttunen |  3.86 |
|  4 | John      | Doe       |  1.93 |
|  5 | Will      | Smith     | 3.999 |
|  6 | Will      | Smack     |  0.98 |
+----+-----------+-----------+-------+
> COMMIT;

Simply put, when new data is inserted, modified or deleted from commited transaction in Session B, the transaction in Session A is able to see the new change. To solve that, let's move on to another level.

REPEATABLE READ

It solved the problems that READ COMMITTED allows. It guarantees that any row a transaction reads will "look the same" in subsequent reads within the same transaction.

--Session A:
> START TRANSACTION;
> SELECT * FROM student;
+----+-----------+-----------+-------+
| id | firstname | lastname  | cgpa  |
+----+-----------+-----------+-------+
|  1 | Wilman    | Kala      |   3.5 |
|  2 | Karl      | Jablonski |  2.67 |
|  3 | Matti     | Karttunen |  3.86 |
|  4 | John      | Doe       |  1.93 |
|  5 | Will      | Smith     | 3.999 |
|  6 | Will      | Smack     |  0.98 |
+----+-----------+-----------+-------+

--Session B:
> START TRANSACTION;
> INSERT INTO student (firstname, lastname, cgpa) VALUES ('Will', 'Win', 3.414);
> COMMIT;

--Session A:
> SELECT * FROM student;
+----+-----------+-----------+-------+
| id | firstname | lastname  | cgpa  |
+----+-----------+-----------+-------+
|  1 | Wilman    | Kala      |   3.5 |
|  2 | Karl      | Jablonski |  2.67 |
|  3 | Matti     | Karttunen |  3.86 |
|  4 | John      | Doe       |  1.93 |
|  5 | Will      | Smith     | 3.999 |
|  6 | Will      | Smack     |  0.98 |
+----+-----------+-----------+-------+

As you can see, even though the transaction that performs insertion in Session B is committed, the transaction in Session A is still looking at the same data it retrieved previously. To see the change, end the transaction in Session A by commit and start a new transaction to perform the query again.

> COMMIT;
> SELECT * FROM student;
+----+-----------+-----------+-------+
| id | firstname | lastname  | cgpa  |
+----+-----------+-----------+-------+
|  1 | Wilman    | Kala      |   3.5 |
|  2 | Karl      | Jablonski |  2.67 |
|  3 | Matti     | Karttunen |  3.86 |
|  4 | John      | Doe       |  1.93 |
|  5 | Will      | Smith     | 3.999 |
|  6 | Will      | Smack     |  0.98 |
|  7 | Will      | Win       | 3.414 |
+----+-----------+-----------+-------+

Woohoo! Perfect? Sadly, it still allows another tricky problem: phantom reads. Simply put, a phantom read can happen when you select some range of rows using WHERE clause, another transaction inserts a new row into the range, and then you select the same range again; you will then see the new "phantom" row. Let's see the phantom by ourselves!

--Session A:
> START TRANSACTION;
> SELECT * FROM student WHERE firstname='Wilman';
+----+-----------+----------+------+
| id | firstname | lastname | cgpa |
+----+-----------+----------+------+
|  1 | Wilman    | Kala     |  3.5 |
+----+-----------+----------+------+

--Session B:
> START TRANSACTION;
> INSERT INTO student (firstname, lastname, cgpa) VALUES ('Wilman', 'Jackson', 3.98);
> COMMIT;

SELECT * FROM student WHERE firstname='Wilman';
+----+-----------+----------+------+
| id | firstname | lastname | cgpa |
+----+-----------+----------+------+
|  1 | Wilman    | Kala     |  3.5 |
|  8 | Wilman    | Jackson  | 3.98 |
> COMMIT;

So, the rule of thumb - A SELECT statement without WHERE clause will able to be REPEATABLE READ in this isolation level, but not when WHERE clause is around. To solve this, we will need to move to the 4th level.

SERIALIZABLE

The highest level of isolation, SERIALIZABLE, solves the phantom read problem by forcing transactions to be ordered so that they can't possibly conflict. In short, it places a lock on every row it reads. At this level, a lot of timeouts and lock contention can occur. It trades off concurrency in favour of the data stability that results. Here is an example when your transaction will wait for another transaction...

--Session A:
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> START TRANSACTION;
> SELECT * FROM student WHERE firstname='Wilman';
+----+-----------+----------+------+
| id | firstname | lastname | cgpa |
+----+-----------+----------+------+
|  1 | Wilman    | Kala     |  3.5 |
| 10 | Wilman    | Jackson  | 3.98 |
+----+-----------+----------+------+

--Session B:
> START TRANSACTION;
> INSERT INTO student (firstname, lastname, cgpa) VALUES ('Wilman', 'Mars', 2.777);

After you fire the INSERT command, it will just stuck there...
Don't worry. Go back to our Session A and commit the transaction.

--Session A:
> COMMIT;

After the completion of transaction in Session A, Session B is finally able to complete INSERT operation.

I hope you enjoy the hands on example! Table below summaries the various isolation levels and the drawbacks associated with each one.

Isolation Level	Dirty Reads possible	Non Repeatable Reads possible	Phantom Reads possible	Locking reads
READ UNCOMMITTED	Yes	Yes	Yes	Yes
READ COMMITTED	No	Yes	Yes	Yes
REPEATABLE READ	No	No	Yes	Yes
SERIALIZABLE	No	No	No	Yes

Table and data generation code

I use Docker to run SQL server in a container using image in the Docker Hub. Skip this step if you don't feel to use Docker.

> docker run --name mysql -e MYSQL_ROOT_PASSWORD=your_password -d mysql:tag
> docker inspect mysql #Identify the IP address
> docker run -it --rm mysql mysql -h172.17.0.2 -uroot -p

CREATE TABLE student (
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
firstname VARCHAR(30) NOT NULL,
lastname VARCHAR(30) NOT NULL,
cgpa FLOAT(50)
);

INSERT INTO student (firstname, lastname, cgpa) VALUES ('Wilman', 'Kala', 3.5);
INSERT INTO student (firstname, lastname, cgpa) VALUES ('Karl', 'Jablonski', 2.67);
INSERT INTO student (firstname, lastname, cgpa) VALUES ('Matti', 'Karttunen', 3.86);
INSERT INTO student (firstname, lastname, cgpa) VALUES ('John', 'Doe', 1.93);
INSERT INTO student (firstname, lastname, cgpa) VALUES ('Will', 'Smith', 2.25);