justin gage for Retool

Posted on Mar 25, 2020

Isolation levels and locking in relational databases

#sql #database #architecture

“We need to build this for scale!” yells your manager, in a fury. You think back to a simpler time: when your application only had a few users. When your transactions executed sequentially, and you didn’t need to deal with things like Hot Locks and stalemates. Time to hire a DBA.

Running a database would be easy if your app only had one user, or if your users did one thing after another, never concurrently and never in contention. In reality though, you want to get more people using what you’ve built – your business needs to grow. Increased user volume tends to mean a lot of concurrent transactions, and all those requests bring us into the realm of database locking and isolation, a topic with little publicly available information (until you know what to search for!) and an almost mystic air to it. This post will hopefully make things simpler.

Why database locking and isolation exist

A modern web application might need to handle hundreds of thousands of transactions daily, and the biggest ones deal with orders of magnitude more than that. That kind of scale can create a bunch of problems with data integrity, starting with the big 3 as defined by the SQL standard:

Dirty Reads

If a transaction is in the middle of updating some data and hasn’t committed yet, and another transaction is allowed to read that uncommitted data, that’s called a dirty read, and could lead to your app showing incorrect data that got rolled back.

An example of a dirty read could be a transaction that invalidates login tokens when a user changes their password. If as the first transaction loads the token, a second one reads that token before the first invalidates it, you’d have yourself a dirty read.

In terms of actual SQL, here's what a dirty read might look like:

### Transaction 1 ###

SELECT user_login_token_id
FROM tokens
WHERE ...

UPDATE tokens
SET token_status = "INVALID"
WHERE ...

### Transaction 2 ###

SELECT user_login_token_id
FROM tokens
WHERE ...

If the second SELECT statement was allowed to read the user_login_token_id before the first transaction invalidated it, the read would be dirty: it's reading stale data.

Non-Repeatable Reads

If you’ve got two consecutive reads in one transaction with a concurrent update in between, those reads are going to show different results even though they’re part of the same transaction.

An example might be two writers working on a blog. Our first user starts a transaction that reads a post’s title, writes to the post, and then reads that post’s title again. If a second user changes that post’s title in the middle of the first user’s transaction, the first user is going to see different values for the title across the two reads; or in other words, a non-repeatable read.

Here's what a non-repeatable read might look like in SQL:

### Transaction 1 ###

SELECT post_title
FROM posts

SELECT
    post_title,
    post_content
FROM posts

### Transaction 2 ###

UPDATE posts
SET post_title = "something_new"
WHERE post_title = post_title

Phantom Reads

If a transaction reads data and then a concurrent transaction inserts data that would have been read in the original transaction, that’s a phantom read.

Let’s use the same example as a non-repeatable read: if our second user adds content in between our first user’s two reads, the first read will be missing data that appears in the second read (this is actually really similar to a non-repeatable read, which is why the same example works).

In SQL, a phantom read might look like this:

### Transaction 1 ###

SELECT post_title
FROM posts

SELECT
    post_title,
    post_content
FROM posts

### Transaction 2 ###

INSERT INTO posts
VALUES "something_new", ...

To avoid all of this nastiness, most SQL databases follow a set of principles called ACID that prioritizes transactional integrity. We wrote all about ACID and concurrency in databases here, but the TL;DR is that databases use locks that prevent data from being read or changed while a transaction is making use of it.

How database locking works

So you’ve got your database — let’s say it’s a Postgres database — up and running, and you start a transaction. How does locking actually happen? Let’s take a look at two concurrent transactions and see how locking impacts them:

Transaction 1 (T1) wants to read a user’s password and change it to a new one (SELECT and UPDATE)
Transaction 2 (T2) wants to read that user’s password (SELECT)

Think of the leftmost column as a time axis. When a transaction acquires a lock, other transactions basically won’t be able to interact with the locked data. Here’s how these concurrent transactions will play together:

Time	Transaction 1	Transaction 2
1	Selects row #1, and acquires a lock on row #1
2		Tries to select row #1, but blocked by T1’s lock
3	Updates row #1 and commits transaction
4		Selects row #1

Notice that Transaction 1 is made up of two queries — a SELECT and an UPDATE — and even though the lock is acquired on that first query, it _remains _until the end of the entire transaction when the transaction gets committed.

The most common lock is on the row level, like in our example; but locks are maintained in database memory, so having too many of them active at any point can be computationally prohibitive. This means if your DBMS (database management system) is running out of memory, it might escalate a lock to a higher level to free up space. For example, if one database table has 40 different locks active on 40 different rows, the DBMS might escalate the lock to the table level so it can continue blocking while using less memory.

Locks can exist across larger sections of your data, and even on an entire database. System level updates might acquire a database lock, which would block all transactions on any part of a database. That kind of lock is pretty uncommon, but table level locks are used pretty often (like our escalation example above), and page or file locks (this means different things across different database setups) are also not infrequent.

Levels of database isolation

We’ve been assuming that a lock means that no other transactions can use the locked data at all. But that’s not entirely true; in fact, if it was, your database would basically never work at even low levels of concurrency. Databases deal in different isolation levels to help avoid lock craziness, and there are four (4) major ones:

READ_UNCOMMITTED

This is the lowest level of isolation, and does almost nothing. It means that transactions can read data being worked with by other transactions, even if the changes aren’t committed yet. READ_UNCOMMITTED doesn’t avoid any of the three read problems that we outlined above. Let’s go back to our two transaction example and see what would happen:

Time	Transaction 1	Transaction 2
1	Selects row #1
2		Selects row #1 (even though it’s currently being used by T1)
3	Updates row #1 and commits transaction

Transaction 1 never acquires a lock when it starts, so Transaction 2 is able to slide in there and execute its SELECT query. In this example, things turned out fine: but if T2 did its SELECT query after the update from T1, but T1 got rolled back (aborted), we’d have a dirty read on our hands.

READ_COMMITTED

The next level of isolation is READ_COMMITTED, which adds a little locking into the equation to avoid dirty reads. In READ_COMMITTED, transactions can only read data once writes have been committed. Let’s use our two transactions, but change up the order a bit: T2 is going to read data after T1 has written to it, but then T1 gets rolled back (for some reason).

Time	Transaction 1	Transaction 2
1	Selects row #1
2	Updates row #1, acquires lock
3		Tries to select row #1, but blocked by T1’s lock
4	Rolls back transaction
5		Selects row #1

READ_COMMITTED helps avoid a dirty read here: if T2 was allowed to read row #1 at Time 3, that read would be invalid; T1 ended up getting rolled back, so the data that T2 read was actually wrong. Because of the lock acquired at Time 2 (thanks READ_COMMITTED!), everything works smoothly and T2 waits to execute its SELECT query.

REPEATABLE_READ

The most popular isolation level is REPEATABLE_READ, which (surprise!) helps avoid non-repeatable reads. This iso level locks data from the start of the entire transaction – including on SELECT queries – as opposed to the start of the query, which adds an additional layer of isolation.

If you recall from above, a non-repeatable read happens when one transaction has two SELECT queries, and another transaction is able to insert or update a row that makes those two SELECT queries show different results. REPEATABLE_READ isolation avoids that by letting the first transaction acquire a lock when it starts the first SELECT query:

Time	Transaction 1	Transaction 2
1	Selects row #1, acquires lock on row #1
2		Tries to update row #1, but is blocked by T1’s lock
3	Selects row #1, commits transaction
4		Updates row #1

If you’re really clever, you might notice that REPEATABLE_READ still doesn’t solve for phantom reads: Transaction 1 only acquired a lock on the row it was working with (Row #1), so Transaction 2 is still able to insert a row that might show up in T1’s second SELECT query. The only way to avoid that is with a higher lock level than a row, which is our next level of isolation.

SERIALIZABLE

This is the highest possible level of isolation, and is a bit difficult to define: in practice, SERIALIZABLE means that from the user’s perspective, transactions appear as if they’re executing sequentially, not concurrently. In other words, isolation is so intense that transactions will only execute concurrently if the end result is the same as them executing sequentially.

SERIALIZABLE also uses range locks to avoid phantom reads. Range locks are a mechanism that’s in between locking a row and locking a table: if you’re running a SELECT query with a WHERE clause, range locks will lock some of the rows that exist close to your selected rows (before and after).

Wikipedia has a pretty great summary chart for wrapping up how these isolation levels help avoid the SQL standard read problems:

You can think of isolation levels as lock settings: they determine when locks get acquired and released, and how intense that process is. Most modern SQL databases use REPEATABLE_READ as the default isolation level, but it’s a setting that you can change internally if you need (even on a transaction level).

Problems you’ll run into with database locks

If you’ve been following along, you’ve probably gathered that locks solve a lot of problems, but they also block data access; and that might not always work out well for your application’s user. There are three big ones to worry about:

Lock Contention

If a bunch of user sessions all require access to the same lock concurrently, they’re not going to be able to retrieve the data they need. A good example here is anything that gets incremented on activity, like a total monthly orders count in a metrics table. If every user order needs to increment that row, the constant locking is going to block your user requests. The cool kids also call this one a Hot Lock.

Long Term Blocking

Long term lock issues happen when you have some sort of transaction or job that requires a lock for a long period of time. If you’re running a batch job that needs locking access to a table or row for a while (think: 15 minutes), and that table or row is also heavily used by smaller transactions, they’re going to get stuck waiting while the job finishes.

Database Deadlocks

A deadlock happens when two transactions both need to wait for each other to commit...to commit. This might sound kind of funky, so let’s use our classic two transaction example and see what can go wrong:

Time	Transaction 1	Transaction 2
1	Selects row #1, acquires lock on row #1
2		Selects row #2, acquires lock on row #2
3	Tries to select row #2, blocked by T2’s lock
4		Tries to select row #1, blocked by T1

Your DBMS will usually resolve this stalemate by rolling back whichever transaction is easiest to roll back, but this is also a setting you can tweak when configuring your database. Methods and Tools wrote a great paper about this entire “locking issues” topic and how it impacts QA if you want to go deeper.

If you’re building your app on top of a SQL database, check out Retool! Retool saves you a bunch of time and hassle by letting you drag and drop components like tables, search boxes, and buttons, and connecting them to SQL or Javascript on the backend. Retool also protects you against bad writes (no accidental DROP TABLE;) through a GUI based write interface.

Top comments (1)

Michael Currin • Mar 25 '21

Thanks for the article. Well-explained.

Some recommended reading from Postgres docs.

Transactions tutorial - syntax for BEGIN UPDATE ... COMMIT.
Transaction isolation levels - Dirty reads, etc.
Explicit locking - On table locks, row locks and more.

Markdown correction:

 -it _remains _until the
+ it _remains_ until the

Language tip:

-GUI based
+GUI-based