Franck Pachot for YugabyteDB

Posted on Jan 2, 2022 • Edited on Jan 11, 2022

SQL to avoid data corruption in race conditions with SERIALIZABLE 🐘 🚀

#postgres #yugabytedb #sql #database

He I start a series on how to implement an API Rate Limiter with SQL. This post is a preliminary short example about the Serializable isolation level which is a must for this use case that really starts on the next post. It is a remake of my previous post Oracle serializable is not serializable to show a basic example of SERIALIZABLE in PostgreSQL and YugabyteDB.

SQL databases, with their ACID properties, provide a huge advantage when scaling read-write workloads: they can guarantee the consistency of data even in race conditions. Simply put, without extensive code review and dedicated concurrency tests, your multi-user application will corrupt data one day. The probability is low, but will be hard to detect it, and probably too late. But SQL strong consistency solves this. SQL transaction isolation is the way to ensure that what you validated with unit tests is still consistent in race conditions. I started a poll about this (I must mention that the majority of my followers love SQL):

The SERIALIZABLE isolation level ensures that** reads and writes happen logically at the same time**, the time of commit. The DB verifies that the state that was read is still the same when writing. If it changed, the transaction fails. With the lower levels, like READ COMMITTED, the writes still happen as of commit times, but the reads can come from a previous state (the start of the transaction or statement).

PostgreSQL READ COMMITED

Here is a simple test. Create the following table in a lab:

drop table if exists franck;
create table franck(v int);
insert into franck values (42);

Connect from two terminals and start a transaction in each with the following code:

begin transaction;
--default-- set transaction isolation level read committed;
select sum(v) from franck;
\gset

This reads the sum of values into the sum variable. Your mission is to enter a compensation row, to set the total to zero (by inserting - sum). You do that on both terminals where you initiated the transaction:

insert into franck values(-:sum);
commit;
select * from franck;

This simulates two users having received the same instructions (read all values and write a compensation record) but without coordinating with each-others:

At the end you displayed the values:

The compensation happened two times. Because the read and writes do not occur at the same time: session 2 has read the sum before the session 1 wrote the new record, but session 2 writes after that, when the state that was read has been updated by session 1.

This was in PostgreSQL where the default is READ COMMITTED (be careful with that - see READ COMMITTED anomalies in PostgreSQL ) but we can set the SERIALIZABLE isolation level to have the database ensuring that reads and writes happen on the same state.

PostgreSQL SERIALIZABLE

I run the same:

drop table if exists franck;
create table franck(v int);
insert into franck values (42);

But set the SERIALIZABLE isolation level:

begin transaction;
set transaction isolation level serializable;
select sum(v) from franck;
\gset

Once read in all sessions, here are the writes:

insert into franck values(-:sum);
commit;
select * from franck;

The first one succeeds, but the second one detects the conflict and fails:

This happened because there is no possible order where each transaction read and write can happen at the same time. Of course, the reads cannot be postponed to the commit time and happened before. Then, the database ensures that what was read did not change so that the state is the same at read and commit time. If it changed, the application should catch the exception and retry the whole transaction:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.

Oracle Serializable is not Serializable

I mentioned that this doesn't work in Oracle Database even when setting SERIALIZABLE because this is a wrong naming for a lower isolation level. A possible reason is an old mistake in the standard. You can test it:

exec begin execute immediate 'drop table franck'; exception when others then null; end;
create table franck(v int);
insert into franck values (42);
commit;

Run this on two sessions:

set transaction isolation level serializable;
column sum_v new_value sum;
select sum(v) sum_v from franck;

Then this on those sessions:

insert into franck values(-&sum);
commit;
select * from franck;

Data is corrupt, like in READ COMMITED:

The only possibility in Oracle to avoid this undetected conflict is by locking the table in SHARE mode. But this doesn't scale.

YugabyteDB SERIALIZABLE

Back to PostgreSQL behavior, where SERIALIZABLE allows scaling without data corruption, let's add scale-out for performance and high availability. YugabyteDB is a distributed SQL database, with postgres compatibility (re-using the PostgreSQL query layer), where you can connect to any node, in read and write, with full SQL features and ACID properties, even if those nodes are geo-distributed into different regions.

I run exactly the same as with PostgreSQL:

drop table if exists franck;
create table franck(v int);
insert into franck values (42);

In two sessions (which can be connected to different nodes):

begin transaction;
set transaction isolation level serializable;
select sum(v) from franck;
\gset

Then continuing those two transactions:

insert into franck values(-:sum);
commit;
select * from franck;

Same behavior than PostgreSQL but different message (because different implementation of the transaction layer, which is distributed here):

Here the first session has detected the conflict earlier. It could have been the second one, this is random because they use the same yb_transaction_priority_lower_bound and yb_transaction_priority_upper_bound but you can set non-overlapping bounds if you want to.

The message ERROR: Operation failed. Try again: d1bbba13-dfab-423b-b577-6baf39eb9054 Conflicts with higher priority transaction: f25928b4-df3b-435e-94b4-fd118174c97a explicitly mentions that the application should retry. Optimistic locking is used when the probability of conflict is low, and better handle retries rather than locking which would not be scalable. From the application, this error can be detected with SQLSTATE 40001

} catch(SQLException e) {
 if ( "40001".equals(e.getSQLState()) ) { // transaction conflict
  System.out.printf(
   Instant.now().toString()+" SQLSTATE %s on retry #%d %s\n",e.getSQLState(),retries,e 
 );

Some retries can be handled automatically by the driver, but only when the driver knows that there are no application side effects. Using atomic commands like single statement, plpgsql DO blocks, or stored procedures help. The above catch(SQLException e) and "40001".equals(e.getSQLState() can use a retry counter, in a loop that waits a few milliseconds.

This is the first post of a series on "Rate Limit" because I'll detail a common use-case where read-write consistency is needed.

Top comments (2)

Rafael Ponte • Jan 7 '22 • Edited

Very nice article, Franck. Thanks for sharing it!

One question, for your example, just using REAPEATBLE_READ on PostgreSQL would be sufficient, wouldn't it? I mean, or this issue is related to that we are doing an INSERT of a new row instead of an UPDATE in an existent row?

Franck Pachot YugabyteDB • Jan 7 '22

Thanks Rafael. Yes I think an update of a single row is safe in REPEATABLE READ because it reads (as of the start of query), then locks, then read again (the current version) and writes it. If no concurrent sessions have changed a column used in the where clause between the two reads, it is consistent (i.e the same as the read and writes occured at the same time). But if you read and write in different statement, or even in same update but different rows, you may write on a version different than the one that was read. SERIALIZABLE ensures that the read state didn't change until the write so that if the transaction completes read and writes logically occurs at the same time, as of the commit time