Discussion on: Isolation Levels - part IX: Read Committed

View post

Thanks for this series, Franck.

One question on those two statements:

"when Oracle or YugabyteDB encounters such a situation, it can roll back the statement to an implicit savepoint and restart it to ensure a consistent result based on a more recent read time"
"In such cases, if a row has been modified since it was last read, PostgreSQL will re-read the row to avoid corrupting it"

To be honest, I did not understand the differences between them. What's the difference between "rollback+restart" and "re-read"? They seem the same thing to me.

Franck Pachot • Dec 27 '23

The difference is re-read the row at a newer time (but do not discard what the statement has already read before) vs rollback all to re-run the statement so that the result is consistent

Rafael Ponte • Dec 27 '23

Thanks for the explanation, Franck.

Sorry, it is still confusing to me.

Do you have any articles with something more practical so I can understand them better?

Franck Pachot • Jan 3 • Edited

Let's take an example of "re-read" and inconsistent result in PostgreSQL:

postgres=# create table demo as 
 select n, 2*(n%2)-1 flag from generate_series(1,6) n;
SELECT 6

postgres=# begin transaction;
BEGIN

postgres=*# select * from demo;

 n | flag
---+------
 1 |    1
 2 |   -1
 3 |    1
 4 |   -1
 5 |    1
 6 |   -1
(6 rows)

postgres=*# -- flip the flag for rows 3,4,5
postgres=*# update demo 
 set flag=flag-100 where n in(3);
UPDATE 1
postgres=*# update demo 
 set flag=flag+100 where n in(4,5);
UPDATE 2

postgres=*# select * from demo;
 n | flag
---+------
 1 |    1
 2 |   -1
 3 |  -99
 4 |   99
 5 |  101
 6 |   -1
(6 rows)

Do not commit this transaction yet, and in a second session run:

postgres=# select * from demo 
 where flag>0 for update;

It waits on the first session. Then commit the first session and then the SELECT FOR UPDATE shows:

n | flag
---+------
 1 |    1
 5 |  101
(2 rows)

Here is what happened in PostgreSQL:

row 1 is read as of before the session 1 commit and verifies flag>0, then is selected
row 2 is read as of before the session 1 commit and does not verify flag>0, then is discarded
row 3 is read as of before the session 1 commit and verifies flag>0, then is selected but, because an ongoing mutation is detected, is re-read as of after the session 1 commit and the mutated value does not verify flag>0, then is discarded
row 4 is read as of before the session 1 commit and does not verify flag>0, then is discarded, even if the mutated value would verify the predicate
row 5 is read as of before the session 1 commit and verifies flag>0, then is selected but, because an ongoing mutation is detected, is re-read as of after the session 1 commit and the mutated value still verifies flag>0, then is selected, and shows the new value
row 6 is read as of before the session 1 commit and does not verify flag>0, then is discarded

The result is inconsistent, with rows 3 and 5 seeing the concurrent update but row 4 ignoring it. It is not isolated from the concurrent updates, breaking the I in ACID.

This is PostgreSQL behavior, with "re-read", and results in an inconsistent result, not using a single database state to evaluate the predicate. To read as-of a single database state, the result should be either (1),(3),(5) if evaluated before the other session commits, or (1),(4),(5) if evaluated after it.

Due to the use of FOR UPDATE, (1),(3),(5) is not usable as, being a stale state, it cannot be updated.

The current state that can be locked and updated is (1),(4),(5). However, as the conflict is detected during execution, the statement must be rolled back to an implicit savepoint taken just before it and restarted to execute the whole statement on the new state. I think PostgreSQL doesn't implement that because savepoints are not scalable (see PostgreSQL Subtransactions Considered Harmful). To avoid this anomaly you can use higher isolation level, get a serializable error, and implement the restart yourself.

The same code run on YugabyteDB, which is PostgreSQL-compatible but implements "rollback+restart", returns a consistent result:

yugabyte=# select * from demo where flag>0 for update;

 n | flag
---+------
 5 |  101
 1 |    1
 4 |   99
(3 rows)

Each row evaluated its predicate and result on the state after the other session was committed, ensuring a consistent and updatable result.

Franck Pachot • Jan 3 • Edited

This is described in a PostgreSQL code README

Rafael Ponte • Jan 3

Thanks @franckpachot !

Now, the difference is more apparent. It was very didact, you're amazing! 👊🏻