DEV Community

Discussion on: Isolation Levels - part IX: Read Committed

Collapse
 
rponte profile image
Rafael Ponte

Thanks for this series, Franck.

One question on those two statements:

  1. "when Oracle or YugabyteDB encounters such a situation, it can roll back the statement to an implicit savepoint and restart it to ensure a consistent result based on a more recent read time"

  2. "In such cases, if a row has been modified since it was last read, PostgreSQL will re-read the row to avoid corrupting it"

To be honest, I did not understand the differences between them. What's the difference between "rollback+restart" and "re-read"? They seem the same thing to me.

Collapse
 
franckpachot profile image
Franck Pachot

The difference is re-read the row at a newer time (but do not discard what the statement has already read before) vs rollback all to re-run the statement so that the result is consistent

Collapse
 
rponte profile image
Rafael Ponte

Thanks for the explanation, Franck.

Sorry, it is still confusing to me.

Do you have any articles with something more practical so I can understand them better?

Thread Thread
 
franckpachot profile image
Franck Pachot • Edited

Let's take an example of "re-read" and inconsistent result in PostgreSQL:

postgres=# create table demo as 
 select n, 2*(n%2)-1 flag from generate_series(1,6) n;
SELECT 6

postgres=# begin transaction;
BEGIN

postgres=*# select * from demo;

 n | flag
---+------
 1 |    1
 2 |   -1
 3 |    1
 4 |   -1
 5 |    1
 6 |   -1
(6 rows)

postgres=*# -- flip the flag for rows 3,4,5
postgres=*# update demo 
 set flag=flag-100 where n in(3);
UPDATE 1
postgres=*# update demo 
 set flag=flag+100 where n in(4,5);
UPDATE 2

postgres=*# select * from demo;
 n | flag
---+------
 1 |    1
 2 |   -1
 3 |  -99
 4 |   99
 5 |  101
 6 |   -1
(6 rows)
Enter fullscreen mode Exit fullscreen mode

Do not commit this transaction yet, and in a second session run:

postgres=# select * from demo 
 where flag>0 for update;
Enter fullscreen mode Exit fullscreen mode

It waits on the first session. Then commit the first session and then the SELECT FOR UPDATE shows:

n | flag
---+------
 1 |    1
 5 |  101
(2 rows)
Enter fullscreen mode Exit fullscreen mode

Here is what happened in PostgreSQL:

  • row 1 is read as of before the session 1 commit and verifies flag>0, then is selected
  • row 2 is read as of before the session 1 commit and does not verify flag>0, then is discarded
  • row 3 is read as of before the session 1 commit and verifies flag>0, then is selected but, because an ongoing mutation is detected, is re-read as of after the session 1 commit and the mutated value does not verify flag>0, then is discarded
  • row 4 is read as of before the session 1 commit and does not verify flag>0, then is discarded, even if the mutated value would verify the predicate
  • row 5 is read as of before the session 1 commit and verifies flag>0, then is selected but, because an ongoing mutation is detected, is re-read as of after the session 1 commit and the mutated value still verifies flag>0, then is selected, and shows the new value
  • row 6 is read as of before the session 1 commit and does not verify flag>0, then is discarded

The result is inconsistent, with rows 3 and 5 seeing the concurrent update but row 4 ignoring it. It is not isolated from the concurrent updates, breaking the I in ACID.

This is PostgreSQL behavior, with "re-read", and results in an inconsistent result, not using a single database state to evaluate the predicate. To read as-of a single database state, the result should be either (1),(3),(5) if evaluated before the other session commits, or (1),(4),(5) if evaluated after it.

Due to the use of FOR UPDATE, (1),(3),(5) is not usable as, being a stale state, it cannot be updated.

The current state that can be locked and updated is (1),(4),(5). However, as the conflict is detected during execution, the statement must be rolled back to an implicit savepoint taken just before it and restarted to execute the whole statement on the new state. I think PostgreSQL doesn't implement that because savepoints are not scalable (see PostgreSQL Subtransactions Considered Harmful). To avoid this anomaly you can use higher isolation level, get a serializable error, and implement the restart yourself.

The same code run on YugabyteDB, which is PostgreSQL-compatible but implements "rollback+restart", returns a consistent result:

yugabyte=# select * from demo where flag>0 for update;

 n | flag
---+------
 5 |  101
 1 |    1
 4 |   99
(3 rows)
Enter fullscreen mode Exit fullscreen mode

Each row evaluated its predicate and result on the state after the other session was committed, ensuring a consistent and updatable result.

Thread Thread
 
franckpachot profile image
Franck Pachot • Edited

This is described in a PostgreSQL code README

Thread Thread
 
rponte profile image
Rafael Ponte

Thanks @franckpachot !

Now, the difference is more apparent. It was very didact, you're amazing! 👊🏻