I'm a Brazilian software engineer with 19+ years of experience building distributed systems with "boring" techs like Java, Spring, Hibernate, SQL, and relational databases.
"when Oracle or YugabyteDB encounters such a situation, it can roll back the statement to an implicit savepoint and restart it to ensure a consistent result based on a more recent read time"
"In such cases, if a row has been modified since it was last read, PostgreSQL will re-read the row to avoid corrupting it"
To be honest, I did not understand the differences between them. What's the difference between "rollback+restart" and "re-read"? They seem the same thing to me.
The difference is re-read the row at a newer time (but do not discard what the statement has already read before) vs rollback all to re-run the statement so that the result is consistent
I'm a Brazilian software engineer with 19+ years of experience building distributed systems with "boring" techs like Java, Spring, Hibernate, SQL, and relational databases.
Let's take an example of "re-read" and inconsistent result in PostgreSQL:
postgres=#createtabledemoasselectn,2*(n%2)-1flagfromgenerate_series(1,6)n;SELECT6postgres=#begintransaction;BEGINpostgres=*#select*fromdemo;n|flag---+------1|12|-13|14|-15|16|-1(6rows)postgres=*#-- flip the flag for rows 3,4,5postgres=*#updatedemosetflag=flag-100wherenin(3);UPDATE1postgres=*#updatedemosetflag=flag+100wherenin(4,5);UPDATE2postgres=*#select*fromdemo;n|flag---+------1|12|-13|-994|995|1016|-1(6rows)
Do not commit this transaction yet, and in a second session run:
postgres=#select*fromdemowhereflag>0forupdate;
It waits on the first session. Then commit the first session and then the SELECT FOR UPDATE shows:
n|flag---+------1|15|101(2rows)
Here is what happened in PostgreSQL:
row 1 is read as of before the session 1 commit and verifies flag>0, then is selected
row 2 is read as of before the session 1 commit and does not verify flag>0, then is discarded
row 3 is read as of before the session 1 commit and verifies flag>0, then is selected but, because an ongoing mutation is detected, is re-read as of after the session 1 commit and the mutated value does not verify flag>0, then is discarded
row 4 is read as of before the session 1 commit and does not verify flag>0, then is discarded, even if the mutated value would verify the predicate
row 5 is read as of before the session 1 commit and verifies flag>0, then is selected but, because an ongoing mutation is detected, is re-read as of after the session 1 commit and the mutated value still verifies flag>0, then is selected, and shows the new value
row 6 is read as of before the session 1 commit and does not verify flag>0, then is discarded
The result is inconsistent, with rows 3 and 5 seeing the concurrent update but row 4 ignoring it. It is not isolated from the concurrent updates, breaking the I in ACID.
This is PostgreSQL behavior, with "re-read", and results in an inconsistent result, not using a single database state to evaluate the predicate. To read as-of a single database state, the result should be either (1),(3),(5) if evaluated before the other session commits, or (1),(4),(5) if evaluated after it.
Due to the use of FOR UPDATE, (1),(3),(5) is not usable as, being a stale state, it cannot be updated.
The current state that can be locked and updated is (1),(4),(5). However, as the conflict is detected during execution, the statement must be rolled back to an implicit savepoint taken just before it and restarted to execute the whole statement on the new state. I think PostgreSQL doesn't implement that because savepoints are not scalable (see PostgreSQL Subtransactions Considered Harmful). To avoid this anomaly you can use higher isolation level, get a serializable error, and implement the restart yourself.
The same code run on YugabyteDB, which is PostgreSQL-compatible but implements "rollback+restart", returns a consistent result:
I'm a Brazilian software engineer with 19+ years of experience building distributed systems with "boring" techs like Java, Spring, Hibernate, SQL, and relational databases.
Thanks for this series, Franck.
One question on those two statements:
"when Oracle or YugabyteDB encounters such a situation, it can roll back the statement to an implicit savepoint and restart it to ensure a consistent result based on a more recent read time"
"In such cases, if a row has been modified since it was last read, PostgreSQL will re-read the row to avoid corrupting it"
To be honest, I did not understand the differences between them. What's the difference between "rollback+restart" and "re-read"? They seem the same thing to me.
The difference is re-read the row at a newer time (but do not discard what the statement has already read before) vs rollback all to re-run the statement so that the result is consistent
Thanks for the explanation, Franck.
Sorry, it is still confusing to me.
Do you have any articles with something more practical so I can understand them better?
Let's take an example of "re-read" and inconsistent result in PostgreSQL:
Do not commit this transaction yet, and in a second session run:
It waits on the first session. Then
commit
the first session and then the SELECT FOR UPDATE shows:Here is what happened in PostgreSQL:
flag>0
, then is selectedflag>0
, then is discardedflag>0
, then is selected but, because an ongoing mutation is detected, is re-read as of after the session 1 commit and the mutated value does not verifyflag>0
, then is discardedflag>0
, then is discarded, even if the mutated value would verify the predicateflag>0
, then is selected but, because an ongoing mutation is detected, is re-read as of after the session 1 commit and the mutated value still verifiesflag>0
, then is selected, and shows the new valueflag>0
, then is discardedThe result is inconsistent, with rows 3 and 5 seeing the concurrent update but row 4 ignoring it. It is not isolated from the concurrent updates, breaking the I in ACID.
This is PostgreSQL behavior, with "re-read", and results in an inconsistent result, not using a single database state to evaluate the predicate. To read as-of a single database state, the result should be either
(1),(3),(5)
if evaluated before the other session commits, or(1),(4),(5)
if evaluated after it.Due to the use of FOR UPDATE,
(1),(3),(5)
is not usable as, being a stale state, it cannot be updated.The current state that can be locked and updated is
(1),(4),(5)
. However, as the conflict is detected during execution, the statement must be rolled back to an implicit savepoint taken just before it and restarted to execute the whole statement on the new state. I think PostgreSQL doesn't implement that because savepoints are not scalable (see PostgreSQL Subtransactions Considered Harmful). To avoid this anomaly you can use higher isolation level, get a serializable error, and implement the restart yourself.The same code run on YugabyteDB, which is PostgreSQL-compatible but implements "rollback+restart", returns a consistent result:
Each row evaluated its predicate and result on the state after the other session was committed, ensuring a consistent and updatable result.
This is described in a PostgreSQL code README
Thanks @franckpachot !
Now, the difference is more apparent. It was very didact, you're amazing! 👊🏻