SQL isolation levels mainly affect reads, not writes. However, it's important to consider the isolation level when preventing dirty writes, as it determines the order of reads and write intents with lock wait.
Dirty reads and dirty writes are terms that were originally used in the context of locking. They refer to situations where a transaction reads or writes data that has been modified by another transaction that has not yet been committed. To prevent these scenarios, data that is being read is locked in a shared mode, and data that is being modified is locked in an exclusive mode. This approach allows concurrent transactions to access and modify data without interfering with each other's changes, as originally intended with the concept of locking.
When a transaction reads or writes, locks are acquired during execution. Write locks must remain acquired until the end of the transaction (commit or rollback). For read locks, isolation levels were introduced to define their duration and avoid too many blocking situations at the price of some anomalies in complex transactions with dependencies between their reads and writes.
Multi-Version Concurrency Control (MVCC) is a technique used to prevent read locks and readers from blocking writers. By implementing MVCC, the read-time is shifted to the beginning of a statement (for Read Committed) or the beginning of a transaction (for Repeatable Read or Serializable). However, this technique also introduces another effect on writes - the state may change between the read-time and the write-time, due to other transactions.
It's possible to read data from the past, but writing can only happen on the current state of the database. It's crucial to keep in mind that writing should be done on the commit state, and locking ensures consistency between the actual write and the commit state. When using optimistic locking, conflicts may occur, and the database will detect them, requiring the transaction to be retried. The newer MVCC read-time will have resolved the conflict.
Another technique permits retrying only the statement and not the whole transaction. This can be used only in Read Committed, where the MVCC read-time doesn't need to match the beginning of the transaction and can be reset to a newer one.
When attempting to retry only a single statement, it's necessary to acquire a savepoint before each statement. This is done implicitly by Oracle and YugabyteDB, which makes the read restart transparent to the application.
However, PostgreSQL does not use implicit savepoints, as they are quite expensive and cannot transparently rollback and restart the statement execution. Instead, PostgreSQL changes the read point without restarting the entire statement to avoid errors, even if it sacrifices consistency (see READ COMMITTED anomalies in PostgreSQL).
This blog post specifically applies to databases that provide consistent "Read Committed" transactions through read restarts. It shows an example with YugabyteDB, which is PostgreSQL-compatible but with true Read Committed consistency.
For example, let's say I create a table, and a transaction starts. It reads for id=123
, finds no rows, and decides to insert one with value=101
:
-- session 1
yugabyte=# create table demo (id bigint, primary key(id asc), value int);
CREATE TABLE
yugabyte=# begin transaction isolation level read committed;
BEGIN
yugabyte=*# select * from demo where id=123;
id | value
----+-------
(0 rows)
yugabyte=*# insert into demo values (123, 101)
on conflict(id) do update set value=101;
INSERT 0 1
yugabyte=*# select * from demo where id=123;
id | value
-----+-------
123 | 101
(1 row)
yugabyte=*#
The insert was coded as an upsert, with INSERT ON CONFLICT DO UPDATE to update if the row exists.
This transaction is not committed yet when another starts, trying to do something similar on the same row:
-- session 2
yugabyte=# begin transaction isolation level read committed;
BEGIN
yugabyte=*# select * from demo where id=123;
id | value
----+-------
(0 rows)
yugabyte=*# insert into demo values (123, 102)
on conflict(id) do update set value=102;
I've chosen another value to see later which transaction have set which value. The value is supposed to be decided by the business logic from what was read before in the transaction.
This statement waits on write-write conflict to avoid dirty writes, until the first transaction commits:
-- session 1
yugabyte=*# commit;
COMMIT
yugabyte=# select * from demo;
id | value
-----+-------
123 | 101
(1 row)
The second transaction continues automatically:
-- session 2
yugabyte=*# insert into demo values (123, 102)
on conflict(id) do update set value=102;
...
INSERT 0 1
yugabyte=*# commit;
COMMIT
yugabyte=# select * from demo;
id | value
-----+-------
123 | 102
(1 row)
What happened here is that the second transaction insert ... on conflict do update
has read first to find if we are in an INSERT or UPDATE situation. This read phase uses MVCC, like all reads, without any lock. At that time, the other transaction was not committed, so no rows were found and the insert ... on conflict do update
decided for an INSERT.
Then, it requested a lock for this write but this conflicts with the uncommitted write from the other transaction and then starts to wait for it to complete.
Once the other transaction commits, the update acquired the lock but detected that the state has changed. This state was responsible for the decision to INSERT rather than UPDATE, but now that the state has changed it cannot continue as-is.
This is where the transaction is transparently rolled back to the implicit savepoint taken at the beginning of the statement execution, and restarts it with a new read-time.
When you set yb_debug_log_internal_restarts = on
in YugabyteDB you can see the following in the logs:
2023-11-08 07:50:24.156 GMT [455732] STATEMENT: insert into demo values (123, 101) on conflict(id) do update set value=102;
2023-11-08 07:50:24.156 GMT [455732] LOG: Restarting statement due to kReadRestart/kConflict error:
Query: insert into demo values (123, 101) on conflict(id) do update set value=102;
Error: could not serialize access due to concurrent update
Attempt No: 0
It is a Serializable conflict error but, in Read Committed, handled transparently by YugabyteDB. It is important to understand that serialization errors do not happen only in Serializable isolation level. They can happen at any isolation levels in MVCC databases, where we read and write on different states, but may be solved transparently by the database.
Now, the re-started statement read phase uses a newer MVCC read-time, when the other transaction is now committed. It sees the new row and then the insert ... on conflict do update
decides for an update.
This is the advantage on Read Committed: the application doesn't have to implement a retry logic for serialization errors. However, because the read-time is per statement, you may encounter the Read Committed anomalies like Lost Update. In my demo, there are two read operation: one by the SELECT Statement, and one as the first phase of the INSERT ON CONFLICT statement. Read Committed anomalies happen between two statements of the same transaction, and one result determines what is written by the other.
For example, if the set value=102
was a consequence of the previous read ((0 rows)
) this previous read was not restarted and then we have a transaction that took decisions on two different states: update because the row exists, but set a specific value because the row didn't exist. Finally, after both transactions are committed, all looks like there were only one insert. The result would have been the same if the first transaction didn't insert anything. Its modifications are lots, this is called Lost Update.
According to the definition of Read Committed, it accepts some anomalies to avoid raising conflict errors. To change the behavior, you can either modify the isolation level to Repeatable Read or use SELECT FOR UPDATE
for the first read. The former approach is optimistic locking and expects conflict to be an exception scenario, whereas the latter is pessimistic locking where conflict is the nominal scenario.
Or a better choice is to try to put all the logic into a single SQL statement (leveraging WITH clauses and RETURNING updates) that YugabyteDB can restart transparently, and have no dependencies between statements in a transaction.
You may ask why a statement can be restarted transparently without affecting any other operations, while a whole transaction cannot. The reason for this is that an application may perform non-transactional operations between statements, which the database is not aware of, such as writing to files, pushing events to a queue, or sending emails. These operations are not rolled back by the database, and the application must handle this when catching a serializable error and retrying the statement.
As a general rule, in-database transparent restart is possible when the transaction didn't send any information back to the application. In such cases, the database knows that no additional logic used those results, and it can restart the statement without any side effects.
Top comments (3)
That is an AMAZING article, Franck. Thanks for sharing it! 👏🏻👏🏻
I have two questions:
What would happen in this scenario since Postgres does not support proper read consistency on the Read_Commited isolation level? I guess the application would receive an error from the database, probably a "duplicate key" or something like that.
Using
SELECT...FOR UPDATE
could not work as expected here since the row could not exist at that moment, and there was nothing to lock. So I guessINSERT...ON CONFLICT
would still be necessary anyway. Does it make sense?Thanks again for all this explanation. I always learn a lot from you, Franck. 👊🏻
Thanks a lot for your feedback.
Thanks, Franck!
I think I understood the opposite. Sorry! I thought YugabyteDB had a different behavior from Postgres. 😅
But what would happen with Oracle? Would it raise an error of "duplicate key" instead?
Some comments may only be visible to logged-in visitors. Sign in to view all comments.