In a situation where a deadlock is detected, one of the transactions involved must be cancelled. The hope is that the other transaction can then be committed and retried. This process is similar to a serializable conflict where the application receives a retriable error. In such a case, the application is responsible for retrying the transaction.
In a serializable isolation level, the database is unable to automatically retry a transaction because it is unaware of any other actions the application may have taken. As a result, the transaction must be either canceled or compensated by the application. However, in a Read Committed isolation level, where the read point can be updated to a newer time, the database can retry the statement without canceling the entire transaction.
What happens if a deadlock occurs in a transaction that uses the Read Committed isolation level? In cases where there are multiple statements involved, simply retrying the last statement that failed is not enough because the conflict remains and the other transaction is still waiting. However, if the deadlock occurs in a single statement, we can imagine that the database can rollback it and retry. Let's test it in Oracle Autonomous Database.
Creating a single table with two indexes, and a slow()
function that waits 10 seconds when called is an easy way to build a reproducible test case:
create table demo as
select rownum k, 0 v, 0 x
from xmltable('1 to 100000');
create index demo_asc on demo(k asc );
create index demo_desc on demo(k desc);
create or replace function slow(v number) return number as
begin dbms_session.sleep(10); return v; end;
/
I am using two indexes to control the order in which the rows are read. I am choosing one of the indexes using a hint, and then waiting for 10 seconds before reading the same with the other index from another session. For this, I am using a function in the WHERE clause to pause before each read.
During the first session, my intent is to update two rows: k=10
and k=90000
. The second update starts 10 seconds after the first one thanks to the slow function:
-- session 1
update /*+ index(demo, demo_asc ) */ demo
set v=v+1 where k in (10,90000) and slow(x)>=0;
After entering this, wait 10 seconds and update rows k=90000 and k=10 in another session:
-- session 2
set timing on
! sleep 10
update /*+ index(demo, demo_desc) */ demo
set v=v+1 where k in (10,90000);
This statement raises an error after 12.5 seconds, due to deadlock detection after the other session's transition to the next row:
-- session 2
DEMO@adb_tp> set timing on
DEMO@adb_tp> ! sleep 10
DEMO@adb_tp> update /*+ index(demo, demo_desc) */ demo
2* set v=v+1 where k in (10,90000);
Error starting at line : 1 in command -
update /*+ index(demo, demo_desc) */ demo
set v=v+1 where k in (10,90000)
Error at Command Line : 2 Column : 26
Error report -
SQL Error: ORA-00060: deadlock detected while waiting for resource
00060. 00000 - "deadlock detected while waiting for resource"
*Cause: Transactions deadlocked one another while waiting for resources.
*Action: Look at the trace file to see the transactions and resources involved. Retry if necessary.
Elapsed: 00:00:12.539
When I get this error in the second session, the first session updated the two rows but did not commit.
I attempt the same update on the second session. I waits for several minutes before finally succeeding:
-- session 2
DEMO@adb_tp> update /*+ index(demo, demo_desc) */ demo
2* set v=v+1 where k in (10,90000);
2 rows updated.
Elapsed: 00:12:00.666
DEMO@adb_tp> select * from demo where k in (10,90000);
DEMO@adb_tp> select * from demo where k in (10,90000);
K V X
_________ ____ ____
10 1 0
90,000 1 0
How can it finally be successful? My first session was canceled and disconnected, probably after an idle timeout, and then the changes were rolled back:
-- session 1
DEMO@adb_tp> select 1 from dual;
Error starting at line : 1 in command -
select 1 from dual
Error at Command Line : 1 Column : 1
Error report -
SQL Error: No more data to read from socket
If the first session had already committed at the time, the retry of the second session would succeed. This means that a transparent retry would have been possible.
I was running this in the Oracle Autonomous Database (19c) and I can see that the statement that was hanging on the row lock while it was running:
With hints and a slow function, I created a reproducible test case. However, I believe that queries that change their execution plan can also produce similar results, albeit with a very low probability.
Do you think the database should retry transparently in this case? I am not aware of any other databases that operate differently and retry on deadlock. Presently, YugabyteDB restarts only when there is a conflict between the read and write state, similar to Oracle, but it does not restart on deadlock. Please let me know if this is an issue and if a retry mechanism should be implemented.
If you try to reproduce the example, be aware that it relies on internal implementation that may change in other versions:
Top comments (0)