Franck Pachot for YugabyteDB Distributed PostgreSQL Database

Read Committed is a must for Postgres-compatible distributed SQL databases

In SQL databases, isolation levels form a hierarchy of protection against update anomalies. Many people assume that higher isolation levels are always better, and that if a database offers Serializable isolation, there's no need for Read Committed. However, consider the following points:

  • Read Committed is the default isolation level in PostgreSQL. As a result, the majority of applications utilize it and often employ SELECT ... FOR UPDATE to prevent certain anomalies.

  • Serializable isolation does not scale well with pessimistic locking. Distributed databases typically rely on optimistic locking, which necessitates the implementation of transaction retry logic in your code.

Given these two points, a distributed SQL database that does not offer Read Committed isolation cannot claim PostgreSQL compatibility, because it cannot run applications built for PostgreSQL's default settings.

YugabyteDB was developed with the principle that "the higher the better" in mind, and it utilizes "Snapshot Isolation" for the Read Committed isolation level. This approach works well for new applications. However, when migrating existing applications that were designed for Read Committed, you may want to avoid implementing retry logic for serializable failures (SQLState 40001) and instead expect the database to handle these automatically. To switch to Read Committed, you can enable this feature using the **yb_enable_read_committed_isolation** gflag.

Note: a GFlag in YugabyteDB is a global configuration parameter for the database, documented in the yb-tserver reference. The PostgreSQL parameters, which can be set through the ysql_pg_conf_csv GFlag, concern only the YSQL API, while GFlags cover all YugabyteDB layers.

In this blog post, I will demonstrate the true value of the Read Committed isolation level: there is no need to implement retry logic because YugabyteDB can handle it automatically.

Start YugabyteDB

I am starting a YugabyteDB single node database for this simple demo:

Franck@YB:~ $ docker  run --rm -d --name yb       \
 -p7000:7000 -p9000:9000 -p5433:5433 -p9042:9042  \
 yugabytedb/yugabyte                              \
 bin/yugabyted start --daemon=false               \
 --tserver_flags=""

53cac7952500a6e264e6922fe884bc47085bcac75e36a9ddda7b8469651e974c

I explicitly didn't set any GFlags to show the default behavior. This is version 2.13.0.0 build 42.

I check the Read Committed-related GFlags:

Franck@YB:~ $ curl -s http://localhost:9000/varz?raw | grep -E "\
(yb_enable_read_committed_isolation\
|ysql_output_buffer_size\
|ysql_sleep_before_retry_on_txn_conflict\
|ysql_max_write_restart_attempts\
|ysql_default_transaction_isolation\
)"

--yb_enable_read_committed_isolation=false
--ysql_max_write_restart_attempts=20
--ysql_output_buffer_size=262144
--ysql_sleep_before_retry_on_txn_conflict=true
--ysql_default_transaction_isolation=

Read Committed is the default isolation level, for PostgreSQL compatibility:

Franck@YB:~ $ psql -p 5433 \
-c "show default_transaction_isolation"

 default_transaction_isolation
-------------------------------
 read committed
(1 row)

I create a simple table:

Franck@YB:~ $ psql -p 5433 -ec "
create table demo (id int primary key, val int);
insert into demo select generate_series(1,100000),0;
"

create table demo (id int primary key, val int);
insert into demo select generate_series(1,100000),0;

INSERT 0 100000

I'll run the following script. It sets the default isolation level to Read Committed explicitly, just to be safe, even though it is already the default:

Franck@YB:~ $ cat > update1.sql <<'SQL'
\timing on
\set VERBOSITY verbose
set default_transaction_isolation to "read committed";
update demo set val=val+1 where id=1;
\watch 0.1
SQL

This re-runs the single-row update every 0.1 seconds.
I'll run it from two concurrent sessions on the same row:

Franck@YB:~ $ timeout 60 psql -p 5433 -ef update1.sql >session1.txt &
Franck@YB:~ $ timeout 60 psql -p 5433 -ef update1.sql >session2.txt &
[1] 760
[2] 761

psql:update1.sql:5: ERROR:  40001: Operation expired: Transaction a83718c8-c8cb-4e64-ab54-3afe4f2073bc expired or aborted by a conflict: 40001
LOCATION:  HandleYBStatusAtErrorLevel, pg_yb_utils.c:405

[1]-  Done                    timeout 60 psql -p 5433 -ef update1.sql > session1.txt

Franck@YB:~ $ wait

[2]+  Exit 124                timeout 60 psql -p 5433 -ef update1.sql > session2.txt


You might encounter the error message Transaction ... expired or aborted by a conflict. If you run the same operation multiple times, you may also see other variants: Operation expired: Transaction aborted: kAborted, All transparent retries exhausted. Query error: Restart read required, or All transparent retries exhausted. Operation failed. Try again: Value write after transaction start. All of these are ERROR 40001 serialization errors, which require the application to retry the transaction.

In Serializable transactions, the entire transaction must be retried. Unfortunately, the database cannot transparently handle this because it lacks knowledge of all actions the application took during the transaction. For instance, some rows may have already been read and displayed to the user or saved to a file. Since the database cannot roll back those actions, it is the application's responsibility to manage this situation effectively.
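As an illustration of what this application-side responsibility looks like, here is a sketch of a whole-transaction retry loop. This is not YugabyteDB or driver code: SerializationFailure and run_with_retry are hypothetical stand-ins for a real driver's SQLState 40001 error and for your data-access layer.

```python
import time

class SerializationFailure(Exception):
    """Stands in for a driver error carrying SQLState 40001."""
    sqlstate = "40001"

def run_with_retry(txn, max_attempts=5, min_backoff=0.1, multiplier=2, max_backoff=1.0):
    """Re-run the whole transaction on serialization failures (SQLState 40001),
    waiting an exponentially increasing, capped delay between attempts."""
    backoff = min_backoff
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            time.sleep(backoff)  # back off before replaying the transaction
            backoff = min(backoff * multiplier, max_backoff)

# Usage: a transaction that fails twice with 40001, then commits.
attempts = []
def txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure()
    return "committed"

print(run_with_retry(txn, min_backoff=0.01))  # prints "committed" after 2 retries
```

Note that the whole callback is replayed, not just the failed statement: that is exactly what the database cannot do for you when the application has already acted on intermediate results.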

I've enabled \timing on to capture the elapsed time, and since I'm running this on my laptop, there is no significant client-server network latency:

Franck@YB:~ $ awk '/Time/{print 5*int($2/5)}' session?.txt | sort -n | uniq -c

    121 0
     44 5
     45 10
     12 15
      1 20
      1 25
      2 30
      1 35
      3 105
      2 110
      3 115
      1 120

Most updates here took less than 5 milliseconds. However, note that one session quickly failed with error 40001, so this essentially measures a single-session workload on my laptop.
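The awk one-liner buckets each Time: line reported by \timing into 5-millisecond bins and counts them. The same histogram logic, sketched in Python on a few made-up sample lines:

```python
from collections import Counter

# Hypothetical sample of psql "\timing" output lines
lines = [
    "Time: 2.335 ms",
    "Time: 4.299 ms",
    "Time: 107.418 ms",
    "Time: 12.051 ms",
]

# Same bucketing as: awk '/Time/{print 5*int($2/5)}' | sort -n | uniq -c
histogram = Counter(
    5 * int(float(line.split()[1]) / 5)
    for line in lines if line.startswith("Time")
)
for bucket in sorted(histogram):
    print(f"{histogram[bucket]:7d} {bucket}")
```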

By default, yb_enable_read_committed_isolation is set to false. In this case, the Read Committed isolation level of YugabyteDB's transactional layer defaults to the stricter Snapshot Isolation. As a result, both READ COMMITTED and READ UNCOMMITTED in YSQL operate under Snapshot Isolation.

yb_enable_read_committed_isolation=true

Now, let's set this flag to true to be compatible with a PostgreSQL application that has no retry logic:

Franck@YB:~ $ docker rm -f yb

yb
[1]+  Exit 124                timeout 60 psql -p 5433 -ef update1.sql > session1.txt

Franck@YB:~ $ docker  run --rm -d --name yb       \
 -p7000:7000 -p9000:9000 -p5433:5433 -p9042:9042  \
 yugabytedb/yugabyte                \
 bin/yugabyted start --daemon=false               \
 --tserver_flags="yb_enable_read_committed_isolation=true"

fe3e84c995c440d1a341b2ab087510d25ba31a0526859f08a931df40bea43747

Franck@YB:~ $ curl -s http://localhost:9000/varz?raw | grep -E "\
(yb_enable_read_committed_isolation\
|ysql_output_buffer_size\
|ysql_sleep_before_retry_on_txn_conflict\
|ysql_max_write_restart_attempts\
|ysql_default_transaction_isolation\
)"

--yb_enable_read_committed_isolation=true
--ysql_max_write_restart_attempts=20
--ysql_output_buffer_size=262144
--ysql_sleep_before_retry_on_txn_conflict=true
--ysql_default_transaction_isolation=

Running the same commands as above:

Franck@YB:~ $ psql -p 5433 -ec "
create table demo (id int primary key, val int);
insert into demo select generate_series(1,100000),0;
"

create table demo (id int primary key, val int);
insert into demo select generate_series(1,100000),0;

INSERT 0 100000

Franck@YB:~ $ timeout 60 psql -p 5433 -ef update1.sql >session1.txt &
Franck@YB:~ $ timeout 60 psql -p 5433 -ef update1.sql >session2.txt &
[1] 1032
[2] 1034

Franck@YB:~ $ wait

[1]-  Exit 124                timeout 60 psql -p 5433 -ef update1.sql > session1.txt
[2]+  Exit 124                timeout 60 psql -p 5433 -ef update1.sql > session2.txt


I did not encounter any errors, even though both sessions were updating the same row for 60 seconds. Note that the updates were not all instantaneous: the database had to retry several transactions, which is visible in the elapsed times:

Franck@YB:~ $ awk '/Time/{print 5*int($2/5)}' session?.txt | sort -n | uniq -c

    325 0
    199 5
    208 10
     39 15
     11 20
      3 25
      1 50
     34 105
     40 110
     37 115
     13 120
      5 125
      3 130

While most transactions still take under 10 milliseconds, some now take more than 100 milliseconds because of the retry backoff.

Retry backoff

A common retry mechanism waits an exponentially increasing amount of time between each attempt, up to a maximum limit. This is implemented in YugabyteDB, and the following three parameters can be set at the session level to control it:

Franck@YB:~ $ psql -p 5433 -xec "
select name, setting, unit, category, short_desc
from pg_settings
where name like '%retry%backoff%';
"

select name, setting, unit, category, short_desc
from pg_settings
where name like '%retry%backoff%';

-[ RECORD 1 ]---------------------------------------------------------
name       | retry_backoff_multiplier
setting    | 2
unit       |
category   | Client Connection Defaults / Statement Behavior
short_desc | Sets the multiplier used to calculate the retry backoff.
-[ RECORD 2 ]---------------------------------------------------------
name       | retry_max_backoff
setting    | 1000
unit       | ms
category   | Client Connection Defaults / Statement Behavior
short_desc | Sets the maximum backoff in milliseconds between retries.
-[ RECORD 3 ]---------------------------------------------------------
name       | retry_min_backoff
setting    | 100
unit       | ms
category   | Client Connection Defaults / Statement Behavior
short_desc | Sets the minimum backoff in milliseconds between retries.
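With the defaults shown above (minimum 100 ms, multiplier 2, maximum 1000 ms), a capped exponential backoff produces the following schedule. This is a sketch of the presumed algorithm, not YugabyteDB source code, and backoff_schedule is a hypothetical helper; the actual implementation may differ in details such as jitter.

```python
def backoff_schedule(retries, retry_min_backoff=100,
                     retry_backoff_multiplier=2, retry_max_backoff=1000):
    """Capped exponential backoff in milliseconds, mirroring the
    retry_min_backoff / retry_backoff_multiplier / retry_max_backoff settings."""
    waits, wait = [], retry_min_backoff
    for _ in range(retries):
        waits.append(wait)
        wait = min(wait * retry_backoff_multiplier, retry_max_backoff)
    return waits

print(backoff_schedule(6))  # [100, 200, 400, 800, 1000, 1000]
```

This also explains the 105 to 130 millisecond buckets in the histogram above: a transaction that needed one internal retry waited roughly the 100 millisecond minimum backoff.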

With my local database, transactions are fast, so the default 100-millisecond minimum backoff is long in comparison. When I add set retry_min_backoff to 10; to my update1.sql, the elapsed time is no longer significantly increased by the retries:

Franck@YB:~ $ awk '/Time/{print 5*int($2/5)}' session?.txt | sort -n | uniq -c

    338 0
    308 5
    302 10
     58 15
     12 20
      9 25
      3 30
      1 45
      1 50

yb_debug_log_internal_restarts

The restarts are transparent to the application. If you want to understand the reason for a restart, or why a transparent restart was not possible, you can enable logging with yb_debug_log_internal_restarts=true.

# log internal restarts
export PGOPTIONS='-c yb_debug_log_internal_restarts=true'

# run concurrent sessions
timeout 60 psql -p 5433 -ef update1.sql >session1.txt &
timeout 60 psql -p 5433 -ef update1.sql >session2.txt &

# tail the current logfile
docker exec -i yb bash <<<'tail -F $(bin/ysqlsh -twAXc "select pg_current_logfile()")'

restart log

Versions

This feature was implemented in YugabyteDB version 2.13, and I am currently using version 2.13.1. While transparent restarts do not yet work when transactions are executed through the DO or ANALYZE commands, they are functional for procedures. If you would like to see this supported for DO or ANALYZE, you can follow and comment on issue #12254 at https://github.com/yugabyte/yugabyte-db/issues/12254.

In conclusion

Implementing retry logic in an application is a design choice rather than a necessity when using YugabyteDB. A distributed database might encounter restart errors due to clock skew, but it is important to make these errors as transparent as possible to SQL applications.

If your goal is to prevent all transaction anomalies (for example, see this one), you can run your transactions in Serializable mode and handle the 40001 exception. Don’t be misled into thinking that this approach requires more code; in reality, without it, you would have to test for all possible race conditions, which can be more labor-intensive. By using Serializable isolation, the database guarantees that you will experience behavior similar to running transactions serially, meaning your unit tests will be sufficient to ensure data correctness.

However, if you have an existing PostgreSQL application, its behavior under the default isolation level has been validated through years of production use. The aim is not to eliminate potential anomalies—since the application likely has workarounds for them—instead, you want to scale out without modifying the code. This is where YugabyteDB is advantageous, as it offers the Read Committed isolation level, which does not require any additional error handling code.
