Franck Pachot for YugabyteDB Distributed PostgreSQL Database

Posted on Jul 10, 2023 • Edited on Aug 17

YugabyteDB: set read-only for a single query to allow follower reads

#distributed #postgres #database #yugabytedb

Consistent reads go to the tablet leader to work on the current state, being guaranteed to see the latest writes, maybe waiting for concurrent writes to be committed.

In a geo-distributed deployment, where network latency can be higher than the expected response time, you may prefer to read from the nearest replica, leader or follower, to get faster response time.

When follower reads are allowed, YugabyteDB still reads from a consistent snapshot, but accepting a bounded staleness defined by yb_follower_read_staleness_ms which defaults to 30 seconds. All replicas are guaranteed to be consistent with this snapshot, even without reading from other followers, for two reasons: they exchange heartbeats (by default every 500ms), and the maximum clock skew is known (by default 500ms).

A staleness of 15 or 30 seconds is often acceptable for real-time analytics or reporting where you query a virtual snapshot without making any some reservations (locks).

This behavior is enabled in YugabyteDB by two settings:

the transaction must be declared as read-only, so that it cannot compromise write consistency. Basically, in an MVCC database, "read-only" is an isolation level that doesn't require any lock acquisition or read intents record.
yb_read_from_followers is set to on to explicitly allow the possible staleness because that is different from the monolithic PostgreSQL behavior of read-only transactions (which depends on the replication mode and settings). It explicitly allow a staleness of yb_follower_read_staleness_ms before the beginning of the query.

Dedicated connection pool

To set those, it is recommended to deploy with a dedicated connection pool with those initialization settings. Here are some examples:

SET on connection initialization declared in the connection pool configuration:

set yb_read_from_followers to on;
set default_transaction_read_only to on;

PGOPTION as environnement variable:

PGOPTIONS="-c yb_read_from_followers=on -c default_transaction_read_only=on"

PGOPTION in the JDBC URL:

jdbc:postgresql://localhost:5433/yugabyte?options=-c%20yb_read_from_followers=on%20-c%20default_transaction_read_only=on

dedicate a user for it:

alter user reader_role set yb_read_from_followers to on;
alter user reader_role set default_transaction_read_only to on;

Typically you need at least two connection pools in a CQRS architecture: one for the consistent reads and writes, and one for the queries.

Setting per transaction when sharing the same connection

Multiple connection pools will increase the number of connections and it is also a good practice to limit the number of connections and, in some cases, you want to allow follower reads for a few specific queries only.

Setting those two parameters before the query, and back to their initial value after it, may require additional roundtrips to the server. It is also not easy to maintain: forgetting to set it back, in the normal scenario or after an exception, would be a critical issue.

This blog post explores some way to set those parameters per query without increasing the number of roundtrips.

How to test it

Before trying some alternatives, I need a reliable way to test it. I create a simple table:

create table demo as select generate_series(1,1000) x;

and use my YBWR script:

\! curl -s https://raw.githubusercontent.com/FranckPachot/ybdemo/main/docker/yb-lab/client/ybwr.sql | grep -v '\watch' > ybwr.sql
\i ybwr.sql

This defines two prepared statements, snap_reset and snap_table to display the read statistics on the tablet (seek() and next() in the LSM-Tree).

First, I set yb_read_from_followers=on without setting a read-only transaction. This reads from the leaders (L) which are distributed to multiple nodes (10.0.0.39, 10.0.0.40, 10.0.0.41):

yugabyte=# execute snap_reset;
 ybwr metrics
--------------
(0 rows)

Time: 141.791 ms

yugabyte=# set yb_read_from_followers=on;
SET
Time: 12.080 ms

yugabyte=# select count(*) from demo where x between 1 and 42 ;
 count
-------
    42
(1 row)

Time: 13.412 ms

yugabyte=# execute snap_table;

 rocksdb_seek | rocksdb_next | rocksdb_insert |       dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------------------
            1 |          667 |                | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458 L 10.0.0.41
            1 |          645 |                | yugabyte demo 49b822c41fe34c059e0e24788810012a L 10.0.0.40
            1 |          685 |                | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39
(3 rows)

Time: 450.742 ms

This has read from different nodes, where the table Leader (L) is. In this case (Sequential Scan on few rows) there's one seek() per node, which means that the response time will include the latency to each node, once per execution.

Now, if I do the same in a read-only transaction (with yb_read_from_followers stills set to on):

yugabyte=# execute snap_reset;
 ybwr metrics
--------------
(0 rows)

Time: 141.791 ms
yugabyte=# set default_transaction_read_only=on;
SET
Time: 12.080 ms
yugabyte=# select count(*) from demo where x between 1 and 42 ;
 count
-------
    42
(1 row)

Time: 13.412 ms
yugabyte=# set default_transaction_read_only=off;
SET
Time: 12.069 ms

 rocksdb_seek | rocksdb_next | rocksdb_insert |       dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------------------
            1 |          667 |                | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458   10.0.0.39
            1 |          645 |                | yugabyte demo 49b822c41fe34c059e0e24788810012a   10.0.0.39
            1 |          685 |                | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39
(3 rows)

Time: 450.742 ms

In this 3 nodes cluster with Replication Factor 3, there's a replica on each node. I have read from the Leader only when it was near the node I'm connected to (10.0.0.39) and from the followers for the other, still reading from my local node. This validates the read from follower behavior.

I can also test the consequence, that with those settings, a read-only transaction reads from a virtual snapshot taken from 30 seconds ago:

yugabyte=# \dconfig yb*follower*
   List of configuration parameters
           Parameter           | Value
-------------------------------+-------
 yb_follower_read_staleness_ms | 30000
 yb_read_from_followers        | on
(2 rows)

yugabyte=# select count(*) from demo;
 count
-------
  1000
(1 row)

yugabyte=# insert into demo values(0);
INSERT 0 1

yugabyte=# select count(*) from demo;
 count
-------
  1001
(1 row)

yugabyte=# set default_transaction_read_only=on;
SET
yugabyte=# select count(*) from demo;
 count
-------
  1000
(1 row)

yugabyte=# \! sleep 30

yugabyte=# select count(*) from demo;
 count
-------
  1001
(1 row)

yugabyte=# set default_transaction_read_only=off;
SET
yugabyte=#

Now that I know how to test, I can show the different ways to set those parameters for a single query.

⚠️ the /+ Set() / hint is not a solution

I'll start with a way that may seem to work in some situation but only because of some side effects when setting the read point. YugabyteDB installs the pg_hint_plan which allows a Set() hint to set a parameter for one query. However, this is set only during the parsing phase, and is relevant only for query planner parameters. Reading from followers is a behavior of the Executor and is not controlled by pg_hint_plan.

Doing so was used in some situation, and even got its way into some documentation examples, by mistake. It was working as a side effect only, because the read time is pro-actively set during the parse phase and, in some limited cases (atomic transaction, no prepared statement, no generic plan) doesn't change at execution time.

Basically there is no way in PostgreSQL to set a parameter for the scope of a query only and I'm not aware of any extension doing this. There was a proposal in 2011 (proposal: set GUC variables for single query) with some discussions about a possible syntax and some doubts about the value of it in addition to the transaction scope with SET LOCAL.

Explicit transaction with SET LOCAL

SET LOCAL is in theory the best solution but it requires an explicit transaction:

start transaction;
set local transaction_read_only to on;
set local yb_read_from_followers to on;
select count(*) from demo where x between 1 and 42;
commit;

I don't like this solution because I consider that read-only must be at transaction level. Would it make sense to set this parameter back and forth in the same transaction? No, and you would get ERROR: cannot set transaction read-write mode inside a read-only transaction

Explicit transaction with READ ONLY

If yb_read_from_followers=on is already set, it is much better to define READ ONLY at the scope of the transaction:

start transaction read only;
select count(*) from demo where x between 1 and 42;
commit;

However, as your goal is a single-query, you don't want 3 roundtrips to the PostgreSQL backend to run this. It is not possible to use an autocommit statement (or you would have to set default_transaction_read_only and then be sure to set it back after).

You can send the explicit transaction in a single SQL command if your driver is able to get the query result from there. this is possible with psql or ysqlsh but this requires that the ; is escaped (\;) because psql interprets ; as a command separator before sending it as a statement separator. The difference is visible when you set \timing on as it shows the time for each command:

yugabyte=# \timing on
Timing is on.

yugabyte=# start transaction read only ;
START TRANSACTION
Time: 12.096 ms
yugabyte=*# select count(*) from demo where x between 1 and 42 ;
 count
-------
    42
(1 row)

Time: 13.410 ms
yugabyte=*# commit;
COMMIT
Time: 12.069 ms
yugabyte=#
yugabyte=#
yugabyte=# start transaction read only \;
yugabyte-# select count(*) from demo where x between 1 and 42 \;
yugabyte-# commit;
START TRANSACTION
 count
-------
    42
(1 row)

COMMIT
Time: 13.756 ms

You should test what happens with the drivers you use, For example do you know what JDBC sends when you setReadOnly(true) or JPA and Spring with @Transactional(readOnly = true)? How many calls to the database? Some client library will also allow to batch multi-statement commands so that you can send the start transaction and the select in one execution.

In pg_stat_statements you will see multiple statements in both cases, because query is a statement there, but in pg_stat_activity you will see the whole command in query.

ALTER FUNCTION ... SET

Another possibility to reduce the calls to the database is encapsulating your query as a function. There you can define parameters with the scope of the function execution:

create or replace function demo_query(a int, b int)
 returns setof demo as $SQL$
 select * from demo where x between a and b;
$SQL$ language sql
set transaction_read_only to on
-- cannot be set here (see https://github.com/yugabyte/yugabyte-db/issues/20482)
-- set yb_read_from_followers=on
;

This reads from followers in one atomic transaction if yb_read_from_followers is set before calling it (see issue #20482):

yugabyte=# set yb_read_from_followers=on
SET

yugabyte=# select count(*) from demo_query(1, 42) ;
 count
-------
    42
(1 row)

Time: 16.308 ms
yugabyte=# execute snap_table;
 rocksdb_seek | rocksdb_next | rocksdb_insert |       dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------------------
            1 |          669 |                | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458   10.0.0.39
            1 |          645 |                | yugabyte demo 49b822c41fe34c059e0e24788810012a   10.0.0.39
            1 |          685 |                | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39
(3 rows)

Time: 454.406 ms

This works but you have to write a query with parameters. As far as I know there is no possibility to do the same with a view.

There is also no possibility to do the same with a DO block as it cannot return a result.

Other settings

I mentioned default_transaction_isolation and transaction_read_only but there are equivalent syntaxes:

within a session you can set default_transaction_read_only=on or set session characteristics as transaction read only to set the transaction mode for the next transactions (it doesn't try to change the current one if you have already started one - and some drivers with auto commit of issue a begin before).
within a transaction, you can set transaction_read_only=on or set transaction read only. You don't need SET LOCAL as this is a characteristic of the current transaction and the next one will use the default, but I think it is good to make it explicit.

Be careful, some drivers disable Auto Commit by starting a transaction before the call. Then setting the default for the next transaction will not be applied for the next statements.

my preference goes to start transaction read only as it clearly associate the mode with the transaction. It is the same as begin read only or begin work read only or begin work transaction read only. I often type begin as it is shorter but the SQL starndard is start transaction.

If you know a simpler way to set a parameter for a single query in PostgreSQL, please let me know.

DEV Community

YugabyteDB: set read-only for a single query to allow follower reads

Dedicated connection pool

Setting per transaction when sharing the same connection

How to test it

⚠️ the /+ Set() / hint is not a solution

Explicit transaction with SET LOCAL

Explicit transaction with READ ONLY

ALTER FUNCTION ... SET

Other settings

Top comments (0)

Read next

All About Parquet Part 06 - Encoding in Parquet | Optimizing for Storage

GitCone.com: Chat with all the repositories from this challenge, or with any other repository

Local AI Knowledge Base with Next.js, Ollama, and PostgreSQL

Optimizing PostgreSQL Mass Deletions with Table Partitioning

Dedicated connection pool

Setting per transaction when sharing the same connection

How to test it

⚠️ the /*+ Set() */ hint is not a solution

Explicit transaction with SET LOCAL

Explicit transaction with READ ONLY

ALTER FUNCTION ... SET

Other settings

Read next

All About Parquet Part 06 - Encoding in Parquet | Optimizing for Storage

GitCone.com: Chat with all the repositories from this challenge, or with any other repository

Local AI Knowledge Base with Next.js, Ollama, and PostgreSQL

Optimizing PostgreSQL Mass Deletions with Table Partitioning

⚠️ the /+ Set() / hint is not a solution