DEV Community

Indexing for a Scalable Serialization Isolation Level

The previous post used the primary key to record read intents in YugabyteDB. Serializable isolation can be scalable, but it requires optimal schema and indexes with a good knowledge of how the database works. Indexes define the predicate locks for PostgreSQL Serializable Snapshot Isolation, and the primary key defines the range locks used by YugabyteDB's Two-Phase Commit.
Here an example taken from nathanl/postgresql_serializable_isolation.sql which simply inserts two rows from two transactions:


create table users(
  id serial not null primary key,
  username varchar not null
);

-- create index on users(username asc);

-- session 1

begin transaction isolation level serializable;
select * from users where username = 'alice';
insert into users ("username") values ('alice');

-- session 2
\! psql -c '\timing on' -c "begin transaction isolation level serializable; select * from users where username = 'bob'; insert into users ("username") values ('bob'); commit;" &

-- session 1

\! sleep 1
commit;
\! sleep 1
select * from users;

Enter fullscreen mode Exit fullscreen mode

Let's examine the performance in PostgreSQL and YugabyteDB, with and without and index on "username"

PostgreSQL Seq Scan

Without an index on "username", the SELECT does a Sequential Scan. Session 2 can commit, but session 1 detects a conflict on commit:


COMMIT
Time: 2.844 ms

postgres=*# commit;
ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried.

Enter fullscreen mode Exit fullscreen mode

This serializable error is a false positive. The two sessions read different values, and they are serializable. PostgreSQL employs predicate locks but only records the predicates used for the scan operation. The lock encompasses the entire table since it was read with a Seq Scan.

PostgreSQL Index Scan

I run the same with the index creation uncommented, and there's no error:


COMMIT
Time: 2.581 ms

postgres=*# commit;
COMMIT
postgres=# \! sleep 1
postgres=# select * from users;
 id | username
----+----------
  1 | alice
  2 | bob

Enter fullscreen mode Exit fullscreen mode

In this case, PostgreSQL uses the index condition as a predicate lock, and they do not conflict between the two transactions.

Note that in this scenario, the "username" column should be declared unique instead of creating an additional index. This creates an implicit index and has the same effect:

create table users(
  id serial not null primary ley,
  username varchar not null unique
);

Enter fullscreen mode Exit fullscreen mode

YugabyteDB

YugabyteDB uses Wait-on-Conflict to prevent the serializable error, allowing both transactions to commit even without the index.

yugabyte=*# commit;
COMMIT

COMMIT
Time: 926.484 ms

Enter fullscreen mode Exit fullscreen mode

I waited one second before committing to the first session, and this delay is reflected in the response time of the second session, which waited for the first one. The reason for this is similar to what we have observed with PostgreSQL: the read lock intent is at the table level.

This is visible as STRONG_READ at the level of the "relation," which is blocked by the insert operation that acquires a WEAK_WRITE lock on the relation:

yugabyte=*# select
 substr(ybdetails->>'transactionid',1,8) transactionid , substr(ybdetails->>'blocked_by',1,8) blocked_by,
 relkind, locktype, mode, relname , ybdetails->'keyrangedetails'->'cols'
from pg_locks natural join (select oid relation, relname, relkind from pg_class) pg_class
order by transactionid, relkind desc, relname, case locktype when 'relation' then 1 when 'keyrange' then 2 when 'row' then 3 end, mode desc, locktype desc;
 transactionid | blocked_by | relkind | locktype |         mode         | relname | ?column?
---------------+------------+---------+----------+----------------------+---------+----------
 74e1329c      |            | r       | relation | WEAK_READ,WEAK_WRITE | users   | null
 74e1329c      |            | r       | relation | STRONG_READ          | users   | null
 74e1329c      |            | r       | row      | STRONG_WRITE         | users   | ["1"]
 74e1329c      |            | r       | row      | STRONG_READ          | users   | ["1"]
 c5611a22      | ["74e132   | r       | relation | STRONG_READ          | users   | null

Enter fullscreen mode Exit fullscreen mode

The index will add the range lock intents in the index, but there's still a conflict with the table lock intents at "relation" level:

yugabyte=*# select
 substr(ybdetails->>'transactionid',1,8) transactionid , substr(ybdetails->>'blocked_by',1,8) blocked_by,
 relkind, locktype, mode, relname , ybdetails->'keyrangedetails'->'cols'
from pg_locks natural join (select oid relation, relname, relkind from pg_class) pg_class
order by transactionid, relkind desc, relname, case locktype when 'relation' then 1 when 'keyrange' then 2 when 'row' then 3 end, mode desc, locktype desc;
 transactionid | blocked_by | relkind | locktype |         mode         |      relname       |                  ?column?
---------------+------------+---------+----------+----------------------+--------------------+---------------------------------------------
 02a108e3      |            | r       | relation | WEAK_READ,WEAK_WRITE | users              | null
 02a108e3      |            | r       | relation | STRONG_READ          | users              | null
 02a108e3      |            | r       | row      | STRONG_WRITE         | users              | ["1"]
 02a108e3      |            | r       | row      | STRONG_READ          | users              | ["1"]
 02a108e3      |            | i       | relation | WEAK_WRITE           | users_username_idx | null
 02a108e3      |            | i       | keyrange | WEAK_WRITE           | users_username_idx | ["\"alice\""]
 02a108e3      |            | i       | row      | STRONG_WRITE         | users_username_idx | ["\"alice\"", "\"H\\x80\\x00\\x00\\x01!\""]
 5b82a9e3      | ["02a108   | r       | relation | STRONG_READ          | users              | null

Enter fullscreen mode Exit fullscreen mode

As a consequence of it, simply adding the index in YugabyteDB doesn't reduce the wait. If we want to minimize the latency for this use case, it is preferable to define the username as a primary key rather than the generated identifier:

create table users(
  id serial not null unique,
  username varchar not null primary key
);

Enter fullscreen mode Exit fullscreen mode

In PostgreSQL, table rows are stored in heap tables, and all indexes, including the primary key, are considered secondary indexes. However, in YugabyteDB, the rows are stored in the primary key LSM Tree, so defining the primary key for the main access pattern is essential. There are no limitations because all secondary indexes are global and shared on their key. Another reason the primary key matters in YugabyteDB is to improve serializable locks. In PostgreSQL, even if the scan predicates define the read intents, they are stored in memory. On the other hand, YugabyteDB is distributed and doesn't use a single-node shared memory, so it stores the intents with the table row in the primary key index.

With such a definition, both transactions can be committed without waiting for the other:


COMMIT
Time: 147.917 ms

yugabyte=*# commit;
COMMIT

Enter fullscreen mode Exit fullscreen mode

Here are the locks just before the commits:

 transactionid | blocked_by | relkind | locktype |         mode         |   relname    |    ?column?
---------------+------------+---------+----------+----------------------+--------------+-----------------
 350a076c      |            | r       | relation | WEAK_READ,WEAK_WRITE | users        | null
 350a076c      |            | r       | relation | WEAK_READ            | users        | null
 350a076c      |            | r       | row      | STRONG_WRITE         | users        | ["\"alice\""]
 350a076c      |            | r       | row      | STRONG_READ          | users        | ["\"alice\""]
 350a076c      |            | r       | row      | STRONG_READ          | users        | ["\"alice\""]
 350a076c      |            | i       | relation | WEAK_READ,WEAK_WRITE | users_id_key | null
 350a076c      |            | i       | keyrange | WEAK_READ,WEAK_WRITE | users_id_key | ["1"]
 350a076c      |            | i       | row      | STRONG_WRITE         | users_id_key | ["1", "null"]
 350a076c      |            | i       | row      | STRONG_READ          | users_id_key | ["1", "null"]
 97f5bd76      |            | r       | relation | WEAK_READ,WEAK_WRITE | users        | null
 97f5bd76      |            | r       | relation | WEAK_READ            | users        | null
 97f5bd76      |            | r       | row      | STRONG_WRITE         | users        | ["\"bob\""]
 97f5bd76      |            | r       | row      | STRONG_READ          | users        | ["\"bob\""]
 97f5bd76      |            | r       | row      | STRONG_READ          | users        | ["\"bob\""]
 97f5bd76      |            | i       | relation | WEAK_READ,WEAK_WRITE | users_id_key | null
 97f5bd76      |            | i       | keyrange | WEAK_READ,WEAK_WRITE | users_id_key | ["101"]
 97f5bd76      |            | i       | row      | STRONG_WRITE         | users_id_key | ["101", "null"]
 97f5bd76      |            | i       | row      | STRONG_READ          | users_id_key | ["101", "null"]

Enter fullscreen mode Exit fullscreen mode

This situation does not involve any blocking. The STRONG_READ intents only affect individual rows, and each transaction accesses different rows.

The serializable isolation level can be scalable. However, like many other aspects of SQL database performance, its performance depends on the definition of primary keys and secondary indexes.

Top comments (0)