If you require a no-gap numbering system, a sequence is not an option (see reasons here). Instead, you can use a table with fixed rows that you update manually. However, let's consider a scenario where you cannot have fixed rows and must read the last value from the table to calculate the next number.
Which isolation level do you think you need? You must be sure that the max() you read stays the same until you commit the max()+1 insert. This usually requires the Serializable isolation level, which orders the transactions one after the other: you don't want a phantom read to appear after you have calculated the next value.
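To make this concrete, here is a minimal sketch of the read-then-insert pattern under Serializable, assuming the same demo table used below and an application-side retry loop (the literal values are illustrative):

```sql
begin isolation level serializable;
-- read the current maximum for this type
select coalesce(max(num), 0) + 1 as nextnum
  from demo where type = 'x';
-- insert the value computed above (here assumed to be 1)
insert into demo(type, num) values ('x', 1);
commit;
-- if a concurrent transaction read the same max(), one of the two
-- fails with a serialization error and must be retried by the application
```

The rest of this post shows how to get the same guarantee without paying for Serializable.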
I have previously discussed how an index can improve performance under the Serializable isolation level. Here, I will create an index that achieves serialization without using the Serializable isolation level.
Here is a simple table:
create table demo (
primary key(id)
, id int generated always as identity
, type varchar(10)
, num int
, unique(type, num)
);
My business rule states that each "type" is assigned a number, starting from one for the first row of that type and increasing, without gaps, in the order in which the rows were committed.
The following query can do that:
explain
insert into demo(type,num)
select type, nextnum from (
select 'x' as type , coalesce(max(num),0)+1 as nextnum
from demo
where type='x'
) next
;
Under the Serializable isolation level, the SELECT acquires a lock to declare its read intent so that a concurrent write can detect a conflict between the read and write states. In PostgreSQL, this is a predicate lock; in YugabyteDB, a lock on a key range. As I showed in the previous posts of this series, an index can avoid escalating this lock to the whole table. Because I defined a unique constraint here, an index was created for it. The same index is also used to get the last value quickly, as it is always at the end of the index.
Without a unique index, an isolation level lower than Serializable could not detect the conflict when two transactions read the same last value simultaneously and then attempt to insert the same number, which would lead to duplicate entries. At the Serializable isolation level, such an anomaly is detected as a read-write conflict, preventing duplicates. Because the index is unique, the same anomaly is caught as a write-write conflict when two sessions try to insert the same value. For this reason, our logic also works at the Read Committed isolation level.
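To illustrate, suppose two sessions run the same statement concurrently in Read Committed (the constraint name matches the index visible in the execution plans below):

```sql
-- Session A and session B both execute, at about the same time:
insert into demo(type, num)
select 'x', coalesce(max(num), 0) + 1
  from demo where type = 'x';
-- Both may read the same max(). The first to commit succeeds.
-- The second fails on the unique(type, num) constraint:
--   ERROR: duplicate key value violates unique constraint "demo_type_num_key"
-- The application retries the statement and gets the next number, gap-free.
```

The retry is the price of gap-free numbering, but the unique index guarantees that no duplicate can ever be committed.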
In terms of performance, it scans only one index entry, and only one Flush Request is required to obtain the next value and insert the new row, so it adds no extra latency:
yugabyte=# delete from demo;
DELETE 10002
yugabyte=# -- add many types
yugabyte=# insert into demo(type, num) select format('#%s',type),num from generate_series(1,100) as type, generate_series(1,100) as num;
INSERT 0 10000
yugabyte=# -- prepared statement to call two times
yugabyte=# prepare insert_new(text) as
insert into demo(type,num)
select type, nextnum from (
select $1 as type , coalesce(max(num),0)+1 as nextnum
from demo where type=$1
) next returning *
;
yugabyte=# -- call a new type
yugabyte=# explain (analyze, dist) execute insert_new('#0');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Insert on demo (cost=4.12..4.15 rows=1 width=46) (actual time=0.483..0.484 rows=1 loops=1)
Storage Table Write Requests: 1
Storage Index Write Requests: 1
-> Subquery Scan on next (cost=4.12..4.15 rows=1 width=46) (actual time=0.437..0.438 rows=1 loops=1)
-> Result (cost=4.12..4.13 rows=1 width=36) (actual time=0.432..0.432 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..4.12 rows=1 width=4) (actual time=0.429..0.429 rows=0 loops=1)
-> Index Only Scan Backward using demo_type_num_key on demo demo_1 (cost=0.00..4.12 rows=1 width=4) (actual time=0.427..0.427 rows=0 loops=1)
Index Cond: ((type = '#0'::text) AND (num IS NOT NULL))
Heap Fetches: 0
Storage Index Read Requests: 1
Storage Index Read Execution Time: 0.327 ms
Planning Time: 0.217 ms
Execution Time: 6.016 ms
Storage Read Requests: 1
Storage Read Execution Time: 0.327 ms
Storage Rows Scanned: 0
Storage Write Requests: 2
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 1
Storage Flush Execution Time: 5.419 ms
Storage Execution Time: 5.746 ms
Peak Memory Usage: 56 kB
(24 rows)
yugabyte=# -- call an existing type
yugabyte=# explain (analyze, dist) execute insert_new('#1');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Insert on demo (cost=4.12..4.15 rows=1 width=46) (actual time=0.568..0.569 rows=1 loops=1)
Storage Table Write Requests: 1
Storage Index Write Requests: 1
-> Subquery Scan on next (cost=4.12..4.15 rows=1 width=46) (actual time=0.516..0.517 rows=1 loops=1)
-> Result (cost=4.12..4.13 rows=1 width=36) (actual time=0.511..0.512 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..4.12 rows=1 width=4) (actual time=0.507..0.508 rows=1 loops=1)
-> Index Only Scan Backward using demo_type_num_key on demo demo_1 (cost=0.00..4.12 rows=1 width=4) (actual time=0.506..0.506 rows=1 loops=1)
Index Cond: ((type = '#1'::text) AND (num IS NOT NULL))
Heap Fetches: 0
Storage Index Read Requests: 1
Storage Index Read Execution Time: 0.387 ms
Storage Index Rows Scanned: 1
Planning Time: 0.198 ms
Execution Time: 5.136 ms
Storage Read Requests: 1
Storage Read Execution Time: 0.387 ms
Storage Rows Scanned: 1
Storage Write Requests: 2
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 1
Storage Flush Execution Time: 3.766 ms
Storage Execution Time: 4.153 ms
Peak Memory Usage: 56 kB
(25 rows)
In both cases, there is one Read Request to read the maximum value, two Write Requests to insert into the table and the index, and only one Flush Request to wait on the Raft consensus, which is the latency to the followers.
I used a secondary index to evaluate the worst-case scenario. However, the primary key will likely be (type, num). In YugabyteDB, unlike in PostgreSQL, the table is stored in the primary key structure, which means that an insert is a single write operation. This design has the advantage of colocating rows with the same type, which are likely to be queried together.
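As a sketch, the variant with (type, num) as the primary key would look like this (in YugabyteDB the first primary-key column is hash-sharded by default, which keeps rows of the same type together):

```sql
create table demo (
  type varchar(10)
, num int
, primary key(type, num)
);
```

With this definition, the insert writes only the primary-key structure: there is no separate secondary index to maintain.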
Remember that the SQL isolation levels were defined without considering Multi-Version Concurrency Control (MVCC) reads, explicit locks, or indexes, as these are not part of the SQL standard. To achieve efficient and consistent concurrency control, you must understand how your database works and how it acquires locks. This is also why YugabyteDB is runtime-compatible with PostgreSQL, ensuring the same behavior. Wire-protocol and dialect compatibility are not sufficient to migrate applications successfully between PostgreSQL-compatible databases.