
Franck Pachot for AWS Heroes


Scalable Sequence for PostgreSQL

A common best practice for scalability is to avoid sequences and use a UUID for the primary key. However, there are good reasons to use SQL sequences: they are smaller, they guarantee uniqueness (a real mathematical guarantee, not just a high probability), and they cluster rows that are inserted together, which is good for B-Tree indexes and possibly for later queries. I explained in a previous post that sequences can be scalable thanks to caching.

However, PostgreSQL caches sequence values per connection (unlike Oracle, which caches them in the instance's shared memory). Sequences can become a bottleneck during a connection storm, where all sessions start with a cold cache, and here is a workaround. Note that reconnecting in PostgreSQL is expensive for many other reasons, and it is better to fix that with a connection pool that is not too dynamic (if you need it to be dynamic, you should have a look at FlexiPool).

Nextval() overload

The PostgreSQL nextval() function takes a regclass as input. If a text is provided, it is cast to the regclass of the sequence with that name.

I'll extend this with a function that takes a name as text and looks for a specific suffix: % followed by a number.

  • If there's no such suffix, the postgres function is called with name::regclass.
  • If there is such a suffix, I replace the %[0-9]+ part with a random number modulo the value passed after the %.

For example, if I call it with seq%4, it will, at random, take the next value from seq0, seq1, seq2, or seq3. Of course, they must be created beforehand.

The idea is that when I declare something like:

create table demo ( id bigint default nextval('seq%4') primary key)

where "seq%4" is not a sequence, it will read from any of the 4 sequences created, from seq0 to seq3.

As my goal is to distribute the sequences that are read during a connection storm (all with a cold cache), instead of a random number I use pg_backend_pid(), so that the same session always uses the same sequence while different sessions avoid the hotspot.
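The pid-to-sequence mapping can be checked directly in a session (a quick illustration, not part of the setup):

```sql
-- which of the 4 sequences this session would use for 'seq%4'
select pg_backend_pid() as pid,
       pg_backend_pid() % 4 as bucket,
       'seq' || (pg_backend_pid() % 4)::text as sequence_name;
```

The bucket stays the same for the lifetime of the connection, so a session keeps hitting the same sequence and benefits from its cache.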

Here is the function:

create function nextval(name text) returns bigint as $$
declare
 prefix text;
 buckets int;
 sequence regclass;
begin
 -- strip a trailing %<number> suffix, if any
 prefix := regexp_replace(name,'^(.*)%([0-9]+)$','\1');
 if prefix = name then
  -- no %<number> suffix: behave like the regular nextval()
  sequence := name::regclass;
 else
  -- pick one of the <buckets> sequences, based on the backend pid
  buckets := regexp_replace(name,'^(.*)%([0-9]+)$','\2');
  sequence := (prefix||(pg_backend_pid()%buckets)::text)::regclass;
 end if;
 return pg_catalog.nextval(sequence);
end;
$$ language plpgsql;


Multiple interleaved sequences

In this demo, I'll scale to 8 sequences. I create 8 sequences prefixed with seq and numbered from 0 to 7. I can then call my function with select nextval('seq%8'), as it will replace the suffix with a modulo-8 bucket.

Because the goal is to have unique numbers, each sequence must start from a different number, 0 to 7 when starting with no data, and will increment by 8. I use a cache of 100, so that the first read from seq0 caches 100 multiples of 8, from 0 to 792. seq1 will have the numbers from 1 to 793, and so on.
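The interleaving of the cached ranges can be visualized with generate_series (a quick sketch of the arithmetic, not part of the setup):

```sql
-- the 100 values cached by seq0 and seq1 (start 0 and 1, increment 8, cache 100)
select 0 + 8*g as seq0_value, 1 + 8*g as seq1_value
from generate_series(0,99) g;
```

Each sequence serves one residue class modulo 8, so the eight sequences together produce every integer exactly once.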


yugabyte=# select format(
'create sequence if not exists %I minvalue %s start with %s cache 100 increment by %s'
,'seq'||n,n,n,1+max(n)over()) as "CREATE SEQUENCE"
from generate_series(0,7) n;

                                   CREATE SEQUENCE
-------------------------------------------------------------------------------------
 create sequence if not exists seq0 minvalue 0 start with 0 cache 100 increment by 8
 create sequence if not exists seq1 minvalue 1 start with 1 cache 100 increment by 8
 create sequence if not exists seq2 minvalue 2 start with 2 cache 100 increment by 8
 create sequence if not exists seq3 minvalue 3 start with 3 cache 100 increment by 8
 create sequence if not exists seq4 minvalue 4 start with 4 cache 100 increment by 8
 create sequence if not exists seq5 minvalue 5 start with 5 cache 100 increment by 8
 create sequence if not exists seq6 minvalue 6 start with 6 cache 100 increment by 8
 create sequence if not exists seq7 minvalue 7 start with 7 cache 100 increment by 8
(8 rows)


yugabyte=# \gexec
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
yugabyte=#


Testing single sequence

Of course, as I overload the postgres function, I must guarantee the same behavior when it is called with a text:

yugabyte=# select nextval('seq1'),* from seq1;
 nextval | last_value | log_cnt | is_called
---------+------------+---------+-----------
       1 |          1 |       0 | f
(1 row)

yugabyte=# select nextval('seq1'),* from seq1;
 nextval | last_value | log_cnt | is_called
---------+------------+---------+-----------
       9 |        793 |       0 | t
(1 row)

yugabyte=# \connect
psql (13.7, server 11.2-YB-2.17.0.0-b0)
You are now connected to database "yugabyte" as user "yugabyte".
yugabyte=# select nextval('seq1'),* from seq1;
 nextval | last_value | log_cnt | is_called
---------+------------+---------+-----------
     801 |        793 |       0 | t
(1 row)

yugabyte=# select nextval('seq1'),* from seq1;
 nextval | last_value | log_cnt | is_called
---------+------------+---------+-----------
     809 |       1593 |       0 | t
(1 row)


This is the normal behavior in PostgreSQL: the cache is per session. If I reconnect, a new range is fetched to warm the cache. You can see that when querying the sequence itself: the first call to nextval() returned 1 (the start value) and updated the sequence to 793 (a skip of cache × increment by). The next calls within the same session get those 99 remaining numbers without accessing the sequence. Another connection starts at the next value after 793 and updates the sequence to 1593.

Note that if I call the function with a regclass, it will directly call the postgres function. You can use that if you have a real sequence whose name ends with % followed by a number. Or you can change the separator; I used % to look like a modulo.
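For example (a quick check, assuming the seq1 sequence created above):

```sql
-- an explicit cast to regclass bypasses the text overload
-- and goes straight to pg_catalog.nextval()
select nextval('seq1'::regclass);
```

This reads only from seq1, regardless of the backend pid.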

However, the goal of these sequences created with increment by 8 is to be used through my function with seq%8.

Testing scalable sequence

Now I'll call the function with my special pattern nextval('seq%8'):

yugabyte=# select nextval('seq%8');
 nextval
---------
       2
(1 row)

yugabyte=# \connect
psql (13.7, server 11.2-YB-2.17.0.0-b0)
You are now connected to database "yugabyte" as user "yugabyte".
yugabyte=# select nextval('seq%8');
 nextval
---------
       5
(1 row)

yugabyte=# \connect
psql (13.7, server 11.2-YB-2.17.0.0-b0)
You are now connected to database "yugabyte" as user "yugabyte".
yugabyte=# select nextval('seq%8');
 nextval
---------
     805
(1 row)

yugabyte=# \connect
psql (13.7, server 11.2-YB-2.17.0.0-b0)
You are now connected to database "yugabyte" as user "yugabyte".
yugabyte=# select nextval('seq%8');
 nextval
---------
       0
(1 row)

As you can see from the numbers picked up, each new connection read from a different sequence, except the third one, whose pid had the same value modulo 8 and therefore read from the same sequence, with a cold cache (hence the jump to 805).

Unique and minimal gap

I want to be sure that there are no duplicates, and I also expect minimal gaps when rows are inserted from the same session.

I create a table with nextval('seq%8') as the default value for the id and insert 1000 rows from many sessions.

yugabyte=# create table demo (
            id bigint default nextval('seq%8') primary key
            , n int
);
CREATE TABLE

yugabyte=# insert into demo(n) select generate_series(1,1000);
INSERT 0 1000
yugabyte=# \connect
psql (13.7, server 11.2-YB-2.17.0.0-b0)
You are now connected to database "yugabyte" as user "yugabyte".

yugabyte=# insert into demo(n) select generate_series(1,1000);
INSERT 0 1000
yugabyte=# \connect
psql (13.7, server 11.2-YB-2.17.0.0-b0)
You are now connected to database "yugabyte" as user "yugabyte".

...

I've run those inserts and reconnects many times, from multiple sessions. There were no duplicate key errors, and here is a range of ids:

(screenshot: a range of consecutive ids from the demo table)

This looks as if it were generated from one sequence with increment by 1, but it has the advantage of distributing the access across multiple sequences when they need to cache another range.
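These properties can be verified with a couple of queries on the demo table (a quick check I would run after the inserts; it assumes the table and 8 sequences from above):

```sql
-- verify uniqueness: both counts must match
select count(*) as total, count(distinct id) as distinct_ids
from demo;

-- see how the ids are spread across the 8 sequences
select id % 8 as bucket, count(*) as rows, min(id), max(id)
from demo
group by bucket
order by bucket;
```

A skewed bucket distribution is expected with few sessions (each session sticks to one bucket); with many sessions, the pids spread the load across all 8 sequences.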

Why and When?

A cached sequence is probably enough for scalability in a well-designed application (using a connection pool). With a cache of 100, the sequence is read and updated only once every 100 calls. However, if your connection pool is too dynamic and starts hundreds of connections at the same time, they will all access the sequence at the same time.

Another reason to use this is when you don't use a cached sequence (defined with cache 1, which is the PostgreSQL default). But, again, this is probably not the best design. A cached sequence has another advantage: concurrent sessions generate IDs in different ranges, which may reduce some hotspots on the B-Tree that supports the primary key.

In YugabyteDB, you don't have this hotspot problem, as you will probably use hash sharding on the ID. But reading from multiple sequences can be a nice addition to caching, especially in geo-distributed clusters where latency adds to the contention in case of connection storms.

Note that this is different from Oracle Database Scalable Sequences, which are more like partitioned sequences whose goal is to avoid B-Tree hotspots. With my function, the numbers are interleaved by the increment by setting, to keep the values low.
