DEV Community

Franck Pachot for AWS Heroes

Aurora Limitless - Sequences

Sequences are commonly used in SQL to generate unique numbers that stay small and roughly ordered, minimizing the scattering of rows in indexes. However, managing sequences at scale in a distributed database can be difficult, particularly when sharding is involved. Let's examine how this is implemented in Aurora Limitless.
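As a refresher, here is a sketch of typical sequence usage in plain PostgreSQL (the table and sequence names are hypothetical):

```sql
-- Plain PostgreSQL: a sequence backing a surrogate key.
-- nextval() guarantees uniqueness, not gap-free ordering.
CREATE SEQUENCE order_id_seq;

CREATE TABLE orders (
  id      bigint      NOT NULL DEFAULT nextval('order_id_seq') PRIMARY KEY,
  created timestamptz NOT NULL DEFAULT now()
);
```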

I have a two-shard Aurora Limitless cluster:

postgres_limitless=> select * from rds_aurora.limitless_subclusters order by 1;
 subcluster_id | subcluster_type
---------------+-----------------
 2             | router
 3             | router
 4             | shard
 5             | shard
(4 rows)

I have created the PgBench tables as described in the previous post. "pgbench_history" has no primary key:

postgres_limitless=> \d pgbench_history
              Partitioned table "public.pgbench_history"
 Column |            Type             | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
 tid    | integer                     |           |          |
 bid    | integer                     |           |          |
 aid    | integer                     |           |          |
 delta  | integer                     |           |          |
 mtime  | timestamp without time zone |           |          |
 filler | character(22)               |           |          |
Partition key: HASH (aid)
Number of partitions: 2 (Use \d+ to list them.)

Let's try to add a sequence-generated column to use as the primary key. I can create the sequence explicitly, or implicitly with the BIGSERIAL datatype:

postgres_limitless=> alter table pgbench_history add id bigserial;
ALTER TABLE
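BIGSERIAL is shorthand; in plain PostgreSQL it expands to roughly the following sketch:

```sql
-- Approximate explicit equivalent of "add id bigserial" in plain PostgreSQL:
CREATE SEQUENCE pgbench_history_id_seq;
ALTER TABLE pgbench_history
  ADD COLUMN id bigint NOT NULL DEFAULT nextval('pgbench_history_id_seq');
ALTER SEQUENCE pgbench_history_id_seq OWNED BY pgbench_history.id;
```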

The definition of the sequence from the PostgreSQL catalog shows a max value of 250000:

postgres_limitless=> \x
Expanded display is on.
postgres_limitless=> select * from pg_sequences where schemaname='public';
-[ RECORD 1 ]-+-----------------------
schemaname    | public
sequencename  | pgbench_history_id_seq
sequenceowner | postgres
data_type     | bigint
start_value   | 1
min_value     | 1
max_value     | 250000
increment_by  | 1
cycle         | f
cache_size    | 1
last_value    |

Keep in mind that I'm connected to a router, and the PostgreSQL catalog views do not provide a complete picture.

The Aurora Limitless global view provides a definition for the entire cluster:

postgres_limitless=> select * from rds_aurora.limitless_global_sequence_view;
 schema_name |     sequence_name      | lastval | minvalue |      maxvalue       | start | inc | cache
-------------+------------------------+---------+----------+---------------------+-------+-----+-------
 public      | pgbench_history_id_seq | 1000000 |        1 | 9223372036854775807 |     1 |   1 |     1
(1 row)

The max value here is what PostgreSQL would have set: the maximum of a bigint. However, the last value is relatively high at one million, which matches the highest value reserved across all routers.
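If my reading is right, calling nextval() on this router should return a value from the router's own active range (1 to 250000 here), never from another router's range:

```sql
-- Expect a value within this router's active chunk:
select nextval('pgbench_history_id_seq');
```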

Another Aurora Limitless view shows more info:

postgres_limitless=> select * from rds_aurora.limitless_sequence_metadata_view limit 5;
-[ RECORD 1 ]-----+-----------------------
subcluster_id     | 2
sequence_name     | pgbench_history_id_seq
schema_name       | public
active_minvalue   | 1
active_maxvalue   | 250000
reserved_minvalue | 500001
reserved_maxvalue | 750000
chunk_size        | 250000
chunk_state       | 1
local_last_value  |
-[ RECORD 2 ]-----+-----------------------
subcluster_id     | 3
sequence_name     | pgbench_history_id_seq
schema_name       | public
active_minvalue   | 250001
active_maxvalue   | 500000
reserved_minvalue | 750001
reserved_maxvalue | 1000000
chunk_size        | 250000
chunk_state       | 1
local_last_value  |

Subclusters 2 and 3 function as the routers, indicating that there is a sequence for each router rather than for each shard. These sequences have a chunk size of 250000, which is used to reserve ranges of values on each router. The router identified by subcluster_id 2 has an active range of values from 1 to 250000. The router with subcluster_id 3 has an active range from 250001 to 500000.

Aurora Limitless utilizes this metadata to create a PostgreSQL sequence on each router, ensuring that the ranges do not overlap. For instance, I have observed that the router I am connected to has a sequence with a minimum value of 1 and a maximum value of 250000, which corresponds to the definition of the router with subcluster_id 2.
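In plain PostgreSQL terms, it is as if each router held a bounded sequence like the following sketch (an illustration of the non-overlapping ranges, not the actual DDL Aurora Limitless runs):

```sql
-- Per-router sequences with non-overlapping active ranges (illustrative):
CREATE SEQUENCE router2_id_seq MINVALUE 1      MAXVALUE 250000 START 1;
CREATE SEQUENCE router3_id_seq MINVALUE 250001 MAXVALUE 500000 START 250001;
```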

I can confirm my assumption by querying the local version of the sequence metadata view:

postgres_limitless=> select * from rds_aurora.limitless_local_sequence_metadata_view;
-[ RECORD 1 ]-----+-----------------------
subcluster_id     | 2
sequence_name     | pgbench_history_id_seq
schema_name       | public
active_minvalue   | 1
active_maxvalue   | 250000
reserved_minvalue | 500001
reserved_maxvalue | 750000
chunk_size        | 250000
chunk_state       | 1
local_last_value  |

A parameter determines the size of each chunk:

postgres_limitless=> show rds_aurora.limitless_sequence_chunk_size;
 rds_aurora.limitless_sequence_chunk_size
------------------------------------------
 250000
(1 row)

I run PgBench transactions that insert a row into "pgbench_history" with "mtime" taken from the wall clock:

postgres_limitless=> \! pgbench -c 10 -P 15 -t 100000 -r -f simple-update.sql
pgbench (16.2, server 16.4)
starting vacuum...end.
progress: 15.0 s, 4.1 tps, lat 790.825 ms stddev 228.164, 0 failed
progress: 30.0 s, 12.9 tps, lat 780.043 ms stddev 202.696, 0 failed
progress: 45.0 s, 12.2 tps, lat 822.944 ms stddev 239.396, 0 failed
progress: 60.0 s, 12.7 tps, lat 784.624 ms stddev 217.702, 0 failed
progress: 75.0 s, 13.0 tps, lat 768.676 ms stddev 216.321, 0 failed
...

The ten connections are spread across multiple routers. Each router assigns the next value from its own sequence, within its reserved range:

postgres_limitless=> select * from pgbench_history 
                     order by mtime limit 10
;
 tid | bid |  aid  | delta |           mtime            | filler |   id
-----+-----+-------+-------+----------------------------+--------+--------
   1 |   1 | 33344 |  3767 | 2024-11-30 13:59:36.904277 |        |      2
   5 |   1 | 53196 |  3601 | 2024-11-30 13:59:36.90429  |        | 250005
   9 |   1 | 28785 | -1508 | 2024-11-30 13:59:36.904318 |        | 250003
   5 |   1 | 21155 |   -74 | 2024-11-30 13:59:36.904355 |        |      1
   4 |   1 | 18697 |  4201 | 2024-11-30 13:59:36.904436 |        | 250001
   6 |   1 | 66855 | -2115 | 2024-11-30 13:59:36.904513 |        | 250004
   8 |   1 | 82644 |  1249 | 2024-11-30 13:59:36.904715 |        |      3
   5 |   1 | 92352 |  4698 | 2024-11-30 13:59:36.904855 |        | 250002
   3 |   1 |  1526 |  2719 | 2024-11-30 13:59:36.904982 |        | 250006
   8 |   1 |  1474 |  4422 | 2024-11-30 13:59:36.905173 |        | 250007
(10 rows)

I mentioned that I wanted to make this "id" column the primary key for the table, but this is not possible because the table is sharded on another column:

postgres_limitless=> alter table pgbench_history add primary key(id);
ERROR:  PRIMARY KEY constraint on sharded table must include the shard key
DETAIL:  PRIMARY KEY constraint on table "pgbench_history" lacks column "aid" which is part of the shard key.

I can add the sharding key to the primary key definition:

postgres_limitless=> alter table pgbench_history add primary key(id,aid);
ALTER TABLE

This creates an index for fast access by ID. Thanks to the reserved sequence ranges, there should be no duplicates, but the shards do not verify uniqueness across the cluster.
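Since uniqueness rests on the non-overlapping ranges rather than on a constraint checked across shards, a query like this can double-check that no duplicate IDs were generated:

```sql
-- Should return zero rows if the reserved ranges worked as expected:
select id, count(*)
from pgbench_history
group by id
having count(*) > 1;
```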

Note that it is not possible to use a chunk size of 1 to try to get an ordered sequence:

postgres_limitless=> set rds_aurora.limitless_sequence_chunk_size to 1;
SET
postgres_limitless=> alter table pgbench_history add id bigserial;
ERROR:  MINVALUE (1) must be less than MAXVALUE (1)

Sequences should not be used to generate gap-free numbers; their purpose is to generate unique numbers in a scalable manner. Aurora Limitless achieves this by implementing sequences in the routers and utilizing reserved ranges. This approach is analogous to sequence caching, which sets aside ranges of numbers: PostgreSQL caches sequence ranges per connection, while YugabyteDB can cache them per connection or per server node. Aurora Limitless combines the PostgreSQL cache within a single router with an additional mechanism that reserves larger ranges for the routers.
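For comparison, per-connection caching in plain PostgreSQL looks like this; each session pre-allocates a batch of values, producing the same kind of out-of-order, gappy IDs (the CACHE value here is arbitrary):

```sql
-- Plain PostgreSQL: each connection grabs 100 values at a time,
-- so concurrent sessions interleave non-contiguous ranges.
ALTER SEQUENCE pgbench_history_id_seq CACHE 100;
```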
