I compare Aurora DSQL to DynamoDB rather than to other SQL databases. One reason DynamoDB delivers predictable performance is its many limits. In contrast to SQL databases, where a single connection can consume a lot of memory, lock multiple tables, and overload the CPU, DynamoDB restricts operations to getting a single item, querying a single partition key, or scanning a table.
Aurora DSQL requires only one synchronization per commit, which reduces the impact of multi-region latency. However, DML operations affecting an unbounded number of rows cannot be executed in a single flush. For comparison, YugabyteDB handles simple statements similarly: all writes are buffered and then flushed at once to minimize Raft consensus latency. But if you insert ten thousand rows, the buffer may flush multiple times, as buffers must be limited in size to avoid excessive memory usage.
For instance, when I insert one thousand rows in YugabyteDB, I observe a single Flush Request for those thousand Write Requests, which takes 19.848 ms. This latency is similar to the single-row latency mentioned in a previous post when synchronizing between us-east-1 and us-east-2, and it is the same for YugabyteDB and Aurora DSQL:
yugabyte=# explain (analyze, dist, costs off)
insert into demo (a) select generate_series(1,1000)
;
QUERY PLAN
--------------------------------------------------------------------------------
Insert on demo (actual time=6.540..6.540 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (actual time=0.018..2.141 rows=1000 loops=1)
-> ProjectSet (actual time=0.003..0.103 rows=1000 loops=1)
-> Result (actual time=0.001..0.001 rows=1 loops=1)
Storage Table Write Requests: 1
Planning Time: 0.037 ms
Execution Time: 39.303 ms
Storage Read Requests: 0
Storage Rows Scanned: 0
Storage Write Requests: 1000
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 1
Storage Flush Execution Time: 31.922 ms
Storage Execution Time: 31.922 ms
Peak Memory Usage: 56 kB
(16 rows)
Inserting a single row or a thousand rows results in similarly predictable response times.
When I insert one million rows into a YugabyteDB table, 326 flush requests are generated because the write buffer is limited to 3,072 rows, the value of ysql_session_max_batch_size (1,000,000 / 3,072 ≈ 325.5, so 326 flushes). As a result, the insert takes a few seconds, as each flush waits for a Raft sync:
yugabyte=# explain (analyze, dist, costs off)
insert into demo (a) select generate_series(1,1000000)
;
QUERY PLAN
--------------------------------------------------------------------------------------
Insert on demo (actual time=12229.075..12229.076 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (actual time=0.019..2902.648 rows=1000000 loops=1)
-> ProjectSet (actual time=0.003..145.900 rows=1000000 loops=1)
-> Result (actual time=0.001..0.001 rows=1 loops=1)
Storage Table Write Requests: 1
Planning Time: 0.039 ms
Execution Time: 12301.987 ms
Storage Read Requests: 0
Storage Rows Scanned: 0
Storage Write Requests: 1000000
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 326
Storage Flush Execution Time: 153.357 ms
Storage Execution Time: 153.357 ms
Peak Memory Usage: 56 kB
(16 rows)
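To confirm that the buffer size drives the number of flushes, one could lower it for the session and re-run the thousand-row insert. This is only a sketch that I have not run here, and it assumes that ysql_session_max_batch_size is also exposed as a YSQL session parameter:
-- assumption: ysql_session_max_batch_size can be set at the session level
set ysql_session_max_batch_size = 100;
-- with a 100-row buffer, the 1,000-row insert should report about 10 flush requests
explain (analyze, dist, costs off)
insert into demo (a) select generate_series(1,1000);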
There are additional reasons to flush before the buffer is full, as I outlined in my article on Write Buffering to Reduce Raft Consensus Latency in YugabyteDB. However, these reasons pertain to features unavailable in Aurora DSQL, such as Read Committed isolation level, Savepoints, PL/pgSQL, and Triggers. Therefore, I will not include them in this comparison. Let's focus on the primary reason: the number of rows in a buffer. How does Aurora DSQL ensure a single flush?
I've run the same on Aurora DSQL:
postgres=> create table demo (
id uuid default gen_random_uuid() primary key
, a int, b int, c int
);
CREATE TABLE
Time: 310.909 ms
postgres=> insert into demo (a) select generate_series(1,1000000);
ERROR: 54000: transaction row limit exceeded
Time: 93.550 ms
This may surprise SQL database users, but it's familiar to those who work with DynamoDB: there are limits on the number of writes you can perform in a single call or transaction. While YugabyteDB can handle larger data sets by proportionally increasing the response time through multiple flushes, Aurora DSQL rejects transactions that cannot be completed in one flush.
These limits are well-documented: the quotas section describes a limit of 10,000 rows modified per transaction. This number is approximate. In practice, I can insert a maximum of 10,070 rows into my table in one transaction:
postgres=> insert into demo (a) select generate_series(1,10000);
INSERT 0 10000
Time: 253.437 ms
postgres=> insert into demo (a) select generate_series(1,10070);
INSERT 0 10070
Time: 249.032 ms
postgres=> insert into demo (a) select generate_series(1,10071);
ERROR: 54000: transaction row limit exceeded
Time: 94.322 ms
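To load more rows, the insert has to be split into multiple transactions, each staying under the limit. Here is a minimal sketch, not run here, assuming autocommit so that each statement commits separately:
insert into demo (a) select generate_series(    1, 10000);
insert into demo (a) select generate_series(10001, 20000);
-- ...and so on, keeping each statement at or below roughly 10,000 rows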
I face the same limitation with other DML statements. For instance, a delete operation starts writing delete intents and then fails once it exceeds the row limit:
postgres=> delete from demo;
ERROR: 54000: transaction row limit exceeded
Time: 4990.719 ms (00:04.991)
I need to delete in multiple batches, which ultimately takes about the same time as multiple flushes would, but with the batching controlled by the application rather than the database. This approach has the advantage of giving immediate feedback to the developer: a hint that such bulk operations are not what this database is designed for.
postgres=> with ids as (select id from demo limit 10000)
delete from demo where id in (select id from ids where id is not null);
DELETE 10000
Time: 48810.792 ms (00:48.811)
postgres=> with ids as (select id from demo limit 10000)
delete from demo where id in (select id from ids where id is not null);
DELETE 10000
Time: 37988.087 ms (00:37.988)
postgres=> with ids as (select id from demo limit 10000)
delete from demo where id in (select id from ids where id is not null);
DELETE 70
Time: 328.073 ms
Tables in Aurora DSQL, as in YugabyteDB, are stored in their primary key index. When a row is inserted into a table that has secondary indexes, each index entry that must be written also counts toward the row limit:
postgres=> create index on demo(a);
CREATE INDEX
Time: 248.525 ms
postgres=> create index on demo(b);
CREATE INDEX
Time: 76.789 ms
With the two new indexes, each inserted row produces three writes, one for the table and one for each index, so the limit drops to a maximum of 3,356 rows (10,070 / 3, rounded down):
postgres=> insert into demo (a) select generate_series(1,10000);
ERROR: 54000: transaction row limit exceeded
Time: 85.142 ms
postgres=> insert into demo (a) select generate_series(1,3357);
ERROR: 54000: transaction row limit exceeded
Time: 56.523 ms
postgres=> insert into demo (a) select generate_series(1,3356);
INSERT 0 3356
Time: 154.182 ms
Can you guess how many rows I can update when changing the value of one indexed column? Each updated row produces three write intents: one for the table and two for the index, because updating an index key means deleting the existing index entry and inserting a new one. Updating this single indexed column is therefore as expensive as maintaining two indexes.
postgres=> with ids as (select id from demo limit 3357)
update demo set b=42 where id in (select id from ids where id is not null)
;
ERROR: 54000: transaction row limit exceeded
Time: 11409.024 ms (00:11.409)
postgres=> with ids as (select id from demo limit 3356)
update demo set b=42 where id in (select id from ids where id is not null)
;
UPDATE 3356
Time: 11477.928 ms (00:11.478)
postgres=>
It's not easy to guess because it also depends on the updated values: if the new value is the same as the old one, there is no index key to update.
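A related cross-check that I have not run here: updating a column that is not indexed, such as c, should write only one table intent per row, so, given enough rows in the table, a much larger batch should fit in one transaction than when an indexed column changes:
with ids as (select id from demo limit 10000)
update demo set c = 1 where id in (select id from ids where id is not null);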
During this test, I encountered additional limitations. However, these limitations are not architectural. They simply have not been implemented yet in this preview.
One issue I faced was trying to create a partial index:
postgres=> create index on demo(c) where b is not null ;
ERROR: 0A000: WHERE not supported for CREATE INDEX
The other issue was creating an index on a table that already contains rows:
postgres=> create index on demo(c);
ERROR: 0A000: adding an index on a table that already has data is not supported yet
During this preview, it's essential to differentiate between the limitations that will be lifted when Aurora DSQL becomes Generally Available and those related to architectural trade-offs.
When I compare Aurora DSQL to DynamoDB, I'm thinking about the developer experience rather than the technology. DynamoDB users appreciate the limits in place that help prevent runaway queries. In contrast, SQL users may find such limitations uncomfortable.
A NoSQL database requires upfront design and typically serves a single use case.
In contrast, SQL databases are more flexible and can adapt to evolving applications, allowing for the addition of new indexes to support new use cases. However, in the case of Aurora DSQL, making such changes can disrupt existing use cases.
This database service is intended to be used with one database per microservice, an approach that aligns with the principles of serverless databases and microservices. For general-purpose SQL databases, there are other managed services on AWS, which I described in a previous post.