I have made this mistake exactly once. About three years into my AWS career, I inherited a Lambda-based API with DynamoDB on the backend and was tasked with migrating it to Aurora PostgreSQL - the data model had grown relational and the team wanted proper foreign key constraints.
The migration went smoothly in UAT. We promoted to production on a Tuesday night. By Thursday morning, Lambda concurrency was exhausted, Aurora was throwing connection pool errors, and I was sitting in a war room with the CTO trying to explain why a database migration - not a code change - had caused a full API outage at 80K RPS.
I had never tested what the connection behaviour would look like at production load. I had assumed it would be fine.
It was not fine.
You cannot assume your way through database selection at scale.
The methodology
I ran a structured simulation comparison using pinpole's pre-deployment canvas: three separate canvases - one for DynamoDB, one for RDS/Aurora, and a third for Aurora Serverless v2 as a middle-ground comparison.
Canvas topology (all configurations):
Route 53 → CloudFront → API Gateway → Lambda → [DynamoDB | RDS + Proxy]
WAF (in front of CloudFront) · SQS (write decoupling path) · ElastiCache (RDS scenario)
Note on RDS Proxy: If you're running Lambda against RDS at any significant scale without RDS Proxy managing the connection pool, you will exhaust database connections under burst load - every concurrent Lambda execution environment opens its own connection. This is essentially the architecture bug that put me in that Thursday morning war room. RDS Proxy is always present in the RDS/Aurora configurations below.
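For concreteness, a minimal sketch of the Lambda-through-Proxy path, assuming psycopg2 and environment variable names of my own choosing (none of these identifiers come from the original system):

```python
import os
import psycopg2

# Module-level connection, reused across warm invocations. RDS Proxy
# multiplexes these against a bounded pool of real database connections,
# so thousands of concurrent Lambdas no longer mean thousands of
# Postgres connections.
conn = psycopg2.connect(
    host=os.environ["RDS_PROXY_ENDPOINT"],  # the proxy endpoint, not the DB's
    dbname=os.environ["DB_NAME"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    connect_timeout=5,
)
conn.autocommit = True

def handler(event, context):
    # Hypothetical query against a hypothetical table.
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM orders")
        (count,) = cur.fetchone()
    return {"statusCode": 200, "body": str(count)}
```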
All configurations at production-realistic specs: Lambda at 1,769 MB (1 vCPU equivalent), 30-second timeout; API Gateway at 10K RPS burst limit; DynamoDB in both on-demand and provisioned modes; RDS PostgreSQL on db.r6g instances; Aurora Serverless v2 with ACU limits appropriate to each tier.
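As a concrete anchor for the provisioned mode, here's roughly what that table definition looks like in boto3 - table name and key schema are hypothetical; the capacity units match the 10K RPS row in the first results table:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="orders",  # hypothetical
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={
        "ReadCapacityUnits": 9_000,   # the 9K RCU / 1K WCU tier below
        "WriteCapacityUnits": 1_000,
    },
)

# The on-demand variant is the same table with a different billing mode:
# dynamodb.update_table(TableName="orders", BillingMode="PAY_PER_REQUEST")
```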
I ran four traffic patterns at each RPS level: Constant (steady baseline), Ramp (linear growth to peak over 10 minutes), Spike (sudden 10× burst), and Wave (oscillating between 30% and 100% of peak). Each run saved to execution history for comparison.
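For reference, the four shapes reduce to simple functions of time. This is my reconstruction of the patterns as described, not pinpole's implementation; every parameter is illustrative:

```python
import math

PEAK_RPS = 10_000  # example tier; the same shapes ran at 100K and 1M

def constant(t: float) -> float:
    return float(PEAK_RPS)

def ramp(t: float, ramp_seconds: float = 600) -> float:
    # Linear growth to peak over 10 minutes, then hold.
    return PEAK_RPS * min(t / ramp_seconds, 1.0)

def spike(t: float, spike_at: float = 300, multiplier: float = 10) -> float:
    # Sudden 10x burst; when it lands is an arbitrary choice here.
    return PEAK_RPS * multiplier if t >= spike_at else float(PEAK_RPS)

def wave(t: float, period: float = 600) -> float:
    # Oscillate between 30% and 100% of peak.
    mid, amp = 0.65, 0.35
    return PEAK_RPS * (mid + amp * math.sin(2 * math.pi * t / period))
```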
Results at 10K RPS
| Configuration | p50 | p99 | Monthly Estimate |
|---|---|---|---|
| DynamoDB on-demand | 2ms | 7ms | ~$8,400 |
| DynamoDB provisioned (9K RCU / 1K WCU) | 2ms | 7ms | ~$1,150 |
| RDS PostgreSQL db.r6g.2xlarge + Proxy | 3ms | 11ms | ~$870 |
| Aurora MySQL db.r6g.2xlarge + Proxy | 4ms | 13ms | ~$980 |
| Aurora Serverless v2 (avg 4 ACU) | 4ms | 15ms | ~$720 |
The biggest surprise at 10K RPS: DynamoDB on-demand is nearly 10× more expensive than a well-configured RDS instance for sustained, predictable traffic. DynamoDB's reputation as the "serverless database" leads engineers to assume it is cheap at modest scales. For a product with consistent diurnal load patterns, provisioned capacity is rarely the wrong answer, and on-demand is rarely the right one.
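The arithmetic behind that gap is worth a back-of-envelope check. The sketch below assumes a 90/10 read/write mix and illustrative on-demand list prices of $0.25/$1.25 per million reads/writes - both assumptions are mine, and DynamoDB pricing changes, so verify current numbers:

```python
SECONDS_PER_MONTH = 86_400 * 30
RPS = 10_000
READ_FRACTION = 0.9  # assumed read-heavy mix; not stated in the runs above

# Illustrative on-demand list prices per million request units.
READ_PRICE_PER_M, WRITE_PRICE_PER_M = 0.25, 1.25

requests = RPS * SECONDS_PER_MONTH            # ~25.9B requests/month
reads = requests * READ_FRACTION
writes = requests * (1 - READ_FRACTION)

on_demand = reads / 1e6 * READ_PRICE_PER_M + writes / 1e6 * WRITE_PRICE_PER_M
print(f"~${on_demand:,.0f}/month")            # ~$9,000 - same order as the table
```

Provisioned capacity bills per capacity-hour rather than per request, which is where the roughly 7× gap against the provisioned row comes from: for steady traffic you pay for throughput once, not 26 billion times.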
Under the Spike pattern (10K → 100K instantaneous), DynamoDB on-demand absorbed the spike without configuration changes. RDS PostgreSQL with a fixed instance showed connection pool pressure - p99 climbed to 38ms for about 90 seconds.
Results at 100K RPS
| Configuration | p50 | p99 | Monthly Estimate |
|---|---|---|---|
| DynamoDB provisioned (auto-scaling) | 2ms | 9ms | ~$4,200 |
| RDS PostgreSQL db.r6g.8xlarge + Proxy | 3ms | 14ms | ~$3,100 |
| Aurora MySQL db.r6g.8xlarge + Proxy | 4ms | 16ms | ~$3,400 |
| Aurora Serverless v2 (avg 18 ACU) | 4ms | 19ms | ~$2,900 |
At 100K RPS, DynamoDB on-demand becomes structurally expensive: the per-request pricing that looks benign at 10K RPS scales linearly with traffic, while fixed-instance costs do not. Provisioned DynamoDB with auto-scaling changes the picture significantly, and RDS is still competitive - fixed instance overhead is now amortised more efficiently.
The Spike pattern at this tier produced the most diagnostic information. DynamoDB auto-scaling took 3-7 minutes to respond fully - during which pinpole flagged elevated p99 and recommended more aggressive scale-out settings. This is behaviour you need to discover before deployment, not during an incident.
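One way to express "more aggressive scale-out" in practice is a target-tracking policy with a lower utilisation target via Application Auto Scaling. A sketch - the table name, capacity bounds, and 50% target are illustrative choices of mine, not pinpole's output:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/orders",  # hypothetical table
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=9_000,
    MaxCapacity=120_000,
)

autoscaling.put_scaling_policy(
    PolicyName="orders-read-scale-out",
    ServiceNamespace="dynamodb",
    ResourceId="table/orders",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # A lower target utilisation scales out earlier, at the cost of
        # carrying more headroom. 70 is the usual default guidance.
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```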
Results at 1M RPS
| Configuration | p50 | p99 | Monthly Estimate |
|---|---|---|---|
| DynamoDB provisioned (high WCU, DAX) | 1ms | 4ms | ~$28,000 |
| RDS PostgreSQL read replicas + Proxy | 4ms | 22ms | ~$18,000 |
| Aurora Global + Proxy | 3ms | 15ms | ~$24,000 |
At 1M RPS, DynamoDB with provisioned capacity and DAX caching is competitive on cost and substantially superior on latency. The operational complexity of the RDS path has increased materially - you now need read replicas, connection pooling strategy, and careful instance sizing - while the cost gap has narrowed.
The actual decision framework
Database selection cannot be made responsibly without running the numbers at your anticipated traffic volume. The right answer at 10K RPS is sometimes the wrong answer at 100K RPS. The three factors that matter (crudely codified in the sketch after this list):
1. Access pattern complexity. If your queries require joins, complex filtering, or ad-hoc analytical access, RDS is the correct starting point regardless of the cost model. DynamoDB's cost advantage evaporates if you are engineering around its access pattern constraints.
2. Traffic predictability. Predictable diurnal load → provisioned DynamoDB or fixed RDS instance. Genuinely unpredictable or bursty traffic → DynamoDB on-demand or Aurora Serverless v2. Do not pay on-demand pricing for predictable traffic.
3. Scale trajectory. If you are at 10K RPS today and heading for 1M RPS in 18 months, the database you choose now needs to perform well at that scale. Running the 1M RPS simulation before making the 10K RPS decision is an hour of canvas work, not a Thursday morning war room.
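And the crude codification - a first-pass filter under assumptions of my own (Python, invented names), not a substitute for actually running the simulations:

```python
def suggest_database(needs_joins: bool, traffic_predictable: bool,
                     target_rps: int) -> str:
    if needs_joins:
        # Factor 1: relational access patterns trump the cost model.
        return "RDS/Aurora PostgreSQL + RDS Proxy"
    if not traffic_predictable:
        # Factor 2: pay per request only when load is genuinely unknowable.
        return "DynamoDB on-demand (or Aurora Serverless v2)"
    # Factor 3: provision for where the trajectory ends, not where it starts.
    if target_rps >= 1_000_000:
        return "DynamoDB provisioned + DAX"
    return "DynamoDB provisioned with auto-scaling"
```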
Full comparison with complete per-node metrics at each tier, Aurora Serverless v2 deep-dive, and the access pattern decision matrix →