I have made this mistake exactly once. About three years into my AWS career, I inherited a Lambda-based API with DynamoDB on the backend and was tasked with migrating it to Aurora PostgreSQL - the data model had grown relational and the team wanted proper foreign key constraints.
The migration went smoothly in UAT. We promoted to production on a Tuesday night. By Thursday morning, Lambda concurrency was exhausted, Aurora was throwing connection pool errors, and I was sitting in a war room with the CTO trying to explain why a database migration - not a code change - had caused a full API outage at 80K RPS.
I had never tested what the connection behaviour would look like at production load. I had assumed it would be fine.
It was not fine.
You cannot assume your way through database selection at scale.
The methodology
I ran a structured simulation comparison using pinpole's pre-deployment canvas: three separate canvases - one for DynamoDB, one for RDS/Aurora, and a third for Aurora Serverless v2 as a middle-ground comparison.
Canvas topology (all configurations):
Route 53 → CloudFront → API Gateway → Lambda → [DynamoDB | RDS + Proxy]
WAF (in front of CloudFront) · SQS (write decoupling path) · ElastiCache (RDS scenario)
Note on RDS Proxy: If you're running Lambda against RDS at any significant scale without RDS Proxy managing the connection pool, you will exhaust database connections under burst load - every concurrent Lambda execution environment opens its own connection. This is essentially the architecture bug that put me in that Thursday morning war room. RDS Proxy is always present in the RDS/Aurora configurations below.
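For concreteness, a minimal sketch of the Lambda-through-Proxy path, assuming psycopg2 and environment variable names of my own choosing (none of these identifiers come from the original system):

```python
import os
import psycopg2

# Module-level connection, reused across warm invocations. RDS Proxy
# multiplexes these against a bounded pool of real database connections,
# so thousands of concurrent Lambdas no longer mean thousands of
# Postgres connections.
conn = psycopg2.connect(
    host=os.environ["RDS_PROXY_ENDPOINT"],  # the proxy endpoint, not the DB's
    dbname=os.environ["DB_NAME"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    connect_timeout=5,
)
conn.autocommit = True

def handler(event, context):
    # Hypothetical query against a hypothetical table.
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM orders")
        (count,) = cur.fetchone()
    return {"statusCode": 200, "body": str(count)}
```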
All configurations at production-realistic specs: Lambda at 1,769 MB (1 vCPU equivalent), 30-second timeout; API Gateway at 10K RPS burst limit; DynamoDB in both on-demand and provisioned modes; RDS PostgreSQL on db.r6g instances; Aurora Serverless v2 with ACU limits appropriate to each tier.
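As a concrete anchor for the provisioned mode, here's roughly what that table definition looks like in boto3 - table name and key schema are hypothetical; the capacity units match the 10K RPS row in the first results table:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="orders",  # hypothetical
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={
        "ReadCapacityUnits": 9_000,   # the 9K RCU / 1K WCU tier below
        "WriteCapacityUnits": 1_000,
    },
)

# The on-demand variant is the same table with a different billing mode:
# dynamodb.update_table(TableName="orders", BillingMode="PAY_PER_REQUEST")
```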
I ran four traffic patterns at each RPS level: Constant (steady baseline), Ramp (linear growth to peak over 10 minutes), Spike (sudden 10× burst), and Wave (oscillating between 30% and 100% of peak). Each run saved to execution history for comparison.
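For reference, the four shapes reduce to simple functions of time. This is my reconstruction of the patterns as described, not pinpole's implementation; every parameter is illustrative:

```python
import math

PEAK_RPS = 10_000  # example tier; the same shapes ran at 100K and 1M

def constant(t: float) -> float:
    return float(PEAK_RPS)

def ramp(t: float, ramp_seconds: float = 600) -> float:
    # Linear growth to peak over 10 minutes, then hold.
    return PEAK_RPS * min(t / ramp_seconds, 1.0)

def spike(t: float, spike_at: float = 300, multiplier: float = 10) -> float:
    # Sudden 10x burst; when it lands is an arbitrary choice here.
    return PEAK_RPS * multiplier if t >= spike_at else float(PEAK_RPS)

def wave(t: float, period: float = 600) -> float:
    # Oscillate between 30% and 100% of peak.
    mid, amp = 0.65, 0.35
    return PEAK_RPS * (mid + amp * math.sin(2 * math.pi * t / period))
```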
Results at 10K RPS
| Configuration | p50 | p99 | Monthly Estimate |
|---|---|---|---|
| DynamoDB on-demand | 2ms | 7ms | ~$8,400 |
| DynamoDB provisioned (9K RCU / 1K WCU) | 2ms | 7ms | ~$1,150 |
| RDS PostgreSQL db.r6g.2xlarge + Proxy | 3ms | 11ms | ~$870 |
| Aurora MySQL db.r6g.2xlarge + Proxy | 4ms | 13ms | ~$980 |
| Aurora Serverless v2 (avg 4 ACU) | 4ms | 15ms | ~$720 |
The biggest surprise at 10K RPS: DynamoDB on-demand is nearly 10× more expensive than a well-configured RDS instance for sustained, predictable traffic. DynamoDB's reputation as the "serverless database" leads engineers to assume it is cheap at modest scales. For a product with consistent diurnal load patterns, provisioned capacity is rarely the wrong answer, and on-demand is rarely the right one.
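The arithmetic behind that gap is worth a back-of-envelope check. The sketch below assumes a 90/10 read/write mix and illustrative on-demand list prices of $0.25/$1.25 per million reads/writes - both assumptions are mine, and DynamoDB pricing changes, so verify current numbers:

```python
SECONDS_PER_MONTH = 86_400 * 30
RPS = 10_000
READ_FRACTION = 0.9  # assumed read-heavy mix; not stated in the runs above

# Illustrative on-demand list prices per million request units.
READ_PRICE_PER_M, WRITE_PRICE_PER_M = 0.25, 1.25

requests = RPS * SECONDS_PER_MONTH            # ~25.9B requests/month
reads = requests * READ_FRACTION
writes = requests * (1 - READ_FRACTION)

on_demand = reads / 1e6 * READ_PRICE_PER_M + writes / 1e6 * WRITE_PRICE_PER_M
print(f"~${on_demand:,.0f}/month")            # ~$9,000 - same order as the table
```

Provisioned capacity bills per capacity-hour rather than per request, which is where the roughly 7× gap against the provisioned row comes from: for steady traffic you pay for throughput once, not 26 billion times.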
Under the Spike pattern (10K → 100K instantaneous), DynamoDB on-demand absorbed the spike without configuration changes. RDS PostgreSQL with a fixed instance showed connection pool pressure - p99 climbed to 38ms for about 90 seconds.
Results at 100K RPS
| Configuration | p50 | p99 | Monthly Estimate |
|---|---|---|---|
| DynamoDB provisioned (auto-scaling) | 2ms | 9ms | ~$4,200 |
| RDS PostgreSQL db.r6g.8xlarge + Proxy | 3ms | 14ms | ~$3,100 |
| Aurora MySQL db.r6g.8xlarge + Proxy | 4ms | 16ms | ~$3,400 |
| Aurora Serverless v2 (avg 18 ACU) | 4ms | 19ms | ~$2,900 |
At 100K RPS, DynamoDB on-demand becomes structurally expensive: the per-request pricing that looks benign at 10K RPS scales linearly with traffic, while fixed-instance costs do not. Provisioned DynamoDB with auto-scaling changes the picture significantly, and RDS is still competitive - fixed instance overhead is now amortised more efficiently.
The Spike pattern at this tier produced the most diagnostic information. DynamoDB auto-scaling took 3-7 minutes to respond fully - during which pinpole flagged elevated p99 and recommended more aggressive scale-out settings. This is behaviour you need to discover before deployment, not during an incident.
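One way to express "more aggressive scale-out" in practice is a target-tracking policy with a lower utilisation target via Application Auto Scaling. A sketch - the table name, capacity bounds, and 50% target are illustrative choices of mine, not pinpole's output:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/orders",  # hypothetical table
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=9_000,
    MaxCapacity=120_000,
)

autoscaling.put_scaling_policy(
    PolicyName="orders-read-scale-out",
    ServiceNamespace="dynamodb",
    ResourceId="table/orders",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # A lower target utilisation scales out earlier, at the cost of
        # carrying more headroom. 70 is the usual default guidance.
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```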
Results at 1M RPS
| Configuration | p50 | p99 | Monthly Estimate |
|---|---|---|---|
| DynamoDB provisioned (high WCU, DAX) | 1ms | 4ms | ~$28,000 |
| RDS PostgreSQL read replicas + Proxy | 4ms | 22ms | ~$18,000 |
| Aurora Global + Proxy | 3ms | 15ms | ~$24,000 |
At 1M RPS, DynamoDB with provisioned capacity and DAX caching is competitive on cost and substantially superior on latency. The operational complexity of the RDS path has increased materially - you now need read replicas, connection pooling strategy, and careful instance sizing - while the cost gap has narrowed.
The actual decision framework
Database selection cannot be made responsibly without running the numbers at your anticipated traffic volume. The right answer at 10K RPS is sometimes the wrong answer at 100K RPS. The three factors that matter (crudely codified in the sketch after this list):
1. Access pattern complexity. If your queries require joins, complex filtering, or ad-hoc analytical access, RDS is the correct starting point regardless of the cost model. DynamoDB's cost advantage evaporates if you are engineering around its access pattern constraints.
2. Traffic predictability. Predictable diurnal load → provisioned DynamoDB or fixed RDS instance. Genuinely unpredictable or bursty traffic → DynamoDB on-demand or Aurora Serverless v2. Do not pay on-demand pricing for predictable traffic.
3. Scale trajectory. If you are at 10K RPS today and heading for 1M RPS in 18 months, the database you choose now needs to perform well at that scale. Running the 1M RPS simulation before making the 10K RPS decision is an hour of canvas work, not a Thursday morning war room.
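And the crude codification - a first-pass filter under assumptions of my own (Python, invented names), not a substitute for actually running the simulations:

```python
def suggest_database(needs_joins: bool, traffic_predictable: bool,
                     target_rps: int) -> str:
    if needs_joins:
        # Factor 1: relational access patterns trump the cost model.
        return "RDS/Aurora PostgreSQL + RDS Proxy"
    if not traffic_predictable:
        # Factor 2: pay per request only when load is genuinely unknowable.
        return "DynamoDB on-demand (or Aurora Serverless v2)"
    # Factor 3: provision for where the trajectory ends, not where it starts.
    if target_rps >= 1_000_000:
        return "DynamoDB provisioned + DAX"
    return "DynamoDB provisioned with auto-scaling"
```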
Full comparison with complete per-node metrics at each tier, Aurora Serverless v2 deep-dive, and the access pattern decision matrix →