Cloud Database Cost Optimization: RDS, Cloud SQL, and Cosmos DB Compared

#cloud #database #cost #optimization

Compute costs are visible and easy to reason about: vCPUs times hours times price. Database costs are different. Each managed database platform has its own pricing model with its own hidden multipliers, and the levers that reduce cost on RDS have almost nothing to do with the levers that reduce cost on Cosmos DB.

Most teams approach database cost optimization by right-sizing instances. That helps, but it is rarely the biggest lever. The bigger problems are structural: Multi-AZ enabled on databases that never need failover, non-production databases running at 3am on a Sunday, Cosmos DB containers provisioned for anticipated peak load that arrived six months ago and left, Cloud SQL storage that grew automatically and can never shrink.

This piece covers where the money actually goes on each platform, and the specific changes that reduce it.

Why Database Bills Are Harder to Optimize Than Compute

Compute instance costs are transparent. Database costs have multipliers embedded in the pricing model that are easy to miss when you first configure an instance.

Platform	Base pricing unit	HA multiplier	Storage behavior	Hidden cost
Amazon RDS	Instance class (vCPU/RAM) per hour	2x for Multi-AZ	gp2/gp3/io1, auto-grow optional	Cross-AZ data transfer at $0.01/GB
Google Cloud SQL	Instance tier (vCPU/RAM) per hour	~2x for High Availability	Auto-grows, never auto-shrinks	Egress charges for cross-region replicas
Azure Cosmos DB	Request Units per second (RU/s)	N/A	$0.25/GB-month	Region multiplier: N write regions = N x RU cost

The HA multiplier on RDS and Cloud SQL is the most commonly misapplied cost driver. Multi-AZ doubles your instance cost in exchange for automatic failover to a standby replica. For production databases serving user traffic, that is a reasonable trade. For a development database that a single engineer uses 4 hours per day, it is $50 to $200 per month in pure waste.

Cosmos DB's region multiplier is less intuitive. Multi-region writes replicate every write to every write region in real time. Three write regions means three times the RU cost. Most teams that enable this do so for read latency improvements, not write availability, and could achieve the same result with single-write-region plus read replicas at a fraction of the cost.

Amazon RDS: Where the Money Goes and How to Cut It

An RDS bill has four components: instance hours, storage, IOPS (if using provisioned IOPS storage), and data transfer. Instance hours dominate for most workloads.

Non-production scheduling is the first cut to make. A db.t3.medium Multi-AZ instance in us-east-1 costs $97/month running continuously. Running it only during business hours (9 hours per day, 22 working days per month) drops that to $28/month: a 71% reduction with zero architectural change. Stopping and starting an RDS instance typically takes a few minutes, and the instance retains all data and configuration. Two caveats apply: storage is still billed while the instance is stopped, and RDS automatically restarts an instance that has been stopped for seven days, so the schedule has to be enforced continuously rather than set once. zopnight automates this scheduling across environments so no engineer has to remember to stop databases at end of day.
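The scheduling arithmetic can be sketched in a few lines. This is a minimal illustration rather than a billing calculator: the function names and the 730-hour month are assumptions, and it counts instance hours only, so storage charges are ignored.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a calendar month


def scheduled_fraction(hours_per_day: float, days_per_month: int) -> float:
    """Fraction of the month an instance runs under a business-hours schedule."""
    return (hours_per_day * days_per_month) / HOURS_PER_MONTH


def scheduled_cost(always_on_monthly: float, hours_per_day: float, days_per_month: int) -> float:
    """Instance-hour cost under the schedule (storage is not included)."""
    return always_on_monthly * scheduled_fraction(hours_per_day, days_per_month)


if __name__ == "__main__":
    frac = scheduled_fraction(9, 22)
    print(f"running {frac:.1%} of the month")
    print(f"db.t3.medium Multi-AZ: ${scheduled_cost(97, 9, 22):.2f}/month")
```

On hours alone, 9 hours on 22 days is 198 of 730 hours, a little over a quarter of the month, which lands close to the $28 figure quoted above.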

Disable Multi-AZ on non-production instances. This is a single checkbox change that saves exactly 50% of instance cost. Development, staging, and QA environments do not need automatic failover. If the database goes down, engineers wait for it to restart. A db.r6g.large Multi-AZ instance at $371/month becomes a single-AZ instance at $185/month.

Configuration	Monthly cost	Annual cost	Notes
db.t3.medium Multi-AZ, always-on	$97	$1,164	Common non-prod default
db.t3.medium Single-AZ, always-on	$48	$576	Disable Multi-AZ
db.t3.medium Single-AZ, scheduled (9hr/day weekdays)	$14	$168	Add scheduling
db.t3.medium Single-AZ, 1-year Reserved	$30	$360	Reserved, always-on

Migrate gp2 storage to gp3. gp2 IOPS scale with storage size (3 IOPS per GB, minimum 100, maximum 16,000), so a 1TB gp2 volume provides 3,000 IOPS. A 1TB gp3 volume also provides a 3,000 IOPS baseline, but you can provision up to 16,000 IOPS independently of volume size for $0.02 per provisioned IOPS-month above that baseline. For databases that need 3,000 IOPS or fewer, gp3 is the same price as gp2 at the same storage size. For databases that were running gp2 at large storage sizes purely to get more IOPS, gp3 lets you size the storage volume separately from IOPS, reducing storage costs by 20 to 40%.
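To see why gp2 pushes teams into oversized volumes, the IOPS rules above can be written down directly. This is a sketch using only the limits stated in the paragraph; the function names are illustrative, and prices are deliberately left out because they vary by region and engine.

```python
import math

GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP3_BASELINE_IOPS = 3_000


def gp2_iops(size_gb: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 per GB, clamped to [100, 16000]."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, size_gb * GP2_IOPS_PER_GB))


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 volume (in GB) whose baseline reaches target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("gp2 cannot exceed 16,000 IOPS")
    return math.ceil(target_iops / GP2_IOPS_PER_GB)


def gp3_extra_iops(target_iops: int) -> int:
    """IOPS that must be provisioned (and paid for) above the gp3 baseline."""
    return max(0, target_iops - GP3_BASELINE_IOPS)


if __name__ == "__main__":
    # A database holding 500GB of data that needs 6,000 IOPS:
    print(gp2_size_for_iops(6_000))  # gp2 volume size forced by the IOPS target
    print(gp3_extra_iops(6_000))     # extra IOPS to provision on a 500GB gp3 volume
```

In that example gp2 forces a 2,000GB volume to hold 500GB of data, while gp3 keeps the volume at 500GB and adds 3,000 provisioned IOPS on top of the baseline.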

Buy Reserved Instances for stable production databases. RDS Reserved Instances for a 1-year, no-upfront commitment deliver 36% savings over on-demand. A 3-year, all-upfront commitment delivers 69% savings. For any production RDS instance that has been running for 6 months with no expected changes to instance class, Reserved Instances are the lowest-effort, highest-return optimization available.
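A reservation is billed for every hour of the term whether or not the instance is running, which is why it suits always-on production databases and not scheduled non-production ones. A rough sketch of that break-even, assuming on-demand is charged only for hours used and the reservation for all hours (the function names are illustrative):

```python
def reserved_breakeven_utilization(discount: float) -> float:
    """Fraction of hours an instance must run before a reservation beats on-demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def cheaper_option(discount: float, utilization: float) -> str:
    """Compare a reservation against on-demand at a given running-time fraction."""
    if utilization > reserved_breakeven_utilization(discount):
        return "reserved"
    return "on-demand"


if __name__ == "__main__":
    print(cheaper_option(0.36, 1.00))  # always-on production database
    print(cheaper_option(0.36, 0.27))  # database scheduled for business hours only
```

With the 36% discount quoted above, an instance has to run more than 64% of the time before the 1-year reservation wins, so a database scheduled for business hours is better left on-demand.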

Google Cloud SQL: Committed Use and the Storage Trap

Cloud SQL pricing is structurally similar to RDS: you pay for the instance tier, high availability, and storage. The two differences that matter for cost optimization are Committed Use Discounts (which work differently from RDS Reserved Instances) and the storage auto-grow trap.

Committed Use Discounts (CUDs) apply automatically when you commit to a minimum spend level for 1 or 3 years. Unlike RDS Reserved Instances, you do not select specific instance types. You commit to a spend amount and Cloud SQL applies the discount across matching resource usage. A 1-year CUD saves 25%. A 3-year CUD saves 52%. For a db-n1-standard-8 instance running at $486/month on-demand, a 3-year CUD reduces that to $233/month, saving $3,036 per year on a single instance.

Instance	On-demand/month	1-year CUD/month	3-year CUD/month	3-year annual saving
db-n1-standard-4	$243	$182	$117	$1,512
db-n1-standard-8	$486	$365	$233	$3,036
db-n1-highmem-8	$535	$401	$257	$3,336
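The table can be reproduced from the two discount rates alone. This is a small sketch that assumes the discount applies to the whole on-demand instance price; the names are illustrative.

```python
# Term in years -> discount, as quoted in the text above.
CUD_DISCOUNT = {1: 0.25, 3: 0.52}


def cud_monthly(on_demand_monthly: float, term_years: int) -> float:
    """Monthly price after applying the committed use discount."""
    return on_demand_monthly * (1 - CUD_DISCOUNT[term_years])


def cud_annual_saving(on_demand_monthly: float, term_years: int) -> float:
    """Annual saving versus on-demand, computed from unrounded monthly prices."""
    return (on_demand_monthly - cud_monthly(on_demand_monthly, term_years)) * 12


if __name__ == "__main__":
    for name, price in [("db-n1-standard-4", 243), ("db-n1-standard-8", 486), ("db-n1-highmem-8", 535)]:
        print(name, round(cud_monthly(price, 3)), round(cud_annual_saving(price, 3)))
```

Because the annual saving here is computed before rounding the monthly price, it differs from the table by a few dollars per instance.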

The storage auto-grow trap is less obvious. Cloud SQL can automatically increase storage when a database approaches capacity. This is a useful safety feature. The problem is that Cloud SQL storage never automatically decreases. A database that ingested a large dataset during a migration, then deleted it, still pays for the peak storage size permanently. The only way to reduce it is to create a new Cloud SQL instance with smaller storage and migrate data to it.

Audit your Cloud SQL instances for the gap between allocated storage and used storage. Instances where used storage is below 40% of allocated storage are candidates for recreation with right-sized storage. A 500GB instance at $85/month that is actually using 80GB could be recreated at 100GB for about $17/month, so it costs $68/month more than necessary. That is $816/year per instance.
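The audit itself is simple once allocated and used storage are exported per instance. The sketch below assumes you already have those numbers (for example from monitoring); the `Instance` type, the instance names, and the 25% headroom factor are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    allocated_gb: int
    used_gb: int
    monthly_storage_cost: float


def right_size_candidates(instances, threshold=0.40, headroom=1.25):
    """Return (name, target_gb, monthly_saving) for instances under the threshold."""
    results = []
    for inst in instances:
        if inst.used_gb / inst.allocated_gb >= threshold:
            continue  # utilization is healthy, leave the instance alone
        target_gb = math.ceil(inst.used_gb * headroom)  # used storage plus headroom
        price_per_gb = inst.monthly_storage_cost / inst.allocated_gb
        saving = (inst.allocated_gb - target_gb) * price_per_gb
        results.append((inst.name, target_gb, round(saving, 2)))
    return results


if __name__ == "__main__":
    fleet = [
        Instance("analytics-dev", 500, 80, 85.0),
        Instance("orders-staging", 200, 120, 34.0),
    ]
    for name, target_gb, saving in right_size_candidates(fleet):
        print(f"{name}: recreate at {target_gb}GB, save ${saving}/month")
```

With 25% headroom, the 500GB instance using 80GB is flagged for recreation at 100GB, which matches the $68/month figure above.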

Query optimization reduces required instance size. Cloud SQL Query Insights is a free tool that identifies slow queries, lock contention, and missing indexes. Teams that review slow queries monthly and add missing indexes can often drop their required instance tier by one level within 90 days. Moving from db-n1-standard-8 to db-n1-standard-4 saves $243/month ($2,916/year) per instance with no change to application code.

Cosmos DB: Request Units Are a Trap for the Unprepared

Cosmos DB does not charge for instances. It charges for Request Units per second (RU/s), which is an abstraction over the compute and memory required to serve your query patterns. Every database operation consumes RUs: a point read of a 1KB document costs 1 RU, a cross-partition query can cost 100 RUs or more depending on its complexity and the data it scans.

There are three pricing modes, and choosing the wrong one for your traffic pattern is the most common Cosmos DB cost mistake.

Workload	Manual provisioned	Autoscale	Serverless
400 RU/s constant	$23/month	$35/month	varies by usage
400 RU/s average, 4000 RU/s peak	$233/month	$58/month	depends on request count
4000 RU/s constant	$233/month	$350/month	not cost-effective
40,000 RU/s constant	$2,336/month	$3,504/month	not applicable (5000 RU/s limit)

Serverless Cosmos DB charges $0.25 per million RUs consumed, with no minimum. For a non-production database receiving 2 million requests per day at an average of 2 RUs each, that is 4 million RUs per day, costing $1/day or $30/month. The same workload on minimum provisioned throughput (400 RU/s) costs $23/month regardless of actual usage, so at this volume provisioned throughput is already the cheaper option. The break-even is about 92 million RUs per month ($23 divided by $0.25 per million), which is roughly 9% of what 400 RU/s can serve in a month. Serverless is cheaper only while consumption stays below that level.
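The serverless versus provisioned comparison is easy to check for your own traffic. This is a minimal sketch using only the two prices quoted above; the 30-day month and the function names are assumptions.

```python
SERVERLESS_PRICE_PER_MILLION_RU = 0.25  # from the text above
PROVISIONED_400_MONTHLY = 23.0          # 400 RU/s manual throughput, from the table above
SECONDS_PER_MONTH = 86_400 * 30         # assumption: 30-day month


def serverless_monthly(requests_per_day: float, ru_per_request: float, days: int = 30) -> float:
    """Monthly serverless cost for a given request volume."""
    total_ru = requests_per_day * ru_per_request * days
    return total_ru / 1_000_000 * SERVERLESS_PRICE_PER_MILLION_RU


def breakeven_ru_per_month() -> float:
    """RUs per month at which serverless costs the same as 400 RU/s provisioned."""
    return PROVISIONED_400_MONTHLY / SERVERLESS_PRICE_PER_MILLION_RU * 1_000_000


def breakeven_utilization(provisioned_ru_per_sec: int = 400) -> float:
    """Break-even consumption as a fraction of what the provisioned rate can serve."""
    capacity = provisioned_ru_per_sec * SECONDS_PER_MONTH
    return breakeven_ru_per_month() / capacity


if __name__ == "__main__":
    print(serverless_monthly(2_000_000, 2))
    print(breakeven_ru_per_month())
    print(f"{breakeven_utilization():.1%}")
```

With the prices quoted above, the break-even works out to 92 million RUs per month, a little under a tenth of what 400 RU/s can serve.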

The multi-region write multiplier is where teams get surprised by large bills. Enabling multi-region writes replicates every write to every write region and multiplies your RU cost by the number of write regions. A container provisioned at 10,000 RU/s with 3 write regions costs $1,752/month. The same container with 1 write region and 2 read regions costs $584/month for the provisioned throughput plus $0.08/GB for replication transfer. For most use cases, single-write-region with read replicas achieves the latency goals at one-third the cost.

The Universal Win: Non-Production Database Scheduling

Every platform covered in this piece has one optimization that requires no application changes, no architectural decisions, and no commitment: stop non-production databases when nobody is using them.

Development, staging, and QA databases typically serve engineers during business hours. They run the other 15 hours of the day because nobody turned them off. They run all weekend because the script to restart them is manual and nobody wants to deal with it Monday morning.

Platform	Instance	Always-on/month	Scheduled (9hr/day, weekdays)/month	Monthly saving	Annual saving
RDS	db.t3.medium Multi-AZ	$97	$28	$69	$828
RDS	db.r6g.large Multi-AZ	$371	$107	$264	$3,168
Cloud SQL	db-n1-standard-4 HA	$486	$140	$346	$4,152
Cosmos DB	400 RU/s provisioned	$23	$7	$16	$192

For a team running 5 non-production RDS instances of mixed sizes, scheduling alone typically saves $800 to $1,500 per month. For Cloud SQL, the numbers are similar. For Cosmos DB serverless, there is no instance to stop: you only pay for what you use, which is another reason serverless is the right choice for non-production Cosmos DB workloads.

The implementation barrier is operational. Someone has to remember to stop databases, and then start them again. Automated scheduling removes that barrier entirely.
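The core of any such scheduler is a small decision function that runs on a timer. The sketch below is one hypothetical shape for it: the 09:00 to 18:00 window, the state names, and the function names are assumptions, and the actual stop and start calls are left to whatever API the platform provides (on RDS, for instance, boto3's `stop_db_instance` and `start_db_instance`).

```python
from datetime import datetime, time
from typing import Optional

BUSINESS_START = time(9, 0)   # assumption: local office hours
BUSINESS_END = time(18, 0)    # 9 hours per day, matching the table above
WORKDAYS = {0, 1, 2, 3, 4}    # Monday=0 ... Friday=4


def desired_state(now: datetime) -> str:
    """Return "running" during business hours on workdays, otherwise "stopped"."""
    in_hours = BUSINESS_START <= now.time() < BUSINESS_END
    return "running" if now.weekday() in WORKDAYS and in_hours else "stopped"


def plan_action(current_state: str, now: datetime) -> Optional[str]:
    """Return "start", "stop", or None when the database is already in the right state."""
    target = desired_state(now)
    if target == current_state:
        return None
    return "start" if target == "running" else "stop"


if __name__ == "__main__":
    sunday_3am = datetime(2024, 1, 7, 3, 0)
    monday_10am = datetime(2024, 1, 8, 10, 0)
    print(plan_action("running", sunday_3am))
    print(plan_action("stopped", monday_10am))
```

Keeping the decision separate from the API call makes the schedule easy to test and lets the same logic drive different platforms.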

An Optimization Priority Order for Each Platform

Not all optimizations are equal. Here is the sequence that produces the most savings with the least risk, for each platform.

Platform	Priority	Action	Estimated saving	Risk	Effort
RDS	1	Schedule non-prod instances (stop/start)	60-71% of non-prod cost	None	Low
RDS	2	Disable Multi-AZ on non-prod	50% of non-prod instance cost	None	Low
RDS	3	Migrate gp2 storage to gp3	20-40% of storage cost	Low	Low
RDS	4	Purchase Reserved Instances for prod	36-69% of prod instance cost	Low	Low
Cloud SQL	1	Schedule non-prod instances	60-71% of non-prod cost	None	Low
Cloud SQL	2	Disable HA on non-prod	~50% of non-prod instance cost	None	Low
Cloud SQL	3	Purchase 3-year Committed Use Discount	52% of prod instance cost	Low	Low
Cloud SQL	4	Recreate over-provisioned storage instances	$68-136/month per instance	Medium	Medium
Cosmos DB	1	Switch non-prod to serverless mode	Eliminates idle RU cost	None	Low
Cosmos DB	2	Switch from multi-region write to single-write + read replicas	50-67% of throughput cost	Medium	Medium
Cosmos DB	3	Move variable-load prod containers to autoscale	20-40% vs manual provisioned	Low	Low
Cosmos DB	4	Audit and add missing indexes to reduce RU consumption	15-40% of query RU cost	Low	Medium

The pattern across all three platforms is the same: the highest-impact optimizations are operational (scheduling, disabling unnecessary HA), not architectural (right-sizing, instance migration). Start there. The architectural changes follow once you have eliminated the structural waste.

Database cost optimization is not a one-time audit. Workloads change, teams add new environments, and provisioned capacity drifts upward over time. Schedule a quarterly review of non-production environment count, instance sizes, and pricing mode choices. The savings compound.