The Cloud SQL Bill That Taught Me Everything About Over-Provisioning
My database was running at 0.8% CPU utilisation.
I discovered this three months after going live, while investigating why our GCP bill seemed higher than expected for our traffic volume. The number was so low I thought there was an error in Cloud Monitoring. There wasn't.
I'd been paying for a machine that could handle roughly 100x more load than we were actually putting on it. Classic over-provisioning, but seeing it in real numbers was genuinely embarrassing.
Here's everything I learned about right-sizing Cloud SQL instances, with the specific metrics and commands that will save you from making the same mistakes.
The Real Cost of "Playing It Safe"
When you're spinning up your first production Cloud SQL instance, the console gives you a dropdown of machine types: db-f1-micro, db-n1-standard-1, db-n1-standard-2, and so on. The descriptions are helpful but vague: "1 vCPU, 3.75GB memory" tells you the specs, not whether you need them.
I picked db-n1-standard-2 because it seemed reasonable for a production database. Not too small, not excessive. The middle option. That decision was based on absolutely no data.
The problem with "reasonable" is that it's usually wrong. Either you're under-provisioned and your app breaks, or you're over-provisioned and you're burning money. In my case, it was the latter.
What the Metrics Actually Tell You
The key insight is that Cloud Monitoring shows you exactly what your database is doing. You just have to know where to look.
CPU Utilisation
This is the most important metric for right-sizing your instance.
Where to find it: Cloud Console → SQL → your instance → Monitoring tab → CPU utilization
What to look for:
- Average utilisation over the past 30 days
- P95 and P99 peaks (the levels your usage stays below 95% and 99% of the time)
- Time of day patterns
How to interpret it:
- Under 20% average: you can probably downgrade
- 20-50%: you're sized appropriately
- 50-80%: keep an eye on growth trends
- Over 80% sustained: consider upgrading
My average was 0.8%. My P99 was around 3%. I could have run the same workload on a db-f1-micro instance and saved roughly 70% on compute costs.
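If you'd rather pull these numbers from a terminal than click through the console, the same time series is available from the Cloud Monitoring API. Here's a minimal sketch, assuming GNU date and a placeholder project ID; swap the metric type for database/memory/utilization to get the memory figures the same way:
# Sketch: 30 days of hourly-averaged CPU utilisation for every Cloud SQL
# instance in the project, as raw JSON (values are fractions between 0 and 1)
PROJECT_ID="YOUR_PROJECT_ID"                             # assumption: your project ID
START=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)    # GNU date syntax
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries" \
  --data-urlencode 'filter=metric.type="cloudsql.googleapis.com/database/cpu/utilization"' \
  --data-urlencode "interval.startTime=${START}" \
  --data-urlencode "interval.endTime=${END}" \
  --data-urlencode 'aggregation.alignmentPeriod=3600s' \
  --data-urlencode 'aggregation.perSeriesAligner=ALIGN_MEAN'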
Memory Utilisation
Where to find it: Same monitoring tab → Memory utilization
What matters: You want to see consistent memory usage without swap. If memory utilisation is consistently above 90% or you're seeing any swap usage, that's a performance problem waiting to happen.
What I found: Memory usage was sitting around 15% with zero swap. Another sign I was massively over-provisioned.
Connection Count
Where to find it: Monitoring tab → Database connections
What to look for: Peak active connections compared to your instance's connection limit.
Connection limits by instance:
- db-f1-micro: 25 connections
- db-n1-standard-1: 100 connections
- db-n1-standard-2: 200 connections
My peak connections were hitting around 11. Even a db-f1-micro would have been comfortable.
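To see your own numbers straight from the database rather than the monitoring graph, pg_stat_activity has them. A minimal sketch, assuming a PostgreSQL instance reachable on localhost via the Cloud SQL Auth Proxy (host, user and database name are placeholders):
# Active connections vs. the configured limit
psql "host=127.0.0.1 user=postgres dbname=your_database_name" -c \
  "SELECT count(*) AS active_connections,
          current_setting('max_connections') AS max_allowed
     FROM pg_stat_activity;"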
The Commands That Actually Matter
Once you know your utilisation is low, here are the specific commands to check what you're currently running and how to change it.
Check Your Current Instance Configuration
gcloud sql instances describe YOUR_INSTANCE_NAME --format="table(
name,
settings.tier,
settings.dataDiskSizeGb,
settings.availabilityType,
settings.backupConfiguration.enabled
)"
This gives you a clean summary of what you're paying for:
- settings.tier: your machine type (the expensive part)
- settings.dataDiskSizeGb: disk size
- settings.availabilityType: whether HA is enabled
- settings.backupConfiguration.enabled: backup settings
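If you run more than one instance (production, staging, dev), the list command gives you the same summary for all of them in one go; a quick sketch:
# One row per instance: the fields that drive most of the bill
gcloud sql instances list --format="table(
name,
settings.tier,
settings.dataDiskSizeGb,
settings.availabilityType
)"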
Downgrade Your Instance Tier
If your CPU utilisation is consistently low, this is the biggest cost saving:
gcloud sql instances patch YOUR_INSTANCE_NAME --tier=db-f1-micro
Important: This will restart your instance. Plan for a few minutes of downtime.
Machine type costs (rough monthly estimates for PostgreSQL in us-central1):
- db-f1-micro: ~$7/month
- db-n1-standard-1: ~$25/month
- db-n1-standard-2: ~$50/month
- db-n1-standard-4: ~$100/month
Moving from standard-2 to f1-micro saves around $43/month per instance. That adds up fast if you're running multiple environments.
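If you are running multiple environments, the downgrade is easy to script. A hedged sketch with made-up instance names; --quiet skips the confirmation prompt, so double-check the list before running it:
# Downgrade the non-production instances in one pass (each one restarts)
for instance in myapp-staging myapp-dev; do
  gcloud sql instances patch "$instance" --tier=db-f1-micro --quiet
done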
Turn Off High Availability (Where Appropriate)
High Availability runs a standby replica in a different zone, roughly doubling your instance cost. You want this in production. You probably don't need it in staging or development.
Check if HA is enabled:
gcloud sql instances describe YOUR_INSTANCE_NAME --format="value(settings.availabilityType)"
Turn it off:
gcloud sql instances patch YOUR_INSTANCE_NAME --availability-type=ZONAL
Turn it back on:
gcloud sql instances patch YOUR_INSTANCE_NAME --availability-type=REGIONAL
This change also requires a restart, so plan accordingly.
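To audit this across a whole project, filter the instance list for anything still running REGIONAL and ask whether each hit really needs the standby; a small sketch:
# Instances currently paying for the HA standby replica
gcloud sql instances list \
  --filter="settings.availabilityType=REGIONAL" \
  --format="table(name, settings.tier)"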
The Storage Problem You Can't Fix Easily
Here's the frustrating part: Cloud SQL storage auto-increases but never auto-decreases. If your data grows to 50GB and then you delete 40GB, you're still paying for 50GB forever.
I had 100GB provisioned and was using 240MB. That's 0.24% utilisation. Storage isn't the most expensive part of Cloud SQL, but it's still $10/month I didn't need to spend.
Check your actual storage usage:
gcloud sql instances describe YOUR_INSTANCE_NAME --format="value(settings.dataDiskSizeGb)"
Then connect to your database and check actual usage:
-- For PostgreSQL
SELECT pg_size_pretty(pg_database_size('your_database_name'));
-- For MySQL
SELECT
  table_schema AS "Database",
  ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS "Size (MB)"
FROM information_schema.tables
GROUP BY table_schema;
The only fix for oversized storage: export your data, delete the instance, and recreate it with a smaller disk. This is disruptive enough that you probably won't do it unless the over-provisioning is severe.
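If the waste is bad enough to justify it, the migration looks roughly like the sketch below. The bucket, database name, engine version and region are all assumptions, the Cloud SQL service account needs write access to the bucket, and a deleted instance's name stays reserved for a while, so the replacement gets a new name:
# 1. Export the data to Cloud Storage (a .gz suffix compresses the dump)
gcloud sql export sql YOUR_INSTANCE_NAME gs://your-backup-bucket/dump.sql.gz \
  --database=your_database_name
# 2. Delete the oversized instance
gcloud sql instances delete YOUR_INSTANCE_NAME
# 3. Recreate with a right-sized disk, under a new name
gcloud sql instances create YOUR_NEW_INSTANCE_NAME \
  --database-version=POSTGRES_15 \
  --tier=db-f1-micro \
  --region=us-central1 \
  --storage-size=10GB
# 4. Recreate the database and import the dump
gcloud sql databases create your_database_name --instance=YOUR_NEW_INSTANCE_NAME
gcloud sql import sql YOUR_NEW_INSTANCE_NAME gs://your-backup-bucket/dump.sql.gz \
  --database=your_database_name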
The lesson: Size your initial disk conservatively. 10GB is the minimum and sufficient for most applications starting out. You can always increase it later without downtime.
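Growing the disk later is a single command (and storage only ever goes up); a sketch, assuming patch is given the new total size:
# Increase provisioned storage in place; no restart required
gcloud sql instances patch YOUR_INSTANCE_NAME --storage-size=20GB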
Backup Configuration That Actually Makes Sense
Cloud SQL defaults to 7 days of automated backup retention. For production, that makes sense. For staging environments that get refreshed weekly, you're paying to store backups of data you'd never restore.
Check your backup settings:
gcloud sql instances describe YOUR_INSTANCE_NAME --format="table(
settings.backupConfiguration.enabled,
settings.backupConfiguration.backupRetentionSettings.retainedBackups,
settings.backupConfiguration.pointInTimeRecoveryEnabled
)"
Reduce backup retention for non-critical instances:
gcloud sql instances patch YOUR_INSTANCE_NAME --retained-backups-count=3
Turn off point-in-time recovery (PITR) for non-critical instances:
gcloud sql instances patch YOUR_INSTANCE_NAME --no-enable-point-in-time-recovery
PITR keeps transaction logs to allow recovery to any specific timestamp. It's useful for production but adds storage costs and complexity for environments where you'd just restore from the most recent daily backup anyway.
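As with the tier change, this is easy to standardise across non-production environments with a small loop; a sketch with made-up instance names, using the same flags as above:
# Lighter backup settings for environments you'd never point-in-time restore
for instance in myapp-staging myapp-dev; do
  gcloud sql instances patch "$instance" \
    --retained-backups-count=3 \
    --no-enable-point-in-time-recovery \
    --quiet
done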
The Monitoring Dashboard You Should Actually Use
Instead of checking individual metrics manually, set up a custom dashboard that shows everything relevant at once.
Create the dashboard from the config file below:
gcloud monitoring dashboards create --config-from-file=dashboard-config.yaml
Dashboard configuration (dashboard-config.yaml):
displayName: "Cloud SQL Cost Optimization"
mosaicLayout:
  columns: 12
  tiles:
    - xPos: 0
      yPos: 0
      width: 6
      height: 4
      widget:
        title: "CPU Utilization"
        xyChart:
          dataSets:
            - timeSeriesQuery:
                timeSeriesFilter:
                  filter: 'metric.type="cloudsql.googleapis.com/database/cpu/utilization" resource.type="cloudsql_database"'
    - xPos: 6
      yPos: 0
      width: 6
      height: 4
      widget:
        title: "Memory Utilization"
        xyChart:
          dataSets:
            - timeSeriesQuery:
                timeSeriesFilter:
                  filter: 'metric.type="cloudsql.googleapis.com/database/memory/utilization" resource.type="cloudsql_database"'
    - xPos: 0
      yPos: 4
      width: 6
      height: 4
      widget:
        title: "Active Connections"
        xyChart:
          dataSets:
            - timeSeriesQuery:
                timeSeriesFilter:
                  filter: 'metric.type="cloudsql.googleapis.com/database/postgresql/num_backends" resource.type="cloudsql_database"'
This gives you a single view of the three metrics that matter most for cost optimization.
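Once created, you can confirm the dashboard exists (and grab its resource name for later edits) from the CLI; a quick sketch:
# List dashboards in the project to confirm the new one landed
gcloud monitoring dashboards list --format="table(displayName, name)"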
What I Wish I'd Known Before Clicking "Create"
The real lesson here isn't about any specific setting. It's about the mindset.
Start smaller than you think you need. Scaling up is a one-line command and a few minutes of downtime. Scaling down requires migration and planning.
Use actual data, not gut feel. Cloud Monitoring exists for a reason. If you don't have usage patterns yet, start with the smallest instance that can handle your expected load and scale up based on real metrics.
Environment-specific configuration matters. Production and staging have different availability requirements, different backup needs, and different cost tolerances. Configure them differently.
GCP defaults optimize for reliability, not cost. That's the right choice for a platform, but it means you need to actively optimize for your actual usage patterns.
The Bottom Line
My 0.8% CPU utilisation was embarrassing, but it taught me more about cloud cost optimization than months of reading best practices guides. The specific numbers forced me to understand what each metric actually means and how it translates to real money.
If you're setting up Cloud SQL for the first time, open the monitoring dashboard before you pick your instance tier. The metrics will tell you what you actually need, not what feels reasonable.
And if you're already running Cloud SQL instances, spend ten minutes checking your utilisation numbers. You might be surprised at what you find.