Anushka B

Originally published at aicloudstrategist.com

What 23 AWS audits of Series A-C SaaS companies taught me about where the money actually leaks

I run cloud cost audits for Indian SaaS founders. Over the last three months I've done 23 of them, all Series A-C, monthly AWS spend ranging from Rs 2.5 lakh to Rs 38 lakh. Here's what the data actually says about where money leaks in a well-run engineering org.

Median waste per account: $3,400/month. Not in the top 10 line items of Cost Explorer. In three places most teams don't check on a Tuesday.

Pattern 1: Savings Plan drift

Of the 23 accounts, 18 had an active Compute Savings Plan. Coverage on purchase day averaged 62%. Coverage on audit day averaged 41%.

What happens: team buys a 1-year SP sized to steady-state EC2 + Fargate + Lambda. Six months later, a new service ships on Graviton, a team migrates to ECS on Fargate, autoscaling groups grow. The commit doesn't move. On-demand spend climbs underneath the dashboard.

The fix is boring:

aws ce get-savings-plans-coverage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics SpendCoveredBySavingsPlans \
  --query 'SavingsPlansCoverages[].{Coverage:Coverage.CoveragePercentage,OnDemand:Coverage.OnDemandCost}'

Run it monthly. If coverage drops below 55%, buy a top-up SP sized to the gap. In 18 accounts this recovered $800 to $2,100 per month.
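To make the monthly check mechanical, wrap the threshold comparison in a small helper. A minimal sketch (the function name is mine; the coverage number would come from the `get-savings-plans-coverage` query above):

```shell
# Hypothetical helper: flag when SP coverage falls below a threshold.
# 55 is the alert threshold used in this post.
check_sp_coverage() {
  coverage="$1"
  threshold="${2:-55}"
  # awk handles the decimal comparison portably across shells
  if awk -v c="$coverage" -v t="$threshold" 'BEGIN { exit !(c < t) }'; then
    echo "ALERT: coverage ${coverage}% below ${threshold}% - size a top-up SP to the gap"
  else
    echo "OK: coverage ${coverage}%"
  fi
}

check_sp_coverage 41   # the audit-day average across these accounts
```

Drop it in a cron job or a monthly CI workflow and the drift stops being invisible.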

Pattern 2: Orphaned EBS and the cross-region egress you forgot about

This is where the archaeology lives.

Orphaned snapshots from terminated instances. gp2 volumes that should have been gp3 two years ago (gp3 is ~20% cheaper per GB and includes a 3,000 IOPS baseline at no extra charge). And the one that keeps showing up: a replication job or log shipper quietly moving data across regions because someone set it up for DR or compliance and the config outlived the reason.
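The gp2-to-gp3 move is one API call per volume and happens in place, no detach, no downtime (the volume ID below is a placeholder):

```shell
# gp3 defaults to a 3000 IOPS / 125 MB/s baseline, so volumes that
# never exceeded that need no extra --iops or --throughput parameters.
aws ec2 modify-volume \
  --volume-id vol-0123456789abcdef0 \
  --volume-type gp3
```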

One account was paying $430/month to replicate S3 objects from us-east-1 to ap-south-1 for a DR posture they'd abandoned 14 months earlier when they consolidated into a single region.
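If the DR posture is genuinely dead, killing the replication is one call (bucket name hypothetical; confirm with whoever owns compliance first):

```shell
# Removes all replication rules on the bucket. Existing replicas
# stay where they are; new objects stop crossing regions and
# stop accruing inter-region transfer charges.
aws s3api delete-bucket-replication --bucket my-app-data-us-east-1
```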

Finding orphaned EBS:

aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[].{ID:VolumeId,Size:Size,Type:VolumeType,Created:CreateTime}' \
  --output table

Anything in state available older than 30 days is a candidate for deletion. Snapshot it first if you're nervous. Most teams I audit have 10-40 of these.
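The nervous-path cleanup, per volume, looks like this (volume ID is a placeholder):

```shell
# Snapshot first, wait for it to complete, then delete. The snapshot
# costs pennies next to a provisioned-but-unattached volume.
VOL=vol-0123456789abcdef0
SNAP=$(aws ec2 create-snapshot \
  --volume-id "$VOL" \
  --description "pre-deletion backup of orphaned volume" \
  --query SnapshotId --output text)
aws ec2 wait snapshot-completed --snapshot-ids "$SNAP"
aws ec2 delete-volume --volume-id "$VOL"
```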

Cross-region egress hunt:

aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --filter '{"Dimensions":{"Key":"USAGE_TYPE_GROUP","Values":["EC2: Data Transfer - Inter AZ","EC2: Data Transfer - Region to Region"]}}' \
  --group-by Type=DIMENSION,Key=USAGE_TYPE

Average finding in this bucket across 23 accounts: $600/month. Range: $80 to $2,400.

Pattern 3: Observability over-spend

The most expensive log line is the one nobody reads.

Three sub-patterns I see repeatedly:

  1. CloudWatch Logs with retention set to Never Expire or 365 days on application log groups at DEBUG verbosity. One account had 2.1TB of INFO-level ALB access logs retained for 400 days. $1,900/month.

  2. Datadog or New Relic ingesting every custom metric from every pod with high cardinality tags. Cardinality of user_id as a metric tag scales linearly with your user base. One account had 180,000 unique metric series.

  3. X-Ray or APM tracing at 100% sample rate in production. 100% sampling is a staging default that leaked to prod and stayed there.
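The X-Ray fix for sub-pattern 3 is a single sampling rule. A sketch (rule name and the 5% rate are my choices; tune the rate to your traffic):

```shell
# Hypothetical catch-all rule: sample 5% of requests plus a 1 req/s
# reservoir, instead of the 100% staging default. ResourceARN must
# currently be "*" for sampling rules.
aws xray create-sampling-rule --sampling-rule '{
  "RuleName": "prod-default-5pct",
  "Priority": 100,
  "FixedRate": 0.05,
  "ReservoirSize": 1,
  "ServiceName": "*",
  "ServiceType": "*",
  "Host": "*",
  "HTTPMethod": "*",
  "URLPath": "*",
  "ResourceARN": "*",
  "Version": 1
}'
```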

To check log retention across a region in one go:

aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null` || retentionInDays>`30`].{Name:logGroupName,Retention:retentionInDays,StoredBytes:storedBytes}' \
  --output table

Set retention to 14-30 days on anything that isn't an audit or compliance log. Move compliance logs to S3 with a lifecycle rule to Glacier at day 30. Cost drops by 60-80% on the observability line.
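Both halves of that fix are one call each (log group and bucket names are placeholders):

```shell
# Cap retention on an application log group at 30 days
aws logs put-retention-policy \
  --log-group-name /ecs/my-app \
  --retention-in-days 30

# Transition compliance logs in S3 to Glacier at day 30
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-compliance-logs \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "glacier-at-30",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
    }]
  }'
```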

The pattern behind the patterns

None of these are architecture problems. They're attention problems. Every engineering org I audit has someone who could fix these in an afternoon. What's missing is the trigger to look.

A quarterly external review catches all three before they compound. If you're curious what yours looks like, I run a priority audit at Rs 2,000 (~$25) via Razorpay with a 48-hour turnaround. Bring a Cost Explorer CSV and I'll tell you where your $3,400 is hiding.

https://aicloudstrategist.com/audit

  • Anushka B, founder, AICloudStrategist
