Last month I analyzed 249 EC2 instances across 3 AWS accounts for a mid-market fintech company. The result? $256,000/year in potential savings. That's $21,000 every month going to waste.
The scary part: this isn't unusual. According to Flexera's 2024 State of the Cloud Report, 32% of cloud spend is wasted. Most companies have no idea they're overpaying.
Here are the 5 most common culprits I find in almost every AWS account—and how to fix them.
1. Oversized EC2 Instances
The problem: You launched a t3.xlarge because you weren't sure what you'd need. Now it's been running at 5% CPU for 6 months.
How common: In my analysis, 70-80% of instances were oversized by at least one size class.
How to detect:
# Check average CPU over last 14 days
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-xxxxx \
--start-time $(date -d '14 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date +%Y-%m-%dT%H:%M:%S) \
--period 86400 \
--statistics Average
The fix: If average CPU is below 20% for 2+ weeks, downsize. A t3.medium at $30/month vs t3.xlarge at $120/month adds up fast.
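If you're resizing by hand, the sequence looks roughly like this (a sketch; i-xxxxx and t3.medium are placeholders, and the stop/start means a couple of minutes of downtime for that instance):
# Resize requires a stop/start; pick the target size from your own CPU data
aws ec2 stop-instances --instance-ids i-xxxxx
aws ec2 wait instance-stopped --instance-ids i-xxxxx
aws ec2 modify-instance-attribute \
  --instance-id i-xxxxx \
  --attribute instanceType \
  --value t3.medium
aws ec2 start-instances --instance-ids i-xxxxx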
Potential savings: 50-75% per instance
2. Unattached EBS Volumes
The problem: You terminated an instance but forgot to delete its volumes. Now you're paying for storage nobody uses.
How common: I typically find 5-15 orphan volumes per account.
How to detect:
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query 'Volumes[*].[VolumeId,Size,CreateTime]' \
--output table
If a volume shows status: available, it's not attached to anything.
The fix: Delete them. But first, create a snapshot if you're paranoid (snapshots are ~$0.05/GB/month vs $0.10/GB/month for volumes).
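If you want that safety net, the two commands look like this (a sketch; vol-xxxxx is a placeholder):
# Optional: snapshot first, then delete the orphan volume
aws ec2 create-snapshot \
  --volume-id vol-xxxxx \
  --description "Backup before deleting orphan volume"
aws ec2 delete-volume --volume-id vol-xxxxx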
Potential savings: $0.10/GB/month per volume
3. Old EBS Snapshots
The problem: Automated backups create snapshots daily. Nobody deletes them. A year later you have 365 snapshots of the same volume.
How to detect:
aws ec2 describe-snapshots \
--owner-ids self \
--query 'Snapshots[?StartTime<=`2024-01-01`].[SnapshotId,VolumeSize,StartTime]' \
--output table
The fix:
- Keep the last 7-30 days (depending on your compliance needs)
- Delete everything older
- Set up a lifecycle policy with AWS Data Lifecycle Manager
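For a one-off cleanup before you wire up a lifecycle policy, you can reuse the detection query above in a loop (a sketch; adjust the cutoff date to your retention window, and note that snapshots still backing a registered AMI will simply fail to delete):
# Delete every snapshot older than the cutoff date
for snap in $(aws ec2 describe-snapshots \
  --owner-ids self \
  --query 'Snapshots[?StartTime<=`2024-01-01`].SnapshotId' \
  --output text); do
  echo "Deleting $snap"
  aws ec2 delete-snapshot --snapshot-id "$snap"
done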
Potential savings: Can be thousands/month for active accounts
4. EBS gp2 Volumes (Instead of gp3)
The problem: gp3 launched in 2020 with 20% lower base cost AND better performance. Yet most accounts still have gp2 volumes because "it works, why change it?"
How common: In my analysis, 60%+ of volumes were still gp2.
How to detect:
aws ec2 describe-volumes \
--query 'Volumes[?VolumeType==`gp2`].[VolumeId,Size,VolumeType]' \
--output table
The fix: Migrate to gp3. It's a live operation, no downtime required:
aws ec2 modify-volume \
--volume-id vol-xxxxx \
--volume-type gp3
Potential savings: 20% per volume, zero effort
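If you have dozens of gp2 volumes, a loop over the detection query handles them all (a sketch; volumes larger than about 1 TB can have a gp2 baseline above gp3's default 3,000 IOPS, so check IOPS needs before converting those):
# Convert every gp2 volume in the current region to gp3
for vol in $(aws ec2 describe-volumes \
  --query 'Volumes[?VolumeType==`gp2`].VolumeId' \
  --output text); do
  echo "Converting $vol to gp3"
  aws ec2 modify-volume --volume-id "$vol" --volume-type gp3
done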
5. Idle RDS Instances
The problem: A dev database someone created for testing 8 months ago. A staging DB that nobody uses anymore. They're running 24/7.
How common: 1-3 per account, often in non-prod environments.
How to detect: Check connections over the last 14 days:
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name DatabaseConnections \
--dimensions Name=DBInstanceIdentifier,Value=my-db \
--start-time $(date -d '14 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date +%Y-%m-%dT%H:%M:%S) \
--period 86400 \
--statistics Maximum
If max connections = 0 for 2 weeks, it's idle.
The fix:
- For dev/test: Stop it (you can start it when needed)
- For truly unused: Take a final snapshot and delete it
- Consider Aurora Serverless for dev workloads (scales to zero)
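For the first two options, the commands are straightforward (a sketch; my-db and the snapshot name are placeholders, and keep in mind AWS automatically restarts a stopped RDS instance after 7 days):
# Stop a dev/test database (AWS auto-starts it again after 7 days)
aws rds stop-db-instance --db-instance-identifier my-db
# Or retire it for good: take a final snapshot, then delete
aws rds delete-db-instance \
  --db-instance-identifier my-db \
  --final-db-snapshot-identifier my-db-final-snapshot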
Potential savings: $50-500/month per idle instance
Real-World Example: Mid-Market Fintech
Here's a real analysis I ran recently for a fintech company with 3 AWS accounts:
| Metric | Value |
|---|---|
| EC2 instances analyzed | 249 |
| Recommendations generated | 72 |
| Monthly savings potential | $21,340 |
| Annual savings potential | $256,076 |
The breakdown:
- Oversized instances: 60%+ were running at <20% average CPU
- gp2 → gp3 migrations: Immediate 20% savings, zero downtime required
- Orphan resources: Multiple unattached volumes and outdated snapshots
This aligns with industry benchmarks—Flexera reports 32% of cloud spend is typically wasted, and Gartner finds 2-3x overprovisioning is common.
The Real Problem
Running these checks manually is tedious. And even if you do it once, waste accumulates again within weeks.
That's why I built CloudPruneAI—it scans your AWS accounts automatically and generates Infrastructure as Code (CDK) to implement the fixes. Instead of a report you'll forget about, you get deployable code.
But even if you never use a tool, run these 5 checks today. You might be surprised what you find.
What's the biggest AWS cost surprise you've discovered? I'd love to hear your stories in the comments.
Top comments (4)
I love how you not only presented the problems but showed examples of how to fix them. We're not running on AWS, but like you mention, this is a common problem across all providers.
I've worked at other places before, and a personal one for me was when I left an m5.6xlarge running over a long weekend...
Hi @travis, thanks for your feedback! Happy to hear you enjoyed the post. You mentioned you're not running on AWS, so where are you running? Do you think this kind of tool could be useful cross-platform? I mean Azure, GCP, Oracle. And maybe add Terraform as output code?
Hey! We're on GCP right now, running Cloud Run (cause lazy), but if things start taking off for us we'll look to move into GKE and start breaking the app into smaller deployable modules. In regards to cross-platform, you better believe people are wasting money, and with many people tightening the spending belt they're going to be looking for ways to save, and infra costs are an easy place to cut. Do y'all monitor logging costs too? I know at a former company we once got up to $40k a month in logs. Another thought though: what edge do you have over just going to the billing dashboard?
Hey Travis! Great questions.
About logging costs: Yes! CloudWatch Logs is actually one of our main targets. We detect log groups without retention policies (the silent killer: logs accumulating forever). Your $40k/month story is exactly what we help prevent.
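If you want a quick way to spot those yourself, listing log groups with no retention set is a decent start (illustrative):
# Log groups with no retentionInDays keep data forever
aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null`].[logGroupName,storedBytes]' \
  --output table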
On the edge over the billing dashboard: the billing dashboard tells you what you're spending, but not why or how to fix it. CloudPruneAI goes deeper:
- We analyze resource utilization (CPU, memory, connections), not just cost
- We identify the root cause (oversized instance vs zombie vs missing lifecycle policy)
- Most importantly: we generate deployable CDK code to fix it
So instead of "you spent $5K on EC2" → you get "instance i-abc123 is using 8% CPU, here's the code to downsize it to t3.medium and save $840/year"
About multi-cloud/Terraform, it's definitely on our radar. AWS + CDK first to nail the experience, then expand. Would GCP + Terraform be interesting for your team when you scale to GKE?