Why Your AWS Bill Doubled Overnight (And How to Plug the Leaks)
We've all been there.
You open the AWS Billing Dashboard, expecting the usual $50–$100, only to see a vertical spike that looks like a mountain range. The immediate reaction is:
"We must have massive traffic!"
But let's be real — traffic rarely doubles overnight. Your misconfigurations, however, certainly can.
If you're staring down a bill that's spiraling out of control, here is your emergency checklist to find the invisible drains on your budget.
1. The NAT Gateway "Processing" Trap
NAT Gateways are the silent killers of AWS budgets. You aren't just paying for the uptime — you're paying for every gigabyte that passes through.
The Leak:
Sending high-bandwidth internal traffic (like S3 uploads) through a NAT Gateway instead of using a VPC Endpoint.
The Fix:
Use VPC Endpoints for S3 and DynamoDB to keep that traffic off the expensive NAT "highway."
# List your NAT Gateways, then check their BytesOutToDestination metric in CloudWatch
aws ec2 describe-nat-gateways --query 'NatGateways[*].{ID:NatGatewayId,State:State}'
A single misconfigured service pushing gigabytes through NAT can silently add hundreds of dollars to your bill.
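A Gateway Endpoint for S3 costs nothing and routes S3 traffic off the NAT path entirely. A minimal sketch — the VPC and route table IDs below are placeholders, and the service name assumes us-east-1:

```shell
# Create a free Gateway VPC Endpoint for S3
# (vpc-0abc... and rtb-0abc... are placeholder IDs — substitute your own)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc1234567890def \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abc1234567890def
```

Once the endpoint's route is in the table, S3 traffic from that subnet never touches the NAT Gateway.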
2. Cross-AZ Data Transfer — The Invisible Tax
High availability is great, but cross-Availability Zone (AZ) traffic comes with an invisible tax — typically $0.01/GB in each direction.
The Leak:
Your app server in us-east-1a is constantly chatting with a database in us-east-1b.
The Fix:
Keep your "chatty" services within the same AZ where possible, or use Service Discovery to prioritize local traffic.
# Check which AZ your instances are running in
aws ec2 describe-instances --query 'Reservations[*].Instances[*].{ID:InstanceId,AZ:Placement.AvailabilityZone}'
3. Ghost EBS Volumes
When you terminate an EC2 instance, the Elastic Block Store (EBS) volume doesn't always go away with it.
The Leak:
"Unattached" volumes sitting in your console, doing absolutely nothing except costing you monthly rent.
The Fix:
Go to EC2 Console → Volumes → Filter by State = Available
# Find all unattached EBS volumes via CLI
aws ec2 describe-volumes --filters Name=status,Values=available \
--query 'Volumes[*].{ID:VolumeId,Size:Size,State:State}'
If its state isn't In-use — snapshot it if you might need the data, then delete it and move on.
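The snapshot-then-delete step looks roughly like this — the volume ID is a placeholder:

```shell
# vol-0abc... is a placeholder — substitute your volume ID
SNAP_ID=$(aws ec2 create-snapshot \
  --volume-id vol-0abc1234567890def \
  --description "Final snapshot before deletion" \
  --query SnapshotId --output text)

# Wait until the snapshot finishes, then remove the volume
aws ec2 wait snapshot-completed --snapshot-ids "$SNAP_ID"
aws ec2 delete-volume --volume-id vol-0abc1234567890def
```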
4. Broken Auto Scaling
Auto Scaling is designed to save you money, but it only works if it knows how to breathe.
The Leak:
Your "Scale Up" policy works perfectly during peak hours, but your "Scale Down" policy is either missing or blocked by a single stuck process.
The Fix:
Audit your CloudWatch alarms. Ensure your cooldown periods aren't too long and that your termination policies are actually firing.
# List your Auto Scaling groups and their activities
aws autoscaling describe-scaling-activities --auto-scaling-group-name your-group-name
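One way to sidestep a missing scale-down policy entirely is a target-tracking policy, which creates both the scale-out and scale-in alarms for you. A sketch — the group and policy names are placeholders, and 50% CPU is an arbitrary example target:

```shell
# Target tracking scales out AND in automatically around 50% average CPU
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name your-group-name \
  --policy-name cpu50-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }'
```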
5. The CloudWatch Ingestion Spike
Logs are vital — until they cost more than the app they're monitoring.
The Leak:
You left a service in Debug mode, and now you're paying for terabytes of CloudWatch log ingestion.
The Fix:
Turn Debug logging back off, and set a retention policy — by default, CloudWatch keeps logs forever.
# Set a 30-day retention policy on a log group
aws logs put-retention-policy \
--log-group-name /your/log/group \
--retention-in-days 30
14 to 30 days is usually plenty for dev environments.
6. S3 Without a Lifecycle Policy
Storage is cheap — but it's not free.
The Leak:
Storing every version of every file in Standard Storage for years with no cleanup plan.
The Fix:
Implement S3 Lifecycle Policies to move old data automatically.
{
  "Rules": [{
    "ID": "archive-old-objects",
    "Status": "Enabled",
    "Filter": {},
    "Transitions": [{
      "Days": 30,
      "StorageClass": "STANDARD_IA"
    }, {
      "Days": 90,
      "StorageClass": "GLACIER"
    }]
  }]
}
Move to Infrequent Access after 30 days. Move to Glacier after 90. Your future self will thank you.
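Saved to a file, rules like these can be applied in one call — the bucket name and filename here are placeholders:

```shell
# Apply the lifecycle rules (my-bucket and lifecycle.json are placeholders)
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
```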
7. Idle Load Balancers
An ALB (Application Load Balancer) costs roughly $16–$20/month just to exist — even if nothing is using it.
The Leak:
Leftover load balancers from a project or staging environment you forgot to tear down.
The Fix:
# List all load balancers, then check each one's target groups for registered targets
aws elbv2 describe-load-balancers \
--query 'LoadBalancers[*].{Name:LoadBalancerName,DNS:DNSName}'
If it has zero targets and zero requests — delete it immediately.
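Confirming "zero targets" takes two more lookups — the ARNs below are placeholders:

```shell
# List target groups attached to the ALB (ARN is a placeholder)
aws elbv2 describe-target-groups \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/your-alb/abc123 \
  --query 'TargetGroups[*].TargetGroupArn'

# Check whether anything is actually registered in a target group
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/your-tg/abc123
```

An empty `TargetHealthDescriptions` list means nobody is behind that balancer.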
8. Snapshot Hoarding
Backups are important — but do you really need a snapshot of a test server from 2022?
The Leak:
Automated backups that never expire.
The Fix:
Use AWS Backup to centralize management and set hard expiration dates on snapshots.
# List all your snapshots and their ages
aws ec2 describe-snapshots --owner-ids self \
--query 'Snapshots[*].{ID:SnapshotId,Size:VolumeSize,Date:StartTime}'
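To focus on the old ones, you can filter by age with a JMESPath date comparison — this sketch assumes GNU `date` and a 90-day cutoff:

```shell
# Snapshots older than 90 days (GNU date; on macOS use `date -v-90d` instead)
CUTOFF=$(date -u -d '90 days ago' +%Y-%m-%d)
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<'${CUTOFF}'].{ID:SnapshotId,Date:StartTime}"
```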
Quick Emergency Checklist
| # | Leak | Quick Fix |
|---|---|---|
| 1 | NAT Gateway traffic | Use VPC Endpoints |
| 2 | Cross-AZ traffic | Keep services in same AZ |
| 3 | Ghost EBS volumes | Filter Available → Delete |
| 4 | Broken Auto Scaling | Audit CloudWatch alarms |
| 5 | CloudWatch debug logs | Set 14–30 day retention |
| 6 | S3 no lifecycle | Add Lifecycle Policy |
| 7 | Idle Load Balancers | Zero targets → Delete |
| 8 | Old snapshots | Set expiration in AWS Backup |
The Bottom Line
AWS is a "pay-for-what-you-use" model — but if you aren't careful, you're also "paying-for-what-you-forgot-to-turn-off."
Run through this checklist every month. Set a calendar reminder. Your AWS bill will thank you.
*What's the biggest hidden cost you've ever found in your AWS bill? Drop it in the comments!*