The average company wastes around 32% of its cloud spend — that's not a guess, it's what respondents to Flexera's annual State of the Cloud survey report year after year. If your team hasn't done a formal cost review in the last quarter, you're almost certainly burning money on oversized instances, forgotten resources, and on-demand pricing you could trade for committed discounts today.
This guide covers practical, immediately actionable strategies for cutting your AWS and Azure bills. No theoretical frameworks — just the specific optimizations that save the most money, ranked by impact.
The Cloud Cost Pyramid
Start from the top — biggest savings first:
```text
┌───────────────┐
│  Commitment   │  ← 30-60% savings
│  Discounts    │
├───────────────┤
│ Right-Sizing  │  ← 20-40% savings
├───────────────┤
│    Storage    │  ← 10-30% savings
│ Optimization  │
├───────────────┤
│    Network    │  ← 5-15% savings
│     Costs     │
├───────────────┤
│  Scheduling   │  ← 5-20% savings
│   & Cleanup   │
└───────────────┘
```
1. Commitment Discounts (30-60% Savings)
This is the single highest-impact optimization. If you're running workloads 24/7 on on-demand pricing, you're leaving money on the table.
AWS Savings Plans
Savings Plan Types:
1. Compute Savings Plans (up to 66% off)
- Applies to EC2, Fargate, Lambda
- Flexible: any instance family, region, OS
- Best for most organizations
2. EC2 Instance Savings Plans (up to 72% off)
- Locked to instance family in a region
- Slightly cheaper than Compute plans
- Good when you know your instance types
3. SageMaker Savings Plans (up to 64% off)
- For ML workloads
How to calculate your commitment:
```python
# Simple Savings Plan calculator
import json


def calculate_savings_plan(
    monthly_on_demand: float,
    baseline_percentage: float = 0.60,
    savings_rate: float = 0.40,
    commitment_years: int = 1,
) -> dict:
    """Calculate optimal Savings Plan commitment.

    Args:
        monthly_on_demand: Current monthly on-demand spend
        baseline_percentage: % of spend that's consistent (0.0-1.0)
        savings_rate: Discount rate for the plan (0.30-0.66)
        commitment_years: 1 or 3 year commitment
    """
    # Your baseline = what you consistently use
    baseline_monthly = monthly_on_demand * baseline_percentage

    # Commitment amount (hourly, at the discounted rate)
    commitment_hourly = (baseline_monthly * (1 - savings_rate)) / 730

    # Annual savings
    annual_savings = baseline_monthly * savings_rate * 12

    # Total committed spend over the full term
    total_commitment = commitment_hourly * 730 * 12 * commitment_years

    return {
        "current_monthly_spend": monthly_on_demand,
        "baseline_monthly": baseline_monthly,
        "commitment_hourly": round(commitment_hourly, 2),
        "monthly_savings": round(baseline_monthly * savings_rate, 2),
        "annual_savings": round(annual_savings, 2),
        "total_commitment": round(total_commitment, 2),
        "break_even_months": round(
            total_commitment / (baseline_monthly * savings_rate), 1
        ) if baseline_monthly * savings_rate > 0 else float("inf"),
    }


# Example: $50K/month on-demand EC2 spend
result = calculate_savings_plan(
    monthly_on_demand=50000,
    baseline_percentage=0.65,  # 65% is consistent baseline
    savings_rate=0.40,         # 40% discount with 1-year Compute plan
)
print(json.dumps(result, indent=2))
# Annual savings: ~$156,000
```
Azure Reserved Instances
Azure's equivalent of Savings Plans — commit to a 1- or 3-year term for significant discounts.
Azure RI discounts:
- VMs: up to 72% (3-year)
- SQL Database: up to 55%
- Cosmos DB: up to 65%
- Azure Databricks: up to 49% (with pre-purchase)
- Storage: up to 38% (reserved capacity)
Rule of thumb: If a resource runs more than 50% of the time, a 1-year reservation saves money. If it runs more than 30% of the time, a 3-year reservation saves money.
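This rule of thumb falls out of simple arithmetic: a reservation pays off once utilization exceeds the discounted price as a fraction of the on-demand price. A minimal sketch (the discount figures below are illustrative round numbers, not AWS or Azure quotes):

```python
def reservation_breaks_even(utilization: float, discount: float) -> bool:
    """True if a reservation is cheaper than on-demand.

    On-demand cost scales with utilization; a reservation costs
    (1 - discount) of full-time on-demand regardless of usage.
    """
    return utilization > (1 - discount)


# Illustrative discounts: ~50% (1-year), ~70% (3-year)
print(reservation_breaks_even(0.55, 0.50))  # 1-year pays off above 50% uptime
print(reservation_breaks_even(0.35, 0.70))  # 3-year pays off above 30% uptime
```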
2. Right-Sizing (20-40% Savings)
Most instances are oversized. Engineers pick a "safe" instance size during initial setup and never revisit it.
Finding Oversized Instances
```python
import boto3
from datetime import datetime, timedelta


def find_oversized_instances(region: str = "eu-west-1") -> list[dict]:
    """Find EC2 instances with consistently low CPU utilization."""
    ec2 = boto3.client("ec2", region_name=region)
    cloudwatch = boto3.client("cloudwatch", region_name=region)

    instances = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )

    oversized = []
    for reservation in instances["Reservations"]:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            instance_type = instance["InstanceType"]

            # Get average CPU over last 14 days
            response = cloudwatch.get_metric_statistics(
                Namespace="AWS/EC2",
                MetricName="CPUUtilization",
                Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
                StartTime=datetime.utcnow() - timedelta(days=14),
                EndTime=datetime.utcnow(),
                Period=86400,  # Daily averages
                Statistics=["Average", "Maximum"],
            )

            if response["Datapoints"]:
                avg_cpu = sum(
                    d["Average"] for d in response["Datapoints"]
                ) / len(response["Datapoints"])
                max_cpu = max(d["Maximum"] for d in response["Datapoints"])

                if avg_cpu < 20 and max_cpu < 50:
                    oversized.append({
                        "instance_id": instance_id,
                        "instance_type": instance_type,
                        "avg_cpu": round(avg_cpu, 1),
                        "max_cpu": round(max_cpu, 1),
                        "recommendation": "Downsize by 1-2 sizes",
                    })

    return oversized


# Run the analysis
results = find_oversized_instances()
for r in results:
    print(
        f"{r['instance_id']} ({r['instance_type']}): "
        f"avg CPU {r['avg_cpu']}%, max {r['max_cpu']}% "
        f"-> {r['recommendation']}"
    )
```
Right-Sizing Decision Matrix
| Avg CPU | Max CPU | Memory Usage | Recommendation |
|---|---|---|---|
| < 10% | < 30% | < 30% | Downsize by 2 sizes or consider serverless |
| 10-30% | < 50% | < 50% | Downsize by 1 size |
| 30-60% | < 80% | < 70% | Current size OK |
| > 60% | > 80% | > 70% | Consider upsizing |
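The matrix translates directly into a small helper. A sketch with the thresholds copied from the table (it assumes you also export a memory utilization metric, e.g. via the CloudWatch agent, since EC2 doesn't report memory by default):

```python
def rightsize_recommendation(avg_cpu: float, max_cpu: float, memory: float) -> str:
    """Map utilization percentages (0-100) to the table's advice."""
    if avg_cpu < 10 and max_cpu < 30 and memory < 30:
        return "Downsize by 2 sizes or consider serverless"
    if avg_cpu < 30 and max_cpu < 50 and memory < 50:
        return "Downsize by 1 size"
    if avg_cpu < 60 and max_cpu < 80 and memory < 70:
        return "Current size OK"
    return "Consider upsizing"


print(rightsize_recommendation(8, 25, 20))   # Downsize by 2 sizes or consider serverless
print(rightsize_recommendation(45, 70, 60))  # Current size OK
```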
Instance Family Selection
Don't just resize — pick the right family:
Common mistake: Using m5.xlarge for a CPU-bound workload
Better choice: c5.large (compute-optimized, half the cost)
Common mistake: Using m5.2xlarge for an in-memory cache
Better choice: r5.xlarge (memory-optimized, same RAM, less CPU cost)
AWS instance families:
- t3/t4g: Burstable, web servers, dev environments
- m6i/m7g: General purpose, balanced workloads
- c6i/c7g: CPU-intensive (data processing, batch)
- r6i/r7g: Memory-intensive (caches, databases)
- g5: GPU (ML training/inference)
- i3/i4i: Storage-optimized (databases)
Graviton (ARM) instances: typically ~20% cheaper, often with better price-performance
Switch t3 → t4g, m5 → m7g, c5 → c7g for instant savings, provided your workloads build and run on ARM
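The x86-to-Graviton swaps above can be scripted as a lookup. A hypothetical helper (the mapping covers only the families named in this article; anything else is passed through unchanged):

```python
# x86 family -> Graviton equivalent, per the swaps suggested above
GRAVITON_EQUIVALENT = {"t3": "t4g", "m5": "m7g", "c5": "c7g"}


def to_graviton(instance_type: str) -> str:
    """Suggest an ARM equivalent, e.g. 'm5.xlarge' -> 'm7g.xlarge'."""
    family, _, size = instance_type.partition(".")
    return f"{GRAVITON_EQUIVALENT.get(family, family)}.{size}"


print(to_graviton("c5.2xlarge"))  # c7g.2xlarge
```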
3. Storage Optimization (10-30% Savings)
Storage costs creep up silently. Nobody notices until the bill is $20K/month.
S3 Lifecycle Policies
```json
{
  "Rules": [
    {
      "ID": "MoveToIA",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER_IR" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 2555 }
    },
    {
      "ID": "CleanupIncomplete",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    },
    {
      "ID": "DeleteOldVersions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "STANDARD_IA" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
    }
  ]
}
```
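If you manage buckets from code rather than the console, a lifecycle policy like this can also be applied with boto3's `put_bucket_lifecycle_configuration`. A sketch with an abbreviated one-rule policy (the bucket name is a placeholder):

```python
import json

# Abbreviated version of the lifecycle policy above (one rule shown)
LIFECYCLE_POLICY = {
    "Rules": [
        {
            "ID": "MoveToIA",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
        }
    ]
}


def apply_lifecycle(bucket: str, dry_run: bool = True) -> None:
    """Print the policy, or apply it to the bucket when dry_run=False."""
    if dry_run:
        print(json.dumps(LIFECYCLE_POLICY, indent=2))
        return
    import boto3  # imported lazily so the dry run needs no AWS setup

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=LIFECYCLE_POLICY
    )


apply_lifecycle("my-log-bucket")  # placeholder bucket; dry run prints the policy
```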
S3 Cost Comparison
| Storage Class | Cost per GB/month | Retrieval Cost | Use Case |
|---|---|---|---|
| Standard | $0.023 | None | Active data |
| Intelligent-Tiering | $0.023 + monitoring | None | Unknown access patterns |
| Standard-IA | $0.0125 | $0.01/GB | Monthly access |
| Glacier Instant | $0.004 | $0.03/GB | Quarterly access |
| Glacier Flexible | $0.0036 | $0.01/GB + time | Annual access |
| Deep Archive | $0.00099 | $0.02/GB + 12hrs | Compliance archives |
Quick win: Enable S3 Intelligent-Tiering on buckets with unknown access patterns. It automatically moves data between tiers and typically saves 30-40%.
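To compare tiers for a concrete workload, the table's rates can be plugged into a quick estimator. A sketch using the illustrative per-GB figures from the table (real prices vary by region and change over time, and Glacier classes add per-request and minimum-duration charges not modeled here):

```python
# (storage $/GB-month, retrieval $/GB) per class, from the table above
S3_PRICING = {
    "STANDARD": (0.023, 0.0),
    "STANDARD_IA": (0.0125, 0.01),
    "GLACIER_IR": (0.004, 0.03),
    "DEEP_ARCHIVE": (0.00099, 0.02),
}


def monthly_s3_cost(storage_class: str, gb_stored: float, gb_retrieved: float) -> float:
    """Rough monthly cost: storage plus retrieval, nothing else."""
    store_rate, retrieve_rate = S3_PRICING[storage_class]
    return gb_stored * store_rate + gb_retrieved * retrieve_rate


# 10 TB of logs, 100 GB read back per month
for cls in S3_PRICING:
    print(f"{cls}: ${monthly_s3_cost(cls, 10_000, 100):.2f}/month")
```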
EBS Volume Cleanup
```python
import boto3


def find_unused_ebs_volumes(region: str = "eu-west-1") -> list[dict]:
    """Find unattached EBS volumes costing you money."""
    ec2 = boto3.client("ec2", region_name=region)
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )

    unused = []
    total_cost = 0
    for vol in volumes["Volumes"]:
        size_gb = vol["Size"]
        vol_type = vol["VolumeType"]

        # Approximate monthly cost per GB by volume type
        cost_per_gb = {
            "gp2": 0.10, "gp3": 0.08, "io1": 0.125,
            "io2": 0.125, "st1": 0.045, "sc1": 0.015,
        }
        monthly_cost = size_gb * cost_per_gb.get(vol_type, 0.10)
        total_cost += monthly_cost

        unused.append({
            "volume_id": vol["VolumeId"],
            "size_gb": size_gb,
            "type": vol_type,
            "monthly_cost": round(monthly_cost, 2),
            "created": str(vol["CreateTime"]),
        })

    print(f"Found {len(unused)} unused volumes")
    print(f"Total wasted: ${total_cost:.2f}/month")
    return unused
```
4. Network Cost Reduction (5-15% Savings)
Data transfer is the hidden cloud tax. Cross-AZ, cross-region, and internet egress add up fast.
Key Network Cost Rules
AWS Data Transfer Costs:
- Same AZ, same VPC: FREE
- Cross-AZ (within region): $0.01/GB each way
- Cross-region: $0.02/GB
- Internet egress: $0.09/GB (first 10TB)
- CloudFront egress: $0.085/GB (cheaper than direct)
Cost reduction strategies:
1. Use VPC endpoints for AWS services (S3, DynamoDB)
- Eliminates NAT Gateway charges ($0.045/GB)
- Free for Gateway endpoints (S3, DynamoDB)
2. Keep traffic in the same AZ when possible
- Use AZ-aware routing in ALB
- Configure services to prefer same-AZ replicas
3. Use CloudFront for egress
- Cheaper than direct internet egress
- Also reduces latency
4. Compress data in transit
- Enable gzip/brotli on ALB
- Compress S3 objects before transfer
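The rates above can be turned into a back-of-the-envelope estimator. A sketch with the per-GB figures copied from the list (illustrative first-tier rates; actual pricing varies by region and volume):

```python
# $/GB from the rules above (illustrative first-tier rates)
TRANSFER_RATES = {
    "same_az": 0.0,
    "cross_az": 0.02,       # $0.01/GB each way
    "cross_region": 0.02,
    "internet": 0.09,
    "cloudfront": 0.085,
    "nat_gateway": 0.045,   # NAT processing charge, on top of transfer
}


def transfer_cost(path: str, gb: float) -> float:
    """Rough monthly data transfer cost for a given path."""
    return TRANSFER_RATES[path] * gb


# Moving 1 TB/month of S3 traffic off a NAT Gateway:
print(f"Via NAT Gateway:      ${transfer_cost('nat_gateway', 1000):.2f}")
print(f"Via Gateway endpoint: ${transfer_cost('same_az', 1000):.2f}")
```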
VPC Endpoint Cost Savings
```hcl
# Terraform: S3 Gateway Endpoint (FREE)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.private[*].id
}

# This eliminates NAT Gateway charges for S3 traffic.
# If you transfer 1 TB/month to S3:
#   Without endpoint: 1000 GB × $0.045 = $45/month (NAT)
#   With endpoint:    $0/month
```
5. Scheduling and Cleanup (5-20% Savings)
Non-production environments don't need to run 24/7.
Auto-Shutdown for Dev/Staging
```python
import boto3


def manage_dev_instances(action: str = "stop"):
    """Stop dev instances outside business hours."""
    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # Find instances tagged as dev/staging and opted in to auto-shutdown
    instances = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Environment", "Values": ["dev", "staging"]},
            {"Name": "tag:AutoShutdown", "Values": ["true"]},
            {
                "Name": "instance-state-name",
                "Values": ["running" if action == "stop" else "stopped"],
            },
        ]
    )

    instance_ids = [
        i["InstanceId"]
        for r in instances["Reservations"]
        for i in r["Instances"]
    ]

    if not instance_ids:
        print(f"No instances to {action}")
        return

    if action == "stop":
        ec2.stop_instances(InstanceIds=instance_ids)
        print(f"Stopped {len(instance_ids)} instances")
    elif action == "start":
        ec2.start_instances(InstanceIds=instance_ids)
        print(f"Started {len(instance_ids)} instances")


# Schedule with EventBridge:
#   Stop at 7 PM:  manage_dev_instances("stop")
#   Start at 8 AM: manage_dev_instances("start")
# = 13 hours off per weekday + weekends
# = ~60% reduction in dev instance costs
```
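That schedule estimate checks out with quick arithmetic. A sketch (assumes stop at 19:00, start at 08:00, weekdays only; this counts running hours eliminated, so actual cost savings land a bit lower because EBS storage still accrues while instances are stopped):

```python
def schedule_savings(stop_hour: int = 19, start_hour: int = 8) -> float:
    """Fraction of the week a dev instance is stopped under the schedule."""
    weekday_off = (24 - stop_hour) + start_hour  # overnight hours per weekday
    off_hours = weekday_off * 5 + 48             # plus full weekends
    return off_hours / 168                       # 168 hours in a week


print(f"{schedule_savings():.0%} of running hours eliminated")  # 67%
```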
Resource Cleanup Automation
```python
import boto3
from datetime import datetime, timedelta


def cleanup_old_resources(dry_run: bool = True) -> dict:
    """Find and optionally delete old/unused resources."""
    ec2 = boto3.client("ec2", region_name="eu-west-1")
    savings = {"monthly_savings": 0, "resources": []}

    # 1. Old snapshots (> 90 days). NOTE: this does not check whether a
    # snapshot backs a registered AMI -- verify before deleting for real.
    snapshots = ec2.describe_snapshots(OwnerIds=["self"])
    cutoff = datetime.utcnow() - timedelta(days=90)

    for snap in snapshots["Snapshots"]:
        if snap["StartTime"].replace(tzinfo=None) < cutoff:
            size_gb = snap["VolumeSize"]
            cost = size_gb * 0.05  # ~$0.05/GB/month for snapshots
            savings["monthly_savings"] += cost
            savings["resources"].append({
                "type": "snapshot",
                "id": snap["SnapshotId"],
                "size_gb": size_gb,
                "monthly_cost": round(cost, 2),
                "age_days": (
                    datetime.utcnow() - snap["StartTime"].replace(tzinfo=None)
                ).days,
            })
            if not dry_run:
                ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])

    # 2. Unattached Elastic IPs (~$3.60/month each when not attached)
    addresses = ec2.describe_addresses()
    for addr in addresses["Addresses"]:
        if "AssociationId" not in addr:
            savings["monthly_savings"] += 3.60
            savings["resources"].append({
                "type": "elastic_ip",
                "id": addr["AllocationId"],
                "monthly_cost": 3.60,
            })
            if not dry_run:
                ec2.release_address(AllocationId=addr["AllocationId"])

    print(f"Potential monthly savings: ${savings['monthly_savings']:.2f}")
    print(f"Resources to clean: {len(savings['resources'])}")
    return savings
```
Monthly Cost Review Checklist
Run this checklist on the first of every month:
| Check | Tool | Target |
|---|---|---|
| Unused instances | AWS Compute Optimizer | Downsize or terminate |
| Unattached EBS volumes | Cost Explorer | Delete or snapshot |
| Old snapshots | Custom script | Delete if > 90 days |
| Unattached Elastic IPs | Console/script | Release |
| S3 access patterns | S3 Analytics | Apply lifecycle policies |
| Reserved coverage | Savings Plans report | Cover 60-70% baseline |
| NAT Gateway traffic | VPC Flow Logs | Replace with VPC endpoints |
| Cross-AZ data transfer | Cost Explorer | Optimize routing |
Summary
Cloud cost optimization isn't a one-time project — it's an ongoing practice:
| Strategy | Typical Savings | Effort | Impact Time |
|---|---|---|---|
| Savings Plans/RIs | 30-60% | Low | Immediate |
| Right-sizing | 20-40% | Medium | 1-2 weeks |
| Storage tiering | 10-30% | Low | Days |
| Network optimization | 5-15% | Medium | 1-2 weeks |
| Scheduling/cleanup | 5-20% | Low | Immediate |
Start with commitment discounts and right-sizing — that's where 70% of the savings come from.
If you want ready-made scripts, dashboards, and automation templates for cloud cost optimization and data infrastructure, check out DataStack Pro.