Cloud Cost Optimization Toolkit
A battle-tested collection of scripts, queries, and dashboard configurations for slashing cloud spend without sacrificing performance. This toolkit covers the full optimization lifecycle — discovery, analysis, right-sizing, reserved capacity planning, and ongoing alerting. Built from real-world engagements where these techniques consistently delivered 25-40% cost reductions across AWS, Azure, and GCP environments.
Key Features
- Waste Discovery Scripts — Identify unattached EBS volumes, idle load balancers, orphaned snapshots, and unused elastic IPs
- Right-Sizing Engine — CPU/memory utilization analysis with instance family recommendations using CloudWatch/Azure Monitor metrics
- Reserved Instance Planner — Break-even calculators for RIs, Savings Plans, and Azure Reservations with commitment scenarios
- Spot/Preemptible Advisor — Workload classification for spot eligibility with interruption rate data by instance type and AZ
- Budget Alert Templates — Pre-configured budget alerts at 50%, 80%, and 100% thresholds with SNS/email/Slack notifications
- Cost Anomaly Detection — Queries to detect spend spikes using day-over-day and week-over-week comparisons
- Tag Compliance Reports — Find untagged resources that can't be attributed to cost centers
- Multi-Cloud Dashboard — Unified cost views with normalized metrics across AWS, Azure, and GCP
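
As an illustration of the day-over-day and week-over-week comparison mentioned under Cost Anomaly Detection, here is a minimal sketch. The function name `find_spend_spikes` and the 30% / 50% thresholds are assumptions made for this example, not the toolkit's actual queries or defaults.

```python
def find_spend_spikes(daily_costs, dod_pct=30.0, wow_pct=50.0):
    """Return (index, reason) pairs for days whose spend jumps sharply.

    daily_costs: daily totals in chronological order.
    dod_pct / wow_pct: hypothetical percentage thresholds (assumptions).
    """
    spikes = []
    for i, cost in enumerate(daily_costs):
        # Day-over-day: compare with the previous day.
        if i >= 1 and daily_costs[i - 1] > 0:
            change = (cost - daily_costs[i - 1]) / daily_costs[i - 1] * 100
            if change > dod_pct:
                spikes.append((i, f"day-over-day +{change:.0f}%"))
                continue
        # Week-over-week: compare with the same weekday one week earlier.
        if i >= 7 and daily_costs[i - 7] > 0:
            change = (cost - daily_costs[i - 7]) / daily_costs[i - 7] * 100
            if change > wow_pct:
                spikes.append((i, f"week-over-week +{change:.0f}%"))
    return spikes


# Made-up daily totals: spend jumps on day 8 and stays high on day 9.
costs = [100, 102, 98, 101, 99, 100, 103, 101, 180, 175]
spikes = find_spend_spikes(costs)
print(spikes)
```

The week-over-week check catches the second elevated day, which a day-over-day comparison alone would miss because spend did not rise again.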
Quick Start
# AWS: Find all idle resources in your account
python3 src/aws/waste_finder.py \
  --profile production \
  --region us-east-1 \
  --output-format json \
  --min-idle-days 14

# AWS: Right-sizing recommendations
python3 src/aws/rightsizer.py \
  --profile production \
  --lookback-days 30 \
  --cpu-threshold 40 \
  --memory-threshold 30

# Azure: Find orphaned disks and IPs
python3 src/azure/orphan_scanner.py \
  --subscription YOUR_SUBSCRIPTION_ID \
  --output report.csv
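
The internals of `waste_finder.py` are not shown here, so the following is only a sketch of what an unattached-volume check might look like. It filters dictionaries shaped like the entries boto3's `describe_volumes` returns (`VolumeId`, `State`, `Attachments`, `CreateTime`, `Size`). The helper name `unattached_volumes` is hypothetical, and using `CreateTime` as a stand-in for "idle since" is an assumption, because the API does not report when a volume was detached.

```python
from datetime import datetime, timedelta, timezone


def unattached_volumes(volumes, min_idle_days=14, now=None):
    """Return volumes that are unattached and older than min_idle_days.

    Assumption: CreateTime is used as a rough proxy for idle time.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_idle_days)
    flagged = []
    for vol in volumes:
        # An unattached EBS volume reports State "available" and no attachments.
        is_unattached = vol["State"] == "available" and not vol.get("Attachments")
        if is_unattached and vol["CreateTime"] <= cutoff:
            flagged.append({"VolumeId": vol["VolumeId"], "SizeGiB": vol["Size"]})
    return flagged


# Made-up sample data in the shape of a describe_volumes response.
sample = [
    {"VolumeId": "vol-aaa", "State": "available", "Attachments": [],
     "CreateTime": datetime(2024, 5, 1, tzinfo=timezone.utc), "Size": 100},
    {"VolumeId": "vol-bbb", "State": "in-use",
     "Attachments": [{"InstanceId": "i-123"}],
     "CreateTime": datetime(2024, 1, 1, tzinfo=timezone.utc), "Size": 50},
    {"VolumeId": "vol-ccc", "State": "available", "Attachments": [],
     "CreateTime": datetime(2024, 6, 25, tzinfo=timezone.utc), "Size": 20},
]
flagged = unattached_volumes(sample, now=datetime(2024, 6, 30, tzinfo=timezone.utc))
print(flagged)
```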
Architecture
┌─────────────────────────────────────────────────────────┐
│               Cost Optimization Pipeline                │
│                                                         │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │ Collect  │───►│   Analyze    │───►│  Recommend   │   │
│  │          │    │              │    │              │   │
│  │CloudWatch│    │ Utilization  │    │ Right-size   │   │
│  │Cost Exp. │    │ Trending     │    │ Reserve      │   │
│  │Azure Mon.│    │ Anomaly Det. │    │ Spot/Preempt │   │
│  └──────────┘    └──────────────┘    └──────┬───────┘   │
│                                             │           │
│  ┌──────────────────────────────────────────▼────────┐  │
│  │                  Report & Alert                   │  │
│  │  ┌──────────┐  ┌───────────┐  ┌────────────────┐  │  │
│  │  │Dashboard │  │Budget     │  │ Slack/Email    │  │  │
│  │  │(Grafana) │  │Alerts     │  │ Notifications  │  │  │
│  │  └──────────┘  └───────────┘  └────────────────┘  │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
Usage Examples
AWS Cost Explorer Query — Top Spend by Service
# src/aws/cost_breakdown.py
from datetime import datetime, timedelta, timezone

import boto3


def get_monthly_cost_by_service(profile: str = "default") -> dict:
    """Retrieve current month's cost grouped by AWS service."""
    session = boto3.Session(profile_name=profile)
    # The Cost Explorer API is served from the us-east-1 endpoint
    client = session.client("ce", region_name="us-east-1")

    today = datetime.now(timezone.utc).date()
    start = today.replace(day=1).strftime("%Y-%m-%d")
    # End is exclusive: using tomorrow includes today's spend and keeps
    # Start < End on the 1st of the month (Start == End is rejected)
    end = (today + timedelta(days=1)).strftime("%Y-%m-%d")

    response = client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    costs = {}
    for group in response["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        if amount > 0.01:  # Skip negligible line items
            costs[service] = round(amount, 2)
    # Highest-spend services first
    return dict(sorted(costs.items(), key=lambda x: x[1], reverse=True))
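
The dictionary returned by `get_monthly_cost_by_service` can get long in accounts that use many services. One possible way to condense it into a top-N view with percentage shares is sketched below. The helper `summarize_top_services` is hypothetical and the dollar figures are made up for illustration.

```python
def summarize_top_services(costs, top_n=3):
    """Return (service, amount, share_pct) rows for the top_n services.

    Remaining services are folded into a single "Other" row.
    """
    total = sum(costs.values())
    if total <= 0:
        return []
    ranked = sorted(costs.items(), key=lambda kv: kv[1], reverse=True)
    rows = [(name, amt, round(amt / total * 100, 1)) for name, amt in ranked[:top_n]]
    other = round(sum(amt for _, amt in ranked[top_n:]), 2)
    if other > 0:
        rows.append(("Other", other, round(other / total * 100, 1)))
    return rows


# Made-up figures in the shape returned by get_monthly_cost_by_service().
sample_costs = {
    "Amazon Elastic Compute Cloud - Compute": 600.0,
    "Amazon Relational Database Service": 250.0,
    "Amazon Simple Storage Service": 100.0,
    "AWS Lambda": 50.0,
}
rows = summarize_top_services(sample_costs, top_n=2)
for name, amount, share in rows:
    print(f"{name:<45} ${amount:>8.2f} {share:>5.1f}%")
```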
Budget Alert CloudFormation Template
# src/aws/budget-alert.yaml
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  MonthlyBudget:
    Type: Number
    Default: 5000
  AlertEmail:
    Type: String
    Default: cloud-team@example.com
Resources:
  MonthlyBudgetAlert:
    Type: AWS::Budgets::Budget
    Properties:
      Budget:
        BudgetName: monthly-total-spend
        BudgetLimit:
          Amount: !Ref MonthlyBudget
          Unit: USD
        TimeUnit: MONTHLY
        BudgetType: COST
      NotificationsWithSubscribers:
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 80
          Subscribers:
            - SubscriptionType: EMAIL
              Address: !Ref AlertEmail
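
The template above defines a single 80% notification, while the feature list mentions alerts at 50%, 80%, and 100%. A sketch of generating one `NotificationsWithSubscribers` entry per threshold follows. The helper `build_notifications` is hypothetical. The output mirrors the template's structure and could be pasted into a JSON-format template, with the literal email address standing in for `!Ref AlertEmail`.

```python
import json


def build_notifications(thresholds, email):
    """Build one NotificationsWithSubscribers entry per percentage threshold."""
    unique = sorted(set(thresholds))
    # AWS Budgets documents a limit of five notifications per budget.
    if len(unique) > 5:
        raise ValueError("a budget supports at most five notifications")
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold,
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for threshold in unique
    ]


notifications = build_notifications([50, 80, 100], "cloud-team@example.com")
print(json.dumps(notifications, indent=2))
```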
Right-Sizing Decision Logic
# src/common/rightsizer_logic.py
from dataclasses import dataclass


@dataclass
class SizingRecommendation:
    instance_id: str
    current_type: str
    recommended_type: str
    avg_cpu: float
    avg_memory: float
    monthly_savings: float
    confidence: str  # "high", "medium", "low"


def classify_instance(
    avg_cpu: float, peak_cpu: float,
    avg_mem: float, peak_mem: float
) -> str:
    """Classify instance utilization for right-sizing."""
    if avg_cpu < 5 and avg_mem < 10:
        return "idle"  # Candidate for termination
    elif peak_cpu < 40 and peak_mem < 40:
        return "oversized"  # Downsize by 1-2 instance sizes
    elif peak_cpu > 90 or peak_mem > 90:
        return "undersized"  # Upsize or add autoscaling
    else:
        return "right-sized"  # No action needed
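
The `monthly_savings` field has to come from somewhere. One rough way to estimate it for an "oversized" instance is sketched below. It assumes that each size step down within an instance family halves the on-demand hourly price, which is a common rule of thumb rather than a guarantee, and that a month has 730 hours. The function name and the sample price are hypothetical; real numbers should come from current pricing data.

```python
HOURS_PER_MONTH = 730  # Average hours in a month (assumption for estimates)


def estimate_monthly_savings(current_hourly_usd, sizes_down=1):
    """Estimate savings from downsizing within the same instance family.

    Assumption: each size step down halves the hourly price.
    """
    if sizes_down < 1:
        return 0.0
    new_hourly = current_hourly_usd / (2 ** sizes_down)
    return round((current_hourly_usd - new_hourly) * HOURS_PER_MONTH, 2)


# Hypothetical hourly price for the current instance size.
one_step = estimate_monthly_savings(0.384, sizes_down=1)
two_steps = estimate_monthly_savings(0.384, sizes_down=2)
print(one_step, two_steps)
```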
Configuration
# configs/optimization-config.yaml
waste_detection:
  ebs_unattached_days: 14      # Flag EBS volumes unattached for N days
  elb_idle_days: 7             # Flag ALBs with zero healthy targets
  snapshot_age_days: 90        # Flag snapshots older than N days
  elastic_ip_unattached: true  # Flag all unassociated EIPs

rightsizing:
  lookback_days: 30            # Metric analysis window
  avg_cpu_threshold: 40        # Below this = oversized
  avg_memory_threshold: 30     # Below this = oversized
  peak_cpu_threshold: 90       # Above this = undersized
  min_data_points: 100         # Require sufficient data before recommending

budgets:
  monthly_limit_usd: 5000
  alert_thresholds: [50, 80, 100]
  notification_channels:
    - type: email
      address: cloud-team@example.com
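
To show how the `rightsizing` values might be applied, here is a sketch that evaluates raw utilization samples against them. The `evaluate` function is hypothetical. Requiring both CPU and memory to fall below their thresholds before calling an instance oversized is an assumption, since the config comments do not say how the two are combined.

```python
# Mirrors the rightsizing section of the config above.
RIGHTSIZING = {
    "avg_cpu_threshold": 40,
    "avg_memory_threshold": 30,
    "peak_cpu_threshold": 90,
    "min_data_points": 100,
}


def evaluate(cpu_samples, mem_samples, cfg=RIGHTSIZING):
    """Classify an instance from utilization samples (percent values)."""
    # Refuse to recommend anything without enough data points.
    if min(len(cpu_samples), len(mem_samples)) < cfg["min_data_points"]:
        return "insufficient-data"
    avg_cpu = sum(cpu_samples) / len(cpu_samples)
    avg_mem = sum(mem_samples) / len(mem_samples)
    if max(cpu_samples) > cfg["peak_cpu_threshold"]:
        return "undersized"
    # Assumption: both averages must be low before suggesting a downsize.
    if avg_cpu < cfg["avg_cpu_threshold"] and avg_mem < cfg["avg_memory_threshold"]:
        return "oversized"
    return "right-sized"


print(evaluate([20] * 120, [15] * 120))
print(evaluate([10] * 50, [10] * 50))
```

Checking the peak before the averages means a mostly quiet instance with occasional CPU saturation is not flagged for downsizing.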
Best Practices
- Run waste detection weekly — Orphaned resources accumulate fast; schedule scripts via cron or Lambda
- Use 30-day lookback minimum — Shorter windows miss weekly or monthly batch patterns
- Don't right-size databases blindly — Memory-optimized instances may look underutilized by CPU but need the RAM
- Start with Savings Plans over RIs — More flexible; use RIs only for predictable, steady-state workloads
- Tag enforcement before cost analysis — You can't allocate costs to teams if 30% of resources are untagged
- Automate the easy wins — Schedule EBS snapshot cleanup, unused EIP release, and dev environment shutdown
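
When weighing a Savings Plan or RI against on-demand pricing, a simple break-even calculation helps show how long a commitment takes to pay off. The sketch below is an illustration only: the function names are hypothetical and the dollar amounts are made up, not real AWS prices.

```python
def break_even_months(upfront_usd, on_demand_monthly_usd, committed_monthly_usd):
    """Months until the upfront payment is recovered, or None if never."""
    monthly_saving = on_demand_monthly_usd - committed_monthly_usd
    if monthly_saving <= 0:
        return None  # The commitment never pays for itself
    return upfront_usd / monthly_saving


def term_savings(upfront_usd, on_demand_monthly_usd, committed_monthly_usd,
                 term_months=12):
    """Total savings over the term compared with staying on demand."""
    on_demand_total = on_demand_monthly_usd * term_months
    committed_total = upfront_usd + committed_monthly_usd * term_months
    return on_demand_total - committed_total


# Made-up scenario: $1,200 upfront, $300/month on demand, $150/month committed.
months = break_even_months(1200, 300, 150)
saved = term_savings(1200, 300, 150, term_months=12)
print(months, saved)
```

If the workload might be shut down before the break-even month, the commitment costs more than staying on demand, which is why steady-state workloads are the safest candidates.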
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Cost Explorer API returns empty results | CE needs 24h after account activation | Wait one day; ensure Cost Explorer is enabled in Billing console |
| Right-sizer shows 0% memory usage | CloudWatch doesn't collect memory by default | Install CloudWatch Agent on EC2 instances to push memory metrics |
| Budget alert not firing | Budget uses calendar month; created mid-month | Threshold may not be hit until next full month — check forecasted alerts |
| Script returns AccessDenied | IAM policy missing ce:GetCostAndUsage | Attach AWSBillingReadOnlyAccess or custom policy with CE permissions |
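
For the AccessDenied case, a custom policy can be as small as one statement. The sketch below builds such a policy document and prints it as JSON. It grants only `ce:GetCostAndUsage`, the action named in the table. The other scripts in the toolkit will likely need further permissions that are not listed here.

```python
import json

# Minimal IAM policy document allowing the Cost Explorer query used above.
# Cost Explorer actions are granted on Resource "*".
cost_explorer_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ce:GetCostAndUsage"],
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(cost_explorer_policy, indent=2)
print(policy_json)
```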
This is 1 of 11 resources in the Cloud Architecture Pro toolkit. Get the complete [Cloud Cost Optimization Toolkit] with all files, templates, and documentation for $39.
Or grab the entire Cloud Architecture Pro bundle (11 products) for $149 — save 30%.