You're in a sprint planning meeting and someone drops the monthly cloud bill on the table. It's up again. Nobody knows exactly why. The DevOps lead says it's probably the new microservices rollout. Finance says it's been climbing for six months. Everyone nods and moves on.
This is a story playing out in thousands of engineering orgs right now. According to the FinOps Foundation's State of FinOps 2026 survey — which aggregated data from 1,200+ organizations running over $69 billion in cloud spend — teams without a structured optimization practice waste 32 to 40 percent of their cloud budget every month.
For a $500K/year cloud budget, that's $160K to $200K a year disappearing into idle instances, orphaned volumes, and on-demand pricing on workloads that have been running predictably for years.
Here's the 5-tactic playbook to fix it.
Tactic 1 — Rightsize Before Anything Else
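Pull 30 days of CPU and memory utilization data. Any instance averaging below 20% CPU with 40%+ memory headroom is a rightsizing candidate. Note that EC2 does not publish memory metrics to CloudWatch by default; you need the CloudWatch agent (or another monitoring agent) on the instance to collect them.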
AWS CLI quickstart:
# Find all EC2 instances and their current types
aws ec2 describe-instances \
--query "Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name]" \
--output table
# Then check CPU utilization via CloudWatch for each ID
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=<INSTANCE_ID> \
--start-time 2026-05-01T00:00:00Z \
--end-time 2026-05-31T00:00:00Z \
--period 86400 \
--statistics Average
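As a rough sketch of the threshold check described above, the snippet below averages the Datapoints array that get-metric-statistics returns in JSON and applies the 20% CPU / 40% memory-headroom rule. The function names, the sample values, and the mem_headroom_pct argument are illustrative assumptions; the memory figure has to come from your own agent-based metrics.

```python
import json

CPU_THRESHOLD_PCT = 20.0   # rule of thumb from the text
MEM_HEADROOM_PCT = 40.0    # minimum free-memory headroom from the text

def average_cpu(metric_json: str) -> float:
    """Mean of the 'Average' values in a get-metric-statistics JSON response."""
    datapoints = json.loads(metric_json)["Datapoints"]
    if not datapoints:
        raise ValueError("no datapoints returned for this window")
    return sum(dp["Average"] for dp in datapoints) / len(datapoints)

def is_rightsizing_candidate(metric_json: str, mem_headroom_pct: float) -> bool:
    """True when average CPU is under 20% and memory headroom is at least 40%."""
    return (average_cpu(metric_json) < CPU_THRESHOLD_PCT
            and mem_headroom_pct >= MEM_HEADROOM_PCT)

# Hypothetical two-day response, trimmed to the fields used here.
sample = json.dumps({
    "Label": "CPUUtilization",
    "Datapoints": [
        {"Timestamp": "2026-05-01T00:00:00Z", "Average": 12.0, "Unit": "Percent"},
        {"Timestamp": "2026-05-02T00:00:00Z", "Average": 18.0, "Unit": "Percent"},
    ],
})
print(is_rightsizing_candidate(sample, mem_headroom_pct=55.0))  # True
```

Daily averages hide short peaks, so it is worth looking at the Maximum statistic as well before downsizing anything latency-sensitive.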
AWS Cost Explorer's rightsizing recommendations do this automatically with a single click. GCP Active Assist and Azure Advisor are the equivalents on those clouds. Rightsizing typically delivers 15–25% compute savings.
For Kubernetes, use OpenCost: it's free, open source, CNCF-hosted, and gives you per-pod cost visibility that native cloud consoles miss entirely.
Tactic 2 — Stop Running Steady-State Workloads at On-Demand Prices
Reserved Instances and Savings Plans deliver 40–72% savings vs on-demand for the same compute. If a service has been running continuously for 3+ months, you should not be paying on-demand for it.
Comparison: AWS commitment options
| Type | Savings vs On-Demand | Flexibility | Best For |
|---|---|---|---|
| 1-Year Reserved Instance | ~40% | Low (instance-locked) | Known, stable workloads |
| 3-Year Reserved Instance | ~62–72% | Very low | Long-term steady state |
| Compute Savings Plan (1yr) | ~40% | High (any family/region) | Evolving architectures |
| Spot Instances | 60–90% | Very high (interruptible) | Batch, CI/CD jobs |
Start with Compute Savings Plans unless you know exactly which instance types you'll need for the next 3 years — you almost certainly don't.
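One way to sanity-check a commitment before buying it: a commitment bills for every hour whether or not the workload runs, so with a discount d it only beats on-demand when the workload runs more than (1 - d) of the time. The sketch below encodes that arithmetic. The function names and the $10,000 figure are illustrative, and 0.40 is the table's approximate 1-year discount, not a quoted price.

```python
def breakeven_utilization(discount: float) -> float:
    """Fraction of hours a workload must run for a commitment to beat on-demand."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount

def annual_savings(on_demand_full_year: float, discount: float, utilization: float) -> float:
    """Savings versus on-demand; a negative result means the commitment costs more."""
    on_demand_cost = on_demand_full_year * utilization        # pay only for hours used
    committed_cost = on_demand_full_year * (1.0 - discount)   # pay for every hour, discounted
    return on_demand_cost - committed_cost

print(f"break-even utilization: {breakeven_utilization(0.40):.0%}")
print(f"always-on workload:     {annual_savings(10_000, 0.40, 1.0):+.0f} USD")
print(f"half-time workload:     {annual_savings(10_000, 0.40, 0.5):+.0f} USD")
```

Under these assumptions, a service that runs around the clock keeps the full discount, while one that runs only half the time would have been cheaper on-demand. That is why the 3+ months of continuous running matters as a selection criterion.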
Tactic 3 — Schedule Non-Prod Environments to Stop at Night
Your dev/staging/test environments do not need to run at 2 AM on Saturday. Scheduling them to stop outside business hours reduces those environment costs by 65–70%.
For AWS, the Instance Scheduler is the native option. For Kubernetes, schedule non-production namespaces to scale to zero outside working hours and let HPA handle load during the day; HPA on its own keeps at least one replica by default, so it will not shut an idle environment down. Add a Slack /wakeup staging command via a simple Lambda so engineers can spin up on demand without leaving things running permanently.
FinOps Foundation benchmark: teams that implement environment scheduling see a 10–20% reduction in total cloud spend. It's the easiest win with the least technical risk.
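The 65–70% figure follows from simple hour counting, which the sketch below reproduces. The two schedules (10 and 12 hours a day, five days a week) are assumed examples of "business hours", not numbers from the survey.

```python
HOURS_PER_WEEK = 24 * 7  # 168

def schedule_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of instance-hours saved versus running 24/7."""
    running = hours_per_day * days_per_week
    if not 0 <= running <= HOURS_PER_WEEK:
        raise ValueError("schedule exceeds the hours in a week")
    return 1.0 - running / HOURS_PER_WEEK

for hours in (10, 12):
    print(f"{hours}h x 5 days: {schedule_savings(hours, 5):.0%} of instance-hours saved")
```

This counts compute hours only. Stopped instances still pay for their attached storage, so the saving on the whole environment will land a little below the compute figure.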
Tactic 4 — Audit Storage and Snapshots
Storage waste is invisible until you look for it. Three areas consistently surface quick wins:
Unattached EBS volumes:
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query "Volumes[*].[VolumeId,Size,CreateTime]" \
--output table
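Any volume in the available state is not attached to an instance, yet it still bills for its full provisioned size. Delete it, or snapshot it first and then delete it if you might need the data.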
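To put a number on the cleanup, here is a minimal sketch that totals the sizes from the query above (run with --output json instead of table) and multiplies by a per-GiB rate. The volume IDs are made up, and the 0.08 USD rate is an assumption you should replace with current pricing for your volume type and region.

```python
import json

def unattached_volume_cost(rows_json: str, usd_per_gib_month: float) -> float:
    """Estimate monthly spend for rows shaped [VolumeId, SizeGiB, CreateTime]."""
    rows = json.loads(rows_json)
    total_gib = sum(size for _volume_id, size, _created in rows)
    return total_gib * usd_per_gib_month

# Hypothetical JSON output of the describe-volumes query above.
sample = json.dumps([
    ["vol-0aaa1111bbbb2222c", 100, "2025-01-15T10:00:00+00:00"],
    ["vol-0ddd3333eeee4444f", 500, "2024-11-02T08:30:00+00:00"],
])

ASSUMED_RATE = 0.08  # USD per GiB-month; an assumption, verify before relying on it
print(f"${unattached_volume_cost(sample, ASSUMED_RATE):.2f} per month")
```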
Snapshot retention: Set a maximum 30-day retention policy for non-critical snapshots. Older snapshots should move to cheaper tiers. Most teams find they're keeping hundreds of snapshots that serve no real disaster recovery purpose.
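S3 lifecycle policies: Objects older than 90 days → Infrequent Access. Objects older than 180 days → Glacier. Lifecycle rules act on object age, not on when an object was last read; if access patterns are unpredictable, S3 Intelligent-Tiering moves objects between tiers based on actual access instead. This alone can cut S3 costs 30–40% for data-heavy workloads.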
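The 90/180-day rule can be written as a lifecycle configuration document. The sketch below builds one in Python and prints it as JSON. The rule ID is a made-up name, the empty prefix (apply to the whole bucket) is an assumption, and Days counts from object creation.

```python
import json

def lifecycle_config(ia_after_days: int = 90, glacier_after_days: int = 180) -> dict:
    """Build an S3 lifecycle configuration with two age-based transitions."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [{
            "ID": "tier-down-aging-objects",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},          # empty prefix: every object in the bucket
            "Transitions": [
                {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            ],
        }]
    }

print(json.dumps(lifecycle_config(), indent=2))
```

Saved as lifecycle.json, the output can be applied with aws s3api put-bucket-lifecycle-configuration --bucket <BUCKET> --lifecycle-configuration file://lifecycle.json. Try it on a non-critical bucket first, since retrieval from colder tiers carries its own fees.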
Tactic 5 — Tag Everything, Then Enforce It
Without cost allocation tags, nobody owns the bill. The minimal effective tag set: team, app, env (prod/staging/dev), cost-center.
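Enforce tagging at provisioning with AWS Service Control Policies. Tags supplied in a create request are matched with aws:RequestTag (aws:ResourceTag refers to tags already on an existing resource), and keys listed together in one Null block are ANDed, so each required tag gets its own Deny statement:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCreateWithoutTeamTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*"],
      "Condition": {
        "Null": { "aws:RequestTag/team": "true" }
      }
    },
    {
      "Sid": "DenyCreateWithoutEnvTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*"],
      "Condition": {
        "Null": { "aws:RequestTag/env": "true" }
      }
    }
  ]
}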
Once tagging is clean, a cost-per-team dashboard changes behavior faster than any top-down mandate.
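Getting tagging clean usually starts with an audit of what already exists, because a Service Control Policy only evaluates new requests and does nothing for resources launched earlier. A minimal sketch of that audit, assuming resources shaped like the entries aws ec2 describe-instances returns (the instance IDs and tag values here are invented):

```python
REQUIRED_TAGS = {"team", "app", "env", "cost-center"}  # the minimal tag set from the text

def missing_tags(resource: dict) -> set[str]:
    """Return the required tag keys absent from a resource's Tags list."""
    present = {tag["Key"] for tag in resource.get("Tags", [])}
    return REQUIRED_TAGS - present

def untagged_report(resources: list[dict], id_field: str = "InstanceId") -> dict[str, list[str]]:
    """Map resource ID to its sorted missing tag keys, skipping compliant resources."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource)
        if gaps:
            report[resource[id_field]] = sorted(gaps)
    return report

# Hypothetical instances, trimmed to the fields used here.
instances = [
    {"InstanceId": "i-0123456789abcdef0",
     "Tags": [{"Key": "team", "Value": "payments"}, {"Key": "env", "Value": "prod"}]},
    {"InstanceId": "i-0fedcba9876543210",
     "Tags": [{"Key": "team", "Value": "search"}, {"Key": "app", "Value": "indexer"},
              {"Key": "env", "Value": "dev"}, {"Key": "cost-center", "Value": "cc-42"}]},
]
print(untagged_report(instances))
```

Tag keys are case-sensitive, so Team and team count as different tags; the exact-match comparison above reflects that on purpose.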
Tool Comparison: Native vs Third-Party
| Tool | Best For | Cost | Multi-Cloud? |
|---|---|---|---|
| AWS Cost Explorer | AWS-native rightsizing, reservations | Free (+ $0.01/API call) | No |
| GCP Active Assist | GCP idle resource detection | Free | No |
| OpenCost | Kubernetes cost allocation | Free (open source) | Yes |
| CloudHealth (Broadcom) | Large enterprise multi-cloud | Paid | Yes |
| CAST AI | Kubernetes rightsizing + automation | Freemium | Yes |
| Kubecost | Kubernetes cost monitoring (built on OpenCost) | Freemium | Yes |
The Results You Can Expect
| Tactic | Effort | Typical Savings | Time to Results |
|---|---|---|---|
| Rightsizing | Medium | 15–25% compute | 2–4 weeks |
| Reservations / Savings Plans | Low | 40–72% on committed spend | Immediate |
| Environment scheduling | Low | 10–20% total spend | 1 billing cycle |
| Storage cleanup | Low | 5–15% total spend | 1 billing cycle |
| Tagging governance | High (setup) | Enables all others | 30–60 days |
FinOps Foundation data shows that mature programs reduce total cloud waste to 15–20% — roughly half the 32–40% baseline. You won't eliminate waste entirely, but getting from 35% to 18% on a $1M budget is $170K back in engineering budget per year.
If you want the full guide with deeper dives on commitment strategy and FinOps culture, the original post is on lucas8.com.