DEV Community

Spicy

Stop Wasting 32% of Your Cloud Budget: A FinOps Playbook for DevOps Engineers

You're in a sprint planning meeting and someone drops the monthly cloud bill on the table. It's up again. Nobody knows exactly why. The DevOps lead says it's probably the new microservices rollout. Finance says it's been climbing for six months. Everyone nods and moves on.

This is a story playing out in thousands of engineering orgs right now. According to the FinOps Foundation's State of FinOps 2026 survey — which aggregated data from 1,200+ organizations running over $69 billion in cloud spend — teams without a structured optimization practice waste 32 to 40 percent of their cloud budget every month.

For a $500K/year cloud budget, that's $160K to $200K disappearing into idle instances, orphaned volumes, and on-demand pricing on workloads that have been running predictably for years.

Here's the 5-tactic playbook to fix it.

Tactic 1 — Rightsize Before Anything Else

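Pull 30 days of CPU and memory utilization data. Any instance averaging below 20% CPU with 40%+ memory headroom is a rightsizing candidate. Note that EC2 does not publish memory metrics to CloudWatch by default; you need the CloudWatch agent (or another in-guest collector) to get them.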

AWS CLI quickstart:

```shell
# List all EC2 instances with their type and state
aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name]" \
  --output table

# Then pull 30 days of daily average CPU from CloudWatch for each instance ID
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=<INSTANCE_ID> \
  --start-time 2026-05-01T00:00:00Z \
  --end-time 2026-05-31T00:00:00Z \
  --period 86400 \
  --statistics Average
```
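To turn those daily datapoints into a single number you can compare against the 20% threshold, one option is to ask the CLI for just the Average values and let awk do the arithmetic. This is a minimal sketch, not a finished tool: the function names `mean`, `avg_cpu`, and `is_candidate` are made up for illustration, and it only looks at CPU.

```shell
# mean: average the numbers read from stdin; prints nothing if there are none.
mean() {
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9.]+([eE][-+]?[0-9]+)?$/) { sum += $i; n++ } }
       END { if (n) printf "%.1f\n", sum / n }'
}

# avg_cpu: average of the daily CPU averages for one instance (hypothetical helper).
# Usage: avg_cpu <instance-id> <start-iso8601> <end-iso8601>
avg_cpu() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$1" \
    --start-time "$2" --end-time "$3" \
    --period 86400 --statistics Average \
    --query "Datapoints[].Average" --output text | mean
}

# is_candidate: succeeds only when the value is a number below 20.
is_candidate() {
  case "$1" in
    ''|*[!0-9.]*) return 1 ;;   # no data, or not a plain number
  esac
  awk -v v="$1" 'BEGIN { exit !(v < 20) }'
}

# Example (instance ID is a placeholder):
# cpu=$(avg_cpu i-0123456789abcdef0 2026-05-01T00:00:00Z 2026-05-31T00:00:00Z)
# is_candidate "$cpu" && echo "rightsizing candidate: ${cpu}% average CPU"
```

An average hides spikes, so it's worth checking the Maximum statistic as well before downsizing anything.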

AWS Cost Explorer's rightsizing recommendations run this analysis for you once the feature is enabled. GCP Active Assist and Azure Advisor are the equivalents on those clouds. Rightsizing typically delivers 15–25% compute savings.

For Kubernetes, use OpenCost: it's free, open source, a CNCF project, and gives you per-pod cost visibility that native cloud consoles don't provide.

Tactic 2 — Stop Running Steady-State Workloads at On-Demand Prices

Reserved Instances and Savings Plans deliver 40–72% savings vs on-demand for the same compute. If a service has been running continuously for 3+ months, you should not be paying on-demand for it.

Comparison: AWS commitment options

| Type | Savings vs On-Demand | Flexibility | Best For |
| --- | --- | --- | --- |
| 1-Year Reserved Instance | ~40% | Low (instance-locked) | Known, stable workloads |
| 3-Year Reserved Instance | ~62–72% | Very low | Long-term steady state |
| Compute Savings Plan (1yr) | ~40% | High (any family/region) | Evolving architectures |
| Spot Instances | 60–90% | Very high (interruptible) | Batch, CI/CD jobs |

Start with Compute Savings Plans unless you know exactly which instance types you'll need for the next 3 years — you almost certainly don't.
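A quick way to sanity-check a commitment is the break-even point. Assuming the commitment is billed for every hour of the term whether you use it or not, a discount of d percent pays off once the workload runs more than (100 - d) percent of the time. The sketch below works that out; the function names and the $0.10/hour rate are illustrative, not real prices.

```shell
# breakeven_pct DISCOUNT_PCT: minimum share of the term a workload must run
# for the commitment to beat on-demand (assumes you pay for every hour).
breakeven_pct() {
  awk -v d="$1" 'BEGIN { printf "%.0f\n", 100 - d }'
}

# yearly_costs RATE DISCOUNT_PCT USAGE_PCT: prints "<on-demand> <committed>"
# yearly cost for one instance at the given hourly on-demand rate.
yearly_costs() {
  awk -v r="$1" -v d="$2" -v u="$3" 'BEGIN {
    h = 8760   # hours in a year
    printf "%.0f %.0f\n", r * h * u / 100, r * h * (1 - d / 100)
  }'
}

breakeven_pct 40          # prints 60
breakeven_pct 72          # prints 28
yearly_costs 0.10 40 100  # always-on: on-demand 876 vs committed 526
yearly_costs 0.10 40 50   # half-time: on-demand 438 vs committed 526
```

This is why the 3+ months of continuous running matters: a service that only runs half the time is cheaper on-demand than under a ~40% commitment.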

Tactic 3 — Schedule Non-Prod Environments to Stop at Night

Your dev/staging/test environments do not need to run at 2 AM on Saturday. Scheduling them to stop outside business hours reduces those environment costs by 65–70%.
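The 65–70% figure falls out of simple arithmetic on the 168-hour week. A small sketch, with `off_hours_pct` as a made-up helper name:

```shell
# off_hours_pct HOURS_PER_DAY DAYS_PER_WEEK: share of the 168-hour week an
# environment is stopped if it only runs during the given window.
off_hours_pct() {
  awk -v h="$1" -v d="$2" 'BEGIN { printf "%.1f\n", (168 - h * d) / 168 * 100 }'
}

off_hours_pct 12 5   # 12h x 5 days on: prints 64.3
off_hours_pct 10 5   # 10h x 5 days on: prints 70.2
```

That percentage applies to compute hours only. Stopped instances still pay for their attached EBS volumes, so the real saving on the environment will be somewhat lower.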

For AWS, the Instance Scheduler is the native option. For Kubernetes, the HPA does not scale to zero by default, so pair it with a scheduled scale-to-zero (for example a CronJob that runs kubectl scale, or KEDA's cron scaler) on non-production namespaces. Add a Slack /wakeup staging command via a simple Lambda so engineers can spin up on demand without leaving things running permanently.
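If you want to try the idea before rolling out a full scheduler, a cron job and two CLI calls will get you most of the way. This is a rough sketch under a few assumptions: instances carry an `env` tag with the values `dev` or `staging`, they are not managed by an Auto Scaling group (which would just replace them), and the function name and script path are hypothetical.

```shell
# stop_nonprod: stop running EC2 instances tagged env=dev or env=staging.
# The tag key and values are assumptions; match them to your own scheme.
stop_nonprod() {
  ids=$(aws ec2 describe-instances \
    --filters "Name=tag:env,Values=dev,staging" \
              "Name=instance-state-name,Values=running" \
    --query "Reservations[].Instances[].InstanceId" \
    --output text)
  if [ -z "$ids" ]; then
    echo "nothing to stop"
    return 0
  fi
  # $ids is intentionally unquoted so each ID becomes its own argument.
  aws ec2 stop-instances --instance-ids $ids
}

# Example crontab entry, weekdays at 20:00 (script path is a placeholder):
# 0 20 * * 1-5 /usr/local/bin/stop-nonprod.sh
```

A matching `aws ec2 start-instances` job in the morning, or the /wakeup command, brings things back.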

FinOps Foundation benchmark: teams that implement environment scheduling see a 10–20% reduction in total cloud spend. It's the easiest win with the least technical risk.

Tactic 4 — Audit Storage and Snapshots

Storage waste is invisible until you look for it. Three areas consistently surface quick wins:

Unattached EBS volumes:

```shell
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query "Volumes[*].[VolumeId,Size,CreateTime]" \
  --output table
```

Any volume in the available state is not attached to an instance, yet you still pay for its provisioned size. Delete it, or snapshot it first and then delete it.
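The snapshot-then-delete path can be scripted with three CLI calls. The sketch below is deliberately cautious: it only prints what it would do unless you set `DRY_RUN=0`. The function name and the `DRY_RUN` variable are my own conventions, not anything AWS defines, and the built-in waiter can time out on very large volumes.

```shell
# snapshot_and_delete: snapshot each unattached volume, wait for the snapshot
# to complete, then delete the volume. Dry-run unless DRY_RUN=0 is set.
snapshot_and_delete() {
  vols=$(aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query "Volumes[].VolumeId" --output text)
  for vol in $vols; do
    if [ "${DRY_RUN:-1}" != "0" ]; then
      echo "would snapshot and delete $vol"
      continue
    fi
    snap=$(aws ec2 create-snapshot --volume-id "$vol" \
      --description "Final snapshot of $vol before deletion" \
      --query SnapshotId --output text) || return 1
    # Never delete the volume until its snapshot has finished.
    aws ec2 wait snapshot-completed --snapshot-ids "$snap" || return 1
    aws ec2 delete-volume --volume-id "$vol"
    echo "deleted $vol (snapshot $snap)"
  done
}
```

Run it once in dry-run mode and review the list with the owning teams before flipping the switch.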

Snapshot retention: Set a maximum 30-day retention policy for non-critical snapshots. Snapshots you must keep longer should move to a cheaper tier such as EBS Snapshots Archive. Most teams find they're keeping hundreds of snapshots that serve no real disaster recovery purpose.

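S3 lifecycle policies: Transition objects older than 90 days to Infrequent Access and objects older than 180 days to Glacier. Lifecycle rules act on object age, not on when the data was last read; if access patterns are unpredictable, S3 Intelligent-Tiering moves objects based on actual access instead. This alone can cut S3 costs 30–40% for data-heavy workloads.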
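The 90-day and 180-day thresholds translate directly into a lifecycle configuration. Here's a sketch that writes one to a file; the rule ID, file name, and bucket name are placeholders, and the empty prefix means the rule covers the whole bucket. Keep in mind that these transitions key off object age (days since creation).

```shell
# write_lifecycle_policy FILE: write an age-based lifecycle configuration.
write_lifecycle_policy() {
  cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "tier-down-by-age",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 90,  "StorageClass": "STANDARD_IA" },
        { "Days": 180, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}

# Apply it (bucket name is a placeholder):
# write_lifecycle_policy lifecycle.json
# aws s3api put-bucket-lifecycle-configuration \
#   --bucket my-example-bucket \
#   --lifecycle-configuration file://lifecycle.json
```

Applying a configuration replaces whatever lifecycle rules the bucket already has, so check the existing ones first with `aws s3api get-bucket-lifecycle-configuration`.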

Tactic 5 — Tag Everything, Then Enforce It

Without cost allocation tags, nobody owns the bill. The minimal effective tag set: team, app, env (prod/staging/dev), cost-center.

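Enforce tagging at provisioning with AWS Service Control Policies. The policy below denies EC2 instance and RDS database creation when either the team or the env tag is missing from the request:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCreateWithoutTeamTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*"],
      "Condition": { "Null": { "aws:RequestTag/team": "true" } }
    },
    {
      "Sid": "DenyCreateWithoutEnvTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*"],
      "Condition": { "Null": { "aws:RequestTag/env": "true" } }
    }
  ]
}
```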

Once tagging is clean, a cost-per-team dashboard changes behavior faster than any top-down mandate.
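You don't need a BI tool to get a first version of that dashboard. Cost Explorer can group spend by tag from the CLI. A sketch, assuming the `team` tag has been activated as a cost allocation tag in the billing console; `cost_by_team` is a made-up wrapper name.

```shell
# cost_by_team START END: monthly unblended cost grouped by the "team" tag.
# Dates are YYYY-MM-DD; END is exclusive.
cost_by_team() {
  aws ce get-cost-and-usage \
    --time-period "Start=$1,End=$2" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=TAG,Key=team \
    --query "ResultsByTime[].Groups[].[Keys[0], Metrics.UnblendedCost.Amount]" \
    --output text
}

# Example: spend per team for May 2026
# cost_by_team 2026-05-01 2026-06-01
```

Each row comes back as a tag key/value pair and an amount. Resources without the tag show up under an empty value, which is a handy measure of how much spend is still unowned.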

Tool Comparison: Native vs Third-Party

| Tool | Best For | Cost | Multi-Cloud? |
| --- | --- | --- | --- |
| AWS Cost Explorer | AWS-native rightsizing, reservations | Free (+ $0.01/API call) | No |
| GCP Active Assist | GCP idle resource detection | Free | No |
| OpenCost | Kubernetes cost allocation | Free (open source) | Yes |
| CloudHealth (Broadcom) | Large enterprise multi-cloud | Paid | Yes |
| CAST AI | Kubernetes rightsizing + automation | Freemium | Yes |
| Kubecost | Kubernetes cost + CNCF-compatible | Freemium | Yes |

The Results You Can Expect

| Tactic | Effort | Typical Savings | Time to Results |
| --- | --- | --- | --- |
| Rightsizing | Medium | 15–25% compute | 2–4 weeks |
| Reservations / Savings Plans | Low | 40–72% on committed spend | Immediate |
| Environment scheduling | Low | 10–20% total spend | 1 billing cycle |
| Storage cleanup | Low | 5–15% total spend | 1 billing cycle |
| Tagging governance | High (setup) | Enables all others | 30–60 days |

FinOps Foundation data shows that mature programs reduce total cloud waste to 15–20% — roughly half the 32–40% baseline. You won't eliminate waste entirely, but getting from 35% to 18% on a $1M budget is $170K back in engineering budget per year.
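To plug in your own numbers, the arithmetic is just budget times the drop in waste percentage. A tiny sketch, with `annual_savings` as an illustrative helper name:

```shell
# annual_savings BUDGET WASTE_BEFORE_PCT WASTE_AFTER_PCT
annual_savings() {
  awk -v b="$1" -v x="$2" -v y="$3" 'BEGIN { printf "%.0f\n", b * (x - y) / 100 }'
}

annual_savings 1000000 35 18   # prints 170000
```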


If you want the full guide with deeper dives on commitment strategy and FinOps culture, the original post is on lucas8.com.
