DEV Community

June Gu


FinOps for SREs: Cutting Costs Without Breaking Things


Tags: aws finops sre reliability devops


Most FinOps advice starts with a cost dashboard. This series starts with a different question: how do we cut costs without violating our SLOs?

I'm an SRE at a subsidiary of one of Korea's largest tech companies, managing four AWS accounts connected via a Transit Gateway hub-and-spoke architecture. When I was asked to reduce cloud spend, I didn't open AWS Cost Explorer first. I opened our SigNoz dashboards and checked our error budgets.

That's the difference between FinOps and SRE-driven FinOps.

The SRE Guarantee

Before any cost optimization begins, I guarantee three things:

1. Error Budget Protection
No optimization will be executed if it risks breaching SLOs. If less than 50% of our error budget remains, all FinOps work stops: reliability comes first.

2. Assured Minimum Downtime
Every change has a rollback plan, a maintenance window, and a blast radius assessment. Zero downtime is the target. Brief, documented downtime inside a maintenance window is the most we accept. Unplanned downtime is unacceptable.

3. Reliability Over Savings
If forced to choose between $500/month in savings and a 0.01% availability risk, we choose availability. Always. The cost of an outage — in customer trust, in engineering hours, in incident response — exceeds any monthly savings.
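The three guarantees can be expressed as pre-change checks. The bash sketch below is illustrative and not part of aws-finops-toolkit: the function names are hypothetical, the SLO target and measured availability are passed in as ratios (fetching them from SigNoz is out of scope), maintenance windows are given as HHMM strings, and the downtime conversion assumes a 30-day month. Only the 50% freeze threshold comes from guarantee 1.

```bash
# Hypothetical pre-change gates for the three guarantees. Inputs are passed
# as arguments; fetching them (e.g. from an APM) is out of scope here.

# Guarantee 1: remaining error budget, in percent (two decimals).
#   allowed failure ratio  = 1 - slo
#   observed failure ratio = 1 - availability
#   remaining %            = (1 - observed / allowed) * 100, floored at 0
error_budget_remaining_pct() {
  awk -v slo="$1" -v avail="$2" 'BEGIN {
    allowed = 1 - slo
    if (allowed <= 0) { print "0.00"; exit }   # a 100% SLO has no budget
    remaining = (1 - (1 - avail) / allowed) * 100
    if (remaining < 0) remaining = 0
    printf "%.2f\n", remaining
  }'
}

# Exit status 0 = FinOps work may proceed; 1 = freeze (budget below 50%).
finops_gate() {
  local remaining
  remaining=$(error_budget_remaining_pct "$1" "$2")
  awk -v r="$remaining" 'BEGIN { if (r + 0 >= 50) exit 0; exit 1 }'
}

# Guarantee 2: is HHMM time $3 inside the window $1-$2 (end exclusive)?
# Handles windows that wrap past midnight, e.g. 2300-0200.
# The 10# prefix stops bash from reading "0800" as an octal number.
in_maintenance_window() {
  local start=$((10#$1)) end=$((10#$2)) now=$((10#$3))
  if (( start <= end )); then
    (( now >= start && now < end ))
  else
    (( now >= start || now < end ))
  fi
}

# Guarantee 3: minutes of downtime per 30-day month represented by an
# availability delta given in percent (0.01 -> 4.32).
availability_risk_minutes() {
  awk -v pct="$1" 'BEGIN { printf "%.2f\n", pct / 100 * 30 * 24 * 60 }'
}
```

For example, `finops_gate 0.999 "$AVAILABILITY" && in_maintenance_window 2300 0200 "$(TZ=Asia/Seoul date +%H%M)"` succeeds only when at least half the budget remains and the local time falls inside a 23:00-02:00 window (`AVAILABILITY` and the window are placeholders). Read as 0.01 percentage points of availability, the risk in guarantee 3 works out to `availability_risk_minutes 0.01`, about 4.3 minutes of downtime per month.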

This guarantee isn't just a principle. It's encoded in every check of the aws-finops-toolkit — the open-source CLI I built to automate this workflow.


The Series

This series walks through the complete FinOps workflow I used to identify $48-67K/year in potential savings (the full P0-P2 roadmap) across four AWS accounts, starting with analysis, moving through passive cleanup, and ending with active downsizing under SRE guardrails.

Part 0: The Pre-Flight Checklist

Nine checks to run before cutting any cost: traffic analysis, SLO status, cache dependencies, incident history, RI/SP (Reserved Instance and Savings Plans) coverage, and more. This is the analysis phase: never optimize what you don't fully understand.

→ OSS: finops preflight command (aws-finops-toolkit)

Part 1: How I Found $12K/Year in AWS Waste

Passive waste — things nobody uses. Abandoned VPCs ($748/mo), orphan CloudWatch log groups ($110-165/mo), S3 lifecycle vs Intelligent-Tiering ($75-104/mo). Zero risk to production. Total: $933-1,017/month.

→ OSS: finops scan (vpc_waste, cloudwatch_waste, s3_lifecycle checks)
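The Part 1 total is the sum of the three line items, with abandoned VPCs contributing a fixed $748 to both ends of the range:

```latex
\begin{aligned}
\text{low}  &= \$748 + \$110 + \$75  = \$933/\text{mo} \\
\text{high} &= \$748 + \$165 + \$104 = \$1{,}017/\text{mo}
\end{aligned}
```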

Part 2: Downsizing Without Downtime

Active optimization — shrinking running infrastructure with SRE guardrails. EC2/EKS right-sizing with PDBs, NAT Gateway replacement, Spot with drain handlers, RDS right-sizing with read replicas and cold cache planning, ElastiCache scheduling, and Reserved Instances (commit last, not first). Total: $787-1,087/month.

→ OSS: finops scan (ec2_rightsizing, nat_gateway, spot_candidates, rds_rightsizing, elasticache_scheduling, reserved_instances, unused_resources checks)
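Spot capacity is only safe if nodes drain before AWS reclaims them. As a minimal sketch (not the toolkit's or the author's implementation), the bash functions below poll the EC2 instance metadata service (IMDSv2) for a Spot interruption notice and then run `kubectl drain`, which evicts pods through the Eviction API and therefore honors the PDBs set up for right-sizing. Assumptions: the script runs on the Spot node itself with kubectl credentials, the Kubernetes node name is passed in by the caller, and the 5-second poll and 90-second drain timeout are illustrative values chosen to fit inside the two-minute interruption notice. For production, AWS's open-source aws-node-termination-handler covers this and more.

```bash
# Hypothetical Spot drain handler sketch. Function names are illustrative.
IMDS="http://169.254.169.254/latest"

# Fetch a short-lived IMDSv2 session token.
imds_token() {
  curl -s -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300"
}

# Succeeds (and prints the instance-action JSON) only when an interruption
# is scheduled; IMDS returns 404 otherwise, which `curl -f` turns into failure.
spot_interruption_notice() {
  curl -sf -H "X-aws-ec2-metadata-token: $(imds_token)" \
    "$IMDS/meta-data/spot/instance-action"
}

# Cordon the node and evict its pods; eviction honors PodDisruptionBudgets.
drain_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=90s
}

# Poll every 5 seconds until a notice appears, then drain the given node.
watch_for_interruption() {
  local node="$1"
  until spot_interruption_notice >/dev/null; do
    sleep 5
  done
  drain_node "$node"
}
```

A caller would run `watch_for_interruption "$NODE_NAME"`, where `NODE_NAME` is a placeholder that could be populated from `spec.nodeName` via the Kubernetes Downward API when this runs as a DaemonSet.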


Combined Savings

| Phase | Monthly | Annual |
|---|---|---|
| Part 1: Passive waste cleanup | $933-1,017 | $11.2-12.2K |
| Part 2: Active downsizing | $787-1,087 | $9.4-13K |
| Total identified | $1,720-2,104 | $20.6-25.2K |
| P0-P2 roadmap (pending) | $3,995-5,565 | $48-67K |
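The annual column is the monthly range multiplied by 12, rounded to one decimal place in thousands (the roadmap row to whole thousands):

```latex
\begin{aligned}
12 \times (\$933\text{--}1{,}017)   &= \$11{,}196\text{--}12{,}204 \approx \$11.2\text{--}12.2\text{K} \\
12 \times (\$787\text{--}1{,}087)   &= \$9{,}444\text{--}13{,}044  \approx \$9.4\text{--}13\text{K} \\
12 \times (\$1{,}720\text{--}2{,}104) &= \$20{,}640\text{--}25{,}248 \approx \$20.6\text{--}25.2\text{K} \\
12 \times (\$3{,}995\text{--}5{,}565) &= \$47{,}940\text{--}66{,}780 \approx \$48\text{--}67\text{K}
\end{aligned}
```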

Every optimization in this series passed through the SRE guarantee. Not a single SLO was breached. Not a single unplanned outage occurred.


The Toolkit

Everything in this series maps to aws-finops-toolkit — an open-source CLI that automates the discovery:

```bash
# Pre-flight analysis before any change
finops preflight --target pn-sh-rds-prod --profile dodo-dev --apm signoz

# Scan for cost waste across accounts
finops scan --profiles dev,staging,prod

# Generate report for stakeholders
finops report --format html --output finops-report.html
```

The tool finds the opportunities. The SRE decides which ones are safe to execute.


This is the introduction to the "FinOps for SREs" series. Start with Part 0: The Pre-Flight Checklist or jump to the part most relevant to your situation.

I'm June, an SRE with 5+ years of experience at Korea's top tech companies including Coupang (NYSE: CPNG) and NAVER Corporation. I write about real-world infrastructure problems. Find me on LinkedIn.
