TL;DR: Cloud exceptions tracked in spreadsheets rot. Policy-as-code tools like CleanCloud move them into Git, where they get expiry dates, pull-request reviews, and enforceable cost thresholds. Here's how.
The Exception Spreadsheet Nobody Trusts
It starts innocently enough. Your team runs a cloud hygiene scan. A bastion host is intentionally stopped. A dev database has been idle for 45 days but is still in use. A NAT gateway with no traffic (seasonal workload). Legitimate exceptions, all of them.
So you create a list.
Three months later:
- Nobody knows which exceptions are still valid
- Nobody knows who approved them
- Nobody knows when (or whether) they expire
- The spreadsheet hasn't been updated since 2024
- You're suppressing alerts instead of fixing the waste
Stats:
- ~40% of FinOps exceptions have zero documented expiry
- Average age of an exception before anyone questions it: 6+ months
- Typical cost of "forgotten" exceptions: $10-50K+/month
The exceptions spreadsheet is the FinOps equivalent of commenting out a security alert.
The waste is still running.
You've just stopped seeing it.
What If Exceptions Were Code?
This is exactly the problem policy-as-code solves.
Instead of tracking exceptions in spreadsheets or tickets, tools like CleanCloud treat them as version-controlled configuration that lives in Git alongside your infrastructure, with built-in expiry, review, and enforcement.
cleancloud.yaml (repo root)
# cleancloud.yaml - commit to your repo root
defaults:
  confidence: MEDIUM   # skip low-signal findings
  min_cost: 10         # ignore cheap findings

exceptions:
  - rule_id: aws.ec2.instance.stopped
    resource_id: i-0bastion1234   # bastion host
    reason: "Bastion - started on demand for debugging"
    expires_at: "2026-12-31"      # auto-expires (forces review)
  - rule_id: aws.rds.instance.idle
    resource_id: "db-test-*"      # wildcard (all test databases)
    reason: "Test databases are ephemeral and intentionally idle"
  - rule_id: aws.nat.idle
    resource_id: nat-12345678
    reason: "NAT gateway for seasonal workload (Jan-Mar only)"
    expires_at: "2026-03-31"      # expires after season ends

thresholds:
  fail_on_confidence: HIGH   # CI gate: block on HIGH findings
  fail_on_cost: 500          # CI gate: block if waste > $500/month
Now every exception is:
- Reviewable - Every change shows up in a pull request, with the reason documented
- Auditable - Git history shows who approved what
- Self-expiring - Anything with an expires_at date resurfaces automatically once that date passes
- Enforceable - CI fails when an exception expires or a threshold is breached
- Version-controlled - Treated like infrastructure code (because it is)
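To make the matching rules concrete, here is a minimal Python sketch of how a finding could be checked against a list of exceptions like the ones above. It is not CleanCloud's implementation: the `ExceptionRule` class and `is_suppressed` function are hypothetical names, the wildcard is interpreted with Python's `fnmatch` glob rules, and it assumes an exception stays valid through its `expires_at` date.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class ExceptionRule:
    """Hypothetical in-memory form of one entry under `exceptions:`."""
    rule_id: str
    resource_id: str                    # may contain glob wildcards, e.g. "db-test-*"
    reason: str
    expires_at: Optional[date] = None   # None means the exception never expires


def is_suppressed(rule_id: str, resource_id: str,
                  exceptions: list[ExceptionRule], today: date) -> bool:
    """Return True if an unexpired exception covers this finding."""
    for exc in exceptions:
        if exc.rule_id != rule_id:
            continue
        if not fnmatchcase(resource_id, exc.resource_id):
            continue
        # Assumption: the exception is still valid on the expires_at date itself.
        if exc.expires_at is not None and today > exc.expires_at:
            continue
        return True
    return False


EXCEPTIONS = [
    ExceptionRule("aws.ec2.instance.stopped", "i-0bastion1234",
                  "Bastion - started on demand for debugging", date(2026, 12, 31)),
    ExceptionRule("aws.rds.instance.idle", "db-test-*",
                  "Test databases are ephemeral and intentionally idle"),
]
```

With this logic the bastion finding is hidden until the end of 2026 and reappears on the first scan after that, while any `db-test-*` database stays suppressed indefinitely because that entry has no expiry.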
How It Works in Practice
1. Run scans with your config
cleancloud scan --provider aws --all-regions
# Automatically picks up cleancloud.yaml from repo root
Without cleancloud.yaml: 47 findings
With cleancloud.yaml: 12 findings (the exceptions are suppressed)
2. Enforce in CI/CD
# .github/workflows/cloud-hygiene.yml
name: Cloud Hygiene Check
on:
  schedule:
    - cron: '0 9 * * MON'   # Weekly scan
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Scan cloud for waste
        uses: getcleancloud/scan-action@v1
        with:
          provider: aws
          all-regions: true
          fail-on-cost: 500   # Exit code 2 if waste > $500/month
Result: The build fails when an exception expires or the waste threshold is breached. No surprises.
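The gate itself is simple enough to sketch in a few lines of Python. This is an illustration of the idea, not the tool's actual code: `Finding` and `gate_exit_code` are hypothetical names, the cost threshold is assumed to apply to the total monthly waste of the findings that remain after suppression, and exit code 2 is assumed for both kinds of breach (the workflow comment above only confirms it for cost).

```python
from dataclasses import dataclass

# Assumed ordering of confidence levels, lowest to highest.
CONFIDENCE_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one unsuppressed scan finding."""
    rule_id: str
    resource_id: str
    confidence: str       # "LOW", "MEDIUM" or "HIGH"
    monthly_cost: float   # estimated waste in USD per month


def gate_exit_code(findings: list[Finding],
                   fail_on_confidence: str = "HIGH",
                   fail_on_cost: float = 500.0) -> int:
    """Return 2 when either threshold is breached, otherwise 0."""
    floor = CONFIDENCE_ORDER[fail_on_confidence]
    # Any finding at or above the confidence floor blocks the build.
    if any(CONFIDENCE_ORDER[f.confidence] >= floor for f in findings):
        return 2
    # Assumption: the cost gate compares total waste, not per-finding waste.
    if sum(f.monthly_cost for f in findings) > fail_on_cost:
        return 2
    return 0
```

A CI wrapper would pass the result to `sys.exit()`, so a non-zero code fails the job.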
3. Update exceptions via PR
When your bastion exception expires (2026-12-31), the next scan will fail:
Cloud Hygiene Check: FAILED
[HIGH] Stopped EC2 instance: i-0bastion1234
Reason: Exception expired on 2026-12-31
Action: Remove the exception or update expires_at
Someone opens a PR that either renews expires_at or deletes the exception, and your team reviews it and decides. Zero magic. Total visibility.
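If you would rather open that PR before the scan fails, a small script can flag exceptions that are expired or close to expiring. The sketch below is a hypothetical helper, not a CleanCloud feature: `classify_expiry`, `needs_review`, and the 30-day warning window are all assumptions, and exceptions are represented as plain dicts with the same keys as the YAML entries.

```python
from datetime import date
from typing import Optional


def classify_expiry(expires_at: Optional[str], today: date, warn_days: int = 30) -> str:
    """Classify an exception by how close it is to its expires_at date."""
    if expires_at is None:
        return "permanent"
    days_left = (date.fromisoformat(expires_at) - today).days
    if days_left < 0:
        return "expired"
    if days_left <= warn_days:
        return "expiring-soon"
    return "active"


def needs_review(exceptions: list[dict], today: date,
                 warn_days: int = 30) -> list[tuple[str, str]]:
    """Return (resource_id, status) for every exception that needs attention."""
    flagged = []
    for exc in exceptions:
        status = classify_expiry(exc.get("expires_at"), today, warn_days)
        if status in ("expired", "expiring-soon"):
            flagged.append((exc["resource_id"], status))
    return flagged


EXCEPTIONS = [
    {"resource_id": "i-0bastion1234", "expires_at": "2026-12-31"},
    {"resource_id": "db-test-*"},  # no expires_at: permanent
    {"resource_id": "nat-12345678", "expires_at": "2026-03-31"},
]
```

Running `needs_review(EXCEPTIONS, date.today())` on a schedule gives the team a short list to act on instead of a failed build.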
Real-World Example: Multi-Account Exception
Managing exceptions across your AWS org:
exceptions:
  # Production RDS: kept for failover (intentional)
  - rule_id: aws.rds.instance.idle
    account_id: "123456789012"   # prod account
    region: us-east-1
    resource_id: db-failover
    reason: "Standby RDS in active-passive setup"
    expires_at: "2027-03-31"     # annual review date
  # Staging: ephemeral, safe to ignore
  - rule_id: aws.rds.instance.idle
    account_id: "210987654321"   # staging account
    resource_id: "db-*"          # glob pattern
    reason: "Staging databases are ephemeral"
    # No expires_at = permanent exception (reviewed manually)
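The scoping in this example can be read as "every field that is present must match, and every field that is omitted matches anything". The Python sketch below spells that reading out. It is an assumption about the semantics rather than documented behavior, and `scope_matches` is a hypothetical name; expiry handling is left out to keep the focus on scope.

```python
from fnmatch import fnmatchcase


def scope_matches(exception: dict, finding: dict) -> bool:
    """Return True if the exception's rule, scope and resource pattern cover the finding."""
    if exception["rule_id"] != finding["rule_id"]:
        return False
    # Assumption: an omitted account_id or region acts as a wildcard.
    for key in ("account_id", "region"):
        if key in exception and exception[key] != finding.get(key):
            return False
    return fnmatchcase(finding["resource_id"], exception["resource_id"])


PROD_FAILOVER = {
    "rule_id": "aws.rds.instance.idle",
    "account_id": "123456789012",
    "region": "us-east-1",
    "resource_id": "db-failover",
}
STAGING_ALL = {
    "rule_id": "aws.rds.instance.idle",
    "account_id": "210987654321",
    "resource_id": "db-*",
}
```

Note that the `db-*` pattern on its own would also match `db-failover`, so the `account_id` field is what keeps the broad staging exception from hiding idle databases in production.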
The Policy-as-Code Difference
| Approach | Where exceptions live | Audit trail | Expiry | Cost |
|---|---|---|---|---|
| Spreadsheet | Unstructured | Slack message | Manual | Free (but $50K waste) |
| Ticketing | In Jira | Comments | Forgotten | Free (but lost time) |
| UI Toggle | In vendor SaaS | Dashboard logs | No | Vendor cost |
| Policy-as-Code | In git | Full history | Automatic | Free (and tighter control) |
Getting Started (5 Minutes)
Step 1: Try it without exceptions
pipx install cleancloud
cleancloud demo # See sample findings
Step 2: Add a config file
Create cleancloud.yaml in your repo root (the YAML above).
Step 3: Scan with your config
cleancloud scan --provider aws --all-regions
# Now suppresses the exceptions listed in cleancloud.yaml
Step 4: Commit to git
git add cleancloud.yaml
git commit -m "Add cloud hygiene exceptions with expiry dates"
git push
Your exceptions are now:
- Version-controlled
- Code-reviewed
- Automatically expiring
- Auditable
Why This Matters for Your Team
- Platform teams: Enforce waste thresholds across departments without manual hunting
- FinOps teams: Audit trail + expiry dates = zero "forgotten" waste
- DevOps/SREs: Exceptions treated like infrastructure code (belong in git)
- Security/Compliance: Every exception is a documented, reviewable approval
Next Steps
Learn more:
- Full guide: https://www.getcleancloud.com/blog/policy-as-code-cloud-governance.html
- Configuration reference: https://github.com/cleancloud-io/cleancloud/blob/main/docs/configuration.md
- Try it: cleancloud demo --category ai (also detects idle AI/ML waste)
GitHub: https://github.com/cleancloud-io/cleancloud
What's your current approach to managing exceptions? Spreadsheets, Jira, or something else? Drop a comment below.
Originally published on getcleancloud.com