
Suresh

Posted on • Originally published at getcleancloud.com

Stop Managing Cloud Exceptions in Spreadsheets — Use Policy-as-Code Instead

TL;DR: Cloud exceptions tracked in spreadsheets rot. Policy-as-code puts them in Git with automatic expiry dates, pull-request reviews, and cost thresholds. Here's how.

Tools like CleanCloud move exceptions into Git with expiry dates, PR reviews, and enforceable cost thresholds.


The Exception Spreadsheet Nobody Trusts

It starts innocently enough. Your team runs a cloud hygiene scan. A bastion host is intentionally stopped. A dev database has been idle for 45 days but is still in use. A NAT gateway shows no traffic because its workload is seasonal. Legitimate exceptions, all of them.

So you create a list.

Three months later:

  • Nobody knows which exceptions are still valid
  • Nobody remembers who approved them
  • Nobody can say when they expire
  • The spreadsheet hasn't been updated since 2024
  • You're suppressing the alerts instead of fixing the waste

Stats:

  • ~40% of FinOps exceptions have zero documented expiry
  • Average age of an exception before anyone questions it: 6+ months
  • Typical cost of "forgotten" exceptions: $10-50K+/month

The exceptions spreadsheet is the FinOps equivalent of commenting out a security alert.

The waste is still running.

You've just stopped seeing it.


What If Exceptions Were Code?

This is exactly the problem policy-as-code solves.

Instead of tracking exceptions in spreadsheets or tickets, tools like CleanCloud treat them as version-controlled configuration that lives in Git alongside your infrastructure, with built-in expiry, review, and enforcement.

cleancloud.yaml (repo root)

# cleancloud.yaml - commit to your repo root

defaults:
  confidence: MEDIUM      # skip low-signal findings
  min_cost: 10            # ignore cheap findings

exceptions:
  - rule_id: aws.ec2.instance.stopped
    resource_id: i-0bastion1234  # bastion host
    reason: "Bastion - started on demand for debugging"
    expires_at: "2026-12-31"      # auto-expires (forces review)

  - rule_id: aws.rds.instance.idle
    resource_id: "db-test-*"      # wildcard (all test databases)
    reason: "Test databases are ephemeral and intentionally idle"

  - rule_id: aws.nat.idle
    resource_id: nat-12345678
    reason: "NAT gateway for seasonal workload (Jan-Mar only)"
    expires_at: "2026-03-31"      # expires after season ends

thresholds:
  fail_on_confidence: HIGH        # CI gate: block on HIGH findings
  fail_on_cost: 500               # CI gate: block if waste > $500/month

Now every exception is:

  • Reviewable - Shows up in pull requests, with the reason documented
  • Auditable - Git history shows who approved what
  • Self-expiring - Exceptions with an expires_at date lapse automatically, so none are "forgotten"
  • Enforceable - CI fails when an exception expires or a threshold is breached
  • Version-controlled - Treated like infrastructure code (because it is)
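
To make the self-expiring and wildcard behaviour concrete, here is a minimal Python sketch of how a scanner might decide whether a finding is covered by an exception. It is not CleanCloud's actual implementation: ExceptionRule and is_suppressed are hypothetical names, and it assumes an exception stays valid through the end of its expires_at date and that resource_id wildcards follow shell-style glob rules.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class ExceptionRule:
    """Hypothetical in-memory form of one entry under `exceptions:`."""
    rule_id: str
    resource_id: str                    # may contain glob wildcards such as "db-test-*"
    reason: str
    expires_at: Optional[date] = None   # None means the exception never expires


def is_suppressed(rule_id: str, resource_id: str,
                  exceptions: list[ExceptionRule], today: date) -> bool:
    """Return True if an unexpired exception covers this finding."""
    for exc in exceptions:
        if exc.rule_id != rule_id:
            continue
        if not fnmatchcase(resource_id, exc.resource_id):
            continue
        # Assumption: the exception is still valid on its expiry date itself.
        if exc.expires_at is not None and today > exc.expires_at:
            continue
        return True
    return False


EXCEPTIONS = [
    ExceptionRule("aws.ec2.instance.stopped", "i-0bastion1234",
                  "Bastion - started on demand for debugging", date(2026, 12, 31)),
    ExceptionRule("aws.rds.instance.idle", "db-test-*",
                  "Test databases are ephemeral and intentionally idle"),
]
```

With these rules, the bastion finding is suppressed on 2026-12-31 and reappears on 2027-01-01, while any db-test-* database stays suppressed indefinitely because that entry has no expires_at.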

How It Works in Practice

1. Run scans with your config

cleancloud scan --provider aws --all-regions
# Automatically picks up cleancloud.yaml from repo root

Without cleancloud.yaml: 47 findings
With cleancloud.yaml: 12 findings (the exceptions are suppressed)
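
Part of that drop presumably comes from the defaults block rather than from individual exceptions. The sketch below shows one way such floors could be applied. It is illustrative only: apply_defaults, the finding dictionaries, and the monthly_cost field are hypothetical, and it assumes confidence: MEDIUM means "MEDIUM or higher" and min_cost: 10 means "at least $10/month".

```python
CONFIDENCE_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def apply_defaults(findings, min_confidence="MEDIUM", min_cost=10):
    """Keep findings at or above both the confidence floor and the monthly-cost floor."""
    floor = CONFIDENCE_ORDER[min_confidence]
    return [
        f for f in findings
        if CONFIDENCE_ORDER[f["confidence"]] >= floor and f["monthly_cost"] >= min_cost
    ]


# Hypothetical findings, shaped for this sketch only.
findings = [
    {"resource_id": "vol-aaa", "confidence": "LOW", "monthly_cost": 40.0},     # dropped: low confidence
    {"resource_id": "eip-bbb", "confidence": "HIGH", "monthly_cost": 3.6},     # dropped: below min_cost
    {"resource_id": "db-ccc", "confidence": "MEDIUM", "monthly_cost": 120.0},  # kept
]
kept = apply_defaults(findings)
```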

2. Enforce in CI/CD

# .github/workflows/cloud-hygiene.yml
name: Cloud Hygiene Check

on:
  schedule:
    - cron: '0 9 * * MON'  # Weekly scan

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Scan cloud for waste
        uses: getcleancloud/scan-action@v1
        with:
          provider: aws
          all-regions: true
          fail-on-cost: 500      # Exit code 2 if waste > $500/month

Result: Your build fails if an exception expires or the waste threshold is breached. No surprises.
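
The gate logic itself is simple enough to sketch in a few lines of Python. This is an assumption about how the two thresholds could combine, not the tool's documented behaviour: gate and the finding fields are hypothetical, the cost check is assumed to compare the summed monthly waste against fail_on_cost, and exit code 2 is taken from the workflow comment above.

```python
CONFIDENCE_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}
EXIT_OK = 0
EXIT_POLICY_VIOLATION = 2  # exit code mentioned in the workflow comment


def gate(findings, fail_on_confidence="HIGH", fail_on_cost=500):
    """Return the exit code a CI step could use for the remaining findings."""
    floor = CONFIDENCE_ORDER[fail_on_confidence]
    has_blocking = any(CONFIDENCE_ORDER[f["confidence"]] >= floor for f in findings)
    # Assumption: the cost threshold applies to total monthly waste, not per finding.
    total_cost = sum(f["monthly_cost"] for f in findings)
    if has_blocking or total_cost > fail_on_cost:
        return EXIT_POLICY_VIOLATION
    return EXIT_OK
```

A wrapper script would pass the result to sys.exit so the CI job fails on a non-zero code.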

3. Update exceptions via PR

When your bastion exception expires (2026-12-31), the next scan will fail:

Cloud Hygiene Check: FAILED
  [HIGH] Stopped EC2 instance: i-0bastion1234
  Reason: Exception expired on 2026-12-31

  Action: Remove the exception or update expires_at

Someone opens a PR that either extends expires_at or deletes the exception, and your team reviews it. Zero magic. Total visibility.
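
If you would rather open that PR before the scheduled scan fails, a small helper can list exceptions that are about to lapse. The Python sketch below is a hypothetical add-on, not a CleanCloud feature: expiring_soon is an invented name, and it assumes the exceptions have already been loaded from cleancloud.yaml into dictionaries with ISO-formatted expires_at strings.

```python
from datetime import date, timedelta


def expiring_soon(exceptions, today, within_days=30):
    """Return (resource_id, days_left) pairs for exceptions expiring within the window."""
    horizon = today + timedelta(days=within_days)
    due = []
    for exc in exceptions:
        raw = exc.get("expires_at")
        if raw is None:
            continue  # no expiry date, so nothing to renew
        expires = date.fromisoformat(raw)
        if today <= expires <= horizon:
            due.append((exc["resource_id"], (expires - today).days))
    return sorted(due, key=lambda item: item[1])


exceptions = [
    {"resource_id": "i-0bastion1234", "expires_at": "2026-12-31"},
    {"resource_id": "db-test-*"},                                  # permanent
    {"resource_id": "nat-12345678", "expires_at": "2026-03-31"},   # already lapsed by December
]
due = expiring_soon(exceptions, today=date(2026, 12, 10))
```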


Real-World Example: Multi-Account Exception

Managing exceptions across your AWS org:

exceptions:
  # Production RDS: kept for failover (intentional)
  - rule_id: aws.rds.instance.idle
    account_id: "123456789012"        # prod account
    region: us-east-1
    resource_id: db-failover
    reason: "Standby RDS in active-passive setup"
    expires_at: "2027-03-31"          # annual review date

  # Staging: ephemeral, safe to ignore
  - rule_id: aws.rds.instance.idle
    account_id: "210987654321"        # staging account
    resource_id: "db-*"               # glob pattern
    reason: "Staging databases are ephemeral"
    # No expires_at = permanent exception (reviewed manually)
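
The account_id and region fields narrow where an exception applies. A rough Python sketch of that scoping, under the assumption that an omitted account_id or region means "any" (scope_matches and the finding shape are hypothetical, and expiry handling is left out for brevity):

```python
from fnmatch import fnmatchcase


def scope_matches(exception: dict, finding: dict) -> bool:
    """True if the exception covers the finding's rule, account, region, and resource."""
    if exception["rule_id"] != finding["rule_id"]:
        return False
    # Assumption: a scope key missing from the exception matches every value.
    for key in ("account_id", "region"):
        if key in exception and exception[key] != finding[key]:
            return False
    return fnmatchcase(finding["resource_id"], exception["resource_id"])


prod_exc = {"rule_id": "aws.rds.instance.idle", "account_id": "123456789012",
            "region": "us-east-1", "resource_id": "db-failover"}
staging_exc = {"rule_id": "aws.rds.instance.idle", "account_id": "210987654321",
               "resource_id": "db-*"}

prod_finding = {"rule_id": "aws.rds.instance.idle", "account_id": "123456789012",
                "region": "us-east-1", "resource_id": "db-failover"}
staging_finding = {"rule_id": "aws.rds.instance.idle", "account_id": "210987654321",
                   "region": "eu-west-1", "resource_id": "db-tmp-7"}
```

Note that the staging glob db-* would also match the name db-failover, but the account_id check keeps it from suppressing the production finding.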

The Policy-as-Code Difference

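| Approach       | Exceptions     | Audit Trail    | Expiry    | Cost                        |
|----------------|----------------|----------------|-----------|-----------------------------|
| Spreadsheet    | Unstructured   | Slack message  | Manual    | Free (but $50K waste)       |
| Ticketing      | In Jira        | Comments       | Forgotten | Free (but lost time)        |
| UI Toggle      | In vendor SaaS | Dashboard logs | No        | Vendor cost                 |
| Policy-as-Code | In git         | Full history   | Automatic | Free (and tighter control)  |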

Getting Started (5 Minutes)

Step 1: Try it without exceptions

pipx install cleancloud
cleancloud demo                    # See sample findings

Step 2: Add a config file

Create cleancloud.yaml in your repo root, using the YAML above as a starting point.

Step 3: Scan with your config

cleancloud scan --provider aws --all-regions
# Now suppresses the exceptions listed in cleancloud.yaml

Step 4: Commit to git

git add cleancloud.yaml
git commit -m "Add cloud hygiene exceptions with expiry dates"
git push

Your exceptions are now:

  • Version-controlled
  • Code-reviewed
  • Automatically expiring
  • Auditable

Why This Matters for Your Team

  • Platform teams: Enforce waste thresholds across departments without hunting for waste by hand
  • FinOps teams: Audit trail + expiry dates = zero "forgotten" waste
  • DevOps/SREs: Exceptions are treated like infrastructure code and live in git
  • Security/Compliance: Every exception is a documented, reviewable approval

Next Steps

Learn more:

GitHub: https://github.com/cleancloud-io/cleancloud


What's your current approach to managing exceptions? Spreadsheets, Jira, or something else? Drop a comment below.


