Suresh

Posted on Dec 30, 2025 • Edited on Dec 31, 2025

I built a read-only AWS/Azure hygiene scanner (because auto-delete is too risky)

#aws #azure #devops #opensource

After getting burned by an auto-cleanup tool that deleted a "test" database (it wasn't a test), I built CleanCloud.

The Problem

Modern cloud environments are messy:

Teams spin up resources constantly
Deployments create and destroy infrastructure
Resources get orphaned when instances are terminated
Nobody knows what's safe to delete

Most cloud hygiene tools fall into two camps:

Auto-delete everything → Too dangerous for production
Flag everything → Too noisy to be useful

Both approaches fail when you have:

Elastic infrastructure (autoscaling, spot instances)
Multiple teams with different ownership
Resources that look unused but are actually important

Why Auto-Delete Fails

I learned this the hard way.

A "smart" cleanup tool we tried:

Saw a database with no connections for 7 days
Assumed it was orphaned
Deleted it automatically
Turned out it was a quarterly reporting database

Cost of that mistake: 3 days of recovery, angry CFO, lost trust in automation.

The cost of deleting the wrong resource is orders of magnitude higher than the cost of leaving it running for a few more weeks.

CleanCloud's Approach: Signal First, Act Later

Instead of automating cleanup, CleanCloud answers a safer question:

"Which resources deserve human review — and how confident are we?"

Core principles:

1. Read-Only Always

Required AWS permissions. Notice there is no Delete* or Modify*:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "logs:DescribeLogGroups",
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    }
  ]
}

No write permissions. Ever. Safe to run in production.
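
If you want to verify a claim like this yourself before attaching a policy to a role, a small lint script is enough. The sketch below is not part of CleanCloud. The function name find_write_actions and the prefix allow-list are hypothetical, and it assumes that read-only IAM actions start with Describe, List, or Get. That holds for the four actions above, but it is a heuristic, not a complete rule for every AWS service.

```python
import json

# Assumption: action names with these prefixes are read-only.
# This covers the actions listed in the post, not every AWS service.
READ_ONLY_PREFIXES = ("Describe", "List", "Get")


def find_write_actions(policy_json: str) -> list[str]:
    """Return every allowed action in the policy that does not look read-only."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    # IAM allows a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]

    suspicious = []
    for statement in statements:
        # Only Allow statements grant permissions; skip Deny statements.
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # IAM also allows a single action string instead of a list.
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            # "ec2:DescribeVolumes" -> "DescribeVolumes"; "*" stays "*".
            name = action.split(":", 1)[-1]
            if not name.startswith(READ_ONLY_PREFIXES):
                suspicious.append(action)
    return suspicious


POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ec2:DescribeVolumes",
      "ec2:DescribeSnapshots",
      "logs:DescribeLogGroups",
      "s3:ListAllMyBuckets"
    ],
    "Resource": "*"
  }]
}
"""

print(find_write_actions(POLICY))
```

An empty list means nothing in the policy looks like a write action. Wildcards such as ec2:* are reported on purpose, since they would grant Delete* and Modify* as well.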

2. Conservative Signals

Not just "is this unattached?" but:

How long has it been unattached? (14+ days = HIGH confidence)
Multiple signals required (age + state + tags)
Explicit confidence levels: LOW, MEDIUM, HIGH

Example:

🔴 HIGH confidence: Volume unattached for 45 days
🟡 MEDIUM confidence: Volume unattached for 10 days  
🟢 LOW confidence: Volume unattached for 3 days (probably autoscaling)
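
The age part of that example can be sketched in a few lines. This is an illustration, not CleanCloud's actual code: the 14-day HIGH threshold comes from the list above, but the post never states where LOW ends and MEDIUM begins, so the 7-day boundary here is a placeholder chosen to agree with the three examples. The sketch also looks at age only and ignores the state and tag signals.

```python
from enum import Enum


class Confidence(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


# From the post: 14+ days unattached means HIGH confidence.
HIGH_AFTER_DAYS = 14
# Assumption: the post does not give the LOW/MEDIUM boundary.
# 7 days is a placeholder that agrees with the three examples.
MEDIUM_AFTER_DAYS = 7


def classify_unattached(days_unattached: int) -> Confidence:
    """Map how long a volume has been unattached to a confidence level."""
    if days_unattached < 0:
        raise ValueError("days_unattached must be non-negative")
    if days_unattached >= HIGH_AFTER_DAYS:
        return Confidence.HIGH
    if days_unattached >= MEDIUM_AFTER_DAYS:
        return Confidence.MEDIUM
    return Confidence.LOW


for days in (45, 10, 3):
    print(f"{days} days -> {classify_unattached(days).value}")
```

Keeping the thresholds as named constants makes the conservative bias easy to audit: raising HIGH_AFTER_DAYS trades more missed findings for fewer false alarms.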

3. Review-Only Recommendations

CleanCloud never says "delete this." It says:

"This volume has been unattached for 45 days, has no tags, and doesn't match any known deployment patterns. Worth reviewing."

Humans make the final call.
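
To make "review-only" concrete, here is one way a finding could carry its own reasoning. Everything in this sketch is hypothetical (the Finding class, its field names, and review_message are not CleanCloud's API). The point is only that the output is built from the reasons and never contains an instruction to delete.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    resource_id: str
    resource_type: str
    confidence: str
    # Each reason is a sentence fragment that reads after the resource name.
    reasons: list[str] = field(default_factory=list)

    def review_message(self) -> str:
        """Explain why the resource was flagged, without recommending an action."""
        if not self.reasons:
            why = "was flagged without recorded reasons"
        elif len(self.reasons) == 1:
            why = self.reasons[0]
        else:
            why = ", ".join(self.reasons[:-1]) + ", and " + self.reasons[-1]
        return (
            f"[{self.confidence}] {self.resource_type} {self.resource_id} "
            f"{why}. Worth reviewing."
        )


finding = Finding(
    resource_id="vol-0abc123",
    resource_type="Volume",
    confidence="HIGH",
    reasons=[
        "has been unattached for 45 days",
        "has no tags",
        "doesn't match any known deployment patterns",
    ],
)
print(finding.review_message())
```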

What It Detects

AWS Rules (4 currently)

Unattached EBS volumes (14+ days = HIGH confidence)
Old snapshots (365+ days = HIGH confidence)
CloudWatch logs with infinite retention (30+ days = HIGH confidence)
Untagged resources (ownership unclear = MEDIUM confidence)

Azure Rules (4 currently)

Unattached managed disks (14+ days = HIGH confidence)
Old snapshots (90+ days = HIGH confidence)
Unused public IPs (immediate = HIGH confidence)
Untagged resources (MEDIUM confidence)
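
The two rule lists boil down to a small threshold table. The sketch below restates them as data so the AWS/Azure differences are easy to see. The rule keys and the function name is_high_confidence are made up for illustration and are not CleanCloud identifiers. Two assumptions are baked in: the CloudWatch "30+ days" is treated as an age in days (the post does not say what it measures), and "immediate" is treated as a threshold of 0 days.

```python
from typing import Optional

# Minimum age in days at which each rule reports HIGH confidence,
# copied from the AWS and Azure lists. Untagged resources are left
# out because the post rates them MEDIUM with no age threshold.
HIGH_CONFIDENCE_AFTER_DAYS = {
    ("aws", "unattached_ebs_volume"): 14,
    ("aws", "old_snapshot"): 365,
    ("aws", "log_group_infinite_retention"): 30,  # assumption: age in days
    ("azure", "unattached_managed_disk"): 14,
    ("azure", "old_snapshot"): 90,
    ("azure", "unused_public_ip"): 0,  # "immediate"
}


def is_high_confidence(provider: str, rule: str, age_days: int) -> Optional[bool]:
    """Return True/False for a known rule, or None if the rule is not in the table."""
    threshold = HIGH_CONFIDENCE_AFTER_DAYS.get((provider, rule))
    if threshold is None:
        return None
    return age_days >= threshold


# The same 120-day-old snapshot is rated differently per provider.
print(is_high_confidence("aws", "old_snapshot", 120))
print(is_high_confidence("azure", "old_snapshot", 120))
```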

Week 1 Results

Released last week. Here's what happened:

Stats:

300+ downloads (170 real users, rest are PyPI mirrors)
0 production incidents (because read-only!)
Most common finding: 15-30 unattached EBS volumes per AWS account

User feedback themes:

"Finally, a tool I can trust in production"
"Found $2K/month in waste in first scan"
"Love that it explains WHY something was flagged"

Quick Start

# Install
pip install cleancloud

# Validate credentials
cleancloud doctor --provider aws

# Scan single region
cleancloud scan --provider aws --region us-east-1

# Scan all active regions
cleancloud scan --provider aws --all-regions

# Output to JSON
cleancloud scan \
  --provider aws \
  --all-regions \
  --output json \
  --output-file results.json

Example Output

$ cleancloud scan --provider aws --region us-east-1

🔍 Scanning region us-east-1

Found 12 findings:
  HIGH confidence: 8
  MEDIUM confidence: 4

Top findings:
  • vol-0abc123 - Unattached volume (45 days, 100GB) - ~$10/mo
  • snap-0def456 - Old snapshot (120 days, 500GB) - ~$25/mo
  • log-group-xyz - Infinite retention (2.1GB stored) - ~$6/mo

💰 Estimated monthly waste: ~$156

Review findings and decide what to delete.

CI/CD Integration

Built for pipelines with predictable exit codes:

# GitHub Actions example
- name: Run hygiene scan
  run: |
    pip install cleancloud
    cleancloud scan \
      --provider aws \
      --all-regions \
      --fail-on-confidence HIGH

Exit codes:

0 = Success (no policy violations)
1 = Configuration error
2 = Policy violation (findings detected)
3 = Missing credentials
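
If you call the scanner from a Python script instead of a shell step, the exit codes map naturally onto an enum. This is a sketch under the assumption that the four codes listed above are the complete set. ExitCode, describe, and run_scan are names invented here, and only the CLI flags already shown in this post are used.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as listed in the post."""
    SUCCESS = 0
    CONFIG_ERROR = 1
    POLICY_VIOLATION = 2
    MISSING_CREDENTIALS = 3


MESSAGES = {
    ExitCode.SUCCESS: "Success (no policy violations)",
    ExitCode.CONFIG_ERROR: "Configuration error",
    ExitCode.POLICY_VIOLATION: "Policy violation (findings detected)",
    ExitCode.MISSING_CREDENTIALS: "Missing credentials",
}


def describe(returncode: int) -> str:
    """Translate a process return code into a human-readable message."""
    try:
        return MESSAGES[ExitCode(returncode)]
    except ValueError:
        # Anything outside the documented set is reported, not guessed at.
        return f"unexpected exit code {returncode}"


def run_scan(args: list[str]) -> int:
    """Run `cleancloud scan` with the given flags and return its exit code."""
    # check=False: a non-zero exit is a result to interpret, not an exception.
    completed = subprocess.run(["cleancloud", "scan", *args], check=False)
    return completed.returncode
```

A pipeline script could then call describe(run_scan(["--provider", "aws", "--fail-on-confidence", "HIGH"])) and decide for itself whether a policy violation should fail the build or only post a warning.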

Use cases:

Block PRs with HIGH confidence findings
Generate weekly hygiene reports
Enforce tagging standards
Prevent resource leaks in development

Authentication: OIDC First

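No long-lived credentials needed. Both examples assume the workflow job grants the id-token: write permission, which GitHub Actions requires before it will issue an OIDC token: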

AWS (GitHub Actions)

- name: Configure AWS credentials (OIDC)
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::ACCOUNT:role/CleanCloudReadOnly
    aws-region: us-east-1

- name: Scan
  run: cleancloud scan --provider aws

Azure (GitHub Actions)

- name: Azure Login (OIDC)
  uses: azure/login@v2
  with:
    client-id: ${{ secrets.AZURE_CLIENT_ID }}
    tenant-id: ${{ secrets.AZURE_TENANT_ID }}
    subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

- name: Scan
  run: cleancloud scan --provider azure

No AWS_SECRET_ACCESS_KEY or AZURE_CLIENT_SECRET needed. ✅

What CleanCloud is NOT

Not a cost optimization tool

Doesn't access billing data
Doesn't recommend rightsizing
Focuses on hygiene, not savings

Not a FinOps platform

No dashboards
No cost tracking
Just clean signals

Not an auto-remediation service

Will never delete anything
Will never modify resources
Will never tag resources

This is a strategic design choice, not a limitation.

Privacy & Telemetry

CleanCloud collects zero telemetry.

No analytics. No tracking. No phone-home.

Why?

Security tools shouldn't send data anywhere
Works in air-gapped environments
No opt-out flags needed
Zero risk of leaking account info

We improve based on:

GitHub issues
Direct feedback
Community contributions

What's Next

v0.3.1 just shipped with:

Complete documentation overhaul
Smarter AWS region auto-detection
Enhanced diagnostics with security grading
Fixed region detection bugs

Coming soon:

GCP support
Additional rules (unused Elastic IPs, old AMIs)
Rule filtering (--rules flag)
Historical tracking

Not planned:

Automated cleanup
Cost optimization
Billing data access

CleanCloud will remain focused on safe hygiene detection, not automation.

Design Philosophy

Three core principles:

1. Conservative by Default

Age-based confidence thresholds
Multiple signals required
Prefer false negatives over false positives

2. Read-Only Always

No Delete* permissions
No Tag* permissions
No modification APIs
Safe for production

3. Review-Only Recommendations

Findings are candidates for review, not automated action
Clear reasoning for each finding
Humans stay in control

Who Is This For?

Primary users:

SRE teams
Platform engineers
Infrastructure teams

Stakeholders:

Security (read-only access is easier to get through security reviews)
Compliance (SOC2/ISO27001 friendly)
FinOps (identifies waste without aggressive optimization)

Not for:

Teams wanting auto-cleanup
Teams whose primary goal is cost optimization
Teams looking for aggressive savings recommendations

Real Talk: Why I Built This

I've seen too many "smart" automation tools cause outages:

Auto-scaler that scaled to zero during a traffic spike
Cleanup tool that deleted "unused" security groups (broke production)
Cost optimizer that downsized a database (performance disaster)

The pattern: Automation is confident. Humans are cautious. Production requires caution.

CleanCloud is designed for teams who value trust over automation.

Try It

GitHub: https://github.com/cleancloud-io/cleancloud

Install: pip install cleancloud

Docs: Complete setup guides for AWS and Azure

Looking for feedback:

What cloud hygiene tools do you currently use?
Would read-only signals be useful for your team?
What features would make this production-ready for you?

Open source, MIT license. Contributions welcome!

If you found this useful:

⭐ Star the repo
💬 Share your cloud hygiene horror stories in the comments
🐛 Report issues or suggest features

Built for SRE teams who value trust over automation.
