After getting burned by an auto-cleanup tool that deleted a "test" database (it wasn't a test), I built CleanCloud.
The Problem
Modern cloud environments are messy:
- Teams spin up resources constantly
- Deployments create and destroy infrastructure
- Resources get orphaned when instances are terminated
- Nobody knows what's safe to delete
Most cloud hygiene tools fall into two camps:
- Auto-delete everything → Too dangerous for production
- Flag everything → Too noisy to be useful
Both approaches fail when you have:
- Elastic infrastructure (autoscaling, spot instances)
- Multiple teams with different ownership
- Resources that look unused but are actually important
Why Auto-Delete Fails
I learned this the hard way.
A "smart" cleanup tool we tried:
- Saw a database with no connections for 7 days
- Assumed it was orphaned
- Deleted it automatically
- Turned out it was a quarterly reporting database
Cost of that mistake: 3 days of recovery, an angry CFO, and lost trust in automation.
The cost of deleting the wrong resource is orders of magnitude higher than the cost of leaving it running for a few more weeks.
CleanCloud's Approach: Signal First, Act Later
Instead of automating cleanup, CleanCloud answers a safer question:
"Which resources deserve human review — and how confident are we?"
Core principles:
1. Read-Only Always
Required AWS permissions (notice there is no Delete* or Modify*):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "logs:DescribeLogGroups",
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    }
  ]
}
No write permissions. Ever. Safe to run in production.
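If you want to confirm the read-only claim for a role before attaching it, a small check like the one below can help. This is a minimal sketch, not part of CleanCloud: the function name `non_read_only_actions` and the list of "safe" verb prefixes are assumptions made for illustration, and it ignores `Effect` and wildcard subtleties.

```python
# Sketch: list every action in an IAM policy document whose verb is not read-style.
# READ_ONLY_PREFIXES is an assumption for illustration, not CleanCloud's own check.
READ_ONLY_PREFIXES = ("Describe", "List", "Get")


def non_read_only_actions(policy: dict) -> list[str]:
    """Return the actions that do not start with a read-style verb."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    offenders = []
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM allows a single action string
            actions = [actions]
        for action in actions:
            # "ec2:DescribeVolumes" -> "DescribeVolumes"
            verb = action.split(":", 1)[-1]
            if not verb.startswith(READ_ONLY_PREFIXES):
                offenders.append(action)
    return offenders


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "logs:DescribeLogGroups",
                "s3:ListAllMyBuckets",
            ],
            "Resource": "*",
        }
    ],
}
print(non_read_only_actions(policy))
```

An empty list means every action uses a read-style verb. A wildcard such as `ec2:*` is reported as an offender, which is the conservative outcome.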
2. Conservative Signals
Not just "is this unattached?" but:
- How long has it been unattached? (14+ days = HIGH confidence)
- Multiple signals required (age + state + tags)
- Explicit confidence levels: LOW, MEDIUM, HIGH
Example:
🔴 HIGH confidence: Volume unattached for 45 days
🟡 MEDIUM confidence: Volume unattached for 10 days
🟢 LOW confidence: Volume unattached for 3 days (probably autoscaling)
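The age-based part of that example can be sketched in a few lines. The 14-day HIGH threshold comes from the text above. The 7-day MEDIUM cutoff is an assumption chosen so that 10 days lands in MEDIUM and 3 days in LOW, and `volume_confidence` is a hypothetical name, not CleanCloud's API. Real findings also combine state and tags, which this sketch leaves out.

```python
def volume_confidence(
    days_unattached: int,
    medium_after_days: int = 7,   # assumed cutoff, not stated in the article
    high_after_days: int = 14,    # from the article: 14+ days = HIGH
) -> str:
    """Map how long a volume has been unattached to a confidence level."""
    if days_unattached < 0:
        raise ValueError("days_unattached must be non-negative")
    if days_unattached >= high_after_days:
        return "HIGH"
    if days_unattached >= medium_after_days:
        return "MEDIUM"
    return "LOW"


for days in (45, 10, 3):
    print(f"{days} days unattached -> {volume_confidence(days)}")
```

Keeping the thresholds as parameters makes it easy to be stricter in accounts with heavy autoscaling churn.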
3. Review-Only Recommendations
CleanCloud never says "delete this." It says:
"This volume has been unattached for 45 days, has no tags, and doesn't match any known deployment patterns. Worth reviewing."
Humans make the final call.
What It Detects
AWS Rules (4 currently)
- Unattached EBS volumes (14+ days = HIGH confidence)
- Old snapshots (365+ days = HIGH confidence)
- CloudWatch logs with infinite retention (30+ days = HIGH confidence)
- Untagged resources (ownership unclear = MEDIUM confidence)
Azure Rules (4 currently)
- Unattached managed disks (14+ days = HIGH confidence)
- Old snapshots (90+ days = HIGH confidence)
- Unused public IPs (immediate = HIGH confidence)
- Untagged resources (MEDIUM confidence)
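One way to picture these rules is as a lookup table of HIGH-confidence thresholds. The day counts below are copied from the two lists. The provider and rule keys are hypothetical names invented for this sketch, not CleanCloud identifiers, and the untagged-resource rules are omitted because they are not age-based.

```python
# Days after which a finding is treated as HIGH confidence, per the lists above.
# Keys are illustrative names, not real CleanCloud rule IDs.
HIGH_CONFIDENCE_AFTER_DAYS = {
    ("aws", "unattached_ebs_volume"): 14,
    ("aws", "old_snapshot"): 365,
    ("aws", "log_group_infinite_retention"): 30,
    ("azure", "unattached_managed_disk"): 14,
    ("azure", "old_snapshot"): 90,
    ("azure", "unused_public_ip"): 0,  # flagged immediately
}


def is_high_confidence(provider: str, rule: str, age_days: int) -> bool:
    """Return True once a resource is at least as old as the rule's threshold."""
    threshold = HIGH_CONFIDENCE_AFTER_DAYS[(provider, rule)]
    return age_days >= threshold


# The same 120-day-old snapshot is treated differently on each provider.
print(is_high_confidence("aws", "old_snapshot", 120))
print(is_high_confidence("azure", "old_snapshot", 120))
```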
Week 1 Results
Released last week. Here's what happened:
Stats:
- 300+ downloads (170 real users, rest are PyPI mirrors)
- 0 production incidents (because read-only!)
- Most common finding: 15-30 unattached EBS volumes per AWS account
User feedback themes:
- "Finally, a tool I can trust in production"
- "Found $2K/month in waste in first scan"
- "Love that it explains WHY something was flagged"
Quick Start
# Install
pip install cleancloud
# Validate credentials
cleancloud doctor --provider aws
# Scan single region
cleancloud scan --provider aws --region us-east-1
# Scan all active regions
cleancloud scan --provider aws --all-regions
# Output to JSON
cleancloud scan \
  --provider aws \
  --all-regions \
  --output json \
  --output-file results.json
Example Output
$ cleancloud scan --provider aws --region us-east-1
🔍 Scanning region us-east-1
Found 12 findings:
HIGH confidence: 8
MEDIUM confidence: 4
Top findings:
• vol-0abc123 - Unattached volume (45 days, 100GB) - ~$10/mo
• snap-0def456 - Old snapshot (120 days, 500GB) - ~$25/mo
• log-group-xyz - Infinite retention (2.1GB stored) - ~$6/mo
💰 Estimated monthly waste: ~$156
Review findings and decide what to delete.
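Since CleanCloud does not read billing data, figures like these have to be estimates from resource size. The arithmetic for the volume and snapshot lines can be reproduced as a rough sketch. The per-GB rates below are assumptions back-derived from the sample output (100GB at about $10, 500GB at about $25), not values taken from CleanCloud, and real prices vary by region and storage type.

```python
# Assumed USD per GB-month, inferred from the sample output above.
ASSUMED_USD_PER_GB_MONTH = {
    "ebs_volume": 0.10,
    "ebs_snapshot": 0.05,
}


def estimate_monthly_cost(kind: str, size_gb: float) -> float:
    """Estimate monthly storage cost as size times an assumed flat rate."""
    return round(size_gb * ASSUMED_USD_PER_GB_MONTH[kind], 2)


print(estimate_monthly_cost("ebs_volume", 100))    # the vol-0abc123 line
print(estimate_monthly_cost("ebs_snapshot", 500))  # the snap-0def456 line
```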
CI/CD Integration
Built for pipelines with predictable exit codes:
# GitHub Actions example
- name: Run hygiene scan
  run: |
    pip install cleancloud
    cleancloud scan \
      --provider aws \
      --all-regions \
      --fail-on-confidence HIGH
Exit codes:
- 0 = Success (no policy violations)
- 1 = Configuration error
- 2 = Policy violation (findings detected)
- 3 = Missing credentials
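Outside GitHub Actions, a wrapper script can branch on these codes. The sketch below assumes `cleancloud` is on the PATH and uses only the flags shown in the CI example. The helper names `run_scan` and `describe_exit_code` are hypothetical.

```python
import subprocess

# Exit code meanings as listed in the article.
EXIT_CODE_MEANINGS = {
    0: "Success (no policy violations)",
    1: "Configuration error",
    2: "Policy violation (findings detected)",
    3: "Missing credentials",
}


def describe_exit_code(code: int) -> str:
    """Translate a scan exit code into a human-readable message."""
    return EXIT_CODE_MEANINGS.get(code, f"Unknown exit code {code}")


def run_scan(provider: str = "aws") -> int:
    """Run the scan from the CI example and return its exit code."""
    result = subprocess.run(
        [
            "cleancloud", "scan",
            "--provider", provider,
            "--all-regions",
            "--fail-on-confidence", "HIGH",
        ],
        check=False,  # a non-zero exit is an expected outcome, not an exception
    )
    return result.returncode
```

A pipeline step could then call `code = run_scan()` and, for example, fail the build only when `code == 2` while alerting on 1 or 3 as setup problems.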
Use cases:
- Block PRs with HIGH confidence findings
- Generate weekly hygiene reports
- Enforce tagging standards
- Prevent resource leaks in development
Authentication: OIDC First
No long-lived credentials needed:
AWS (GitHub Actions)
- name: Configure AWS credentials (OIDC)
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::ACCOUNT:role/CleanCloudReadOnly
    aws-region: us-east-1
- name: Scan
  run: cleancloud scan --provider aws
Azure (GitHub Actions)
- name: Azure Login (OIDC)
  uses: azure/login@v2
  with:
    client-id: ${{ secrets.AZURE_CLIENT_ID }}
    tenant-id: ${{ secrets.AZURE_TENANT_ID }}
    subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Scan
  run: cleancloud scan --provider azure
No AWS_SECRET_ACCESS_KEY or AZURE_CLIENT_SECRET needed. ✅
What CleanCloud is NOT
Not a cost optimization tool
- Doesn't access billing data
- Doesn't recommend rightsizing
- Focuses on hygiene, not savings
Not a FinOps platform
- No dashboards
- No cost tracking
- Just clean signals
Not an auto-remediation service
- Will never delete anything
- Will never modify resources
- Will never tag resources
This is a strategic design choice, not a limitation.
Privacy & Telemetry
CleanCloud collects zero telemetry.
No analytics. No tracking. No phone-home.
Why?
- Security tools shouldn't send data anywhere
- Works in air-gapped environments
- No opt-out flags needed
- Zero risk of leaking account info
We improve based on:
- GitHub issues
- Direct feedback
- Community contributions
What's Next
v0.3.1 just shipped with:
- Complete documentation overhaul
- Smarter AWS region auto-detection
- Enhanced diagnostics with security grading
- Fixed region detection bugs
Coming soon:
- GCP support
- Additional rules (unused Elastic IPs, old AMIs)
- Rule filtering (--rules flag)
- Historical tracking
Not planned:
- Automated cleanup
- Cost optimization
- Billing data access
CleanCloud will remain focused on safe hygiene detection, not automation.
Design Philosophy
Three core principles:
1. Conservative by Default
- Age-based confidence thresholds
- Multiple signals required
- Prefer false negatives over false positives
2. Read-Only Always
- No Delete* permissions
- No Tag* permissions
- No modification APIs
- Safe for production
3. Review-Only Recommendations
- Findings are candidates for review, not automated action
- Clear reasoning for each finding
- Humans stay in control
Who Is This For?
Primary users:
- SRE teams
- Platform engineers
- Infrastructure teams
Stakeholders:
- Security (read-only = passes security reviews)
- Compliance (SOC2/ISO27001 friendly)
- FinOps (identifies waste without aggressive optimization)
Not for:
- Teams wanting auto-cleanup
- Cost optimization as primary goal
- Aggressive savings recommendations
Real Talk: Why I Built This
I've seen too many "smart" automation tools cause outages:
- Auto-scaler that scaled to zero during a traffic spike
- Cleanup tool that deleted "unused" security groups (broke production)
- Cost optimizer that downsized a database (performance disaster)
The pattern: Automation is confident. Humans are cautious. Production requires caution.
CleanCloud is designed for teams who value trust over automation.
Try It
GitHub: https://github.com/sureshcsdp/cleancloud
Install: pip install cleancloud
Docs: Complete setup guides for AWS and Azure
Looking for feedback:
- What cloud hygiene tools do you currently use?
- Would read-only signals be useful for your team?
- What features would make this production-ready for you?
Open source, MIT license. Contributions welcome!
If you found this useful:
- ⭐ Star the repo
- 💬 Share your cloud hygiene horror stories in the comments
- 🐛 Report issues or suggest features
Built for SRE teams who value trust over automation.