TL;DR: Idle AI/ML endpoints burn $500-$23K per month unnoticed. Here's how to detect them across AWS, Azure, and GCP, and how tools like CleanCloud can surface them and enforce detection in CI before the bill hits.
AI/ML Waste: The New Blind Spot
For the past several years, the dominant form of cloud waste was infrastructure sprawl — orphaned EBS volumes, idle NAT gateways, stopped VMs that were never deallocated. These resources are expensive in aggregate, but individually small. A forgotten EBS volume costs $10-$40/month. An unattached Elastic IP costs pennies.
AI/ML infrastructure operates at a completely different scale. A single idle SageMaker endpoint backed by a GPU instance can cost more than a thousand stopped EC2 instances. A Vertex AI endpoint with a model deployed on a high-memory accelerator can burn through $23,000 a month with zero traffic.
This is the new waste category that FinOps dashboards were never designed to catch.
Tools like CleanCloud approach this differently: instead of dashboards, they scan for deterministic signals (like zero invocations over time) and flag idle endpoints directly — making them enforceable in CI/CD rather than something you discover weeks later in billing.
Key stats:
- SageMaker GPU endpoint, idle: ~$500/month (minimum)
- SageMaker p4d.24xlarge endpoint, idle: $23K+/month
- Default idle window before detection: 7 days
- Typical "forgotten" endpoint: 6+ weeks of unnoticed billing
The pattern is consistent across all three major clouds. A data scientist spins up an endpoint to serve a model — for a hackathon, a proof-of-concept, a demo. The event ends. The endpoint doesn't. Six weeks later, finance flags an anomaly in the ML budget. By then, the team has scattered, the model is stale, and the endpoint has cost more than the original experiment was worth.
AWS: Idle SageMaker Endpoints
SageMaker endpoints are always-on serving infrastructure. Unlike Lambda or Fargate, an endpoint doesn't scale to zero when traffic stops. The underlying EC2 instance keeps running, and AWS keeps billing — at full on-demand rates, including any attached GPU capacity.
How the rule works
The `aws.sagemaker.endpoint.idle` rule queries the CloudWatch `Invocations` metric for the endpoint across the idle window (default: 7 days). An endpoint with zero invocations over that period is flagged. The confidence level and cost estimate depend on the instance type:
| Instance Type | Approx. Monthly Cost (Idle) | Confidence |
|---|---|---|
| ml.t3.medium | ~$30 | MEDIUM |
| ml.m5.xlarge | ~$140 | MEDIUM |
| ml.g4dn.xlarge | ~$500 | HIGH |
| ml.g4dn.12xlarge | ~$1,800 | HIGH |
| ml.p3.2xlarge | ~$2,800 | HIGH |
| ml.p4d.24xlarge | ~$23,000+ | HIGH |
GPU-backed endpoints are rated HIGH confidence because the cost impact is unambiguous — zero invocations over 7 days on a GPU instance is definitively idle waste. CPU-backed endpoints at lower cost tiers are rated MEDIUM because they may be handling low but real traffic below the detection threshold.
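The flagging logic just described can be sketched as a small pure function. This is an illustrative reconstruction, not CleanCloud's actual code; the function name and GPU-prefix heuristic are assumptions, with the confidence mapping taken from the table above.

```python
# Illustrative sketch of the aws.sagemaker.endpoint.idle decision logic.
# Function name and the GPU-prefix heuristic are assumptions, not CleanCloud's code.

GPU_PREFIXES = ("ml.g", "ml.p")  # GPU-backed instance families from the table above

def classify_idle_endpoint(instance_type: str, invocations_in_window: int):
    """Return (flagged, confidence) for an endpoint's idle-window traffic."""
    if invocations_in_window > 0:
        return (False, None)  # any traffic at all means the endpoint is not idle
    # Zero invocations: GPU-backed endpoints are unambiguous waste (HIGH);
    # cheap CPU endpoints may hide low-but-real traffic (MEDIUM).
    confidence = "HIGH" if instance_type.startswith(GPU_PREFIXES) else "MEDIUM"
    return (True, confidence)
```

For example, `classify_idle_endpoint("ml.g4dn.xlarge", 0)` returns `(True, "HIGH")`, matching the sample finding below.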
Example CleanCloud output
```
Rule: aws.sagemaker.endpoint.idle
Resource: fraud-detection-v2 (us-east-1)
Instance: ml.g4dn.xlarge
Idle: 14 days (zero invocations)
Confidence: HIGH
Estimated Cost: ~$504/month
```
Scanning for idle SageMaker endpoints in CI
```yaml
# .github/workflows/ai-hygiene.yml
name: AI/ML Waste Detection
on:
  schedule:
    - cron: '0 9 * * MON'  # Weekly
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/CleanCloudCIReadOnly
          aws-region: us-east-1
      - uses: cleancloud-io/scan-action@v1
        with:
          provider: aws
          category: ai
          fail-on-cost: 500
          output: json
          output-file: findings.json
      - name: Report to Slack (optional)
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "SageMaker endpoints detected with idle waste ($500+/month): Check findings.json"
            }
```
Azure: Idle Machine Learning Compute Clusters
Azure Machine Learning (AML) compute clusters are provisioned clusters that you manage. Unlike serverless options, AML clusters have a minimum node count that stays running even when no training jobs are active.
How the rule works
The `azure.ml.compute.idle` rule checks:
- Cluster has minimum node count > 0
- No training jobs submitted in the idle window (default: 14 days)
- Cluster is still in "Running" state (not deallocated)
Result: A cluster provisioned for a pilot project that finished months ago is still spinning, still billing.
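Those three checks compose into a single predicate. The sketch below is hypothetical; the `AmlCluster` shape and its field names are assumptions, not Azure SDK types.

```python
# Illustrative sketch of the azure.ml.compute.idle checks; the dataclass
# shape and field names are assumptions, not Azure SDK types.
from dataclasses import dataclass

@dataclass
class AmlCluster:
    min_node_count: int        # minimum node count configured on the cluster
    days_since_last_job: int   # derived from the job-submission history
    state: str                 # e.g. "Running" or "Deallocated"

def is_idle_aml_cluster(cluster: AmlCluster, idle_window_days: int = 14) -> bool:
    """All three rule conditions must hold for the cluster to be flagged."""
    return (
        cluster.min_node_count > 0
        and cluster.days_since_last_job >= idle_window_days
        and cluster.state == "Running"
    )
```

A deallocated cluster or one with a zero-node minimum costs nothing while idle, which is why those cases are excluded.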
Cost examples
| Cluster Type | Approx. Monthly Cost (Idle) | Confidence |
|---|---|---|
| Standard_D2s_v3 (2 CPU, 8GB RAM) | ~$60/month | MEDIUM |
| Standard_NC6s_v3 (6 CPU, 1 GPU, 112GB RAM) | ~$600/month | HIGH |
| Standard_ND40rs_v2 (8 GPU, 24 CPU, 948GB RAM) | ~$15,000/month | HIGH |
Detection in CI
```yaml
- uses: cleancloud-io/scan-action@v1
  with:
    provider: azure
    category: ai
    fail-on-cost: 500
```
GCP: Idle Vertex AI Prediction Endpoints
Vertex AI is Google's managed ML platform. Prediction endpoints are always-on infrastructure for real-time model serving. Unlike batch jobs, endpoints don't stop — they keep running and billing even with zero predictions.
How the rule works
The `gcp.vertex.prediction.endpoint.idle` rule queries Vertex AI metrics for prediction activity over the idle window (default: 14 days). An endpoint with zero or near-zero predictions is flagged. Confidence is HIGH for GPU-backed endpoints.
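As a sketch (hypothetical names; the near-zero cutoff is an assumption, since the exact threshold isn't specified here), the rule's decision might look like:

```python
# Illustrative sketch of the gcp.vertex.prediction.endpoint.idle decision.
# NEAR_ZERO_PER_DAY is an assumed cutoff; the rule's real threshold may differ.
from typing import Optional

NEAR_ZERO_PER_DAY = 1  # fewer than ~1 prediction/day counts as "near-zero"

def classify_vertex_endpoint(predictions_in_window: int,
                             accelerator: Optional[str],
                             window_days: int = 14):
    """Return (flagged, confidence); accelerator=None means a CPU-only endpoint."""
    if predictions_in_window >= NEAR_ZERO_PER_DAY * window_days:
        return (False, None)  # real traffic, not idle
    confidence = "HIGH" if accelerator else "MEDIUM"  # GPU-backed -> HIGH
    return (True, confidence)
```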
Cost examples
| Instance Type | Approx. Monthly Cost (Idle) | Confidence |
|---|---|---|
| n1-standard-4 (CPU) | ~$150/month | MEDIUM |
| nvidia-tesla-k80 (1 GPU) | ~$800/month | HIGH |
| nvidia-tesla-v100 (1 GPU) | ~$2,500/month | HIGH |
| nvidia-tesla-a100 (1 GPU) | ~$10,000/month | HIGH |
Detection in CI
```yaml
- uses: cleancloud-io/scan-action@v1
  with:
    provider: gcp
    category: ai
    all-projects: true
    fail-on-cost: 500
```
Multi-Cloud Detection: A Real Example
One organization ran:
- 3 idle SageMaker endpoints (ml.g4dn.xlarge) = $1,500/month
- 1 idle AML cluster (GPU-backed) = $600/month
- 2 idle Vertex AI endpoints (Tesla K80) = $1,600/month
Total monthly waste: $3,700
Detected by: Single weekly scan across all three clouds
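As a sanity check, the total can be recomputed from the per-resource rates in the tables above:

```python
# Recomputing the multi-cloud example: (provider, resource, $/month, count).
findings = [
    ("aws",   "ml.g4dn.xlarge SageMaker endpoint", 500, 3),
    ("azure", "GPU-backed AML cluster",            600, 1),
    ("gcp",   "Tesla K80 Vertex AI endpoint",      800, 2),
]
total = sum(cost * count for _provider, _name, cost, count in findings)
print(f"Total monthly waste: ${total:,}")  # -> Total monthly waste: $3,700
```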
Detecting AI/ML Waste at Scale
Option 1: Weekly scheduled scan (CI-based)
```bash
# Runs every Monday at 9am
cleancloud scan --provider aws --category ai --all-regions \
  --provider azure --category ai \
  --provider gcp --category ai --all-projects \
  --fail-on-cost 500 \
  --output json --output-file ai-waste.json
```
Option 2: Per-deployment scan
Run AI/ML detection before prod deployment to catch new endpoints:
```yaml
- name: Pre-deploy AI/ML check
  run: |
    cleancloud scan --provider aws --category ai \
      --fail-on-confidence HIGH
```
Option 3: Policy-as-code enforcement
Suppress intentional AI/ML infrastructure:
```yaml
# cleancloud.yaml
exceptions:
  - rule_id: aws.sagemaker.endpoint.idle
    resource_id: demo-endpoint-prod
    reason: "Production serving endpoint for customer X"
    expires_at: "2026-12-31"
thresholds:
  fail_on_cost: 500  # CI gate: fail if monthly AI/ML waste > $500
```
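One way an `expires_at` date like this might be enforced is to treat the exception as active only through that date. The function below is a hypothetical sketch, not CleanCloud's implementation:

```python
# Hypothetical evaluation of a cleancloud.yaml exception's expires_at field.
from datetime import date

def exception_active(expires_at: str, today: date) -> bool:
    """Suppress the finding only while today is on or before the expiry date."""
    return today <= date.fromisoformat(expires_at)
```

Once the expiry passes, the suppressed finding resurfaces in the next scan, so "temporary" endpoints can't quietly become permanent.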
Why This Matters
- Data science teams: Spot forgotten endpoints before they blow budgets
- Platform teams: Enforce endpoint lifecycle policies automatically
- FinOps teams: AI/ML waste visibility in CI/CD, not quarterly surprises
- Finance teams: Audit trail of detected waste and enforcement decisions
Getting Started
Step 1: Try the demo
```bash
pipx install cleancloud
cleancloud demo --category ai
```
This shows sample AI/ML waste findings without needing cloud credentials.
Step 2: Scan your cloud
```bash
cleancloud scan --provider aws --category ai --all-regions
cleancloud scan --provider azure --category ai
cleancloud scan --provider gcp --category ai --all-projects
```
Step 3: Add to CI/CD
Copy the workflow example above and commit it to `.github/workflows/ai-hygiene.yml`.
Step 4: Use policy-as-code
Add a `cleancloud.yaml` to document which endpoints are intentional (and when their exceptions expire).
Next Steps
Learn more:
- Full guide: https://www.getcleancloud.com/blog/idle-ai-ml-infrastructure-cost.html
- Detection rules: https://github.com/cleancloud-io/cleancloud/blob/main/docs/rules.md
- Configuration: https://github.com/cleancloud-io/cleancloud/blob/main/docs/configuration.md
- Try it: `cleancloud scan --provider aws --category ai --all-regions`
GitHub: https://github.com/cleancloud-io/cleancloud
What's your biggest source of AI/ML waste? SageMaker endpoints, AML clusters, or Vertex AI endpoints? Share in the comments.
Originally published on getcleancloud.com