Suresh

Posted on • Originally published at getcleancloud.com

The Hidden Cost of Idle AI/ML Infrastructure: SageMaker, AML, and Vertex AI

TL;DR: Idle AI/ML endpoints burn $500-$23K per month unnoticed. Here's how to detect them across AWS, Azure, and GCP — and stop them automatically in CI.

Tools like CleanCloud surface these across AWS, Azure, and GCP — and can enforce detection in CI before the bill hits.


AI/ML Waste: The New Blind Spot

For the past several years, the dominant form of cloud waste was infrastructure sprawl — orphaned EBS volumes, idle NAT gateways, stopped VMs that were never deallocated. These resources are expensive in aggregate, but individually small. A forgotten EBS volume costs $10-$40/month. An unattached Elastic IP costs pennies.

AI/ML infrastructure operates at a completely different scale. A single idle SageMaker endpoint backed by a GPU instance can cost more than a thousand stopped EC2 instances. A Vertex AI endpoint running an undeployed model on a high-memory accelerator can burn through $23,000 a month with zero traffic.

This is the new waste category that FinOps dashboards were never designed to catch.

Tools like CleanCloud approach this differently: instead of dashboards, they scan for deterministic signals (like zero invocations over time) and flag idle endpoints directly — making them enforceable in CI/CD rather than something you discover weeks later in billing.

Key stats:

  • SageMaker GPU endpoint, idle: ~$500/month (minimum)
  • SageMaker p4d.24xlarge endpoint, idle: $23K+/month
  • Default idle window before detection: 7 days
  • Typical "forgotten" endpoint: 6+ weeks of unnoticed billing

The pattern is consistent across all three major clouds. A data scientist spins up an endpoint to serve a model — for a hackathon, a proof-of-concept, a demo. The event ends. The endpoint doesn't. Six weeks later, finance flags an anomaly in the ML budget. By then, the team has scattered, the model is stale, and the endpoint has cost more than the original experiment was worth.


AWS: Idle SageMaker Endpoints

SageMaker endpoints are always-on serving infrastructure. Unlike Lambda or Fargate, an endpoint doesn't scale to zero when traffic stops. The underlying EC2 instance keeps running, and AWS keeps billing — at full on-demand rates, including any attached GPU capacity.

How the rule works

The aws.sagemaker.endpoint.idle rule queries CloudWatch Metrics for InvocationCount across the idle window (default: 7 days). An endpoint with zero invocations over that period is flagged. The confidence level and cost estimate depend on the instance type:

| Instance Type | Approx. Monthly Cost (Idle) | Confidence |
| --- | --- | --- |
| ml.t3.medium | ~$30 | MEDIUM |
| ml.m5.xlarge | ~$140 | MEDIUM |
| ml.g4dn.xlarge | ~$500 | HIGH |
| ml.g4dn.12xlarge | ~$1,800 | HIGH |
| ml.p3.2xlarge | ~$2,800 | HIGH |
| ml.p4d.24xlarge | ~$23,000+ | HIGH |

GPU-backed endpoints are rated HIGH confidence because the cost impact is unambiguous — zero invocations over 7 days on a GPU instance is definitively idle waste. CPU-backed endpoints at lower cost tiers are rated MEDIUM because they may be handling low but real traffic below the detection threshold.
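For comparison, a minimal hand-rolled version of the same check with boto3 might look like the sketch below. CloudWatch publishes the per-endpoint invocation metric as Invocations in the AWS/SageMaker namespace, dimensioned by EndpointName and VariantName ("AllTraffic" is SageMaker's default variant name; adjust if yours differs). The zero-invocation decision is factored into a pure function:

```python
from datetime import datetime, timedelta, timezone

def is_idle(datapoints, threshold=0):
    """An endpoint is flagged when its daily invocation sums total zero
    (or fall at/below a configured threshold) across the idle window."""
    return sum(dp.get("Sum", 0) for dp in datapoints) <= threshold

def fetch_invocation_datapoints(endpoint_name, region="us-east-1", idle_days=7):
    """Query CloudWatch for daily invocation sums over the idle window.
    Requires `pip install boto3` and AWS credentials."""
    import boto3
    now = datetime.now(timezone.utc)
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="Invocations",
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": "AllTraffic"},
        ],
        StartTime=now - timedelta(days=idle_days),
        EndTime=now,
        Period=86400,              # one datapoint per day
        Statistics=["Sum"],
    )
    return resp["Datapoints"]

# Usage (needs credentials; the endpoint name is a placeholder):
#   points = fetch_invocation_datapoints("fraud-detection-v2")
#   print("idle candidate" if is_idle(points) else "active")
```

This is a sketch, not CleanCloud's actual implementation — but any tool doing this check ends up querying the same metric.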

Example CleanCloud output

Rule: aws.sagemaker.endpoint.idle
Resource: fraud-detection-v2 (us-east-1)
Instance: ml.g4dn.xlarge
Idle: 14 days (zero InvocationCount)
Confidence: HIGH
Estimated Cost: ~$504/month

Scanning for idle SageMaker endpoints in CI

# .github/workflows/ai-hygiene.yml
name: AI/ML Waste Detection

on:
  schedule:
    - cron: '0 9 * * MON'  # Weekly

jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/CleanCloudCIReadOnly
          aws-region: us-east-1

      - uses: cleancloud-io/scan-action@v1
        with:
          provider: aws
          category: ai
          fail-on-cost: 500
          output: json
          output-file: findings.json

      - name: Report to Slack (optional)
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "SageMaker endpoints detected with idle waste ($500+/month): Check findings.json"
            }

Azure: Idle Machine Learning Compute Clusters

Azure Machine Learning (AML) compute clusters are provisioned clusters that you manage. Unlike serverless options, an AML cluster can be configured with a minimum node count above zero, and those minimum nodes keep running (and billing) even when no training jobs are active.

How the rule works

The azure.ml.compute.idle rule checks:

  1. Cluster has minimum node count > 0
  2. No training jobs submitted in the idle window (default: 14 days)
  3. Cluster is still in "Running" state (not deallocated)

Result: A cluster provisioned for a pilot project that finished months ago is still spinning, still billing.
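Those three conditions can be expressed directly in code. The sketch below mirrors them in a pure, testable function; the SDK portion assumes the azure-ai-ml package (attribute names like min_instances follow its AmlCompute entity, and the last-job lookup is left as a comment because it needs a per-cluster job query):

```python
from datetime import datetime, timedelta, timezone

def is_idle_cluster(min_nodes, last_job_at, state, idle_days=14, now=None):
    """Mirror the rule's three conditions: a non-zero minimum node count,
    no jobs submitted inside the idle window, and a cluster still running."""
    now = now or datetime.now(timezone.utc)
    no_recent_jobs = (last_job_at is None
                      or (now - last_job_at) > timedelta(days=idle_days))
    return min_nodes > 0 and no_recent_jobs and state == "Running"

def list_idle_candidates(subscription_id, resource_group, workspace):
    """Yield AML compute clusters whose minimum node count keeps nodes warm.
    Requires `pip install azure-ai-ml azure-identity` and Azure credentials."""
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    client = MLClient(DefaultAzureCredential(), subscription_id,
                      resource_group, workspace)
    for compute in client.compute.list():
        if compute.type == "amlcompute" and compute.min_instances > 0:
            # TODO: look up the newest job targeting this cluster and feed
            # its timestamp into is_idle_cluster() with the cluster state.
            yield compute.name
```

Treat this as an illustration of the rule's logic under stated assumptions, not a drop-in scanner.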

Cost examples

| Cluster Type | Approx. Monthly Cost (Idle) | Confidence |
| --- | --- | --- |
| Standard_D2s_v3 (2 CPU, 8GB RAM) | ~$60/month | MEDIUM |
| Standard_NC6s_v3 (6 CPU, 1 GPU, 112GB RAM) | ~$600/month | HIGH |
| Standard_ND40rs_v2 (8 GPU, 24 CPU, 948GB RAM) | ~$15,000/month | HIGH |

Detection in CI

- uses: cleancloud-io/scan-action@v1
  with:
    provider: azure
    category: ai
    fail-on-cost: 500

GCP: Idle Vertex AI Prediction Endpoints

Vertex AI is Google's managed ML platform. Prediction endpoints are always-on infrastructure for real-time model serving. Unlike batch jobs, endpoints don't stop — they keep running and billing even with zero predictions.

How the rule works

The gcp.vertex.prediction.endpoint.idle rule queries Vertex AI metrics for prediction activity over the idle window (default: 14 days). An endpoint with zero or near-zero predictions is flagged. Confidence is HIGH for GPU-backed endpoints.
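A hand-rolled version of that query against Cloud Monitoring might look like the sketch below. The metric type and the endpoint_id resource label reflect my understanding of the Vertex AI online-prediction metrics and may need adjusting; the idle decision itself is a pure function:

```python
def endpoint_is_idle(prediction_counts, near_zero_threshold=0):
    """Zero (or near-zero, per the configured threshold) predictions across
    the idle window makes the endpoint a candidate finding."""
    return sum(prediction_counts) <= near_zero_threshold

def fetch_prediction_counts(project_id, endpoint_id, idle_days=14):
    """Pull prediction counts for one endpoint from Cloud Monitoring.
    Requires `pip install google-cloud-monitoring` and GCP credentials."""
    import time
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now)},
         "start_time": {"seconds": int(now - idle_days * 86400)}}
    )
    results = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": (
                'metric.type='
                '"aiplatform.googleapis.com/prediction/online/prediction_count"'
                f' AND resource.labels.endpoint_id="{endpoint_id}"'
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    return [point.value.int64_value for ts in results for point in ts.points]
```

As with the AWS sketch, this illustrates the signal the rule keys on rather than the tool's exact implementation.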

Cost examples

| Instance Type | Approx. Monthly Cost (Idle) | Confidence |
| --- | --- | --- |
| n1-standard-4 (CPU) | ~$150/month | MEDIUM |
| nvidia-tesla-k80 (1 GPU) | ~$800/month | HIGH |
| nvidia-tesla-v100 (1 GPU) | ~$2,500/month | HIGH |
| nvidia-tesla-a100 (1 GPU) | ~$10,000/month | HIGH |

Detection in CI

- uses: cleancloud-io/scan-action@v1
  with:
    provider: gcp
    category: ai
    all-projects: true
    fail-on-cost: 500

Multi-Cloud Detection: A Real Example

One organization ran:

  • 3 idle SageMaker endpoints (ml.g4dn.xlarge) = $1,500/month
  • 1 idle AML cluster (GPU-backed) = $600/month
  • 2 idle Vertex AI endpoints (Tesla K80) = $1,600/month

Total monthly waste: $3,700

Detected by: Single weekly scan across all three clouds
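The total is plain arithmetic, which is worth spelling out because it shows how quickly a handful of forgotten endpoints adds up:

```python
# Monthly idle cost per finding, using the per-resource figures above.
findings = {
    "sagemaker ml.g4dn.xlarge x3": 3 * 500,
    "aml gpu cluster x1": 600,
    "vertex tesla-k80 x2": 2 * 800,
}
total = sum(findings.values())
print(f"total monthly waste: ${total:,}")  # → total monthly waste: $3,700
```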


Detecting AI/ML Waste at Scale

Option 1: Weekly scheduled scan (CI-based)

# Runs every Monday at 9am
cleancloud scan --provider aws --category ai --all-regions \
  --provider azure --category ai \
  --provider gcp --category ai --all-projects \
  --fail-on-cost 500 \
  --output json --output-file ai-waste.json

Option 2: Per-deployment scan

Run AI/ML detection before prod deployment to catch new endpoints:

- name: Pre-deploy AI/ML check
  run: |
    cleancloud scan --provider aws --category ai \
      --fail-on-confidence HIGH

Option 3: Policy-as-code enforcement

Suppress intentional AI/ML infrastructure:

# cleancloud.yaml
exceptions:
  - rule_id: aws.sagemaker.endpoint.idle
    resource_id: demo-endpoint-prod
    reason: "Production serving endpoint for customer X"
    expires_at: "2026-12-31"

thresholds:
  fail_on_cost: 500  # CI gate: fail if monthly AI/ML waste > $500
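Conceptually, an exception list like this is just a filter over findings: a finding is suppressed only while a matching, unexpired exception exists. A minimal sketch of that logic (the field names mirror the YAML above, but the tool's real finding schema may differ):

```python
from datetime import date

def is_suppressed(finding, exceptions, today=None):
    """A finding is suppressed when an exception matches both its rule ID
    and resource ID and its expires_at date has not passed."""
    today = today or date.today()
    for exc in exceptions:
        if (exc["rule_id"] == finding["rule_id"]
                and exc["resource_id"] == finding["resource_id"]
                and date.fromisoformat(exc["expires_at"]) >= today):
            return True
    return False

# The exception from cleancloud.yaml, as parsed data.
exceptions = [{
    "rule_id": "aws.sagemaker.endpoint.idle",
    "resource_id": "demo-endpoint-prod",
    "expires_at": "2026-12-31",
}]
finding = {"rule_id": "aws.sagemaker.endpoint.idle",
           "resource_id": "demo-endpoint-prod"}

print(is_suppressed(finding, exceptions, today=date(2026, 6, 1)))  # True
print(is_suppressed(finding, exceptions, today=date(2027, 1, 1)))  # False
```

The expiry date is what keeps "temporary" exceptions from becoming permanent blind spots: once it passes, the finding resurfaces in CI.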

Why This Matters

  • Data science teams: Spot forgotten endpoints before they blow budgets
  • Platform teams: Enforce endpoint lifecycle policies automatically
  • FinOps teams: AI/ML waste visibility in CI/CD, not quarterly surprises
  • Finance teams: Audit trail of detected waste and enforcement decisions

Getting Started

Step 1: Try the demo

pipx install cleancloud
cleancloud demo --category ai

This shows sample AI/ML waste findings without needing cloud credentials.

Step 2: Scan your cloud

cleancloud scan --provider aws --category ai --all-regions
cleancloud scan --provider azure --category ai
cleancloud scan --provider gcp --category ai --all-projects

Step 3: Add to CI/CD

Copy the workflow example above and commit it as .github/workflows/ai-hygiene.yml.

Step 4: Use policy-as-code

Add cleancloud.yaml to document which endpoints are intentional (and when they expire).


Next Steps

Learn more:

GitHub: https://github.com/cleancloud-io/cleancloud


What's your biggest source of AI/ML waste? SageMaker endpoints, AML clusters, or Vertex AI endpoints? Share in the comments.

