Rick Wise
How SageMaker Actually Bills: A Breakdown for Engineers

You spun up a SageMaker notebook to prototype a model. A week later, your AWS bill has a $280 line item you can't explain.

Sound familiar?

SageMaker is one of the most powerful ML platforms on AWS — and one of the most confusing to bill for. Unlike EC2 (one instance, one hourly rate), SageMaker has at least 12 independent billing dimensions spread across notebooks, training, endpoints, storage, data processing, and more. Each one ticks on its own meter.

This post breaks down every SageMaker billing component, shows you the real numbers, and highlights the traps that catch even experienced AWS engineers.


The Core Mental Model: SageMaker Is Not One Service

Think of SageMaker as a collection of services that share a console. Each has its own pricing:

┌─────────────────────────────────────────────────┐
│                 SageMaker                       │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │ Notebooks│  │ Training │  │   Endpoints   │  │
│  │ (Dev)    │  │ (Build)  │  │   (Serve)     │  │
│  │ $/hr     │  │ $/hr     │  │   $/hr 24/7   │  │
│  └──────────┘  └──────────┘  └───────────────┘  │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │Processing│  │ Storage  │  │  Data Wrangler│  │
│  │ (ETL)    │  │ (EBS+S3) │  │   (Prep)      │  │
│  │ $/hr     │  │ $/GB-mo  │  │   $/hr        │  │
│  └──────────┘  └──────────┘  └───────────────┘  │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │ Canvas   │  │ Feature  │  │  Inference    │  │
│  │ (No-code)│  │ Store    │  │  Recommender  │  │
│  │ $/hr     │  │ $/GB+req │  │  (load test)  │  │
│  └──────────┘  └──────────┘  └───────────────┘  │
└─────────────────────────────────────────────────┘

Each box bills independently. You can have zero training cost but be paying hundreds for an idle endpoint. Let's walk through each.


1. Notebook Instances: The Silent $37/month Drain

How it charges: Per-second billing while the notebook is in InService status. You pay for the instance whether or not you have a kernel running.

| Instance Type | $/Hour | Monthly (24/7) |
|---|---|---|
| ml.t3.medium | $0.05 | $36.50 |
| ml.t3.large | $0.10 | $73.00 |
| ml.m5.large | $0.115 | $83.95 |
| ml.m5.xlarge | $0.23 | $167.90 |
| ml.c5.xlarge | $0.204 | $148.92 |

The trap: Notebooks keep billing even when you close the browser tab. The instance stays InService until you explicitly stop it. There's no auto-stop by default.

# Check for running notebooks right now
aws sagemaker list-notebook-instances \
  --status-equals InService \
  --query 'NotebookInstances[].{Name:NotebookInstanceName,Type:InstanceType,Created:CreationTime}' \
  --output table

What catches teams off guard:

  • You spin up an ml.t3.medium to test a concept on Monday
  • By Friday you've forgotten about it
  • It runs through 4 weekends = 192 extra hours = $9.60 wasted per forgotten instance
  • Multiply by a team of 5 data scientists doing this regularly, and it's real money
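That back-of-envelope math generalizes into a quick waste estimator (a sketch; the rate is the ml.t3.medium price from the table above, and the helper name is mine):

```python
# Waste from forgotten notebook instances: hourly rate x idle hours x people.
def forgotten_notebook_waste(hourly_rate: float, idle_hours: float,
                             num_engineers: int = 1) -> float:
    return hourly_rate * idle_hours * num_engineers

print(round(forgotten_notebook_waste(0.05, 192), 2))     # 9.6  (one ml.t3.medium)
print(round(forgotten_notebook_waste(0.05, 192, 5), 2))  # 48.0 (a team of 5)
```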

Cost-saving tip: Use SageMaker Studio notebooks with auto-shutdown instead of classic notebook instances. Or set a CloudWatch alarm:

# Alarm sketch: fire if the notebook sees no activity for 12 hours.
# Note: notebook instances publish few native CloudWatch metrics, so the
# metric/dimension below is illustrative; the lifecycle-config auto-stop
# script in section 9 is the more reliable fix.
aws cloudwatch put-metric-alarm \
  --alarm-name "sagemaker-notebook-idle" \
  --namespace "AWS/SageMaker" \
  --metric-name "InvocationsPerInstance" \
  --dimensions Name=NotebookInstanceName,Value=my-notebook \
  --statistic Sum \
  --period 43200 \
  --threshold 0 \
  --comparison-operator LessThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --treat-missing-data breaching \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alert-topic

Notebook Storage (Often Overlooked)

Each notebook instance has an EBS volume (default 5 GB, configurable up to 16 TB). You pay for it even when the notebook is stopped:

| Volume Size | $/Month |
|---|---|
| 5 GB (default) | $0.58 |
| 50 GB | $5.80 |
| 500 GB | $58.00 |

At $0.116/GB-month (gp2 pricing), a 500 GB volume costs $58/month just sitting there.


2. Training Jobs: Pay-Per-Second, But Instance Choice Matters Enormously

How it charges: Per-second billing while the training job runs. No charge when it completes. The clock starts at instance launch and stops at job completion or failure.

| Instance Type | $/Hour | Use Case |
|---|---|---|
| ml.m5.large | $0.115 | Tabular data, small models |
| ml.m5.xlarge | $0.23 | Medium models, preprocessing-heavy |
| ml.c5.xlarge | $0.204 | CPU-bound training (gradient boosting) |
| ml.p3.2xlarge | $3.825 | GPU training (deep learning) |
| ml.p3.8xlarge | $14.688 | Multi-GPU training |
| ml.p3.16xlarge | $28.152 | Distributed deep learning |
| ml.p4d.24xlarge | $37.688 | Large model training (8× A100 GPUs) |
| ml.g5.xlarge | $1.408 | Cost-effective GPU (single NVIDIA A10G) |
| ml.trn1.2xlarge | $1.3438 | AWS Trainium — optimized for training |

The trap: GPU instances are eye-wateringly expensive per hour. A single ml.p3.2xlarge training job that takes 24 hours costs $91.80. If your hyperparameter tuning job launches 20 variants in parallel, that's $1,836 in one day.

Training Cost Formula

Training Cost = (instance_count × instance_price_per_second × training_duration_seconds)
              + (storage_gb × $0.116/GB-month × duration_fraction)
              + (data_download_from_s3)
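A minimal Python version of that formula (the helper name and defaults are mine; 730 hours/month is the convention used throughout this post, and the free same-region S3 download term is omitted):

```python
# Training cost = per-second compute + EBS storage prorated to job duration.
def training_cost(instance_count: int, price_per_hour: float,
                  duration_seconds: int, storage_gb: int = 30,
                  gb_month_rate: float = 0.116) -> float:
    seconds_per_month = 730 * 3600
    compute = instance_count * (price_per_hour / 3600) * duration_seconds
    storage = instance_count * storage_gb * gb_month_rate * (duration_seconds / seconds_per_month)
    return compute + storage

# The 24-hour ml.p3.2xlarge job from above: $91.80 compute + ~$0.11 storage
print(round(training_cost(1, 3.825, 24 * 3600), 2))  # 91.91
```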

Spot Training: 60-90% Savings (With a Catch)

SageMaker supports managed spot training — using EC2 Spot Instances for training jobs:

estimator = sagemaker.estimator.Estimator(
    # ...
    use_spot_instances=True,
    max_wait=7200,       # Max time to wait for spot capacity
    max_run=3600,        # Max training time
    # checkpoint_s3_uri for spot interruption recovery
    checkpoint_s3_uri='s3://my-bucket/checkpoints/',
)

Savings: Typically 60–90% off On-Demand pricing.

The catch: Spot instances can be interrupted. Your training job gets a 2-minute warning, then terminates. Without checkpointing, you lose all progress and pay for the time already consumed.

Pro tip: Always set checkpoint_s3_uri when using spot training. This saves model checkpoints to S3 so interrupted jobs can resume instead of restarting from scratch.

Managed Warm Pools (New)

If you run many training jobs in sequence (e.g., hyperparameter tuning), each job normally provisions a new instance from scratch (2–5 minutes startup). Warm pools keep instances running between jobs:

  • You pay for instance time during the keep-alive period
  • But you skip the ~3 minute cold start per job
  • Break-even: if you run enough sequential jobs that the saved startup time exceeds the keep-alive cost
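Under those assumptions (roughly 3 minutes of billed startup saved per reuse, paid idle time between jobs while the pool stays warm), the break-even is easy to sketch:

```python
# Warm-pool economics sketch: each reuse saves ~cold_start_min of billed
# startup time, but you pay for the idle keep-alive gap between jobs.
def warm_pool_net_minutes(num_reuses: int, avg_gap_min: float,
                          cold_start_min: float = 3.0) -> float:
    saved = num_reuses * cold_start_min   # billed startup minutes avoided
    paid = num_reuses * avg_gap_min       # billed keep-alive idle minutes
    return saved - paid                   # positive means the warm pool is cheaper

# 10 back-to-back tuning jobs with ~1 minute between them
print(warm_pool_net_minutes(10, avg_gap_min=1.0))  # 20.0
```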

3. Real-Time Endpoints: The Big One

This is where most SageMaker overspend happens. Endpoints run 24/7 and bill continuously, even with zero traffic.

How it charges: Per-second billing while the endpoint is InService. You pay for the full instance(s) whether they receive 0 or 10,000 requests per second.

Monthly Endpoint Cost = instance_count × hourly_rate × 730 hours
| Instance | $/Hour | Monthly (1 instance) | Monthly (2 instances) |
|---|---|---|---|
| ml.t2.medium | $0.065 | $47.45 | $94.90 |
| ml.m5.large | $0.115 | $83.95 | $167.90 |
| ml.m5.xlarge | $0.23 | $167.90 | $335.80 |
| ml.c5.xlarge | $0.204 | $148.92 | $297.84 |
| ml.g4dn.xlarge | $0.736 | $537.28 | $1,074.56 |
| ml.p3.2xlarge | $3.825 | $2,792.25 | $5,584.50 |
| ml.inf1.xlarge | $0.297 | $216.81 | $433.62 |

Read that again: A single ml.p3.2xlarge endpoint costs $2,792/month. Two instances for high availability: $5,585/month. Many teams deploy this, see it works, and forget to right-size.
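The endpoint-cost formula above reproduces these numbers directly (helper name is mine):

```python
# Monthly endpoint cost = instances x hourly rate x 730 hours.
def monthly_endpoint_cost(instance_count: int, hourly_rate: float,
                          hours: int = 730) -> float:
    return instance_count * hourly_rate * hours

print(round(monthly_endpoint_cost(1, 3.825), 2))  # 2792.25 (one ml.p3.2xlarge)
print(round(monthly_endpoint_cost(2, 3.825), 2))  # 5584.5  (the HA pair)
```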

The Idle Endpoint Problem

A SageMaker endpoint with zero invocations still costs the full hourly rate. Common scenarios:

  • Model was deployed for a demo → demo ended → endpoint left running
  • A/B testing: old variant endpoint wasn't deleted after the new model won
  • Dev/staging endpoints running 24/7 when they're only used during business hours
# Find endpoints with zero invocations in the last 7 days
for endpoint in $(aws sagemaker list-endpoints --status-equals InService \
  --query 'Endpoints[].EndpointName' --output text); do

  invocations=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/SageMaker \
    --metric-name Invocations \
    --dimensions Name=EndpointName,Value="$endpoint" Name=VariantName,Value=AllTraffic \
    --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 604800 \
    --statistics Sum \
    --query 'Datapoints[0].Sum' --output text 2>/dev/null)

  if [ "$invocations" = "None" ] || [ "$invocations" = "0.0" ]; then
    echo "⚠️  IDLE: $endpoint (0 invocations in 7 days)"
  fi
done

Multi-Model Endpoints: Pack More Models, Pay Less

If you have many low-traffic models, a Multi-Model Endpoint (MME) lets you load models on-demand into a single endpoint:

Standard:  10 models × ml.m5.large × 730 hrs = $839.50/month
MME:       1 endpoint × ml.m5.xlarge × 730 hrs = $167.90/month
                                      Savings:   $671.60/month (80%)

The tradeoff: cold-start latency when loading a model that isn't cached. Fine for batch-like traffic; bad for latency-sensitive real-time inference.
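For reference, the MME-specific pieces of the API are small. A hedged sketch of the two request bodies, shown as plain dicts so the switches stand out (every name, bucket, and image URI here is a placeholder):

```python
# create_model request: "Mode": "MultiModel" plus an S3 *prefix* (not a
# single model file) is what makes the endpoint multi-model.
create_model_request = {
    "ModelName": "shared-mme",
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
    "PrimaryContainer": {
        "Image": "<inference-container-image-uri>",
        "Mode": "MultiModel",
        "ModelDataUrl": "s3://my-bucket/models/",  # prefix holding many model.tar.gz files
    },
}

# invoke_endpoint request: TargetModel picks the model per call; it is
# loaded on first use (the cold start mentioned above) and cached after.
invoke_request = {
    "EndpointName": "shared-mme-endpoint",
    "TargetModel": "model-42.tar.gz",  # relative to ModelDataUrl
    "ContentType": "application/json",
    "Body": b'{"features": [1, 2, 3]}',
}
```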

Serverless Inference: Pay Only for What You Use

For sporadic traffic (< ~1000 requests/hour), Serverless Inference eliminates the always-on cost:

Pricing:
  - Memory: $0.0000016/GB-second
  - Requests: included

Example: 1000 requests/day, 500ms avg, 4GB memory
  = 1000 × 0.5s × 4GB × $0.0000016/GB-s × 30 days
  = $0.096/month  ← vs $83.95/month for ml.m5.large

The catch: Cold starts (30s–2min for first invocation after idle period) and max 6 MB payload.
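The serverless arithmetic above, packaged as a quick comparator (same GB-second rate as quoted; verify it against current AWS pricing before relying on it):

```python
# Serverless cost = requests x duration x memory x GB-second rate x days.
def serverless_monthly_cost(req_per_day: int, avg_seconds: float,
                            memory_gb: int, gb_second_rate: float = 0.0000016,
                            days: int = 30) -> float:
    return req_per_day * avg_seconds * memory_gb * gb_second_rate * days

cost = serverless_monthly_cost(1000, 0.5, 4)
print(round(cost, 3))  # 0.096 (vs $83.95 for an always-on ml.m5.large)
```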


4. Storage: Three Hidden Meters

SageMaker storage costs come from three independent sources:

a) Notebook EBS Volumes

Already covered above: $0.116/GB-month, billed even when notebook is stopped.

b) Training Job Storage

Each training job gets a temporary EBS volume for input data and model artifacts:

  • Default: 30 GB per instance
  • Configurable up to 16 TB
  • Billed only during training (per-second)
  • SSD (gp2): $0.116/GB-month, prorated to seconds

c) Model Artifacts in S3

Trained models are stored in S3 as .tar.gz archives:

  • S3 Standard: $0.023/GB-month
  • A 5 GB model × 20 training runs = 100 GB = $2.30/month
  • But: large language model checkpoints can be 50–200 GB each
  • 10 checkpoints × 100 GB = 1 TB = $23/month

Pro tip: Set an S3 Lifecycle policy to move old model artifacts to S3 Glacier after 30 days:

{
  "Rules": [{
    "ID": "Archive old SageMaker models",
    "Filter": { "Prefix": "sagemaker/" },
    "Status": "Enabled",
    "Transitions": [{
      "Days": 30,
      "StorageClass": "GLACIER"
    }]
  }]
}

5. Processing Jobs (ETL/Feature Engineering)

SageMaker Processing runs containerized data processing workloads:

How it charges: Same as training — per-second billing for the instances used.

| Instance | $/Hour | 1-hour ETL job | 8-hour daily ETL (monthly) |
|---|---|---|---|
| ml.m5.xlarge | $0.23 | $0.23 | $55.20 |
| ml.m5.4xlarge | $0.922 | $0.92 | $221.28 |
| ml.r5.4xlarge | $1.21 | $1.21 | $290.40 |

The trap: Processing jobs often run as part of a pipeline. If your pipeline runs daily with 4 instances for 3 hours:

4 instances × ml.m5.xlarge × $0.23/hr × 3 hrs × 30 days = $82.80/month

That's not huge — but if someone accidentally sets the pipeline to run hourly instead of daily:

4 × $0.23 × 3 × 24 × 30 = $1,987.20/month  ← oops
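A tiny guard-rail: compute the monthly number from the schedule before the pipeline ships (a sketch using the rates above):

```python
# Pipeline cost = instances x hourly rate x hours per run x runs/day x days.
def pipeline_monthly_cost(instances: int, hourly_rate: float,
                          hours_per_run: float, runs_per_day: int,
                          days: int = 30) -> float:
    return instances * hourly_rate * hours_per_run * runs_per_day * days

print(round(pipeline_monthly_cost(4, 0.23, 3, 1), 2))   # 82.8   (daily)
print(round(pipeline_monthly_cost(4, 0.23, 3, 24), 2))  # 1987.2 (hourly, the oops)
```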

6. SageMaker Savings Plans

AWS offers SageMaker Savings Plans — commit to a $/hour spend for 1 or 3 years in exchange for a discount:

| Commitment | Discount vs On-Demand |
|---|---|
| 1-year, no upfront | ~20% |
| 1-year, partial upfront | ~27% |
| 1-year, all upfront | ~30% |
| 3-year, all upfront | ~64% |

What's covered: Notebook instances, Studio notebooks, training, processing, batch transform, real-time inference, and serverless inference.

What's NOT covered: Data transfer, S3 storage, EBS storage, CloudWatch, and any non-SageMaker charges.

Break-even: If you consistently spend > $100/month on SageMaker compute, a 1-year no-upfront plan likely saves you money. The commitment is dollar-based (e.g., "$0.50/hour"), not instance-based — so you can shift between instance types.
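A rough sketch of that logic, using the ~20% no-upfront discount from the table (the helper and its `coverage` parameter, the fraction of spend at or below the commitment, are my framing):

```python
# Effective monthly spend with a dollar-based Savings Plan commitment.
def effective_monthly_spend(on_demand_monthly: float, discount: float = 0.20,
                            coverage: float = 1.0) -> float:
    covered = on_demand_monthly * coverage          # billed at the discount
    uncovered = on_demand_monthly * (1 - coverage)  # still On-Demand
    return covered * (1 - discount) + uncovered

# $500/month of steady compute, fully covered by the commitment
print(round(effective_monthly_spend(500.0), 2))  # 400.0
```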


7. Data Transfer: The Other Hidden Cost

SageMaker data transfer charges are identical to EC2 data transfer:

| Path | Cost |
|---|---|
| S3 → SageMaker (same region) | Free |
| SageMaker → S3 (same region) | Free |
| Internet → SageMaker | Free |
| SageMaker → Internet | $0.09/GB (first 10 TB) |
| Cross-region S3 → SageMaker | $0.01–0.02/GB |
| Cross-AZ (multi-instance training) | $0.01/GB each way |

The trap: Distributed training across multiple instances in different AZs generates inter-AZ data transfer charges for gradient synchronization. A training job with 8 ml.p3.16xlarge instances exchanging 100 GB of gradients per hour across AZs can add $2/hour in data transfer alone.

Mitigation: Use SageMaker's managed instance placement (it tries to co-locate instances in the same AZ). For distributed training, consider EFA (Elastic Fabric Adapter) enabled instances (ml.p4d.24xlarge, ml.trn1.32xlarge) — inter-node traffic over EFA is not charged.


8. The Full Bill Breakdown — A Realistic Example

Let's walk through a realistic monthly SageMaker bill for a mid-size ML team (3 data scientists, 2 models in production):

Development

| Item | Details | Monthly Cost |
|---|---|---|
| 3 Notebook instances | ml.m5.large, ~160 hrs/month each | $55.20 |
| Notebook storage | 3 × 50 GB | $17.40 |
| Subtotal | | $72.60 |

Training

| Item | Details | Monthly Cost |
|---|---|---|
| Training jobs (CPU) | 20 jobs × ml.m5.xlarge × 2 hrs | $9.20 |
| Training jobs (GPU) | 5 jobs × ml.g5.xlarge × 4 hrs | $28.16 |
| HPO tuning | 1 job × 50 trials × ml.g5.xlarge × 1 hr | $70.40 |
| Training storage | 20 GB per job, 25 jobs | $0.14 |
| Subtotal | | $107.90 |

Inference (The Biggest Line Item)

| Item | Details | Monthly Cost |
|---|---|---|
| Prod endpoint (Model A) | 2× ml.m5.xlarge × 730 hrs | $335.80 |
| Prod endpoint (Model B) | 2× ml.g4dn.xlarge × 730 hrs | $1,074.56 |
| Staging endpoint | 1× ml.m5.large × 730 hrs | $83.95 |
| Subtotal | | $1,494.31 |

Other

| Item | Details | Monthly Cost |
|---|---|---|
| Processing (daily ETL) | 2× ml.m5.xlarge × 1 hr × 30 days | $13.80 |
| Model artifacts in S3 | 200 GB across all experiments | $4.60 |
| Data transfer (internet) | 50 GB model serving responses | $4.50 |
| CloudWatch (metrics) | Custom endpoint metrics | $3.00 |
| Subtotal | | $25.90 |

Total

Development:   $72.60    (4.3%)
Training:      $107.90   (6.3%)
Inference:     $1,494.31 (87.9%)  ← 88% of the bill
Other:         $25.90    (1.5%)
──────────────────────────────────
TOTAL:         $1,700.71/month

The punchline: Nearly 88% of this team's SageMaker spend is inference endpoints running 24/7. The training — which is what the team actually thinks about and optimizes — is only 6% of the bill.


9. The 7 Most Common SageMaker Billing Mistakes

1. Leaving Notebook Instances Running

Cost: $37–$168/month per forgotten notebook

Fix: Use SageMaker Studio with auto-shutdown, or set a lifecycle config:

#!/bin/bash
# Lifecycle-config sketch: stop this notebook instance once Jupyter's
# /api/status reports no activity for IDLE_TIME seconds (run via cron).
IDLE_TIME=3600
LAST=$(curl -sk https://localhost:8443/api/status | jq -r '.last_activity')
IDLE=$(( $(date +%s) - $(date -d "$LAST" +%s) ))
if [ "$IDLE" -gt "$IDLE_TIME" ]; then
  aws sagemaker stop-notebook-instance \
    --notebook-instance-name "$(jq -r '.ResourceName' /opt/ml/metadata/resource-metadata.json)"
fi

2. Not Deleting Endpoints After Experimentation

Cost: $84–$2,792/month per forgotten endpoint

Fix: Tag endpoints with environment=dev and run a nightly cleanup:

# Delete InService endpoints created more than 3 days ago
# (add your environment=dev tag filter first: as written this hits ALL endpoints)
CUTOFF=$(date -u -d '3 days ago' +%Y-%m-%dT%H:%M:%S)
aws sagemaker list-endpoints --status-equals InService \
  --query "Endpoints[?CreationTime<'${CUTOFF}'].EndpointName" \
  --output text | xargs -r -I{} aws sagemaker delete-endpoint --endpoint-name {}

3. Over-Provisioning Instance Types

Cost: 2–10× the necessary spend

Fix: Start with the smallest instance that works. Use CloudWatch to check actual utilization:

# Check CPU utilization of an endpoint
aws cloudwatch get-metric-statistics \
  --namespace /aws/sagemaker/Endpoints \
  --metric-name CPUUtilization \
  --dimensions Name=EndpointName,Value=my-endpoint \
    Name=VariantName,Value=AllTraffic \
  --start-time 2026-02-14T00:00:00 \
  --end-time 2026-02-21T00:00:00 \
  --period 3600 \
  --statistics Average \
  --query 'sort_by(Datapoints, &Timestamp)[].{Time:Timestamp,CPU:Average}'

If average CPU is < 20%, you're likely over-provisioned. An ml.m5.xlarge at 15% utilization could be an ml.m5.large (50% cheaper).

4. Running Staging Endpoints 24/7

Cost: $84–$538/month for endpoints used ~8 hrs/day

Fix: Schedule endpoint creation/deletion:

# Scale the staging endpoint down at 7 PM and back up at 8 AM.
# Note: classic real-time endpoints generally require MinCapacity >= 1;
# verify your endpoint type supports scale-to-zero before registering 0.
import boto3

autoscaling = boto3.client('application-autoscaling')

RESOURCE_ID = 'endpoint/staging-model/variant/AllTraffic'

# Register the endpoint variant as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=RESOURCE_ID,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=0,
    MaxCapacity=2,
)

# Scale to 0 at 7 PM UTC
autoscaling.put_scheduled_action(
    ServiceNamespace='sagemaker',
    ScheduledActionName='scale-down-evening',
    ResourceId=RESOURCE_ID,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    Schedule='cron(0 19 * * ? *)',
    ScalableTargetAction={'MinCapacity': 0, 'MaxCapacity': 0},
)

# Scale back to 1 at 8 AM UTC
autoscaling.put_scheduled_action(
    ServiceNamespace='sagemaker',
    ScheduledActionName='scale-up-morning',
    ResourceId=RESOURCE_ID,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    Schedule='cron(0 8 * * ? *)',
    ScalableTargetAction={'MinCapacity': 1, 'MaxCapacity': 2},
)

5. Not Using Spot for Training

Cost: 2–10× overpayment on training jobs

Fix: Add use_spot_instances=True + checkpoint_s3_uri to every training estimator.

6. Ignoring Multi-Model Endpoints for Low-Traffic Models

Cost: $84+/month per model × N models

Fix: Consolidate into a single MME. Works well for models with < 100 requests/hour.

7. No SageMaker Savings Plan

Cost: 20–64% overpayment on steady-state compute

Fix: Analyze 30 days of SageMaker usage → commit to a 1-year no-upfront Savings Plan for your baseline spend.


10. Quick-Reference Billing Cheat Sheet

| Component | Billing Model | Minimum Charge | Always On? |
|---|---|---|---|
| Notebook Instance | Per-second (InService) | 1 second | Yes, until stopped |
| Studio Notebook | Per-second (running kernel) | 1 second | No (auto-shutdown capable) |
| Training Job | Per-second (job duration) | 1 second | No (job-scoped) |
| Processing Job | Per-second (job duration) | 1 second | No (job-scoped) |
| Real-Time Endpoint | Per-second (InService) | 1 second | Yes, 24/7 |
| Serverless Endpoint | Per-request + memory-second | None (pay per use) | No |
| Async Endpoint | Per-second (InService) | 1 second | Yes (but can scale to 0) |
| Batch Transform | Per-second (job duration) | 1 second | No (job-scoped) |
| Feature Store | Per-read/write + storage | None | Depends on store type |
| EBS Storage | Per-GB-month | $0.116/GB-month | Yes, even when stopped |
| S3 Artifacts | Per-GB-month | $0.023/GB-month | Yes |
| Data Transfer Out | Per-GB | $0.09/GB | Only on egress |

11. The One Metric That Matters Most

If you track only one metric for SageMaker cost efficiency, track this:

$$\text{Cost per 1K Invocations} = \frac{\text{Monthly Endpoint Cost}}{\text{Monthly Invocations} \div 1000}$$

Example:

  • Endpoint: 2× ml.m5.xlarge = $335.80/month
  • Invocations: 500,000/month

$$\frac{\$335.80}{500} = \$0.67 \text{ per 1K invocations}$$

If that number is above $1.00 — you're likely over-provisioned or should consider serverless inference.

If it's above $5.00 — you either have very low traffic (delete the endpoint at night) or you're burning money on GPU instances that aren't needed.

If it's above $20.00 — the endpoint is effectively idle. Delete it.
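The metric itself is one line of arithmetic (numbers from the example above):

```python
# Cost per 1K invocations = monthly endpoint cost / (invocations / 1000).
def cost_per_1k_invocations(monthly_cost: float, monthly_invocations: int) -> float:
    return monthly_cost / (monthly_invocations / 1000)

print(round(cost_per_1k_invocations(335.80, 500_000), 2))  # 0.67
```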


Wrapping Up

SageMaker billing boils down to three rules:

  1. Endpoints are the #1 cost driver — they run 24/7. Everything else is transient.
  2. If it's InService, you're paying — notebooks, endpoints, anything with that status.
  3. The bill you expect (training) is rarely the bill you get (inference) — teams optimize training time but ignore endpoint sprawl.

The engineers who keep SageMaker costs under control aren't the ones who pick the cheapest instance type. They're the ones who have a process for deleting things they're not using.

# The best SageMaker cost optimization is a cron job
# Run weekly: find and report idle SageMaker resources

echo "=== Idle Notebook Instances ==="
aws sagemaker list-notebook-instances --status-equals InService \
  --query 'NotebookInstances[].NotebookInstanceName' --output table

echo "=== Idle Endpoints (0 invocations, 7d) ==="
for ep in $(aws sagemaker list-endpoints --status-equals InService \
  --query 'Endpoints[].EndpointName' --output text); do
  inv=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/SageMaker --metric-name Invocations \
    --dimensions Name=EndpointName,Value="$ep" Name=VariantName,Value=AllTraffic \
    --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 604800 --statistics Sum \
    --query 'Datapoints[0].Sum' --output text 2>/dev/null)
  [[ "$inv" == "None" || "$inv" == "0.0" ]] && echo "  ⚠️  $ep"
done

Want to automate this? We built CloudWise to automatically detect idle SageMaker notebooks, endpoints, and oversized instances across all your AWS accounts — including air-gapped environments with no internet access. It's one of 90+ waste detectors that scan your infrastructure so you don't have to run scripts manually.


Found this useful? Drop a 🔖 bookmark — this is the reference I wish I had when I first got a surprise SageMaker bill.
