You deployed a SageMaker notebook to prototype a model. A week later, your AWS bill has a $280 line item you can't explain.
Sound familiar?
SageMaker is one of the most powerful ML platforms on AWS — and one of the most confusing to bill for. Unlike EC2 (one instance, one hourly rate), SageMaker has at least 12 independent billing dimensions spread across notebooks, training, endpoints, storage, data processing, and more. Each one ticks on its own meter.
This post breaks down every SageMaker billing component, shows you the real numbers, and highlights the traps that catch even experienced AWS engineers.
The Core Mental Model: SageMaker Is Not One Service
Think of SageMaker as a collection of services that share a console. Each has its own pricing:
┌─────────────────────────────────────────────────┐
│ SageMaker │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │ Notebooks│ │ Training │ │ Endpoints │ │
│ │ (Dev) │ │ (Build) │ │ (Serve) │ │
│ │ $/hr │ │ $/hr │ │ $/hr 24/7 │ │
│ └──────────┘ └──────────┘ └───────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │Processing│ │ Storage │ │ Data Wrangler│ │
│ │ (ETL) │ │ (EBS+S3) │ │ (Prep) │ │
│ │ $/hr │ │ $/GB-mo │ │ $/hr │ │
│ └──────────┘ └──────────┘ └───────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │ Canvas │ │ Feature │ │ Inference │ │
│ │ (No-code)│ │ Store │ │ Recommender │ │
│ │ $/hr │ │ $/GB+req │ │ (load test) │ │
│ └──────────┘ └──────────┘ └───────────────┘ │
└─────────────────────────────────────────────────┘
Each box bills independently. You can have zero training cost but be paying hundreds for an idle endpoint. Let's walk through each.
1. Notebook Instances: The Silent $37/month Drain
How it charges: Per-second billing while the notebook is in InService status. You pay for the instance whether or not you have a kernel running.
| Instance Type | $/Hour | Monthly (24/7) |
|---|---|---|
| ml.t3.medium | $0.05 | $36.50 |
| ml.t3.large | $0.10 | $73.00 |
| ml.m5.large | $0.115 | $83.95 |
| ml.m5.xlarge | $0.23 | $167.90 |
| ml.c5.xlarge | $0.204 | $148.92 |
The trap: Notebooks keep billing even when you close the browser tab. The instance stays InService until you explicitly stop it. There's no auto-stop by default.
# Check for running notebooks right now
aws sagemaker list-notebook-instances \
--status-equals InService \
--query 'NotebookInstances[].{Name:NotebookInstanceName,Type:InstanceType,Created:CreationTime}' \
--output table
What catches teams off guard:
- You spin up an ml.t3.medium to test a concept on Monday
- You forget about it until the following Friday
- It runs for 4 extra weekends = 192 extra hours = $9.60 wasted per forgotten instance
- Multiply by a team of 5 data scientists doing this regularly = real money
Cost-saving tip: Use SageMaker Studio notebooks with auto-shutdown instead of classic notebook instances. Note that classic notebook instances don't publish a usage metric to CloudWatch by default, so a simple idle alarm won't fire; the reliable guard is a lifecycle configuration that stops the notebook itself (see mistake #1 below). Attaching one to an existing notebook:
# Attach an auto-stop lifecycle configuration to an existing notebook.
# "auto-stop-idle" is whatever you named your lifecycle config; the
# notebook must be stopped before it can be updated.
aws sagemaker stop-notebook-instance --notebook-instance-name my-notebook
aws sagemaker update-notebook-instance \
  --notebook-instance-name my-notebook \
  --lifecycle-config-name auto-stop-idle
Notebook Storage (Often Overlooked)
Each notebook instance has an EBS volume (default 5 GB, configurable up to 16 TB). You pay for it even when the notebook is stopped:
| Volume Size | $/Month |
|---|---|
| 5 GB (default) | $0.58 |
| 50 GB | $5.80 |
| 500 GB | $58.00 |
At $0.116/GB-month (gp2 pricing), a 500 GB volume costs $58/month just sitting there — even while the notebook is stopped.
2. Training Jobs: Pay-Per-Second, But Instance Choice Matters Enormously
How it charges: Per-second billing while the training job runs. No charge when it completes. The clock starts at instance launch and stops at job completion or failure.
| Instance Type | $/Hour | Use Case |
|---|---|---|
| ml.m5.large | $0.115 | Tabular data, small models |
| ml.m5.xlarge | $0.23 | Medium models, preprocessing-heavy |
| ml.c5.xlarge | $0.204 | CPU-bound training (gradient boosting) |
| ml.p3.2xlarge | $3.825 | GPU training (deep learning) |
| ml.p3.8xlarge | $14.688 | Multi-GPU training |
| ml.p3.16xlarge | $28.152 | Distributed deep learning |
| ml.p4d.24xlarge | $37.688 | Large model training (8× A100 GPUs) |
| ml.g5.xlarge | $1.408 | Cost-effective GPU (single NVIDIA A10G) |
| ml.trn1.2xlarge | $1.3438 | AWS Trainium — optimized for training |
The trap: GPU instances are eye-wateringly expensive per hour. A single ml.p3.2xlarge training job that takes 24 hours costs $91.80. If your hyperparameter tuning job launches 20 variants in parallel, that's $1,836 in one day.
Training Cost Formula
Training Cost = (instance_count × instance_price_per_second × training_duration_seconds)
+ (storage_gb × $0.116/GB-month × duration_fraction)
+ (data_download_from_s3)
Spot Training: 60-90% Savings (With a Catch)
SageMaker supports managed spot training — using EC2 Spot Instances for training jobs:
estimator = sagemaker.estimator.Estimator(
# ...
use_spot_instances=True,
max_wait=7200, # Max time to wait for spot capacity
max_run=3600, # Max training time
# checkpoint_s3_uri for spot interruption recovery
checkpoint_s3_uri='s3://my-bucket/checkpoints/',
)
Savings: Typically 60–90% off On-Demand pricing.
The catch: Spot instances can be interrupted. Your training job gets a 2-minute warning, then terminates. Without checkpointing, you lose all progress and pay for the time already consumed.
Pro tip: Always set checkpoint_s3_uri when using spot training. This saves model checkpoints to S3 so interrupted jobs can resume instead of restarting from scratch.
Managed Warm Pools (New)
If you run many training jobs in sequence (e.g., hyperparameter tuning), each job normally provisions a new instance from scratch (2–5 minutes startup). Warm pools keep instances running between jobs:
- You pay for instance time during the keep-alive period
- But you skip the ~3 minute cold start per job
- Break-even: if you run enough sequential jobs that the saved startup time exceeds the keep-alive cost
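That break-even can be sketched with a rough model (an illustration, not AWS billing math; it assumes a fixed keep-alive gap between back-to-back jobs and a ~3-minute cold start avoided per job):

```python
def warm_pool_tradeoff(sequential_jobs: int, keep_alive_min: float,
                       cold_start_min: float, price_per_hour: float):
    """Extra cost vs wall-clock time saved for one warm-pool session.

    Rough model: you pay for the keep-alive window between consecutive
    jobs, and every job after the first skips the cold start.
    """
    gaps = sequential_jobs - 1
    extra_cost = round(gaps * keep_alive_min / 60 * price_per_hour, 2)
    minutes_saved = gaps * cold_start_min
    return extra_cost, minutes_saved

# 50 sequential HPO trials on ml.g5.xlarge with a 5-minute keep-alive:
cost, saved = warm_pool_tradeoff(50, 5, 3, 1.408)
print(cost, saved)   # → 5.75 147
```

Here, $5.75 of keep-alive time buys back roughly two and a half hours of cumulative startup wait across the tuning run.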
3. Real-Time Endpoints: The Big One
This is where most SageMaker overspend happens. Endpoints run 24/7 and bill continuously, even with zero traffic.
How it charges: Per-second billing while the endpoint is InService. You pay for the full instance(s) whether they receive 0 or 10,000 requests per second.
Monthly Endpoint Cost = instance_count × hourly_rate × 730 hours
| Instance | $/Hour | Monthly (1 instance) | Monthly (2 instances) |
|---|---|---|---|
| ml.t2.medium | $0.065 | $47.45 | $94.90 |
| ml.m5.large | $0.115 | $83.95 | $167.90 |
| ml.m5.xlarge | $0.23 | $167.90 | $335.80 |
| ml.c5.xlarge | $0.204 | $148.92 | $297.84 |
| ml.g4dn.xlarge | $0.736 | $537.28 | $1,074.56 |
| ml.p3.2xlarge | $3.825 | $2,792.25 | $5,584.50 |
| ml.inf1.xlarge | $0.297 | $216.81 | $433.62 |
Read that again: A single ml.p3.2xlarge endpoint costs $2,792/month. Two instances for high availability: $5,585/month. Many teams deploy this, see it works, and forget to right-size.
The Idle Endpoint Problem
A SageMaker endpoint with zero invocations still costs the full hourly rate. Common scenarios:
- Model was deployed for a demo → demo ended → endpoint left running
- A/B testing: old variant endpoint wasn't deleted after the new model won
- Dev/staging endpoints running 24/7 when they're only used during business hours
# Find endpoints with zero invocations in the last 7 days
for endpoint in $(aws sagemaker list-endpoints --status-equals InService \
--query 'Endpoints[].EndpointName' --output text); do
invocations=$(aws cloudwatch get-metric-statistics \
--namespace AWS/SageMaker \
--metric-name Invocations \
--dimensions Name=EndpointName,Value="$endpoint" Name=VariantName,Value=AllTraffic \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 604800 \
--statistics Sum \
--query 'Datapoints[0].Sum' --output text 2>/dev/null)
if [ "$invocations" = "None" ] || [ "$invocations" = "0.0" ]; then
echo "⚠️ IDLE: $endpoint (0 invocations in 7 days)"
fi
done
Multi-Model Endpoints: Pack More Models, Pay Less
If you have many low-traffic models, a Multi-Model Endpoint (MME) lets you load models on-demand into a single endpoint:
Standard: 10 models × ml.m5.large × 730 hrs = $839.50/month
MME: 1 endpoint × ml.m5.xlarge × 730 hrs = $167.90/month
Savings: $671.60/month (80%)
The tradeoff: cold-start latency when loading a model that isn't cached. Fine for batch-like traffic; bad for latency-sensitive real-time inference.
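The comparison above, generalized to any fleet size (a sketch; it assumes all models fit on the MME instance and its capacity can absorb the combined traffic):

```python
def mme_savings(n_models: int, per_model_rate: float,
                mme_rate: float, hours: float = 730) -> float:
    """Monthly savings from consolidating N single-model endpoints
    into one Multi-Model Endpoint (MME)."""
    standard = n_models * per_model_rate * hours
    mme = mme_rate * hours
    return round(standard - mme, 2)

# 10 × ml.m5.large endpoints vs one ml.m5.xlarge MME, as above:
print(mme_savings(10, 0.115, 0.23))   # → 671.6
```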
Serverless Inference: Pay Only for What You Use
For sporadic traffic (< ~1000 requests/hour), Serverless Inference eliminates the always-on cost:
Pricing:
- Memory: $0.0000016/GB-second
- Requests: included
Example: 1000 requests/day, 500ms avg, 4GB memory
= 1000 × 0.5s × 4GB × $0.0000016/GB-s × 30 days
= $0.096/month ← vs $83.95/month for ml.m5.large
The catch: Cold starts (anywhere from a few seconds to a minute or more after an idle period, depending on container size) and a 4 MB max payload (vs 6 MB for real-time endpoints).
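A break-even check against a dedicated instance (using the GB-second rate above; real bills can also include data-processing charges, omitted here):

```python
def serverless_monthly(requests_per_day: int, avg_seconds: float,
                       memory_gb: float, gb_second_rate: float = 0.0000016,
                       days: int = 30) -> float:
    """Estimated monthly compute cost of a serverless endpoint in USD."""
    gb_seconds = requests_per_day * avg_seconds * memory_gb * days
    return round(gb_seconds * gb_second_rate, 3)

# The example above: 1,000 requests/day at 500 ms on 4 GB
print(serverless_monthly(1000, 0.5, 4))   # → 0.096
```

At this profile, serverless stays cheaper than an $83.95/month ml.m5.large until roughly 875,000 requests per day ($83.95 divided by the $0.000096/month that each daily request costs).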
4. Storage: Three Hidden Meters
SageMaker storage costs come from three independent sources:
a) Notebook EBS Volumes
Already covered above: $0.116/GB-month, billed even when notebook is stopped.
b) Training Job Storage
Each training job gets a temporary EBS volume for input data and model artifacts:
- Default: 30 GB per instance
- Configurable up to 16 TB
- Billed only during training (per-second)
- SSD (gp2): $0.116/GB-month, prorated to seconds
c) Model Artifacts in S3
Trained models are stored in S3 as .tar.gz archives:
- S3 Standard: $0.023/GB-month
- A 5 GB model × 20 training runs = 100 GB = $2.30/month
- But: large language model checkpoints can be 50–200 GB each
- 10 checkpoints × 100 GB = 1 TB = $23/month
Pro tip: Set an S3 Lifecycle policy to move old model artifacts to S3 Glacier after 30 days:
{
"Rules": [{
"ID": "Archive old SageMaker models",
"Filter": { "Prefix": "sagemaker/" },
"Status": "Enabled",
"Transitions": [{
"Days": 30,
"StorageClass": "GLACIER"
}]
}]
}
5. Processing Jobs (ETL/Feature Engineering)
SageMaker Processing runs containerized data processing workloads:
How it charges: Same as training — per-second billing for the instances used.
| Instance | $/Hour | 1-hour ETL job | 8-hour daily ETL (monthly) |
|---|---|---|---|
| ml.m5.xlarge | $0.23 | $0.23 | $55.20 |
| ml.m5.4xlarge | $0.922 | $0.92 | $221.28 |
| ml.r5.4xlarge | $1.21 | $1.21 | $290.40 |
The trap: Processing jobs often run as part of a pipeline. If your pipeline runs daily with 4 instances for 3 hours:
4 instances × ml.m5.xlarge × $0.23/hr × 3 hrs × 30 days = $82.80/month
That's not huge — but if someone accidentally sets the pipeline to run hourly instead of daily:
4 × $0.23 × 3 × 24 × 30 = $1,987.20/month ← oops
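Since the only difference between those two bills is run frequency, a one-liner makes the sensitivity easy to check (rates from the table above):

```python
def pipeline_monthly(instances: int, price_per_hour: float,
                     hours_per_run: float, runs_per_day: float,
                     days: int = 30) -> float:
    """Monthly cost of a scheduled SageMaker Processing pipeline."""
    return round(instances * price_per_hour * hours_per_run
                 * runs_per_day * days, 2)

print(pipeline_monthly(4, 0.23, 3, runs_per_day=1))    # → 82.8   (daily)
print(pipeline_monthly(4, 0.23, 3, runs_per_day=24))   # → 1987.2 (hourly)
```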
6. SageMaker Savings Plans
AWS offers SageMaker Savings Plans — commit to a $/hour spend for 1 or 3 years in exchange for a discount:
| Commitment | Discount vs On-Demand |
|---|---|
| 1-year, no upfront | ~20% |
| 1-year, partial upfront | ~27% |
| 1-year, all upfront | ~30% |
| 3-year, all upfront | ~64% |
What's covered: Notebook instances, Studio notebooks, training, processing, batch transform, real-time inference, and serverless inference.
What's NOT covered: Data transfer, S3 storage, EBS storage, CloudWatch, and any non-SageMaker charges.
Break-even: If you consistently spend > $100/month on SageMaker compute, a 1-year no-upfront plan likely saves you money. The commitment is dollar-based (e.g., "$0.50/hour"), not instance-based — so you can shift between instance types.
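Here's one way to sketch the plan's economics (my simplification of the mechanics: committed dollars are owed every hour regardless of usage, usage is billed at the discounted rate against the commitment, and anything beyond its coverage falls back to On-Demand):

```python
def savings_plan_net(on_demand_monthly: float, commit_per_hour: float,
                     discount: float = 0.20) -> float:
    """Net monthly savings from a SageMaker Savings Plan (simplified).

    The commitment is paid whether used or not; it covers up to
    commitment / (1 - discount) worth of On-Demand usage, and any
    usage beyond that is billed at On-Demand rates.
    """
    commit_monthly = commit_per_hour * 730
    covered_on_demand = min(on_demand_monthly, commit_monthly / (1 - discount))
    actual_bill = commit_monthly + (on_demand_monthly - covered_on_demand)
    return round(on_demand_monthly - actual_bill, 2)

# $600/month of On-Demand usage with a $0.50/hour, 1-year no-upfront plan:
print(savings_plan_net(600, 0.50))   # → 91.25
# Under-commit risk: only $300/month of usage still owes the full $365 commitment
print(savings_plan_net(300, 0.50))   # → -65.0
```

The second call is why you size the commitment to your baseline, not your peak.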
7. Data Transfer: The Other Hidden Cost
SageMaker data transfer charges are identical to EC2 data transfer:
| Path | Cost |
|---|---|
| S3 → SageMaker (same region) | Free |
| SageMaker → S3 (same region) | Free |
| Internet → SageMaker | Free |
| SageMaker → Internet | $0.09/GB (first 10 TB) |
| Cross-region S3 → SageMaker | $0.01–0.02/GB |
| Cross-AZ (multi-instance training) | $0.01/GB each way |
The trap: Distributed training across multiple instances in different AZs generates inter-AZ data transfer charges for gradient synchronization. A training job with 8 ml.p3.16xlarge instances exchanging 100 GB of gradients per hour across AZs can add $2/hour in data transfer alone.
Mitigation: Use SageMaker's managed instance placement (it tries to co-locate instances in the same AZ). For distributed training, consider EFA (Elastic Fabric Adapter) enabled instances (ml.p4d.24xlarge, ml.trn1.32xlarge) — inter-node traffic over EFA is not charged.
8. The Full Bill Breakdown — A Realistic Example
Let's walk through a realistic monthly SageMaker bill for a mid-size ML team (3 data scientists, 2 models in production):
Development
| Item | Details | Monthly Cost |
|---|---|---|
| 3 Notebook instances | ml.m5.large, ~160 hrs/month each | $55.20 |
| Notebook storage | 3 × 50 GB | $17.40 |
| Subtotal | $72.60 |
Training
| Item | Details | Monthly Cost |
|---|---|---|
| Training jobs (CPU) | 20 jobs × ml.m5.xlarge × 2 hrs | $9.20 |
| Training jobs (GPU) | 5 jobs × ml.g5.xlarge × 4 hrs | $28.16 |
| HPO tuning | 1 job × 50 trials × ml.g5.xlarge × 1 hr | $70.40 |
| Training storage | 20 GB per job, 25 jobs | $0.14 |
| Subtotal | $107.90 |
Inference (The Biggest Line Item)
| Item | Details | Monthly Cost |
|---|---|---|
| Prod endpoint (Model A) | 2× ml.m5.xlarge × 730 hrs | $335.80 |
| Prod endpoint (Model B) | 2× ml.g4dn.xlarge × 730 hrs | $1,074.56 |
| Staging endpoint | 1× ml.m5.large × 730 hrs | $83.95 |
| Subtotal | $1,494.31 |
Other
| Item | Details | Monthly Cost |
|---|---|---|
| Processing (daily ETL) | 2× ml.m5.xlarge × 1 hr × 30 days | $13.80 |
| Model artifacts in S3 | 200 GB across all experiments | $4.60 |
| Data transfer (internet) | 50 GB model serving responses | $4.50 |
| CloudWatch (metrics) | Custom endpoint metrics | $3.00 |
| Subtotal | $25.90 |
Total
Development: $72.60 (4.3%)
Training: $107.90 (6.3%)
Inference: $1,494.31 (87.9%) ← 88% of the bill
Other: $25.90 (1.5%)
──────────────────────────────────
TOTAL: $1,700.71/month
The punchline: Nearly 88% of this team's SageMaker spend is inference endpoints running 24/7. The training — which is what the team actually thinks about and optimizes — is only 6% of the bill.
9. The 7 Most Common SageMaker Billing Mistakes
1. Leaving Notebook Instances Running
Cost: $37–$168/month per forgotten notebook
Fix: Use SageMaker Studio with auto-shutdown, or set a lifecycle config:
#!/bin/bash
# Auto-stop this notebook after IDLE_TIME seconds with no kernel activity.
# A sketch: run it from cron via a lifecycle-configuration on-start script.
# (Checking `jupyter notebook list` alone never fires: the Jupyter server
# itself is always running on a notebook instance, kernels or not.)
IDLE_TIME=3600
LAST=$(curl -sk https://localhost:8443/api/kernels | jq -r 'map(.last_activity) | max // empty')
NOW=$(date -u +%s)
THEN=$(date -u -d "${LAST:-1970-01-01}" +%s)
if [ $((NOW - THEN)) -gt "$IDLE_TIME" ]; then
  aws sagemaker stop-notebook-instance \
    --notebook-instance-name "$(jq -r .ResourceName /opt/ml/metadata/resource-metadata.json)"
fi
2. Not Deleting Endpoints After Experimentation
Cost: $84–$2,792/month per forgotten endpoint
Fix: Tag endpoints with environment=dev and run a nightly cleanup:
# Delete endpoints older than 3 days.
# Note: list-endpoints doesn't return tags, so check each endpoint's
# environment=dev tag with list-tags before running this in a real account.
CUTOFF=$(date -u -d '3 days ago' +%Y-%m-%dT%H:%M:%S)
aws sagemaker list-endpoints --status-equals InService \
  --query "Endpoints[?CreationTime<\`${CUTOFF}\`].EndpointName" \
  --output text | xargs -I{} aws sagemaker delete-endpoint --endpoint-name {}
3. Over-Provisioning Instance Types
Cost: 2–10× the necessary spend
Fix: Start with the smallest instance that works. Use CloudWatch to check actual utilization:
# Check CPU utilization of an endpoint
aws cloudwatch get-metric-statistics \
--namespace /aws/sagemaker/Endpoints \
--metric-name CPUUtilization \
--dimensions Name=EndpointName,Value=my-endpoint \
Name=VariantName,Value=AllTraffic \
--start-time 2026-02-14T00:00:00 \
--end-time 2026-02-21T00:00:00 \
--period 3600 \
--statistics Average \
--query 'sort_by(Datapoints, &Timestamp)[].{Time:Timestamp,CPU:Average}'
If average CPU is < 20%, you're likely over-provisioned. An ml.m5.xlarge at 15% utilization could be an ml.m5.large (50% cheaper).
4. Running Staging Endpoints 24/7
Cost: $84–$538/month for endpoints used ~8 hrs/day
Fix: Schedule endpoint creation/deletion:
# Scale the staging endpoint down at 7 PM UTC and back up at 8 AM UTC.
# Caveat: scaling a classic real-time variant to 0 instances isn't supported
# in all configurations (scale-to-zero requires inference-component-based
# endpoints); if MinCapacity=0 is rejected, scale to the smallest instance
# count instead, or delete and recreate the endpoint on a schedule.
import boto3

application_autoscaling = boto3.client('application-autoscaling')

# Register the endpoint variant as a scalable target
application_autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId='endpoint/staging-model/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=0,
    MaxCapacity=2,
)

# Scale down at 7 PM UTC...
application_autoscaling.put_scheduled_action(
    ServiceNamespace='sagemaker',
    ScheduledActionName='scale-down-evening',
    ResourceId='endpoint/staging-model/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    Schedule='cron(0 19 * * ? *)',
    ScalableTargetAction={'MinCapacity': 0, 'MaxCapacity': 0},
)

# ...and back up at 8 AM UTC
application_autoscaling.put_scheduled_action(
    ServiceNamespace='sagemaker',
    ScheduledActionName='scale-up-morning',
    ResourceId='endpoint/staging-model/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    Schedule='cron(0 8 * * ? *)',
    ScalableTargetAction={'MinCapacity': 1, 'MaxCapacity': 2},
)
5. Not Using Spot for Training
Cost: 2–10× overpayment on training jobs
Fix: Add use_spot_instances=True + checkpoint_s3_uri to every training estimator.
6. Ignoring Multi-Model Endpoints for Low-Traffic Models
Cost: $84+/month per model × N models
Fix: Consolidate into a single MME. Works well for models with < 100 requests/hour.
7. No SageMaker Savings Plan
Cost: 20–64% overpayment on steady-state compute
Fix: Analyze 30 days of SageMaker usage → commit to a 1-year no-upfront Savings Plan for your baseline spend.
10. Quick-Reference Billing Cheat Sheet
| Component | Billing Model | Minimum Charge | Always On? |
|---|---|---|---|
| Notebook Instance | Per-second (InService) | 1 second | Yes, until stopped |
| Studio Notebook | Per-second (running kernel) | 1 second | No (auto-shutdown capable) |
| Training Job | Per-second (job duration) | 1 second | No (job-scoped) |
| Processing Job | Per-second (job duration) | 1 second | No (job-scoped) |
| Real-Time Endpoint | Per-second (InService) | 1 second | Yes, 24/7 |
| Serverless Endpoint | Per-GB-second of compute | None (pay per use) | No |
| Async Endpoint | Per-second (InService) | 1 second | Yes (but can scale to 0) |
| Batch Transform | Per-second (job duration) | 1 second | No (job-scoped) |
| Feature Store | Per-read/write + storage | None | Depends on store type |
| EBS Storage | Per-GB-month ($0.116/GB-mo) | None | Yes, even when stopped |
| S3 Artifacts | Per-GB-month ($0.023/GB-mo) | None | Yes |
| Data Transfer Out | Per-GB ($0.09/GB) | None | Only on egress |
11. The One Metric That Matters Most
If you track only one metric for SageMaker cost efficiency, track this:
$$\text{Cost per 1K Invocations} = \frac{\text{Monthly Endpoint Cost}}{\text{Monthly Invocations} \div 1000}$$
Example:
- Endpoint: 2× ml.m5.xlarge = $335.80/month
- Invocations: 500,000/month

$$\frac{\$335.80}{500} = \$0.67 \text{ per 1K invocations}$$
If that number is above $1.00 — you're likely over-provisioned or should consider serverless inference.
If it's above $5.00 — you either have very low traffic (delete the endpoint at night) or you're burning money on GPU instances that aren't needed.
If it's above $20.00 — the endpoint is effectively idle. Delete it.
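That rule of thumb translates directly into code (the thresholds are the ones above, a judgment call rather than an AWS benchmark):

```python
def cost_per_1k(monthly_cost: float, monthly_invocations: int) -> float:
    """Dollars per 1,000 endpoint invocations."""
    return round(monthly_cost / (monthly_invocations / 1000), 2)

def verdict(per_1k: float) -> str:
    """Map cost-per-1K to the rule-of-thumb buckets above."""
    if per_1k > 20:
        return "effectively idle: delete it"
    if per_1k > 5:
        return "very low traffic or an unneeded GPU"
    if per_1k > 1:
        return "over-provisioned or a serverless candidate"
    return "healthy"

c = cost_per_1k(335.80, 500_000)
print(c, verdict(c))   # → 0.67 healthy
```

Feed it from your Cost Explorer number and the endpoint's CloudWatch Invocations sum, and you have a one-line monthly health check per endpoint.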
Wrapping Up
SageMaker billing boils down to three rules:
- Endpoints are the #1 cost driver — they run 24/7. Everything else is transient.
- If it's InService, you're paying — notebooks, endpoints, anything with that status.
- The bill you expect (training) is rarely the bill you get (inference) — teams optimize training time but ignore endpoint sprawl.
The engineers who keep SageMaker costs under control aren't the ones who pick the cheapest instance type. They're the ones who have a process for deleting things they're not using.
# The best SageMaker cost optimization is a cron job
# Run weekly: find and report idle SageMaker resources
# Note: `date -d '7 days ago'` is GNU date; on macOS, use `date -u -v-7d` instead
echo "=== Running Notebook Instances (verify each is still needed) ==="
aws sagemaker list-notebook-instances --status-equals InService \
  --query 'NotebookInstances[].NotebookInstanceName' --output table
echo "=== Idle Endpoints (0 invocations, 7d) ==="
for ep in $(aws sagemaker list-endpoints --status-equals InService \
  --query 'Endpoints[].EndpointName' --output text); do
  inv=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/SageMaker --metric-name Invocations \
    --dimensions Name=EndpointName,Value="$ep" Name=VariantName,Value=AllTraffic \
    --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 604800 --statistics Sum \
    --query 'Datapoints[0].Sum' --output text 2>/dev/null)
  [[ "$inv" == "None" || "$inv" == "0.0" ]] && echo "  ⚠️ $ep"
done
Building something to automate this? We built CloudWise to automatically detect idle SageMaker notebooks, endpoints, and oversized instances across all your AWS accounts — including air-gapped environments with no internet access. It's one of 90+ waste detectors that scan your infrastructure so you don't have to run scripts manually.
Found this useful? Drop a 🔖 bookmark — this is the reference I wish I had when I first got a surprise SageMaker bill.