You deployed a SageMaker notebook to prototype a model. A week later, your AWS bill has a $280 line item you can't explain.
Sound familiar?
SageMaker is one of the most powerful ML platforms on AWS — and one of the most confusing to bill for. Unlike EC2 (one instance, one hourly rate), SageMaker has at least 12 independent billing dimensions spread across notebooks, training, endpoints, storage, data processing, and more. Each one ticks on its own meter.
This post breaks down every SageMaker billing component, shows you the real numbers, and highlights the traps that catch even experienced AWS engineers.
The Core Mental Model: SageMaker Is Not One Service
Think of SageMaker as a collection of services that share a console. Each has its own pricing:
┌─────────────────────────────────────────────────┐
│ SageMaker │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │ Notebooks│ │ Training │ │ Endpoints │ │
│ │ (Dev) │ │ (Build) │ │ (Serve) │ │
│ │ $/hr │ │ $/hr │ │ $/hr 24/7 │ │
│ └──────────┘ └──────────┘ └───────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │Processing│ │ Storage │ │ Data Wrangler│ │
│ │ (ETL) │ │ (EBS+S3) │ │ (Prep) │ │
│ │ $/hr │ │ $/GB-mo │ │ $/hr │ │
│ └──────────┘ └──────────┘ └───────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │ Canvas │ │ Feature │ │ Inference │ │
│ │ (No-code)│ │ Store │ │ Recommender │ │
│ │ $/hr │ │ $/GB+req │ │ (load test) │ │
│ └──────────┘ └──────────┘ └───────────────┘ │
└─────────────────────────────────────────────────┘
Each box bills independently. You can have zero training cost but be paying hundreds for an idle endpoint. Let's walk through each.
1. Notebook Instances: The Silent $37/month Drain
How it charges: Per-second billing while the notebook is in InService status. You pay for the instance whether or not you have a kernel running.
| Instance Type | $/Hour | Monthly (24/7) |
|---|---|---|
| ml.t3.medium | $0.05 | $36.50 |
| ml.t3.large | $0.10 | $73.00 |
| ml.m5.large | $0.115 | $83.95 |
| ml.m5.xlarge | $0.23 | $167.90 |
| ml.c5.xlarge | $0.204 | $148.92 |
The trap: Notebooks keep billing even when you close the browser tab. The instance stays InService until you explicitly stop it. There's no auto-stop by default.
# Check for running notebooks right now
aws sagemaker list-notebook-instances \
--status-equals InService \
--query 'NotebookInstances[].{Name:NotebookInstanceName,Type:InstanceType,Created:CreationTime}' \
--output table
What catches teams off guard:
- You spin up an ml.t3.medium to test a concept on Monday
- You forget about it until the following Friday
- It runs for 4 extra weekends = 192 extra hours = $9.60 wasted per forgotten instance
- Multiply by a team of 5 data scientists doing this regularly = real money
Cost-saving tip: Use SageMaker Studio notebooks with auto-shutdown instead of classic notebook instances. Note that classic notebook instances don't publish a usage metric to CloudWatch by default, so a simple idle alarm won't fire; the reliable guard is a lifecycle configuration that stops the notebook itself (see mistake #1 below). Attaching one to an existing notebook:
# Attach an auto-stop lifecycle configuration to an existing notebook.
# "auto-stop-idle" is whatever you named your lifecycle config; the
# notebook must be stopped before it can be updated.
aws sagemaker stop-notebook-instance --notebook-instance-name my-notebook
aws sagemaker update-notebook-instance \
  --notebook-instance-name my-notebook \
  --lifecycle-config-name auto-stop-idle
Notebook Storage (Often Overlooked)
Each notebook instance has an EBS volume (default 5 GB, configurable up to 16 TB). You pay for it even when the notebook is stopped:
| Volume Size | $/Month |
|---|---|
| 5 GB (default) | $0.58 |
| 50 GB | $5.80 |
| 500 GB | $58.00 |
At $0.116/GB-month (gp2 pricing), a 500 GB volume costs $58/month just sitting there — even while the notebook is stopped.
2. Training Jobs: Pay-Per-Second, But Instance Choice Matters Enormously
How it charges: Per-second billing while the training job runs. No charge when it completes. The clock starts at instance launch and stops at job completion or failure.
| Instance Type | $/Hour | Use Case |
|---|---|---|
| ml.m5.large | $0.115 | Tabular data, small models |
| ml.m5.xlarge | $0.23 | Medium models, preprocessing-heavy |
| ml.c5.xlarge | $0.204 | CPU-bound training (gradient boosting) |
| ml.p3.2xlarge | $3.825 | GPU training (deep learning) |
| ml.p3.8xlarge | $14.688 | Multi-GPU training |
| ml.p3.16xlarge | $28.152 | Distributed deep learning |
| ml.p4d.24xlarge | $37.688 | Large model training (8× A100 GPUs) |
| ml.g5.xlarge | $1.408 | Cost-effective GPU (single NVIDIA A10G) |
| ml.trn1.2xlarge | $1.3438 | AWS Trainium — optimized for training |
The trap: GPU instances are eye-wateringly expensive per hour. A single ml.p3.2xlarge training job that takes 24 hours costs $91.80. If your hyperparameter tuning job launches 20 variants in parallel, that's $1,836 in one day.
Training Cost Formula
Training Cost = (instance_count × instance_price_per_second × training_duration_seconds)
+ (storage_gb × $0.116/GB-month × duration_fraction)
+ (data_download_from_s3)
Spot Training: 60-90% Savings (With a Catch)
SageMaker supports managed spot training — using EC2 Spot Instances for training jobs:
estimator = sagemaker.estimator.Estimator(
# ...
use_spot_instances=True,
max_wait=7200, # Max time to wait for spot capacity
max_run=3600, # Max training time
# checkpoint_s3_uri for spot interruption recovery
checkpoint_s3_uri='s3://my-bucket/checkpoints/',
)
Savings: Typically 60–90% off On-Demand pricing.
The catch: Spot instances can be interrupted. Your training job gets a 2-minute warning, then terminates. Without checkpointing, you lose all progress and pay for the time already consumed.
Pro tip: Always set checkpoint_s3_uri when using spot training. This saves model checkpoints to S3 so interrupted jobs can resume instead of restarting from scratch.
Managed Warm Pools (New)
If you run many training jobs in sequence (e.g., hyperparameter tuning), each job normally provisions a new instance from scratch (2–5 minutes startup). Warm pools keep instances running between jobs:
- You pay for instance time during the keep-alive period
- But you skip the ~3 minute cold start per job
- Break-even: if you run enough sequential jobs that the saved startup time exceeds the keep-alive cost
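That break-even can be sketched with a rough model (an illustration, not AWS billing math; it assumes a fixed keep-alive gap between back-to-back jobs and a ~3-minute cold start avoided per job):

```python
def warm_pool_tradeoff(sequential_jobs: int, keep_alive_min: float,
                       cold_start_min: float, price_per_hour: float):
    """Extra cost vs wall-clock time saved for one warm-pool session.

    Rough model: you pay for the keep-alive window between consecutive
    jobs, and every job after the first skips the cold start.
    """
    gaps = sequential_jobs - 1
    extra_cost = round(gaps * keep_alive_min / 60 * price_per_hour, 2)
    minutes_saved = gaps * cold_start_min
    return extra_cost, minutes_saved

# 50 sequential HPO trials on ml.g5.xlarge with a 5-minute keep-alive:
cost, saved = warm_pool_tradeoff(50, 5, 3, 1.408)
print(cost, saved)   # → 5.75 147
```

Here, $5.75 of keep-alive time buys back roughly two and a half hours of cumulative startup wait across the tuning run.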
3. Real-Time Endpoints: The Big One
This is where most SageMaker overspend happens. Endpoints run 24/7 and bill continuously, even with zero traffic.
How it charges: Per-second billing while the endpoint is InService. You pay for the full instance(s) whether they receive 0 or 10,000 requests per second.
Monthly Endpoint Cost = instance_count × hourly_rate × 730 hours
| Instance | $/Hour | Monthly (1 instance) | Monthly (2 instances) |
|---|---|---|---|
| ml.t2.medium | $0.065 | $47.45 | $94.90 |
| ml.m5.large | $0.115 | $83.95 | $167.90 |
| ml.m5.xlarge | $0.23 | $167.90 | $335.80 |
| ml.c5.xlarge | $0.204 | $148.92 | $297.84 |
| ml.g4dn.xlarge | $0.736 | $537.28 | $1,074.56 |
| ml.p3.2xlarge | $3.825 | $2,792.25 | $5,584.50 |
| ml.inf1.xlarge | $0.297 | $216.81 | $433.62 |
Read that again: A single ml.p3.2xlarge endpoint costs $2,792/month. Two instances for high availability: $5,585/month. Many teams deploy this, see it works, and forget to right-size.
The Idle Endpoint Problem
A SageMaker endpoint with zero invocations still costs the full hourly rate. Common scenarios:
- Model was deployed for a demo → demo ended → endpoint left running
- A/B testing: old variant endpoint wasn't deleted after the new model won
- Dev/staging endpoints running 24/7 when they're only used during business hours
# Find endpoints with zero invocations in the last 7 days
for endpoint in $(aws sagemaker list-endpoints --status-equals InService \
--query 'Endpoints[].EndpointName' --output text); do
invocations=$(aws cloudwatch get-metric-statistics \
--namespace AWS/SageMaker \
--metric-name Invocations \
--dimensions Name=EndpointName,Value="$endpoint" Name=VariantName,Value=AllTraffic \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 604800 \
--statistics Sum \
--query 'Datapoints[0].Sum' --output text 2>/dev/null)
if [ "$invocations" = "None" ] || [ "$invocations" = "0.0" ]; then
echo "⚠️ IDLE: $endpoint (0 invocations in 7 days)"
fi
done
Multi-Model Endpoints: Pack More Models, Pay Less
If you have many low-traffic models, a Multi-Model Endpoint (MME) lets you load models on-demand into a single endpoint:
Standard: 10 models × ml.m5.large × 730 hrs = $839.50/month
MME: 1 endpoint × ml.m5.xlarge × 730 hrs = $167.90/month
Savings: $671.60/month (80%)
The tradeoff: cold-start latency when loading a model that isn't cached. Fine for batch-like traffic; bad for latency-sensitive real-time inference.
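The comparison above, generalized to any fleet size (a sketch; it assumes all models fit on the MME instance and its capacity can absorb the combined traffic):

```python
def mme_savings(n_models: int, per_model_rate: float,
                mme_rate: float, hours: float = 730) -> float:
    """Monthly savings from consolidating N single-model endpoints
    into one Multi-Model Endpoint (MME)."""
    standard = n_models * per_model_rate * hours
    mme = mme_rate * hours
    return round(standard - mme, 2)

# 10 × ml.m5.large endpoints vs one ml.m5.xlarge MME, as above:
print(mme_savings(10, 0.115, 0.23))   # → 671.6
```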
Serverless Inference: Pay Only for What You Use
For sporadic traffic (< ~1000 requests/hour), Serverless Inference eliminates the always-on cost:
Pricing:
- Memory: $0.0000016/GB-second
- Requests: included
Example: 1000 requests/day, 500ms avg, 4GB memory
= 1000 × 0.5s × 4GB × $0.0000016/GB-s × 30 days
= $0.096/month ← vs $83.95/month for ml.m5.large
The catch: Cold starts (anywhere from a few seconds to a minute or more after an idle period, depending on container size) and a 4 MB max payload (vs 6 MB for real-time endpoints).
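A break-even check against a dedicated instance (using the GB-second rate above; real bills can also include data-processing charges, omitted here):

```python
def serverless_monthly(requests_per_day: int, avg_seconds: float,
                       memory_gb: float, gb_second_rate: float = 0.0000016,
                       days: int = 30) -> float:
    """Estimated monthly compute cost of a serverless endpoint in USD."""
    gb_seconds = requests_per_day * avg_seconds * memory_gb * days
    return round(gb_seconds * gb_second_rate, 3)

# The example above: 1,000 requests/day at 500 ms on 4 GB
print(serverless_monthly(1000, 0.5, 4))   # → 0.096
```

At this profile, serverless stays cheaper than an $83.95/month ml.m5.large until roughly 875,000 requests per day ($83.95 divided by the $0.000096/month that each daily request costs).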
4. Storage: Three Hidden Meters
SageMaker storage costs come from three independent sources:
a) Notebook EBS Volumes
Already covered above: $0.116/GB-month, billed even when notebook is stopped.
b) Training Job Storage
Each training job gets a temporary EBS volume for input data and model artifacts:
- Default: 30 GB per instance
- Configurable up to 16 TB
- Billed only during training (per-second)
- SSD (gp2): $0.116/GB-month, prorated to seconds
c) Model Artifacts in S3
Trained models are stored in S3 as .tar.gz archives:
- S3 Standard: $0.023/GB-month
- A 5 GB model × 20 training runs = 100 GB = $2.30/month
- But: large language model checkpoints can be 50–200 GB each
- 10 checkpoints × 100 GB = 1 TB = $23/month
Pro tip: Set an S3 Lifecycle policy to move old model artifacts to S3 Glacier after 30 days:
{
"Rules": [{
"ID": "Archive old SageMaker models",
"Filter": { "Prefix": "sagemaker/" },
"Status": "Enabled",
"Transitions": [{
"Days": 30,
"StorageClass": "GLACIER"
}]
}]
}
5. Processing Jobs (ETL/Feature Engineering)
SageMaker Processing runs containerized data processing workloads:
How it charges: Same as training — per-second billing for the instances used.
| Instance | $/Hour | 1-hour ETL job | 8-hour daily ETL (monthly) |
|---|---|---|---|
| ml.m5.xlarge | $0.23 | $0.23 | $55.20 |
| ml.m5.4xlarge | $0.922 | $0.92 | $221.28 |
| ml.r5.4xlarge | $1.21 | $1.21 | $290.40 |
The trap: Processing jobs often run as part of a pipeline. If your pipeline runs daily with 4 instances for 3 hours:
4 instances × ml.m5.xlarge × $0.23/hr × 3 hrs × 30 days = $82.80/month
That's not huge — but if someone accidentally sets the pipeline to run hourly instead of daily:
4 × $0.23 × 3 × 24 × 30 = $1,987.20/month ← oops
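Since the only difference between those two bills is run frequency, a one-liner makes the sensitivity easy to check (rates from the table above):

```python
def pipeline_monthly(instances: int, price_per_hour: float,
                     hours_per_run: float, runs_per_day: float,
                     days: int = 30) -> float:
    """Monthly cost of a scheduled SageMaker Processing pipeline."""
    return round(instances * price_per_hour * hours_per_run
                 * runs_per_day * days, 2)

print(pipeline_monthly(4, 0.23, 3, runs_per_day=1))    # → 82.8   (daily)
print(pipeline_monthly(4, 0.23, 3, runs_per_day=24))   # → 1987.2 (hourly)
```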
6. SageMaker Savings Plans
AWS offers SageMaker Savings Plans — commit to a $/hour spend for 1 or 3 years in exchange for a discount:
| Commitment | Discount vs On-Demand |
|---|---|
| 1-year, no upfront | ~20% |
| 1-year, partial upfront | ~27% |
| 1-year, all upfront | ~30% |
| 3-year, all upfront | ~64% |
What's covered: Notebook instances, Studio notebooks, training, processing, batch transform, real-time inference, and serverless inference.
What's NOT covered: Data transfer, S3 storage, EBS storage, CloudWatch, and any non-SageMaker charges.
Break-even: If you consistently spend > $100/month on SageMaker compute, a 1-year no-upfront plan likely saves you money. The commitment is dollar-based (e.g., "$0.50/hour"), not instance-based — so you can shift between instance types.
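Here's one way to sketch the plan's economics (my simplification of the mechanics: committed dollars are owed every hour regardless of usage, usage is billed at the discounted rate against the commitment, and anything beyond its coverage falls back to On-Demand):

```python
def savings_plan_net(on_demand_monthly: float, commit_per_hour: float,
                     discount: float = 0.20) -> float:
    """Net monthly savings from a SageMaker Savings Plan (simplified).

    The commitment is paid whether used or not; it covers up to
    commitment / (1 - discount) worth of On-Demand usage, and any
    usage beyond that is billed at On-Demand rates.
    """
    commit_monthly = commit_per_hour * 730
    covered_on_demand = min(on_demand_monthly, commit_monthly / (1 - discount))
    actual_bill = commit_monthly + (on_demand_monthly - covered_on_demand)
    return round(on_demand_monthly - actual_bill, 2)

# $600/month of On-Demand usage with a $0.50/hour, 1-year no-upfront plan:
print(savings_plan_net(600, 0.50))   # → 91.25
# Under-commit risk: only $300/month of usage still owes the full $365 commitment
print(savings_plan_net(300, 0.50))   # → -65.0
```

The second call is why you size the commitment to your baseline, not your peak.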
7. Data Transfer: The Other Hidden Cost
SageMaker data transfer charges are identical to EC2 data transfer:
| Path | Cost |
|---|---|
| S3 → SageMaker (same region) | Free |
| SageMaker → S3 (same region) | Free |
| Internet → SageMaker | Free |
| SageMaker → Internet | $0.09/GB (first 10 TB) |
| Cross-region S3 → SageMaker | $0.01–0.02/GB |
| Cross-AZ (multi-instance training) | $0.01/GB each way |
The trap: Distributed training across multiple instances in different AZs generates inter-AZ data transfer charges for gradient synchronization. A training job with 8 ml.p3.16xlarge instances exchanging 100 GB of gradients per hour across AZs can add $2/hour in data transfer alone.
Mitigation: Use SageMaker's managed instance placement (it tries to co-locate instances in the same AZ). For distributed training, consider EFA (Elastic Fabric Adapter) enabled instances (ml.p4d.24xlarge, ml.trn1.32xlarge) — inter-node traffic over EFA is not charged.
8. The Full Bill Breakdown — A Realistic Example
Let's walk through a realistic monthly SageMaker bill for a mid-size ML team (3 data scientists, 2 models in production):
Development
| Item | Details | Monthly Cost |
|---|---|---|
| 3 Notebook instances | ml.m5.large, ~160 hrs/month each | $55.20 |
| Notebook storage | 3 × 50 GB | $17.40 |
| Subtotal | $72.60 |
Training
| Item | Details | Monthly Cost |
|---|---|---|
| Training jobs (CPU) | 20 jobs × ml.m5.xlarge × 2 hrs | $9.20 |
| Training jobs (GPU) | 5 jobs × ml.g5.xlarge × 4 hrs | $28.16 |
| HPO tuning | 1 job × 50 trials × ml.g5.xlarge × 1 hr | $70.40 |
| Training storage | 20 GB per job, 25 jobs | $0.14 |
| Subtotal | $107.90 |
Inference (The Biggest Line Item)
| Item | Details | Monthly Cost |
|---|---|---|
| Prod endpoint (Model A) | 2× ml.m5.xlarge × 730 hrs | $335.80 |
| Prod endpoint (Model B) | 2× ml.g4dn.xlarge × 730 hrs | $1,074.56 |
| Staging endpoint | 1× ml.m5.large × 730 hrs | $83.95 |
| Subtotal | $1,494.31 |
Other
| Item | Details | Monthly Cost |
|---|---|---|
| Processing (daily ETL) | 2× ml.m5.xlarge × 1 hr × 30 days | $13.80 |
| Model artifacts in S3 | 200 GB across all experiments | $4.60 |
| Data transfer (internet) | 50 GB model serving responses | $4.50 |
| CloudWatch (metrics) | Custom endpoint metrics | $3.00 |
| Subtotal | $25.90 |
Total
Development: $72.60 (4.3%)
Training: $107.90 (6.3%)
Inference: $1,494.31 (87.9%) ← 88% of the bill
Other: $25.90 (1.5%)
──────────────────────────────────
TOTAL: $1,700.71/month
The punchline: Nearly 88% of this team's SageMaker spend is inference endpoints running 24/7. The training — which is what the team actually thinks about and optimizes — is only 6% of the bill.
9. The 7 Most Common SageMaker Billing Mistakes
1. Leaving Notebook Instances Running
Cost: $37–$168/month per forgotten notebook
Fix: Use SageMaker Studio with auto-shutdown, or set a lifecycle config:
#!/bin/bash
# Auto-stop this notebook after IDLE_TIME seconds with no kernel activity.
# A sketch: run it from cron via a lifecycle-configuration on-start script.
# (Checking `jupyter notebook list` alone never fires: the Jupyter server
# itself is always running on a notebook instance, kernels or not.)
IDLE_TIME=3600
LAST=$(curl -sk https://localhost:8443/api/kernels | jq -r 'map(.last_activity) | max // empty')
NOW=$(date -u +%s)
THEN=$(date -u -d "${LAST:-1970-01-01}" +%s)
if [ $((NOW - THEN)) -gt "$IDLE_TIME" ]; then
  aws sagemaker stop-notebook-instance \
    --notebook-instance-name "$(jq -r .ResourceName /opt/ml/metadata/resource-metadata.json)"
fi
2. Not Deleting Endpoints After Experimentation
Cost: $84–$2,792/month per forgotten endpoint
Fix: Tag endpoints with environment=dev and run a nightly cleanup:
# Delete endpoints older than 3 days.
# Note: list-endpoints doesn't return tags, so check each endpoint's
# environment=dev tag with list-tags before running this in a real account.
CUTOFF=$(date -u -d '3 days ago' +%Y-%m-%dT%H:%M:%S)
aws sagemaker list-endpoints --status-equals InService \
  --query "Endpoints[?CreationTime<\`${CUTOFF}\`].EndpointName" \
  --output text | xargs -I{} aws sagemaker delete-endpoint --endpoint-name {}
3. Over-Provisioning Instance Types
Cost: 2–10× the necessary spend
Fix: Start with the smallest instance that works. Use CloudWatch to check actual utilization:
# Check CPU utilization of an endpoint
aws cloudwatch get-metric-statistics \
--namespace /aws/sagemaker/Endpoints \
--metric-name CPUUtilization \
--dimensions Name=EndpointName,Value=my-endpoint \
Name=VariantName,Value=AllTraffic \
--start-time 2026-02-14T00:00:00 \
--end-time 2026-02-21T00:00:00 \
--period 3600 \
--statistics Average \
--query 'sort_by(Datapoints, &Timestamp)[].{Time:Timestamp,CPU:Average}'
If average CPU is < 20%, you're likely over-provisioned. An ml.m5.xlarge at 15% utilization could be an ml.m5.large (50% cheaper).
4. Running Staging Endpoints 24/7
Cost: $84–$538/month for endpoints used ~8 hrs/day
Fix: Schedule endpoint creation/deletion:
# Scale the staging endpoint down at 7 PM UTC and back up at 8 AM UTC.
# Caveat: scaling a classic real-time variant to 0 instances isn't supported
# in all configurations (scale-to-zero requires inference-component-based
# endpoints); if MinCapacity=0 is rejected, scale to the smallest instance
# count instead, or delete and recreate the endpoint on a schedule.
import boto3

application_autoscaling = boto3.client('application-autoscaling')

# Register the endpoint variant as a scalable target
application_autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId='endpoint/staging-model/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=0,
    MaxCapacity=2,
)

# Scale down at 7 PM UTC...
application_autoscaling.put_scheduled_action(
    ServiceNamespace='sagemaker',
    ScheduledActionName='scale-down-evening',
    ResourceId='endpoint/staging-model/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    Schedule='cron(0 19 * * ? *)',
    ScalableTargetAction={'MinCapacity': 0, 'MaxCapacity': 0},
)

# ...and back up at 8 AM UTC
application_autoscaling.put_scheduled_action(
    ServiceNamespace='sagemaker',
    ScheduledActionName='scale-up-morning',
    ResourceId='endpoint/staging-model/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    Schedule='cron(0 8 * * ? *)',
    ScalableTargetAction={'MinCapacity': 1, 'MaxCapacity': 2},
)
5. Not Using Spot for Training
Cost: 2–10× overpayment on training jobs
Fix: Add use_spot_instances=True + checkpoint_s3_uri to every training estimator.
6. Ignoring Multi-Model Endpoints for Low-Traffic Models
Cost: $84+/month per model × N models
Fix: Consolidate into a single MME. Works well for models with < 100 requests/hour.
7. No SageMaker Savings Plan
Cost: 20–64% overpayment on steady-state compute
Fix: Analyze 30 days of SageMaker usage → commit to a 1-year no-upfront Savings Plan for your baseline spend.
10. Quick-Reference Billing Cheat Sheet
| Component | Billing Model | Minimum Charge | Always On? |
|---|---|---|---|
| Notebook Instance | Per-second (InService) | 1 second | Yes, until stopped |
| Studio Notebook | Per-second (running kernel) | 1 second | No (auto-shutdown capable) |
| Training Job | Per-second (job duration) | 1 second | No (job-scoped) |
| Processing Job | Per-second (job duration) | 1 second | No (job-scoped) |
| Real-Time Endpoint | Per-second (InService) | 1 second | Yes, 24/7 |
| Serverless Endpoint | Per-GB-second of compute | None (pay per use) | No |
| Async Endpoint | Per-second (InService) | 1 second | Yes (but can scale to 0) |
| Batch Transform | Per-second (job duration) | 1 second | No (job-scoped) |
| Feature Store | Per-read/write + storage | None | Depends on store type |
| EBS Storage | Per-GB-month ($0.116/GB-mo) | None | Yes, even when stopped |
| S3 Artifacts | Per-GB-month ($0.023/GB-mo) | None | Yes |
| Data Transfer Out | Per-GB ($0.09/GB) | None | Only on egress |
11. The One Metric That Matters Most
If you track only one metric for SageMaker cost efficiency, track this:
$$\text{Cost per 1K Invocations} = \frac{\text{Monthly Endpoint Cost}}{\text{Monthly Invocations} \div 1000}$$
Example:
- Endpoint: 2× ml.m5.xlarge = $335.80/month
- Invocations: 500,000/month

$$\frac{\$335.80}{500} = \$0.67 \text{ per 1K invocations}$$
If that number is above $1.00 — you're likely over-provisioned or should consider serverless inference.
If it's above $5.00 — you either have very low traffic (delete the endpoint at night) or you're burning money on GPU instances that aren't needed.
If it's above $20.00 — the endpoint is effectively idle. Delete it.
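That rule of thumb translates directly into code (the thresholds are the ones above, a judgment call rather than an AWS benchmark):

```python
def cost_per_1k(monthly_cost: float, monthly_invocations: int) -> float:
    """Dollars per 1,000 endpoint invocations."""
    return round(monthly_cost / (monthly_invocations / 1000), 2)

def verdict(per_1k: float) -> str:
    """Map cost-per-1K to the rule-of-thumb buckets above."""
    if per_1k > 20:
        return "effectively idle: delete it"
    if per_1k > 5:
        return "very low traffic or an unneeded GPU"
    if per_1k > 1:
        return "over-provisioned or a serverless candidate"
    return "healthy"

c = cost_per_1k(335.80, 500_000)
print(c, verdict(c))   # → 0.67 healthy
```

Feed it from your Cost Explorer number and the endpoint's CloudWatch Invocations sum, and you have a one-line monthly health check per endpoint.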
Wrapping Up
SageMaker billing boils down to three rules:
- Endpoints are the #1 cost driver — they run 24/7. Everything else is transient.
- If it's InService, you're paying — notebooks, endpoints, anything with that status.
- The bill you expect (training) is rarely the bill you get (inference) — teams optimize training time but ignore endpoint sprawl.
The engineers who keep SageMaker costs under control aren't the ones who pick the cheapest instance type. They're the ones who have a process for deleting things they're not using.
# The best SageMaker cost optimization is a cron job
# Run weekly: find and report idle SageMaker resources
# Note: `date -d '7 days ago'` is GNU date; on macOS, use `date -u -v-7d` instead
echo "=== Running Notebook Instances (verify each is still needed) ==="
aws sagemaker list-notebook-instances --status-equals InService \
  --query 'NotebookInstances[].NotebookInstanceName' --output table
echo "=== Idle Endpoints (0 invocations, 7d) ==="
for ep in $(aws sagemaker list-endpoints --status-equals InService \
  --query 'Endpoints[].EndpointName' --output text); do
  inv=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/SageMaker --metric-name Invocations \
    --dimensions Name=EndpointName,Value="$ep" Name=VariantName,Value=AllTraffic \
    --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 604800 --statistics Sum \
    --query 'Datapoints[0].Sum' --output text 2>/dev/null)
  [[ "$inv" == "None" || "$inv" == "0.0" ]] && echo "  ⚠️ $ep"
done
Building something to automate this? We built CloudWise to automatically detect idle SageMaker notebooks, endpoints, and oversized instances across all your AWS accounts — including air-gapped environments with no internet access. It's one of 90+ waste detectors that scan your infrastructure so you don't have to run scripts manually.
Found this useful? Drop a 🔖 bookmark — this is the reference I wish I had when I first got a surprise SageMaker bill.