EC2 Spot vs On-Demand: the true cost difference in 2026

#aws #finops #cloud #devops

Quick Answer (TL;DR)

EC2 Spot lists at up to 90% off On-Demand, but the effective savings after accounting for interruptions, engineering overhead, and workload retries land closer to 40 to 60% for most teams in 2026. Spot wins for stateless, retryable, or checkpointable workloads. It loses money on single-instance stateful services with strict SLAs. The honest formula: True savings = Spot discount × Utilization ÷ (1 + Interruption overhead).
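As a rough illustration, the formula can be written as a small Python function. The article does not define its terms precisely, so this sketch assumes that utilization is the fraction of paid Spot hours that do useful work, and that interruption overhead is the extra cost of retries and engineering time expressed as a fraction of Spot spend. The function name and sample inputs are made up.

```python
def true_savings(spot_discount: float, utilization: float,
                 interruption_overhead: float) -> float:
    """Effective savings as a fraction of On-Demand cost (TL;DR formula).

    All inputs are fractions, e.g. 0.70 for a 70% sticker discount.
    The meaning given to utilization and overhead is an assumption.
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if interruption_overhead < 0.0:
        raise ValueError("interruption_overhead must be non-negative")
    return spot_discount * utilization / (1.0 + interruption_overhead)


# Illustrative inputs: 70% sticker discount, 90% useful hours, 20% overhead.
print(f"{true_savings(0.70, 0.90, 0.20):.1%}")  # 52.5%
```

With those inputs a 70% sticker discount shrinks to about 52.5%, inside the 40 to 60% band quoted above.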

Why the sticker discount is misleading

The Spot price is a market price. AWS sets it per instance type and Availability Zone based on spare capacity, and it adjusts gradually with longer-term supply and demand, so the discount you see today is not guaranteed next quarter. The 90% headline is the maximum discount for a rarely-used instance family in an off-peak region. The workhorses (m6i, c7i, r7g in us-east-1) usually sit at 55 to 75% off.

Then there is the hidden cost of interruption. AWS gives a 2-minute warning before reclaiming a Spot instance. Handling that gracefully requires either a stateless workload, a checkpointed job, or careful autoscaler wiring. Teams that do not build for interruption end up with retries, half-finished batches, and engineering time that erases the savings.

Fix #1: Diversify across instance types and AZs

This is the single most effective way to reduce your Spot interruption rate. Instead of asking for m6i.large specifically, ask for "any of m6i.large, m6a.large, m7i.large, m7a.large in any AZ." Each instance type in each AZ is its own capacity pool, so a wider request lets AWS fill it from whichever pools have spare capacity.

With Karpenter or Auto Scaling Groups:

Set the NodePool or ASG's requirements to allow 5 to 15 instance types across families.
Include both x86 and ARM (Graviton) options when your workload runs on both.
On ASGs, enable the capacity-optimized-prioritized allocation strategy, which launches from the deepest capacity pools while honoring your instance-type priority order on a best-effort basis.

Result: interruption rate drops from ~5% per instance-hour to under 1% on most workloads.
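For the ASG route, the settings above map onto the `MixedInstancesPolicy` structure that the Auto Scaling API accepts. The sketch below only builds that structure as a plain dict. The helper name, the launch template name, and the fifth instance type are hypothetical. It stays x86-only because mixing in Graviton needs an architecture-specific AMI, which this sketch leaves out.

```python
from typing import Any, Dict, List


def build_mixed_instances_policy(
    launch_template_name: str,
    instance_types: List[str],
    spot_allocation_strategy: str = "capacity-optimized-prioritized",
) -> Dict[str, Any]:
    """Build a MixedInstancesPolicy dict that spreads an ASG over several types."""
    # Drop duplicates but keep order: with a prioritized strategy, earlier
    # overrides are treated as higher priority.
    unique_types = list(dict.fromkeys(instance_types))
    if not 5 <= len(unique_types) <= 15:
        raise ValueError("expected 5 to 15 distinct instance types")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            "Overrides": [{"InstanceType": t} for t in unique_types],
        },
        "InstancesDistribution": {
            "SpotAllocationStrategy": spot_allocation_strategy,
        },
    }


# Hypothetical pool; in practice pick types with similar vCPU and memory.
policy = build_mixed_instances_policy(
    "my-app-template",  # hypothetical launch template name
    ["m6i.large", "m6a.large", "m7i.large", "m7a.large", "m5.large"],
)
print(len(policy["LaunchTemplate"]["Overrides"]))  # 5
```

The resulting dict is meant to be passed as the `MixedInstancesPolicy` argument of boto3's `create_auto_scaling_group` or `update_auto_scaling_group`, together with subnets in several AZs. Karpenter users would express the same idea through NodePool requirements instead.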

Fix #2: Use Spot for the right workload shape

Not every workload should be on Spot. The rule I use:

Great fits: batch processing, data pipelines, ML training with checkpoints, stateless API tier behind a load balancer, CI/CD runners, dev and staging.
Bad fits: single-instance stateful databases, primary Redis, workloads with startup times over 5 minutes, real-time trading paths where the 2-minute drain window is unacceptable.

For hybrid workloads, use capacity-based mixing: run 60 to 80% of an ASG on Spot with an On-Demand floor of 20 to 40%. This is the pattern that survives when Spot capacity dries up during regional spikes.
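To see what an On-Demand floor means in instance counts, here is a minimal sketch. The rounding rule (round the On-Demand share up) is an assumption chosen so the floor is never undershot; it is not a description of how Auto Scaling itself rounds. The second helper shows the two `InstancesDistribution` keys that carry this setting in an ASG mixed instances policy.

```python
from typing import Dict, Tuple


def split_capacity(desired: int, on_demand_percent: int) -> Tuple[int, int]:
    """Return (on_demand_count, spot_count) for a desired capacity."""
    if desired < 0:
        raise ValueError("desired must be non-negative")
    if not 0 <= on_demand_percent <= 100:
        raise ValueError("on_demand_percent must be between 0 and 100")
    # Ceiling division in integers: round the On-Demand share up.
    on_demand = -(-desired * on_demand_percent // 100)
    return on_demand, desired - on_demand


def instances_distribution(on_demand_percent: int) -> Dict[str, int]:
    """InstancesDistribution fragment expressing the same floor for an ASG."""
    return {
        "OnDemandBaseCapacity": 0,
        "OnDemandPercentageAboveBaseCapacity": on_demand_percent,
    }


print(split_capacity(10, 30))  # (3, 7)
print(split_capacity(7, 20))   # (2, 5)
```

On small groups the rounding matters: a 20% floor on 7 instances becomes 2 On-Demand instances, which is closer to 29%.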

Fix #3: Use commitments for the On-Demand base

For the portion that has to stay On-Demand, layer a Compute Savings Plan on top. A 1-year No-Upfront Compute SP gives 20 to 30% off, a 3-year term goes deeper, and either one stacks cleanly with the Spot savings on the rest. The plan covers the steady On-Demand baseline; the burst capacity that Spot cannot absorb stays on pure On-Demand.

The typical 2026 production mix:

60 to 70% Spot with diversification
20 to 30% On-Demand covered by a 3-year Compute Savings Plan
Remaining 5 to 10% pure On-Demand for burst

Effective blended discount: 45 to 55% off list, without the operational risk of pure-Spot.
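The blended figure is a weighted average of the per-slice discounts. As a worked check, the sketch below takes the midpoint of each share in the mix above (65% Spot, 25% Savings Plan, 10% pure On-Demand) and pairs them with discounts quoted elsewhere in this article: roughly 50% effective for Spot and about 55% for a 3-year Compute SP. The function name is made up and the exact inputs are assumptions.

```python
from typing import List, Tuple


def blended_discount(mix: List[Tuple[float, float]]) -> float:
    """Weighted average discount for a list of (share, discount) slices."""
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return sum(share * discount for share, discount in mix)


mix = [
    (0.65, 0.50),  # Spot, effective discount after interruption overhead
    (0.25, 0.55),  # On-Demand covered by a 3-year Compute Savings Plan
    (0.10, 0.00),  # pure On-Demand for burst, no discount
]
print(f"{blended_discount(mix):.2%}")  # 46.25%
```

Raising the Spot slice to the upper end of its 40 to 60% effective range moves the result to about 53%, so the 45 to 55% band is consistent with these inputs.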

How to prevent Spot losses

Three practices avoid the "Spot cost us more than On-Demand" outcome.

Handle the 2-minute notice. Every Spot workload should have a preStop hook (Kubernetes) or a shutdown handler (systemd unit or containerd hook) that drains connections and saves state within the 2-minute window. Without this, interrupted work has to be redone.
Monitor interruption rate per instance family. AWS publishes an interruption frequency in the Spot Instance Advisor. Anything above 10% for your target family should push you to diversify or switch families.
Track effective savings, not sticker savings. Add the interruption overhead (retry cost, engineering time) to your Spot spend before comparing it with what the same work would cost On-Demand. If effective savings drop below 30%, you are paying more than you think.
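The first practice above, handling the 2-minute notice, can be sketched outside Kubernetes as a small poller against the instance metadata service. The `spot/instance-action` path returns 404 until an interruption is scheduled and a small JSON document afterwards. The function names, the 5-second poll interval, and the `drain` callback are assumptions for illustration; this is not a hardened implementation.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> Tuple[str, datetime]:
    """Parse the notice, e.g. {"action": "terminate", "time": "...Z"}."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], when.replace(tzinfo=timezone.utc)


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw notice, or None while no interruption is scheduled."""
    # IMDSv2: fetch a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise


def watch(drain: Callable[[str, datetime], None],
          poll_seconds: float = 5.0) -> None:
    """Poll until a notice appears, then call drain(action, deadline) once."""
    while True:
        body = fetch_instance_action()
        if body is not None:
            drain(*parse_instance_action(body))
            return
        time.sleep(poll_seconds)
```

Calling `watch(my_drain)` on a Spot instance blocks until the notice arrives; `my_drain` is whatever hypothetical routine stops accepting work and checkpoints state. On Kubernetes the same job is normally delegated to the node lifecycle tooling plus a preStop hook rather than hand-rolled.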

FAQ

How often are Spot instances actually interrupted in 2026?
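Depends on the family and region. Popular families in us-east-1 sit at 5 to 15% per instance-hour interruption. Less popular families and regions can be under 3%. Check the current Spot Instance Advisor for your target, keeping in mind that it reports interruption frequency in bands over the trailing month rather than per instance-hour.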

Can I run production on 100% Spot?
Yes, for stateless workloads with proper diversification. Kubernetes on Karpenter with 5+ instance types and multi-AZ regularly runs entire production tiers on Spot. Databases and stateful primaries should stay off Spot.

Is Spot cheaper than a Savings Plan?
Yes on paper. A 3-year Compute SP is ~55% off; Spot is 70 to 85% off before interruption cost. After interruption overhead, Spot lands at 40 to 60% and SP at 45 to 55%. They are often within a few points, so operational fit matters more than the number.

What about Spot Blocks?
Closed to new customers in mid-2021 and retired for everyone at the end of 2022. Not coming back. Use Spot with diversification instead.

Does Fargate Spot work the same way?
Similar interruption model, slightly less flexible allocation. Fargate Spot is up to 70% off Fargate On-Demand and works well for short-lived batch tasks. It does not diversify across instance types the way EC2 Spot does.

Related guides

The AWS Spot Instance Advisor has real interruption rates by instance family and region.
The Karpenter documentation covers Spot to On-Demand fallback, which is the current best pattern for K8s workloads.
For a broader comparison of commitment options, see the industry write-ups on Savings Plans versus Reserved Instances in 2026.