Aman Singh

Posted on May 20

7 EC2 Savings Plan Mistakes That Are Costing You Millions

#ai #cloud #finops #cloudnextchallenge

Overcommitting an EC2 Savings Plan is one of the fastest ways to hand AWS money for nothing in return. You pay the hourly commitment rate whether or not your instances are running. When the commitment doesn't match actual usage, the discount becomes a penalty.

Here are the seven mistakes FinOps engineers see most often in real AWS accounts, what each one actually costs, and how to fix them.

What Makes Savings Plan Mistakes So Expensive
An EC2 Savings Plan is a commitment to a minimum hourly compute spend ($/hour) in exchange for discounts of roughly 40–72% off On-Demand rates. Unlike Reserved Instances, Savings Plans apply automatically to any matching usage, which means the logic that decides where the discount lands is largely invisible to the buyer.

That invisibility is where mistakes start. You buy at one point in time based on data that may already be stale. The commitment runs for 1 or 3 years. If the data was wrong, or usage patterns shift, you pay for that error for the entire term.

Quick reference on plan types:

EC2 Instance Savings Plans: highest discounts (up to 72%) but locked to a single instance family in a single region.
Compute Savings Plans: lower maximum discounts (up to 66%) but apply across any instance family, size, OS, tenancy, and region, and also cover Fargate and Lambda.

Mistake 1: Buying Before You Rightsize

Buying a Savings Plan for an over-provisioned instance locks in an hourly rate that's already wasteful. The discount lowers the price of the oversized instance; it does nothing about the headroom you're paying for above your actual workload.

What it costs: An m5.4xlarge in us-east-1 runs ~$0.768/hour On-Demand. A 3-year No Upfront Savings Plan brings that to ~$0.278/hour. An m5.large, the right size for a workload running at 15% CPU, costs ~$0.035/hour at an equivalent discount. At 100 instances for a year, that's $243,528 vs. $30,660. The discount is real; the base is wrong.
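The comparison above can be reproduced with a few lines of Python. This is only a sketch: the hourly rates are the approximate figures quoted in this section, and the function name and the 8,760-hour year are illustrative assumptions.

```python
HOURS_PER_YEAR = 8_760  # 365 days x 24 hours, assuming the fleet runs 24/7


def annual_fleet_cost(hourly_rate: float, instances: int) -> float:
    """Yearly cost of a fleet billed at a fixed hourly rate per instance."""
    return round(hourly_rate * instances * HOURS_PER_YEAR, 2)


oversized = annual_fleet_cost(0.278, 100)   # m5.4xlarge at the Savings Plan rate
rightsized = annual_fleet_cost(0.035, 100)  # m5.large at an equivalent discount

print(f"oversized:  ${oversized:,.0f}")
print(f"rightsized: ${rightsized:,.0f}")
print(f"avoidable:  ${oversized - rightsized:,.0f}")
```

The gap between the two figures is the cost of committing before rightsizing, and it is locked in for the whole term.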

The fix: Run AWS Compute Optimizer before purchasing. It analyzes 14 days of CloudWatch metrics and recommends right-sized instance types. Rightsize first, then analyze 30 days of post-rightsizing usage, then purchase.

Mistake 2: Committing to Peak Instead of Baseline

Savings Plans should be sized to your steady-state baseline, not your peak. Peak usage belongs on On-Demand or Spot. When you commit to a rate that covers both, you're paying the committed rate for capacity you only need intermittently.

What it costs: Say your fleet runs at $2,000/hour normally and spikes to $5,000/hour at peak, for about 200 of the 730 hours in a month.

Committing at $5,000/hour to cover everything:

$3,650,000/month in commitments (730 hours × $5,000)
530 of those hours only needed $2,000/hour
$1,590,000/month paid for nothing

Committing at $2,000/hour (baseline only):

Commitment: $1,460,000/month
On-Demand for 200 peak hours: $600,000 (200 × the $3,000/hour above baseline)
Total: $2,060,000/month, saving $1,590,000

Commit to the floor. Let Auto Scaling handle spikes at On-Demand rates. The On-Demand premium on spikes is almost always cheaper than paying a committed rate for idle capacity.
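A minimal sketch of the comparison above, for plugging in your own numbers. It follows the article's simplification that the spike is billed at the $3,000/hour difference between peak and baseline; the function names are illustrative.

```python
HOURS_PER_MONTH = 730


def cost_committing_to_peak(peak_rate: float) -> float:
    """Monthly bill when the commitment covers the peak rate every hour."""
    return peak_rate * HOURS_PER_MONTH


def cost_committing_to_baseline(baseline_rate: float, peak_rate: float,
                                peak_hours: float) -> float:
    """Monthly bill when only the baseline is committed and spikes go On-Demand."""
    commitment = baseline_rate * HOURS_PER_MONTH
    on_demand = (peak_rate - baseline_rate) * peak_hours
    return commitment + on_demand


peak_plan = cost_committing_to_peak(5_000)
baseline_plan = cost_committing_to_baseline(2_000, 5_000, peak_hours=200)

print(f"commit to peak:     ${peak_plan:,.0f}/month")
print(f"commit to baseline: ${baseline_plan:,.0f}/month")
print(f"difference:         ${peak_plan - baseline_plan:,.0f}/month")
```

The fewer peak hours you have in a month, the wider the difference gets, because the peak-sized commitment sits idle for longer.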

The fix: Pull 60–90 days of hourly EC2 usage from Cost Explorer. Identify the consistent floor: the level that usage never drops below. Commit to 80–90% of that floor.
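One way to turn that rule into numbers, assuming you have already exported hourly spend as a plain list of dollars per hour. The function name is hypothetical, and taking the strict minimum as the floor is an assumption: a single outage hour would drag it down, so you may prefer a low percentile instead.

```python
def commitment_range(hourly_spend: list[float],
                     low: float = 0.80,
                     high: float = 0.90) -> tuple[float, float, float]:
    """Return (floor, low commitment, high commitment) in $/hour.

    The floor is the lowest hourly spend observed in the window.
    """
    if not hourly_spend:
        raise ValueError("need at least one hourly data point")
    floor = min(hourly_spend)
    return floor, floor * low, floor * high


# Hypothetical sample: a steady base near $2,000/hour with two spikes.
sample = [2_000, 2_100, 5_000, 2_050, 1_950, 4_800]
floor, commit_low, commit_high = commitment_range(sample)

print(f"floor: ${floor:,.0f}/hour")
print(f"commit between ${commit_low:,.0f} and ${commit_high:,.0f} per hour")
```

With real data the list would hold 1,440 to 2,160 hourly points (60 to 90 days) rather than six.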

Mistake 3: Using EC2 Instance Savings Plans for Unstable Workloads

EC2 Instance Savings Plans lock you to a specific instance family and region. The discount is ~10–15 percentage points higher than with Compute Savings Plans, but if your workload moves to a new instance family, a new region, or Fargate, the plan doesn't follow.

What it costs: Assume $50,000/month committed under an EC2 Instance Savings Plan. After a migration to m6i, $30,000/month is now mismatched. Over 18 remaining months on a 3-year term: $540,000 in committed spend with no offsetting discount.

The fix: Default to Compute Savings Plans for any workload that may evolve. EC2 Instance Savings Plans are appropriate only for genuinely stable, single-region, single-family workloads that have been unchanged for 12+ months, and only with a documented 6-month review checkpoint.

Mistake 4: Buying 3-Year Terms for Non-Stable Workloads

A 3-year plan offers ~10–15% more discount than a 1-year plan. That math only works if your workload stays consistent for all 36 months.

What it costs: 100 × m5.xlarge in us-east-1:

1-year No Upfront: ~$0.149/hour
3-year No Upfront: ~$0.124/hour
Over 36 months, the gap = ~$65,700 in additional savings

But if the workload migrates at month 18, the remaining 18 months generate zero discount return: $0.124 × 100 instances × 13,140 hours = ~$163,000 in committed spend, roughly 2.5× the savings you were trying to capture.
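The trade-off between the extra 3-year discount and the risk of a stranded commitment can be sketched as follows. The sketch assumes 730 hours per month (so 18 months is 13,140 hours) and uses the approximate rates quoted above; the function names are illustrative.

```python
HOURS_PER_MONTH = 730


def extra_savings_3yr(rate_1yr: float, rate_3yr: float,
                      instances: int, months: int = 36) -> float:
    """Additional savings from the 3-year rate if usage holds for the full term."""
    return (rate_1yr - rate_3yr) * instances * months * HOURS_PER_MONTH


def stranded_spend(rate_3yr: float, instances: int, idle_months: int) -> float:
    """Committed spend that covers nothing once the workload has moved away."""
    return rate_3yr * instances * idle_months * HOURS_PER_MONTH


gain = extra_savings_3yr(0.149, 0.124, instances=100)
loss = stranded_spend(0.124, instances=100, idle_months=18)

print(f"extra savings over 36 months: ${gain:,.0f}")
print(f"stranded spend, 18 idle months: ${loss:,.0f}")
print(f"loss / gain: {loss / gain:.1f}x")
```

Varying `idle_months` shows how few stranded months it takes to erase the extra discount: at these rates, about seven months of idle commitment already cancels the 36-month gain.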

For a full ROI breakdown on term length, see EC2 Savings Plans: 1-Year vs 3-Year Commitment ROI Analysis

The fix: Match term to architectural stability horizon. If your roadmap includes Kubernetes migration, major refactoring, or significant scale changes within 24 months, buy 1-year terms.

Mistake 5: Not Monitoring Utilization After Purchase

A Savings Plan commitment runs whether or not it's being applied to active usage. Utilization can drop to 40–50% without triggering any default alert.

What it costs: $10,000/month committed at 60% utilization = $4,000/month in spend generating no discount. Over 12 months: $48,000 in guaranteed waste.

AWS Cost Explorer refreshes utilization data every 72+ hours. A utilization drop today won't surface in your dashboard for up to 3 days:

| Day | What's happening | What your dashboard shows |
| --- | --- | --- |
| Day 1 | Utilization drops to 55% | Still showing yesterday's 92% |
| Day 2 | ~40% of committed spend wasted | No alert, no signal |
| Day 3 | $18,000–$36,000 accumulated waste | Dashboard finally updates |

There's no retroactive credit. You pay for all three days.

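The fix: Set alerts on Savings Plans utilization. AWS Budgets supports a Savings Plans utilization budget that notifies you when utilization falls below a threshold; set it at 85%. If you prefer CloudWatch alarms, publish utilization as a custom metric first, since it isn't among the standard billing metrics. Review weekly, not quarterly.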
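Wherever the utilization figures come from, the check itself is a simple comparison. The sketch below assumes you already have daily utilization as fractions between 0 and 1; the function names and the sample data are hypothetical.

```python
def wasted_spend(commitment: float, utilization: float) -> float:
    """Committed dollars that were paid for but not applied to any usage."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    return commitment * (1.0 - utilization)


def days_below_threshold(daily_utilization: dict[str, float],
                         threshold: float = 0.85) -> list[str]:
    """Days on which utilization fell under the alert threshold."""
    return [day for day, value in daily_utilization.items() if value < threshold]


monthly_waste = wasted_spend(10_000, 0.60)

# Hypothetical three-day series, keyed by day label.
series = {"day-1": 0.92, "day-2": 0.55, "day-3": 0.60}
alerts = days_below_threshold(series)

print(f"monthly waste: ${monthly_waste:,.0f}")
print(f"days needing attention: {alerts}")
```

Running a check like this daily against fresh data is what closes the gap between a utilization drop and someone noticing it.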

Mistake 6: Not Using AWS Organizations for Commitment Sharing

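Savings Plans apply to the purchasing account first. Within an AWS Organization, leftover commitment flows to other accounts only while discount sharing is on; it is on by default but can be turned off per account. In a multi-account org with sharing disabled, one account's unused commitment stays isolated while another pays full On-Demand for eligible usage.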
What it costs:

| | Account A | Account B |
| --- | --- | --- |
| Monthly committed spend | $5,000 | $0 |
| Utilization | 55% | n/a |
| Unused commitment | $2,250/month | n/a |
| EC2 On-Demand spend | n/a | $5,000/month |

Without org-level sharing: ~$4,250/month wasted (unused commitment + foregone discount on Account B's eligible spend).

With sharing enabled: Account A's unused $2,250 automatically covers Account B's usage. Recovered value: $2,000–$2,500/month with zero additional spend, or $24,000–$30,000/year from one setting.

For a broader overview of EC2 pricing models, see Amazon EC2 Pricing Explained: Models, Costs & How to Save

The fix: In the management account, open AWS Billing and Cost Management → Billing preferences and verify that RI and Savings Plans discount sharing is enabled for the accounts that should share. Before purchasing additional plans per account, confirm via the coverage report that linked accounts are consuming shared commitments.

Mistake 7: Buying on Stale Recommendations

AWS Cost Explorer recommendations update every 72+ hours. If usage shifted over the weekend and you buy on Monday, you're committing to a pattern that no longer exists.

What it costs: A recommendation suggests $3,000/hour. Over the weekend, three batch workloads completed and were decommissioned. Actual usage justifies $2,000/hour. You've over-committed by $1,000/hour, which is $8,760,000/year in misallocated spend. Even at a tenth of that scale ($100/hour of over-commitment), it's $876,000/year.

The fix: Cross-reference Cost Explorer recommendations against your current Cost and Usage Report. Look at the past 7 days of actual hourly usage. If usage changed materially in the past 72 hours, recalculate manually.
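A rough version of that cross-check, assuming the last 7 days of hourly usage are available as a list of dollars per hour. The function name, the 10% tolerance, and the sample data are all illustrative assumptions, not AWS behavior.

```python
HOURS_PER_YEAR = 8_760


def check_recommendation(recommended_per_hour: float,
                         recent_hourly_usage: list[float],
                         tolerance: float = 0.10) -> dict:
    """Compare a recommended commitment against the recent usage floor."""
    if not recent_hourly_usage:
        raise ValueError("need recent hourly usage to compare against")
    recent_floor = min(recent_hourly_usage)
    overcommit = max(0.0, recommended_per_hour - recent_floor)
    return {
        "recent_floor": recent_floor,
        "overcommit_per_hour": overcommit,
        "overcommit_per_year": overcommit * HOURS_PER_YEAR,
        # Flag the recommendation when it exceeds the floor by more than the tolerance.
        "stale": overcommit > recent_floor * tolerance,
    }


# Hypothetical week: five days at $3,000/hour, then a weekend at $2,000/hour.
last_7_days = [3_000] * 120 + [2_000] * 48
result = check_recommendation(3_000, last_7_days)

print(result)
```

If `stale` comes back true, size the purchase from the recent floor rather than from the recommendation.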

How These Mistakes Share a Root Cause

Every mistake here has the same underlying problem: point-in-time purchasing decisions applied to continuously changing workloads, monitored with tools that don't refresh fast enough to catch drift.

At $6,000–$12,000/day in uncovered spend, a 48-hour additional lag on detecting drift adds $12,000–$24,000 per event before the data allows action.

Usage.ai refreshes EC2 Savings Plan recommendations every 24 hours vs. Cost Explorer's 72+. Its Autopilot mode manages purchasing continuously, sizing to the current baseline rather than the historical peak. If a commitment purchased through Usage.ai becomes underutilized due to workload changes, it provides cashback on the underutilized portion, not credits. Cashback is real money returned to the business.

Companies including Motive, EVGo (NASDAQ: EVGO), Secureframe, and Blank Street Coffee manage EC2 commitments through the platform. Setup takes 30 minutes, requires billing-layer access only, and is charged as a percentage of realized savings only: if it saves nothing, the fee is zero.

Which of these have you run into in your own accounts? Curious whether the utilization monitoring gap (Mistake 5) catches people as often as I think it does.

Top comments (4)

Argon Loop • May 20

Useful breakdown, especially the overcommitment point. In tenant chargeback reviews for AI workloads, which single auditable anchor has worked best for closing disputes without replaying allocation logic: invoice-line provenance token, allocation-lineage identifier, or another field? If you use another anchor, what minimum fields are mandatory so Finance can close one dispute reproducibly?

Aman Singh • Jun 5

Great question and one that comes up constantly in multi-tenant AI workload reviews.
The most reliable anchor is the CUR lineItem/ResourceId paired with a tenant/workload cost allocation tag. The resource ID gives you immutable invoice-level provenance; the tag maps it to the owning team. Together, Finance can verify the charge directly against the AWS invoice with no allocation logic to replay.
The allocation-lineage identifier approach introduces a dependency on your internal engine's state at billing time, which is exactly what makes disputes hard to close when logic changes or was applied inconsistently.
To close a dispute reproducibly, the four fields you need are the lineItem/ResourceId (invoice-anchored and immutable), the tenant cost allocation tag such as tenant-id, team, or workload, the lineItem/UsageStartDate and UsageEndDate to time-bound the charge, and lineItem/UnblendedCost for the exact dollar figure Finance is disputing. Those four let Finance match the chargeback to the AWS invoice line without needing your allocation system's context.
For AI workloads specifically, it's also worth tagging at the inference endpoint or SageMaker endpoint level. That's where per-tenant cost separation gets granular enough to be dispute-proof.
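As a rough illustration of how those four fields close a dispute, the sketch below sums the unblended cost for one resource, one tenant tag, and one time window. It assumes CUR rows have already been loaded as dictionaries keyed by column name, that the tag is called tenant-id (so the column is resourceTags/user:tenant-id), and that timestamps share one ISO 8601 format so they compare as strings. The sample rows are made up.

```python
from decimal import Decimal


def dispute_total(rows: list[dict], resource_id: str, tenant: str,
                  start: str, end: str,
                  tag_column: str = "resourceTags/user:tenant-id") -> Decimal:
    """Sum lineItem/UnblendedCost for one resource, tenant, and time window."""
    total = Decimal("0")
    for row in rows:
        if (row["lineItem/ResourceId"] == resource_id
                and row.get(tag_column) == tenant
                and start <= row["lineItem/UsageStartDate"]
                and row["lineItem/UsageEndDate"] <= end):
            total += Decimal(row["lineItem/UnblendedCost"])
    return total


def make_row(rid: str, tenant: str, start: str, end: str, cost: str) -> dict:
    """Build a hypothetical CUR row for the example."""
    return {
        "lineItem/ResourceId": rid,
        "resourceTags/user:tenant-id": tenant,
        "lineItem/UsageStartDate": start,
        "lineItem/UsageEndDate": end,
        "lineItem/UnblendedCost": cost,
    }


rows = [
    make_row("i-0abc", "tenant-a", "2024-05-01T00:00:00Z", "2024-05-01T01:00:00Z", "1.25"),
    make_row("i-0abc", "tenant-a", "2024-05-01T01:00:00Z", "2024-05-01T02:00:00Z", "0.75"),
    make_row("i-0abc", "tenant-b", "2024-05-01T00:00:00Z", "2024-05-01T01:00:00Z", "9.99"),
]

total = dispute_total(rows, "i-0abc", "tenant-a",
                      "2024-05-01T00:00:00Z", "2024-05-02T00:00:00Z")
print(total)
```

Using `Decimal` rather than floats keeps the total reconcilable to the cent against the invoice line.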

Sol • May 21

Strong point on committing to baseline instead of peak. In LLM workloads, we keep seeing another overcommit trap: teams reserve USD at blended averages before separating prompt-cache writes, cache reads, and input/output token classes. That can under-reserve expensive paths while looking covered in aggregate. In your reviews, do you reserve by token class and cache path first, then convert to USD, or do you anchor directly in blended $/hour? Which has held up better in audit disputes?

Aman Singh • Jun 5

Good catch, blended $/hour anchoring is where teams get burned with LLM workloads because the cost distribution across token classes is wildly uneven. Cache writes on Bedrock can run 3-4x the cost of cache reads, and output tokens are typically 3-5x input tokens depending on the model. Committing at a blended average means you're over-reserving cheap paths and under-reserving expensive ones, so coverage looks fine on paper until you break it down by call type.
Reserving by token class first, then converting to USD, holds up better in audits. When Finance questions a charge, you can point to actual usage broken down by cache writes, cache reads, input, and output tokens separately, then show how the commitment maps to each. With blended $/hour you're defending an average nobody can directly verify against the invoice.
The practical approach is to pull your Bedrock or API usage logs, separate spend by token type over 30-60 days, find the stable floor per class, and size the commitment against that. Cache reads are high-volume but cheap, so over-committing there wastes money. Cache writes and output tokens are where spend actually concentrates, that's where coverage matters.
Where this breaks down is when teams don't have per-token-class visibility before buying and are working off Cost Explorer aggregates, which don't break it down that way. You need to go to the actual usage logs or CUR data at the API operation level before touching commitment sizing.