Your manager just asked you to cut cloud costs by end of quarter.
Your first instinct is to look at migrating to a cheaper provider. Don't. That's a 3-month project minimum and you have 6 weeks.
Here's what actually works fast — three changes that show up in billing within days, not months.
1. Spot Instances for Training and Batch Jobs (saves 60–70%)
If your team is running ML training, ETL jobs, or any batch processing on on-demand EC2, you're paying full price for work that can be interrupted and restarted.
Spot instances run the exact same hardware for 60–70% less. The only requirement is that your job handles interruptions gracefully, which a training job that checkpoints regularly already does.
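If a batch job does not checkpoint yet, the pattern is small. The following is a minimal sketch, not a production implementation: the file name, the state layout, and the `stop_after` parameter (which only simulates an interruption) are assumptions made for illustration.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a mid-write interruption cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_job(path, total_steps, stop_after=None):
    """Run up to total_steps, resuming from the checkpoint stored at `path`.

    stop_after simulates a Spot interruption after that many steps in this run.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return state  # interrupted; progress is already on disk
        state["total"] += state["next_step"]  # stand-in for one unit of real work
        state["next_step"] += 1
        save_checkpoint(path, state)
        steps_this_run += 1
    return state
```

Calling `run_job` a second time with the same path picks up at the saved step instead of starting over, which is the property that makes a job safe to run on Spot.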
For SageMaker training jobs:
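import sagemaker

estimator = sagemaker.estimator.Estimator(
    ...
    use_spot_instances=True,
    max_run=3600,   # 1 hour maximum training time, in seconds
    max_wait=7200,  # 2 hours total (waiting for Spot capacity plus training); must be >= max_run
    # also set checkpoint_s3_uri so an interrupted job can resume from its last checkpoint
)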
For raw EC2:
aws ec2 request-spot-instances \
--instance-count 1 \
--type one-time \
--launch-specification file://spec.json
Real impact: A team running p3.2xlarge on-demand at $3.06/hour switches to spot at ~$0.91/hour. 100 training hours per month = $215 saved. Every month.
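To run the same arithmetic with your own rates, a small helper is enough. This is a sketch: the function name is made up for illustration, and Spot prices vary by region and over time, so treat the rates as inputs you look up rather than constants.

```python
def monthly_spot_savings(on_demand_rate, spot_rate, hours_per_month):
    """Return (dollars saved per month, percent saved) for one instance type."""
    saved = (on_demand_rate - spot_rate) * hours_per_month
    percent = (1 - spot_rate / on_demand_rate) * 100
    return round(saved, 2), round(percent, 1)


# Rates from the p3.2xlarge example above
saved, percent = monthly_spot_savings(3.06, 0.91, 100)
print(f"${saved:.2f} saved per month ({percent}% off)")
```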
2. Savings Plans for Baseline Compute (saves 30–40%)
Spot works for interruptible workloads. For everything that runs continuously — schedulers, API servers, always-on processing nodes — Reserved Instances or Savings Plans give you 30–40% off with zero changes to your infrastructure.
The commitment is financial, not technical. You're not locked into specific instance types or regions with Compute Savings Plans.
Check your on-demand baseline first:
# The End date is exclusive, so use the first day of the next month to cover all of May
aws ce get-cost-and-usage \
--time-period Start=2026-05-01,End=2026-06-01 \
--granularity MONTHLY \
--metrics BlendedCost \
--group-by Type=DIMENSION,Key=INSTANCE_TYPE \
--output table
Find the instance types you run every day without exception. Buy a 1-year no-upfront Savings Plan for that baseline. The discount applies immediately — it shows up in your next billing cycle.
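One way to turn "every day without exception" into a number is to take the minimum daily spend per instance type. The sketch below assumes you have already collected daily costs per instance type (for example by running the same query with `--granularity DAILY`); the function names and the input layout are hypothetical.

```python
def steady_baseline(daily_costs, days_in_period):
    """Map each instance type to its minimum daily on-demand cost over the period.

    daily_costs: {instance_type: [cost_day_1, cost_day_2, ...]}
    A type with fewer entries than days_in_period did not run every day,
    so its baseline is 0.
    """
    baseline = {}
    for instance_type, costs in daily_costs.items():
        if len(costs) < days_in_period:
            baseline[instance_type] = 0.0
        else:
            baseline[instance_type] = min(costs)
    return baseline


def hourly_commitment(baseline, discount=0.0):
    """Convert the daily baseline into a dollars-per-hour commitment.

    discount is the expected Savings Plan discount as a fraction (an assumption
    you supply), because the commitment is expressed in post-discount dollars.
    """
    daily_total = sum(baseline.values())
    return round(daily_total * (1 - discount) / 24, 2)
```

Committing only to the minimum keeps the plan fully used even on your quietest day; anything above it stays on-demand or moves to Spot.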
Real impact: $5,000/month in stable EC2 spend becomes ~$3,000/month with a 1-year Compute Savings Plan. $24,000 saved per year. 10 minutes to purchase.
3. Schedule Dev and Staging Clusters (saves 60% of those environments)
Your production cluster needs to run 24/7. Your dev and staging clusters almost certainly do not.
If your team works 8am–8pm on weekdays, your dev environment is in use for 60 of the 168 hours in a week. It sits idle for 12 hours every weeknight and 48 hours every weekend. In a 720-hour month that is roughly 460 idle hours, or about 64% of that environment's compute bill for zero value.
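A quick way to sanity-check the idle share for your own working hours is to compute it per week and scale to a month. This is a sketch with hypothetical function names; it assumes cost scales with running hours, which holds for instance-hours but not for attached storage.

```python
HOURS_PER_WEEK = 7 * 24


def idle_share(work_hours_per_day, work_days_per_week):
    """Fraction of the week during which the environment is not in use."""
    used = work_hours_per_day * work_days_per_week
    return (HOURS_PER_WEEK - used) / HOURS_PER_WEEK


def idle_hours_per_month(work_hours_per_day, work_days_per_week, hours_in_month=720):
    """Idle hours in a month of the given length, rounded to the nearest hour."""
    return round(idle_share(work_hours_per_day, work_days_per_week) * hours_in_month)


# A team working 12 hours a day, 5 days a week
print(f"{idle_share(12, 5):.0%} idle, {idle_hours_per_month(12, 5)} hours per month")
```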
EventBridge rule to stop EC2 instances nightly:
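# Rule that fires at 8pm UTC every day (to stop instances tagged Environment=dev)
aws events put-rule \
--schedule-expression "cron(0 20 * * ? *)" \
--name "StopDevInstances" \
--state ENABLED
# Rule that fires at 8am UTC, Monday to Friday (to start them again)
aws events put-rule \
--schedule-expression "cron(0 8 ? * MON-FRI *)" \
--name "StartDevInstances" \
--state ENABLED
# A rule only defines the schedule. Attach a target with `aws events put-targets`,
# for example a Lambda function that stops or starts the tagged instances.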
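The rules need a target that does the actual stopping. A minimal sketch of a Lambda function for the stop rule might look like this; it assumes the `Environment=dev` tag, an execution role allowed to call `ec2:DescribeInstances` and `ec2:StopInstances`, and a fleet small enough that pagination can be ignored. The function names are illustrative.

```python
def stop_tagged_instances(ec2, tag_key="Environment", tag_value="dev"):
    """Stop every running instance carrying the given tag; return their IDs.

    `ec2` is a boto3 EC2 client, passed in so the logic can be tested with a stub.
    """
    response = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:  # skip the API call when nothing is running
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids


def lambda_handler(event, context):
    """Entry point when this runs as the EventBridge rule's Lambda target."""
    import boto3  # bundled with the AWS Lambda Python runtime

    return {"stopped": stop_tagged_instances(boto3.client("ec2"))}
```

The start function is the mirror image: filter on the `stopped` state and call `start_instances` instead.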
For RDS dev databases:
# Stop dev RDS instance
aws rds stop-db-instance \
--db-instance-identifier your-dev-db
# Note: AWS automatically starts a stopped RDS instance again after 7 days,
# so use a scheduled Lambda to stop it again
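A scheduled function for this can be sketched as follows. It checks the instance status and only issues a stop when the database is `available`, so running it daily is harmless. The identifier `your-dev-db` is the placeholder from the command above, and the function names are illustrative.

```python
def stop_if_running(rds, db_instance_identifier):
    """Stop the instance if it is available; return True when a stop was issued.

    `rds` is a boto3 RDS client, passed in so the logic can be tested with a stub.
    """
    response = rds.describe_db_instances(DBInstanceIdentifier=db_instance_identifier)
    status = response["DBInstances"][0]["DBInstanceStatus"]
    if status != "available":
        return False  # already stopped, stopping, or in the middle of a change
    rds.stop_db_instance(DBInstanceIdentifier=db_instance_identifier)
    return True


def lambda_handler(event, context):
    """Entry point for a scheduled Lambda invocation."""
    import boto3  # bundled with the AWS Lambda Python runtime

    return {"stopped": stop_if_running(boto3.client("rds"), "your-dev-db")}
```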
Real impact: A dev environment costing $2,000/month in compute running 24/7 costs roughly $700/month on a weekday 8am–8pm schedule. About $1,300/month saved from two EventBridge rules and a small Lambda target.
The Combined Impact
| Change | Effort | Time to implement | Monthly saving |
|---|---|---|---|
| Spot for training/batch | Low | 1–2 hours | 60–70% of those workloads |
| Savings Plans for baseline | Very low | 10 minutes | 30–40% of stable compute |
| Schedule dev/staging | Low | 20 minutes | 60% of non-prod environments |
A team spending $20,000/month on compute that implements all three can realistically be at $10,000–$12,000/month within 30 days. No migration. No architecture changes. No new vendors.
Before You Start — Find Out What You're Actually Paying For
The three fixes above work best when you know exactly where your compute spend is going. Run this first:
aws ce get-cost-and-usage \
--time-period Start=2026-05-01,End=2026-06-01 \
--granularity MONTHLY \
--metrics BlendedCost \
--group-by Type=DIMENSION,Key=SERVICE \
--output table
Then drill into EC2 specifically:
aws ce get-cost-and-usage \
--time-period Start=2026-05-01,End=2026-06-01 \
--granularity MONTHLY \
--metrics BlendedCost \
--group-by Type=DIMENSION,Key=INSTANCE_TYPE \
--filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}' \
--output table
This shows you exactly which instance types are costing the most — which tells you where Spot and Savings Plans will have the most impact.
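If the table output gets long, the same data can be ranked in a few lines. The sketch below assumes you ran the command with `--output json` and loaded the result with `json.load`; the function name and the sample figures are made up for illustration.

```python
def rank_by_cost(response, metric="BlendedCost"):
    """Sum each group's cost across all returned periods and sort descending.

    `response` is the parsed JSON from `aws ce get-cost-and-usage`.
    Returns a list of (group_key, total_cost) tuples, most expensive first.
    """
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            key = group["Keys"][0]
            amount = float(group["Metrics"][metric]["Amount"])  # amounts arrive as strings
            totals[key] = totals.get(key, 0.0) + amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample in the shape Cost Explorer returns
sample = {
    "ResultsByTime": [
        {
            "Groups": [
                {"Keys": ["m5.large"], "Metrics": {"BlendedCost": {"Amount": "410.50", "Unit": "USD"}}},
                {"Keys": ["p3.2xlarge"], "Metrics": {"BlendedCost": {"Amount": "1830.00", "Unit": "USD"}}},
            ]
        }
    ]
}
for instance_type, cost in rank_by_cost(sample):
    print(f"{instance_type:12s} ${cost:,.2f}")
```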
Want a Structured Check Across All 18 Common Patterns?
These three fixes cover compute. But most teams also have recoverable spend hiding in storage, networking, and database that they're not looking at.
If you want a systematic 15-minute check across all 18 patterns — including NAT Gateway overuse, unattached EBS volumes, missing Reserved Instances, and dev RDS running 24/7 — run the free audit at kloudaudit.eu.
No AWS credentials. No signup. Just your answers and an instant savings estimate.
Samuel Ayodele Adomeh is a Senior DevOps Engineer and Azure Solutions Architect based in Wrocław, Poland. He built KloudAudit after seven years of reviewing cloud bills and seeing the same patterns on every infrastructure he worked with.