How to Cut AWS Costs Without Reserved Instances
Originally published at https://fortem.dev/blog/reduce-aws-costs-without-ri
RIs and Savings Plans are table stakes — they change how you pay, not what runs. Here are 5 methods that cut your actual AWS consumption, ranked by impact: scheduling, right-sizing, Spot, auto-stop, and killing orphans.
You've already set up Reserved Instances and Savings Plans. You checked the boxes the FinOps team sent over. Your AWS bill is still too high — and it keeps climbing. That's because RIs and Savings Plans change how you pay for compute. They don't change how much compute you actually consume. If your dev and staging environments run 24/7 while your team works 40 hours a week, no pricing model optimization will fix that. Here are five things that will.
TL;DR
- RIs and Savings Plans change your pricing model — not your consumption. They're table stakes. Get them first, then keep reading.
- Scheduling non-prod environments to business hours alone cuts compute spend by 60–70%, about 1.75× the savings of a 40% RI discount on the same non-prod workloads.
- Right-sizing overprovisioned services costs $0 to implement and saves 10–30% immediately. Check p95 CloudWatch metrics before changing a single line of Terraform.
- Fargate Spot drops compute costs ~70% for fault-tolerant workloads. Combined with scheduling, dev environments cost near-zero.
- Most teams have 5–15% of environments that nobody owns. Finding and deleting 3 orphaned environments recovers $500–2,000/month.
Reserved Instances are table stakes — what's next?
If you don't have RIs or Savings Plans set up: stop reading. Go to the AWS Savings Plans console and commit to a 1-year plan for your production workloads. It's a 30–50% discount on list price for zero engineering effort. This is the lowest-hanging fruit in AWS cost optimization. Do it first.
Now here's the problem RIs don't solve: they change the price per unit, but not the number of units you consume. Your dev environments still run 168 hours a week. Your staging environment still sits idle at 3am on Sunday. Your three orphaned environments from last year's migration still bill by the second.
KEY INSIGHT: On a $10,000/month AWS bill where 70% is non-production compute: a 40% RI discount saves $2,800/month. Scheduling those same non-production environments to business hours saves $4,900/month. RI addresses the pricing model. Scheduling addresses the consumption.
$10,000/mo bill breakdown:
Non-production compute (70%): $7,000/mo
RI savings on non-prod (40%): −$2,800/mo
Scheduling savings (70% of compute hrs): −$4,900/mo
Scheduling captures 1.75× more savings than RIs on non-prod — and you can do both
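The comparison above is plain arithmetic, so it is easy to rerun with your own numbers. A minimal sketch, assuming the inputs quoted in this example (the variable names are illustrative, not from any tool):

```python
# Worked version of the $10,000/month example. All inputs come from the
# figures quoted above; the variable names are illustrative only.
monthly_bill = 10_000
non_prod_share = 0.70   # share of the bill that is non-production compute
ri_discount = 0.40      # RI / Savings Plan discount applied to non-prod
hours_removed = 0.70    # share of compute hours that scheduling switches off

non_prod_spend = monthly_bill * non_prod_share
ri_savings = non_prod_spend * ri_discount
scheduling_savings = non_prod_spend * hours_removed
ratio = scheduling_savings / ri_savings

print(f"Non-prod compute:   ${non_prod_spend:,.0f}/mo")
print(f"RI savings:         ${ri_savings:,.0f}/mo")
print(f"Scheduling savings: ${scheduling_savings:,.0f}/mo")
print(f"Scheduling vs RI:   {ratio:.2f}x")
```

Changing `non_prod_share` is the quickest way to see how much of the result depends on fleet composition.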
Method 1: Schedule environments (60–70% savings)
There are 168 hours in a week. Your team works roughly 50 of them (Mon–Fri 9am–7pm). The other 118 hours — nights, weekends, holidays — your non-production ECS services sit idle, billing by the second. Scheduling means stopping them during off-hours and restarting them at the start of the workday.
“AWS Fargate charges $0.04048 per vCPU-hour and $0.004445 per GB-hour for Linux/x86 on-demand pricing. Every hour a dev environment runs at 3am, every minute a staging cluster spins through the weekend — that's billing against this rate.”
— AWS Fargate Pricing, verified May 2026
What to schedule: dev environments, QA, demo environments, training sandboxes, branch preview environments. Anything that doesn't need to be available at 3am on Sunday.
What NOT to schedule: production, customer-facing staging, on-call sandboxes that need 24/7 availability. Use per-environment configuration — not a single global schedule.
Monthly cost, 12 environments with 8 services each:
24/7, always on (168 hrs/week): $1,730/mo
Business hours (50 hrs/week, Mon–Fri 9am–7pm): $515/mo
−70% savings
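The fleet figures above can be reproduced from the Fargate rates quoted earlier. This is a rough sketch under two assumptions that are not stated next to the chart: each service is a 0.5 vCPU + 1 GB task (the size priced later in this article), and a month has 730 hours. The function name is hypothetical.

```python
# Fargate on-demand rates quoted above (Linux/x86).
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445
HOURS_PER_MONTH = 730   # assumption: average month length
HOURS_PER_WEEK = 168


def monthly_cost(vcpu, memory_gb, services, hours_per_week=HOURS_PER_WEEK):
    """Monthly on-demand cost for `services` identical Fargate tasks."""
    hourly = vcpu * VCPU_HOUR + memory_gb * GB_HOUR
    uptime = hours_per_week / HOURS_PER_WEEK
    return hourly * HOURS_PER_MONTH * uptime * services


services = 12 * 8  # 12 environments, 8 services each
always_on = monthly_cost(0.5, 1, services)
business_hours = monthly_cost(0.5, 1, services, hours_per_week=50)
savings = 1 - business_hours / always_on

print(f"24/7:           ${always_on:,.0f}/mo")
print(f"Business hours: ${business_hours:,.0f}/mo")
print(f"Savings:        {savings:.0%}")
```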
KEY INSIGHT: Scheduling costs $0 to implement — it's purely an operational change. No Terraform modifications. No new resources. Just stopping services when nobody is using them. For most teams with 10+ non-prod environments, scheduling is the single largest savings lever by a wide margin.
Implementation: tag every non-production environment, then run a scheduler (EventBridge + Lambda, or a third-party tool) that sets desired counts to 0 during off-hours and back to N at the start of the workday. Per-timezone configuration matters — your EU team starts 6 hours before your US team.
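The core of such a scheduler is a small decision: given the current time and an environment's timezone, should its services be at 0 or at N? The sketch below shows only that decision. The `Schedule` structure and function names are hypothetical, and the fixed UTC offsets stand in for real timezones to keep the example self-contained.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo


@dataclass(frozen=True)
class Schedule:
    """Business-hours window for one environment (hypothetical structure)."""
    tz: tzinfo
    start: time = time(9, 0)
    stop: time = time(19, 0)
    workdays: frozenset = frozenset({0, 1, 2, 3, 4})  # Monday=0 .. Friday=4


def should_run(schedule: Schedule, now_utc: datetime) -> bool:
    """True when `now_utc` falls inside the environment's local work window."""
    local = now_utc.astimezone(schedule.tz)
    in_window = schedule.start <= local.time() < schedule.stop
    return local.weekday() in schedule.workdays and in_window


def desired_count(schedule: Schedule, now_utc: datetime, normal_count: int) -> int:
    """Task count the scheduler should set for a service right now."""
    return normal_count if should_run(schedule, now_utc) else 0


# Fixed offsets are an assumption of this sketch; real schedules would more
# likely use zoneinfo.ZoneInfo("Europe/Berlin") so that DST is followed.
eu = Schedule(tz=timezone(timedelta(hours=1)))
us = Schedule(tz=timezone(timedelta(hours=-5)))

monday_1pm_utc = datetime(2026, 5, 4, 13, 0, tzinfo=timezone.utc)
print(desired_count(eu, monday_1pm_utc, normal_count=2))  # EU local time: 14:00
print(desired_count(us, monday_1pm_utc, normal_count=2))  # US local time: 08:00
```

In a Lambda triggered by EventBridge, the returned value would typically be passed to the boto3 ECS client's `update_service(cluster=..., service=..., desiredCount=...)` for each tagged service.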
Method 2: Right-size your services (10–30% savings)
When someone first deployed that dev API service, they picked 1 vCPU and 2 GB. It made sense at the time. Six months later, the service processes one request per minute during business hours and sits idle every other second. It's paying for capacity it never uses.
How to find overprovisioned services: go to CloudWatch Container Insights → your ECS cluster → CPU Utilization and Memory Utilization per service. Look at the p95 over the last 14 days — not the average. A service with p95 CPU at 87 units on a 1024-unit allocation is using 8.5% of its provisioned capacity.
KEY INSIGHT: A common pattern: task definition requests 1024 CPU units. CloudWatch p95 over 14 days shows 87 CPU units. That service is paying for 12× more CPU than it actually needs. Right-size the task definition to 256 (p95 87 × 3 = 261 ≈ 256) and you cut the CPU portion of its Fargate cost by 75%.
Right-sizing rule: p95 × 3, then round to a Fargate increment (round up, unless the target only barely exceeds the smaller size)
1 vCPU (1024) → p95 = 87 → 87 × 3 = 261 → right-size to 256 = −75% CPU cost
0.5 vCPU (512) → p95 = 120 → 120 × 3 = 360 → keep at 512 = already right-sized
2 GB memory → p95 = 310 MB → 310 × 3 = 930 MB → right-size to 1 GB = −50% memory cost
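The three examples above can be expressed as one small function. This is a sketch, not a definitive rule: the 5% `tolerance` is an assumption introduced here so that a target of 261 still maps to 256 while 360 rounds up to 512, and the memory list covers only the sizes Fargate allows at 0.25 vCPU.

```python
# Fargate task CPU sizes in CPU units (1024 units = 1 vCPU).
FARGATE_CPU_UNITS = [256, 512, 1024, 2048, 4096, 8192, 16384]
# Memory sizes (MB) available to a 0.25 vCPU task.
MEMORY_MB_AT_QUARTER_VCPU = [512, 1024, 2048]


def right_size(p95, increments, headroom=3.0, tolerance=0.05):
    """Return the smallest increment that covers p95 * headroom.

    `tolerance` is an assumption of this sketch: a target that overshoots an
    increment by less than 5% still maps to it, which is how 261 becomes 256.
    """
    target = p95 * headroom
    for size in sorted(increments):
        if target <= size * (1 + tolerance):
            return size
    return max(increments)  # nothing is large enough; use the biggest size


print(right_size(87, FARGATE_CPU_UNITS))            # p95 of 87 CPU units
print(right_size(120, FARGATE_CPU_UNITS))           # p95 of 120 CPU units
print(right_size(310, MEMORY_MB_AT_QUARTER_VCPU))   # p95 of 310 MB
```

Feeding it the p95 values exported from Container Insights gives a first-pass list of candidates; each change still deserves a look at the service's real traffic pattern.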
Risk: traffic spikes can overwhelm a right-sized service. Mitigate with ECS Service Auto Scaling — set a target tracking policy on CPU utilization at 70%. The service starts small, scales up when needed, scales down at night. Right-sizing without autoscaling is gambling. Right-sizing with autoscaling is engineering.
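A target tracking policy at 70% CPU is set up through Application Auto Scaling in two calls: register the service as a scalable target, then attach the policy. The sketch below only builds the request parameters. The cluster name, service name, policy name, and the 1 to 4 task range are hypothetical placeholders.

```python
def target_tracking_requests(cluster, service, min_tasks=1, max_tasks=4, cpu_target=70.0):
    """Build keyword arguments for the two Application Auto Scaling calls."""
    resource = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    scalable_target = {**resource, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    scaling_policy = {
        **resource,
        "PolicyName": f"{service}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": cpu_target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
        },
    }
    return scalable_target, scaling_policy


target, policy = target_tracking_requests("dev-cluster", "dev-api")
# With boto3 installed, these would be passed to the real client:
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**target)
#   aas.put_scaling_policy(**policy)
print(policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"])
```

Teams that manage ECS through Terraform would normally declare the same two resources there instead of calling the API directly.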
Method 3: Fargate Spot (up to 70% discount)
Fargate Spot runs tasks on spare AWS capacity at roughly 70% off on-demand pricing, per AWS Fargate pricing (verified May 2026). The tradeoff: AWS can reclaim that capacity with a 2-minute warning. ECS handles the drain and restart: your task gets SIGTERM and, by default, 30 seconds to drain connections, then ECS starts a replacement task on Spot capacity once it is available. ECS does not fall back to On-Demand on its own, so services that must keep running need an On-Demand share in their capacity provider strategy.
Fargate Spot vs On-Demand (0.5 vCPU + 1 GB, Linux/x86):
On-Demand: $0.024685/hr → $18.02/service/mo
Spot: $0.007872/hr → $5.75/service/mo
−68% per service
Good for Spot: CI/CD test runners, batch jobs, dev environments for individual engineers, any workload that restarts cleanly.
Bad for Spot: production, customer-facing staging, anything with an SLA. Use the capacity provider strategy to split tasks, for example 80% Spot / 20% On-Demand, so interruptions don't cause downtime, just a brief shift to on-demand.
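An 80/20 split is expressed as relative weights on the two Fargate capacity providers. The sketch below builds that list in the shape the ECS API expects for `capacityProviderStrategy`. The helper names are hypothetical, and the 4:1 weights are just one way to express 80/20.

```python
def capacity_provider_strategy(spot_weight=4, on_demand_weight=1, on_demand_base=0):
    """Strategy list in the shape ECS expects for capacityProviderStrategy.

    `on_demand_base` is the number of tasks always placed on On-Demand before
    the weights apply; 0 means the weights govern every task.
    """
    return [
        {"capacityProvider": "FARGATE_SPOT", "weight": spot_weight, "base": 0},
        {"capacityProvider": "FARGATE", "weight": on_demand_weight, "base": on_demand_base},
    ]


def spot_share(strategy):
    """Fraction of weighted tasks that land on Fargate Spot."""
    total = sum(p["weight"] for p in strategy)
    spot = sum(p["weight"] for p in strategy if p["capacityProvider"] == "FARGATE_SPOT")
    return spot / total


strategy = capacity_provider_strategy()
print(f"Spot share: {spot_share(strategy):.0%}")
```

Setting `on_demand_base=1` is a common way to guarantee that at least one task survives a Spot interruption, at the cost of a smaller discount on small services.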
KEY INSIGHT: Spot combined with scheduling creates a compound effect: a dev service on business hours (29.8% of the week) running on Spot (32% of on-demand price) costs just 9.5% of the original 24/7 on-demand cost. An $18.02/month service drops to $1.71/month. That's not a typo.
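The compound effect is two fractions multiplied together. A quick check using the hourly prices from the comparison above and an assumed 730-hour month:

```python
on_demand_hourly = 0.024685  # 0.5 vCPU + 1 GB, from the comparison above
spot_hourly = 0.007872
hours_per_month = 730        # assumption: average month length

uptime = 50 / 168                              # business hours share of the week
price_ratio = spot_hourly / on_demand_hourly   # Spot price relative to On-Demand
combined = uptime * price_ratio

baseline = on_demand_hourly * hours_per_month
scheduled_on_spot = baseline * combined

print(f"Uptime {uptime:.1%} x price ratio {price_ratio:.1%} = {combined:.1%}")
print(f"${baseline:.2f}/mo -> ${scheduled_on_spot:.2f}/mo")
```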
Method 4: Auto-stop idle environments
This is different from scheduling. Scheduling is predictable: environments stop and start on a fixed calendar. Auto-stop targets environments that _should_ be in use but aren't. An environment that hasn't seen a deployment in 10 days, has zero active connections, and generates no application logs is probably abandoned, even if someone forgot to tell you.
Implementation: monitor CloudTrail for ECS service updates (deployments) and CloudWatch Logs for application activity. If an environment has zero deploy events and zero log activity for a configurable threshold, say 6 consecutive days, automatically set its ECS service desired counts to 0. Send a Slack notification: “use1-dev-experiment stopped (idle 6 days). One-click restart here.”
KEY INSIGHT: The organizational question is harder than the technical implementation: who decides what “idle” means? 3 days? 7 days? 14 days? Define the policy with your team leads, document it, and give developers a 24-hour warning before auto-stop kicks in. The technical part is a Lambda function. The organizational part is a Slack thread.
Best practice: start conservative. 14-day idle threshold, 48-hour warning. Measure how many environments get auto-stopped and how many get immediately restarted. Tighten the threshold over time as the team builds trust in the process.
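The idle policy reduces to a three-way decision per environment: leave it alone, warn its owner, or stop it. A minimal sketch, assuming you already have the last deploy time (from CloudTrail) and last log time (from CloudWatch Logs) for each environment. The function name and return values are hypothetical; the defaults are the conservative starting point suggested above.

```python
from datetime import datetime, timedelta, timezone


def auto_stop_action(last_deploy, last_log, now, idle_days=14, warning_hours=48):
    """Return "none", "warn", or "stop" for one environment.

    Idle time is measured from the most recent deploy or log event. The
    warning window opens `warning_hours` before the idle threshold is reached.
    """
    last_activity = max(last_deploy, last_log)
    idle_for = now - last_activity
    threshold = timedelta(days=idle_days)
    if idle_for >= threshold:
        return "stop"
    if idle_for >= threshold - timedelta(hours=warning_hours):
        return "warn"
    return "none"


now = datetime(2026, 5, 20, tzinfo=timezone.utc)
deployed = now - timedelta(days=40)   # last deploy was long ago
logged = now - timedelta(days=13)     # but logs were seen 13 days ago
print(auto_stop_action(deployed, logged, now))
```

Tightening the policy later is a matter of lowering `idle_days` and `warning_hours`, so the threshold agreed with team leads lives in one place.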
Method 5: Kill orphaned environments
While auto-stop handles the recently-idle, this method handles the permanently-abandoned. Every team that's been running ECS for more than a year has environments that nobody claims. They were spun up for a migration, a hackathon, a departed engineer's experiment. Nobody deploys to them. Nobody knows who owns them. They just bill — quietly, every month.
“Most teams we work with find 5–15% of their environments are completely abandoned — no deploys in 6+ months, no identifiable owner, no access logs. Three orphaned environments at $170/month each = $6,120/year of compute serving zero requests.”
— Fortem fleet audit of 100+ ECS environments across 12 teams, 2026
Audit approach: pull the last deployment timestamp per environment. Cross-reference with the team directory (who owns what?). Environments with no deploy in 30+ days and no active team owner go on a review list. The platform team reviews the list, confirms abandonment, and deletes the infrastructure.
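The audit filter itself is short once the two inputs exist. The sketch below assumes a list of environment records with a last deploy timestamp and an optional owner, plus a set of currently active owners from the team directory. The field names and sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def review_list(environments, active_owners, now, max_age_days=30):
    """Names of environments with no recent deploy and no active owner."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        env["name"]
        for env in environments
        if env["last_deploy"] < cutoff and env.get("owner") not in active_owners
    ]


now = datetime(2026, 5, 20, tzinfo=timezone.utc)
environments = [  # hypothetical sample data
    {"name": "dev-api", "last_deploy": now - timedelta(days=2), "owner": "team-a"},
    {"name": "migration-2025", "last_deploy": now - timedelta(days=200), "owner": None},
    {"name": "hackathon", "last_deploy": now - timedelta(days=90), "owner": "former-employee"},
    {"name": "legacy-batch", "last_deploy": now - timedelta(days=120), "owner": "team-a"},
]
print(review_list(environments, active_owners={"team-a"}, now=now))
```

The output is a review list, not a delete list: an old environment with an active owner stays off it, and a human still confirms each entry before anything is removed.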
KEY INSIGHT: Finding orphaned environments is a one-time audit that costs $0 and takes an afternoon. The savings compound every month. For a team with 50+ environments, the most common outcome is 2–5 orphans worth $500–$2,000/month. That's $6,000–$24,000/year — from a one-time afternoon of work.
Comparing the 5 methods
Stack these in order. Start with the highest-impact, lowest-effort method and work down. Don't try to implement all five at once — that's how cost optimization projects die in committee. Do method 1 this week. Method 2 next week. See the savings compound.
| Method | Impact | Effort | Risk | What it means |
|---|---|---|---|---|
| 1. Scheduling | 60–70% | Low | None | Dev/staging envs stop outside business hours (50 hrs/wk instead of 168). Zero Terraform changes. |
| 2. Right-sizing | 10–30% | Medium | Low | Drop task CPU/memory to about 3× p95, rounded to a Fargate size. One-time TF change per service. |
| 3. Fargate Spot | Up to 70% | Low | Medium | Switch capacity provider to FARGATE_SPOT. 2-min interruption notice from AWS. |
| 4. Auto-stop idle | Variable | Medium | Low | Stop any env not deployed to or accessed in 6+ days. CloudTrail + Lambda. |
| 5. Kill orphans | $500–2,000/mo | Low | None | Find envs with no owner and no deploys in 30+ days. Delete them. |
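Combined savings: $5,765/mo · $69,180/yr
Combined impact on a $10,000/mo fleet: RI (−$2,800) + Scheduling (−$4,900) + Right-sizing (−$1,050 on remaining) + Spot on eligible dev envs (−$815). These line items overlap, because each method applies to spend the others have already reduced, so the net is smaller than their sum. Total: $10,000 → $4,235/mo. 57% reduction without touching a single Reserved Instance.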
The specific numbers depend on your fleet composition. A team with 80% non-prod compute will see scheduling dominate. A team where everything runs at steady utilization will see right-sizing and Spot carry the weight. The framework is the same regardless: reduce consumption first, then optimize the pricing model on what remains.
Common questions
Do I need to change my Terraform to implement any of this?
Not for scheduling or auto-stop — those are operational concerns handled outside Terraform. Right-sizing requires updating task definition files. Spot requires changing capacity provider strategy in your ECS service definition. Killing orphans requires no Terraform changes. Most teams start with scheduling (zero Terraform impact) and right-sizing (the small Terraform change with the second-largest impact).
What happens to databases when an environment is scheduled off?
ECS scheduling stops and starts compute tasks. It does not touch RDS, ElastiCache, or any other stateful services. Your databases keep running and billing. If you want to stop databases too, you need separate scheduling for each service type. Most teams leave databases running 24/7 and only schedule compute: the extra cost is usually worth the operational simplicity.
Is Fargate Spot safe for staging environments?
It depends on what staging is used for. If staging runs automated tests and can tolerate a 2-minute interruption, Spot is fine. If staging hosts customer demos or is expected to be reliably available during business hours, use On-Demand for those specific services. The capacity provider strategy lets you split — 80% Spot / 20% On-Demand — so interruptions don't cause downtime.
How do I find which environments are idling?
Pull the last task run timestamp from CloudWatch Logs Insights — any service with no log events in the last 14 days is a candidate. Cross-reference with your deployment records (last deploy date). Environments with no deploys in 30+ days and no active owner are safe to stop. Fortem surfaces last deploy time, last access time, and owner for every environment — turning a 2-hour audit into a 2-minute filter.