DEV Community

Matt

Originally published at fortem.dev

Why Do AWS Staging Environments Cost So Much?

Why AWS Staging Environments Cost So Much (2026 Guide)

Originally published at https://fortem.dev/blog/aws-staging-environment-cost
AWS staging environments run 168 hours a week. Your team works 40. Here's where the money goes on ECS Fargate — and how to cut it without touching production.



You have 10 ECS environments. Most of them are staging, QA, or dev. No one is using them at 2am on Saturday. But Fargate bills by the second, and by the time the monthly invoice arrives the number is larger than expected. This isn't an infrastructure design problem — it's an idle compute problem. Here's where the money goes, and what moves the needle.

TL;DR

  • Non-prod ECS environments run 168 hours a week. Your team works 40. That's 128 hrs/week of idle compute per environment.
  • Fargate compute is ~68% of your ECS bill. The rest (CloudWatch Logs, ALB baseline) doesn't stop when the environment sits idle.
  • NAT Gateway, VPC, and often ALB are shared across environments, so that overhead doesn't multiply. Compute does.
  • Fargate Spot cuts non-prod compute by up to 70% for fault-tolerant tasks. Not suitable for demo environments or shared QA sessions.
  • Business-hours scheduling (Mon–Fri 09:00–19:00) cuts active compute time to ~30% of the 24/7 baseline with zero architecture changes.

Ready to use — drop this into your Terraform today

Application Auto Scaling scheduled actions for ECS: stop all tasks at 19:00 UTC and restart them at 09:00 UTC, Mon–Fri. No Lambda required. Replace your-cluster and your-service with your values, and repeat the aws_appautoscaling_* blocks for each service.

```hcl
# Register the ECS service as a scalable target
resource "aws_appautoscaling_target" "staging_svc" {
  max_capacity       = 4
  min_capacity       = 0
  resource_id        = "service/your-cluster/your-service"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Stop at 19:00 UTC Mon–Fri (min = max = 0 drives desired count to 0)
resource "aws_appautoscaling_scheduled_action" "stop_evening" {
  name               = "stop-staging-evening"
  service_namespace  = aws_appautoscaling_target.staging_svc.service_namespace
  resource_id        = aws_appautoscaling_target.staging_svc.resource_id
  scalable_dimension = aws_appautoscaling_target.staging_svc.scalable_dimension
  schedule           = "cron(0 19 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}

# Restart at 09:00 UTC Mon–Fri (min = 1 brings at least one task back)
resource "aws_appautoscaling_scheduled_action" "start_morning" {
  name               = "start-staging-morning"
  service_namespace  = aws_appautoscaling_target.staging_svc.service_namespace
  resource_id        = aws_appautoscaling_target.staging_svc.resource_id
  scalable_dimension = aws_appautoscaling_target.staging_svc.scalable_dimension
  schedule           = "cron(0 9 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 4
  }
}

# Optional: Fargate Spot capacity provider for non-prod.
# Assumes FARGATE and FARGATE_SPOT are already attached to the cluster
# (for example with aws_ecs_cluster_capacity_providers).
resource "aws_ecs_service" "staging_svc" {
  # ... your existing service config ...

  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 1
  }

  # Weight 0: ECS places no tasks on this provider, so the service is pure Spot.
  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    weight            = 0
    base              = 0
  }

  # Let the scheduled actions own desired_count; without this, the next
  # terraform apply resets it to whatever this file declares.
  lifecycle {
    ignore_changes = [desired_count]
  }
}
```

Monthly compute cost: 10 non-prod environments (80 services, 0.5 vCPU each)
us-east-1, Linux x86, on-demand rates June 2026

  • 24/7 on-demand: $1,442/mo
  • Business hours on-demand: $428/mo (−70%)
  • Business hours + Fargate Spot: $128/mo (−91%)

Business hours = Mon–Fri 09:00–19:00 (50 hrs/wk, ~217 hrs/mo). Fargate Spot at a 70% discount. Shared infrastructure (NAT Gateway, VPC, ALB) is not included, since shared cost does not multiply per environment.

Why non-prod spend stays invisible

Non-prod costs get lumped into a single “infrastructure” line item with no per-environment breakdown. No one owns the number, so it doesn't get fixed.

Production gets optimized after a big bill. Staging gets the same config it had when the second engineer joined and no one has touched it since. The reason isn't negligence — it's visibility. AWS Cost Explorer shows you ECS as a service total. Without per-environment cost allocation tags, there's no way to see that your staging environment costs more than your QA environment, or that three dev environments have been running since February with no active work behind them.

The result: non-prod spend is invisible in reviews, gets absorbed into the overall AWS bill, and deferred indefinitely with “it's just staging, we'll fix it later.”

Key insight: “Nobody noticed because staging bills get lumped into ‘infrastructure costs’ and nobody questions them.” (practitioner, dev.to)

Where the money goes on Fargate

Fargate compute, priced at $0.04048/vCPU-hr and $0.004445/GB-hr, is ~68% of a typical ECS bill. The remaining ~32%, such as CloudWatch Logs at $0.50/GB ingested and the ALB baseline at $0.0225/hr, doesn't scale to zero when tasks are idle.

The big number is compute, and compute is the lever. But a few non-obvious charges compound the problem for non-prod environments specifically:

  • CloudWatch Logs: verbose by default. Non-prod environments often run at DEBUG log level. A service generating 1 GB/day of logs costs $15/month in ingestion alone (30 GB × $0.50/GB). Multiply by 8 services and 10 environments and you have $1,200/month, a meaningful line item that has nothing to do with compute.

  • Container Insights: charged per observation. Container Insights is enabled on many clusters, often through an account-level default that new clusters inherit. For non-prod, it adds cost without adding value. Turn it off on dev and staging clusters.

  • ALB dedicated to one environment. If each environment has its own ALB, the $0.0225/hr base charge ($16.43/mo) runs regardless of traffic. Teams running 10 environments with dedicated ALBs pay $164/mo in ALB base charges before a single request is processed.
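The DEBUG-logging arithmetic in the first item is easy to rerun with your own volumes. A minimal sketch, assuming the $0.50/GB ingestion rate quoted above and a flat 30-day month, and ignoring log storage and query charges:

```python
# Rough CloudWatch Logs ingestion cost for a non-prod fleet.
# Assumptions: $0.50/GB ingestion (the rate quoted in this article),
# a flat 30-day month, and storage/query charges ignored.
INGESTION_USD_PER_GB = 0.50
DAYS_PER_MONTH = 30


def monthly_log_ingestion_cost(gb_per_day: float, services: int = 1) -> float:
    """Monthly ingestion cost in USD for `services` services, each logging `gb_per_day` GB/day."""
    return gb_per_day * DAYS_PER_MONTH * INGESTION_USD_PER_GB * services


if __name__ == "__main__":
    print(f"one service at 1 GB/day: ${monthly_log_ingestion_cost(1.0):,.2f}/mo")
    print(f"8 services x 10 envs:    ${monthly_log_ingestion_cost(1.0, services=80):,.2f}/mo")
```

For the fleet described here that is $15.00 per service and $1,200.00 per month overall, on the same order as the fleet's 24/7 compute figure.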

The 168-hour problem

A non-prod environment running 24/7 runs 168 hours a week. Your team works 40. That gap — 128 hours per week of idle compute per environment — is the real cost driver on Fargate.

Let's do the math on a realistic fleet. Ten non-prod environments, each running 8 services at 0.5 vCPU and 1 GB memory:

| Scenario | Active hrs/mo | Compute/mo | vs 24/7 |
|---|---|---|---|
| 24/7 on-demand | 730 | $1,442 | baseline |
| Business hours on-demand | ~217 | $428 | −70% |
| Business hours + Spot | ~217 | ~$128 | −91% |

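Hourly fleet cost = 80 services × (0.5 vCPU × $0.04048 + 1 GB × $0.004445) = $1.9748/hr, multiplied by active hours per month. Business hours = Mon–Fri 09:00–19:00 UTC (50 hrs/wk, ~217 hrs/mo). The Spot row assumes a 70% discount.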
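To check the table against your own fleet, here is a minimal Python sketch of the same arithmetic. It assumes the us-east-1 Linux/x86 rates quoted above, 730 hours per month, 50 business hours per week (52/12 weeks per month) and a flat 70% Spot discount; real Spot discounts vary over time.

```python
# Reproduces the scenario table from the published on-demand rates.
# Assumptions: us-east-1 Linux/x86 rates quoted in this article, 730 hrs/month,
# business hours = 10 h/day x 5 days/week, and a flat 70% Spot discount.
VCPU_USD_PER_HR = 0.04048
GB_USD_PER_HR = 0.004445
HOURS_24X7 = 730
WEEKS_PER_MONTH = 52 / 12


def hourly_cost(services: int, vcpu: float, memory_gb: float) -> float:
    """Fleet cost per hour while every task is running."""
    return services * (vcpu * VCPU_USD_PER_HR + memory_gb * GB_USD_PER_HR)


def business_hours_per_month(hours_per_day: float = 10, days_per_week: int = 5) -> float:
    return hours_per_day * days_per_week * WEEKS_PER_MONTH


def monthly_costs(services: int = 80, vcpu: float = 0.5, memory_gb: float = 1.0,
                  spot_discount: float = 0.70) -> dict:
    per_hr = hourly_cost(services, vcpu, memory_gb)
    biz_hrs = business_hours_per_month()
    return {
        "24/7 on-demand": per_hr * HOURS_24X7,
        "business hours on-demand": per_hr * biz_hrs,
        "business hours + Spot": per_hr * biz_hrs * (1 - spot_discount),
    }


if __name__ == "__main__":
    costs = monthly_costs()
    baseline = costs["24/7 on-demand"]
    for name, usd in costs.items():
        print(f"{name:26s} ${usd:8,.0f}/mo  ({usd / baseline - 1:+.0%})")
```

Changing `services`, `vcpu` or `memory_gb` shows how quickly the 24/7 figure grows while the scheduled rows stay at roughly 30% and 9% of it.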

Key insight: the compute in a non-prod environment doesn't know it's 2am on Sunday. It charges the same rate as a Tuesday afternoon.

Fargate bills per second, rounded up, with a one-minute minimum per Linux task. A task stopped at 19:00 pays nothing until it restarts at 09:00. That's not an approximation; it's how the billing model works. The savings from scheduling are immediate and exact.
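AWS's Fargate pricing describes per-second billing, rounded up, with a one-minute minimum for Linux tasks, measured from the start of the image download until the task terminates. A small sketch of that rule, assuming the us-east-1 rates quoted in this article and a 0.5 vCPU / 1 GB task, shows why a stopped task costs exactly nothing while a very short one is still billed for 60 seconds:

```python
import math

# Fargate-style billing for one Linux task: metered per second, rounded up,
# with a one-minute minimum. Assumptions: us-east-1 Linux/x86 on-demand rates
# quoted in this article; runtime is measured from image pull to task stop.
VCPU_USD_PER_HR = 0.04048
GB_USD_PER_HR = 0.004445
MINIMUM_SECONDS = 60


def billed_seconds(runtime_seconds: float) -> int:
    """Seconds charged: 0 if no task ran, else at least 60, rounded up to a whole second."""
    if runtime_seconds <= 0:
        return 0
    return max(MINIMUM_SECONDS, math.ceil(runtime_seconds))


def task_cost(runtime_seconds: float, vcpu: float = 0.5, memory_gb: float = 1.0) -> float:
    """USD charged for one task of the given size."""
    usd_per_second = (vcpu * VCPU_USD_PER_HR + memory_gb * GB_USD_PER_HR) / 3600
    return billed_seconds(runtime_seconds) * usd_per_second


if __name__ == "__main__":
    cases = [("stopped overnight", 0), ("20 s smoke test", 20), ("09:00-19:00", 10 * 3600)]
    for label, seconds in cases:
        print(f"{label:18s} billed {billed_seconds(seconds):6d} s  ${task_cost(seconds):.4f}")
```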

What shared infrastructure changes (and doesn't change)

NAT Gateway, VPC, and often ALB are shared across environments. That overhead doesn't multiply per environment. What multiplies is compute — one set of running tasks per environment, billed independently.

A well-structured ECS fleet shares:

  • NAT Gateway — one per VPC, ~$32.85/mo base. Shared across all environments. $3.29/env at 10 environments.
  • ALB with host-based routing — one ALB routes to all environments via hostname rules. $16.43/mo base total, not per environment.
  • VPC, subnets, security groups — no per-environment charge.
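Amortizing the fixed hourly charges makes the point concrete. The sketch assumes every environment sits in one VPC behind a single NAT Gateway, uses base rates of $0.045/hr for NAT (which gives the $32.85/mo above) and $0.0225/hr for the ALB, and ignores NAT data processing and ALB LCU charges:

```python
# Per-environment share of fixed monthly charges, shared vs. dedicated ALB.
# Assumptions: every environment sits in one VPC behind a single NAT Gateway;
# base rates are $0.045/hr (NAT) and $0.0225/hr (ALB); data processing and
# ALB LCU charges are ignored.
HOURS_PER_MONTH = 730
NAT_USD_PER_HR = 0.045
ALB_USD_PER_HR = 0.0225


def per_env_overhead(environments: int, shared_alb: bool) -> dict:
    """Monthly NAT + ALB base cost attributed to one environment."""
    nat = NAT_USD_PER_HR * HOURS_PER_MONTH / environments
    alb = ALB_USD_PER_HR * HOURS_PER_MONTH
    if shared_alb:
        alb /= environments  # one ALB with a host-based rule per environment
    return {"nat": nat, "alb": alb, "total": nat + alb}


if __name__ == "__main__":
    for shared in (True, False):
        cost = per_env_overhead(10, shared_alb=shared)
        label = "shared ALB" if shared else "dedicated ALB"
        print(f"{label:13s} ${cost['total']:.2f}/env/mo "
              f"(NAT ${cost['nat']:.2f}, ALB ${cost['alb']:.2f})")
```

At 10 environments this comes to about $4.93 per environment with a shared ALB versus about $19.71 with a dedicated one, both small next to each environment's compute.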

What doesn't share: Fargate task hours, CloudWatch Logs ingestion per environment, and ECR image pull data. These are the numbers that multiply at fleet scale — and they're all driven by idle compute.

This is why the fix is scheduling tasks, not redesigning network architecture. Once you understand that shared infra is already cheap per environment, the question becomes: how do you stop paying for 128 idle compute hours per week?

Once scheduling is in place, tag each environment with a cost allocation tag and create an AWS Cost Anomaly Detection monitor on that tag. You'll get alerted when any single environment deviates from its historical spend baseline, which is how you catch drift.

Fargate Spot for non-prod: when it works, when it doesn't

Fargate Spot runs non-prod tasks on spare AWS capacity at up to 70% off on-demand rates. It works well for dev and QA. Avoid it for environments used for customer demos or with stateful in-memory work that can't tolerate a restart.

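The mechanics: AWS gives a two-minute warning before reclaiming Spot capacity, delivered as a task state change event to EventBridge and as SIGTERM to the running task. ECS sends SIGKILL once the container's stopTimeout expires (30 seconds by default, up to 120 seconds on Fargate), so set stopTimeout to 120 to use the full window. The stopped task carries the stop code SpotInterruption and, if desired count is still above 0, ECS launches a replacement.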
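Both a Spot interruption and the evening scale-in reach your container as SIGTERM, so a worker that drains cleanly on that signal handles both. A minimal Python sketch, where `process_one` and the item loop are hypothetical placeholders for your own job logic: the handler only sets a flag, and the loop stops taking new work once it is set. Whatever cleanup follows has to finish inside the task's stopTimeout.

```python
import signal
import threading
import time

# Set when ECS asks the container to stop (Spot interruption or scale-in).
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler tiny: record the request and let the main loop drain.
    shutdown_requested.set()


def process_one(item: int) -> None:
    """Hypothetical unit of work; replace with your own job logic."""
    time.sleep(0.01)


def run_worker(items) -> int:
    """Process items until done or until SIGTERM arrives; return how many finished."""
    done = 0
    for item in items:
        if shutdown_requested.is_set():
            break  # stop taking new work; the previous item already completed
        process_one(item)
        done += 1
    # Flush buffers and close connections here, well inside the stopTimeout window.
    return done


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    finished = run_worker(range(1000))
    print(f"exited cleanly after {finished} items")
```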

| Environment type | Fargate Spot? | Reason |
|---|---|---|
| Dev environments | ✓ Yes | Stateless, restartable, no active users |
| Feature branch preview | ✓ Yes | Ephemeral, restartable on interrupt |
| CI / integration tests | ✓ Yes | Short-lived tasks, retry on failure |
| QA (automated) | ✓ Yes | Tests restart automatically on failure |
| QA (live session) | ✗ Risky | Interrupt kills active QA session |
| Demo environment | ✗ No | Customer impact if interrupted |
| Staging (production-like) | ✗ Usually not | Used for final validation, needs stability |

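The capacity provider strategy in the Terraform block above sets FARGATE_SPOT weight = 1 and FARGATE weight = 0, which is pure Spot. For environments that need some stability, give FARGATE_SPOT a weight of 3 and FARGATE a weight of 1 (optionally with base = 1 on FARGATE): ECS then places roughly three of every four tasks on Spot and the rest on on-demand. This is a proportional split, not a fallback. ECS does not automatically move tasks to FARGATE when Spot capacity is unavailable, so the on-demand share is your stability guarantee.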
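Capacity provider weights describe a proportional split of tasks. ECS documents `base` as the minimum number of tasks on one provider and `weight` as each provider's relative share of the tasks launched after that. The sketch below estimates the resulting split and the blended saving; the rounding rule and the flat 70% Spot discount are assumptions, so treat its numbers as estimates rather than ECS's exact placement.

```python
from typing import Tuple

# Approximate task split for a FARGATE_SPOT / FARGATE capacity provider strategy.
# Assumptions: `base` is set on FARGATE only; remaining tasks are split in
# proportion to the weights, with the on-demand share rounded down and leftovers
# going to Spot; Spot is a flat 70% cheaper. ECS's exact placement may differ.


def split_tasks(desired: int, spot_weight: int, ondemand_weight: int,
                ondemand_base: int = 0) -> Tuple[int, int]:
    """Return (spot_tasks, ondemand_tasks) for a service's desired count."""
    if spot_weight + ondemand_weight == 0:
        raise ValueError("at least one weight must be greater than 0")
    base = min(ondemand_base, desired)
    remaining = desired - base
    ondemand_extra = remaining * ondemand_weight // (spot_weight + ondemand_weight)
    return remaining - ondemand_extra, base + ondemand_extra


def blended_discount(spot_tasks: int, ondemand_tasks: int, spot_discount: float = 0.70) -> float:
    """Fraction of compute cost saved versus running every task on on-demand FARGATE."""
    total = spot_tasks + ondemand_tasks
    return 0.0 if total == 0 else spot_tasks * spot_discount / total


if __name__ == "__main__":
    for desired, sw, ow, base in [(4, 1, 0, 0), (4, 3, 1, 0), (4, 3, 1, 1)]:
        spot, od = split_tasks(desired, sw, ow, base)
        print(f"desired={desired} weights {sw}:{ow} base={base} -> "
              f"{spot} Spot, {od} on-demand, ~{blended_discount(spot, od):.0%} saved")
```

A 3:1 split keeps about half the Spot saving while guaranteeing that a Spot capacity shortage leaves at least some tasks running.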

Business-hours scheduling: the fastest ROI

Scheduling ECS tasks to stop at 19:00 and restart at 09:00 Mon–Fri cuts active compute time from 730 hours/month to ~217 hours — a 70% reduction with no architecture changes required.

The AWS-native approach uses ECS Application Auto Scaling scheduled actions. No Lambda function, no custom scheduler, no third-party tool — this is a first-class ECS feature. The Terraform block at the top of this article implements it exactly.

A few operational details worth knowing before you deploy:

  • Deregistration delay. ALB target groups have a default 300-second deregistration delay. Reduce this to 30 seconds on non-prod target groups so environments stop promptly at 19:00 instead of draining for 5 minutes.
  • Stateful services. RDS and ElastiCache run independently — they're not stopped by this config. Data persists across task restarts. EFS mounts reattach on task start.
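  • Timezone offset. These cron expressions are evaluated in UTC unless the scheduled action sets a timezone. Mon–Fri 09:00–19:00 US Eastern is 13:00–23:00 UTC during daylight saving time but 14:00–00:00 UTC in winter, when the evening stop falls on the next UTC day. Adjust the cron expressions, or set the action's timezone, for your team's location.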
  • Override capability. The scheduled action sets desired count — any engineer can manually set it back to 1 for an after-hours session. The schedule resumes as normal the next morning.
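The timezone point is easy to get wrong around daylight saving time. A small sketch using Python's `zoneinfo` shows the UTC hours a UTC-evaluated cron needs on a summer and a winter date; it assumes a US Eastern team and that the system has IANA time zone data (or the `tzdata` package), so swap the zone name for yours.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Translate a local business-hours window into the UTC hours (and UTC weekday of
# the stop) that a UTC-evaluated cron expression needs on a given date.
# Assumptions: a US Eastern team and an available IANA tz database.
TEAM_TZ = ZoneInfo("America/New_York")


def utc_window(year: int, month: int, day: int, start_hour: int = 9, stop_hour: int = 19):
    """Return (start_hour_utc, stop_hour_utc, stop_weekday_utc); weekday 0 = Monday."""
    start = datetime(year, month, day, start_hour, tzinfo=TEAM_TZ).astimezone(timezone.utc)
    stop = datetime(year, month, day, stop_hour, tzinfo=TEAM_TZ).astimezone(timezone.utc)
    return start.hour, stop.hour, stop.weekday()


if __name__ == "__main__":
    print("Wed 15 Jul 2026 (EDT):", utc_window(2026, 7, 15))
    print("Fri 16 Jan 2026 (EST):", utc_window(2026, 1, 16))
```

In the winter case the Friday 19:00 stop lands on Saturday 00:00 UTC, so a stop written as `cron(0 0 ? * MON-FRI *)` would fire Sunday through Thursday evenings local time and miss Friday, leaving the environment up all weekend; `TUE-SAT` is the matching day range. Setting a timezone on the scheduled action avoids the conversion entirely (the Application Auto Scaling API accepts one; check that your Terraform AWS provider version exposes it).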

At 10+ environments, this math becomes unavoidable

One staging environment running 24/7 is an annoyance. Ten of them is a line item that starts appearing in board decks. The fix doesn't scale manually.

Manual scheduling via the AWS console or one-off Terraform blocks works at 1–2 environments. At 10+, the operational overhead compounds:

  • Schedule drift: different engineers set different start/stop times, and no one audits them
  • Environment-specific hours: the ML team needs their env at 6am, QA needs theirs until 9pm
  • On-demand overrides: “can you keep staging up tonight, we have a client demo” gets sent in Slack and forgotten in Terraform
  • New environments inherit no schedule by default: the next dev environment someone spins up runs 24/7 until someone notices
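The drift and missing-schedule problems can be checked mechanically. A hedged sketch: given the ECS service resource IDs you expect to be scheduled and the scheduled actions defined for them, flag services with no stop action or with a non-default schedule. The input shape mirrors items in Application Auto Scaling's DescribeScheduledActions response; the default cron strings come from this article's Terraform example, and treating `MaxCapacity == 0` as a stop action is an assumption about how your actions are written.

```python
# Flag ECS services whose schedule is missing or differs from the team default.
# Assumptions: `scheduled_actions` items look like DescribeScheduledActions output
# (ResourceId, Schedule, ScalableTargetAction); MaxCapacity == 0 means "stop".
DEFAULT_STOP = "cron(0 19 ? * MON-FRI *)"
DEFAULT_START = "cron(0 9 ? * MON-FRI *)"


def audit_schedules(service_ids, scheduled_actions):
    """Return {resource_id: problem} for services that need attention."""
    by_service = {sid: {"stop": [], "start": []} for sid in service_ids}
    for action in scheduled_actions:
        entry = by_service.get(action["ResourceId"])
        if entry is None:
            continue  # action for a service we were not asked about
        is_stop = action["ScalableTargetAction"].get("MaxCapacity") == 0
        entry["stop" if is_stop else "start"].append(action["Schedule"])

    problems = {}
    for sid, entry in by_service.items():
        if not entry["stop"]:
            problems[sid] = "no stop schedule (runs 24/7)"
        elif entry["stop"] != [DEFAULT_STOP] or entry["start"] != [DEFAULT_START]:
            problems[sid] = f"non-default schedule: stop={entry['stop']} start={entry['start']}"
    return problems
```

To feed it real data, you could page through boto3's application-autoscaling client `describe_scheduled_actions(ServiceNamespace="ecs")` and build the service list from the ECS client's `list_services`.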

This is where fleet-level tooling pays for itself. Fortem manages scheduling across all non-prod environments from one interface — with override capability per environment, audit log of who changed what, and defaults that apply to new environments automatically.

See which environments in your fleet are burning budget right now.


Questions this article doesn't answer

How do I actually see which environment is costing what in AWS?

Activate your environment tag key as a cost allocation tag in the AWS Billing console, then open Cost Explorer and group by that tag. You'll see per-environment spend broken out as individual rows. Our article on per-environment cost visibility walks through the exact steps.
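The same breakdown is available from the Cost Explorer API. A minimal sketch, assuming the tag key is `environment`, that it is already activated as a cost allocation tag, and that your credentials allow `ce:GetCostAndUsage`. Cost Explorer labels tag groups as `key$value`, with untagged spend under `key$`; the parser is separate from the API call so it can be used without boto3, and pagination (`NextPageToken`) is not handled.

```python
# Per-environment cost from Cost Explorer, grouped by a cost allocation tag.
# Assumptions: tag key "environment" (use yours), already activated as a cost
# allocation tag; boto3 credentials allow ce:GetCostAndUsage.
TAG_KEY = "environment"


def fetch_cost_by_tag(start: str, end: str, tag_key: str = TAG_KEY) -> dict:
    """Call Cost Explorer (needs boto3 + AWS credentials). Dates are YYYY-MM-DD, end exclusive."""
    import boto3  # imported here so the parser below works without boto3

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": tag_key}],
    )


def cost_by_environment(response: dict, tag_key: str = TAG_KEY) -> dict:
    """Sum UnblendedCost per tag value; untagged spend is reported under ''."""
    totals = {}
    prefix = f"{tag_key}$"
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            key = group["Keys"][0]
            env = key[len(prefix):] if key.startswith(prefix) else key
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[env] = totals.get(env, 0.0) + amount
    return totals
```

Usage: `cost_by_environment(fetch_cost_by_tag("2026-05-01", "2026-07-01"))` returns a dict such as staging, qa and dev totals, plus an empty-string entry for spend that carries no tag.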

Can I automatically stop ECS environments when there's no active deployment or open PR?

Not with native ECS scheduling alone: you need to react to your CI/CD events, either through EventBridge or from the pipeline itself. For example, a GitHub Actions workflow can call the ECS UpdateService API to set desired count to 0 when a PR is closed and back to 1 when a new deployment completes. Some teams add this to their deploy pipeline directly.
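A hedged sketch of that wiring: map pipeline events to a desired count, then call UpdateService through boto3. The event names and the 0/1 mapping are hypothetical conventions for a preview-environment workflow, not anything GitHub or AWS defines; adapt them to whatever your pipeline emits.

```python
from typing import Optional

# Decide an ECS desired count from a CI/CD event and apply it with UpdateService.
# Assumptions: the (event, action) names and the 0/1 mapping are hypothetical
# conventions; the caller has boto3 and ecs:UpdateService permission.
EVENT_TO_DESIRED_COUNT = {
    ("pull_request", "closed"): 0,   # PR merged or abandoned: park the environment
    ("deployment", "completed"): 1,  # fresh deploy: bring one task up
}


def desired_count_for(event: str, action: str) -> Optional[int]:
    """Return the target desired count, or None if the event should be ignored."""
    return EVENT_TO_DESIRED_COUNT.get((event, action))


def apply_desired_count(cluster: str, service: str, count: int) -> None:
    """Call ECS UpdateService (requires AWS credentials)."""
    import boto3

    boto3.client("ecs").update_service(cluster=cluster, service=service, desiredCount=count)
```

From a GitHub Actions step without Python, the equivalent is `aws ecs update-service --cluster <cluster> --service <service> --desired-count 0`. If the service also has the morning scheduled action from this article, its min capacity of 1 will bring a parked environment back at 09:00.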

What's the difference between desired count = 0 and deleting the ECS service entirely?

Setting desired count to 0 stops all running tasks but preserves the service definition, IAM roles, capacity provider strategies, and auto-scaling rules. The service restarts exactly as configured. Deleting the service removes all of this and you'd need to recreate it from Terraform. For scheduling, use desired count = 0 — not service deletion.

Does stopping and restarting ECS tasks affect RDS or other stateful services?

RDS, ElastiCache, and other stateful services run independently of ECS task count. Stopping tasks at 19:00 has no effect on your database — it continues running (and billing) until you separately stop it. Data persists across task restarts. EFS volumes reattach automatically when tasks start again.

Common questions

Is Fargate Spot available for ECS services or only tasks?

Fargate Spot is available for ECS services through capacity provider strategies. You add FARGATE_SPOT, with a weight, to the service's capacity provider strategy, and tasks are placed on Spot capacity when it is available. If AWS needs the capacity back, tasks get a two-minute warning: a SIGTERM, followed by SIGKILL once the container's stopTimeout expires (30 seconds by default, up to 120 seconds on Fargate).

Does setting ECS desired count to 0 stop billing immediately?

Yes. Once desired count reaches 0 and running tasks drain and stop, Fargate billing for those tasks stops: Fargate charges per second (with a one-minute minimum per Linux task), so nothing accrues while no task is running. However, other resources associated with the environment (a dedicated ALB, stored CloudWatch Logs, RDS) continue to incur charges independently.

How do I set up a schedule to stop ECS services on nights and weekends?

Use Application Auto Scaling scheduled actions; no Lambda required. Register each ECS service as a scalable target, then add two scheduled actions: one that sets min and max capacity to 0 at your stop time (which drives desired count to 0) and one that restores them in the morning. The cron expressions are evaluated in UTC unless you set a timezone on the action. A Terraform example is included at the top of this article.

Will reducing non-prod ECS task size break anything?

It depends on what the task does. For services that only handle QA traffic or automated tests, dropping from 1 vCPU to 0.5 vCPU rarely causes issues. The risk is for tasks that run build pipelines, data migrations, or integration tests under time constraints — those may fail or time out. Right-size based on actual observed CPU and memory utilization, not on what production uses.
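Right-sizing from observed utilization can be sketched as: take a high percentile of measured CPU, add headroom, and round up to the next valid Fargate CPU size. The p95 input and the 30% headroom are illustrative assumptions; you would derive the measurement from the service's CPUUtilization metric in CloudWatch. Memory needs the same treatment, and each Fargate CPU size only allows a specific memory range, which this sketch does not check.

```python
# Suggest the smallest Fargate CPU size that covers observed peak usage plus headroom.
# Assumptions: `p95_vcpu` is measured usage in vCPUs (for example the service's
# p95 CPUUtilization % divided by 100, times the task's current vCPU); the 30%
# headroom is an illustrative choice. Memory limits per CPU size are not checked.
FARGATE_CPU_UNITS = [256, 512, 1024, 2048, 4096, 8192, 16384]  # 1024 units = 1 vCPU


def recommend_cpu_units(p95_vcpu: float, headroom: float = 0.30) -> int:
    """Return the smallest valid Fargate `cpu` value >= p95 usage * (1 + headroom)."""
    needed_units = p95_vcpu * (1 + headroom) * 1024
    for units in FARGATE_CPU_UNITS:
        if units >= needed_units:
            return units
    raise ValueError("observed usage exceeds the largest Fargate task size")


if __name__ == "__main__":
    # A 1 vCPU staging task peaking at 30% CPU uses about 0.3 vCPU.
    print(recommend_cpu_units(0.30))
```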

How does Fargate Spot handle interruptions in ECS?

AWS gives a two-minute warning: ECS sends SIGTERM to the task's containers and, once the container's stopTimeout expires, SIGKILL. The stopped task carries the stop code SpotInterruption. If the ECS service's desired count is still greater than 0, ECS launches a replacement task using the service's capacity provider strategy. It does not automatically fall back to on-demand FARGATE when Spot capacity is unavailable, so only the on-demand share defined by your base and weights is protected.


See your real per-env cost: fortem.dev/ecs-cost-calculator
