How Do You Stop Paying for Idle ECS Environments?
Originally published at https://fortem.dev/blog/ecs-environment-scheduling
Stop paying for ECS dev and staging compute when nobody's using it. Every scheduling approach — AWS-native options, trade-offs, and what teams at fleet scale actually do.
Your dev and staging ECS environments run 168 hours a week. Your team works 40. The other 128 hours are pure waste. This guide covers every approach to scheduling ECS environments — from AWS-native options you can set up today to what actually works when you're managing 20+ environments across multiple accounts.
The math: what you're actually paying
A typical dev environment on ECS Fargate (8 services, 0.5 vCPU and 1 GB memory each) costs around $144/month running 24/7. That's $1,728/year for one environment that your developers use 50 hours a week at most.
| Schedule | Hours/week | Hours/month | Monthly cost |
| --- | --- | --- | --- |
| 24/7 (current) | 168 | 730 | $144 |
| Mon–Fri 9am–7pm | 50 | 217 | $43 |
| Mon–Fri 8am–8pm | 60 | 260 | $51 |
| Mon–Sun 8am–10pm | 98 | 425 | $84 |
Switching one 8-service environment from 24/7 to business hours saves $101/month. At 10 environments that's $1,010/month — $12,120/year — without changing a single line of application code.
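The figures in the table can be reproduced with a few lines of arithmetic. The sketch below assumes on-demand Fargate rates for Linux/x86 in us-east-1 ($0.04048 per vCPU-hour and $0.004445 per GB-hour; these rates are not stated in the article and may be out of date) and one running task per service. The function name `monthly_cost` is illustrative.

```python
# Assumed on-demand Fargate rates (Linux/x86, us-east-1). Check current pricing.
VCPU_PER_HOUR = 0.04048   # USD per vCPU-hour (assumption)
GB_PER_HOUR = 0.004445    # USD per GB-hour (assumption)


def monthly_cost(services: int, vcpu: float, memory_gb: float,
                 hours_per_month: float) -> float:
    """Fargate compute cost in USD, assuming one identical task per service."""
    per_task_hour = vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR
    return services * per_task_hour * hours_per_month


if __name__ == "__main__":
    schedules = [
        ("24/7", 730),
        ("Mon-Fri 9am-7pm", 217),
        ("Mon-Fri 8am-8pm", 260),
        ("Mon-Sun 8am-10pm", 425),
    ]
    for label, hours in schedules:
        print(f"{label:<18} ${monthly_cost(8, 0.5, 1, hours):.0f}/month")
```

Swapping in your own task sizes, service count, and regional rates gives a rough per-environment baseline before you change anything.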
How ECS scheduling works
ECS doesn't have a native "scheduled environment" concept. What you're actually doing is setting the desired count of each ECS service to 0 on a schedule (stop) and back to its normal value on another schedule (start).
When desired count hits 0, ECS stops the service's running tasks, and Fargate billing for their vCPU and memory stops with them. Your service definition, load balancer, security groups, and networking remain intact (anything billed on its own, such as the load balancer, keeps billing). The environment is "off", not deleted. Starting it means setting desired count back to 1 (or whatever your normal value is).
Key principle — You pay for running tasks, not for service definitions. Desired count = 0 means no tasks running means no Fargate billing. The service configuration costs nothing — only the compute does.
Ready to use — copy this today
This EventBridge + Lambda setup stops and starts an ECS service on a schedule. Set the CLUSTER_NAME, SERVICE_NAME, and NORMAL_COUNT environment variables on the function and deploy it in the cluster's region. It works today with zero additional tools.
```python
import os

import boto3

ecs = boto3.client("ecs")
CLUSTER = os.environ["CLUSTER_NAME"]
SERVICE = os.environ["SERVICE_NAME"]


def set_desired_count(count: int):
    ecs.update_service(
        cluster=CLUSTER,
        service=SERVICE,
        desiredCount=count,
    )
    print(f"Set {SERVICE} desired count to {count}")


def handler(event, context):
    action = event.get("action", "stop")  # "stop" or "start"
    count = 0 if action == "stop" else int(os.environ.get("NORMAL_COUNT", "2"))
    set_desired_count(count)
```
Deploy this as a Lambda, then create two EventBridge Scheduler schedules: one that invokes it with { "action": "stop" } at 7PM Mon–Fri, another with { "action": "start" } at 9AM Mon–Fri. Total cost: zero beyond the Lambda invocations.
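The two schedules described above could be created with boto3's `scheduler` client, as sketched here. The schedule names (`dev-stop`, `dev-start`), the helper names, and the ARNs you pass in are placeholders, not part of the original setup. EventBridge Scheduler also needs an IAM role it can assume to invoke the function, which this sketch takes as `role_arn`.

```python
import json


def build_schedule(name, cron, action, lambda_arn, role_arn, timezone="UTC"):
    """Return keyword arguments for EventBridge Scheduler's create_schedule."""
    return {
        "Name": name,
        "ScheduleExpression": cron,
        "ScheduleExpressionTimezone": timezone,
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the exact time
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": role_arn,  # role EventBridge Scheduler assumes to invoke the Lambda
            "Input": json.dumps({"action": action}),
        },
    }


def create_schedules(lambda_arn, role_arn, timezone="UTC"):
    """Create the stop (7PM) and start (9AM) schedules, Mon-Fri."""
    import boto3  # imported here so build_schedule works without AWS access

    scheduler = boto3.client("scheduler")
    for name, cron, action in [
        ("dev-stop", "cron(0 19 ? * MON-FRI *)", "stop"),    # hypothetical names
        ("dev-start", "cron(0 9 ? * MON-FRI *)", "start"),
    ]:
        scheduler.create_schedule(
            **build_schedule(name, cron, action, lambda_arn, role_arn, timezone)
        )
```

Passing a `timezone` such as `"Europe/Berlin"` lets the schedule follow local time instead of UTC.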
Option 1: Application Auto Scaling scheduled actions
Best for: 1–3 environments, simple schedules
Application Auto Scaling supports scheduled actions on ECS services. You define a cron expression plus a minimum and maximum capacity, and AWS adjusts the desired count to fit that range when the action fires. No Lambda, no EventBridge rules to manage.
Register your ECS service as a scalable target, then create two scheduled actions: one to stop (min = max = 0) and one to start (min = your normal count):
```bash
# Register the service as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 0 \
  --max-capacity 3

# Stop at 7pm UTC (Mon–Fri)
aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --scheduled-action-name stop-evenings \
  --schedule "cron(0 19 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=0,MaxCapacity=0

# Start at 8am UTC (Mon–Fri)
aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --scheduled-action-name start-mornings \
  --schedule "cron(0 8 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=1,MaxCapacity=3
```
Limitations
- One command per service: 8 services × 2 actions = 16 CLI calls per environment
- No concept of "environment": you schedule individual services
- Schedule changes require updating each service individually
- No visibility into scheduled state across services or environments
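The per-service repetition can at least be scripted. This is a minimal sketch that applies the same register, stop, and start calls to every service in a list, using boto3's `application-autoscaling` client. The function names and the default capacities are illustrative, and the client is passed in so the loop can be exercised without AWS access.

```python
def schedule_service(client, cluster, service, normal_min=1, normal_max=3,
                     timezone="UTC"):
    """Register one ECS service and attach stop/start scheduled actions."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    client.register_scalable_target(MinCapacity=0, MaxCapacity=normal_max, **common)
    client.put_scheduled_action(
        ScheduledActionName="stop-evenings",
        Schedule="cron(0 19 ? * MON-FRI *)",
        Timezone=timezone,
        ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
        **common,
    )
    client.put_scheduled_action(
        ScheduledActionName="start-mornings",
        Schedule="cron(0 8 ? * MON-FRI *)",
        Timezone=timezone,
        ScalableTargetAction={"MinCapacity": normal_min, "MaxCapacity": normal_max},
        **common,
    )


def schedule_environment(client, cluster, services, **kwargs):
    """Apply the schedule to every service; returns the number of API calls made."""
    for service in services:
        schedule_service(client, cluster, service, **kwargs)
    return len(services) * 3
```

A call might look like `schedule_environment(boto3.client("application-autoscaling"), "my-cluster", ["api", "worker"])`. The script removes the typing, but the list of services per environment is still something you maintain by hand.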
Option 2: EventBridge Scheduler + Lambda
Best for: multiple environments, custom logic, per-timezone schedules
EventBridge Scheduler triggers a Lambda function on a cron schedule. The Lambda iterates over all services in an environment (identified by a tag) and sets their desired count. This is the most flexible AWS-native approach — you can handle timezones, environment grouping, and custom logic.
The Lambda function itself is straightforward — iterate over tagged services and update desired count:
```python
import boto3

ecs = boto3.client('ecs')


def handler(event, context):
    desired_count = event['desired_count']  # 0 to stop, non-zero to start
    cluster = event['cluster']
    env_tag = event['environment']  # e.g. "staging"

    # List all services in the cluster
    paginator = ecs.get_paginator('list_services')
    for page in paginator.paginate(cluster=cluster):
        for arn in page['serviceArns']:
            # Describe to get tags
            svc = ecs.describe_services(
                cluster=cluster,
                services=[arn],
                include=['TAGS']
            )['services'][0]
            tags = {t['key']: t['value'] for t in svc.get('tags', [])}
            if tags.get('Environment') != env_tag:
                continue

            current = svc['desiredCount']
            if desired_count == 0:
                # Store the current count before stopping. Skip this when the
                # service is already at 0, so a repeated stop does not
                # overwrite the saved count with 0.
                if current > 0:
                    ecs.tag_resource(
                        resourceArn=arn,
                        tags=[{'key': 'ScheduledDesiredCount',
                               'value': str(current)}]
                    )
                ecs.update_service(
                    cluster=cluster,
                    service=arn,
                    desiredCount=0
                )
            else:
                # Restore the count saved at stop time (defaults to 1)
                restore = int(tags.get('ScheduledDesiredCount', '1'))
                ecs.update_service(
                    cluster=cluster,
                    service=arn,
                    desiredCount=restore
                )
```
Then create two EventBridge Scheduler schedules, one for stop and one for start, each passing the appropriate desired_count, cluster, and environment in its input.
What this doesn't solve
- No UI: schedule changes require code or CLI changes
- Per-timezone logic gets complex fast (US-east vs EU-west teams)
- Error handling and alerting on failed starts is your problem
- At 10+ environments, you're maintaining a scheduling system, not using one
Option 3: Terraform-managed schedules
Best for: teams with strong Terraform discipline and few environments
You can manage scheduled scaling actions directly in Terraform using the aws_appautoscaling_scheduled_action resource. This keeps scheduling configuration version-controlled alongside your infrastructure.
```hcl
resource "aws_appautoscaling_target" "ecs_target" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${var.service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 0
  max_capacity       = var.max_capacity
}

resource "aws_appautoscaling_scheduled_action" "stop" {
  name               = "${var.service_name}-stop"
  service_namespace  = "ecs"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  schedule           = "cron(0 19 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}

resource "aws_appautoscaling_scheduled_action" "start" {
  name               = "${var.service_name}-start"
  service_namespace  = "ecs"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  schedule           = "cron(0 8 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = var.desired_count
    max_capacity = var.max_capacity
  }
}
```
Clean and auditable — but it still operates at the service level. Changing a schedule for an environment with 8 services means updating 8 Terraform resources and running apply. For teams where schedules change rarely, this is fine. For teams where developers want to adjust their own environment hours, it becomes a bottleneck.
What breaks at fleet scale
Every approach above works at 1–3 environments. Here's what teams discover when they try to scale it to 15–50 environments across multiple AWS accounts:
✗ Per-service configuration doesn't scale
At 20 environments × 8 services, you have 160 individual Auto Scaling targets to manage. A schedule change for one environment touches 8 resources. A timezone change for one team requires finding and updating those 8 resources across potentially multiple accounts.
✗ No environment-level visibility
None of the AWS-native approaches give you a view of 'which environments are running, which are scheduled, and what their current cost is.' You're looking at individual services in CloudWatch and Cost Explorer, not environments as units.
✗ Timezone complexity multiplies
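EU teams want environments to stop at 18:00 CET. US East teams want 19:00 EST. US West teams want 19:00 PST. Each requires its own cron expression, and an expression written in UTC has to be shifted twice a year for DST. EventBridge Scheduler and Application Auto Scaling scheduled actions both accept a time zone on the schedule, which takes care of DST, but someone still has to track which environment belongs to which zone. A single Lambda managing this across 20 environments becomes a meaningful maintenance burden.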
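To see why a fixed UTC cron drifts, the following sketch converts a local stop time to UTC on a winter and a summer date using Python's standard `zoneinfo` module. The helper name `utc_hour_for_local` and the sample dates are illustrative, and the system needs a time zone database available.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def utc_hour_for_local(tz_name: str, day: date, local_hour: int) -> int:
    """Return the UTC hour that corresponds to local_hour:00 on the given day."""
    local = datetime.combine(day, time(local_hour), tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc).hour


if __name__ == "__main__":
    winter, summer = date(2025, 1, 15), date(2025, 7, 15)
    for tz_name, hour in [("Europe/Berlin", 18), ("America/New_York", 19)]:
        print(tz_name,
              "winter UTC hour:", utc_hour_for_local(tz_name, winter, hour),
              "summer UTC hour:", utc_hour_for_local(tz_name, summer, hour))
```

For New York, 19:00 local is 00:00 UTC in winter and 23:00 UTC in summer. A UTC cron for that team therefore needs a different hour per season and, in winter, a shifted day-of-week range (TUE-SAT instead of MON-FRI), because the stop lands after midnight UTC.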
✗ Developer self-service breaks down
Developers want to override their environment schedule occasionally — stay late on a sprint, work a weekend. In every AWS-native approach, that override requires console access or a platform engineer intervention. The friction is high enough that teams just leave environments running 24/7 to avoid the hassle.
✗ Failed starts are silent
If an ECS service fails to start after a scheduled start (image pull error, IAM issue, resource limits), the EventBridge rule fires, Lambda runs, desired count updates — but nobody knows the environment didn't come up. You need separate health checking and alerting to catch this.
The pattern we see
Teams start with EventBridge + Lambda at 3 environments. By 10 environments they're spending 2–4 hours a month maintaining the scheduling system. By 20 environments they've either given up and gone back to 24/7, or a platform engineer owns a growing codebase that does nothing except stop and start ECS services on a schedule.
What to track
Regardless of which approach you use, these are the metrics worth monitoring:
Baseline vs. actual spend per environment
Tag all ECS services with Environment, activate that tag as a cost allocation tag, and group by it in Cost Explorer. Baseline = what you'd pay at 24/7. Actual = what you paid. The delta is your scheduling savings.
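One way to pull the "actual" side of that comparison is Cost Explorer's `get_cost_and_usage` API grouped by the tag. The sketch below assumes the tag key is `Environment`, reads only the first page of results, and uses illustrative function names. The baseline figure is something you supply yourself.

```python
def cost_by_environment(response: dict) -> dict:
    """Sum UnblendedCost per tag value from a tag-grouped get_cost_and_usage response."""
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            # Group keys look like "Environment$staging"; untagged spend is "Environment$"
            value = group["Keys"][0].split("$", 1)[1] or "(untagged)"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[value] = totals.get(value, 0.0) + amount
    return totals


def scheduling_savings(baseline: float, actual: float) -> float:
    """Baseline is the 24/7 cost you would have paid; actual is what you paid."""
    return baseline - actual


def fetch_costs(start: str, end: str, tag_key: str = "Environment") -> dict:
    """Query Cost Explorer for costs between start and end (YYYY-MM-DD), grouped by tag."""
    import boto3  # imported here so the helpers above work without AWS access

    ce = boto3.client("ce")
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": tag_key}],
    )
    return cost_by_environment(response)
```

With a $144 baseline and $43 actual, `scheduling_savings(144.0, 43.0)` gives the $101 monthly delta from the table.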
Schedule adherence
Track each service's desired count (Container Insights publishes it to CloudWatch as DesiredTaskCount). If an environment should be at 0 from 19:00–08:00 but the desired count is 1, your schedule isn't firing. Set an alarm on a non-zero desired count during expected off-hours.
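The adherence check itself is simple to express. This is a sketch assuming a Mon–Fri schedule that starts at 08:00 and stops at 19:00 in the team's local time, with service records shaped like the entries returned by ECS `describe_services`. The function names are illustrative.

```python
from datetime import datetime


def should_be_off(now: datetime, start_hour: int = 8, stop_hour: int = 19) -> bool:
    """True when a Mon-Fri start_hour..stop_hour schedule expects the environment stopped.

    `now` must already be in the schedule's local time.
    """
    if now.weekday() >= 5:  # Saturday or Sunday
        return True
    return not (start_hour <= now.hour < stop_hour)


def adherence_violations(services: list, now: datetime, **schedule) -> list:
    """Names of services that still have a non-zero desiredCount during off-hours."""
    if not should_be_off(now, **schedule):
        return []
    return [s["serviceName"] for s in services if s["desiredCount"] > 0]
```

Running this from a small scheduled job and alerting on a non-empty result is one way to catch a schedule that silently stopped firing.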
Start latency
Time from scheduled start to all services healthy, meaning each service's RunningTaskCount equals its desired count and the target group's healthy host count matches. Anything over 3 minutes warrants investigation.
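Start latency can be measured by polling after the scheduled start. The sketch below checks only the ECS side (running count against desired count) and does not look at target group health. The function name, the 180-second timeout, and the injected `sleep` and `clock` parameters are assumptions made for illustration, and `describe_services` accepts at most 10 services per call.

```python
import time


def wait_until_running(ecs, cluster, services, timeout=180, poll=10,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until every service runs its desired task count; return seconds elapsed.

    Raises TimeoutError naming the lagging services if the timeout passes first.
    """
    started = clock()
    while True:
        described = ecs.describe_services(cluster=cluster, services=services)["services"]
        lagging = [
            s["serviceName"] for s in described
            if s["desiredCount"] == 0 or s["runningCount"] < s["desiredCount"]
        ]
        if described and not lagging:
            return clock() - started
        if clock() - started >= timeout:
            raise TimeoutError(f"not running after {timeout}s: {lagging}")
        sleep(poll)
```

Called right after the start action with `boto3.client("ecs")`, the return value is the start latency, and the `TimeoutError` is the signal that a scheduled start did not bring the environment up.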
Failed starts
ECS StoppedTaskCount increasing after a scheduled start usually means image pull errors or resource exhaustion. Set a CloudWatch alarm on StoppedTaskCount > 0 for environments in their scheduled-start window.
See your scheduling savings: fortem.dev/ecs-cost-calculator