Matt

Posted on Jun 4 • Edited on Jun 30 • Originally published at fortem.dev

ECS Environment Scheduling: The Complete Guide

#aws #ecs #fargate #scheduling

How Do You Stop Paying for Idle ECS Environments?

Originally published at https://fortem.dev/blog/ecs-environment-scheduling
Stop paying for idle ECS dev and staging compute. Every scheduling approach — AWS-native options, trade-offs, and what teams at fleet scale actually do.

Your dev and staging ECS environments run 168 hours a week. Your team works 40. The other 128 hours are pure waste. Every approach to scheduling ECS environments is covered here — from AWS-native options you can set up today to what works when you're managing 20+ environments across multiple accounts.

TL;DR

Setting ECS service desired count to 0 stops Fargate billing immediately — no tasks running, no charge.
A typical 8-service dev environment costs $144/month at 24/7; scheduling it to business hours drops that to $43/month — $101/month saved per environment.
AWS-native options (Application Auto Scaling, EventBridge + Lambda, Terraform) work at 1–5 environments; at 10+ you're maintaining a scheduling system rather than using one.
Fleet-scale problems — per-timezone cron complexity, no developer override mechanism, and silent failed starts — don't surface until you're past 10 environments.

The math: what you're actually paying

“Non-prod environments run 168 hours a week. Your team works ~55. Scheduling environments offline outside business hours cuts compute cost by 60–70%.”

— Fortem fleet data, 100+ ECS environments

An 8-service dev environment on ECS Fargate costs $144/month running 24/7. That's $1,728/year for one environment that your developers use 50 hours a week at most. Scheduling it to business hours drops the bill to $43/month: a $101/month saving per environment, or $12,120/year across 10 environments.

Schedule            Hours/week   Hours/month   Monthly cost
24/7 (current)      168          730           $144
Mon–Fri 9am–7pm     50           217           $43
Mon–Fri 8am–8pm     60           260           $51
Mon–Sun 8am–10pm    98           425           $84

"AWS Fargate charges $0.04048 per vCPU-hour and $0.004445 per GB-hour for Linux/X86 in US East (N. Virginia), billed per-second with a one-minute minimum. Setting desired count to zero stops vCPU and memory charges immediately."

— AWS Fargate Pricing, verified June 2026

Switching one 8-service environment from 24/7 to business hours saves $101/month. At 10 environments that's $1,010/month — $12,120/year — without changing a single line of application code. If you're wondering why staging environments cost so much in the first place — shared infra math, CloudWatch Logs, and the visibility problem — that breakdown is in a separate guide.

KEY INSIGHT: Switching one 8-service ECS Fargate environment from 24/7 to business hours saves $101/month — $1,010/month across 10 environments. That's a 70% compute cost reduction without changing a single line of application code.
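The monthly figures in the table can be reproduced from the Fargate rates quoted above. The sketch below assumes each of the 8 services runs a single task sized at 0.5 vCPU and 1 GB. The article does not state the task size, so treat that sizing as an assumption that happens to match the $144 baseline rather than as the author's actual configuration.

```python
# Fargate Linux/x86 rates for US East (N. Virginia), as quoted above.
VCPU_PER_HOUR = 0.04048
GB_PER_HOUR = 0.004445

# Assumptions (not stated in the article): 8 services, one task each,
# 0.5 vCPU and 1 GB of memory per task.
SERVICES = 8
TASK_VCPU = 0.5
TASK_MEMORY_GB = 1.0


def monthly_cost(hours_per_month: float) -> float:
    """Compute cost in USD for running the whole environment for the given hours."""
    task_hourly = TASK_VCPU * VCPU_PER_HOUR + TASK_MEMORY_GB * GB_PER_HOUR
    return SERVICES * task_hourly * hours_per_month


if __name__ == "__main__":
    schedules = [
        ("24/7", 730),
        ("Mon-Fri 9am-7pm", 217),
        ("Mon-Fri 8am-8pm", 260),
        ("Mon-Sun 8am-10pm", 425),
    ]
    for label, hours in schedules:
        print(f"{label:18} {hours:4d} h  ${monthly_cost(hours):.0f}/month")
```

Swap in your own task sizes and service count to get a baseline for a real environment. Storage, load balancers, and logs are outside this calculation.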

How ECS scheduling works

ECS environment scheduling works by setting each service's desired count to 0 on a stop schedule and restoring it on a start schedule. On Fargate, billing stops once the tasks have stopped; when the desired count goes back to N, ECS launches N new tasks on Fargate capacity. EventBridge Scheduler triggers the change. On EC2 with an Auto Scaling group, also scale the instances to 0, because instances are billed whether or not tasks are running on them.

ECS doesn't have a native "scheduled environment" concept. The mechanism is setting the desired count of each ECS service to 0 on a schedule (stop) and back to its normal value on another schedule (start).

When desired count hits 0, ECS drains existing tasks and stops billing for vCPU and memory. Your service definition, load balancer, security groups, and networking remain intact. The environment is "off" — not deleted. Starting it is setting desired count back to 1 (or whatever your normal value is).

Key principle — You pay for running tasks, not for service definitions. Desired count = 0 means no tasks running means no Fargate billing. The service configuration costs nothing — only the compute does.

Ready to use — copy this today

This EventBridge + Lambda setup stops and starts an ECS service on a schedule. Set CLUSTER_NAME, SERVICE_NAME, and NORMAL_COUNT as environment variables on the function and deploy it in the region where the cluster runs. It works today with zero additional tools.

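import os

import boto3

ecs = boto3.client("ecs")
CLUSTER = os.environ["CLUSTER_NAME"]
SERVICE = os.environ["SERVICE_NAME"]
# Task count to restore on start; falls back to 2 when NORMAL_COUNT is unset
NORMAL_COUNT = int(os.environ.get("NORMAL_COUNT", "2"))

def set_desired_count(count: int):
    ecs.update_service(
        cluster=CLUSTER,
        service=SERVICE,
        desiredCount=count,
    )
    print(f"Set {SERVICE} desired count to {count}")

def handler(event, context):
    action = event.get("action", "stop")  # "stop" or "start"
    if action not in ("stop", "start"):
        # Fail loudly instead of treating a typo as "start"
        raise ValueError(f"Unknown action: {action!r}")
    set_desired_count(0 if action == "stop" else NORMAL_COUNT)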

Deploy this as a Lambda, then create two EventBridge Scheduler schedules: one that invokes it with { "action": "stop" } at 7pm Mon–Fri, another with { "action": "start" } at 9am Mon–Fri. Total cost: zero beyond the Lambda invocations.
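The two schedules can also be created from code. The following is a minimal sketch using the boto3 `scheduler` client. The schedule names (`dev-stop`, `dev-start`) and the helper names are made up for illustration, and the role ARN must point at an IAM role that EventBridge Scheduler is allowed to assume to invoke the function.

```python
import json


def schedule_request(name, cron, action, lambda_arn, role_arn, timezone="UTC"):
    """Build keyword arguments for the EventBridge Scheduler create_schedule call."""
    return {
        "Name": name,
        "ScheduleExpression": cron,
        "ScheduleExpressionTimezone": timezone,
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the exact scheduled time
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": role_arn,  # role assumed by EventBridge Scheduler
            "Input": json.dumps({"action": action}),
        },
    }


def create_schedules(lambda_arn, role_arn, timezone="UTC"):
    """Create the stop and start schedules (hypothetical names dev-stop/dev-start)."""
    import boto3  # imported here so schedule_request can be used without AWS access

    scheduler = boto3.client("scheduler")
    for name, cron, action in [
        ("dev-stop", "cron(0 19 ? * MON-FRI *)", "stop"),
        ("dev-start", "cron(0 9 ? * MON-FRI *)", "start"),
    ]:
        scheduler.create_schedule(
            **schedule_request(name, cron, action, lambda_arn, role_arn, timezone)
        )
```

Passing an IANA zone name such as `America/New_York` as `timezone` makes the 7pm and 9am times local wall-clock times rather than UTC.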

Option 1: Application Auto Scaling scheduled actions

Application Auto Scaling lets you define scheduled minimum and maximum capacity per ECS service using cron expressions. No Lambda is required, but it operates per service, not per environment. That works at 3–5 services; at 10 environments × 8 services × 2 actions you have 160 scheduled actions to manage. There is also no developer override mechanism: if someone works late, keeping the environment up takes platform-engineer access.

Best for: 1–3 environments, simple schedules

Application Auto Scaling supports scheduled scaling actions on ECS services. You define a cron expression plus a minimum and maximum capacity, and Application Auto Scaling moves the service's desired count into that range. AWS handles the rest, with no Lambda and no EventBridge rules to manage. This is the calendar side of Application Auto Scaling; for scaling to live traffic instead of a clock, see ECS Fargate autoscaling with target tracking and step scaling.

Register your ECS service as a scalable target, then create two scheduled actions: one to stop (minimum and maximum both 0) and one to start (minimum set to your normal count):

# Register the service as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 0 \
  --max-capacity 3

# Stop at 7pm UTC (Mon–Fri)
aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --scheduled-action-name stop-evenings \
  --schedule "cron(0 19 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=0,MaxCapacity=0

# Start at 8am UTC (Mon–Fri)
aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --scheduled-action-name start-mornings \
  --schedule "cron(0 8 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=1,MaxCapacity=3

Limitations

• One command per service and action: 8 services × 2 actions = 16 put-scheduled-action calls per environment, plus 8 calls to register the targets
• No concept of "environment" — you schedule individual services
• Schedule changes require updating each service individually
• No visibility into scheduled state across services or environments
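To make the per-service fan-out concrete, here is a hedged sketch that builds the same stop and start actions for every service in one environment with boto3. The function names and the service list are illustrative, and the code assumes each service has already been registered as a scalable target.

```python
def scheduled_actions(cluster, services, normal_count=1, max_count=3, timezone="UTC"):
    """Build one stop and one start put_scheduled_action request per service."""
    requests = []
    for service in services:
        common = {
            "ServiceNamespace": "ecs",
            "ResourceId": f"service/{cluster}/{service}",
            "ScalableDimension": "ecs:service:DesiredCount",
            "Timezone": timezone,
        }
        requests.append({
            **common,
            "ScheduledActionName": "stop-evenings",
            "Schedule": "cron(0 19 ? * MON-FRI *)",
            "ScalableTargetAction": {"MinCapacity": 0, "MaxCapacity": 0},
        })
        requests.append({
            **common,
            "ScheduledActionName": "start-mornings",
            "Schedule": "cron(0 8 ? * MON-FRI *)",
            "ScalableTargetAction": {
                "MinCapacity": normal_count,
                "MaxCapacity": max_count,
            },
        })
    return requests


def apply_actions(requests):
    """Send every request to Application Auto Scaling."""
    import boto3  # imported here so scheduled_actions can be inspected offline

    client = boto3.client("application-autoscaling")
    for request in requests:
        client.put_scheduled_action(**request)
```

For an 8-service environment, `scheduled_actions` returns 16 requests, which is the per-environment count listed above. The loop removes the typing, but each service still carries its own pair of actions that has to be found and updated when the schedule changes.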

Option 2: EventBridge Scheduler + Lambda

EventBridge Scheduler triggers a Lambda that sets each ECS service's desired count to 0 or its normal value on a cron. It is flexible, covers multiple environments, and lets you write schedules per timezone. It still doesn't scale past 10 environments, where maintaining the Lambda and cron matrix becomes a part-time job.

Best for: multiple environments, custom logic, per-timezone schedules

EventBridge Scheduler triggers a Lambda function on a cron schedule. The Lambda iterates over all services in an environment (identified by a tag) and sets their desired count. This is the most flexible AWS-native approach — you can handle timezones, environment grouping, and custom logic.

The Lambda function itself is straightforward — iterate over tagged services and update desired count:

import boto3

ecs = boto3.client('ecs')

def handler(event, context):
    desired_count = event['desired_count']  # 0 to stop, 1 to start
    cluster = event['cluster']
    env_tag = event['environment']          # e.g. "staging"

    # List all services in the cluster
    paginator = ecs.get_paginator('list_services')
    for page in paginator.paginate(cluster=cluster):
        for arn in page['serviceArns']:
            # Describe to get tags
            svc = ecs.describe_services(
                cluster=cluster,
                services=[arn],
                include=['TAGS']
            )['services'][0]

            tags = {t['key']: t['value'] for t in svc.get('tags', [])}

            # Skip services that belong to other environments
            if tags.get('Environment') != env_tag:
                continue

            current = svc['desiredCount']
            if desired_count == 0:
                # Store current count before stopping. Skip this when the
                # service is already at 0, so a repeated stop does not
                # overwrite the saved count with 0.
                if current > 0:
                    ecs.tag_resource(
                        resourceArn=arn,
                        tags=[{'key': 'ScheduledDesiredCount',
                               'value': str(current)}]
                    )
                ecs.update_service(
                    cluster=cluster,
                    service=arn,
                    desiredCount=0
                )
            else:
                # Restore previous count (1 if none was ever saved)
                restore = int(tags.get('ScheduledDesiredCount', '1'))
                ecs.update_service(
                    cluster=cluster,
                    service=arn,
                    desiredCount=restore
                )

Then create two EventBridge Scheduler schedules, one for stop and one for start, each passing the cluster, the environment, and the appropriate desired_count in the input.

What this doesn't solve

• No UI — schedule changes require code or CLI changes
• Per-timezone logic gets complex fast (US-east vs EU-west teams)
• Error handling and alerting on failed starts is your problem
• At 10+ environments, you're maintaining a scheduling system, not using one

Option 3: Terraform-managed schedules

Terraform manages ECS scheduled Auto Scaling actions as code, keeping schedules consistent, version-controlled, and auditable. It does not handle runtime overrides ('keep this env up tonight'); those still require an external mechanism.

Best for: teams with strong Terraform discipline and few environments

You can manage scheduled scaling actions directly in Terraform using the aws_appautoscaling_scheduled_action resource. This keeps scheduling configuration version-controlled alongside your infrastructure — a pattern covered in detail in the ECS Fargate Terraform guide.

resource "aws_appautoscaling_target" "ecs_target" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${var.service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 0
  max_capacity       = var.max_capacity
}

resource "aws_appautoscaling_scheduled_action" "stop" {
  name               = "${var.service_name}-stop"
  service_namespace  = "ecs"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  schedule           = "cron(0 19 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}

resource "aws_appautoscaling_scheduled_action" "start" {
  name               = "${var.service_name}-start"
  service_namespace  = "ecs"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  schedule           = "cron(0 8 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = var.desired_count
    max_capacity = var.max_capacity
  }
}

Clean and auditable — but it still operates at the service level. Changing a schedule for an environment with 8 services means updating 8 Terraform resources and running apply. For teams where schedules change rarely, this is fine. For teams where developers want to adjust their own environment hours, it becomes a bottleneck.

What breaks at fleet scale

At fleet scale, AWS-native ECS scheduling breaks on three axes: (1) per-timezone cron complexity for multi-region teams, (2) no developer override mechanism ('I'm working late, keep my env alive'), and (3) silent failed starts, where the cron fires and the Lambda runs but the environment doesn't come up. Nobody catches it until Monday.

Every approach above works at 1–3 environments. What teams discover when they try to scale it to 15–50 environments across multiple AWS accounts:

✗ Per-service configuration doesn't scale

At 20 environments × 8 services, you have 160 individual Auto Scaling targets to manage. A schedule change for one environment touches 8 resources. A timezone change for one team requires finding and updating those 8 resources across potentially multiple accounts.

✗ No environment-level visibility

None of the AWS-native approaches give you a view of 'which environments are running, which are scheduled, and what their current cost is.' You're looking at individual services in CloudWatch and Cost Explorer, not environments as units.

✗ Timezone complexity multiplies

EU teams want environments to stop at 18:00 CET. US East teams want 19:00 EST. US West teams want 19:00 PST. Each requires separate cron expressions — and those expressions need to account for DST. A single Lambda managing this across 20 environments becomes a meaningful maintenance burden.
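The DST problem is easy to see by converting the same local stop time to UTC in winter and in summer. This is a small sketch using only the standard library; the dates are arbitrary mid-January and mid-July examples, and the function name is illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_hour_for_local(hour, tz_name, year, month, day):
    """Return the UTC hour that corresponds to a local wall-clock hour on a date."""
    local = datetime(year, month, day, hour, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc).hour


if __name__ == "__main__":
    teams = [
        ("Europe/Berlin", 18),
        ("America/New_York", 19),
        ("America/Los_Angeles", 19),
    ]
    for tz_name, hour in teams:
        winter = utc_hour_for_local(hour, tz_name, 2025, 1, 15)
        summer = utc_hour_for_local(hour, tz_name, 2025, 7, 15)
        print(f"{tz_name}: {hour}:00 local is {winter:02d}:00 UTC in January "
              f"and {summer:02d}:00 UTC in July")
```

A cron written in UTC is therefore off by an hour for half the year, and for US East in winter the stop time lands after midnight UTC, so the day-of-week field shifts as well. EventBridge Scheduler (`ScheduleExpressionTimezone`) and Application Auto Scaling scheduled actions (`--timezone`) both accept an IANA time zone, which avoids the manual conversion but still leaves one set of schedules per team.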

✗ Developer self-service breaks down

Developers want to override their environment schedule occasionally — stay late on a sprint, work a weekend. In every AWS-native approach, that override requires console access or a platform engineer intervention. The friction is high enough that teams leave environments running 24/7 to avoid the hassle.
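One way a team might bolt an override onto the Lambda approach is a tag that the stop logic checks before scaling down. The sketch below is purely illustrative: the `KeepAliveUntil` tag name and the `should_stop` helper are hypothetical, not an AWS convention, and developers would still need permission to tag the service.

```python
from datetime import datetime, timezone

OVERRIDE_TAG = "KeepAliveUntil"  # hypothetical tag name, not an AWS convention


def should_stop(tags, now):
    """Return False while a valid, unexpired override tag is present."""
    raw = tags.get(OVERRIDE_TAG)
    if not raw:
        return True
    try:
        keep_until = datetime.fromisoformat(raw)
    except ValueError:
        return True  # unparseable override: fall back to the normal schedule
    if keep_until.tzinfo is None:
        keep_until = keep_until.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    return now >= keep_until


if __name__ == "__main__":
    now = datetime(2025, 6, 12, 19, 0, tzinfo=timezone.utc)
    tags = {"Environment": "staging", "KeepAliveUntil": "2025-06-12T23:00:00+00:00"}
    print(should_stop(tags, now))
```

A skipped stop has to be retried later, otherwise the environment stays up until the next evening. That is one more piece of the scheduling system someone has to own.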

✗ Failed starts are silent

If an ECS service fails to start after a scheduled start (image pull error, IAM issue, resource limits), the EventBridge rule fires, Lambda runs, desired count updates — but nobody knows the environment didn't come up. You need separate health checking and alerting to catch this.
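A basic version of that health check compares running and desired counts a few minutes after the start schedule fires. This is a minimal sketch, assuming it runs from a second scheduled Lambda; the function names are illustrative and the alerting destination is left out.

```python
def services_not_started(services):
    """Return names of services whose running count is below the desired count."""
    return [
        svc["serviceName"]
        for svc in services
        if svc["runningCount"] < svc["desiredCount"]
    ]


def check_environment(cluster, service_arns):
    """Describe the services in batches and collect the ones that did not come up."""
    import boto3  # imported here so services_not_started can be tested offline

    ecs = boto3.client("ecs")
    failed = []
    # describe_services accepts at most 10 services per call
    for i in range(0, len(service_arns), 10):
        response = ecs.describe_services(
            cluster=cluster,
            services=service_arns[i:i + 10],
        )
        failed.extend(services_not_started(response["services"]))
    return failed
```

Anything returned by `check_environment` is a candidate for a page or a chat notification. A fuller check would also look at target group health, since a task can be running without passing health checks.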

The pattern we see

Teams start with EventBridge + Lambda at 3 environments. By 10 environments they're spending 2–4 hours a month maintaining the scheduling system. By 20 environments they've either given up and gone back to 24/7, or a platform engineer owns a growing codebase that does nothing except stop and start ECS services on a schedule.

What to track

Track four metrics: baseline vs. actual spend per environment, schedule adherence via DesiredCount alarms, start latency to first healthy task, and StoppedTaskCount spikes after scheduled starts. Regardless of which approach you use, these are the metrics worth monitoring:

Baseline vs. actual spend per environment

Tag all ECS services with Environment and use Cost Explorer with resource-level tags. Baseline = what you'd pay at 24/7. Actual = what you paid. The delta is your scheduling savings.

Schedule adherence

CloudWatch metric: the ECS service desired count (published as DesiredTaskCount when Container Insights is enabled). If an environment should be at 0 from 19:00–08:00 but the desired count is 1, your schedule isn't firing. Set an alarm on a non-zero desired count during expected off-hours.
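The same adherence check can be expressed in a few lines if you already have the describe_services output for an environment. The sketch below assumes a Mon–Fri schedule with the 19:00 stop and 08:00 start used in the examples, evaluated in the same timezone as the schedule; the function names are illustrative.

```python
from datetime import datetime, timezone


def in_off_hours(now, stop_hour=19, start_hour=8):
    """True when `now` falls in the nightly off window or on a weekend."""
    if now.weekday() >= 5:  # Saturday or Sunday
        return True
    return now.hour >= stop_hour or now.hour < start_hour


def adherence_violations(services, now):
    """Return services that still have a non-zero desired count during off-hours."""
    if not in_off_hours(now):
        return []
    return [svc["serviceName"] for svc in services if svc["desiredCount"] > 0]


if __name__ == "__main__":
    services = [
        {"serviceName": "api", "desiredCount": 1},
        {"serviceName": "worker", "desiredCount": 0},
    ]
    wednesday_night = datetime(2025, 6, 11, 22, 0, tzinfo=timezone.utc)
    print(adherence_violations(services, wednesday_night))
```

An environment with an intentional override would show up here too, so a real check needs a way to tell the two cases apart.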

Start latency

Time from scheduled start to all services healthy, meaning each service's running task count equals its desired count and the target group's healthy host count matches it. Anything over 3 minutes warrants investigation.

Failed starts

ECS StoppedTaskCount increasing after a scheduled start usually means image pull errors or resource exhaustion. Set a CloudWatch alarm on StoppedTaskCount > 0 for environments inside their scheduled-start window.

For a complete picture of where ECS spend goes beyond scheduling, the ECS Fargate cost visibility guide covers tagging strategy, Cost Explorer filters, and the metrics that matter most per environment.

See your scheduling savings: fortem.dev/ecs-cost-calculator
