Many organizations running Amazon ECS services on EC2 experience predictable traffic patterns — for example, higher demand during business hours and lower utilization during off-hours. Efficient scaling in such environments requires not only the ability to handle load spikes proactively but also to scale down during idle periods to optimize costs.
This post describes an approach to implementing predictive scaling for ECS services using custom CloudWatch metrics and custom termination policies. It also addresses several implementation challenges and outlines solutions that ensure scaling decisions remain accurate, proactive, and non-disruptive.
Overview of Predictive Scaling
Unlike target tracking or step scaling policies that react to metric thresholds, predictive scaling in AWS Auto Scaling analyzes historical data to forecast future capacity requirements. This allows ECS clusters to anticipate incoming traffic and scale up resources before the load actually increases.
However, implementing predictive scaling for ECS services with EC2 launch type introduces a number of challenges:
- Default termination policies do not account for the number of ECS tasks running on each instance.
- Blue/green deployments fragment metric history for scaling.
- Publishing aggregated custom metrics can result in data gaps.
- Predictive scaling only supports proactive scale-up actions and cannot directly scale down tasks.
The following sections describe each challenge and its corresponding solution.
Challenge 1: ECS Termination Policies Do Not Consider Running Tasks
When the desired task count for an ECS service decreases, and the ECS cluster has more EC2 instances than required, the Auto Scaling Group (ASG) determines which instance to terminate based on its termination policy.
AWS’s default termination policies prioritize instance age, availability zone balance, or instance lifecycle state, but they do not consider the number of ECS tasks running on an instance. As a result, an instance hosting multiple active tasks may be terminated while another instance remains mostly idle. This behavior can lead to application downtime or uneven task distribution.
Solution: Custom Termination Policy Lambda
A custom AWS Lambda function can be used as a termination policy for the ASG.
The function is invoked whenever a scale-in event is initiated. It performs the following operations:
- Retrieves the list of EC2 instances registered to the ASG or ECS capacity provider.
- Uses the ECS API to obtain the number of running tasks per instance.
- Identifies the instance with the fewest running tasks.
- Returns the corresponding instance ID to the ASG for termination.
The Lambda function must respond within 2 seconds. Through optimization, the execution time can be consistently reduced to under 1 second, allowing the ASG to complete scale-in operations efficiently and without service disruption.
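For context, the sketch below shows the parts of the scale-in event that this selection logic relies on, together with the response shape the ASG expects. The field values are illustrative, and the real event also carries the candidate instances and additional metadata; the complete function appears in the implementation section.

```python
# Minimal sketch of the invocation contract used by the custom termination
# policy Lambda in this post. Values are illustrative only.
example_event = {
    "AutoScalingGroupName": "ecs-asg",
    "CapacityToTerminate": [
        {"AvailabilityZone": "us-east-1a", "Capacity": 1}
    ],
    # The full event also includes the candidate instances and other metadata.
}

# The function must return the IDs of the instances chosen for termination:
example_response = {"InstanceIDs": ["i-0123456789abcdef0"]}
```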
Challenge 2: Using Predictive Scaling with Blue/Green Deployments
Predictive scaling policies support only the following predefined metrics:
- AverageCPUUtilization
- AverageMemoryUtilization
- ALBRequestCountPerTarget
For most ECS workloads, ALBRequestCountPerTarget provides the most accurate representation of traffic. However, when ECS services use blue/green deployments, each service maintains two target groups, with only one active at a time.
This introduces two problems:
- Metric fragmentation: the request count metric's target group ARN changes whenever a deployment switches traffic, breaking the continuity of the metric data.
- Scaling policy maintenance overhead: each deployment requires a manual update to point the predictive scaling policy at the newly active target group.
Solution: Combined Request Count Metric
To address these issues, a custom metric named CombinedRequestCount is published in CloudWatch.
Using the AWS SDK, request counts from both target groups (blue and green) are aggregated into a single metric namespace. This enables predictive scaling to reference a consistent metric, preserving historical data across deployment switches.
Benefits include:
- Continuous metric history for accurate forecasting.
- Elimination of scaling policy reconfiguration between deployments.
- Simplified metric management through automation.
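The implementation steps later in this post reference the blue and green target groups by hard-coded ARNs. One way to resolve them dynamically, assuming the two target groups follow a predictable naming convention (the names below are hypothetical), is to look them up once and convert each ARN into the dimension value CloudWatch expects:

```python
import boto3

elbv2 = boto3.client('elbv2')

def resolve_target_group_dimensions(blue_name="my-service-blue",
                                     green_name="my-service-green"):
    """Return the CloudWatch 'TargetGroup' dimension values for both target groups."""
    response = elbv2.describe_target_groups(Names=[blue_name, green_name])
    dimensions = []
    for tg in response['TargetGroups']:
        # The dimension value is the trailing portion of the ARN:
        # "targetgroup/<name>/<id>"
        dimensions.append(tg['TargetGroupArn'].split(':')[-1])
    return dimensions
```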
Challenge 3: Publishing Custom Metrics Without Data Gaps
When publishing custom metrics such as CombinedRequestCount through a Lambda function, data gaps can occur due to the timing of CloudWatch metric availability.
For example, fetching request counts at half-minute intervals can sometimes return zero values if the metrics are not yet aggregated by CloudWatch at the time of retrieval. This can mislead the predictive scaling model and reduce forecast accuracy.
Solution: Delayed Metric Collection
Introducing a one-minute delay in the metric publishing Lambda ensures that CloudWatch has finalized data aggregation before the values are retrieved.
The Lambda execution sequence is as follows:
- Wait one minute before collecting data for the previous period.
- Retrieve request counts from both target groups.
- Aggregate and publish the results as CombinedRequestCount.
This approach ensures the continuity and reliability of metric data, which is critical for the predictive scaling algorithm.
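The sequence above implies that the publishing Lambda runs once per minute. One way to set that up, sketched here with an EventBridge schedule and a hypothetical function name (CombinedRequestCountPublisher), is:

```python
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Hypothetical ARN of the metric-publishing Lambda
function_arn = "arn:aws:lambda:us-east-1:123456789012:function:CombinedRequestCountPublisher"

# Trigger the publisher once per minute
rule_arn = events.put_rule(
    Name="publish-combined-request-count",
    ScheduleExpression="rate(1 minute)",
    State="ENABLED"
)['RuleArn']

events.put_targets(
    Rule="publish-combined-request-count",
    Targets=[{"Id": "metric-publisher", "Arn": function_arn}]
)

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName="CombinedRequestCountPublisher",
    StatementId="allow-eventbridge-minutely",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule_arn
)
```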
Challenge 4: Predictive Scaling Does Not Scale Down
While predictive scaling effectively forecasts and provisions additional capacity ahead of demand spikes, it does not support scaling down tasks. Predictive scaling only triggers scale-up actions, relying on historical growth trends rather than current utilization levels.
To achieve optimal resource usage and prevent over-provisioning, predictive scaling must be combined with a reactive scaling policy that can handle downscale events.
Solution: Hybrid Scaling with Target Tracking
A hybrid scaling strategy can be implemented by combining predictive scaling with ECS target tracking.
In this configuration:
- Predictive scaling proactively increases task counts before load spikes.
- Target tracking reduces task counts when utilization drops.
A second custom metric, RequestPerTaskUtilization, is introduced to support this model.
This metric calculates how many requests each ECS task is handling relative to a defined target threshold. The resulting utilization percentage is then published to CloudWatch and used as the input metric for the ECS target tracking policy.
For example:
RequestPerTaskUtilization = (TotalRequests / NumberOfTasks) / TargetRequestsPerTask * 100
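For instance, with 3,000 combined requests in a period, four running tasks, and a target of 1,000 requests per task (illustrative numbers), the published value is (3000 / 4) / 1000 * 100 = 75, so the target tracking policy sees 75% utilization and leaves the task count unchanged.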
With this configuration:
- Predictive scaling ensures readiness for anticipated traffic increases.
- Target tracking ensures cost efficiency during low-demand periods.
This dual-policy approach provides proactive and reactive scaling behaviors, delivering both performance stability and resource optimization.
Implementation
Architecture Diagram
Component Overview
| Component | Description | Purpose |
|---|---|---|
| ECS Cluster (EC2 Launch Type) | Runs containerized services across a fleet of EC2 instances managed by an Auto Scaling Group. | Hosts the workloads that require dynamic scaling. |
| Auto Scaling Group (ASG) | Manages EC2 instances for the ECS cluster. | Adjusts instance count based on predictive and target tracking policies. |
| Custom Termination Lambda | Invoked via an ASG lifecycle hook during scale-in events. | Determines which instance to terminate based on ECS task load. |
| Application Load Balancer (ALB) | Routes incoming traffic to ECS services. | Provides the RequestCountPerTarget metric used for scaling logic. |
| CombinedRequestCount Metric | Custom CloudWatch metric that aggregates requests across blue/green target groups. | Provides a consistent scaling metric unaffected by deployment switches. |
| RequestPerTaskUtilization Metric | Custom CloudWatch metric derived from CombinedRequestCount and ECS task count. | Provides utilization percentage for target tracking policy. |
| Predictive Scaling Policy | Uses historical CombinedRequestCount data. | Proactively scales up ECS tasks before forecasted traffic increases. |
| Target Tracking Scaling Policy | Uses RequestPerTaskUtilization. | Reactively scales down tasks during low load periods. |
Implementation Steps
Below are the main implementation steps with key AWS CLI commands and Lambda code references.
1. Create the Custom CombinedRequestCount Metric
This metric aggregates request counts from both target groups of your blue/green ECS service.
AWS CLI Example
```bash
aws cloudwatch put-metric-data \
  --namespace "ECS/CustomMetrics" \
  --metric-name "CombinedRequestCount" \
  --value 1500 \
  --unit "Count"
```
In production, this command is automated by a Lambda function that:
- Uses `get-metric-statistics` to fetch `RequestCount` for each target group.
- Sums the counts.
- Publishes the total as `CombinedRequestCount`.
Lambda Pseudocode Example
```python
import boto3, datetime

cw = boto3.client('cloudwatch')

def lambda_handler(event, context):
    # Collect data for the minute CloudWatch has already finalized
    # (one-minute delay, see Challenge 3).
    now = datetime.datetime.utcnow()
    start = now - datetime.timedelta(minutes=2)
    end = now - datetime.timedelta(minutes=1)

    # Note: the TargetGroup dimension value is the trailing portion of the
    # target group ARN ("targetgroup/<name>/<id>").
    target_groups = ["arn:aws:elasticloadbalancing:...:tg/blue",
                     "arn:aws:elasticloadbalancing:...:tg/green"]

    # Sum RequestCount across the blue and green target groups
    total_requests = 0
    for tg in target_groups:
        stats = cw.get_metric_statistics(
            Namespace='AWS/ApplicationELB',
            MetricName='RequestCount',
            Dimensions=[{'Name': 'TargetGroup', 'Value': tg}],
            StartTime=start,
            EndTime=end,
            Period=60,
            Statistics=['Sum']
        )
        if stats['Datapoints']:
            total_requests += stats['Datapoints'][0]['Sum']

    # Publish the aggregated value as the custom metric
    cw.put_metric_data(
        Namespace='ECS/CustomMetrics',
        MetricData=[{
            'MetricName': 'CombinedRequestCount',
            'Value': total_requests,
            'Unit': 'Count'
        }]
    )
```
2. Create the RequestPerTaskUtilization Metric
This metric calculates the number of requests handled per ECS task and expresses it as a utilization percentage.
AWS CLI Example
```bash
aws cloudwatch put-metric-data \
  --namespace "ECS/CustomMetrics" \
  --metric-name "RequestPerTaskUtilization" \
  --value 72.5 \
  --unit "Percent"
```
Lambda Logic Pseudocode (Derived from Combined Metric)
```python
import boto3

ecs = boto3.client('ecs')
cw = boto3.client('cloudwatch')

def lambda_handler(event, context):
    cluster = "your-ecs-cluster"
    service = "your-ecs-service"

    # Number of tasks currently running for the service
    response = ecs.describe_services(cluster=cluster, services=[service])
    task_count = response['services'][0]['runningCount']
    if task_count == 0:
        # Avoid division by zero while no tasks are running
        return

    # Example: use CombinedRequestCount from the previous step
    combined_requests = 1500
    target_requests_per_task = 1000

    # Requests handled per task, relative to the target, as a percentage
    utilization = (combined_requests / task_count) / target_requests_per_task * 100

    cw.put_metric_data(
        Namespace='ECS/CustomMetrics',
        MetricData=[{
            'MetricName': 'RequestPerTaskUtilization',
            'Value': utilization,
            'Unit': 'Percent'
        }]
    )
```
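The hardcoded combined_requests value above stands in for the CombinedRequestCount metric published in step 1. A sketch of that lookup, reusing the one-minute delay from Challenge 3, might look like this:

```python
import boto3, datetime

cw = boto3.client('cloudwatch')

def latest_combined_request_count():
    """Fetch the most recent finalized CombinedRequestCount datapoint."""
    now = datetime.datetime.utcnow()
    stats = cw.get_metric_statistics(
        Namespace='ECS/CustomMetrics',
        MetricName='CombinedRequestCount',
        StartTime=now - datetime.timedelta(minutes=2),
        EndTime=now - datetime.timedelta(minutes=1),
        Period=60,
        Statistics=['Sum']
    )
    datapoints = stats['Datapoints']
    return datapoints[0]['Sum'] if datapoints else 0
```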
3. Attach Scaling Policies
- Predictive Scaling (Proactive)
```bash
aws autoscaling put-scaling-policy \
  --policy-name "PredictiveScalingPolicy" \
  --auto-scaling-group-name "ecs-asg" \
  --policy-type "PredictiveScaling" \
  --predictive-scaling-configuration '{
    "MetricSpecifications": [{
      "TargetValue": 1000.0,
      "CustomizedCapacityMetricSpecification": {
        "MetricDataQueries": [{
          "Id": "combinedRequestMetric",
          "MetricStat": {
            "Metric": {
              "Namespace": "ECS/CustomMetrics",
              "MetricName": "CombinedRequestCount"
            },
            "Stat": "Sum"
          }
        }]
      }
    }]
  }'
```
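Two practical notes on this policy: predictive scaling needs roughly 24 hours of metric history before it produces forecasts, and it can be created in ForecastOnly mode to review forecasts before letting it change capacity. When custom metrics are used, the API generally expects a load metric and a scaling metric to accompany a capacity metric, so the configuration above may need additional MetricDataQueries entries depending on your setup. Once the policy exists, the forecast can be inspected with a sketch like the following (names as used above):

```python
import boto3, datetime

autoscaling = boto3.client('autoscaling')

# Inspect what the policy has forecast for the next 24 hours
now = datetime.datetime.utcnow()
forecast = autoscaling.get_predictive_scaling_forecast(
    AutoScalingGroupName="ecs-asg",
    PolicyName="PredictiveScalingPolicy",
    StartTime=now,
    EndTime=now + datetime.timedelta(hours=24)
)

# CapacityForecast holds the predicted capacity per timestamp;
# LoadForecast holds the predicted load metric values.
for ts, value in zip(forecast['CapacityForecast']['Timestamps'],
                     forecast['CapacityForecast']['Values']):
    print(ts, value)
```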
- Target Tracking (Reactive)
```bash
aws application-autoscaling put-scaling-policy \
  --policy-name "ECS-TargetTracking" \
  --service-namespace ecs \
  --resource-id service/your-cluster/your-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 100.0,
    "CustomizedMetricSpecification": {
      "MetricName": "RequestPerTaskUtilization",
      "Namespace": "ECS/CustomMetrics",
      "Statistic": "Average"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 300
  }'
```
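This policy assumes the ECS service has already been registered as a scalable target with Application Auto Scaling; if it has not, a registration step along these lines (capacity bounds are illustrative) is needed before put-scaling-policy will succeed:

```python
import boto3

app_autoscaling = boto3.client('application-autoscaling')

# Register the ECS service as a scalable target (min/max task counts are examples)
app_autoscaling.register_scalable_target(
    ServiceNamespace='ecs',
    ResourceId='service/your-cluster/your-service',
    ScalableDimension='ecs:service:DesiredCount',
    MinCapacity=1,
    MaxCapacity=20
)
```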
4. Attach Custom Termination Policy Lambda
Create a Lambda function and register it as a custom termination policy for the Auto Scaling Group (ASG).
This function ensures that the instance with the fewest running ECS tasks is terminated first, minimizing disruption during scale-ins.
Lambda Pseudocode
```python
import boto3, heapq, re

asg = boto3.client('autoscaling')
ecs = boto3.client('ecs')

def lambda_handler(event, context):
    asg_name = event.get("AutoScalingGroupName")
    capacity_to_terminate = event.get("CapacityToTerminate", [])
    total_capacity = sum(cap["Capacity"] for cap in capacity_to_terminate)

    # Derive the ECS cluster name from the ASG naming convention
    env_match = re.search(r"^(external|internal)-([^-]+)-", asg_name)
    cluster_type, env = env_match.groups()
    cluster_name = f"{cluster_type}-{env}-ecs-cluster"

    # Consider only in-service instances that are not protected from scale-in
    instances = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )['AutoScalingGroups'][0]['Instances']
    eligible = [i for i in instances
                if i["LifecycleState"] == "InService"
                and not i.get("ProtectedFromScaleIn", False)]

    # Count running ECS tasks per container instance
    container_arns = ecs.list_container_instances(cluster=cluster_name)['containerInstanceArns']
    details = ecs.describe_container_instances(
        cluster=cluster_name, containerInstances=container_arns
    )['containerInstances']
    task_counts = {d['ec2InstanceId']: d['runningTasksCount'] for d in details}

    # Select the instances with the fewest running tasks
    selected = heapq.nsmallest(total_capacity, eligible,
                               key=lambda x: task_counts.get(x["InstanceId"], 0))
    return {"InstanceIDs": [i["InstanceId"] for i in selected]}
```
- Register as ASG Termination Policy
```bash
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-asg \
  --termination-policies "arn:aws:lambda:us-east-1:123456789012:function:CustomTerminationPolicyLambda"
```
This configuration ensures:
- The ASG invokes the Lambda on scale-in events.
- The instance(s) with the fewest ECS tasks are selected for termination.
- Scale-ins occur smoothly without disrupting active workloads.
