Practical ECS Configurations in Terraform That Make Your Life Easier

Getting started with ECS as a DevOps engineer is a learning journey in itself: a bit like assembling a complex puzzle, every tiny piece needs to click into place before things run smoothly.

Implementing it in Terraform as Infrastructure as Code adds another layer to the story.

Here are a few key considerations that can save your production deployments and help make your solution more robust and reliable.


Propagating Tags Automatically (and Correctly)

Tagging is essential for cost allocation, ownership, and governance, but some ECS resources don’t automatically inherit tags unless you explicitly configure them.

Why is that? When writing Infrastructure as Code, you typically define the Task Definition and the Service. The actual Tasks that run are deployed based on those definitions and inherit most of their configuration from them, but not tags. They appear in your account only when the infrastructure is deployed, which means you need a way to ensure consistent tagging across these ephemeral resources.

To propagate tags correctly in your ECS Service Terraform code, add these two lines:

enable_ecs_managed_tags = true
propagate_tags          = "SERVICE"
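
In context, these two settings sit directly on the aws_ecs_service resource. Here is a minimal sketch; the service name, cluster, and task definition references are hypothetical placeholders, and networking settings are omitted for brevity:

resource "aws_ecs_service" "app" {
  name            = "<service name>"
  cluster         = aws_ecs_cluster.main.id         # hypothetical cluster reference
  task_definition = aws_ecs_task_definition.app.arn # hypothetical task definition reference
  desired_count   = 2

  # Tag the tasks that ECS launches and copy the service's tags onto them
  enable_ecs_managed_tags = true
  propagate_tags          = "SERVICE"

  tags = {
    Project = "<project name>"
    Owner   = "<team name>"
  }
}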

Why this matters

  • Ensures tasks inherit tags from the ECS service – no more untagged tasks floating around.
  • Makes cost allocation and resource ownership clear – essential for reporting and chargebacks.
  • Reduces the risk of “untagged” resources appearing in billing or compliance reports.

This is especially important in larger organizations where tagging policies are enforced.

Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-using-tags.html


Preventing ECS from Lowering Desired Count in Production

In production environments, Terraform shouldn’t override operational realities like autoscaling adjustments or manual scaling during incidents. Without proper handling, a terraform apply could unintentionally scale down your ECS service, causing downtime or degraded performance.

You can prevent this by telling Terraform to ignore changes to the desired_count in your ECS Service resource:

lifecycle {
  ignore_changes = [desired_count]
}

Why this matters

  • Prevents Terraform from scaling your service down unintentionally

  • Avoids surprises during terraform apply

  • Works nicely with Application Auto Scaling or manual scaling during incidents
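
If you pair this with Application Auto Scaling, the scaling side might look roughly like the sketch below (resource names, capacities, and the CPU target are hypothetical):

resource "aws_appautoscaling_target" "ecs_service" {
  min_capacity       = 2
  max_capacity       = 10
  resource_id        = "service/<cluster_name>/<service_name>"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_cpu" {
  name               = "<scaling policy name>"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_service.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_service.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_service.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 60 # aim for ~60% average CPU across tasks
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}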


Always Create a Custom CloudWatch Log Group

ECS can automatically create CloudWatch log groups for your tasks, but doing so limits your control over important settings like log retention, naming conventions, and cost management. Defining log groups explicitly in Terraform is a best practice that ensures consistency and predictability.

resource "aws_cloudwatch_log_group" "<project_name>_ecs_log_group" {
  name = "/ecs/<log group name>"
  retention_in_days = 1 # current minimum 
}
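
To make sure your containers actually write to this group, reference it from the log configuration inside the container definition of your task definition. A minimal sketch using the awslogs driver (region and stream prefix are placeholders):

# Inside the container definition (within the jsonencode(...) of aws_ecs_task_definition)
logConfiguration = {
  logDriver = "awslogs"
  options = {
    "awslogs-group"         = aws_cloudwatch_log_group.<project_name>_ecs_log_group.name
    "awslogs-region"        = "<aws region>"
    "awslogs-stream-prefix" = "ecs"
  }
}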

Why this matters

  • Controls log retention and avoids infinite log storage
  • Helps with cost optimization
  • Makes log group naming consistent and predictable

Enable ECS Exec for Easier Debugging

ECS Exec is a powerful feature that lets you connect directly into a running container from the AWS Console or CLI — no SSH or bastion host required. This is incredibly useful for troubleshooting production issues safely and quickly.

To enable ECS Exec in your service:

enable_execute_command = true

You also need the proper IAM permissions on the task's IAM role:

resource "aws_iam_policy" "<policy name>" {
  name        = <policy name>
  description = "Give SSM permissions to use ECS exec"

  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Effect" : "Allow",
        "Action" : [
          "ssmmessages:CreateControlChannel",
          "ssmmessages:CreateDataChannel",
          "ssmmessages:OpenControlChannel",
          "ssmmessages:OpenDataChannel"
        ],
        "Resource" : "*"
      }
    ]
  })
}
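
This policy must be attached to the task role (the role referenced as task_role_arn in your task definition), not the execution role. A minimal sketch, assuming hypothetical role and policy resource names:

resource "aws_iam_role_policy_attachment" "ecs_exec" {
  role       = aws_iam_role.<task role name>.name # the task role used by the task definition
  policy_arn = aws_iam_policy.<policy name>.arn
}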

How it works

To troubleshoot inside a container, open the running task in the AWS Console and scroll down to the Containers section.

Choose the application container and, in the upper left, click Connect.

A terminal view then opens in the Console with a suggested command to run, similar to the example below:

$ aws ecs execute-command --cluster <cluster_name> \
    --task <task_id> \
    --container <container_name> \
    --interactive --command '/bin/sh'

To check the healthCheck configuration, navigate to:

Task Definitions → choose your Task Definition → Choose a revision → Scroll down to Containers section → Monitoring and logging

Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html

Why this matters

  • Secure access without opening SSH
  • Extremely helpful for production debugging

Once you use ECS Exec, it’s hard to go back.


Allow ECS to Temporarily Exceed the Desired Task Count During Deployments

By default, ECS can be very conservative during deployments. You can speed up rollouts by allowing temporary overprovisioning.

deployment_maximum_percent         = 200 # example value, explained below
deployment_minimum_healthy_percent = 50  # example value, explained below

How ECS Deployment Percentages Work

When ECS deploys a new version of your service, it doesn't replace all tasks at once. Instead, it gradually shifts traffic from the old tasks to the new ones to avoid downtime. Two key settings control this behaviour: deployment_maximum_percent and deployment_minimum_healthy_percent.

  • deployment_maximum_percent

This defines the maximum number of tasks ECS can run during a deployment, expressed as a percentage of the desired task count.

Example: if your service normally runs 4 tasks and deployment_maximum_percent = 200, ECS can temporarily run up to 8 tasks during the rollout.

This means ECS can start new tasks before stopping old ones, so your service remains available and the deployment completes faster.

  • deployment_minimum_healthy_percent

This defines the minimum number of tasks that must remain healthy during the deployment, also as a percentage of the desired task count.

Example: with 4 desired tasks and deployment_minimum_healthy_percent = 50, ECS ensures at least 2 tasks stay healthy at any time.

This means that even if new tasks fail to start or are unhealthy, ECS keeps enough old tasks running so your application continues serving traffic.
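
Putting the numbers from these examples into the service configuration, a sketch with 4 desired tasks could look like this (values are illustrative):

# With desired_count = 4, ECS may temporarily run up to 8 tasks (200%)
# and must keep at least 2 tasks healthy (50%) during a rollout.
desired_count                      = 4
deployment_maximum_percent         = 200
deployment_minimum_healthy_percent = 50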

Why this matters

  • Enables zero-downtime deployments
  • Allows ECS to start new tasks before stopping old ones

This is especially important for production workloads.


Add Container-Level Health Checks

ECS health checks go beyond simply verifying that a container is running — they allow the scheduler to understand whether your application is actually healthy and ready to serve traffic.

By configuring a health check in your task definition, ECS can automatically replace unhealthy containers, improving reliability and reducing downtime.

healthCheck = {
  command     = ["CMD-SHELL", "curl -f http://localhost/status || exit 1"]
  interval    = 10
  timeout     = 5
  retries     = 3
  startPeriod = 10
}

How it works

  • command – what ECS runs to check the container’s health. Here, it calls a local endpoint /status. If the command fails, the container is marked unhealthy.
  • interval – how often the health check runs (seconds).
  • timeout – how long to wait for a response before considering the check failed (seconds).
  • retries – number of consecutive failures before marking the container unhealthy.
  • startPeriod – initial grace period for a container to start before health checks begin (seconds).
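
For context, this healthCheck map lives inside a container definition of your aws_ecs_task_definition. A minimal sketch with placeholder names (Fargate sizing shown for illustration; IAM roles and logging omitted for brevity):

resource "aws_ecs_task_definition" "app" {
  family                   = "<task family>"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512

  container_definitions = jsonencode([
    {
      name      = "<container_name>"
      image     = "<image uri>"
      essential = true
      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost/status || exit 1"]
        interval    = 10
        timeout     = 5
        retries     = 3
        startPeriod = 10
      }
    }
  ])
}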

Why this matters

  • ECS can automatically replace unhealthy containers
  • Improves reliability and resilience
  • Makes deployments safer by detecting bad releases early

Always implement application-level health checks rather than just process-level checks. A container might be running but the app inside could still be failing—health checks catch this early.


Final Thoughts

Individually, these configurations may seem minor—but together they make a big difference:

  • Improve operational stability
  • Reduce deployment risk
  • Increase observability and debugging ability
  • Align ECS services with production best practices

If you’re using ECS with Terraform and don’t have these yet, I highly recommend adding them to your baseline.

Let's build great things together! 🚀
