Matt

Posted on Jun 12 • Originally published at fortem.dev

How to Debug AWS Fargate Containers with ECS Exec

#aws #ecs #devops #platformengineering

You moved to Fargate. No more SSH. No more docker exec. Your container is failing and you can't get inside.

ECS Exec — AWS's answer to docker exec for Fargate — has been around since 2021. It bind-mounts the SSM agent into your running container at runtime. No sidecar. No ports. No keys. Just IAM.

This guide covers setup, the 5 errors that catch everyone, and the production controls you actually need.

Why ECS Exec exists

Fargate has no hosts to SSH into. Before ECS Exec launched in March 2021, there was no way to get a shell inside a Fargate container. Shell access was the #1 most requested feature on the AWS Containers Roadmap.

ECS on EC2 (before)	ECS on Fargate (with ECS Exec)
SSH into EC2 instance	`aws ecs execute-command` (no SSH)
`docker exec -it container bash`	`/bin/bash` via SSM
Open ports, manage SSH keys	No ports, no keys — IAM controls access
Locate instance in ASG first	Direct to task ID — always routable

Key fact: ECS Exec is not a sidecar. It bind-mounts the SSM agent binaries into your existing container at runtime. Your task definition doesn't change.

Download the skill file first

Before you hit one of the 5 errors below, grab the skill file on fortem.dev that an AI agent (Claude Code, OpenCode, Codex) can run for you.

It checks:

Whether --enable-execute-command is set on your service
Whether the task role has the right SSM permissions
Whether the Session Manager plugin is installed locally
Network path to SSM endpoints
Read-only filesystem settings

Get the ECS Exec Readiness skill file → fortem.dev/blog/ecs-exec-guide

Drop the .md file into your AI agent and it runs the 5-point checklist against your AWS account. Everything runs locally, read-only by default.

The 5 errors that catch everyone

01 — `ExecuteCommandAgent not RUNNING`

Cause: You forgot --enable-execute-command when creating or updating the service.

aws ecs update-service \
    --cluster your-cluster \
    --service your-service \
    --enable-execute-command \
    --force-new-deployment

Verify: aws ecs describe-tasks — check enableExecuteCommand: true and ExecuteCommandAgent status: RUNNING
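The verify step can be scripted instead of reading the JSON by eye. A minimal sketch, assuming a hypothetical helper name (`exec_agent_status`); the query reads the `managedAgents` entries that `describe-tasks` returns for each container.

```shell
# Hypothetical helper: print the ExecuteCommandAgent status of every container in a task.
# Usage: exec_agent_status <cluster> <task-id-or-arn>
exec_agent_status() {
    aws ecs describe-tasks \
        --cluster "$1" \
        --tasks "$2" \
        --query "tasks[0].containers[].managedAgents[] | [?name=='ExecuteCommandAgent'].lastStatus" \
        --output text
}
```

Call it as `exec_agent_status my-cluster YOUR_TASK_ID`. Empty output suggests the task was launched before the flag was enabled.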

02 — `AccessDeniedException — User is not authorized`

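Cause: Your task IAM role doesn't have the ssmmessages permissions below. This is a common cause of silent failures. If the message instead names your own user or role and ecs:ExecuteCommand, the missing permission is on the caller's side, not the task's.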

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ssmmessages:CreateControlChannel",
      "ssmmessages:CreateDataChannel",
      "ssmmessages:OpenControlChannel",
      "ssmmessages:OpenDataChannel"
    ],
    "Resource": "*"
  }]
}

Attach to the task role, not the execution role. The SSM agent runs inside the container — it's the task that needs the permissions.
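One way to apply the policy and confirm it landed on the right role. This is a sketch under assumptions: the helper names, the inline policy name `ecs-exec-ssmmessages`, and the idea of saving the JSON above to a local file are all illustrative, not part of the original setup.

```shell
# Hypothetical helper: attach the ssmmessages policy as an inline policy on the task role.
# Usage: attach_exec_policy <task-role-name> <policy-file.json>
attach_exec_policy() {
    aws iam put-role-policy \
        --role-name "$1" \
        --policy-name ecs-exec-ssmmessages \
        --policy-document "file://$2"
}

# Hypothetical helper: print the task role ARN a task definition uses,
# to confirm the policy went to the task role and not the execution role.
# Usage: task_role_arn <task-definition>
task_role_arn() {
    aws ecs describe-task-definition \
        --task-definition "$1" \
        --query 'taskDefinition.taskRoleArn' \
        --output text
}
```

If `task_role_arn` prints `None`, the task definition has no task role at all, and ECS Exec has nothing to get its permissions from.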

03 — `TargetNotConnected — Session Manager plugin not found`

Cause: The SSM Session Manager plugin is not installed on your local machine.

# macOS
curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/mac/sessionmanager-bundle.zip" -o "session.zip"
unzip session.zip
sudo ./sessionmanager-bundle/install -i /usr/local/sessionmanagerplugin -b /usr/local/bin/session-manager-plugin

# Verify
session-manager-plugin --version

04 — Timeout, session never connects

Cause: Your Fargate task can't reach the SSM service endpoint. Either no NAT gateway in the private subnet, or missing VPC endpoints.

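# Create an interface VPC endpoint for ssmmessages (recommended for private subnets)
# Replace region, vpc-xxx, subnet-xxx and sg-xxx with your own values.
# The security group must allow inbound HTTPS (443) from the task.
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-xxx \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.region.ssmmessages \
    --subnet-ids subnet-xxx \
    --security-group-ids sg-xxx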
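Before creating anything, it can help to check whether the VPC already has an ssmmessages endpoint. A minimal sketch, assuming a hypothetical helper name and that you pass the VPC ID and region yourself.

```shell
# Hypothetical helper: print the state of any ssmmessages endpoint in a VPC.
# Usage: ssmmessages_endpoint_state <vpc-id> <region>
ssmmessages_endpoint_state() {
    aws ec2 describe-vpc-endpoints \
        --region "$2" \
        --filters "Name=vpc-id,Values=$1" \
                  "Name=service-name,Values=com.amazonaws.$2.ssmmessages" \
        --query 'VpcEndpoints[].State' \
        --output text
}
```

Empty output means no matching endpoint exists, so the task would need a NAT gateway or a new endpoint to reach SSM.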

05 — Session starts but commands fail — `cannot create directory`

Cause: Your container has readonlyRootFilesystem: true. The SSM agent writes to /var/lib/amazon/ssm/ — it needs a writable filesystem.

"linuxParameters": {
  "initProcessEnabled": true
}

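Set readonlyRootFilesystem: false in the container definition. There's no workaround: the agent needs writable storage. The initProcessEnabled setting shown above is a separate recommendation; it runs an init process in the container so the agent's child processes get cleaned up instead of lingering as zombies.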
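To find out which containers in a task definition are affected, the flag can be queried directly. A minimal sketch with a hypothetical helper name; it lists the containers whose `readonlyRootFilesystem` is set to true.

```shell
# Hypothetical helper: list containers in a task definition with a read-only root filesystem.
# Usage: readonly_root_containers <task-definition>
readonly_root_containers() {
    aws ecs describe-task-definition \
        --task-definition "$1" \
        --query 'taskDefinition.containerDefinitions[?readonlyRootFilesystem].name' \
        --output text
}
```

Any container name it prints is one where an exec session is likely to fail with this error.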

The happy path — step by step

Step 1 — Install Session Manager plugin (see error 03 above)

Step 2 — Task IAM role policy (see error 02 above — attach ssmmessages:* to task role)

Step 3 — Enable on service:

aws ecs update-service \
    --cluster my-cluster \
    --service my-service \
    --enable-execute-command \
    --force-new-deployment

Step 4 — Verify:

aws ecs describe-tasks \
    --cluster my-cluster \
    --tasks $(aws ecs list-tasks --cluster my-cluster --service-name my-service --query 'taskArns[0]' --output text)
# Look for: "enableExecuteCommand": true, ExecuteCommandAgent "lastStatus": "RUNNING"

Step 5 — Execute:

# Interactive shell
aws ecs execute-command \
    --cluster my-cluster \
    --task YOUR_TASK_ID \
    --container nginx \
    --command "/bin/bash" \
    --interactive

# Single command (ECS Exec does not run the command through a shell, so wrap pipes in sh -c)
aws ecs execute-command \
    --cluster my-cluster \
    --task YOUR_TASK_ID \
    --container nginx \
    --command "sh -c 'env | grep DATABASE'" \
    --interactive
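Looking up the task ID by hand gets old quickly. A minimal sketch of a wrapper that resolves the first running task of a service and opens a shell in it. The function name is hypothetical, and it assumes `/bin/sh` exists in the image (swap in `/bin/bash` if your image has it).

```shell
# Hypothetical helper: open a shell in the first running task of a service.
# Usage: exec_shell <cluster> <service> <container>
exec_shell() {
    task_arn="$(aws ecs list-tasks \
        --cluster "$1" \
        --service-name "$2" \
        --desired-status RUNNING \
        --query 'taskArns[0]' \
        --output text)"

    # With --output text, an empty task list comes back as the string "None".
    if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
        echo "no running task found for service $2" >&2
        return 1
    fi

    aws ecs execute-command \
        --cluster "$1" \
        --task "$task_arn" \
        --container "$3" \
        --command "/bin/sh" \
        --interactive
}
```

Usage: `exec_shell my-cluster my-service nginx`. It picks whichever task is listed first, so it suits "any replica will do" debugging rather than chasing one specific task.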

Production setup — logging, audit, access control

Three layers for production:

Layer 1 — Log command output

aws ecs update-cluster \
    --cluster my-cluster \
    --configuration '{
      "executeCommandConfiguration": {
        "logging": "OVERRIDE",
        "logConfiguration": {
          "cloudWatchLogGroupName": "/aws/ecs/my-cluster-exec",
          "s3BucketName": "my-exec-logs",
          "s3KeyPrefix": "exec-output"
        }
      }
    }'

CloudTrail logs who ran ExecuteCommand. S3/CloudWatch logs what they ran.
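The "who" half of that audit trail can be pulled from the CLI. A minimal sketch, assuming a hypothetical helper name and that CloudTrail event history is available in the current region; it lists the time and caller of recent ExecuteCommand events.

```shell
# Hypothetical helper: show when and by whom ExecuteCommand was recently called.
# Usage: recent_exec_events [max-items]   (defaults to 20)
recent_exec_events() {
    aws cloudtrail lookup-events \
        --lookup-attributes AttributeKey=EventName,AttributeValue=ExecuteCommand \
        --max-items "${1:-20}" \
        --query 'Events[].[EventTime,Username]' \
        --output text
}
```

The full request details, including cluster, task, and container, sit in each event's `CloudTrailEvent` field if you drop the `--query` filter.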

Layer 2 — Restrict by environment tag

{
  "Effect": "Allow",
  "Action": "ecs:ExecuteCommand",
  "Resource": [
    "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
    "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/*"
  ],
  "Condition": {
    "StringEquals": {
      "ecs:ResourceTag/environment": "development"
    }
  }
}
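The condition only matches resources that actually carry the tag. A minimal sketch of tagging a task or cluster by ARN, assuming a hypothetical helper name. For services, having tasks inherit tags from the service or task definition is usually less manual than tagging each task.

```shell
# Hypothetical helper: set the environment tag on an ECS resource (task, cluster, service).
# Usage: tag_environment <resource-arn> <environment>
tag_environment() {
    aws ecs tag-resource \
        --resource-arn "$1" \
        --tags "key=environment,value=$2"
}
```

Usage: `tag_environment arn:aws:ecs:us-east-1:123456789012:task/my-cluster/YOUR_TASK_ID development`.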

Layer 3 — Block production by container name

{
  "Effect": "Deny",
  "Action": "ecs:ExecuteCommand",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "ecs:container-name": "production-app"
    }
  }
}

What ECS Exec can't do

Limitation	Why it matters
20-minute idle timeout	Not configurable. Active commands keep it alive
1 session per PID namespace	Second session fails until first exits
Must be enabled at launch	Can't retroactively enable on running tasks
Read-only root FS breaks it	SSM writes to `/var/lib/amazon/ssm/`
Commands run as root	Ignores container `USER` directive
No AWS Console support	CLI/SDK only
Only tools in the image	No injected debug tools

"ECS Exec sessions drop after 20 minutes of idle time — this timeout is not configurable. Only one session per container PID namespace is supported, and sessions always run as root regardless of the container USER directive." — AWS ECS Exec documentation, verified June 2026

FAQ

Does ECS Exec work on Fargate Spot?
Yes. The Spot interruption risk means you might lose your exec session mid-debug, but the feature works identically on Spot and On-Demand.

How much does ECS Exec cost?
ECS Exec itself is free. The only potential cost is CloudWatch Logs or S3 storage if you enable session logging. SSM Session Manager is also free. KMS key usage for encryption costs ~$1/month per key.

Can I use ECS Exec to run a one-off command?
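Yes: aws ecs execute-command --command 'ls -la' --interactive (plus the usual --cluster, --task and --container). Keep --interactive even for one-off commands and CI/CD use. ECS Exec currently supports interactive mode only, and the call is rejected without the flag.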

How do I restrict ECS Exec to specific IAM users?
Use IAM condition keys on ecs:ExecuteCommand. Restrict by cluster name, task tags, container name. Example: allow exec only on tasks tagged environment=development.

What happens to ECS Exec when I update the service?
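The setting lives on the service. An update-service call that omits the flag leaves the current value unchanged, and only --disable-execute-command turns it off. Tasks that were launched before the flag was enabled still won't have ECS Exec until they are replaced. Managing the flag via IaC keeps it explicit.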

Full article with downloadable skill file: fortem.dev/blog/ecs-exec-guide
