Introduction: Why I Built This (And Why You Should Care)
A few months ago, I was responsible for a rapidly growing microservices architecture running on AWS, and I wasn't sleeping well because of security concerns. We had five different services running in separate containers, each with its own dependencies, and I was constantly worried that one of them was shipping a vulnerable image into production.
The real issue wasn't just the scanning of images; it was about how to automate and consistently maintain security across the entire deployment pipeline.
Manual security checks do not scale, and we all know they are often skipped when a deadline approaches.
I built an automated security pipeline to scan for vulnerabilities in container images using AWS Fargate and GitHub Actions. It eliminated all the manual scanning of images, as well as the constant fear of whether the last base image had critical CVEs and whether we were deploying insecure code to production.
In this article, I walk through how I set this up, with example configurations and the commands I ran, plus the lessons I learned from running it in production for several months before sharing it with the AWS Community Builders program.
What We Will Discuss:
How to set up ECR with automated vulnerability scanning.
How to create GitHub Actions workflows that will gate deployments based on security findings.
How to apply least-privilege IAM roles and network isolation on Fargate.
How to continuously monitor and track compliance.
A step-by-step example to use as a blueprint for your own projects.
Let's go!
Architecture Overview: The Big Picture
Before getting into the details, here's an overview of the architecture that supports the automated security pipeline I built:

With this configuration, you will have:
- Automated security scanning of every container image before it reaches production
- Network isolation via private subnets and security groups
- Least-privilege access via separate execution and task IAM roles
- A complete audit trail via CloudWatch Logs
- Runtime protection via the Fargate isolation model
Now let's get into how each component works.
Container Image Security: The Essentials
Never assume a container image is trustworthy. Even official images can contain vulnerabilities or pull in dependencies that create security problems.
Here's how I locked down our container images:
1. Automated Image Scanning in ECR (Scan on Push)
For each ECR repository, I enabled scan on push. Whenever I push an image to ECR, AWS automatically scans it against the CVE database.
Enabling scan-on-push for a repository:
aws ecr put-image-scanning-configuration \
--repository-name production-api \
--image-scanning-configuration scanOnPush=true \
--region us-east-1
Configuring enhanced scanning with Inspector:
aws ecr put-registry-scanning-configuration \
--scan-type ENHANCED \
--rules '[{"repositoryFilters":[{"filter":"*","filterType":"WILDCARD"}],"scanFrequency":"CONTINUOUS_SCAN"}]' \
--region us-east-1
Enhanced scanning gives you:
- Operating system vulnerabilities
- Vulnerabilities in programming-language packages (Python, Node.js, Java, etc.)
- Continuous monitoring of your images, not just at push time
You can verify the registry-level configuration with the command below.
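To confirm the registry-level scanning configuration took effect, you can describe it (a quick sanity check, not part of the original setup steps):
# Show the current registry scanning configuration
aws ecr get-registry-scanning-configuration \
--region us-east-1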
2. IAM Roles - Separation is Key
I learned this the hard way: never reuse the same IAM role for everything. Here's how I separate the two roles ECS uses:
Task Execution Role (what ECS needs to launch your container):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "*"
}
]
}
Task Role (what your application needs):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:Query"
],
"Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/users-table"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::my-app-assets/*"
}
]
}
The execution role is identical across services; the task role is defined by each service's specific needs.
3. Base Image Selection & Multi-Stage Builds
I no longer build on large OS images; I use minimal base images instead. Here's how the options compared in my tests (a sketch for reproducing this comparison locally follows the table):

| Base image | Size | CVEs found |
|---|---|---|
| node:18 (full) | 1.1 GB | 247 |
| node:18-slim | 243 MB | 89 |
| node:18-alpine | 178 MB | 12 |
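If you want to reproduce this kind of comparison yourself, here's a rough sketch using Trivy (assuming Trivy and jq are installed locally; counts will drift as new CVEs are published, and Trivy is not the same scanner ECR uses):
# Count vulnerabilities per severity for each candidate base image
for img in node:18 node:18-slim node:18-alpine; do
echo "== $img =="
trivy image --quiet --format json "$img" \
| jq '[.Results[]?.Vulnerabilities[]?] | group_by(.Severity) | map({key: .[0].Severity, value: length}) | from_entries'
done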
My Dockerfile Strategy Uses Multi-Stage Builds:
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npm run build
# Production stage - minimal runtime
FROM node:18-alpine
WORKDIR /app
# Create non-root user
RUN addgroup -g 1001 appuser && \
adduser -D -u 1001 -G appuser appuser
# Copy only production artifacts
COPY --from=builder --chown=appuser:appuser /app/dist ./dist
COPY --from=builder --chown=appuser:appuser /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appuser /app/package.json ./
# Switch to non-root user
USER appuser
# Run the application
EXPOSE 3000
CMD ["node", "dist/index.js"]
Security Practices in the Dockerfile:
- Utilizes an Alpine image as a base
- Implements a multi-stage build; production image has no build tools
- Runs the image as a non-root user (following the principle of least privilege)
- Only production dependencies included in the image
- Dependencies are pinned to specific versions (a sketch of pinning the base image by digest follows this list).
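For base image pinning specifically, one approach (a sketch, not from the original pipeline) is to resolve the digest the tag currently points to and reference it in the Dockerfile as node:18-alpine@sha256:&lt;digest&gt;:
# Resolve the digest behind the node:18-alpine tag
docker pull node:18-alpine
docker inspect --format '{{index .RepoDigests 0}}' node:18-alpine
# Then in the Dockerfile (digest value is illustrative):
# FROM node:18-alpine@sha256:<digest-from-the-command-above>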
4. Vulnerability Thresholds
I set a simple policy: no image with Critical or High vulnerabilities reaches production. No exceptions.
Here's how I check scan results programmatically:
# Get the latest scan findings
aws ecr describe-image-scan-findings \
--repository-name production-api \
--image-id imageTag=v1.2.3 \
--region us-east-1 \
--query 'imageScanFindings.findingSeverityCounts'
Output:
{
"CRITICAL": 0,
"HIGH": 0,
"MEDIUM": 3,
"LOW": 12,
"INFORMATIONAL": 5
}
In the GitHub Actions pipeline (next section), the build fails automatically whenever CRITICAL or HIGH vulnerabilities are detected.
Automating Security in GitHub Actions CI/CD
This is where things get interesting: every push to the main branch triggers an automated workflow that builds the image, scans it for vulnerabilities, and deploys only if everything passes.
GitHub Actions Full Workflow
Here is the workflow file (.github/workflows/deploy.yml):
name: Build, Scan, and Deploy to Fargate
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
env:
AWS_REGION: us-east-1
ECR_REPOSITORY: production-api
ECS_CLUSTER: production-cluster
ECS_SERVICE: api-service
CONTAINER_NAME: api-container
jobs:
build-and-scan:
name: Build and Security Scan
runs-on: ubuntu-latest
outputs:
image: ${{ steps.build-image.outputs.image }}
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v1
- name: Build Docker image
id: build-image
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
IMAGE_TAG: ${{ github.sha }}
run: |
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
- name: Push image to Amazon ECR
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
IMAGE_TAG: ${{ github.sha }}
run: |
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
- name: Wait for ECR scan to complete
env:
IMAGE_TAG: ${{ github.sha }}
run: |
echo "Waiting for image scan to complete..."
sleep 30
for i in {1..20}; do
SCAN_STATUS=$(aws ecr describe-image-scan-findings \
--repository-name $ECR_REPOSITORY \
--image-id imageTag=$IMAGE_TAG \
--region $AWS_REGION \
--query 'imageScanStatus.status' \
--output text 2>/dev/null || echo "PENDING")
echo "Scan status: $SCAN_STATUS"
if [ "$SCAN_STATUS" = "COMPLETE" ]; then
echo "Scan completed successfully"
break
elif [ "$SCAN_STATUS" = "FAILED" ]; then
echo "Scan failed!"
exit 1
fi
sleep 15
done
- name: Check for vulnerabilities
env:
IMAGE_TAG: ${{ github.sha }}
run: |
echo "Checking for vulnerabilities..."
FINDINGS=$(aws ecr describe-image-scan-findings \
--repository-name $ECR_REPOSITORY \
--image-id imageTag=$IMAGE_TAG \
--region $AWS_REGION)
echo "$FINDINGS" | jq '.imageScanFindings.findingSeverityCounts'
CRITICAL=$(echo "$FINDINGS" | jq -r '.imageScanFindings.findingSeverityCounts.CRITICAL // 0')
HIGH=$(echo "$FINDINGS" | jq -r '.imageScanFindings.findingSeverityCounts.HIGH // 0')
MEDIUM=$(echo "$FINDINGS" | jq -r '.imageScanFindings.findingSeverityCounts.MEDIUM // 0')
echo "Critical vulnerabilities: $CRITICAL"
echo "High vulnerabilities: $HIGH"
echo "Medium vulnerabilities: $MEDIUM"
if [ "$CRITICAL" -gt 0 ] || [ "$HIGH" -gt 0 ]; then
echo "❌ CRITICAL or HIGH vulnerabilities found! Deployment blocked."
echo "Please fix vulnerabilities before deploying."
exit 1
fi
if [ "$MEDIUM" -gt 5 ]; then
echo "⚠️ Warning: More than 5 MEDIUM vulnerabilities found."
echo "Consider addressing these vulnerabilities."
fi
echo "✅ Security scan passed!"
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: ${{ steps.build-image.outputs.image }}
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload Trivy results to GitHub Security
uses: github/codeql-action/upload-sarif@v2
if: always()
with:
sarif_file: 'trivy-results.sarif'
deploy:
name: Deploy to Fargate
needs: build-and-scan
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Download task definition
run: |
aws ecs describe-task-definition \
--task-definition api-service-task \
--query taskDefinition > task-definition.json
- name: Fill in the new image ID in the task definition
id: task-def
uses: aws-actions/amazon-ecs-render-task-definition@v1
with:
task-definition: task-definition.json
container-name: ${{ env.CONTAINER_NAME }}
image: ${{ needs.build-and-scan.outputs.image }}
- name: Deploy to Amazon ECS
uses: aws-actions/amazon-ecs-deploy-task-definition@v1
with:
task-definition: ${{ steps.task-def.outputs.task-definition }}
service: ${{ env.ECS_SERVICE }}
cluster: ${{ env.ECS_CLUSTER }}
wait-for-service-stability: true
- name: Notify deployment success
if: success()
run: |
echo "🚀 Deployment successful!"
echo "Image: ${{ needs.build-and-scan.outputs.image }}"
echo "Service: ${{ env.ECS_SERVICE }}"
echo "Cluster: ${{ env.ECS_CLUSTER }}"
Here's what this workflow does, step by step:
- Build:
- Retrieve code from the repository
- Create a Docker image
- Tag the image with the SHA for historical records
- Scan:
- Push the image to Amazon ECR (which triggers an automatic scan)
- Wait for the scan to complete (with a timeout)
- Retrieve the scan results and evaluate the findings
- Block the deployment if Critical or High vulnerabilities are found
- Run a second scan with Trivy
- Upload the Trivy results to the GitHub Security tab
- Deploy (only if the scans pass):
- Pull the current task definition
- Replace the image reference with the new image
- Deploy the task to Fargate
- Wait for the service to stabilize
The secret sauce is the deployment gate, which blocks the release when vulnerabilities are found:
if [ "$CRITICAL" -gt 0 ] || [ "$HIGH" -gt 0 ]; then
echo "❌ CRITICAL or HIGH vulnerabilities found! Deployment blocked."
exit 1
fi
If the gate finds Critical or High vulnerabilities, the workflow fails and that artifact cannot be deployed, period. This check has saved us from shipping known CVEs on several occasions.
Required GitHub Secrets
Add the following secrets to your repository:
AWS_ACCESS_KEY_ID (ex: AKIAIOSFODNN7EXAMPLE)
AWS_SECRET_ACCESS_KEY (ex: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY)
Pro tip: use a dedicated IAM user with only the permissions the pipeline needs:
- ecr:* on your repositories
- ecs:DescribeTaskDefinition, ecs:RegisterTaskDefinition, ecs:UpdateService
- iam:PassRole on the task execution and task roles
A sketch of such a policy follows.
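Here's a minimal sketch of that policy, reusing the example account ID and role names from this article; the user name github-actions-deployer is illustrative, and the ARNs should be scoped to your own resources:
# Inline policy for the CI deploy user (names and ARNs are examples)
cat > github-actions-policy.json <<'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "EcrRepoAccess",
"Effect": "Allow",
"Action": ["ecr:*"],
"Resource": "arn:aws:ecr:us-east-1:123456789012:repository/production-api"
},
{
"Sid": "EcrAuth",
"Effect": "Allow",
"Action": ["ecr:GetAuthorizationToken"],
"Resource": "*"
},
{
"Sid": "EcsDeploy",
"Effect": "Allow",
"Action": [
"ecs:DescribeTaskDefinition",
"ecs:RegisterTaskDefinition",
"ecs:UpdateService",
"ecs:DescribeServices"
],
"Resource": "*"
},
{
"Sid": "PassRolesToEcs",
"Effect": "Allow",
"Action": ["iam:PassRole"],
"Resource": [
"arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"arn:aws:iam::123456789012:role/apiServiceTaskRole"
]
}
]
}
EOF
# Attach it to the dedicated CI user
aws iam put-user-policy \
--user-name github-actions-deployer \
--policy-name GitHubActionsDeployPolicy \
--policy-document file://github-actions-policy.json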
Runtime Security for Tasks on Fargate
Fargate has strong security features built in, including task-level isolation and no hosts for you to manage, but you still need to configure it properly to get the full benefit.
1. Network Segmentation
I deploy my Fargate services into private subnets with no direct internet access. For outbound connectivity (pulling images from ECR, calling AWS APIs), traffic goes through NAT gateways.
Here's the VPC configuration in Terraform:
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "production-vpc"
}
}
resource "aws_subnet" "private_a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = "us-east-1a"
tags = {
Name = "production-private-subnet-a"
}
}
resource "aws_subnet" "private_b" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.2.0/24"
availability_zone = "us-east-1b"
tags = {
Name = "production-private-subnet-b"
}
}
resource "aws_security_group" "ecs_tasks" {
name = "ecs-tasks-sg"
description = "Security group for ECS tasks"
vpc_id = aws_vpc.main.id
# Allow outbound to internet (via NAT Gateway)
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
# Allow inbound from ALB only
ingress {
from_port = 3000
to_port = 3000
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
description = "Allow traffic from ALB"
}
tags = {
Name = "ecs-tasks-security-group"
}
}
2. Hardened Task Definitions
Here's a sample Fargate task definition (Terraform) that follows secure defaults:
resource "aws_ecs_task_definition" "api_service" {
family = "api-service-task"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "512"
memory = "1024"
execution_role_arn = aws_iam_role.ecs_execution_role.arn
task_role_arn = aws_iam_role.api_task_role.arn
container_definitions = jsonencode([
{
name = "api-container"
image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/production-api:latest"
essential = true
portMappings = [
{
containerPort = 3000
protocol = "tcp"
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = "/ecs/production-cluster/api-service"
"awslogs-region" = "us-east-1"
"awslogs-stream-prefix" = "ecs"
}
}
environment = [
{
name = "NODE_ENV"
value = "production"
},
{
name = "PORT"
value = "3000"
}
]
secrets = [
{
name = "DATABASE_URL"
valueFrom = "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/database-url-AbCdEf"
},
{
name = "API_KEY"
valueFrom = "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/api-key-XyZ123"
}
]
# Security configurations
readonlyRootFilesystem = true
linuxParameters = {
capabilities = {
drop = ["ALL"]
add = ["NET_BIND_SERVICE"]
}
}
healthCheck = {
command = ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 60
}
}
])
tags = {
Environment = "production"
Service = "api"
}
}
Key security features in this task definition:
- readonlyRootFilesystem prevents the container from writing to its own filesystem, so it can't be modified at runtime.
- linuxParameters.capabilities drops all Linux capabilities by default and adds back only what's required.
- secrets pulls sensitive values from AWS Secrets Manager instead of plain environment variables.
- The execution role and the task role are separate.
- A health check improves the overall reliability of the service.
3. Encryption Everywhere
Encryption in transit:
- The ALB terminates HTTPS using ACM certificates.
- All internal service-to-service communication uses HTTPS.
Encryption at rest:
- ECR images are encrypted with AWS KMS.
- CloudWatch Logs are encrypted.
- EFS volumes (if used) are encrypted.
Example: creating an encrypted log group:
aws logs create-log-group \
--log-group-name /ecs/production-cluster/api-service \
--kms-key-id arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012 \
--region us-east-1
4. Least-Privilege IAM Policies
Here is the actual task role I used:
resource "aws_iam_role" "api_task_role" {
name = "apiServiceTaskRole"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "api_task_policy" {
name = "apiServiceTaskPolicy"
role = aws_iam_role.api_task_role.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:Query",
"dynamodb:UpdateItem"
]
Resource = [
"arn:aws:dynamodb:us-east-1:123456789012:table/users-table",
"arn:aws:dynamodb:us-east-1:123456789012:table/users-table/index/*"
]
},
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject"
]
Resource = "arn:aws:s3:::my-app-assets/*"
},
{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue"
]
Resource = [
"arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/database-url-*",
"arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/api-key-*"
]
}
]
})
}
This policy is explicit about which DynamoDB tables the service can touch (only the ones it actually needs), which S3 actions are allowed (GetObject and PutObject only, no deletes), and which secrets it can read.
Compliance & Monitoring
Security isn't a one-time effort; it requires continuous monitoring and a maintained audit trail.
1. CloudWatch Logging
All container logs are collected in CloudWatch. Here's how I configure structured logging.
Creating the log group:
aws logs create-log-group \
--log-group-name /ecs/production-cluster/api-service \
--retention-in-days 30 \
--region us-east-1
Application logging (Node.js example):
const winston = require('winston');
const logger = winston.createLogger({
level: 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.json()
),
defaultMeta: {
service: 'api-service',
environment: 'production'
},
transports: [
new winston.transports.Console()
]
});
// Usage
logger.info('User login successful', {
userId: 'user-123',
ipAddress: '203.0.113.42'
});
logger.error('Database connection failed', {
error: err.message,
stack: err.stack
});
2. CloudWatch Alarms
I set up alarms for critical metrics:
# High CPU usage alarm
aws cloudwatch put-metric-alarm \
--alarm-name api-service-high-cpu \
--alarm-description "Alert when API service CPU > 80%" \
--metric-name CPUUtilization \
--namespace AWS/ECS \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--dimensions Name=ServiceName,Value=api-service Name=ClusterName,Value=production-cluster \
--alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts
# Failed task alarm
aws cloudwatch put-metric-alarm \
--alarm-name api-service-task-failures \
--alarm-description "Alert on task failures" \
--metric-name TaskFailures \
--namespace ECS/ContainerInsights \
--statistic Sum \
--period 60 \
--threshold 1 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--dimensions Name=ServiceName,Value=api-service Name=ClusterName,Value=production-cluster \
--alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts
3. Amazon Inspector Integration
Inspector provides continuous runtime vulnerability scanning. Here's how I enabled it:
# Enable Inspector for ECR and EC2
aws inspector2 enable \
--resource-types ECR EC2 \
--region us-east-1
# Check Inspector findings
aws inspector2 list-findings \
--filter-criteria '{
"ecrImageRepositoryName": [{"comparison": "EQUALS", "value": "production-api"}],
"severity": [{"comparison": "EQUALS", "value": "CRITICAL"}]
}' \
--region us-east-1
Inspector gives you:
- Continuous scanning of all running container images
- Detection of CVEs in both OS packages and application dependencies
- Risk scores to prioritize the most urgent findings
- Integration with AWS Security Hub
4. Log Query Analysis
I use CloudWatch Logs Insights to monitor security-relevant events. This query finds failed authentication attempts:
fields @timestamp, userId, ipAddress, @message
| filter @message like /authentication failed/
| sort @timestamp desc
| limit 100
The query for the error rate by service:
fields @timestamp, service, @message
| filter level = "error"
| stats count() by service
| sort count() desc
5. Compliance Reporting
For compliance requirements (SOC 2, ISO 27001), I export logs to S3 for long-term retention:
aws logs create-export-task \
--log-group-name /ecs/production-cluster/api-service \
--from $(date -d '30 days ago' +%s)000 \
--to $(date +%s)000 \
--destination s3-compliance-logs-bucket \
--destination-prefix ecs-logs/api-service/
Step-by-Step Example: The Complete Process
Let me walk through deploying a brand-new microservice with all of the security features applied.
Step 1: Create the Microservice
Project Layout:
notification-service/
├── src/
│ ├── index.js
│ ├── handlers/
│ └── utils/
├── Dockerfile
├── package.json
└── .github/
└── workflows/
└── deploy.yml
Dockerfile (notification-service/Dockerfile), similar to the one shown earlier:
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm ci --only=production
# Copy application code
COPY . .
# Build if needed (TypeScript, etc.)
# RUN npm run build
# Production stage
FROM node:18-alpine
WORKDIR /app
# Install security updates
RUN apk update && \
apk upgrade && \
apk add --no-cache dumb-init && \
rm -rf /var/cache/apk/*
# Create non-root user
RUN addgroup -g 1001 -S appgroup && \
adduser -u 1001 -S appuser -G appgroup && \
chown -R appuser:appgroup /app
# Copy from builder
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/src ./src
COPY --from=builder --chown=appuser:appgroup /app/package.json ./
# Switch to non-root user
USER appuser
# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"
# Expose port
EXPOSE 3000
# Run application
CMD ["node", "src/index.js"]
Application code (src/index.js):
const express = require('express');
const AWS = require('aws-sdk');
const winston = require('winston');
const app = express();
const PORT = process.env.PORT || 3000;
// Configure logger
const logger = winston.createLogger({
level: 'info',
format: winston.format.json(),
defaultMeta: { service: 'notification-service' },
transports: [new winston.transports.Console()]
});
// Configure AWS SDK
const sns = new AWS.SNS({ region: 'us-east-1' });
const secretsManager = new AWS.SecretsManager({ region: 'us-east-1' });
app.use(express.json());
// Health check endpoint
app.get('/health', (req, res) => {
res.status(200).json({ status: 'healthy', timestamp: new Date().toISOString() });
});
// Send notification endpoint
app.post('/notify', async (req, res) => {
try {
const { message, topic } = req.body;
logger.info('Sending notification', { topic, messageLength: message.length });
const params = {
Message: message,
TopicArn: `arn:aws:sns:us-east-1:123456789012:${topic}`
};
await sns.publish(params).promise();
logger.info('Notification sent successfully', { topic });
res.status(200).json({ success: true, message: 'Notification sent' });
} catch (error) {
logger.error('Failed to send notification', {
error: error.message,
stack: error.stack
});
res.status(500).json({ success: false, error: 'Failed to send notification' });
}
});
// Graceful shutdown
process.on('SIGTERM', () => {
logger.info('SIGTERM received, shutting down gracefully');
process.exit(0);
});
app.listen(PORT, () => {
logger.info(`Notification service listening on port ${PORT}`);
});
Step 2: Create ECR Repository
# Create the repository
aws ecr create-repository \
--repository-name notification-service \
--image-scanning-configuration scanOnPush=true \
--encryption-configuration encryptionType=KMS,kmsKey=arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012 \
--region us-east-1
# Set lifecycle policy to keep only last 10 images
aws ecr put-lifecycle-policy \
--repository-name notification-service \
--lifecycle-policy-text '{
"rules": [
{
"rulePriority": 1,
"description": "Keep last 10 images",
"selection": {
"tagStatus": "any",
"countType": "imageCountMoreThan",
"countNumber": 10
},
"action": {
"type": "expire"
}
}
]
}' \
--region us-east-1
Expected output:
{
"repository": {
"repositoryArn": "arn:aws:ecr:us-east-1:123456789012:repository/notification-service",
"registryId": "123456789012",
"repositoryName": "notification-service",
"repositoryUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/notification-service",
"createdAt": "2024-12-15T10:30:00.000000+00:00",
"imageScanningConfiguration": {
"scanOnPush": true
},
"encryptionConfiguration": {
"encryptionType": "KMS",
"kmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
}
}
}
Step 3: Build and Push Image Manually (First Time)
# Authenticate Docker to ECR
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
# Build the image
docker build -t notification-service:v1.0.0 .
# Tag for ECR
docker tag notification-service:v1.0.0 \
123456789012.dkr.ecr.us-east-1.amazonaws.com/notification-service:v1.0.0
docker tag notification-service:v1.0.0 \
123456789012.dkr.ecr.us-east-1.amazonaws.com/notification-service:latest
# Push to ECR
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/notification-service:v1.0.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/notification-service:latest
# Wait for scan to complete
echo "Waiting for image scan..."
sleep 30
# Check scan results
aws ecr describe-image-scan-findings \
--repository-name notification-service \
--image-id imageTag=v1.0.0 \
--region us-east-1 \
--query 'imageScanFindings.findingSeverityCounts'
Output of scan:
{
"MEDIUM": 2,
"LOW": 8,
"INFORMATIONAL": 3
}
No critical or high vulnerabilities, so it's safe to deploy!
Step 4: Create IAM Roles
Task execution role (same for all services):
# Create trust policy
cat > trust-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
# Create role
aws iam create-role \
--role-name ecsTaskExecutionRole \
--assume-role-policy-document file://trust-policy.json
# Attach AWS managed policy
aws iam attach-role-policy \
--role-name ecsTaskExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
Task role (specific to notification service):
# Create task role
aws iam create-role \
--role-name notificationServiceTaskRole \
--assume-role-policy-document file://trust-policy.json
# Create custom policy
cat > notification-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"sns:Publish"
],
"Resource": [
"arn:aws:sns:us-east-1:123456789012:app-notifications",
"arn:aws:sns:us-east-1:123456789012:alert-notifications"
]
},
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/notification-config-*"
]
}
]
}
EOF
# Attach policy
aws iam put-role-policy \
--role-name notificationServiceTaskRole \
--policy-name NotificationServicePolicy \
--policy-document file://notification-policy.json
Step 5: Create Task Definition
# Create task definition JSON
cat > task-definition.json <<EOF
{
"family": "notification-service-task",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"taskRoleArn": "arn:aws:iam::123456789012:role/notificationServiceTaskRole",
"containerDefinitions": [
{
"name": "notification-container",
"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/notification-service:latest",
"essential": true,
"portMappings": [
{
"containerPort": 3000,
"protocol": "tcp"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/production-cluster/notification-service",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
},
"environment": [
{
"name": "NODE_ENV",
"value": "production"
},
{
"name": "PORT",
"value": "3000"
}
],
"secrets": [
{
"name": "NOTIFICATION_CONFIG",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/notification-config-AbCdEf"
}
],
"readonlyRootFilesystem": false,
"linuxParameters": {
"capabilities": {
"drop": ["ALL"],
"add": ["NET_BIND_SERVICE"]
}
},
"healthCheck": {
"command": ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 60
}
}
]
}
EOF
# Register task definition
aws ecs register-task-definition \
--cli-input-json file://task-definition.json \
--region us-east-1
Step 6: Create ECS Service
# Create CloudWatch log group first
aws logs create-log-group \
--log-group-name /ecs/production-cluster/notification-service \
--region us-east-1
# Create the ECS service
aws ecs create-service \
--cluster production-cluster \
--service-name notification-service \
--task-definition notification-service-task \
--desired-count 2 \
--launch-type FARGATE \
--platform-version LATEST \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-0a1b2c3d,subnet-0e1f2g3h],
securityGroups=[sg-0a1b2c3d],
assignPublicIp=DISABLED
}" \
--load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/notification-tg/1234567890abcdef,containerName=notification-container,containerPort=3000" \
--health-check-grace-period-seconds 60 \
--deployment-configuration "maximumPercent=200,minimumHealthyPercent=100,deploymentCircuitBreaker={enable=true,rollback=true}" \
--enable-execute-command \
--region us-east-1
Important deployment configuration options:
- desired-count 2 keeps two tasks running at all times for high availability.
- assignPublicIp=DISABLED keeps the tasks in the VPC's private subnets.
- deploymentCircuitBreaker automatically rolls back failed deployments.
- enable-execute-command allows ECS Exec for debugging (the task role needs the appropriate permissions); a usage sketch follows this list.
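For reference, here's a sketch of using ECS Exec to open a shell in a running task; it assumes the Session Manager plugin is installed locally and that the task role allows the ssmmessages actions ECS Exec requires:
# Grab the ARN of one running task for the service
TASK_ARN=$(aws ecs list-tasks \
--cluster production-cluster \
--service-name notification-service \
--query 'taskArns[0]' --output text --region us-east-1)
# Open an interactive shell inside the container
aws ecs execute-command \
--cluster production-cluster \
--task "$TASK_ARN" \
--container notification-container \
--interactive \
--command "/bin/sh" \
--region us-east-1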
Step 7: Create a .github/workflows/deploy.yml file for the notification-service project (a simplified version of the workflow shown earlier):
name: Deploy Notification Service
on:
push:
branches: [ main ]
paths:
- 'notification-service/**'
env:
AWS_REGION: us-east-1
ECR_REPOSITORY: notification-service
ECS_CLUSTER: production-cluster
ECS_SERVICE: notification-service
CONTAINER_NAME: notification-container
jobs:
deploy:
name: Build, Scan, and Deploy
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v1
- name: Build, scan, and push image
id: build-image
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
IMAGE_TAG: ${{ github.sha }}
run: |
cd notification-service
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
# Wait for scan
sleep 30
# Check vulnerabilities
CRITICAL=$(aws ecr describe-image-scan-findings \
--repository-name $ECR_REPOSITORY \
--image-id imageTag=$IMAGE_TAG \
--query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
--output text 2>/dev/null || echo "0")
if [ "$CRITICAL" != "0" ] && [ "$CRITICAL" != "None" ]; then
echo "❌ CRITICAL vulnerabilities found!"
exit 1
fi
echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
- name: Deploy to ECS
uses: aws-actions/amazon-ecs-deploy-task-definition@v1
with:
task-definition: notification-service/task-definition.json
service: ${{ env.ECS_SERVICE }}
cluster: ${{ env.ECS_CLUSTER }}
wait-for-service-stability: true
Step 8: Verify Deployment
# Check service status
aws ecs describe-services \
--cluster production-cluster \
--services notification-service \
--region us-east-1 \
--query 'services[0].[serviceName,status,runningCount,desiredCount]' \
--output table
# Check task health
aws ecs list-tasks \
--cluster production-cluster \
--service-name notification-service \
--region us-east-1
# Get task details
aws ecs describe-tasks \
--cluster production-cluster \
--tasks arn:aws:ecs:us-east-1:123456789012:task/production-cluster/1234567890abcdef \
--region us-east-1 \
--query 'tasks[0].[taskArn,lastStatus,healthStatus,containers[0].healthStatus]'
# Check logs
aws logs tail /ecs/production-cluster/notification-service \
--follow \
--region us-east-1
Expected healthy output:
| Service Name | Status | Desired Tasks | Running Tasks |
|---|---|---|---|
| notification-service | ACTIVE | 2 | 2 |
Step 9: Test the Service
# Get ALB DNS name
ALB_DNS=$(aws elbv2 describe-load-balancers \
--names production-alb \
--query 'LoadBalancers[0].DNSName' \
--output text \
--region us-east-1)
# Health check
curl https://$ALB_DNS/health
# Send test notification
curl -X POST https://$ALB_DNS/notify \
-H "Content-Type: application/json" \
-d '{
"message": "Test notification from secure pipeline",
"topic": "app-notifications"
}'
Expected response:
{
"success": true,
"message": "Notification sent"
}
Lessons Learned & Best Practices
After running this setup in production for several months, here are my major takeaways:
Security Lessons
1. Automate Everything or It Doesn't Get Done
- Manual security checks get skipped when teams are under pressure.
- Automated gates in CI/CD are non-negotiable.
- Make the secure path the easiest path; otherwise the insecure path wins.
2. Some Vulnerabilities Are Worse Than Others
- CRITICAL/HIGH: block deployment right away
- MEDIUM: accept and track with ticket
- LOW/INFORMATIONAL: trend monitoring.
- Context is key: a SQL injection CVE in a package your code never calls warrants a very different response than one in your web framework.
3. Defense-in-Depth Pays Off
- Multiple layers caught different issues: ECR scanning caught OS vulnerabilities, Trivy caught application-dependency vulnerabilities, and Inspector caught runtime issues.
- Network isolation limited lateral movement from a single compromised service.
- Least-privilege IAM limited the blast radius when a credential leaked.
4. Secrets Management Is Important
- Never keep secrets in environment variables (you can view them in logs, console, etc.)
- AWS Secrets Manager integration with ECS is straightforward; rotating secrets regularly (e.g., every 90 days) is best practice. A sketch of enabling automatic rotation follows.
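As a sketch (not part of the original pipeline), turning on automatic rotation looks roughly like this; the secret name and rotation Lambda ARN are illustrative, and it assumes you already have a rotation function (AWS provides templates for common databases):
# Enable 90-day automatic rotation for a secret
aws secretsmanager rotate-secret \
--secret-id prod/database-url \
--rotation-lambda-arn arn:aws:lambda:us-east-1:123456789012:function:prod-db-rotation \
--rotation-rules AutomaticallyAfterDays=90 \
--region us-east-1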
Operational Lessons
5. Read-Only Root Filesystems Can Break Applications
- Many applications rely upon writing to /tmp.
- The fix is to mount a dedicated writable volume at /tmp while keeping the root filesystem read-only:
"mountPoints": [
{
"sourceVolume": "tmp",
"containerPath": "/tmp",
"readOnly": false
}
],
"volumes": [
{
"name": "tmp",
"host": {}
}
]
6. The Importance of Health Checks
- Implement proper health checks (a response code of '200' isn't a true indicator of health).
- Have the health endpoint verify database connectivity and downstream service health, not just process liveness.
7. Considerations for Logging
- Structured logging (JSON) makes logs much easier to search and analyze.
- Trace IDs let you correlate a request across services.
- Keep application logs separate from access logs.
- Set retention policies (we keep application logs in CloudWatch for 30 days and in S3 for 7 years).
8. Cost-Effective Practices
- Basic ECR scan-on-push is free; enhanced scanning costs about $0.09 per image per month, which is well worth it.
- CloudWatch Logs Insights queries are billed per GB of data scanned, so save or cache frequently used query patterns.
- Fargate Spot can save up to 70% for workloads that aren't time-critical (a sketch follows).
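As a rough sketch, a non-critical service can run on Fargate Spot through a capacity provider strategy instead of --launch-type FARGATE; this assumes the FARGATE_SPOT capacity provider is enabled on the cluster, and the service, subnet, and security group names here are illustrative:
# Create a service that runs entirely on Fargate Spot capacity
aws ecs create-service \
--cluster production-cluster \
--service-name batch-worker \
--task-definition batch-worker-task \
--desired-count 2 \
--capacity-provider-strategy capacityProvider=FARGATE_SPOT,weight=1 \
--network-configuration "awsvpcConfiguration={subnets=[subnet-0a1b2c3d],securityGroups=[sg-0a1b2c3d],assignPublicIp=DISABLED}" \
--region us-east-1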
Lessons Learned in CI/CD
9. Deployment Speed versus Security
The pipeline takes 8-12 minutes end to end (build: ~3 min, scan: ~2 min, deploy: ~5 min). That's acceptable for production, and it's not worth sacrificing security for speed. For faster iteration, use dev environments with relaxed policies.
10. Rollback Planning
When a deployment fails, the ECS circuit breaker rolls it back automatically. Always retain previous task definition revisions, use blue/green deployments for critical services, and test your rollback plan in advance (a manual rollback sketch follows).
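For reference, a manual rollback is just pointing the service back at a known-good task definition revision; the revision number here is illustrative:
# Roll the service back to a previous task definition revision
aws ecs update-service \
--cluster production-cluster \
--service api-service \
--task-definition api-service-task:42 \
--force-new-deployment \
--region us-east-1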
Lessons Learned on Architecture
11. Isolated Services:
Each microservice has its own ECR repository.
Each service has its unique IAM Task Role.
Services communicate using ALB/API Gateway only.
This limits the potential effects of a compromised service on other services.
12. Monitoring is Mandatory:
- Set alarms before incidents occur, not after.
- Alert on security events such as unusual traffic and repeated failed authentication attempts.
- Weekly review meetings for security findings.
- AWS Security Hub provides a central place for findings (a sketch of enabling it follows).
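Enabling Security Hub is a one-liner; this is a sketch, and the --enable-default-standards flag also turns on the default security standards, which you may or may not want:
# Enable Security Hub in the region (centralizes Inspector and other findings)
aws securityhub enable-security-hub \
--enable-default-standards \
--region us-east-1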
Lessons Learned from the Team and Process
13. To Avoid Shortcuts:
- Use Documentation: Document why a security control exists.
- Provide run books for common situations.
- Clearly define the exception process (and make it rare).
14. Security Champions Facilitate Communication:
- A security champion on each team reviews code changes and stays current with known threats.
- They provide a bridge between the development team and security.
15. Improve Your Security Continuously:
- Conduct quarterly security architecture review meetings.
- Conduct a Post Mortem of Security Incidents (no blame).
- Track metrics such as time to patch a vulnerability and false positive rate.
My Ten Best Security Practices
Each practice, and how I applied it:
- Scan every image on push to catch issues early: ECR scan-on-push plus the GitHub Actions gate in your existing pipeline.
- Block high-severity findings automatically: let the CI/CD gate fail the build instead of relying on manual review.
- Limit what a compromised service can do: a separate least-privilege task role per service.
- Reduce the attack surface: private subnets, no public IP addresses, and NAT gateways for outbound access only.
- Run containers as a non-root user to prevent privilege escalation: UID 1001 in the Dockerfiles.
- Use immutable, reproducible infrastructure: Fargate plus task definitions kept in Git.
- Encrypt all data at rest and in transit: KMS for ECR repositories, logs, and secrets; HTTPS everywhere.
- Emit structured JSON logs with trace IDs so you can investigate thoroughly when a security incident occurs.
- Monitor continuously: CloudWatch alarms and Amazon Inspector for fast detection and reporting.
- Stay ahead of the threat landscape: automated dependency updates (such as Dependabot) and a monthly schedule for refreshing base images.
Mistakes I Made, So You Can Avoid Them
Gave every service the same broad IAM role - if one service were compromised, it would have access to everything. Fixed by creating a dedicated role per service.
Did not set log retention - the CloudWatch Logs bill was eye-watering. I now set a 30-day retention period on all log groups (command sketched after this list).
Ignored image layer caching - my initial Dockerfiles took 15 minutes to build. Reordering the COPY commands (dependencies before source) fixed the cache hits.
Deployed to production from my laptop "just this one time" - it quickly became a pattern. All production deployments now go through GitHub Actions.
Ignored MEDIUM vulnerabilities - they accumulated to over 200. We now track and remediate them quarterly.
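For the log retention fix, here's a sketch of applying a 30-day policy to every log group under the /ecs/ prefix:
# Apply a 30-day retention policy to all ECS log groups
for lg in $(aws logs describe-log-groups \
--log-group-name-prefix /ecs/ \
--query 'logGroups[].logGroupName' --output text --region us-east-1); do
aws logs put-retention-policy \
--log-group-name "$lg" \
--retention-in-days 30 \
--region us-east-1
done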
Conclusion: Security as a Competitive Advantage
When I started this project, I expected security to slow us down. I was wrong.
Automated container security has actually made our deployments faster:
We were able to eliminate the "security review" bottlenecks.
Our Development Team was able to find problems in CI before they made their way to production.
The overall number of incidents that required remediation decreased and therefore we spent less time on "fire-fighting".
Compliance audits have become trivial because we have access to all of the necessary logs, scans, and configurations in Codified form.
By The Numbers:
- Zero production security incidents caused by container vulnerabilities in the past 6 months.
- Average deployment time, including all security checks: 8 minutes.
- 100% of images scanned for vulnerabilities before deployment.
- About 15 minutes saved per deployment by removing manual review.
- Roughly 2 hours a week saved by eliminating recurring security review meetings.
Advice I would give my former self:
When you start a project like this, you must begin with security in mind.
It is far more difficult to retrofit security than it is to build security in from the beginning.
AWS has fantastic tools (ECR Scanning, Fargate Isolation, IAM) that do all of the "heavy lifting" for you; you just have to properly configure and integrate them.
Resources That Helped Me
AWS ECS Security Best Practices
ECR Image Scanning Documentation
Fargate Security Guide
GitHub Actions for AWS
Container Security by Liz Rice
Final Thoughts
Security doesn't have to be complicated or slow. Automated properly, it becomes nearly invisible to developers while still protecting your production infrastructure.
With AWS Fargate (a serverless container runtime where AWS manages and isolates the underlying hosts) and GitHub Actions, you can build security into your deployment pipeline without managing that infrastructure yourself.
Hopefully, after following this guide, you can also create an automated container security pipeline. You can use these examples of code to get started and customize them as necessary.
If you have any questions, comments or war stories to share regarding this guide, I would love to hear from you! You can post a comment here or connect with me on social media. Together we can work towards providing a more secure AWS ecosystem! 🔐
This solution has been running in production for over 6 months, handling more than 50,000 requests per day, without a single security incident caused by a container vulnerability. If this guide was helpful, please share it with your team and the wider AWS community.