bikash119

Posted on Sep 8

Deploying GPU-Enabled ECS EC2 Instances with Auto Scaling Groups and Launch Templates

#aws #ecs #ec2 #tutorial

This is Part 2 of a 3-part series on deploying Docling to AWS ECS. In Part 1, we covered the foundational networking and IAM setup required for this deployment.

Setting up Amazon ECS (Elastic Container Service) with EC2 instances can be complex, especially when you need GPU support for compute-intensive workloads. In this comprehensive guide, we'll walk through creating a robust, scalable ECS infrastructure using Auto Scaling Groups (ASG) and Launch Templates, specifically configured for GPU workloads.

Why Use Auto Scaling Groups with ECS?

Auto Scaling Groups provide several key benefits for ECS deployments:

Automatic scaling based on demand and health checks
High availability across multiple availability zones
Cost optimization by scaling down during low usage periods
Consistent tagging and configuration through Launch Templates
Integration with ECS Capacity Providers for seamless container orchestration

Prerequisites

Before we begin, ensure you have:

AWS CLI configured with appropriate permissions
A VPC with public subnets already created - If you haven't set this up yet, please refer to Part 1 of this series where we cover the complete VPC setup including subnets, internet gateways, and route tables
IAM roles properly configured - The ec2_instance_role-profile referenced in our Launch Template was created in Part 1. If you skipped Part 1, you'll need to set up the necessary IAM roles and instance profiles
Basic understanding of AWS ECS, EC2, and Auto Scaling concepts

💡 Note: This guide assumes you have the VPC ID ($VPC_ID) and public subnet ID ($PUBLIC_SUBNET) from Part 1. If you need to retrieve these values, refer to the VPC setup section in Part 1.

Step 1: Create Security Infrastructure

Generate SSH Key Pair

First, let's create a key pair for secure access to our EC2 instances:

aws ec2 create-key-pair \
    --key-name ECS_Instance_Key \
    --tag-specifications 'ResourceType=key-pair,Tags=[{Key=Name,Value=ECS_Instance_Key}]' \
    --query 'KeyMaterial' \
    --output text > ECSInstanceKey.pem

# Secure the key file
chmod 400 ECSInstanceKey.pem

Create Security Group

Next, we'll create a security group to control network access. This security group will be associated with the VPC we created in Part 1:

ECS_SG_ID=$(aws ec2 create-security-group \
    --tag-specifications 'ResourceType=security-group,Tags=[{Key=Name,Value=ECS_Instance_SG}]' \
    --vpc-id $VPC_ID \
    --group-name ECS_Instance_SG \
    --description "SG for ECS Instance" \
    --query "GroupId" \
    --output text)

# Add tags to the security group
aws ec2 create-tags --resources $ECS_SG_ID --cli-input-json file://tags.json
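
The create-tags call above reads tags.json, which this post doesn't show. Here is a minimal sketch of what it could contain, assuming you want the same Purpose and Environment tags that the Auto Scaling Group uses later in this guide. The tag keys and values are illustrative, not something Part 1 prescribes; the resource ID still comes from --resources on the command line.

```shell
# Write a hypothetical tags.json for `aws ec2 create-tags --cli-input-json`.
# The tag keys and values are assumptions borrowed from the ASG tags later in this guide.
cat > tags.json <<'EOF'
{
  "Tags": [
    { "Key": "Purpose", "Value": "DoclingSetup" },
    { "Key": "Environment", "Value": "Dev" }
  ]
}
EOF
```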

📝 Reminder: The $VPC_ID variable should contain the VPC ID from Part 1. If you need to find your VPC ID, you can use the command provided in Part 1 or run: aws ec2 describe-vpcs --filters "Name=tag:Name,Values=your-vpc-name"

Configure Security Group Rules

⚠️ Security Notice: The following rule allows SSH access from anywhere on the internet. For production environments, restrict this to your specific IP range or use AWS Systems Manager Session Manager for more secure access.

aws ec2 authorize-security-group-ingress \
    --group-id $ECS_SG_ID \
    --protocol tcp \
    --port 22 \
    --cidr 0.0.0.0/0
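
If you'd rather not open SSH to the whole internet, one option is to pass a single-address CIDR instead of 0.0.0.0/0. The sketch below assumes you already know your public IPv4 address; ip_to_cidr is a hypothetical helper name, and the check only validates the shape of the address.

```shell
# Hypothetical helper: turn a single IPv4 address into a /32 CIDR block.
ip_to_cidr() {
    # Reject anything that is not four dot-separated groups of 1-3 digits
    # (a shape check only; it does not verify that each octet is <= 255)
    if printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        printf '%s/32\n' "$1"
    else
        echo "Not an IPv4 address: $1" >&2
        return 1
    fi
}

# Example with a documentation-range address
ip_to_cidr 203.0.113.7   # prints 203.0.113.7/32
```

In practice you could look up your address with something like curl -s https://checkip.amazonaws.com, store it in a variable such as MY_IP, and pass --cidr "$(ip_to_cidr "$MY_IP")" to authorize-security-group-ingress.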

Step 2: Prepare User Data Script

Create a user data script that configures the ECS agent with GPU support (adjust the region in ECS_BACKEND_HOST if you're not deploying to us-east-1):

#!/bin/bash
echo ECS_CLUSTER=docling-ecs-cluster >> /etc/ecs/ecs.config
echo ECS_BACKEND_HOST=https://ecs.us-east-1.amazonaws.com >> /etc/ecs/ecs.config
echo ECS_ENABLE_GPU_SUPPORT=true >> /etc/ecs/ecs.config

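Save this as user-data.sh and encode it in base64 as a single line. GNU base64 wraps its output at 76 characters by default, which would break the JSON string in the launch template, so -w 0 turns wrapping off (on macOS, use base64 -i user-data.sh, which doesn't wrap):

base64 -w 0 user-data.sh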
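
Because the encoded string has to fit into the single-line UserData field of the launch template, it can help to capture it in a variable and confirm it decodes back to the script. The following is a sketch under a few assumptions: it recreates user-data.sh with the same three settings shown above, and USER_DATA_B64 is a variable name chosen here for illustration.

```shell
# Recreate the user data script from this step (same three settings as above)
cat > user-data.sh <<'EOF'
#!/bin/bash
echo ECS_CLUSTER=docling-ecs-cluster >> /etc/ecs/ecs.config
echo ECS_BACKEND_HOST=https://ecs.us-east-1.amazonaws.com >> /etc/ecs/ecs.config
echo ECS_ENABLE_GPU_SUPPORT=true >> /etc/ecs/ecs.config
EOF

# Encode and strip newlines so the result fits in one JSON string.
# Piping through tr avoids depending on implementation-specific wrap flags.
USER_DATA_B64=$(base64 < user-data.sh | tr -d '\n')

# Sanity check: decoding should give back the original script.
# (-d is the decode flag on GNU coreutils; older macOS releases use -D.)
if [ "$(printf '%s' "$USER_DATA_B64" | base64 -d)" = "$(cat user-data.sh)" ]; then
    echo "user data round-trips correctly"
fi
```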

Step 3: Get the Optimal GPU AMI

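AWS provides optimized AMIs for ECS with GPU support. Let's fetch the latest recommended AMI ID for your configured region. The image_id sub-parameter returns just the ID, which goes into ImageId in the launch template in Step 4:

aws ssm get-parameters \
    --names /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended/image_id \
    --region $(aws configure get region) \
    --query "Parameters[0].Value" \
    --output text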

Step 4: Create Launch Template

Launch Templates provide a way to store launch parameters so that you don't have to specify them every time you launch an instance. Create a JSON file called ec2-launch-template.json:

{
  "ImageId": "ami-0372b2cc554a36da2",
  "InstanceType": "g4dn.xlarge",
  "KeyName": "ECS_Instance_Key",
  "IamInstanceProfile": {
    "Name": "ec2_instance_role-profile"
  },
  "NetworkInterfaces": [
    {
      "AssociatePublicIpAddress": true,
      "DeleteOnTermination": true,
      "DeviceIndex": 0,
      "SubnetId": "<subnet id of public subnet>",
      "Groups": ["<value of $ECS_SG_ID>"]
    }
  ],
  "UserData": "<replace_with_base64_encoded_user-data.sh>",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvda",
      "Ebs": {
        "VolumeSize": 30,
        "VolumeType": "gp3",
        "DeleteOnTermination": true,
        "Encrypted": true
      }
    }
  ],
  "TagSpecifications": [
    {
      "ResourceType": "instance",
      "Tags": [
        {
          "Key": "Name",
          "Value": "ECS-Instance"
        }
      ]
    },
    {
      "ResourceType": "volume",
      "Tags": [
        {
          "Key": "Name",
          "Value": "ECS-Instance-Volume"
        }
      ]
    }
  ],
  "Monitoring": {
    "Enabled": true
  },
  "MetadataOptions": {
    "HttpTokens": "required",
    "HttpPutResponseHopLimit": 2,
    "HttpEndpoint": "enabled"
  }
}

⚠️ Important Configuration Notes:

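Replace the ImageId with the AMI ID returned in Step 3 (AMI IDs are region-specific, so the one shown above only works in the region it was fetched from)

Replace the SubnetId with your actual public subnet ID from Part 1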

Replace the Groups array with your actual security group ID (the $ECS_SG_ID we just created)

The IamInstanceProfile name (ec2_instance_role-profile) was created in Part 1 - make sure this matches your actual IAM instance profile name
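
Editing the JSON by hand works, but the replacements can also be scripted. The sketch below is one way to do it with sed, assuming each key and its value sit on a single line as in the template above. fill_launch_template is a hypothetical helper name, and it matches on the JSON keys rather than on the placeholder text.

```shell
# Hypothetical helper: print a copy of the launch template with SubnetId,
# Groups, and UserData replaced. It matches on the JSON keys, so it works
# whatever placeholder text is currently in the file.
#   $1 = template file        $2 = subnet ID
#   $3 = security group ID    $4 = base64-encoded user data
fill_launch_template() {
    sed -E \
        -e "s|(\"SubnetId\": *)\"[^\"]*\"|\1\"$2\"|" \
        -e "s|(\"Groups\": *)\[[^]]*\]|\1[\"$3\"]|" \
        -e "s|(\"UserData\": *)\"[^\"]*\"|\1\"$4\"|" \
        "$1"
}
```

A possible invocation, assuming the base64 output from Step 2 is stored in a variable such as USER_DATA_B64: fill_launch_template ec2-launch-template.json "$PUBLIC_SUBNET" "$ECS_SG_ID" "$USER_DATA_B64" > ec2-launch-template.filled.json. You would then point --launch-template-data at the filled file.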

Key Configuration Highlights

GPU Instance Type: g4dn.xlarge provides NVIDIA T4 GPU support
EBS Encryption: Enabled for data security
Detailed Monitoring: Enabled (1-minute CloudWatch metrics) for better observability
IMDSv2: Enforced for improved instance metadata security
gp3 Storage: Latest-generation general purpose SSD volumes for better price/performance

Now create the launch template:

EC2_LAUNCH_TEMPLATE_ID=$(aws ec2 create-launch-template \
    --launch-template-name DoclingLaunchTemplate \
    --tag-specifications 'ResourceType=launch-template,Tags=[{Key=Name,Value=ECS_EC2_Launch_Template}]' \
    --launch-template-data file://ec2-launch-template.json \
    --query "LaunchTemplate.LaunchTemplateId" \
    --output text)

# Add additional tags
aws ec2 create-tags --resources $EC2_LAUNCH_TEMPLATE_ID --cli-input-json file://tags.json

Step 5: Create ECS Cluster

Important: Create the ECS cluster before launching EC2 instances. The ECS agent on the instances needs an existing cluster to register with.

aws ecs create-cluster \
    --cluster-name docling-ecs-cluster \
    --tags key=Name,value=doclingECSCluster

# Get cluster ARN for tagging
DOCLING_CLUSTER_ARN=$(aws ecs describe-clusters \
    --clusters docling-ecs-cluster \
    --query "clusters[].clusterArn" \
    --output text)

# Add additional tags
aws ecs tag-resource \
    --resource-arn $DOCLING_CLUSTER_ARN \
    --tags file://cluster-tags.json
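
The tag-resource call above reads cluster-tags.json, which isn't shown in this post. Here is a minimal sketch of what it could look like, again reusing the Purpose and Environment tags from the ASG section as an assumption. Note that ECS expects lowercase key and value, unlike the EC2 and Auto Scaling tag formats.

```shell
# Write a hypothetical cluster-tags.json for `aws ecs tag-resource --tags`.
# ECS uses lowercase "key"/"value"; the tag values here are illustrative.
cat > cluster-tags.json <<'EOF'
[
  { "key": "Purpose", "value": "DoclingSetup" },
  { "key": "Environment", "value": "Dev" }
]
EOF
```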

Step 6: Create Auto Scaling Group

Create the Auto Scaling Group with zero desired capacity initially. This ASG will use the public subnet we configured in Part 1:

aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name ECS_Asg \
    --launch-template LaunchTemplateId=$EC2_LAUNCH_TEMPLATE_ID,Version='$Latest' \
    --min-size 0 \
    --max-size 0 \
    --desired-capacity 0 \
    --vpc-zone-identifier $PUBLIC_SUBNET \
    --tags Key=Name,Value=ECS_AutoScaling

📋 Note: The $PUBLIC_SUBNET variable should contain the subnet ID from Part 1. If you need to retrieve your subnet ID, refer to the VPC setup section in Part 1.
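
With a single subnet, every instance lands in one Availability Zone. If you later add more public subnets, --vpc-zone-identifier accepts a comma-separated list. The helper below is a small sketch for building that list; join_subnets is a hypothetical name and the subnet IDs in the example are placeholders, since Part 1 as referenced here only provides $PUBLIC_SUBNET.

```shell
# Hypothetical helper: join subnet IDs with commas for --vpc-zone-identifier
join_subnets() {
    # Run in a subshell so the IFS change does not leak into the caller
    ( IFS=,; printf '%s\n' "$*" )
}

# Example with placeholder subnet IDs
join_subnets subnet-aaaa1111 subnet-bbbb2222   # prints subnet-aaaa1111,subnet-bbbb2222
```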

Configure Auto Scaling Group Tags

Create an asg-tags.json file for propagating tags to launched instances:

[
   {
        "ResourceId": "ECS_Asg",
        "ResourceType": "auto-scaling-group",
        "Key": "Purpose",
        "Value": "DoclingSetup",
        "PropagateAtLaunch": true
    },
    {
        "ResourceId": "ECS_Asg",
        "ResourceType": "auto-scaling-group",
        "Key": "Environment",
        "Value": "Dev",
        "PropagateAtLaunch": true
    }
]

Apply the tags:

aws autoscaling create-or-update-tags --tags file://asg-tags.json

Step 7: Testing and Verification

Launch an Instance

Scale up the Auto Scaling Group to launch an instance:

aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name ECS_Asg \
    --min-size 1 \
    --max-size 1 \
    --desired-capacity 1

Monitor Scaling Activities

Check the status of the launch activity:

aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name ECS_Asg \
    --query 'Activities[?StatusCode==`InProgress`]'
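
Rather than re-running the command by hand, you can poll until the instance is in service. This is a sketch, not the only way to do it: wait_until and asg_has_instance_in_service are hypothetical helper names, and the attempt count and delay are arbitrary choices.

```shell
# Generic retry helper: run a command until it succeeds or attempts run out.
#   wait_until <attempts> <delay_seconds> <command...>
wait_until() {
    local attempts=$1
    local delay=$2
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Succeeds once at least one instance in ECS_Asg reports InService.
asg_has_instance_in_service() {
    local count
    count=$(aws autoscaling describe-auto-scaling-groups \
        --auto-scaling-group-names ECS_Asg \
        --query 'length(AutoScalingGroups[0].Instances[?LifecycleState==`InService`])' \
        --output text) || return 1
    [ "$count" -ge 1 ]
}

# Example (not run here): poll every 10 seconds, up to 30 times
# wait_until 30 10 asg_has_instance_in_service && echo "Instance is InService"
```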

Verify Tag Propagation

Confirm that tags from the ASG were propagated to the EC2 instance:

aws ec2 describe-instances --filters "Name=tag-key,Values=Purpose"

Get Instance IP Address

Find the IP address of the launched EC2 instance for SSH access:

aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" "Name=tag-key,Values=Purpose" \
    --query "Reservations[*].Instances[*].[InstanceId,InstanceType,PrivateIpAddress,PublicIpAddress]" \
    --output table

This command will display a table showing the Instance ID, Instance Type, Private IP Address, and Public IP Address of all running instances tagged with the "Purpose" key. Make note of the Public IP Address as you'll need it for SSH access in the next step.

Verify ECS Agent

SSH into the launched instance and check the ECS agent status:

# SSH into the instance using the generated key
ssh -i ECSInstanceKey.pem ec2-user@<INSTANCE_PUBLIC_IP>

# Check ECS agent status
sudo systemctl status ecs

# Verify ECS agent container is running
sudo docker ps

Verify Cluster Registration

Check if the instance successfully registered with the ECS cluster:

aws ecs list-container-instances --cluster docling-ecs-cluster

Best Practices and Production Considerations

Security Enhancements

Restrict SSH Access: Replace 0.0.0.0/0 with your specific IP range
Use AWS Systems Manager: Consider Session Manager for secure shell access
Enable VPC Flow Logs: Monitor network traffic for security analysis
Use Secrets Manager: Store sensitive configuration data securely

Monitoring and Logging

CloudWatch Container Insights: Enable for comprehensive ECS monitoring
Custom Metrics: Set up custom CloudWatch metrics for your applications
Log Aggregation: Use CloudWatch Logs or a centralized logging solution

Cost Optimization

Spot Instances: Consider using Spot Instances for cost savings
Mixed Instance Types: Use multiple instance types in your ASG
Scheduled Scaling: Implement time-based scaling policies
ECS Capacity Providers: Use for automatic scaling based on resource utilization

Conclusion

You've successfully created a scalable, GPU-enabled ECS infrastructure using Auto Scaling Groups and Launch Templates. This setup builds upon the foundational networking and IAM infrastructure we established in Part 1 and provides:

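A foundation for automated scaling (capacity is set manually in this guide; add scaling policies or an ECS Capacity Provider to scale on demand)
Consistent instance configuration through Launch Templates
Proper tagging for resource management and cost allocation
GPU support for compute-intensive workloads
A path to high availability once the ASG spans subnets in multiple Availability Zones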
Integration with the VPC and security architecture from Part 1

The infrastructure is now ready to host containerized applications that require GPU processing power, and the Auto Scaling Group gives you a place to attach scaling policies as your workload demands grow.

What's Next?

In Part 3 of this series, we'll cover:

Creating and deploying ECS Task Definitions
Setting up ECS Services for your applications
Implementing Application Load Balancer for traffic distribution
Advanced ECS configurations and monitoring

Stay tuned for the final part where we'll bring everything together with actual application deployment!

Series Navigation

Part 1: Foundation - Networking & IAM - VPC setup, subnets, security groups, and IAM roles
Part 2: ECS EC2 with Auto Scaling (Current) - Launch templates, Auto Scaling Groups, and ECS cluster setup
Part 3: Application Deployment - Task definitions, services, and load balancers

Next Steps

Consider implementing:

ECS Services and Task Definitions for your applications
Application Load Balancer for distributing traffic

This foundation, combined with the networking setup from Part 1, provides a robust starting point for deploying Docling as a containerized application on AWS ECS with GPU support.

DEV Community