DEV Community

bikash119

Deploying GPU-Enabled ECS EC2 Instances with Auto Scaling Groups and Launch Templates

This is Part 2 of a 3-part series on deploying Docling to AWS ECS. Part 1 covered the foundational networking and IAM setup that this deployment requires.

Setting up Amazon ECS (Elastic Container Service) with EC2 instances can be complex, especially when you need GPU support for compute-intensive workloads. In this comprehensive guide, we'll walk through creating a robust, scalable ECS infrastructure using Auto Scaling Groups (ASG) and Launch Templates, specifically configured for GPU workloads.

Why Use Auto Scaling Groups with ECS?

Auto Scaling Groups provide several key benefits for ECS deployments:

  • Automatic scaling based on demand and health checks
  • High availability across multiple availability zones
  • Cost optimization by scaling down during low usage periods
  • Consistent tagging and configuration through Launch Templates
  • Integration with ECS Capacity Providers for seamless container orchestration

Prerequisites

Before we begin, ensure you have:

  • AWS CLI configured with appropriate permissions
  • A VPC with public subnets already created - If you haven't set this up yet, please refer to Part 1 of this series where we cover the complete VPC setup including subnets, internet gateways, and route tables
  • IAM roles properly configured - The ec2_instance_role-profile referenced in our Launch Template was created in Part 1. If you skipped Part 1, you'll need to set up the necessary IAM roles and instance profiles
  • Basic understanding of AWS ECS, EC2, and Auto Scaling concepts

💡 Note: This guide assumes you have the VPC ID ($VPC_ID) and public subnet ID ($PUBLIC_SUBNET) from Part 1. If you need to retrieve these values, refer to the VPC setup section in Part 1.
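
Since every later command depends on these two variables, it can help to check them before going further. The following is a minimal sketch, not part of the AWS CLI: `require_vars` is a hypothetical helper name, and it only checks that each named variable is non-empty.

```shell
# Hypothetical helper: fail fast if any named variable is unset or empty.
require_vars() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            return 1
        fi
    done
    return 0
}
```

A possible use at the top of your session: `require_vars VPC_ID PUBLIC_SUBNET || echo "Set these from Part 1 first"`.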

Step 1: Create Security Infrastructure

Generate SSH Key Pair

First, let's create a key pair for secure access to our EC2 instances:

aws ec2 create-key-pair \
    --key-name ECS_Instance_Key \
    --tag-specifications 'ResourceType=key-pair,Tags=[{Key=Name,Value=ECS_Instance_Key}]' \
    --query 'KeyMaterial' \
    --output text > ECSInstanceKey.pem

# Secure the key file
chmod 400 ECSInstanceKey.pem

Create Security Group

Next, we'll create a security group to control network access. This security group will be associated with the VPC we created in Part 1:

ECS_SG_ID=$(aws ec2 create-security-group \
    --tag-specifications 'ResourceType=security-group,Tags=[{Key=Name,Value=ECS_Instance_SG}]' \
    --vpc-id $VPC_ID \
    --group-name ECS_Instance_SG \
    --description "SG for ECS Instance" \
    --query "GroupId" \
    --output text)

# Add tags to the security group
aws ec2 create-tags --resources $ECS_SG_ID --cli-input-json file://tags.json
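
The tags.json file referenced above is not shown in this post. A minimal sketch of what it might contain follows, assuming the file only needs to supply the `Tags` list because the resource ID is passed on the command line. The tag keys and values are illustrative, borrowed from the ASG tags used later in this guide.

```shell
# Write a minimal tags.json for `aws ec2 create-tags --cli-input-json`.
# Assumption: the Purpose/Environment values are placeholders; use your own.
cat > tags.json <<'EOF'
{
  "Tags": [
    { "Key": "Purpose", "Value": "DoclingSetup" },
    { "Key": "Environment", "Value": "Dev" }
  ]
}
EOF
```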

📝 Reminder: The $VPC_ID variable should contain the VPC ID from Part 1. If you need to find your VPC ID, you can use the command provided in Part 1 or run: aws ec2 describe-vpcs --filters "Name=tag:Name,Values=your-vpc-name"

Configure Security Group Rules

⚠️ Security Notice: The following rule allows SSH access from anywhere on the internet. For production environments, restrict this to your specific IP range or use AWS Systems Manager Session Manager for more secure access.

aws ec2 authorize-security-group-ingress \
    --group-id $ECS_SG_ID \
    --protocol tcp \
    --port 22 \
    --cidr 0.0.0.0/0
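
As a tighter alternative to the open rule, SSH can be limited to a single address. This is a sketch under assumptions: `allow_ssh_from_ip` is a hypothetical wrapper name, and it simply appends `/32` to whatever IPv4 address you pass in.

```shell
# Hypothetical helper: open port 22 only to a single IPv4 address.
# Arguments: security group ID, IPv4 address (without a mask).
allow_ssh_from_ip() {
    aws ec2 authorize-security-group-ingress \
        --group-id "$1" \
        --protocol tcp \
        --port 22 \
        --cidr "$2/32"
}
```

One way to call it, assuming your public address can be read from AWS's checkip service: `allow_ssh_from_ip "$ECS_SG_ID" "$(curl -s https://checkip.amazonaws.com)"`.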

Step 2: Prepare User Data Script

Create a user data script that configures the ECS agent with GPU support:

#!/bin/bash
# Register this instance with the cluster created in Step 5
echo ECS_CLUSTER=docling-ecs-cluster >> /etc/ecs/ecs.config
# Must match the region you deploy to (us-east-1 here)
echo ECS_BACKEND_HOST=https://ecs.us-east-1.amazonaws.com >> /etc/ecs/ecs.config
# Expose the instance's GPUs to ECS tasks
echo ECS_ENABLE_GPU_SUPPORT=true >> /etc/ecs/ecs.config

Save this as user-data.sh and encode it in base64:

# -w 0 disables line wrapping (GNU coreutils); on macOS use: base64 -i user-data.sh
base64 -w 0 user-data.sh
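
The launch template expects the encoded script as one unbroken string. A portable way to produce that might look like the sketch below, where `encode_user_data` is a hypothetical helper name. It reads the file from stdin and strips newlines, so it does not rely on flags that differ between GNU and BSD `base64`.

```shell
# Hypothetical helper: print a file as single-line base64.
# Reading from stdin and deleting newlines avoids platform-specific flags.
encode_user_data() {
    base64 < "$1" | tr -d '\n'
}
```

For example, `USER_DATA_B64=$(encode_user_data user-data.sh)` captures the value for pasting into the `UserData` field.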

Step 3: Get the Optimal GPU AMI

AWS provides optimized AMIs for ECS with GPU support. Let's fetch the latest recommended AMI ID:

aws ssm get-parameters \
    --names /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended \
    --region $(aws configure get region)
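
The command above returns a JSON document that contains the AMI ID among other fields. If you only want the ID, the `image_id` sub-parameter can be queried directly. The sketch below wraps that call in a hypothetical helper named `gpu_ami_id`; the region argument is whatever region you are deploying to.

```shell
# Hypothetical helper: print only the recommended GPU AMI ID for a region.
gpu_ami_id() {
    aws ssm get-parameter \
        --name /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended/image_id \
        --region "$1" \
        --query 'Parameter.Value' \
        --output text
}
```

For example, `AMI_ID=$(gpu_ami_id us-east-1)` stores the value for use in the launch template.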

Step 4: Create Launch Template

Launch Templates provide a way to store launch parameters so that you don't have to specify them every time you launch an instance. Create a JSON file called ec2-launch-template.json:

{
  "ImageId": "ami-0372b2cc554a36da2",
  "InstanceType": "g4dn.xlarge",
  "KeyName": "ECS_Instance_Key",
  "IamInstanceProfile": {
    "Name": "ec2_instance_role-profile"
  },
  "NetworkInterfaces": [
    {
      "AssociatePublicIpAddress": true,
      "DeleteOnTermination": true,
      "DeviceIndex": 0,
      "SubnetId": "<subnet id of public subnet>",
      "Groups": ["<value of $ECS_SG_ID>"]
    }
  ],
  "UserData": "<replace_with_base64_encoded_user-data.sh>",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvda",
      "Ebs": {
        "VolumeSize": 30,
        "VolumeType": "gp3",
        "DeleteOnTermination": true,
        "Encrypted": true
      }
    }
  ],
  "TagSpecifications": [
    {
      "ResourceType": "instance",
      "Tags": [
        {
          "Key": "Name",
          "Value": "ECS-Instance"
        }
      ]
    },
    {
      "ResourceType": "volume",
      "Tags": [
        {
          "Key": "Name",
          "Value": "ECS-Instance-Volume"
        }
      ]
    }
  ],
  "Monitoring": {
    "Enabled": true
  },
  "MetadataOptions": {
    "HttpTokens": "required",
    "HttpPutResponseHopLimit": 2,
    "HttpEndpoint": "enabled"
  }
}

⚠️ Important Configuration Notes:

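  • Replace the ImageId with the AMI ID returned in Step 3; AMI IDs are region-specific, so the value shown may not exist in your region
  • Replace the SubnetId with your actual public subnet ID from Part 1
  • Replace the Groups array with your actual security group ID (the $ECS_SG_ID we just created)
  • Replace the UserData value with the base64 output from Step 2
  • The IamInstanceProfile name (ec2_instance_role-profile) was created in Part 1; make sure this matches your actual IAM instance profile name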
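
Instead of editing the placeholders by hand, the account-specific values could be filled in from shell variables. The following is a trimmed sketch, not the full template: `render_launch_template` is a hypothetical helper, and it emits only the fields that vary per account. The block device, tag, monitoring, and metadata settings from the full file would still need to be added.

```shell
# Hypothetical helper: print a trimmed launch template with real values filled in.
# Arguments: AMI ID, subnet ID, security group ID, base64-encoded user data.
# The remaining fields of the full template are intentionally omitted here.
render_launch_template() {
    cat <<EOF
{
  "ImageId": "$1",
  "InstanceType": "g4dn.xlarge",
  "KeyName": "ECS_Instance_Key",
  "IamInstanceProfile": { "Name": "ec2_instance_role-profile" },
  "NetworkInterfaces": [
    {
      "AssociatePublicIpAddress": true,
      "DeleteOnTermination": true,
      "DeviceIndex": 0,
      "SubnetId": "$2",
      "Groups": ["$3"]
    }
  ],
  "UserData": "$4"
}
EOF
}
```

A call might look like `render_launch_template "$AMI_ID" "$PUBLIC_SUBNET" "$ECS_SG_ID" "$USER_DATA_B64"`, where `AMI_ID` and `USER_DATA_B64` are assumed to hold the results of Steps 3 and 2.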

Key Configuration Highlights

  • GPU Instance Type: g4dn.xlarge provides NVIDIA T4 GPU support
  • EBS Encryption: Enabled for data security
  • Detailed Monitoring: Enabled so CloudWatch collects instance metrics at 1-minute intervals
  • IMDSv2: Enforced for improved instance metadata security
  • gp3 Storage: Latest-generation general purpose SSD volumes for better price/performance

Now create the launch template:

EC2_LAUNCH_TEMPLATE_ID=$(aws ec2 create-launch-template \
    --launch-template-name DoclingLaunchTemplate \
    --tag-specifications 'ResourceType=launch-template,Tags=[{Key=Name,Value=ECS_EC2_Launch_Template}]' \
    --launch-template-data file://ec2-launch-template.json \
    --query "LaunchTemplate.LaunchTemplateId" \
    --output text)

# Add additional tags
aws ec2 create-tags --resources $EC2_LAUNCH_TEMPLATE_ID --cli-input-json file://tags.json

Step 5: Create ECS Cluster

Important: Create the ECS cluster before launching EC2 instances. The ECS agent on the instances needs an existing cluster to register with.

aws ecs create-cluster \
    --cluster-name docling-ecs-cluster \
    --tags key=Name,value=doclingECSCluster

# Get cluster ARN for tagging
DOCLING_CLUSTER_ARN=$(aws ecs describe-clusters \
    --clusters docling-ecs-cluster \
    --query "clusters[].clusterArn" \
    --output text)

# Add additional tags
aws ecs tag-resource \
    --resource-arn $DOCLING_CLUSTER_ARN \
    --tags file://cluster-tags.json
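
The cluster-tags.json file is not shown in this post. A minimal sketch follows, assuming the same illustrative Purpose and Environment values used for the ASG tags later on. Note that ECS tag objects use lowercase `key` and `value`, unlike the EC2 and Auto Scaling tag formats.

```shell
# Write a minimal cluster-tags.json for `aws ecs tag-resource --tags file://...`.
# Assumption: the tag values are placeholders; use your own.
cat > cluster-tags.json <<'EOF'
[
  { "key": "Purpose", "value": "DoclingSetup" },
  { "key": "Environment", "value": "Dev" }
]
EOF
```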

Step 6: Create Auto Scaling Group

Create the Auto Scaling Group with zero desired capacity initially. This ASG will use the public subnet we configured in Part 1:

aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name ECS_Asg \
    --launch-template LaunchTemplateId=$EC2_LAUNCH_TEMPLATE_ID,Version='$Latest' \
    --min-size 0 \
    --max-size 0 \
    --desired-capacity 0 \
    --vpc-zone-identifier $PUBLIC_SUBNET \
    --tags Key=Name,Value=ECS_AutoScaling

📋 Note: The $PUBLIC_SUBNET variable should contain the subnet ID from Part 1. If you need to retrieve your subnet ID, refer to the VPC setup section in Part 1.

Configure Auto Scaling Group Tags

Create an asg-tags.json file for propagating tags to launched instances:

[
    {
        "ResourceId": "ECS_Asg",
        "ResourceType": "auto-scaling-group",
        "Key": "Purpose",
        "Value": "DoclingSetup",
        "PropagateAtLaunch": true
    },
    {
        "ResourceId": "ECS_Asg",
        "ResourceType": "auto-scaling-group",
        "Key": "Environment",
        "Value": "Dev",
        "PropagateAtLaunch": true
    }
]

Apply the tags:

aws autoscaling create-or-update-tags --tags file://asg-tags.json

Step 7: Testing and Verification

Launch an Instance

Scale up the Auto Scaling Group to launch an instance:

aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name ECS_Asg \
    --min-size 1 \
    --max-size 1 \
    --desired-capacity 1

Monitor Scaling Activities

Check the status of the launch activity:

aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name ECS_Asg \
    --query 'Activities[?StatusCode==`InProgress`]'
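
GPU instances can take a few minutes to launch, so rather than re-running the status check by hand, you could poll until the ASG reports an instance in service. This is a sketch under assumptions: `wait_for_in_service` is a hypothetical helper, and the attempt count and delay are yours to choose.

```shell
# Hypothetical helper: poll until the ASG reports at least one InService instance.
# Arguments: ASG name, max attempts, seconds between attempts.
wait_for_in_service() {
    attempt=1
    while [ "$attempt" -le "$2" ]; do
        count=$(aws autoscaling describe-auto-scaling-groups \
            --auto-scaling-group-names "$1" \
            --query "length(AutoScalingGroups[0].Instances[?LifecycleState=='InService'])" \
            --output text)
        # Non-numeric or empty output is treated as "not ready yet"
        if [ "${count:-0}" -ge 1 ] 2>/dev/null; then
            echo "InService instances: $count"
            return 0
        fi
        sleep "$3"
        attempt=$((attempt + 1))
    done
    echo "Timed out waiting for $1" >&2
    return 1
}
```

For example, `wait_for_in_service ECS_Asg 30 10` checks every 10 seconds for up to 5 minutes.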

Verify Tag Propagation

Confirm that tags from the ASG were propagated to the EC2 instance:

aws ec2 describe-instances --filters "Name=tag-key,Values=Purpose"

Get Instance IP Address

Find the IP address of the launched EC2 instance for SSH access:

aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" "Name=tag-key,Values=Purpose" \
    --query "Reservations[*].Instances[*].[InstanceId,InstanceType,PrivateIpAddress,PublicIpAddress]" \
    --output table

This command will display a table showing the Instance ID, Instance Type, Private IP Address, and Public IP Address of all running instances tagged with the "Purpose" key. Make note of the Public IP Address as you'll need it for SSH access in the next step.

Verify ECS Agent

SSH into the launched instance and check the ECS agent status:

# SSH into the instance using the generated key
ssh -i ECSInstanceKey.pem ec2-user@<INSTANCE_PUBLIC_IP>

# Check ECS agent status
sudo systemctl status ecs

# Verify ECS agent container is running
sudo docker ps
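
While you are on the instance, two further checks may be useful: asking the ECS agent which cluster it joined, and confirming the NVIDIA driver sees the GPU. The helper names below are hypothetical. The sketch assumes the agent's introspection endpoint is reachable on localhost port 51678 and that `nvidia-smi` is present, as it should be on the GPU-optimized AMI.

```shell
# Run these on the instance itself, inside the SSH session.

# Hypothetical helper: ask the local ECS agent for its cluster and instance ARN.
agent_metadata() {
    curl -s http://localhost:51678/v1/metadata
}

# Hypothetical helper: list the GPUs the NVIDIA driver can see.
gpu_summary() {
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

Calling `agent_metadata` should return JSON that names the cluster the agent registered with; `gpu_summary` should list one line per GPU.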

Verify Cluster Registration

Check if the instance successfully registered with the ECS cluster:

aws ecs list-container-instances --cluster docling-ecs-cluster
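
The list command only returns ARNs. To see whether each registered instance is active and its agent is connected, the ARNs can be passed on to `describe-container-instances`. A minimal sketch, with `describe_cluster_instances` as a hypothetical helper name:

```shell
# Hypothetical helper: show instance ID, status, and agent connectivity
# for every container instance registered in a cluster.
describe_cluster_instances() {
    arns=$(aws ecs list-container-instances \
        --cluster "$1" \
        --query 'containerInstanceArns[]' \
        --output text)
    if [ -z "$arns" ] || [ "$arns" = "None" ]; then
        echo "No container instances registered in $1" >&2
        return 1
    fi
    # $arns is intentionally unquoted so each ARN becomes its own argument
    aws ecs describe-container-instances \
        --cluster "$1" \
        --container-instances $arns \
        --query 'containerInstances[].[ec2InstanceId,status,agentConnected]' \
        --output table
}
```

For example: `describe_cluster_instances docling-ecs-cluster`.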

Best Practices and Production Considerations

Security Enhancements

  1. Restrict SSH Access: Replace 0.0.0.0/0 with your specific IP range
  2. Use AWS Systems Manager: Consider Session Manager for secure shell access
  3. Enable VPC Flow Logs: Monitor network traffic for security analysis
  4. Use Secrets Manager: Store sensitive configuration data securely

Monitoring and Logging

  1. CloudWatch Container Insights: Enable for comprehensive ECS monitoring
  2. Custom Metrics: Set up custom CloudWatch metrics for your applications
  3. Log Aggregation: Use CloudWatch Logs or a centralized logging solution

Cost Optimization

  1. Spot Instances: Consider using Spot Instances for cost savings
  2. Mixed Instance Types: Use multiple instance types in your ASG
  3. Scheduled Scaling: Implement time-based scaling policies
  4. ECS Capacity Providers: Use for automatic scaling based on resource utilization
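
The simplest cost control for this setup is to return the ASG to zero when the GPU instance is not needed, mirroring how it was created in Step 6. A sketch, with `set_asg_size` as a hypothetical helper that sets all three capacity values together:

```shell
# Hypothetical helper: set min, max, and desired capacity to the same value.
# Arguments: ASG name, instance count (0 terminates all instances in the group).
set_asg_size() {
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name "$1" \
        --min-size "$2" \
        --max-size "$2" \
        --desired-capacity "$2"
}
```

For example, `set_asg_size ECS_Asg 0` scales the group down, and `set_asg_size ECS_Asg 1` brings one instance back.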

Conclusion

You've successfully created a scalable, GPU-enabled ECS infrastructure using Auto Scaling Groups and Launch Templates. This setup builds upon the foundational networking and IAM infrastructure we established in Part 1 and provides:

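  • A foundation for automated scaling based on demand (add scaling policies or an ECS Capacity Provider to drive it)
  • Consistent instance configuration through Launch Templates
  • Proper tagging for resource management and cost allocation
  • GPU support for compute-intensive workloads
  • A path to high availability once the ASG spans subnets in more than one Availability Zone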
  • Integration with the VPC and security architecture from Part 1

The infrastructure is now ready to host containerized applications that require GPU processing power, with the flexibility to scale automatically based on your workload demands.

What's Next?

In Part 3 of this series, we'll cover:

  • Creating and deploying ECS Task Definitions
  • Setting up ECS Services for your applications
  • Implementing Application Load Balancer for traffic distribution
  • Advanced ECS configurations and monitoring

Stay tuned for the final part where we'll bring everything together with actual application deployment!

Next Steps

Consider implementing:

  • ECS Services and Task Definitions for your applications
  • Application Load Balancer for distributing traffic

This foundation, combined with the networking setup from Part 1, provides a robust starting point for deploying Docling as a containerized application on AWS ECS with GPU support.
