This is Part 3 (the final part) of our 3-part series on deploying docling to a complete AWS ECS infrastructure. In Part 1, we set up the foundational networking and IAM; in Part 2, we created the ECS cluster with Auto Scaling Groups and Launch Templates. Now we'll deploy the actual application and make it accessible through an Application Load Balancer.
Welcome to the final part of our journey to deploy docling to AWS ECS infrastructure! In this comprehensive guide, we'll deploy the docling application (a GPU-accelerated document processing service) on our ECS infrastructure and expose it to the internet using an Application Load Balancer (ALB). We'll also explore the core concepts of load balancing through an intuitive restaurant analogy.
Understanding Application Load Balancer Components
Before diving into the implementation, let's understand how Application Load Balancers work using a restaurant analogy:
Core Concepts 🍽️
Load Balancer
Think of the Load Balancer as the restaurant owner whose primary responsibility is to serve customers efficiently. The restaurant owner ensures that customers have a great dining experience and that the restaurant runs smoothly.
Listener
The Listener is like a host/hostess hired by the restaurant owner with specific instructions on which customer requests should be served where. For example:
- If a customer requests ice cream → direct them to the ice cream corner
- If a customer wants drinks → direct them to the bar area
- If a family arrives → guide them to the family seating section
Target Group
Target Groups are like groups of waiters for each section:
- Waiters at the ice cream corner
- Bartenders at the bar
- Waiters in the family seating area
- Each specialized group of staff forms a "Target Group"
Register Targets
Registering Targets is the process where waiters register themselves with their respective target groups, letting the system know they're available to serve customers in their designated area.
This analogy helps us understand how ALB distributes incoming traffic (customers) to the right backend services (waiters) based on configured rules (host instructions).
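To make the analogy concrete, here is a hedged sketch of how such "host instructions" look as an ALB listener rule. The listener ARN, path pattern, and second target group here are hypothetical placeholders for illustration only, not resources we create in this series:

# Hypothetical example: route requests under /icecream/* to a separate target group
aws elbv2 create-rule \
  --listener-arn <LISTENER_ARN> \
  --priority 10 \
  --conditions Field=path-pattern,Values='/icecream/*' \
  --actions Type=forward,TargetGroupArn=<ICECREAM_TARGET_GROUP_ARN>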
What We're Building
In this final part, we'll:
- Create and register ECS Task Definitions for the Docling application
- Set up ECS Services to manage our containers
- Configure an Application Load Balancer for external access
- Establish proper networking and security group rules
- Test our complete GPU-enabled document processing service
Prerequisites
Make sure you've completed:
- Part 1: Foundation - Networking & IAM - VPC, subnets, and IAM roles
- Part 2: ECS EC2 with Auto Scaling - ECS cluster, Launch Templates, and Auto Scaling Groups
You should have the following from previous parts (a quick recovery snippet follows the list):
- $VPC_ID - VPC ID from Part 1
- $PUBLIC_SUBNET and $PRIVATE_SUBNET - Subnet IDs from Part 1
- $ECS_SG_ID - Security Group ID from Part 2
- docling-ecs-cluster - ECS cluster from Part 2
- ECS_Asg - Auto Scaling Group from Part 2
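If you've opened a fresh shell since the earlier parts, you can recover these values from AWS instead of retyping them. This is only a sketch; the Name tag and group name values are placeholders, so substitute whatever names you actually used in Parts 1 and 2:

# Recover IDs from earlier parts (replace the placeholder names with your own)
VPC_ID=$(aws ec2 describe-vpcs \
  --filters "Name=tag:Name,Values=<YOUR_VPC_NAME_TAG>" \
  --query "Vpcs[0].VpcId" --output text)

ECS_SG_ID=$(aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=<YOUR_ECS_SG_NAME>" \
  --query "SecurityGroups[0].GroupId" --output text)

echo "VPC: $VPC_ID, ECS SG: $ECS_SG_ID"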
Step 1: Create ECS Task Definition
The Task Definition is like a blueprint that tells ECS how to run our Docling container. Create a file called docling-task-definition.json:
{
  "family": "docling-nvidia",
  "networkMode": "host",
  "requiresCompatibilities": ["EC2"],
  "executionRoleArn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/ecs_task_exec_role",
  "taskRoleArn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/ecs_task_role",
  "containerDefinitions": [
    {
      "name": "docling-serve",
      "image": "ghcr.io/docling-project/docling-serve-cu126:main",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 5001,
          "hostPort": 5001,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "DOCLING_SERVE_ENABLE_UI",
          "value": "true"
        }
      ],
      "resourceRequirements": [
        {
          "value": "1",
          "type": "GPU"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/docling-serve-nvidia",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      },
      "linuxParameters": {
        "capabilities": {
          "add": ["SYS_ADMIN"]
        }
      }
    }
  ],
  "cpu": "2048",
  "memory": "8192"
}
⚠️ Important: Replace <YOUR_ACCOUNT_ID> with your actual AWS account ID and ensure the role names match those created in Part 1.
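If you prefer not to edit the file by hand, a small sketch like the following fills in the account ID automatically (this assumes GNU sed; on macOS use sed -i ''):

# Look up the current account ID and substitute it into the task definition
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
sed -i "s/<YOUR_ACCOUNT_ID>/${ACCOUNT_ID}/g" docling-task-definition.json
grep "arn:aws:iam" docling-task-definition.json   # quick sanity check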
Key Configuration Highlights
- GPU Support: resourceRequirements allocates 1 GPU to the container
- Network Mode: host mode gives the container direct access to the EC2 instance's network
- CloudWatch Logs: automatic log group creation for monitoring
- Resource Allocation: 2 vCPUs and 8 GB RAM for GPU-intensive processing
Register the Task Definition
TASK_ARN=$(aws ecs register-task-definition \
--cli-input-json file://docling-task-definition.json \
--query "taskDefinition.taskDefinitionArn" \
--output text)
echo "Task Definition ARN: $TASK_ARN"
Add Tags to Task Definition
aws ecs tag-resource --resource-arn $TASK_ARN --tags key=Name,value=ECS_Docling_Task
aws ecs tag-resource --resource-arn $TASK_ARN --tags file://cluster-tags.json
Verify Tags
aws ecs list-tags-for-resource --resource-arn $TASK_ARN
Step 2: Create ECS Service
ECS Services ensure that your desired number of tasks are running and healthy. Create docling-service-definition.json:
{
  "serviceName": "docling-serve",
  "cluster": "docling-ecs-cluster",
  "taskDefinition": "docling-nvidia:2",
  "desiredCount": 0,
  "launchType": "EC2"
}
📝 Note: We start with desiredCount: 0 to prevent tasks from starting before we have EC2 instances available. Also check the taskDefinition value: docling-nvidia:2 points at revision 2, so use the revision number returned when you registered your task definition, or omit the revision entirely to always use the latest ACTIVE one.
Create the Service
SERVICE_ARN=$(aws ecs create-service \
--cli-input-json file://docling-service-definition.json \
--query "service.serviceArn" \
--output text)
echo "Service ARN: $SERVICE_ARN"
Add Tags to Service
aws ecs tag-resource --resource-arn $SERVICE_ARN --tags file://cluster-tags.json
aws ecs tag-resource --resource-arn $SERVICE_ARN --tags key=Name,value=ECS_Docling_Service
Verify Service Tags
aws ecs list-tags-for-resource --resource-arn $SERVICE_ARN
Step 3: Test Basic Service Deployment
Before setting up the load balancer, let's verify our service works correctly.
Launch EC2 Instance
Scale up the Auto Scaling Group to launch an instance (this uses the configuration from Part 2):
aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name ECS_Asg \
--min-size 1 \
--max-size 1 \
--desired-capacity 1
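It takes a couple of minutes for the instance to boot and register with the cluster. You can watch for it with:

# The cluster should eventually report one registered container instance
aws ecs describe-clusters \
  --clusters docling-ecs-cluster \
  --query "clusters[0].registeredContainerInstancesCount"

aws ecs list-container-instances --cluster docling-ecs-cluster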
Start the Service
Update the service to run one task:
aws ecs update-service \
--cluster docling-ecs-cluster \
--service docling-serve \
--desired-count 1
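The first task can take a while to start because the GPU image is large. Rather than polling manually, you can let the CLI wait until the service reaches a steady state:

# Blocks until the deployment settles (or the waiter times out)
aws ecs wait services-stable \
  --cluster docling-ecs-cluster \
  --services docling-serve

aws ecs describe-services \
  --cluster docling-ecs-cluster \
  --services docling-serve \
  --query "services[0].runningCount"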
Verify Container is Running
SSH into the instance and check the container status:
# Get the instance IP (from Part 2's command)
aws ec2 describe-instances \
--filters "Name=instance-state-name,Values=running" "Name=tag-key,Values=Purpose" \
--query "Reservations[*].Instances[*].[InstanceId,InstanceType,PrivateIpAddress,PublicIpAddress]" \
--output table
# SSH into the instance
ssh -i ECSInstanceKey.pem ec2-user@<INSTANCE_PUBLIC_IP>
# Check running containers
sudo docker ps
# View container logs
sudo docker logs <container_id> -f
You should see the Docling application starting up and utilizing the GPU.
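To double-check that the GPU is actually visible, you can run nvidia-smi on the host, and (assuming the docling-serve image ships the NVIDIA tools, which may vary by image version) inside the container as well:

# On the host: the driver and GPU should be listed
nvidia-smi

# Inside the container (only if nvidia-smi is present in the image)
sudo docker exec <container_id> nvidia-smi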
Step 4: Set Up Application Load Balancer
Now let's create the ALB infrastructure to make our application accessible from the internet.
Create ALB Security Group
ALB_SG_ID=$(aws ec2 create-security-group \
--tag-specifications 'ResourceType=security-group,Tags=[{Key=Name,Value=ALB_SG}]' \
--vpc-id $VPC_ID \
--group-name ALB_SG \
--description "SG for ALB" \
--query "GroupId" \
--output text)
echo "ALB Security Group ID: $ALB_SG_ID"
Allow Internet Traffic to ALB
aws ec2 authorize-security-group-ingress \
--group-id $ALB_SG_ID \
--protocol tcp \
--port 5001 \
--cidr 0.0.0.0/0
Create the Load Balancer
Remember our restaurant analogy - this creates the "restaurant owner":
DOCLING_ALB=$(aws elbv2 create-load-balancer \
--name docling-alb \
--subnets $PRIVATE_SUBNET $PUBLIC_SUBNET \
--security-groups $ALB_SG_ID \
--scheme internet-facing \
--type application \
--tags Key=Name,Value=DoclingALB \
--query "LoadBalancers[].LoadBalancerArn" \
--output text)
echo "Load Balancer ARN: $DOCLING_ALB"
📝 Note: An ALB must be attached to subnets in at least two Availability Zones, so here it spans the public and private subnets from Part 1 for high availability. For an internet-facing ALB, subnets with a route to an internet gateway (public subnets) are the recommended choice.
Add Tags to Load Balancer
aws elbv2 add-tags --resource-arns $DOCLING_ALB --tags file://alb-tags.json
Step 5: Create Target Group
In our restaurant analogy, this creates the "group of waiters" that will serve our customers:
DOCLING_TARGET_GRP=$(aws elbv2 create-target-group \
--name docling-targets \
--protocol HTTP \
--port 5001 \
--vpc-id $VPC_ID \
--health-check-path /docs \
--target-type ip \
--tags Key=Name,Value=doclingTargetGroup \
--query "TargetGroups[].TargetGroupArn" \
--output text)
echo "Target Group ARN: $DOCLING_TARGET_GRP"
Key Target Group Configuration
- target-type ip: Uses IP addresses rather than instance IDs
- health-check-path /docs: the API documentation page, used here as a simple health-check endpoint (see the tuning note below)
- port 5001: The port our application listens on
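Because the Docling container can take a while to load models on first start, the default health-check thresholds may mark the target unhealthy prematurely. If that happens, one option is to relax the checks; the values below are only illustrative:

# Example: check every 30s, allow up to 5 failures before marking unhealthy
aws elbv2 modify-target-group \
  --target-group-arn $DOCLING_TARGET_GRP \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 10 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 5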
Add Tags to Target Group
aws elbv2 add-tags --resource-arns $DOCLING_TARGET_GRP --tags file://alb-tags.json
Update Service to use Target Group
Now we need to update our ECS service to integrate with the Application Load Balancer. Create a file called service-alb.json:
[
  {
    "targetGroupArn": "<TARGET_GROUP_ARN>",
    "containerName": "docling-serve",
    "containerPort": 5001
  }
]
⚠️ Important: Replace <TARGET_GROUP_ARN> with the actual Target Group ARN we created above (stored in $DOCLING_TARGET_GRP).
Update the service to use the load balancer:
aws ecs update-service \
--cluster docling-ecs-cluster \
--service docling-serve \
--load-balancers file://service-alb.json
echo "Service updated to use ALB integration"
This step is crucial as it tells ECS to automatically register and deregister tasks with the target group as they start and stop, eliminating the need for manual target registration.
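You can confirm the association took effect by inspecting the service's load balancer configuration:

aws ecs describe-services \
  --cluster docling-ecs-cluster \
  --services docling-serve \
  --query "services[0].loadBalancers"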
Step 6: Create Listener
The listener is our "host/hostess" that directs traffic to the right place:
DOCLING_LISTENER=$(aws elbv2 create-listener \
--load-balancer-arn $DOCLING_ALB \
--protocol HTTP \
--port 5001 \
--default-actions Type=forward,TargetGroupArn=$DOCLING_TARGET_GRP \
--tags Key=Name,Value=DoclingListener \
--query "Listeners[].ListenerArn" \
--output text)
echo "Listener ARN: $DOCLING_LISTENER"
Add Tags to Listener
aws elbv2 add-tags --resource-arns $DOCLING_LISTENER --tags file://alb-tags.json
Step 7: Configure Security Rules
Allow ALB to Reach EC2 Instances
Update the EC2 security group (from Part 2) to allow traffic from the ALB:
aws ec2 authorize-security-group-ingress \
--group-id $ECS_SG_ID \
--protocol tcp \
--port 5001 \
--source-group $ALB_SG_ID
This creates a security rule allowing our "restaurant owner" (ALB) to communicate with our "kitchen" (EC2 instances).
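To verify the rule, you can list the rules on the ECS security group and look for port 5001 (the ingress entry should reference the ALB security group as its source):

aws ec2 describe-security-group-rules \
  --filters "Name=group-id,Values=$ECS_SG_ID" \
  --query 'SecurityGroupRules[?FromPort==`5001`]'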
Step 8: Final Testing and Verification
Check Target Health
Verify that our target is healthy:
aws elbv2 describe-target-health --target-group-arn $DOCLING_TARGET_GRP
You should see the target status reported as healthy once the health checks pass.
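Health checks can take a minute or two after the task starts. If you prefer not to re-run the command manually, a small polling loop like this works:

# Poll every 15 seconds until the first target reports healthy
until aws elbv2 describe-target-health \
  --target-group-arn $DOCLING_TARGET_GRP \
  --query "TargetHealthDescriptions[0].TargetHealth.State" \
  --output text | grep -q healthy; do
  echo "Waiting for target to become healthy..."
  sleep 15
done
echo "Target is healthy"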
Get Load Balancer DNS Name
ALB_DNS=$(aws elbv2 describe-load-balancers \
--load-balancer-arns $DOCLING_ALB \
--query "LoadBalancers[].DNSName" \
--output text)
echo "Access your application at: http://$ALB_DNS:5001"
Test the Application
Open your browser and navigate to http://<ALB_DNS>:5001/ui. You should see the Docling web interface!
You can also test the API endpoint:
curl http://$ALB_DNS:5001/docs
Monitor Application Logs
Check the application logs through CloudWatch or directly on the instance:
# On the EC2 instance
sudo docker logs <container_id> -f
# Or check CloudWatch Logs
aws logs describe-log-groups --log-group-name-prefix "/ecs/docling-serve-nvidia"
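With AWS CLI v2 you can also stream the logs directly from CloudWatch without logging into the instance:

# Follow the Docling log group in real time
aws logs tail "/ecs/docling-serve-nvidia" --follow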
Troubleshooting Common Issues
Service Won't Start
- Check if EC2 instances are running and registered with ECS cluster
- Verify task definition has correct IAM roles from Part 1
- Check CloudWatch logs for container startup errors
Can't Access Application
- Verify security group rules allow traffic
- Check target group health status
- Ensure load balancer is in correct subnets from Part 1
GPU Not Available
- Confirm EC2 instance type supports GPU (g4dn.xlarge)
- Check ECS agent configuration includes GPU support
- Verify NVIDIA drivers are installed (automatic with the ECS GPU-optimized AMI); see the quick checks below
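A couple of quick checks on the instance cover the items above:

# Driver check: the GPU should be listed
nvidia-smi

# ECS agent check: GPU support should be enabled
cat /etc/ecs/ecs.config    # expect ECS_ENABLE_GPU_SUPPORT=true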
Conclusion
Congratulations! 🎉 You've successfully deployed docling to AWS ECS infrastructure with GPU support. Throughout this 3-part series, we've covered:
What We've Accomplished
- Part 1 (Foundation): VPC networking, security groups, and IAM roles
- Part 2 (Infrastructure): ECS cluster, Launch Templates, and Auto Scaling Groups
- Part 3 (Application): Task definitions, services, and Application Load Balancer
The Complete Architecture
Your infrastructure now includes:
- Scalable GPU Computing: Auto Scaling Groups with GPU-enabled instances
- Container Orchestration: ECS managing Docling containers with resource requirements
- High Availability: Multi-subnet deployment with health checks
- Internet Accessibility: Application Load Balancer with proper security
- Monitoring: CloudWatch integration for logs and metrics
- Security: Layered security groups and IAM roles
Real-World Applications
This architecture is perfect for:
- AI/ML Workloads: GPU-accelerated machine learning inference
- Document Processing: Like our Docling example for document conversion
- Video Processing: GPU-accelerated video transcoding and analysis
- Scientific Computing: High-performance computing workloads
Series Navigation
- Part 1: Foundation - Networking & IAM - VPC setup, subnets, security groups, and IAM roles
- Part 2: ECS EC2 with Auto Scaling - Launch templates, Auto Scaling Groups, and ECS cluster setup
- Part 3: Application Deployment (Current) - Task definitions, services, and load balancers
What's Next?
Consider exploring these advanced topics in future posts:
- High Availability & Scaling: Multi-AZ deployments, auto scaling policies, and health monitoring
- Monitoring & Observability: CloudWatch Container Insights, custom metrics, and distributed tracing
- Cost Optimization: Spot instances, reserved capacity, and right-sizing strategies
Thank you for following along with this comprehensive AWS ECS series. You should now be able to set up an n8n workflow that uses docling deployed on GPU-enabled AWS ECS infrastructure.