This is Part 3 (the final part) of our 3-part series on deploying docling to a complete AWS ECS infrastructure. In Part 1, we set up the foundational networking and IAM; in Part 2, we created the ECS cluster with Auto Scaling Groups and Launch Templates. Now we'll deploy the actual application and make it accessible through an Application Load Balancer.
Welcome to the final part of our journey to deploy docling to AWS ECS infrastructure! In this comprehensive guide, we'll deploy the docling application (a GPU-accelerated document processing service) on our ECS infrastructure and expose it to the internet using an Application Load Balancer (ALB). We'll also explore the core concepts of load balancing through an intuitive restaurant analogy.
Understanding Application Load Balancer Components
Before diving into the implementation, let's understand how Application Load Balancers work using a restaurant analogy:
Core Concepts 🍽️
Load Balancer
Think of the Load Balancer as the restaurant owner whose primary responsibility is to serve customers efficiently. The restaurant owner ensures that customers have a great dining experience and that the restaurant runs smoothly.
Listener
The Listener is like a host/hostess hired by the restaurant owner with specific instructions on which customer requests should be served where. For example:
- If a customer requests ice cream → direct them to the ice cream corner
- If a customer wants drinks → direct them to the bar area
- If a family arrives → guide them to the family seating section
Target Group
Target Groups are like groups of waiters for each section:
- Waiters at the ice cream corner
- Bartenders at the bar
- Waiters in the family seating area
- Each specialized group of staff forms a "Target Group"
Register Targets
Registering Targets is the process where waiters register themselves with their respective target groups, letting the system know they're available to serve customers in their designated area.
This analogy helps us understand how ALB distributes incoming traffic (customers) to the right backend services (waiters) based on configured rules (host instructions).
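To make the analogy concrete, here is a hedged sketch of how such "host instructions" look as an ALB listener rule. The listener ARN, path pattern, and second target group here are hypothetical placeholders for illustration only, not resources we create in this series:

# Hypothetical example: route requests under /icecream/* to a separate target group
aws elbv2 create-rule \
  --listener-arn <LISTENER_ARN> \
  --priority 10 \
  --conditions Field=path-pattern,Values='/icecream/*' \
  --actions Type=forward,TargetGroupArn=<ICECREAM_TARGET_GROUP_ARN>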
What We're Building
In this final part, we'll:
- Create and register ECS Task Definitions for the Docling application
- Set up ECS Services to manage our containers
- Configure an Application Load Balancer for external access
- Establish proper networking and security group rules
- Test our complete GPU-enabled document processing service
Prerequisites
Make sure you've completed:
- Part 1: Foundation - Networking & IAM - VPC, subnets, and IAM roles
- Part 2: ECS EC2 with Auto Scaling - ECS cluster, Launch Templates, and Auto Scaling Groups
You should have the following from previous parts (a quick recovery snippet follows the list):
- $VPC_ID - VPC ID from Part 1
- $PUBLIC_SUBNET and $PRIVATE_SUBNET - Subnet IDs from Part 1
- $ECS_SG_ID - Security Group ID from Part 2
- docling-ecs-cluster - ECS cluster from Part 2
- ECS_Asg - Auto Scaling Group from Part 2
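If you've opened a fresh shell since the earlier parts, you can recover these values from AWS instead of retyping them. This is only a sketch; the Name tag and group name values are placeholders, so substitute whatever names you actually used in Parts 1 and 2:

# Recover IDs from earlier parts (replace the placeholder names with your own)
VPC_ID=$(aws ec2 describe-vpcs \
  --filters "Name=tag:Name,Values=<YOUR_VPC_NAME_TAG>" \
  --query "Vpcs[0].VpcId" --output text)

ECS_SG_ID=$(aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=<YOUR_ECS_SG_NAME>" \
  --query "SecurityGroups[0].GroupId" --output text)

echo "VPC: $VPC_ID, ECS SG: $ECS_SG_ID"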
Step 1: Create ECS Task Definition
The Task Definition is like a blueprint that tells ECS how to run our Docling container. Create a file called docling-task-definition.json:
{
  "family": "docling-nvidia",
  "networkMode": "host",
  "requiresCompatibilities": ["EC2"],
  "executionRoleArn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/ecs_task_exec_role",
  "taskRoleArn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/ecs_task_role",
  "containerDefinitions": [
    {
      "name": "docling-serve",
      "image": "ghcr.io/docling-project/docling-serve-cu126:main",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 5001,
          "hostPort": 5001,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "DOCLING_SERVE_ENABLE_UI",
          "value": "true"
        }
      ],
      "resourceRequirements": [
        {
          "value": "1",
          "type": "GPU"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/docling-serve-nvidia",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      },
      "linuxParameters": {
        "capabilities": {
          "add": ["SYS_ADMIN"]
        }
      }
    }
  ],
  "cpu": "2048",
  "memory": "8192"
}
⚠️ Important: Replace <YOUR_ACCOUNT_ID> with your actual AWS account ID and ensure the role names match those created in Part 1.
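If you prefer not to edit the file by hand, a small sketch like the following fills in the account ID automatically (this assumes GNU sed; on macOS use sed -i ''):

# Look up the current account ID and substitute it into the task definition
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
sed -i "s/<YOUR_ACCOUNT_ID>/${ACCOUNT_ID}/g" docling-task-definition.json
grep "arn:aws:iam" docling-task-definition.json   # quick sanity check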
Key Configuration Highlights
- GPU Support: resourceRequirements allocates 1 GPU to the container
- Network Mode: host mode gives the container direct access to the EC2 instance's network
- CloudWatch Logs: automatic log group creation for monitoring
- Resource Allocation: 2 vCPUs and 8 GB RAM for GPU-intensive processing
Register the Task Definition
TASK_ARN=$(aws ecs register-task-definition \
--cli-input-json file://docling-task-definition.json \
--query "taskDefinition.taskDefinitionArn" \
--output text)
echo "Task Definition ARN: $TASK_ARN"
Add Tags to Task Definition
aws ecs tag-resource --resource-arn $TASK_ARN --tags key=Name,value=ECS_Docling_Task
aws ecs tag-resource --resource-arn $TASK_ARN --tags file://cluster-tags.json
Verify Tags
aws ecs list-tags-for-resource --resource-arn $TASK_ARN
Step 2: Create ECS Service
ECS Services ensure that your desired number of tasks are running and healthy. Create docling-service-definition.json:
{
  "serviceName": "docling-serve",
  "cluster": "docling-ecs-cluster",
  "taskDefinition": "docling-nvidia:2",
  "desiredCount": 0,
  "launchType": "EC2"
}
📝 Note: We start with desiredCount: 0 to prevent tasks from starting before we have EC2 instances available. Also check the taskDefinition value: docling-nvidia:2 points at revision 2, so use the revision number returned when you registered your task definition, or omit the revision entirely to always use the latest ACTIVE one.
Create the Service
SERVICE_ARN=$(aws ecs create-service \
--cli-input-json file://docling-service-definition.json \
--query "service.serviceArn" \
--output text)
echo "Service ARN: $SERVICE_ARN"
Add Tags to Service
aws ecs tag-resource --resource-arn $SERVICE_ARN --tags file://cluster-tags.json
aws ecs tag-resource --resource-arn $SERVICE_ARN --tags key=Name,value=ECS_Docling_Service
Verify Service Tags
aws ecs list-tags-for-resource --resource-arn $SERVICE_ARN
Step 3: Test Basic Service Deployment
Before setting up the load balancer, let's verify our service works correctly.
Launch EC2 Instance
Scale up the Auto Scaling Group to launch an instance (this uses the configuration from Part 2):
aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name ECS_Asg \
--min-size 1 \
--max-size 1 \
--desired-capacity 1
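It takes a couple of minutes for the instance to boot and register with the cluster. You can watch for it with:

# The cluster should eventually report one registered container instance
aws ecs describe-clusters \
  --clusters docling-ecs-cluster \
  --query "clusters[0].registeredContainerInstancesCount"

aws ecs list-container-instances --cluster docling-ecs-cluster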
Start the Service
Update the service to run one task:
aws ecs update-service \
--cluster docling-ecs-cluster \
--service docling-serve \
--desired-count 1
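The first task can take a while to start because the GPU image is large. Rather than polling manually, you can let the CLI wait until the service reaches a steady state:

# Blocks until the deployment settles (or the waiter times out)
aws ecs wait services-stable \
  --cluster docling-ecs-cluster \
  --services docling-serve

aws ecs describe-services \
  --cluster docling-ecs-cluster \
  --services docling-serve \
  --query "services[0].runningCount"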
Verify Container is Running
SSH into the instance and check the container status:
# Get the instance IP (from Part 2's command)
aws ec2 describe-instances \
--filters "Name=instance-state-name,Values=running" "Name=tag-key,Values=Purpose" \
--query "Reservations[*].Instances[*].[InstanceId,InstanceType,PrivateIpAddress,PublicIpAddress]" \
--output table
# SSH into the instance
ssh -i ECSInstanceKey.pem ec2-user@<INSTANCE_PUBLIC_IP>
# Check running containers
sudo docker ps
# View container logs
sudo docker logs <container_id> -f
You should see the Docling application starting up and utilizing the GPU.
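To double-check that the GPU is actually visible, you can run nvidia-smi on the host, and (assuming the docling-serve image ships the NVIDIA tools, which may vary by image version) inside the container as well:

# On the host: the driver and GPU should be listed
nvidia-smi

# Inside the container (only if nvidia-smi is present in the image)
sudo docker exec <container_id> nvidia-smi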
Step 4: Set Up Application Load Balancer
Now let's create the ALB infrastructure to make our application accessible from the internet.
Create ALB Security Group
ALB_SG_ID=$(aws ec2 create-security-group \
--tag-specifications 'ResourceType=security-group,Tags=[{Key=Name,Value=ALB_SG}]' \
--vpc-id $VPC_ID \
--group-name ALB_SG \
--description "SG for ALB" \
--query "GroupId" \
--output text)
echo "ALB Security Group ID: $ALB_SG_ID"
Allow Internet Traffic to ALB
aws ec2 authorize-security-group-ingress \
--group-id $ALB_SG_ID \
--protocol tcp \
--port 5001 \
--cidr 0.0.0.0/0
Create the Load Balancer
Remember our restaurant analogy - this creates the "restaurant owner":
DOCLING_ALB=$(aws elbv2 create-load-balancer \
--name docling-alb \
--subnets $PRIVATE_SUBNET $PUBLIC_SUBNET \
--security-groups $ALB_SG_ID \
--scheme internet-facing \
--type application \
--tags Key=Name,Value=DoclingALB \
--query "LoadBalancers[].LoadBalancerArn" \
--output text)
echo "Load Balancer ARN: $DOCLING_ALB"
📝 Note: An ALB must be attached to subnets in at least two Availability Zones, so here it spans the public and private subnets from Part 1 for high availability. For an internet-facing ALB, subnets with a route to an internet gateway (public subnets) are the recommended choice.
Add Tags to Load Balancer
aws elbv2 add-tags --resource-arns $DOCLING_ALB --tags file://alb-tags.json
Step 5: Create Target Group
In our restaurant analogy, this creates the "group of waiters" that will serve our customers:
DOCLING_TARGET_GRP=$(aws elbv2 create-target-group \
--name docling-targets \
--protocol HTTP \
--port 5001 \
--vpc-id $VPC_ID \
--health-check-path /docs \
--target-type ip \
--tags Key=Name,Value=doclingTargetGroup \
--query "TargetGroups[].TargetGroupArn" \
--output text)
echo "Target Group ARN: $DOCLING_TARGET_GRP"
Key Target Group Configuration
- target-type ip: Uses IP addresses rather than instance IDs
- health-check-path /docs: the API documentation page, used here as a simple health-check endpoint (see the tuning note below)
- port 5001: The port our application listens on
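Because the Docling container can take a while to load models on first start, the default health-check thresholds may mark the target unhealthy prematurely. If that happens, one option is to relax the checks; the values below are only illustrative:

# Example: check every 30s, allow up to 5 failures before marking unhealthy
aws elbv2 modify-target-group \
  --target-group-arn $DOCLING_TARGET_GRP \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 10 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 5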
Add Tags to Target Group
aws elbv2 add-tags --resource-arns $DOCLING_TARGET_GRP --tags file://alb-tags.json
Update Service to use Target Group
Now we need to update our ECS service to integrate with the Application Load Balancer. Create a file called service-alb.json:
[
  {
    "targetGroupArn": "<TARGET_GROUP_ARN>",
    "containerName": "docling-serve",
    "containerPort": 5001
  }
]
⚠️ Important: Replace <TARGET_GROUP_ARN> with the actual Target Group ARN we created above (stored in $DOCLING_TARGET_GRP).
Update the service to use the load balancer:
aws ecs update-service \
--cluster docling-ecs-cluster \
--service docling-serve \
--load-balancers file://service-alb.json
echo "Service updated to use ALB integration"
This step is crucial as it tells ECS to automatically register and deregister tasks with the target group as they start and stop, eliminating the need for manual target registration.
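You can confirm the association took effect by inspecting the service's load balancer configuration:

aws ecs describe-services \
  --cluster docling-ecs-cluster \
  --services docling-serve \
  --query "services[0].loadBalancers"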
Step 6: Create Listener
The listener is our "host/hostess" that directs traffic to the right place:
DOCLING_LISTENER=$(aws elbv2 create-listener \
--load-balancer-arn $DOCLING_ALB \
--protocol HTTP \
--port 5001 \
--default-actions Type=forward,TargetGroupArn=$DOCLING_TARGET_GRP \
--tags Key=Name,Value=DoclingListener \
--query "Listeners[].ListenerArn" \
--output text)
echo "Listener ARN: $DOCLING_LISTENER"
Add Tags to Listener
aws elbv2 add-tags --resource-arns $DOCLING_LISTENER --tags file://alb-tags.json
Step 7: Configure Security Rules
Allow ALB to Reach EC2 Instances
Update the EC2 security group (from Part 2) to allow traffic from the ALB:
aws ec2 authorize-security-group-ingress \
--group-id $ECS_SG_ID \
--protocol tcp \
--port 5001 \
--source-group $ALB_SG_ID
This creates a security rule allowing our "restaurant owner" (ALB) to communicate with our "kitchen" (EC2 instances).
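To verify the rule, you can list the rules on the ECS security group and look for port 5001 (the ingress entry should reference the ALB security group as its source):

aws ec2 describe-security-group-rules \
  --filters "Name=group-id,Values=$ECS_SG_ID" \
  --query 'SecurityGroupRules[?FromPort==`5001`]'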
Step 8: Final Testing and Verification
Check Target Health
Verify that our target is healthy:
aws elbv2 describe-target-health --target-group-arn $DOCLING_TARGET_GRP
You should see the target status reported as healthy once the health checks pass.
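Health checks can take a minute or two after the task starts. If you prefer not to re-run the command manually, a small polling loop like this works:

# Poll every 15 seconds until the first target reports healthy
until aws elbv2 describe-target-health \
  --target-group-arn $DOCLING_TARGET_GRP \
  --query "TargetHealthDescriptions[0].TargetHealth.State" \
  --output text | grep -q healthy; do
  echo "Waiting for target to become healthy..."
  sleep 15
done
echo "Target is healthy"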
Get Load Balancer DNS Name
ALB_DNS=$(aws elbv2 describe-load-balancers \
--load-balancer-arns $DOCLING_ALB \
--query "LoadBalancers[].DNSName" \
--output text)
echo "Access your application at: http://$ALB_DNS:5001"
Test the Application
Open your browser and navigate to http://<ALB_DNS>:5001/ui. You should see the Docling web interface!
You can also test the API endpoint:
curl http://$ALB_DNS:5001/docs
Monitor Application Logs
Check the application logs through CloudWatch or directly on the instance:
# On the EC2 instance
sudo docker logs <container_id> -f
# Or check CloudWatch Logs
aws logs describe-log-groups --log-group-name-prefix "/ecs/docling-serve-nvidia"
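With AWS CLI v2 you can also stream the logs directly from CloudWatch without logging into the instance:

# Follow the Docling log group in real time
aws logs tail "/ecs/docling-serve-nvidia" --follow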
Troubleshooting Common Issues
Service Won't Start
- Check if EC2 instances are running and registered with ECS cluster
- Verify task definition has correct IAM roles from Part 1
- Check CloudWatch logs for container startup errors
Can't Access Application
- Verify security group rules allow traffic
- Check target group health status
- Ensure load balancer is in correct subnets from Part 1
GPU Not Available
- Confirm EC2 instance type supports GPU (g4dn.xlarge)
- Check ECS agent configuration includes GPU support
- Verify NVIDIA drivers are installed (automatic with the ECS GPU-optimized AMI); see the quick checks below
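A couple of quick checks on the instance cover the items above:

# Driver check: the GPU should be listed
nvidia-smi

# ECS agent check: GPU support should be enabled
cat /etc/ecs/ecs.config    # expect ECS_ENABLE_GPU_SUPPORT=true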
Conclusion
Congratulations! 🎉 You've successfully deployed docling to AWS ECS infrastructure with GPU support. Throughout this 3-part series, we've covered:
What We've Accomplished
- Part 1 (Foundation): VPC networking, security groups, and IAM roles
- Part 2 (Infrastructure): ECS cluster, Launch Templates, and Auto Scaling Groups
- Part 3 (Application): Task definitions, services, and Application Load Balancer
The Complete Architecture
Your infrastructure now includes:
- Scalable GPU Computing: Auto Scaling Groups with GPU-enabled instances
- Container Orchestration: ECS managing Docling containers with resource requirements
- High Availability: Multi-subnet deployment with health checks
- Internet Accessibility: Application Load Balancer with proper security
- Monitoring: CloudWatch integration for logs and metrics
- Security: Layered security groups and IAM roles
Real-World Applications
This architecture is perfect for:
- AI/ML Workloads: GPU-accelerated machine learning inference
- Document Processing: Like our Docling example for document conversion
- Video Processing: GPU-accelerated video transcoding and analysis
- Scientific Computing: High-performance computing workloads
Series Navigation
- Part 1: Foundation - Networking & IAM - VPC setup, subnets, security groups, and IAM roles
- Part 2: ECS EC2 with Auto Scaling - Launch templates, Auto Scaling Groups, and ECS cluster setup
- Part 3: Application Deployment (Current) - Task definitions, services, and load balancers
What's Next?
Consider exploring these advanced topics in future posts:
- High Availability & Scaling: Multi-AZ deployments, auto scaling policies, and health monitoring
- Monitoring & Observability: CloudWatch Container Insights, custom metrics, and distributed tracing
- Cost Optimization: Spot instances, reserved capacity, and right-sizing strategies
Thank you for following along with this comprehensive AWS ECS series. You should now be able to set up an n8n workflow that uses docling deployed on GPU-enabled AWS ECS infrastructure.