DEV Community

bikash119


Deploying Docling Application on ECS with Application Load Balancer

This is Part 3 (the final part) of our 3-part series on deploying docling to AWS ECS. In Part 1, we set up the foundational networking and IAM; in Part 2, we created the ECS cluster with Auto Scaling Groups and Launch Templates. Now we'll deploy the application itself and make it accessible through an Application Load Balancer.

In this guide, we'll deploy the docling application (docling-serve, a GPU-accelerated document processing service) on our ECS infrastructure and expose it to the internet through an Application Load Balancer (ALB). Along the way, we'll explain the core concepts of load balancing with a restaurant analogy.

Understanding Application Load Balancer Components

Before diving into the implementation, let's understand how Application Load Balancers work using a restaurant analogy:

Core Concepts 🍽️

Load Balancer

Think of the Load Balancer as the restaurant owner whose primary responsibility is to serve customers efficiently. The restaurant owner ensures that customers have a great dining experience and that the restaurant runs smoothly.

Listener

The Listener is like a host or hostess hired by the restaurant owner, with specific instructions about where each kind of customer request should be served. For example:

  • If a customer requests ice cream → direct them to the ice cream corner
  • If a customer wants drinks → direct them to the bar area
  • If a family arrives → guide them to the family seating section

Target Group

Target Groups are like the groups of staff assigned to each section:

  • Waiters at the ice cream corner
  • Bartenders at the bar
  • Waiters in the family seating area

Each specialized group of staff forms a "Target Group".

Register Targets

Registering Targets is the process where waiters register themselves with their respective target groups, letting the system know they're available to serve customers in their designated area.

This analogy helps us understand how ALB distributes incoming traffic (customers) to the right backend services (waiters) based on configured rules (host instructions).

What We're Building

In this final part, we'll:

  • Create and register ECS Task Definitions for the Docling application
  • Set up ECS Services to manage our containers
  • Configure an Application Load Balancer for external access
  • Establish proper networking and security group rules
  • Test our complete GPU-enabled document processing service

Prerequisites

Make sure you've completed Part 1 and Part 2 of this series. You should have the following from those parts:

  • $VPC_ID - VPC ID from Part 1
  • $PUBLIC_SUBNET and $PRIVATE_SUBNET - Subnet IDs from Part 1
  • $ECS_SG_ID - Security Group ID from Part 2
  • docling-ecs-cluster - ECS cluster from Part 2
  • ECS_Asg - Auto Scaling Group from Part 2

Step 1: Create ECS Task Definition

The Task Definition is like a blueprint that tells ECS how to run our Docling container. Create a file called docling-task-definition.json:

{
  "family": "docling-nvidia",
  "networkMode": "host",
  "requiresCompatibilities": ["EC2"],
  "executionRoleArn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/ecs_task_exec_role",
  "taskRoleArn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/ecs_task_role",
  "containerDefinitions": [
    {
      "name": "docling-serve",
      "image": "ghcr.io/docling-project/docling-serve-cu126:main",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 5001,
          "hostPort": 5001,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "DOCLING_SERVE_ENABLE_UI",
          "value": "true"
        }
      ],
      "resourceRequirements": [
        {
          "value": "1",
          "type": "GPU"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/docling-serve-nvidia",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      },
      "linuxParameters": {
        "capabilities": {
          "add": ["SYS_ADMIN"]
        }
      }
    }
  ],
  "cpu": "2048",
  "memory": "8192"
}

⚠️ Important: Replace <YOUR_ACCOUNT_ID> with your actual AWS account ID, make sure the role names match those created in Part 1, and change awslogs-region if you are not deploying in us-east-1.
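If you'd rather not edit the placeholder by hand, the substitution can be scripted. The following is a minimal sketch, not part of the original series: render_task_definition is a hypothetical helper name, and the rendered file name and us-west-2 region in the usage line are illustrative assumptions. It reads your account ID with aws sts get-caller-identity and rewrites the hard-coded log region with sed:

```shell
# Hypothetical helper (not from the original series):
#   render_task_definition TEMPLATE OUTPUT [REGION]
# Copies TEMPLATE to OUTPUT, replacing <YOUR_ACCOUNT_ID> with the account ID of
# the current AWS credentials and the hard-coded us-east-1 log region with REGION.
render_task_definition() {
  rtd_template="$1"
  rtd_output="$2"
  rtd_region="${3:-us-east-1}"

  # get-caller-identity returns the 12-digit account ID of the calling identity.
  rtd_account=$(aws sts get-caller-identity --query Account --output text) || return 1

  sed -e "s/<YOUR_ACCOUNT_ID>/${rtd_account}/g" \
      -e "s/\"awslogs-region\": \"us-east-1\"/\"awslogs-region\": \"${rtd_region}\"/" \
      "$rtd_template" > "$rtd_output"
}

# Example usage (assumed output file name and region):
# render_task_definition docling-task-definition.json docling-task-definition.rendered.json us-west-2
```

If you use it, pass file://docling-task-definition.rendered.json to register-task-definition instead of the template, so the placeholder version stays reusable.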

Key Configuration Highlights

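  • GPU Support: resourceRequirements reserves 1 GPU for the container, so ECS only places the task on an instance that has a free GPU
  • Network Mode: in host mode the container shares the EC2 instance's network stack, so container port 5001 is port 5001 on the instance (and only one copy of the task can run per instance)
  • CloudWatch Logs: awslogs-create-group creates the log group automatically; this requires the execution role to allow logs:CreateLogGroup
  • Resource Allocation: 2 vCPUs (2048 CPU units) and 8 GiB of memory (8192 MiB) for GPU-intensive processing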

Register the Task Definition

TASK_ARN=$(aws ecs register-task-definition \
    --cli-input-json file://docling-task-definition.json \
    --query "taskDefinition.taskDefinitionArn" \
    --output text)

echo "Task Definition ARN: $TASK_ARN"

Add Tags to Task Definition

aws ecs tag-resource --resource-arn $TASK_ARN --tags key=Name,value=ECS_Docling_Task
aws ecs tag-resource --resource-arn $TASK_ARN --tags file://cluster-tags.json

Verify Tags

aws ecs list-tags-for-resource --resource-arn $TASK_ARN

Step 2: Create ECS Service

An ECS Service ensures that the desired number of tasks is running and healthy, replacing tasks that stop. Create docling-service-definition.json:

{
  "serviceName": "docling-serve",
  "cluster": "docling-ecs-cluster",
  "taskDefinition": "docling-nvidia:2",
  "desiredCount": 0,
  "launchType": "EC2"
}

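📝 Note: We start with desiredCount: 0 so that no task is scheduled before an EC2 instance is available. Also check that docling-nvidia:2 matches the revision you registered in Step 1: the revision is the number after the last colon in $TASK_ARN, and a family's first registration is revision 1.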
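To avoid hard-coding a revision, you could generate the service definition from the $TASK_ARN captured in Step 1, since create-service accepts either family:revision or a full task definition ARN. A minimal sketch; write_service_definition is a hypothetical helper name, and the other values are copied from the JSON above:

```shell
# Hypothetical helper: write_service_definition TASK_DEFINITION OUTPUT_FILE
# TASK_DEFINITION may be a full ARN (such as $TASK_ARN) or family:revision.
write_service_definition() {
  if [ -z "$1" ]; then
    echo "Task definition is empty; register it in Step 1 first" >&2
    return 1
  fi
  # Unquoted EOF so that $1 is expanded inside the heredoc.
  cat > "$2" <<EOF
{
  "serviceName": "docling-serve",
  "cluster": "docling-ecs-cluster",
  "taskDefinition": "$1",
  "desiredCount": 0,
  "launchType": "EC2"
}
EOF
}

# Example usage:
# write_service_definition "$TASK_ARN" docling-service-definition.json
```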

Create the Service

SERVICE_ARN=$(aws ecs create-service \
    --cli-input-json file://docling-service-definition.json \
    --query "service.serviceArn" \
    --output text)

echo "Service ARN: $SERVICE_ARN"

Add Tags to Service

aws ecs tag-resource --resource-arn $SERVICE_ARN --tags file://cluster-tags.json
aws ecs tag-resource --resource-arn $SERVICE_ARN --tags key=Name,value=ECS_Docling_Service

Verify Service Tags

aws ecs list-tags-for-resource --resource-arn $SERVICE_ARN

Step 3: Test Basic Service Deployment

Before setting up the load balancer, let's verify our service works correctly.

Launch EC2 Instance

Scale up the Auto Scaling Group to launch an instance (this uses the configuration from Part 2):

aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name ECS_Asg \
    --min-size 1 \
    --max-size 1 \
    --desired-capacity 1

Start the Service

Update the service to run one task:

aws ecs update-service \
    --cluster docling-ecs-cluster \
    --service docling-serve \
    --desired-count 1
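Rather than checking repeatedly by hand, you can block until ECS reports the service as stable. The sketch below wraps aws ecs wait services-stable in a hypothetical wait_for_docling_service function and then prints the service's status, desired count, and running count; the cluster and service names are the ones used in this post:

```shell
# Hypothetical helper: wait_for_docling_service
# Blocks until docling-serve reaches a steady state, then prints
# its status, desired count, and running count.
wait_for_docling_service() {
  # services-stable polls every 15 seconds and gives up after 40 failed checks.
  aws ecs wait services-stable \
      --cluster docling-ecs-cluster \
      --services docling-serve || return 1

  aws ecs describe-services \
      --cluster docling-ecs-cluster \
      --services docling-serve \
      --query "services[0].[status,desiredCount,runningCount]" \
      --output text
}
```

A result such as ACTIVE, 1, 1 means the one desired task is running; if the waiter times out, the instance may not have registered with the cluster yet.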

Verify Container is Running

SSH into the instance and check the container status:

# Get the instance IP (from Part 2's command)
aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" "Name=tag-key,Values=Purpose" \
    --query "Reservations[*].Instances[*].[InstanceId,InstanceType,PrivateIpAddress,PublicIpAddress]" \
    --output table

# SSH into the instance
ssh -i ECSInstanceKey.pem ec2-user@<INSTANCE_PUBLIC_IP>

# Check running containers
sudo docker ps

# View container logs
sudo docker logs <container_id> -f

You should see the Docling application starting up and utilizing the GPU.

Step 4: Set Up Application Load Balancer

Now let's create the ALB infrastructure to make our application accessible from the internet.

Create ALB Security Group

ALB_SG_ID=$(aws ec2 create-security-group \
    --tag-specifications 'ResourceType=security-group,Tags=[{Key=Name,Value=ALB_SG}]' \
    --vpc-id $VPC_ID \
    --group-name ALB_SG \
    --description "SG for ALB" \
    --query "GroupId" \
    --output text)

echo "ALB Security Group ID: $ALB_SG_ID"

Allow Internet Traffic to ALB

aws ec2 authorize-security-group-ingress \
    --group-id $ALB_SG_ID \
    --protocol tcp \
    --port 5001 \
    --cidr 0.0.0.0/0

Create the Load Balancer

In our restaurant analogy, this step creates the "restaurant owner":

DOCLING_ALB=$(aws elbv2 create-load-balancer \
    --name docling-alb \
    --subnets $PRIVATE_SUBNET $PUBLIC_SUBNET \
    --security-groups $ALB_SG_ID \
    --scheme internet-facing \
    --type application \
    --tags Key=Name,Value=DoclingALB \
    --query "LoadBalancers[].LoadBalancerArn" \
    --output text)

echo "Load Balancer ARN: $DOCLING_ALB"

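📋 Note: An Application Load Balancer must be given subnets in at least two Availability Zones, which is why the command passes both the private and the public subnet from Part 1. For an internet-facing ALB, the subnets normally also need a route to an internet gateway.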
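Because create-load-balancer rejects subnets that share a single Availability Zone, it can help to check the two subnets before creating the ALB. A minimal sketch, assuming $PRIVATE_SUBNET and $PUBLIC_SUBNET are still set from Part 1; check_alb_subnets is a hypothetical helper name:

```shell
# Hypothetical helper: check_alb_subnets SUBNET_ID...
# Prints the Availability Zones of the given subnets and fails unless
# they cover at least two distinct AZs.
check_alb_subnets() {
  cas_azs=$(aws ec2 describe-subnets --subnet-ids "$@" \
      --query "Subnets[].AvailabilityZone" --output text) || return 1
  # $cas_azs is unquoted on purpose so each AZ is printed on its own line.
  cas_unique=$(printf '%s\n' $cas_azs | sort -u | wc -l | tr -d ' ')
  if [ "$cas_unique" -lt 2 ]; then
    echo "Expected subnets in at least 2 Availability Zones, got: $cas_azs" >&2
    return 1
  fi
  echo "Subnets span $cas_unique Availability Zones: $cas_azs"
}

# Example usage:
# check_alb_subnets "$PRIVATE_SUBNET" "$PUBLIC_SUBNET"
```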

Add Tags to Load Balancer

aws elbv2 add-tags --resource-arns $DOCLING_ALB --tags file://alb-tags.json

Step 5: Create Target Group

In our restaurant analogy, this creates the "group of waiters" that will serve our customers:

DOCLING_TARGET_GRP=$(aws elbv2 create-target-group \
    --name docling-targets \
    --protocol HTTP \
    --port 5001 \
    --vpc-id $VPC_ID \
    --health-check-path /docs \
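    --target-type instance \
    --tags Key=Name,Value=doclingTargetGroup \
    --query "TargetGroups[].TargetGroupArn" \
    --output text)

echo "Target Group ARN: $DOCLING_TARGET_GRP"

Key Target Group Configuration

  • target-type instance: Registers EC2 instance IDs (with the port) as targets. Tasks that use the host or bridge network mode, as our task definition does, need instance targets; ip targets are for tasks in awsvpc mode
  • health-check-path /docs: The ALB marks a target healthy when this path returns HTTP 200; /docs is the API documentation page served by docling-serve
  • port 5001: The port our application listens on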

Add Tags to Target Group

aws elbv2 add-tags --resource-arns $DOCLING_TARGET_GRP --tags file://alb-tags.json

Update Service to use Target Group

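Next, update the ECS service so that it integrates with the Application Load Balancer. ECS rejects a target group that is not yet associated with a load balancer, so run the update-service command below only after the listener in Step 6 has been created. Create a file called service-alb.json: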

[
  {
    "targetGroupArn": "<TARGET_GROUP_ARN>",
    "containerName": "docling-serve",
    "containerPort": 5001
  }
]

⚠️ Important: Replace <TARGET_GROUP_ARN> with the actual Target Group ARN we created above (stored in $DOCLING_TARGET_GRP).
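To avoid pasting the ARN by hand, the file can be generated from $DOCLING_TARGET_GRP. A minimal sketch using a heredoc; write_service_alb is a hypothetical helper name, not part of any AWS tooling:

```shell
# Hypothetical helper: write_service_alb TARGET_GROUP_ARN OUTPUT_FILE
# Writes the load balancer mapping that update-service reads via --load-balancers.
write_service_alb() {
  if [ -z "$1" ]; then
    echo "Target group ARN is empty; create the target group first" >&2
    return 1
  fi
  # Unquoted EOF so that $1 is expanded inside the heredoc.
  cat > "$2" <<EOF
[
  {
    "targetGroupArn": "$1",
    "containerName": "docling-serve",
    "containerPort": 5001
  }
]
EOF
}

# Example usage:
# write_service_alb "$DOCLING_TARGET_GRP" service-alb.json
```

The containerName and containerPort values must match the task definition from Step 1, or update-service will fail.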

Update the service to use the load balancer:

aws ecs update-service \
    --cluster docling-ecs-cluster \
    --service docling-serve \
    --load-balancers file://service-alb.json

echo "Service updated to use ALB integration"

This step is crucial as it tells ECS to automatically register and deregister tasks with the target group as they start and stop, eliminating the need for manual target registration.

Step 6: Create Listener

The listener is our "host/hostess" that directs traffic to the right place:

DOCLING_LISTENER=$(aws elbv2 create-listener \
    --load-balancer-arn $DOCLING_ALB \
    --protocol HTTP \
    --port 5001 \
    --default-actions Type=forward,TargetGroupArn=$DOCLING_TARGET_GRP \
    --tags Key=Name,Value=DoclingListener \
    --query "Listeners[].ListenerArn" \
    --output text)

echo "Listener ARN: $DOCLING_LISTENER"

Add Tags to Listener

aws elbv2 add-tags --resource-arns $DOCLING_LISTENER --tags file://alb-tags.json

Step 7: Configure Security Rules

Allow ALB to Reach EC2 Instances

Update the EC2 security group (from Part 2) to allow traffic from the ALB:

aws ec2 authorize-security-group-ingress \
    --group-id $ECS_SG_ID \
    --protocol tcp \
    --port 5001 \
    --source-group $ALB_SG_ID

This rule lets our "restaurant owner" (the ALB) reach the "waiters" (the Docling tasks on the EC2 instances) on port 5001. Without it, the ALB's health checks fail and the targets never become healthy.

Step 8: Final Testing and Verification

Check Target Health

Verify that our target is healthy:

aws elbv2 describe-target-health --target-group-arn $DOCLING_TARGET_GRP

You should see the target status as healthy once the health checks pass.
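Health checks can take a minute or two to pass. The AWS CLI provides aws elbv2 wait target-in-service for this; the sketch below is an alternative loop that also prints the states it sees on each attempt. wait_for_healthy_targets is a hypothetical helper name, and the default 40 attempts at 15-second intervals are arbitrary choices:

```shell
# Hypothetical helper: wait_for_healthy_targets TARGET_GROUP_ARN [ATTEMPTS] [INTERVAL_SECONDS]
# Succeeds once at least one target is registered and every target is "healthy".
wait_for_healthy_targets() {
  wht_arn="$1"
  wht_attempts="${2:-40}"
  wht_interval="${3:-15}"
  wht_i=1
  while [ "$wht_i" -le "$wht_attempts" ]; do
    wht_states=$(aws elbv2 describe-target-health \
        --target-group-arn "$wht_arn" \
        --query "TargetHealthDescriptions[].TargetHealth.State" \
        --output text)
    echo "Attempt $wht_i: ${wht_states:-no targets registered}"

    # Healthy only if at least one target exists and none is in another state.
    wht_ok=0
    if [ -n "$wht_states" ]; then
      wht_ok=1
      for wht_state in $wht_states; do
        [ "$wht_state" = "healthy" ] || wht_ok=0
      done
    fi
    if [ "$wht_ok" -eq 1 ]; then
      return 0
    fi

    wht_i=$((wht_i + 1))
    if [ "$wht_i" -le "$wht_attempts" ]; then
      sleep "$wht_interval"
    fi
  done
  return 1
}

# Example usage:
# wait_for_healthy_targets "$DOCLING_TARGET_GRP"
```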

Get Load Balancer DNS Name

ALB_DNS=$(aws elbv2 describe-load-balancers \
    --load-balancer-arns $DOCLING_ALB \
    --query "LoadBalancers[].DNSName" \
    --output text)

echo "Access your application at: http://$ALB_DNS:5001"

Test the Application

Open your browser and navigate to http://<ALB_DNS>:5001/ui. You should see the Docling web interface!

You can also test the API endpoint:

curl http://$ALB_DNS:5001/docs
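For a scripted check that fails loudly instead of printing a page, you can compare the HTTP status code. A minimal sketch; smoke_test_docling is a hypothetical helper name, and it follows redirects because it is an assumption that every path answers 200 directly:

```shell
# Hypothetical helper: smoke_test_docling BASE_URL [PATH...]
# Requests each PATH (default: /docs) and fails unless every final
# response, after following redirects, has HTTP status 200.
smoke_test_docling() {
  std_base="$1"
  shift
  [ "$#" -gt 0 ] || set -- /docs
  for std_path in "$@"; do
    # -s hides progress, -L follows redirects, -o discards the body,
    # and -w prints only the final status code.
    std_code=$(curl -s -L -o /dev/null -w '%{http_code}' --max-time 10 "${std_base}${std_path}")
    echo "${std_base}${std_path} -> HTTP ${std_code}"
    [ "$std_code" = "200" ] || return 1
  done
}

# Example usage:
# smoke_test_docling "http://$ALB_DNS:5001" /docs /ui
```

A status of 000 means curl could not connect at all, which usually points to the security group rules rather than the application.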

Monitor Application Logs

Check the application logs through CloudWatch or directly on the instance:

# On the EC2 instance
sudo docker logs <container_id> -f

# Or check CloudWatch Logs
aws logs describe-log-groups --log-group-name-prefix "/ecs/docling-serve-nvidia"
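describe-log-groups only confirms that the log group exists. To read the container output without SSH, AWS CLI v2 provides aws logs tail. A small sketch, assuming CLI v2 is installed; tail_docling_logs is a hypothetical wrapper, and the group name matches awslogs-group in the task definition:

```shell
# Hypothetical helper: tail_docling_logs [SINCE]
# Streams the task's CloudWatch Logs output, starting SINCE ago (default 10m).
# Requires AWS CLI v2, which provides "aws logs tail".
tail_docling_logs() {
  aws logs tail /ecs/docling-serve-nvidia --since "${1:-10m}" --follow
}

# Example usage:
# tail_docling_logs 30m
```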

Troubleshooting Common Issues

Service Won't Start

  • Check that EC2 instances are running and registered with the ECS cluster
  • Verify that the task definition references the correct IAM roles from Part 1
  • Check CloudWatch Logs for container startup errors

Can't Access Application

  • Verify that the security group rules allow port 5001 from the internet to the ALB and from the ALB to the instances
  • Check the target group health status
  • Ensure the load balancer uses the correct subnets from Part 1

GPU Not Available

  • Confirm that the EC2 instance type has a GPU (for example, g4dn.xlarge)
  • Check that the ECS agent has GPU support enabled (ECS_ENABLE_GPU_SUPPORT=true in /etc/ecs/ecs.config)
  • Verify that the NVIDIA drivers are installed (the ECS GPU-optimized AMI ships with them)
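One way to see whether ECS actually knows about a GPU is to count the GPU resources each container instance registered. A sketch, assuming the agent reports GPUs as a registered resource named GPU whose stringSetValue lists the GPU IDs; show_registered_gpus is a hypothetical helper name:

```shell
# Hypothetical helper: show_registered_gpus [CLUSTER]
# Prints each container instance's EC2 ID and the number of GPUs the ECS agent
# registered; 0 means a task that requests a GPU cannot be placed there.
show_registered_gpus() {
  srg_cluster="${1:-docling-ecs-cluster}"
  srg_arns=$(aws ecs list-container-instances --cluster "$srg_cluster" \
      --query "containerInstanceArns" --output text) || return 1
  if [ -z "$srg_arns" ] || [ "$srg_arns" = "None" ]; then
    echo "No container instances registered in $srg_cluster" >&2
    return 1
  fi
  # $srg_arns is unquoted on purpose so each ARN becomes its own argument.
  aws ecs describe-container-instances --cluster "$srg_cluster" \
      --container-instances $srg_arns \
      --query "containerInstances[].[ec2InstanceId,length(registeredResources[?name=='GPU'].stringSetValue[])]" \
      --output text
}
```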

Conclusion

Congratulations! 🎉 You've successfully deployed docling to GPU-enabled AWS ECS infrastructure. Here's what this 3-part series covered:

What We've Accomplished

  • Part 1 (Foundation): VPC networking, security groups, and IAM roles
  • Part 2 (Infrastructure): ECS cluster, Launch Templates, and Auto Scaling Groups
  • Part 3 (Application): Task definitions, services, and Application Load Balancer

The Complete Architecture

Your infrastructure now includes:

  • Scalable GPU Computing: Auto Scaling Groups with GPU-enabled instances
  • Container Orchestration: ECS managing Docling containers with resource requirements
  • High Availability: Multi-subnet deployment with health checks
  • Internet Accessibility: Application Load Balancer with proper security
  • Monitoring: CloudWatch integration for logs and metrics
  • Security: Layered security groups and IAM roles

Real-World Applications

This architecture is well suited to:

  • AI/ML Workloads: GPU-accelerated machine learning inference
  • Document Processing: Like our Docling example for document conversion
  • Video Processing: GPU-accelerated video transcoding and analysis
  • Scientific Computing: High-performance computing workloads


What's Next?

Consider exploring these advanced topics in future posts:

  • High Availability & Scaling: Multi-AZ deployments, auto scaling policies, and health monitoring
  • Monitoring & Observability: CloudWatch Container Insights, custom metrics, and distributed tracing
  • Cost Optimization: Spot instances, reserved capacity, and right-sizing strategies

Thank you for following along with this AWS ECS series. You should now be able to set up the n8n workflow that uses docling deployed to GPU-enabled AWS ECS infrastructure 🚀
