Abstract
This project provides an AWS CDK solution for automated virus scanning of S3 objects using ClamAV. It addresses the performance limitations of serverless ClamAV implementations by running the ClamAV daemon (clamd) as a containerized REST API on AWS ECS Fargate. This architecture eliminates the 15-30 second delay associated with loading ClamAV libraries in Lambda functions, enabling near-instant scan results.
The solution uses a hybrid approach: ECS Fargate hosts the persistent ClamAV daemon service, while Lambda functions handle S3 event processing and orchestration. Files uploaded to monitored S3 buckets trigger Lambda functions that call the ClamAV REST API, which performs fast scans using the pre-loaded daemon. Results are published to SNS topics for downstream processing.
Table of Contents
- Abstract
- Table of Contents
- Opening Problems
- Architecture Overview
- Deep-dive the Solution
- Deploy CDK Stack and Test
- Cleanup Stack
- Conclusion
Opening Problems
The cdk-serverless-clamscan project provides a serverless solution for scanning S3 objects with ClamAV using AWS Lambda. While this approach offers simplicity and automatic scaling, it suffers from a critical performance limitation:
Cold Start Delay: Lambda functions must load the entire ClamAV scanning library on each Lambda invocation, resulting in 15-30 second delays before scanning can begin. This makes the solution impractical for time-sensitive workloads or high-volume scanning scenarios.
The Solution: As suggested in the ClamAV GitHub discussion, using clamd (ClamAV daemon) as a persistent service significantly improves performance. The daemon pre-loads virus definitions into memory and remains running, allowing scans via clamdscan to execute instantly without initialization overhead.
This project implements that recommendation by:
- Running clamd as a containerized REST API on AWS ECS Fargate
- Using Lambda functions to orchestrate S3 event processing
- Calling the persistent ClamAV API for near-instant scan results
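The daemon-backed scan described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual API code: it assumes the file has already been downloaded to a local path and that `clamdscan` is on the `PATH` of the container. The function name `scan_with_clamd` and the `run` parameter (used to swap in a fake runner for testing) are hypothetical.

```python
import subprocess
from typing import Callable

# clamdscan exit codes as documented by ClamAV: 0 = no virus, 1 = virus found, 2 = error.
_STATUS_BY_EXIT_CODE = {0: "CLEAN", 1: "INFECTED"}


def scan_with_clamd(
    path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict:
    """Scan one local file through the already-running clamd daemon."""
    # --no-summary drops the trailing statistics so stdout is just the per-file verdict.
    result = run(
        ["clamdscan", "--no-summary", path],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is a verdict here, not a crash
    )
    # Any exit code other than 0 or 1 is treated as a scan error.
    status = _STATUS_BY_EXIT_CODE.get(result.returncode, "ERROR")
    return {"status": status, "message": result.stdout.strip()}
```

Because clamdscan only hands the file to the resident daemon, the virus database is not reloaded on each call, which is where the speedup over a per-invocation scan comes from.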
Architecture Overview
Infrastructure as Code with AWS CDK
This project uses AWS CDK (Cloud Development Kit) with TypeScript to define and provision all infrastructure resources.
CDK Project Structure:
src/
├── bin/main.ts                        # CDK app entry point
└── lib/
    ├── stacks/
    │   └── s3-clamav-scan-stack.ts    # Main stack orchestration
    ├── constructs/
    │   ├── ecs-cluster-provider/      # ECS cluster, NLB, ClamAV service
    │   └── s3-serverless-clamscan/    # Lambda scanner, SNS, SQS
    └── shared/
        ├── environment.ts             # Environment configurations
        └── constants.ts               # Shared constants
The solution consists of the following components:
Core Infrastructure
- VPC with Multi-AZ Configuration: Isolated network with public, private-isolated, and private-with-egress subnets
- ECS Fargate Cluster: Hosts the ClamAV REST API containers with Fargate Spot for cost optimization
- Network Load Balancer (NLB): Internal load balancer for distributing traffic to ClamAV API containers
- S3 Gateway Endpoint: Direct S3 access from private subnets without NAT gateway costs
Application Components
- ClamAV REST API: Flask-based API running in Docker containers with clamd, nginx, and uwsgi
- Lambda Scanner Function: Python function triggered by S3 events to orchestrate scanning
- S3 Upload Bucket event trigger: Monitored bucket where files are uploaded for scanning
- SNS Topics: Separate topics for clean and infected file notifications
- SQS Error Queue: Dead letter queue for failed scan operations with retry logic
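As an illustration of the first step the Lambda scanner has to perform, the sketch below pulls the bucket and key out of an S3 event notification. The function name `extract_s3_objects` is hypothetical and the project's real handler may be structured differently; the event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`) is the standard S3 notification format.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces arrive as "+"),
        # so decode before passing the key on to the scanner.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Decoding the key matters in practice: without it, a file named `my file(1).pdf` would be looked up under its encoded name and the scan would fail with a missing-object error.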
Deep-dive the Solution
1. ClamAV REST API on ECS Fargate
The ClamAV API is containerized and runs on ECS Fargate, providing a scalable and persistent scanning service.
Docker Container Components:
# Key components from Dockerfile
- Base: python:3.14-bookworm
- ClamAV packages: clamav, clamav-daemon
- Web stack: nginx, uwsgi, Flask
- Process manager: supervisor
- Fresh virus definitions via freshclam
API Endpoints:
- GET / - Health check endpoint (returns "OK")
- POST /scan_file - Scan endpoint accepting a JSON payload with the S3 bucket and key
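A minimal client for these two endpoints might look like the sketch below, using only the Python standard library. Several details are assumptions rather than facts from the project: the payload field names `bucket` and `key`, the plain `http://` scheme, and the helper names `build_scan_request` and `is_healthy`. Check the Flask handler for the real field names before relying on it.

```python
import json
import urllib.request


def build_scan_request(endpoint: str, bucket: str, key: str) -> urllib.request.Request:
    """Build the POST /scan_file request for one S3 object."""
    # Assumed payload field names; the actual API may expect different ones.
    payload = json.dumps({"bucket": bucket, "key": key}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{endpoint}/scan_file",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_healthy(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True when GET / answers 200 with a body of "OK"."""
    try:
        with urllib.request.urlopen(f"http://{endpoint}/", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip() == "OK"
    except OSError:  # URLError and socket timeouts both derive from OSError
        return False
```

The request built by `build_scan_request` can be sent with `urllib.request.urlopen(request, timeout=...)`, where `endpoint` is the `host:port` value of the internal load balancer.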
2. Result Notification System
The solution uses SNS topics to notify downstream systems of scan results:
SNS Topics:
- clamav-clean-topic: Notifications for clean files
- clamav-infected-topic: Notifications for infected files
Message Format:
{
"input_bucket": "bucket-name",
"input_key": "path/to/file.pdf",
"status": "CLEAN" | "INFECTED",
"message": "Scanning bucket-name/path/to/file.pdf\n<ClamAV output>"
}
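To make the message format concrete, here is a sketch of how a scanner could assemble that payload and route it to the matching topic. The function `publish_scan_result` and the `topic_arns` mapping are hypothetical; `sns_client` is expected to be a boto3 SNS client (or any object with a compatible `publish` method), passed in so the routing logic can be exercised without AWS access.

```python
import json


def publish_scan_result(sns_client, topic_arns: dict, bucket: str, key: str,
                        status: str, scan_output: str) -> dict:
    """Publish a scan result to the topic that matches its status."""
    message = {
        "input_bucket": bucket,
        "input_key": key,
        "status": status,
        "message": f"Scanning {bucket}/{key}\n{scan_output}",
    }
    # topic_arns maps "CLEAN" / "INFECTED" to the ARN of the matching SNS topic.
    sns_client.publish(TopicArn=topic_arns[status], Message=json.dumps(message))
    return message
```

Keeping clean and infected results on separate topics lets each subscriber receive only the events it cares about, without filtering on the message body.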
Use Cases:
- Clean Files: Trigger downstream processing (e.g., move to processed bucket, extract metadata)
- Infected Files: Quarantine, alert security team, delete, or move to isolated bucket
- Integration: Subscribe Lambda, SQS, email, or other services to topics
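On the consuming side, a Lambda function subscribed to either topic could decode the notification and pick a follow-up action, roughly as sketched below. The action names `quarantine` and `process` are placeholders for whatever a given workload needs, and the handler is an illustration rather than part of the project.

```python
import json


def handler(event: dict, context=None) -> list[dict]:
    """Decide a follow-up action for each scan result delivered by SNS."""
    decisions = []
    for record in event["Records"]:
        # SNS delivers the published message as a JSON string under Sns.Message.
        result = json.loads(record["Sns"]["Message"])
        # Placeholder policy: isolate infected files, continue processing clean ones.
        action = "quarantine" if result["status"] == "INFECTED" else "process"
        decisions.append({
            "bucket": result["input_bucket"],
            "key": result["input_key"],
            "action": action,
        })
    return decisions
```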
Deploy CDK Stack and Test
Prerequisites
- AWS account with appropriate permissions
- AWS CDK CLI installed (npm install -g aws-cdk)
- Node.js 16+ and pnpm package manager
- Docker for building container images
- AWS CLI configured with credentials
Installation and Deployment
- Clone the repository:
git clone https://github.com/vumdao/cdk-clamav-rest-api-on-aws-ecs.git
cd cdk-clamav-rest-api-on-aws-ecs
- Install dependencies:
pnpm install
- Deploy the stack:
pnpm run deploy
- Build and push Docker image to ECR:
During the CDK deployment, build and push the ClamAV API Docker image:
# Navigate to the Dockerfile directory
cd src/lib/constructs/s3-serverless-clamscan/clamd-api
# Build the Docker image
docker build -t simflexcloud/clamav-api .
# Authenticate Docker to your ECR registry (replace region and account ID)
aws ecr get-login-password --region ap-southeast-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com
# Tag the image for ECR
docker tag simflexcloud/clamav-api:latest 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/simflexcloud/clamav-api:latest
# Push the image to ECR
docker push 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/simflexcloud/clamav-api:latest
Expected Output:
✅ S3ClamAvStack
Outputs:
S3ClamAvStack.EcsClusterProviderclamav-apiEndpoint = nlb-xxxx.elb.ap-southeast-1.amazonaws.com:5000
Testing the Solution
- Test with clean file:
- Test with the EICAR test virus
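One way to run both tests is to upload a harmless text file and the standard EICAR test string to the monitored bucket, then watch the two SNS topics. The sketch below is an assumption about how that could be scripted: the helper name `upload_test_files` and the object keys are hypothetical, and `s3_client` is expected to be a boto3 S3 client created by the caller.

```python
# The standard 68-byte EICAR test string: harmless, but detected by antivirus
# engines (including ClamAV) on purpose so that scanners can be tested safely.
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def upload_test_files(s3_client, bucket: str) -> list[str]:
    """Upload one clean file and one EICAR file; return the keys written."""
    samples = {
        "test/clean.txt": b"hello, this file is clean\n",
        "test/eicar.txt": EICAR,
    }
    for key, body in samples.items():
        s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return list(samples)
```

With boto3 installed and credentials configured, this could be invoked as `upload_test_files(boto3.client("s3"), "<your-upload-bucket>")`. The clean file should then produce a message on clamav-clean-topic and the EICAR file one on clamav-infected-topic.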
Cleanup Stack
Destroy the CDK stack:
pnpm run destroy
Conclusion
This solution addresses the performance limitations of serverless ClamAV implementations by running a persistent clamd daemon on ECS Fargate.
This architecture demonstrates how combining AWS managed services (ECS Fargate, Lambda, S3) with open-source tools (ClamAV) can create production-ready solutions that overcome the limitations of purely serverless approaches while maintaining operational simplicity and cost efficiency.






