DEV Community

ClamAV (Anti-Virus) as a REST application on AWS ECS

📋 Abstract

This project provides an AWS CDK solution for automated virus scanning of S3 objects using ClamAV. It addresses the performance limitations of serverless ClamAV implementations by running the ClamAV daemon (clamd) as a containerized REST API on AWS ECS Fargate. This architecture eliminates the 15-30 second delay caused by loading the ClamAV libraries in Lambda functions, enabling near-instant scan results.

The solution uses a hybrid approach: ECS Fargate hosts the persistent ClamAV daemon service, while Lambda functions handle S3 event processing and orchestration. Files uploaded to monitored S3 buckets trigger Lambda functions that call the ClamAV REST API, which performs fast scans using the pre-loaded daemon. Results are published to SNS topics for downstream processing.

⚠️ Opening Problems

The cdk-serverless-clamscan project provides a serverless solution for scanning S3 objects with ClamAV using AWS Lambda. While this approach offers simplicity and automatic scaling, it suffers from a critical performance limitation:

Cold Start Delay: Lambda functions must load the entire ClamAV scanning library on each Lambda invocation, resulting in a delay of 15-30 seconds before scanning can begin. This makes the solution impractical for time-sensitive workloads or high-volume scanning scenarios.

The Solution: As suggested in the ClamAV GitHub discussion, using clamd (ClamAV daemon) as a persistent service significantly improves performance. The daemon pre-loads virus definitions into memory and remains running, allowing scans via clamdscan to execute instantly without initialization overhead.

This project implements that recommendation by:

  • Running clamd as a containerized REST API on AWS ECS Fargate
  • Using Lambda functions to orchestrate S3 event processing
  • Calling the persistent ClamAV API for near-instant scan results
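
As a rough illustration of why the daemon approach is fast, the sketch below delegates a scan to an already-running clamd through the clamdscan client and maps its documented exit codes (0 = no virus, 1 = virus found, 2 = error) to a status. The function names and the "ERROR" status are assumptions made for this example; they are not taken from the project's code.

```python
import subprocess

# clamdscan exit codes per its man page:
# 0 = no virus found, 1 = virus(es) found, 2 = an error occurred.
EXIT_CODE_STATUS = {0: "CLEAN", 1: "INFECTED"}


def status_from_exit_code(returncode: int) -> str:
    """Map a clamdscan exit code to a scan status (anything else is an error)."""
    return EXIT_CODE_STATUS.get(returncode, "ERROR")


def scan_with_clamd(path: str) -> tuple[str, str]:
    """Ask the running clamd daemon to scan `path` via clamdscan.

    --fdpass hands the open file descriptor to clamd (local socket only),
    --no-summary limits the output to one result line per file.
    """
    result = subprocess.run(
        ["clamdscan", "--fdpass", "--no-summary", path],
        capture_output=True,
        text=True,
        check=False,  # exit code 1 means "infected", not a failed command
    )
    return status_from_exit_code(result.returncode), result.stdout.strip()
```

Because clamd already holds the signature database in memory, the client call only pays for the scan itself, not for loading definitions.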

πŸ—οΈ Architecture Overview

Architecture Flow

💻 Infrastructure as Code with AWS CDK

This project uses AWS CDK (Cloud Development Kit) with TypeScript to define and provision all infrastructure resources.

CDK Project Structure:

src/
├── bin/main.ts                          # CDK app entry point
├── lib/
│   ├── stacks/
│   │   └── s3-clamav-scan-stack.ts     # Main stack orchestration
│   ├── constructs/
│   │   ├── ecs-cluster-provider/       # ECS cluster, NLB, ClamAV service
│   │   └── s3-serverless-clamscan/     # Lambda scanner, SNS, SQS
│   └── shared/
│       ├── environment.ts               # Environment configurations
│       └── constants.ts                 # Shared constants

Infrastructure Diagram

The solution consists of the following components:

🔧 Core Infrastructure

  • VPC with Multi-AZ Configuration: Isolated network with public, private-isolated, and private-with-egress subnets
  • ECS Fargate Cluster: Hosts the ClamAV REST API containers with Fargate Spot for cost optimization
  • Network Load Balancer (NLB): Internal load balancer for distributing traffic to ClamAV API containers
  • S3 Gateway Endpoint: Direct S3 access from private subnets without NAT gateway costs

📦 Application Components

  • ClamAV REST API: Flask-based API running in Docker containers with clamd, nginx, and uwsgi
  • Lambda Scanner Function: Python function triggered by S3 events to orchestrate scanning
  • S3 Upload Bucket event trigger: Monitored bucket where files are uploaded for scanning
  • SNS Topics: Separate topics for clean and infected file notifications
  • SQS Error Queue: Dead letter queue for failed scan operations with retry logic
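
To make the Lambda scanner's role concrete, here is a minimal sketch of a handler that extracts the bucket and key from an S3 event and forwards them to the ClamAV REST API. The environment variable name `CLAMAV_API_ENDPOINT` and the payload field names `input_bucket` and `input_key` are assumptions (the field names are borrowed from the SNS message format in this post); the project's actual function may differ.

```python
import json
import os
import urllib.request
from urllib.parse import unquote_plus

# Hypothetical environment variable holding "host:port" of the internal NLB.
API_ENDPOINT = os.environ.get("CLAMAV_API_ENDPOINT", "localhost:5000")


def extract_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 events (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def build_scan_request(endpoint: str, bucket: str, key: str) -> urllib.request.Request:
    """Build the POST /scan_file request (payload field names are assumed)."""
    body = json.dumps({"input_bucket": bucket, "input_key": key}).encode("utf-8")
    return urllib.request.Request(
        f"http://{endpoint}/scan_file",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def handler(event, context):
    """Lambda entry point: request one scan per uploaded object."""
    results = []
    for bucket, key in extract_objects(event):
        request = build_scan_request(API_ENDPOINT, bucket, key)
        with urllib.request.urlopen(request, timeout=60) as response:
            results.append(json.loads(response.read()))
    return results
```

Letting an HTTP error propagate out of the handler is one way to have failed invocations end up in the error queue, assuming the function is configured with that queue as its failure destination.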

πŸ” Deep-dive the Solution

πŸ›‘οΈ 1. ClamAV REST API on ECS Fargate

The ClamAV API is containerized and runs on ECS Fargate, providing a scalable and persistent scanning service.

Docker Container Components:

# Key components from Dockerfile
- Base: python:3.14-bookworm
- ClamAV packages: clamav, clamav-daemon
- Web stack: nginx, uwsgi, Flask
- Process manager: supervisor
- Fresh virus definitions via freshclam

API Endpoints:

  • GET / - Health check endpoint (returns "OK")
  • POST /scan_file - Scan endpoint accepting JSON payload with S3 bucket and key
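
The two endpoints could look roughly like the Flask sketch below. This is an illustration, not the project's source: the payload field names (`input_bucket`, `input_key`) are assumed from the SNS message format in this post, and `scan_object` is a hypothetical callable that would download the object and scan it with clamd.

```python
def validate_scan_payload(payload):
    """Return (bucket, key) from the JSON payload or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("a JSON object is required")
    bucket = payload.get("input_bucket")
    key = payload.get("input_key")
    if not bucket or not key:
        raise ValueError("input_bucket and input_key are required")
    return bucket, key


def create_app(scan_object):
    """Build the Flask app around a hypothetical scan_object(bucket, key) -> dict."""
    # Imported lazily so the validator above can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.get("/")
    def health():
        # Health check used by the load balancer.
        return "OK"

    @app.post("/scan_file")
    def scan_file():
        try:
            bucket, key = validate_scan_payload(request.get_json(silent=True))
        except ValueError as exc:
            return jsonify({"error": str(exc)}), 400
        return jsonify(scan_object(bucket, key))

    return app
```

Rejecting malformed payloads with a 400 before touching S3 or clamd keeps bad requests cheap and makes failures easier to tell apart from scan errors.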

📢 2. Result Notification System

The solution uses SNS topics to notify downstream systems of scan results:

SNS Topics:

  • clamav-clean-topic: Notifications for clean files
  • clamav-infected-topic: Notifications for infected files

Message Format:

{
  "input_bucket": "bucket-name",
  "input_key": "path/to/file.pdf",
  "status": "CLEAN" | "INFECTED",
  "message": "Scanning bucket-name/path/to/file.pdf\n<ClamAV output>"
}
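
A message in this format, together with the matching topic, could be assembled as in the following sketch. The function name and the validation step are assumptions for illustration; only the field names, status values, and topic names come from the post.

```python
import json

CLEAN_TOPIC = "clamav-clean-topic"
INFECTED_TOPIC = "clamav-infected-topic"


def build_notification(bucket: str, key: str, status: str, clamav_output: str):
    """Return (topic_name, message_json) for a finished scan."""
    if status not in ("CLEAN", "INFECTED"):
        raise ValueError(f"unexpected status: {status}")
    message = {
        "input_bucket": bucket,
        "input_key": key,
        "status": status,
        "message": f"Scanning {bucket}/{key}\n{clamav_output}",
    }
    topic = INFECTED_TOPIC if status == "INFECTED" else CLEAN_TOPIC
    return topic, json.dumps(message)
```

Publishing would then be a single SNS `publish` call with the ARN of the chosen topic and the JSON string as the message body.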

Use Cases:

  • Clean Files: Trigger downstream processing (e.g., move to processed bucket, extract metadata)
  • Infected Files: Quarantine, alert security team, delete, or move to isolated bucket
  • Integration: Subscribe Lambda, SQS, email, or other services to topics
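
As one possible integration, a Lambda function subscribed to either topic could parse the SNS envelope and decide what to do with each file. The action names below are hypothetical placeholders; a real subscriber would move, tag, or delete the object instead of printing.

```python
import json


def parse_scan_results(event: dict) -> list[dict]:
    """Decode the scan result carried in each SNS record of a Lambda event."""
    return [json.loads(record["Sns"]["Message"]) for record in event.get("Records", [])]


def choose_action(result: dict) -> str:
    """Pick a (hypothetical) follow-up action from the scan status."""
    if result["status"] == "INFECTED":
        return "quarantine"
    if result["status"] == "CLEAN":
        return "process"
    return "ignore"


def handler(event, context):
    """Lambda entry point for an SNS subscription on the scan result topics."""
    actions = []
    for result in parse_scan_results(event):
        action = choose_action(result)
        print(f"{action}: s3://{result['input_bucket']}/{result['input_key']}")
        actions.append(action)
    return actions
```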

🚀 Deploy CDK Stack and Test

✅ Prerequisites

  • AWS account with appropriate permissions
  • AWS CDK CLI installed (npm install -g aws-cdk)
  • Node.js 16+ and pnpm package manager
  • Docker for building container images
  • AWS CLI configured with credentials

📥 Installation and Deployment

  1. Clone the repository:
git clone https://github.com/vumdao/cdk-clamav-rest-api-on-aws-ecs.git
cd cdk-clamav-rest-api-on-aws-ecs
  2. Install dependencies:
pnpm install
  3. Deploy the stack:
pnpm run deploy
  4. Build and push Docker image to ECR:

While the CDK deployment is running, build the ClamAV API Docker image and push it to ECR:

# Navigate to the Dockerfile directory
cd src/lib/constructs/s3-serverless-clamscan/clamd-api

# Build the Docker image
docker build -t simflexcloud/clamav-api .

# Authenticate Docker to your ECR registry (replace region and account ID)
aws ecr get-login-password --region ap-southeast-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com

# Tag the image for ECR
docker tag simflexcloud/clamav-api:latest 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/simflexcloud/clamav-api:latest

# Push the image to ECR
docker push 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/simflexcloud/clamav-api:latest

Expected Output:

✅  S3ClamAvStack

Outputs:
S3ClamAvStack.EcsClusterProviderclamav-apiEndpoint = nlb-xxxx.elb.ap-southeast-1.amazonaws.com:5000

S3 Event Trigger lambda function

🧪 Testing the Solution

  1. Test with clean file:

Upload clean file

Scan result

  2. Test with the EICAR test virus:

Upload virus file

Scan result

🧹 Cleanup Stack

Destroy the CDK stack:

pnpm run destroy

🎯 Conclusion

  • This solution addresses the performance limitations of serverless ClamAV implementations by running a persistent clamd service on ECS Fargate, so scans no longer wait for ClamAV to initialize.
  • The architecture shows how combining AWS managed services (ECS Fargate, Lambda, S3) with open-source tools (ClamAV) can produce production-ready solutions that overcome the limitations of purely serverless approaches while keeping operations simple and costs low.