Searching the internet, you can find guides showing how to build serverless virus scanning with ClamAV:
- Using Serverless to Scan Files with ClamAV in a Lambda Container
- Using Serverless to Scan Files with a ClamAV Lambda Layer
- Serverless tutorial: Let’s build a virus scanning solution with automated database updates
Even in the AWS Blog you can find good ideas:
- Virus scan S3 buckets with a serverless ClamAV based CDK construct
- AWS SAM application to keep your S3 objects safe from viruses using ClamAV Open Source software
However, a few aspects fell short of my expectations:
1) They are not using the latest version of ClamAV
The examples above install ClamAV directly from the OS package registry (e.g., yum install clamav).
OS packages aren't always in sync with the latest release, and if you don't update or upgrade the OS package registry, you will install older versions.
Using yum install clamav on Amazon Linux 2, we receive version 0.100.x, whereas ClamAV releases are already at version 1.x.
2) Outdated Node.js runtime
Examples using Node.js use the outdated Node.js 14.x runtime. This runtime is already under Deprecation (Phase 1) in the AWS timeline for Node.js supported runtimes and is not part of the maintenance window of Node.js releases.
Using outdated and unsupported versions is a risk I try to avoid!
3) Using plain JavaScript
While not a deal-breaker, I don't use plain JavaScript in my production projects.
How would a TypeScript example look in the latest AWS Lambda runtime for Node.js?
4) How to use top-level await in Node.js handlers
Top-level await support in AWS Lambda is not new, but how can we configure it together with everything else (e.g., TypeScript and esbuild)? How can we output ESM code for the Lambda handler?
Here comes Amazon Linux 2023
Starting with the Node.js 20.x runtime, the default operating system for the AWS Lambda base image is Amazon Linux 2023.
AL2023 brings a new package management tool called dnf.
dnf is the successor to yum, the package management tool in Amazon Linux 2.
Many of the commands are compatible. For example, take the following Amazon Linux 2 yum commands:
$ sudo yum install packagename
$ sudo yum search packagename
$ sudo yum remove packagename
In AL2023, they become these commands:
$ sudo dnf install packagename
$ sudo dnf search packagename
$ sudo dnf remove packagename
Not everything stays the same. Be aware of changes! 🚨
You can check the page on changes in dnf CLI compared to yum.
Let's build a newer example
We'll keep the example project similar to the guides listed at the beginning:
- An S3 bucket with object notifications sent to an AWS Lambda function
- When an object is created in S3, a notification triggers the AWS Lambda function
- The Lambda function reads the file from the bucket, writes it to /tmp, and runs clamscan on it
- The exit code returned by clamscan is used to check the file status
There are a few constraints I want to define for our newer example:
- Because we need to ship the ClamAV virus database together with our Lambda source code, the 250 MB limit on uncompressed deployment packages can be an issue.
- We will use an AWS Lambda container image, which allows up to 10 GB of uncompressed image size.
- The Docker build process should take care of transpiling TypeScript to JavaScript and installing production-only dependencies.
- We want to download and install the latest ClamAV during the Docker build process and update its virus definitions.
The Dockerfile
We can use Docker multi-stage builds to create these steps:
# ========================================
# Builder Image
# ========================================
FROM --platform=linux/x86_64 public.ecr.aws/lambda/nodejs:20 as builder
COPY package.json package-lock.json index.ts ./
#
# 1) install dependencies with dev dependencies
# 2) build the project
# 3) remove dev dependencies
# 4) install dependencies without dev dependencies
#
RUN npm install && \
npm run build && \
rm -rf node_modules && \
npm install --omit=dev
# ========================================
# Runtime Image
# ========================================
FROM --platform=linux/x86_64 public.ecr.aws/lambda/nodejs:20 as runtime
ENV CLAMAV_PKG=clamav-1.2.1.linux.x86_64.rpm
RUN <<-EOF
set -ex
#
# install glibc-langpack-en to support english language and utf-8
# this was required by clamscan to avoid error "WARNING: Failed to set locale"
#
dnf install wget glibc-langpack-en -y
#
# 1) download latest ClamAV from https://www.clamav.net/downloads
# 2) install using `rpm` and it requires full path for local packages
# 3) remove the downloaded package and clean up for smaller runtime image
#
wget https://www.clamav.net/downloads/production/${CLAMAV_PKG}
rpm -ivh "${LAMBDA_TASK_ROOT}/${CLAMAV_PKG}"
rm -rf ${CLAMAV_PKG}
dnf remove wget -y
dnf clean all
#
# the current working directory is "/var/task" as defined in the base image:
# https://github.com/aws/aws-lambda-base-images/blob/nodejs20.x/Dockerfile.nodejs20.x
#
# 1) "lib/database" is the path to download the virus database
# 2) "freshclam.download.log" and "freshclam.conf.log" are the log files for freshclam CLI
#
mkdir -p ${LAMBDA_TASK_ROOT}/lib/database
touch ${LAMBDA_TASK_ROOT}/lib/{freshclam.download.log,freshclam.conf.log}
chmod -R 777 ${LAMBDA_TASK_ROOT}/lib
#
# default configuration path for freshclam is "/usr/local/etc/freshclam.conf"
# we create a symbolic link to the default configuration path and copy our custom config file
#
ln -s /usr/local/etc/freshclam.conf ${LAMBDA_TASK_ROOT}/lib/freshclam.conf
EOF
COPY freshclam.conf /var/task/lib/freshclam.conf
#
# freshclam CLI is a virus database update tool for ClamAV, documentation:
# https://linux.die.net/man/1/freshclam
#
RUN <<-EOF
set -ex
export LOG_FILE_PATH="${LAMBDA_TASK_ROOT}/lib/freshclam.conf.log"
freshclam --verbose --stdout --user root \
--log=${LOG_FILE_PATH} \
--datadir=${LAMBDA_TASK_ROOT}/lib/database
if grep -q "Can't download daily.cvd\|Can't download main.cvd\|Can't download bytecode.cvd" ${LOG_FILE_PATH}; then
echo "ERROR: Unable to download ClamAV database files - your request may be being rate limited"
exit 1;
fi
EOF
#
# copy application files from the builder image
#
COPY --from=builder /var/task/dist/* /var/task/
COPY --from=builder /var/task/node_modules /var/task/node_modules
CMD [ "index.handler" ]
The above Dockerfile covers:
- Install and build the TypeScript Lambda to JavaScript with production dependencies using a multi-stage build. The first stage, as builder, creates the dist folder and the node_modules folder used by the as runtime stage
- Download the latest ClamAV from their release page, install it using rpm, and remove the cache for a smaller final image
- Download the ClamAV virus database definitions with freshclam
- You can change CLAMAV_PKG to stay in sync with the latest version of ClamAV
🚨 Important: To update your database definition, you need to re-build this image every once in a while
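One way to notice that a rebuild is overdue is to check the age of a database file when the function starts. The snippet below is only a minimal sketch: isDatabaseStale is a hypothetical helper that is not part of the example project, and it assumes that the file's modification time reflects when freshclam downloaded it during the image build.

```typescript
import { statSync } from "node:fs";

// Hypothetical helper (not part of the original project): returns true when the
// file at `filePath` was last modified more than `maxAgeDays` days before `now`.
// Assumption: the modification time reflects when freshclam downloaded the file.
function isDatabaseStale(filePath: string, maxAgeDays: number, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - statSync(filePath).mtime.getTime();
  const maxAgeMs = maxAgeDays * 24 * 60 * 60 * 1000;
  return ageMs > maxAgeMs;
}
```

For example, calling isDatabaseStale("/var/task/lib/database/daily.cvd", 7) at cold start and logging a warning when it returns true could serve as a reminder to rebuild the image. Both the file name and the seven-day threshold are assumptions to adjust for your setup.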
The required freshclam.conf file contains the following:
CompressLocalDatabase yes
DatabaseDirectory /var/task/lib/database
DatabaseMirror database.clamav.net
DNSDatabaseInfo current.cvd.clamav.net
ScriptedUpdates no
UpdateLogFile /var/task/lib/freshclam.conf.log
🚨 Important: The full file paths (e.g., /var/task/*) must match the Dockerfile definitions
The TypeScript AWS Lambda Handler
For an S3 notification event, we can write a handler similar to:
import { S3CreateEvent } from "aws-lambda";
import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import { spawnSync } from "node:child_process";
import { mkdir, unlink, writeFile } from "node:fs/promises";
const s3Client = new S3Client({});
//
// directories for clamscan
// "/tmp/files_to_scan" where we will store the files from s3 to scan
// "/tmp/clamscan_tmp" required by clamscan to store temporary files during the virus scan
//
await mkdir("/tmp/files_to_scan", { recursive: true });
await mkdir("/tmp/clamscan_tmp", { recursive: true });
async function handler(event: S3CreateEvent) {
console.log(JSON.stringify(event, null, 2));
for (const record of event.Records) {
const bucketName = record.s3.bucket.name;
const objectKey = record.s3.object.key;
const getObjectCommand = new GetObjectCommand({
Bucket: bucketName,
Key: objectKey,
});
const s3Object = await s3Client.send(getObjectCommand);
// read the body as raw bytes so binary files are not corrupted by a string conversion
const s3ObjectContent = (await s3Object.Body?.transformToByteArray()) as Uint8Array;
const tmpFilePath = `/tmp/files_to_scan/${objectKey}`;
await writeFile(tmpFilePath, s3ObjectContent);
//
// clamscan CLI documentation:
// https://linux.die.net/man/1/clamscan
//
const clamavScan = spawnSync(
"clamscan",
["--verbose", "--stdout", `--database=/var/task/lib/database`, `--tempdir=/tmp/clamscan_tmp`, tmpFilePath],
{
encoding: "utf-8",
stdio: "pipe",
},
);
console.log(JSON.stringify(clamavScan, null, 2));
// You can find the return codes here:
// https://linux.die.net/man/1/clamscan
if (clamavScan.status === 0) {
console.log("no virus found");
} else if (clamavScan.status === 1) {
console.log("virus found");
} else if (clamavScan.status === 2) {
console.log("some error(s) occurred in clamscan");
}
await unlink(tmpFilePath);
}
}
export { handler };
We use the top-level await feature to create two folders when the Lambda container starts.
Later, we use spawnSync to run the clamscan binary installed via the Dockerfile.
Ensure you use full paths in the clamscan parameters, for example /var/task/lib/database, to load the correct virus definitions.
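One detail worth handling when building the temporary file path: S3 event notifications deliver object keys URL-encoded (a space arrives as "+"), and a key can contain "/" prefixes that do not exist as folders under /tmp. The following is a minimal sketch of one way to deal with both. toTmpFilePath is a hypothetical helper, not part of the example project, and it assumes that keeping only the last segment of the key is acceptable for your use case.

```typescript
import { posix } from "node:path";

// Hypothetical helper (not part of the original project): decodes the URL-encoded
// object key from the S3 event and keeps only its last path segment, so the file
// can be written directly into `baseDir` without creating nested folders.
function toTmpFilePath(rawObjectKey: string, baseDir: string = "/tmp/files_to_scan"): string {
  // S3 event notifications encode spaces as "+", so restore them before decoding
  const decodedKey = decodeURIComponent(rawObjectKey.replace(/\+/g, " "));
  return posix.join(baseDir, posix.basename(decodedKey));
}
```

Note that flattening the key means two objects with the same file name under different prefixes map to the same temporary path. The decoded key, not the raw one from the event, is also what GetObjectCommand expects.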
We can test the ClamAV detection using any EICAR test file.
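To make such a test easier to assert on, the exit-code branches of the handler could be folded into a small function. This is a sketch rather than the project's actual code: ScanVerdict and toScanVerdict are hypothetical names, and the mapping follows the clamscan return codes already used in the handler (0, 1, and 2).

```typescript
type ScanVerdict = "clean" | "infected" | "error";

// Hypothetical helper (not part of the original project).
// clamscan exits with 0 (no virus found), 1 (virus found), or 2 (some error occurred).
// spawnSync reports `null` when the process did not exit on its own (for example,
// it was killed by a signal); that case is treated as an error here.
function toScanVerdict(status: number | null): ScanVerdict {
  if (status === 0) return "clean";
  if (status === 1) return "infected";
  return "error";
}
```

With this in place, a scan of an EICAR test file is expected to produce "infected", and anything other than "clean" can be treated as a reason to quarantine or reject the object.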
Now, we have our Dockerfile, ClamAV configuration, and Lambda handler.
Where do we deploy all of that?
The CDK TypeScript Project
Because Docker builds our Lambda handler, the handler gets its own package.json with its dependencies:
{
"name": "clamav-scan",
"version": "1.0.0",
"type": "module",
"scripts": {
"build": "rimraf dist && esbuild index.ts --format=esm --outfile=dist/index.mjs"
},
"devDependencies": {
"@types/aws-lambda": "^8.10.130",
"esbuild": "^0.19.8",
"rimraf": "^5.0.5",
"typescript": "^5.3.2"
},
"dependencies": {
"@aws-sdk/client-s3": "^3.465.0"
}
}
Using "type": "module" tells TypeScript and Node.js that we are aiming to use ECMAScript Modules (ESM) in our source code.
The build command asks esbuild to output our source code in the ESM format with the --format=esm flag.
The last piece of the puzzle is the tsconfig.json:
{
"compilerOptions": {
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"isolatedModules": true,
"module": "NodeNext",
"moduleResolution": "NodeNext",
"noEmit": true,
"preserveConstEnums": true,
"skipLibCheck": true,
"sourceMap": false,
"strict": true,
"target": "ESNext"
},
"exclude": ["node_modules"]
}
Using NodeNext for moduleResolution / module and ESNext for target tells the TypeScript compiler tsc to treat our code as ESM. Because noEmit is enabled, tsc is only used for type-checking; the ESM output itself comes from esbuild.
The complete example can be found on GitHub:
oieduardorabelo / s3-virus-scanning-typescript-aws-lambda-container
S3 virus scanning with TypeScript and Node.js 20.x AWS Lambda Container
ClamAV 1.2.1 with AWS Lambda Container Images for Node.js 20.x
CDK project for deploying a ClamAV 1.2.1 with AWS Lambda Container Images for Node.js 20.x
This helps you to scan files for viruses using AWS Lambda functions
🚨 Important:
- Virus definitions are updated during build
- Ensure you are building the container regularly to keep your definitions up to date
- You can update the Dockerfile to use a different version of ClamAV
🚨 WARNING: You are being rate-limited
This is super important and caught me off guard multiple times.
Pay attention to the number of known viruses your database is loading.
During the build process of your Docker image, the ClamAV database mirror can rate limit your IP address and block you from downloading the virus definitions.
For example, a request to https://database.clamav.net/main.cvd can be blocked.
Ensure your freshclam is downloading and loading the definitions daily.cvd, main.cvd, and bytecode.cvd.
By default, the freshclam CLI will NOT throw an error when that happens.
That's why in the Dockerfile we are grepping the log file generated by the CLI and looking for errors:
if grep -q "Can't download daily.cvd\|Can't download main.cvd\|Can't download bytecode.cvd" ${LOG_FILE_PATH}; then
echo "ERROR: Unable to download ClamAV database files - your request may be being rate limited"
exit 1;
fi
And we manually throw an error when any of the rate-limiting messages are detected! 🏁
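The same check can be expressed in TypeScript, for instance if you prefer to validate the log from a build script instead of the shell. This is a minimal sketch that mirrors the grep patterns from the Dockerfile. findFailedDownloads is a hypothetical helper, and it assumes the log content has already been read into a string.

```typescript
// Database files that freshclam is expected to download
const DATABASE_FILES = ["daily.cvd", "main.cvd", "bytecode.cvd"] as const;

// Hypothetical helper (not part of the original project): returns the database
// files for which the log contains the "Can't download <file>" message that the
// Dockerfile greps for.
function findFailedDownloads(logContent: string): string[] {
  return DATABASE_FILES.filter((file) => logContent.includes(`Can't download ${file}`));
}
```

A build script could read the freshclam log file, call findFailedDownloads on its content, and exit with a non-zero code when the returned list is not empty.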