Update: I've written how to do this with lambda containers as well!
Let's create an environment that scans a file via an S3 event by utilizing ClamAV binaries on a Lambda layer. You can retrieve the full source code at this GitHub repository.
Note: As of 8/11/2021, the combined unzipped size of the ClamAV binaries and virus definitions is under the limit. This may change in the future.
Serverless config
We'll need a few things: an S3 bucket, a function, and a Lambda layer. I've also included log groups and permissions in the serverless.yml file:
service: clambda-av
provider:
  name: aws
  runtime: nodejs14.x
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObjectTagging
      Resource: "arn:aws:s3:::clambda-av-files/*"
functions:
  virusScan:
    handler: handler.virusScan
    memorySize: 2048
    events:
      - s3:
          bucket: clambda-av-files
          event: s3:ObjectCreated:*
    layers:
      - {Ref: ClamavLambdaLayer}
    timeout: 120
package:
  exclude:
    - node_modules/**
    - coverage/**
layers:
  clamav:
    path: layer
Dockerfile
Before we deploy this, we need to get our ClamAV binaries. Amazon publishes its own Docker base image, which we can use to build them. Through the power of trial and error, I've found which packages are required. Here's the full Dockerfile:
FROM amazonlinux:2
WORKDIR /home/build
RUN set -e
RUN echo "Prepping ClamAV"
RUN rm -rf bin
RUN rm -rf lib
RUN yum update -y
RUN amazon-linux-extras install epel -y
RUN yum install -y cpio yum-utils tar.x86_64 gzip zip
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav
RUN rpm2cpio clamav-0*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-lib
RUN rpm2cpio clamav-lib*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-update
RUN rpm2cpio clamav-update*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 json-c
RUN rpm2cpio json-c*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 pcre2
RUN rpm2cpio pcre*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libtool-ltdl
RUN rpm2cpio libtool-ltdl*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libxml2
RUN rpm2cpio libxml2*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 bzip2-libs
RUN rpm2cpio bzip2-libs*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 xz-libs
RUN rpm2cpio xz-libs*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libprelude
RUN rpm2cpio libprelude*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 gnutls
RUN rpm2cpio gnutls*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 nettle
RUN rpm2cpio nettle*.rpm | cpio -vimd
RUN mkdir -p bin
RUN mkdir -p lib
RUN mkdir -p var/lib/clamav
RUN chmod -R 777 var/lib/clamav
COPY ./freshclam.conf .
RUN cp usr/bin/clamscan usr/bin/freshclam bin/.
RUN cp usr/lib64/* lib/.
RUN cp freshclam.conf bin/freshclam.conf
RUN yum install shadow-utils.x86_64 -y
RUN groupadd clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamupdate
RUN LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf
RUN zip -r9 clamav_lambda_layer.zip bin
RUN zip -r9 clamav_lambda_layer.zip lib
RUN zip -r9 clamav_lambda_layer.zip var
RUN zip -r9 clamav_lambda_layer.zip etc
You'll note that the COPY ./freshclam.conf . line implies that there's another file that we need, and you'd be correct:
DatabaseMirror database.clamav.net
CompressLocalDatabase yes
ScriptedUpdates no
DatabaseDirectory /home/build/var/lib/clamav
Building the binaries with Docker
Next is our bash script, build.sh, which runs Docker on the Dockerfile above to build and extract the binaries:
#!/bin/bash
rm -rf ./layer
mkdir layer
docker build -t clamav -f Dockerfile .
docker run --name clamav clamav
docker cp clamav:/home/build/clamav_lambda_layer.zip .
docker rm clamav
mv clamav_lambda_layer.zip ./layer
pushd layer
unzip -n clamav_lambda_layer.zip
rm clamav_lambda_layer.zip
popd
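Since the layer has to stay under Lambda's unzipped size limit, it can be handy to measure ./layer right after the build. Here's a minimal Node.js sketch of such a check. The helper names directorySize and checkLayerSize are hypothetical (they are not part of the repository), and the 250 MB figure is the deployment limit quoted later in this article.

```javascript
const fs = require("fs");
const path = require("path");

// Assumption: 250 MB unzipped limit (function code plus layers), as cited in this article.
const UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024;

// Recursively sum the sizes of all regular files under `dir`.
function directorySize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += directorySize(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// Hypothetical helper: report the size and whether it fits under the limit.
function checkLayerSize(dir, limitBytes = UNZIPPED_LIMIT_BYTES) {
  const bytes = directorySize(dir);
  return { bytes, withinLimit: bytes <= limitBytes };
}

// Only runs when a ./layer directory exists next to the script.
if (fs.existsSync("./layer")) {
  console.log(checkLayerSize("./layer"));
}
```

Running this after ./build.sh gives an early warning before sls deploy fails on size.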
Scanning Handler
We also need a handler to scan and tag the file. The tagging is more of a placeholder. You can create a separate quarantine bucket or a separate clean bucket, whichever you prefer. Here's the handler:
const { execSync } = require("child_process");
const { writeFileSync, unlinkSync } = require("fs");
const AWS = require("aws-sdk");
const s3 = new AWS.S3();

module.exports.virusScan = async (event, context) => {
  if (!event.Records) {
    console.log("Not an S3 event invocation!");
    return;
  }

  for (const record of event.Records) {
    if (!record.s3) {
      console.log("Not an S3 Record!");
      continue;
    }

    // get the file
    const s3Object = await s3
      .getObject({
        Bucket: record.s3.bucket.name,
        Key: record.s3.object.key
      })
      .promise();

    // write file to disk
    writeFileSync(`/tmp/${record.s3.object.key}`, s3Object.Body);

    try {
      // scan it
      const scanStatus = execSync(`clamscan --database=/opt/var/lib/clamav /tmp/${record.s3.object.key}`);

      await s3
        .putObjectTagging({
          Bucket: record.s3.bucket.name,
          Key: record.s3.object.key,
          Tagging: {
            TagSet: [
              {
                Key: 'av-status',
                Value: 'clean'
              }
            ]
          }
        })
        .promise();
    } catch(err) {
      if (err.status === 1) {
        // tag as dirty, OR you can delete it
        await s3
          .putObjectTagging({
            Bucket: record.s3.bucket.name,
            Key: record.s3.object.key,
            Tagging: {
              TagSet: [
                {
                  Key: 'av-status',
                  Value: 'dirty'
                }
              ]
            }
          })
          .promise();
      }
    }

    // delete the temp file
    unlinkSync(`/tmp/${record.s3.object.key}`);
  }
};
You may be asking yourself about the --database option above: "Wait... why /opt/?" That's because the contents of a Lambda layer are extracted into that directory, where the function can read them at runtime.
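One thing to watch in the handler above: keys in S3 event notifications arrive URL-encoded (spaces show up as +), and the key is interpolated directly into both a /tmp path and a shell command. Here's a minimal sketch of one way to tighten that up. The helpers decodeS3Key, tmpPathForKey, and clamscanArgs are hypothetical names introduced for illustration. None of them exist in the original code.

```javascript
const crypto = require("crypto");
const path = require("path");

// S3 event notifications URL-encode object keys; "+" stands for a space.
function decodeS3Key(rawKey) {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Hypothetical helper: derive a flat, shell-safe temp file name from the key,
// so keys containing "/" or odd characters never shape the path on disk.
function tmpPathForKey(key, tmpDir = "/tmp") {
  const name = crypto.createHash("sha256").update(key).digest("hex");
  return path.posix.join(tmpDir, name);
}

// Hypothetical helper: build an argument array (for execFileSync) so the
// file path is passed to clamscan directly instead of through a shell.
function clamscanArgs(filePath, dbDir = "/opt/var/lib/clamav") {
  return [`--database=${dbDir}`, filePath];
}

const key = decodeS3Key("uploads/my+report%281%29.pdf");
const filePath = tmpPathForKey(key);
console.log(key);
console.log(clamscanArgs(filePath));
```

With these, the scan call could become execFileSync("clamscan", clamscanArgs(filePath)), and the decoded key would be the one passed to getObject and putObjectTagging.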
Here's the algorithm:
- If a file is clean, then it is tagged with a key/value pair of av-status = 'clean'
- If a file is NOT clean (virus), then it is tagged with a key/value pair of av-status = 'dirty'
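The tagging decision hinges on clamscan's exit code: 0 means no virus was found, 1 means a virus was found, and 2 means an error occurred. When the exit code is non-zero, execSync throws and exposes the code as err.status. The handler above tags only for 0 and 1, so a scan error leaves the object untagged. Here's a small sketch of the mapping, with the 'error' value and both function names being assumptions added for illustration:

```javascript
// Map a clamscan exit code to an av-status tag value.
// 'clean' and 'dirty' come from the article; 'error' is an assumed extra value.
function avStatusFromExitCode(code) {
  if (code === 0) return "clean";
  if (code === 1) return "dirty";
  return "error";
}

// Build the Tagging payload in the shape putObjectTagging expects.
function taggingFor(status) {
  return { TagSet: [{ Key: "av-status", Value: status }] };
}

console.log(JSON.stringify(taggingFor(avStatusFromExitCode(1))));
```

Tagging scan errors explicitly makes it easier to tell "not scanned yet" apart from "scan failed".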
Since this is for demonstration purposes, you can obviously customize this flow however you'd like. 😀
Deploying
Now that we have all of our files, we can do the following:
- Build the binaries via ./build.sh (make sure it's executable via chmod +x build.sh after creation)
- Run sls deploy
joseph@bertha > sls deploy
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Excluding development dependencies...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
........
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service clambda-av.zip file to S3 (41 KB)...
Serverless: Uploading service clamav.zip file to S3 (222.57 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
........................
Serverless: Stack update finished...
Service Information
service: clambda-av
stage: dev
region: us-east-1
stack: clambda-av-dev
resources: 9
api keys:
None
endpoints:
None
functions:
virusScan: clambda-av-dev-virusScan
layers:
clamav: arn:aws:lambda:us-east-1:**********:layer:clamav:8
Now, let's test it by uploading a clean file to S3. I happen to have a PDF lying around:
joseph@bertha > aws s3 cp ~/document.pdf s3://clambda-av-files/
upload: ../../document.pdf to s3://clambda-av-files/document.pdf
The downside is that clamscan in the Lambda layer takes ~30 seconds or so to boot, load the virus definitions, scan the file, and return the results. We can check the tag after some time via:
joseph@bertha > aws s3api get-object-tagging --bucket clambda-av-files --key document.pdf
{
"TagSet": [
{
"Key": "av-status",
"Value": "clean"
}
]
}
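Because the tag only shows up after the scan finishes, a script that depends on the result has to wait for it. Here's a rough polling sketch. The name waitForAvStatus is hypothetical, and getTags is assumed to be any async function that resolves to a TagSet array like the one above (for example, a thin wrapper around the SDK's getObjectTagging call).

```javascript
// Hypothetical helper: call `getTags` repeatedly until an av-status tag
// appears, or give up after `attempts` tries and resolve to null.
async function waitForAvStatus(getTags, { attempts = 12, delayMs = 5000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const tagSet = await getTags();
    const tag = tagSet.find((t) => t.Key === "av-status");
    if (tag) return tag.Value;
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return null;
}

// Example with a stand-in for the real S3 call: the tag appears on the third poll.
let polls = 0;
const fakeGetTags = async () =>
  ++polls < 3 ? [] : [{ Key: "av-status", Value: "clean" }];

waitForAvStatus(fakeGetTags, { attempts: 5, delayMs: 10 }).then((status) =>
  console.log(status)
);
```

The defaults (12 attempts, 5 seconds apart) are arbitrary and only meant to comfortably cover the ~30 second scan time.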
We can test the virus scanner with a test virus signature found at EICAR:
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
After saving that text to a file called test-virus.pdf, let's upload it and see what happens:
joseph@bertha > aws s3 cp ~/test-virus.pdf s3://clambda-av-files/
upload: ../../test-virus.pdf to s3://clambda-av-files/test-virus.pdf
After waiting another thirty seconds or so, let's check the tag on it:
joseph@bertha > aws s3api get-object-tagging --bucket clambda-av-files --key test-virus.pdf
{
"TagSet": [
{
"Key": "av-status",
"Value": "dirty"
}
]
}
Drawbacks and potential problems
There are a few drawbacks to this, especially since the size of the Lambda layer is quite large:
- File size constraints due to /tmp mean that a file larger than 512MB cannot be scanned.
- ClamAV virus definitions will no doubt get larger, thus potentially interfering with the maximum deployment size (which is 250MB, including layers, as of 8/11/2021).
- Lambda code storage limitation may eventually be reached with consecutive deployments, although this can be mitigated with the serverless-prune-plugin.
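For the /tmp constraint, the handler could skip oversized objects before downloading them, since each S3 event record carries the object size in bytes at record.s3.object.size. Here's a minimal sketch. The helper name fitsInTmp is hypothetical, and the 512MB default is the limit mentioned above.

```javascript
// Assumption: /tmp holds 512 MB, per the limit cited in this article.
const TMP_LIMIT_BYTES = 512 * 1024 * 1024;

// Hypothetical helper: true when the record reports a size that fits in /tmp.
// Records without a numeric size are treated as not fitting.
function fitsInTmp(record, limitBytes = TMP_LIMIT_BYTES) {
  const size = record && record.s3 && record.s3.object && record.s3.object.size;
  return typeof size === "number" && size <= limitBytes;
}

const small = { s3: { object: { key: "document.pdf", size: 1024 } } };
const huge = { s3: { object: { key: "big.iso", size: 600 * 1024 * 1024 } } };
console.log(fitsInTmp(small), fitsInTmp(huge));
```

In practice you'd likely want some headroom below the full limit, since the scanner itself may also write to /tmp.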
Thanks
Thank y'all for reading. If you have any questions, feel free to ask in the comments! I also welcome suggestions. 🙂
Top comments (11)
I googled this in a hackathon when I had no idea how I was gonna solve this problem, so big thanks ;)
BTW, I had to change freshclam.conf so DatabaseDirectory is /home/build/var/lib/clamav in order to get the Docker build to work. /opt/var/lib/clamav is still used in the Lambda function.
Ah, good catch! Sorry, I've been inactive here in the midst of the holidays and switching jobs. Thanks for that, I'll update the code.
Hi @matt Morgan
I still see /home/build/opt/var/lib/clamav in DatabaseDirectory, should we change this to /home/build/var/lib/clamav ?
Awesome article!! Do you have anything around updating the virus definitions on a daily basis?
Easily updating them would potentially be expensive (because of the whole versioning shenanigans), because you'd have to redeploy the lambda layer each time. I'd recommend looking into an EC2 / Fargate solution for that and I have an article outlined that I need to actually write and push out. It's just very lengthy, and it's a lot of Terraform work to explain -- I'm working on splitting it up.
github.com/bluesentry/bucket-antiv...
Yup, for a Python solution, that looks cool. Still limited by Lambda storage, funfortunately.
Great post! It was extremely helpful. For me, the code didn’t work for a Lambda runtime of NodeJS 18 or higher (not terribly surprising given the age of this article). Do you happen to know how the Docker file or ClamAV binaries would need to change in order to work with those newer runtimes?
I'm with the same problem! hehe. In my case, the clamscan cant be executed inside of lambda (/bin/sh: clamscan: command not found). But I tried it on docker and works.
If you can resolve it, please share here!
Looks like it can't find the file. Might have to reconfigure the path to get things to play nicely -- this article is over two years old, and I have no idea what OS you're running.