In my previous post, I used a Lambda function with a designated Lambda layer. The layer's binaries were created within a Docker image based on Amazon's amazonlinux:2
image. We can use those binaries in conjunction with AWS's Lambda container image support without worrying as much about the deployment size limitations we ran into with the Lambda function and its layer.
History
For those who did not read the previous post: we're going to establish an S3 bucket with an event trigger that invokes a Lambda function. This Lambda function will be a container holding the handler code along with the ClamAV binaries and virus definitions. It will fetch the S3 object referenced in the trigger's metadata, scan it, and tag it as clean or dirty based on the results of the ClamAV scan.
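To illustrate how that clean/dirty tag could be consumed downstream (this isn't part of the post's own code, and the bucket and key names below are placeholders), a consumer might check the av-status tag before serving a file:

const AWS = require("aws-sdk");

const s3 = new AWS.S3();

// Hypothetical helper: returns true only if the object has been tagged clean.
const isObjectClean = async (bucket, key) => {
  const { TagSet } = await s3
    .getObjectTagging({ Bucket: bucket, Key: key })
    .promise();

  const avStatus = TagSet.find((tag) => tag.Key === "av-status");
  return avStatus !== undefined && avStatus.Value === "clean";
};

// Usage sketch with placeholder names.
isObjectClean("clambda-av-files", "some-upload.pdf").then((clean) => {
  console.log(clean ? "safe to serve" : "not scanned yet, or flagged as dirty");
});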
TLDR: Here's the GitHub repository.
Infrastructure
This is obviously going to be different - instead of using a lambda layer, we'll be using a Docker image stored on ECR. This is nearly effortless, thanks to Serverless.
Serverless
By default, Serverless will create an ECR repository for us, and the image will live in it. All we have to do is give it the path to the Dockerfile:
service: clambda-av

provider:
  name: aws
  runtime: nodejs14.x
  ecr:
    images:
      clambdaAv:
        path: ./
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObjectTagging
      Resource: "arn:aws:s3:::clambda-av-files/*"

functions:
  virusScan:
    image:
      name: clambdaAv
    memorySize: 2048
    events:
      - s3:
          bucket: clambda-av-files
          event: s3:ObjectCreated:*
    timeout: 120

package:
  exclude:
    - node_modules/**
    - coverage/**
Dockerfile
Since we're using JavaScript, we'll be using the nodejs:14 Lambda image as the base. Unfortunately, we cannot easily install our ClamAV binaries in that image, so we have to use the amazonlinux:2
image, as stated above. Fortunately, Docker lets us do that easily via multi-stage builds. I had never done this before, but it was a pretty quick and interesting process:
FROM amazonlinux:2 AS layer-image
WORKDIR /home/build
RUN set -e
RUN echo "Prepping ClamAV"
RUN rm -rf bin
RUN rm -rf lib
RUN yum update -y
RUN amazon-linux-extras install epel -y
RUN yum install -y cpio yum-utils tar.x86_64 gzip zip
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav
RUN rpm2cpio clamav-0*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-lib
RUN rpm2cpio clamav-lib*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-update
RUN rpm2cpio clamav-update*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 json-c
RUN rpm2cpio json-c*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 pcre2
RUN rpm2cpio pcre*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libtool-ltdl
RUN rpm2cpio libtool-ltdl*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libxml2
RUN rpm2cpio libxml2*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 bzip2-libs
RUN rpm2cpio bzip2-libs*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 xz-libs
RUN rpm2cpio xz-libs*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libprelude
RUN rpm2cpio libprelude*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 gnutls
RUN rpm2cpio gnutls*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 nettle
RUN rpm2cpio nettle*.rpm | cpio -vimd
RUN mkdir -p bin
RUN mkdir -p lib
RUN mkdir -p var/lib/clamav
RUN chmod -R 777 var/lib/clamav
COPY ./freshclam.conf .
RUN cp usr/bin/clamscan usr/bin/freshclam bin/.
RUN cp usr/lib64/* lib/.
RUN cp freshclam.conf bin/freshclam.conf
RUN yum install shadow-utils.x86_64 -y
RUN groupadd clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamupdate
RUN LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf
FROM public.ecr.aws/lambda/nodejs:14
COPY --from=layer-image /home/build ./
COPY handler.js ./
CMD ["handler.virusScan"]
This Dockerfile does two things:
- Builds the ClamAV binaries and virus definitions into a stage aliased layer-image
- Builds the Lambda image with the handler itself, then pulls the ClamAV binaries and virus definitions in from the layer-image stage
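If you want to sanity-check the image locally before deploying (not something the original post covers), the AWS-provided Node base image bundles the Lambda Runtime Interface Emulator, so you can run the container and POST a fake event to it. The image tag, bucket, and key below are placeholders:

// Assumes the image was built and started locally first, e.g.:
//   docker build -t clambda-av .
//   docker run --rm -p 9000:8080 clambda-av
// The Runtime Interface Emulator in the base image then exposes the standard
// local invocation endpoint on the mapped port.
const http = require("http");

// Minimal S3-style event; the bucket and key are placeholders, and the object
// must actually exist (and be readable) for the handler to succeed.
const fakeEvent = JSON.stringify({
  Records: [
    {
      s3: {
        bucket: { name: "clambda-av-files" },
        object: { key: "test-upload.txt" }
      }
    }
  ]
});

const req = http.request(
  {
    host: "localhost",
    port: 9000,
    path: "/2015-03-31/functions/function/invocations",
    method: "POST",
    headers: { "Content-Type": "application/json" }
  },
  (res) => {
    let body = "";
    res.on("data", (chunk) => (body += chunk));
    res.on("end", () => console.log(res.statusCode, body));
  }
);

req.write(fakeEvent);
req.end();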
Handler
This doesn't change the handler much from my previous post:
const { execSync } = require("child_process");
const { writeFileSync, unlinkSync } = require("fs");
const AWS = require("aws-sdk");

const s3 = new AWS.S3();

module.exports.virusScan = async (event, context) => {
  if (!event.Records) {
    console.log("Not an S3 event invocation!");
    return;
  }

  for (const record of event.Records) {
    if (!record.s3) {
      console.log("Not an S3 Record!");
      continue;
    }

    // get the file
    const s3Object = await s3
      .getObject({
        Bucket: record.s3.bucket.name,
        Key: record.s3.object.key
      })
      .promise();

    // write file to disk
    writeFileSync(`/tmp/${record.s3.object.key}`, s3Object.Body);

    try {
      // scan it
      execSync(`./bin/clamscan --database=./var/lib/clamav /tmp/${record.s3.object.key}`);

      await s3
        .putObjectTagging({
          Bucket: record.s3.bucket.name,
          Key: record.s3.object.key,
          Tagging: {
            TagSet: [
              {
                Key: 'av-status',
                Value: 'clean'
              }
            ]
          }
        })
        .promise();
    } catch (err) {
      if (err.status === 1) {
        // tag as dirty, OR you can delete it
        await s3
          .putObjectTagging({
            Bucket: record.s3.bucket.name,
            Key: record.s3.object.key,
            Tagging: {
              TagSet: [
                {
                  Key: 'av-status',
                  Value: 'dirty'
                }
              ]
            }
          })
          .promise();
      }
    }

    // delete the temp file
    unlinkSync(`/tmp/${record.s3.object.key}`);
  }
};
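One caveat worth noting (the handler above doesn't address it): the object key in an S3 event notification arrives URL-encoded, so keys containing spaces or special characters would need decoding before being handed to getObject. A minimal sketch of that decoding step:

// S3 event notifications URL-encode the object key (spaces arrive as '+'),
// so keys with spaces or special characters need decoding before use.
const decodeS3Key = (rawKey) => decodeURIComponent(rawKey.replace(/\+/g, " "));

console.log(decodeS3Key("my+folder/some%28file%29.pdf")); // "my folder/some(file).pdf"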
Summary
From our previous adventure (this is the last time I'm linking it, I swear), this removes the extra step of building the binaries with a bash script. It also removes the need for a lambda layer.
If you'd like to check out the full code, again, it's in the GitHub repository. Please don't hesitate to ask questions or leave comments on this article, or to open an issue on the repository if applicable. Thanks for reading!
Top comments (17)
I have a question; hope you might know more than me on this. I've been stumped by this for a few days, and this post is the only thing I've found (which is absolutely awesome, by the way). You've really saved me here. I'm trying to do the same thing minus the Serverless stuff. I want to be able to use the API in a TypeScript project like this:
import ClamScan from 'clamscan';

class ClamScanGetter {
  static initClamScan() {
    const clamScanOpts: ClamScan.Options = {
      clamscan: {
        active: true,
      },
      preference: 'clamscan'
    }

    return new ClamScan().init(clamScanOpts);
  }
}

const clamScan = await ClamScanGetter.initClamScan();
const scan = await clamScan.scanDir("/tmp");
If I install it this way, would you know if this would work?
As long as it's deployed and the binary is accessible, it should work.
Thank you for sharing this great article. I am trying this in Java. How does the second stage of the Dockerfile recognize the packages installed in layer-image? It just copies files over, but when I try to execute clamscan inside the image, it does not see it as an executable. It looks like in a multi-stage Docker build we cannot run packages installed in an earlier stage; am I wrong here?
Have you done this in Java? If so, can you please share the Java code here if possible?
I'm also trying this in Java, with a Jenkins pipeline.
This wasn't created in Java, nor are there any plans to do it in Java; however, it shouldn't be too difficult. 🙂
Hey Joseph, I was playing around and got it to work almost immediately, huge thanks to you! Now my question is: is it normal that it takes around 30 seconds to scan a small file (a 60 KB image)? Or might there be anything wrong with my setup?
The reason is that for each file scanned, the ClamAV binary has to boot up and load its virus definitions -- it taking ~30 seconds to do so sounds 100% accurate.
Edit: The scans themselves take < 1s. It's always the initial boot.
I have plans to write another Serverless article on having a ClamAV daemon running in a Fargate-configured container sometime either before the end of this year or sometime next year. I've just been rather busy this year, as life tends to take up most of my time. 😁
I do have a Terraform one here, however: dev.to/sutt0n/scanning-files-with-...
I was doing a console.log before and after execSync and there was a difference of 25 seconds. Cold starting the Lambda even took 60 seconds; I guess those extra 35 seconds are what you mean by "boot up". But even a warm Lambda took that long, because (it seems like) the scanning itself takes forever. I appreciate your other post and will take a look, but I really liked the simplicity of this Lambda, which fits our needs perfectly.
"Boot up" meaning the ClamAV binary has to first load the virus definitions before it scans the file upon execution, which takes around ~30 seconds. That's normal, funfortunately, ha.
Hi Joseph, according to the README, this: git clone github.com/sutt0n/serverless-clama...
is asking me for a username and password, so I'm not able to clone it.
The repo in the README wasn't correct. Thanks for letting me know.
yumdownloader is unable to find the clamav packages now.
Mac? Not an x86 machine?
Specify the platform: FROM --platform=linux/amd64 amazonlinux:2
stackoverflow.com/questions/738262...
My apologies for not paying attention to this, my notifications seem skewed. I'll update the repo code with this -- thank you!
Edit: This was actually done on a Mac initially. I'm going to assume that since M1 was released, some packages' archs have been modified / changed a bit.
I updated a working copy to AWS SDK v3: github.com/jrobens/serverless-clam... (change to build.sh).
I also updated to TypeScript and the v3 AWS SDK.
This is awesome! So the virus definitions are updated during build?
Sorry for the late response -- yes, IIRC, the virus definitions are only updated when it's built/deployed.
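If the definitions ever needed to be fresher than the last deploy, one option (purely a sketch, not from the original post) would be to have the handler run freshclam into /tmp, the only writable path in Lambda, and point clamscan at that directory:

const { execSync } = require("child_process");
const { existsSync, mkdirSync } = require("fs");

// Hypothetical helper: pull fresh definitions into /tmp on a cold start and
// scan against that directory instead of the baked-in ./var/lib/clamav.
// Assumes the freshclam binary and config are shipped in the image as above.
const scanWithFreshDefinitions = (filePath) => {
  if (!existsSync("/tmp/clamav")) {
    mkdirSync("/tmp/clamav", { recursive: true });
    execSync(
      "LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf --datadir=/tmp/clamav"
    );
  }

  // Throws with a non-zero status if the file is infected, same as before.
  execSync(`LD_LIBRARY_PATH=./lib ./bin/clamscan --database=/tmp/clamav ${filePath}`);
};

This trades extra cold-start time (the definitions download itself) for fresher definitions, so it's only worth it if build-time definitions aren't fresh enough for your use case.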