DEV Community

Joseph Sutton

Using Serverless to Scan Files with ClamAV in a Lambda Container

In my previous post, I used a Lambda function with a dedicated Lambda layer. The layer's binaries were built inside a Docker image based on Amazon's amazonlinux:2 image. With AWS's Lambda container image support, we can use those same binaries without worrying nearly as much about the deployment size limits we ran into with the function-plus-layer approach.

History

For those who did not read the previous post: we're going to set up an S3 bucket with an event trigger that invokes a Lambda function. That function will be a container holding the handler code, the ClamAV binaries, and the virus definitions. It fetches the S3 object using the metadata in the trigger event, scans it, and tags it as clean or dirty based on the result of the ClamAV scan.
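For context, here's a minimal sketch of the part of the notification payload the handler reads (the bucket name and key are placeholders). One subtlety worth knowing up front: S3 URL-encodes object keys in these events, turning spaces into "+".

```javascript
// A trimmed-down S3 "ObjectCreated" notification, showing only the
// fields the handler cares about (standard S3 event notification shape).
const event = {
  Records: [
    {
      s3: {
        bucket: { name: "clambda-av-files" },
        object: { key: "uploads/my+report.pdf" },
      },
    },
  ],
};

for (const record of event.Records) {
  const bucket = record.s3.bucket.name;
  // S3 delivers keys URL-encoded, with spaces turned into "+",
  // so decode before using the key against the S3 API.
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
  console.log(`${bucket}/${key}`); // → clambda-av-files/uploads/my report.pdf
}
```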

TLDR: Here's the GitHub repository.

Infrastructure

This is obviously going to be different: instead of a Lambda layer, we'll use a Docker image stored in ECR. Thanks to Serverless, this is nearly effortless.

(diagram: AWS cloud infrastructure)

Serverless

By default, Serverless will create an ECR repository for us, and the image will live in it. All we have to do is give it the path of the Dockerfile.

service: clambda-av

provider:
  name: aws
  runtime: nodejs14.x
  ecr:
    images:
      clambdaAv:
        path: ./
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObjectTagging
      Resource: "arn:aws:s3:::clambda-av-files/*"

functions:
  virusScan:
    image:
      name: clambdaAv
    memorySize: 2048
    events:
      - s3: 
          bucket: clambda-av-files
          event: s3:ObjectCreated:*
    timeout: 120

package:
  exclude:
    - node_modules/**
    - coverage/**
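With the image path configured, deployment is the usual Serverless flow; a sketch, assuming Docker is running locally and your AWS credentials are set up:

```shell
# Build the image, push it to the auto-created ECR repository,
# and deploy the stack in one step (Docker must be running).
npx serverless deploy

# After uploading a test file to the bucket, tail the function's logs.
npx serverless logs --function virusScan --tail
```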

Dockerfile

Since we're using JavaScript, we'll be using the nodejs14 image as the base. Unfortunately, we cannot easily install our ClamAV binaries through this image, so we have to pull them from the amazonlinux:2 image, as stated above. Fortunately, Docker allows us to do that with ease via multi-stage builds. I'd never done this before, but it was a pretty quick and interesting process:

FROM amazonlinux:2 AS layer-image

WORKDIR /home/build

RUN set -e

RUN echo "Prepping ClamAV"

RUN rm -rf bin
RUN rm -rf lib

RUN yum update -y
RUN amazon-linux-extras install epel -y
RUN yum install -y cpio yum-utils tar.x86_64 gzip zip

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav
RUN rpm2cpio clamav-0*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-lib
RUN rpm2cpio clamav-lib*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-update
RUN rpm2cpio clamav-update*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 json-c
RUN rpm2cpio json-c*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 pcre2
RUN rpm2cpio pcre*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libtool-ltdl
RUN rpm2cpio libtool-ltdl*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libxml2
RUN rpm2cpio libxml2*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 bzip2-libs
RUN rpm2cpio bzip2-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 xz-libs
RUN rpm2cpio xz-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libprelude
RUN rpm2cpio libprelude*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 gnutls
RUN rpm2cpio gnutls*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 nettle
RUN rpm2cpio nettle*.rpm | cpio -vimd

RUN mkdir -p bin
RUN mkdir -p lib
RUN mkdir -p var/lib/clamav
RUN chmod -R 777 var/lib/clamav

COPY ./freshclam.conf .

RUN cp usr/bin/clamscan usr/bin/freshclam bin/.
RUN cp usr/lib64/* lib/.
RUN cp freshclam.conf bin/freshclam.conf

RUN yum install shadow-utils.x86_64 -y

RUN groupadd clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamupdate

RUN LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf

FROM public.ecr.aws/lambda/nodejs:14

COPY --from=layer-image /home/build ./

COPY handler.js ./

CMD ["handler.virusScan"]

This Dockerfile does two things:

  1. Builds the ClamAV binaries into a stage aliased layer-image along with the ClamAV virus definitions
  2. Builds the Lambda image with the handler itself, and then pulls in the ClamAV binaries and virus definitions from the layer-image stage
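Before deploying, you can sanity-check the image locally; the AWS Lambda base images bundle the runtime interface emulator, so the function can be invoked over HTTP. A sketch (the image tag is my own placeholder):

```shell
# Build the two-stage image (the first stage downloads ClamAV and
# runs freshclam, so this step needs network access).
docker build -t clambda-av .

# Run it locally; the emulator listens on port 8080 inside the container.
docker run -p 9000:8080 clambda-av

# In another terminal: invoke the handler with an empty event.
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
```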

Handler

This doesn't change the handler much from my previous post:

const { execSync } = require("child_process");
const { writeFileSync, unlinkSync } = require("fs");
const { basename } = require("path");
const AWS = require("aws-sdk");

const s3 = new AWS.S3();

module.exports.virusScan = async (event, context) => {
  if (!event.Records) {
    console.log("Not an S3 event invocation!");
    return;
  }

  for (const record of event.Records) {
    if (!record.s3) {
      console.log("Not an S3 Record!");
      continue;
    }

    // S3 event keys arrive URL-encoded (spaces become "+"), so decode
    // before using the key against the S3 API
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    // flatten nested keys so the temp file lands directly in /tmp
    const localPath = `/tmp/${basename(key)}`;

    // get the file
    const s3Object = await s3
      .getObject({
        Bucket: record.s3.bucket.name,
        Key: key
      })
      .promise();

    // write file to disk
    writeFileSync(localPath, s3Object.Body);

    try {
      // scan it; clamscan exits non-zero when a virus is found
      execSync(`./bin/clamscan --database=./var/lib/clamav ${localPath}`);

      await s3
        .putObjectTagging({
          Bucket: record.s3.bucket.name,
          Key: key,
          Tagging: {
            TagSet: [
              {
                Key: 'av-status',
                Value: 'clean'
              }
            ]
          }
        })
        .promise();
    } catch (err) {
      if (err.status === 1) {
        // tag as dirty, OR you can delete it
        await s3
          .putObjectTagging({
            Bucket: record.s3.bucket.name,
            Key: key,
            Tagging: {
              TagSet: [
                {
                  Key: 'av-status',
                  Value: 'dirty'
                }
              ]
            }
          })
          .promise();
      }
    }

    // delete the temp file
    unlinkSync(localPath);
  }
};

Summary

Compared to our previous adventure (this is the last time I'm linking it, I swear), this removes the extra step of building the binaries with a bash script, and it removes the need for a Lambda layer altogether.

If you'd like to check out the full code, again, it's in the GitHub repository. Please don't hesitate to ask questions, leave comments on this article, or open an issue on the repository if applicable. Thanks for reading!

Top comments (17)

Michael Mc Daid

I have a question; hope you might know more than me on this. I've been stumped by this for a few days, and this post is the only thing I've found, which is absolutely awesome btw. You've really saved me here. I am trying to do the same thing minus the Serverless stuff. I want to be able to use the API in a TypeScript project like this:
import ClamScan from 'clamscan';

class ClamScanGetter {
  static initClamScan() {
    const clamScanOpts: ClamScan.Options = {
      clamscan: {
        active: true,
      },
      preference: 'clamscan'
    }
    return new ClamScan().init(clamScanOpts);
  }
}

const clamScan = await ClamScanGetter.initClamScan();
const scan = await clamScan.scanDir("/tmp");
If I install it this way, do you know if this would work?

Joseph Sutton

As long as it's deployed and the binary is accessible, it should work.

AmRf

Thank you for sharing this great article. I am trying this in Java; how does the second stage of the Dockerfile recognize the packages installed in layer-image? It just copies files over, but when I try to execute clamscan inside the image, it doesn't see it as an executable. It looks like in a multi-stage Docker build we can't run packages installed in an earlier stage. Am I wrong here?

shyjurahim

Have you done this in Java? If so, can you please share the Java code here if possible? I'm also trying it in Java with a Jenkins pipeline.

Joseph Sutton

This wasn't created in Java, nor are there any plans to do it in Java; however, it shouldn't be too difficult. 🙂

Marc Schleeweiß

Hey Joseph, I was playing around and got it to work almost immediately, huge thanks to you! Now my question is: is it normal that it takes around 30 seconds to scan a small file (an image of 60 KB)? Or might there be anything wrong with my setup?

Joseph Sutton • Edited

The reason is that for each file scanned, the ClamAV binary has to boot up and load its virus definitions -- it taking ~30 seconds to do so sounds 100% accurate.

edit: The scans themselves take < 1s. It's always the initial boot.

I have plans to write another Serverless article on having a ClamAV daemon running in a Fargate-configured container sometime either before the end of this year or sometime next year. I've just been rather busy this year, as life tends to take up most of my time. 😁

I do have a Terraform one here, however: dev.to/sutt0n/scanning-files-with-...

Marc Schleeweiß

I was doing a console.log before and after execSync and there was a difference of 25 seconds. Cold starting the Lambda even took 60 seconds; I guess the extra 35 seconds are what you mean by "boot up". But even a warm Lambda took that long, because (it seems like) the scanning itself takes forever. I appreciate your other post and will take a look, but I really liked the simplicity of this Lambda, which fits our needs perfectly.

Joseph Sutton • Edited

"Boot up" meaning the ClamAV binary has to first load the virus definitions before it scans the file upon execution, which takes around ~30 seconds. That's normal, funfortunately, ha.

amabrouk-zaizi

Hi Joseph, according to the README, this: git clone github.com/sutt0n/serverless-clama...
is asking me for a username and password, so I'm not able to clone it.

Joseph Sutton • Edited

The repo in the README wasn't correct. Thanks for letting me know.

kiruba3441

yumdownloader is unable to find the ClamAV packages now.

Dude

Mac? Not an x86 machine?

Specify the platform: FROM --platform=linux/amd64 amazonlinux:2

stackoverflow.com/questions/738262...

Joseph Sutton • Edited

My apologies for not paying attention to this, my notifications seem skewed. I'll update the repo code with this -- thank you!

Edit: This was actually done on a Mac initially. I'm going to assume that since M1 was released, some packages' archs have been modified / changed a bit.

Dude

I updated a working copy to AWS SDK v3. github.com/jrobens/serverless-clam.... The change is to build.sh.

I also updated it to TypeScript and the v3 AWS SDK.

Skaria Verghese

This is awesome! So the virus definitions are updated during build?

Joseph Sutton

Sorry for the late response -- yes, IIRC, the virus definitions are only updated when it's built/deployed.