DEV Community

Joseph Sutton

Using Serverless to Scan Files with a ClamAV Lambda Layer

Update: I've written how to do this with lambda containers as well!

Let's create an environment that scans a file via an S3 event by utilizing ClamAV binaries on a Lambda layer. You can retrieve the full source code at this GitHub repository.

Note: As of 8/11/2021, the unzipped size of the ClamAV binaries and virus definitions is under the limit. This may change in the future.


Serverless config

So, we'll need a few things: an S3 bucket, a function, and a Lambda layer. I've also included logging groups and permissions in the serverless.yml file:

service: clambda-av

provider:
  name: aws
  runtime: nodejs14.x
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObjectTagging
      Resource: "arn:aws:s3:::clambda-av-files/*"

functions:
  virusScan:
    handler: handler.virusScan
    memorySize: 2048
    events:
      - s3:
          bucket: clambda-av-files
          event: s3:ObjectCreated:*
    layers:
      - {Ref: ClamavLambdaLayer}
    timeout: 120

package:
  exclude:
    - node_modules/**
    - coverage/**

layers:
  clamav:
    path: layer


Before we deploy this, we need to get our ClamAV binaries. Amazon has its own Docker base image we can use to build them. Through the power of trial and error, I've found the binaries that are required. Here's the full Dockerfile:

FROM amazonlinux:2

WORKDIR /home/build

RUN set -e

RUN echo "Prepping ClamAV"

RUN rm -rf bin
RUN rm -rf lib

RUN yum update -y
RUN amazon-linux-extras install epel -y
RUN yum install -y cpio yum-utils tar.x86_64 gzip zip

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav
RUN rpm2cpio clamav-0*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-lib
RUN rpm2cpio clamav-lib*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-update
RUN rpm2cpio clamav-update*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 json-c
RUN rpm2cpio json-c*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 pcre2
RUN rpm2cpio pcre*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libtool-ltdl
RUN rpm2cpio libtool-ltdl*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libxml2
RUN rpm2cpio libxml2*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 bzip2-libs
RUN rpm2cpio bzip2-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 xz-libs
RUN rpm2cpio xz-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libprelude
RUN rpm2cpio libprelude*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 gnutls
RUN rpm2cpio gnutls*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 nettle
RUN rpm2cpio nettle*.rpm | cpio -vimd

RUN mkdir -p bin
RUN mkdir -p lib
RUN mkdir -p var/lib/clamav
RUN chmod -R 777 var/lib/clamav

COPY ./freshclam.conf .

RUN cp usr/bin/clamscan usr/bin/freshclam bin/.
RUN cp usr/lib64/* lib/.
RUN cp freshclam.conf bin/freshclam.conf

RUN yum install shadow-utils.x86_64 -y

RUN groupadd clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamupdate

RUN LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf

RUN zip -r9 bin.zip bin
RUN zip -r9 lib.zip lib
RUN zip -r9 var.zip var
RUN zip -r9 etc.zip etc

You'll note that the COPY ./freshclam.conf . line implies there's another file we need, and you'd be correct:

CompressLocalDatabase yes
ScriptedUpdates no
DatabaseDirectory /home/build/var/lib/clamav

Building the binaries with Docker

Next is our bash script, which builds the Dockerfile above and extracts the binaries from the container:


#!/bin/bash

rm -rf ./layer
mkdir layer

docker build -t clamav -f Dockerfile .
docker run --name clamav clamav
docker cp clamav:/home/build/ .
docker rm clamav
mv ./build/*.zip ./layer

pushd ./layer
unzip -n '*.zip'
popd

Scanning Handler

We also need our handler to do the scanning and tagging of the file. The tagging is more of a placeholder; you could instead move files to a separate quarantine bucket or a separate clean bucket - whichever you prefer.

With the following handler, we should be set:

const { execSync } = require("child_process");
const { writeFileSync, unlinkSync } = require("fs");
const AWS = require("aws-sdk");

const s3 = new AWS.S3();

module.exports.virusScan = async (event, context) => {
  if (!event.Records) {
    console.log("Not an S3 event invocation!");
    return;
  }

  for (const record of event.Records) {
    if (!record.s3) {
      console.log("Not an S3 Record!");
      continue;
    }

    // get the file
    const s3Object = await s3
      .getObject({
        Bucket: record.s3.bucket.name,
        Key: record.s3.object.key
      })
      .promise();

    // write file to disk
    writeFileSync(`/tmp/${record.s3.object.key}`, s3Object.Body);

    try {
      // scan it
      execSync(`clamscan --database=/opt/var/lib/clamav /tmp/${record.s3.object.key}`);

      // tag as clean
      await s3
        .putObjectTagging({
          Bucket: record.s3.bucket.name,
          Key: record.s3.object.key,
          Tagging: {
            TagSet: [
              {
                Key: "av-status",
                Value: "clean"
              }
            ]
          }
        })
        .promise();
    } catch (err) {
      if (err.status === 1) {
        // tag as dirty, OR you can delete it
        await s3
          .putObjectTagging({
            Bucket: record.s3.bucket.name,
            Key: record.s3.object.key,
            Tagging: {
              TagSet: [
                {
                  Key: "av-status",
                  Value: "dirty"
                }
              ]
            }
          })
          .promise();
      }
    }

    // delete the temp file
    unlinkSync(`/tmp/${record.s3.object.key}`);
  }
};

You may be asking yourself about the --database option above: "Wait... why /opt/?" That's because all of the layer files are mounted into that directory at runtime for the Lambda function to use.

Here's the algorithm:

  • If a file is clean, then it is tagged with a key/value pair of av-status = 'clean'
  • If a file is NOT clean (virus), then it is tagged with a key/value pair of av-status = 'dirty'
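The tag value follows directly from clamscan's exit status (0 means no virus was found, 1 means a virus was found, and 2 means an error occurred), which is what the err.status === 1 check in the handler's catch block relies on. As an illustration (this helper isn't part of the original handler), the mapping looks like:

```javascript
// clamscan exit codes: 0 = no virus found, 1 = virus(es) found,
// 2 = an error occurred. Map a status to the tag value used above.
// The "error" branch is illustrative; you'd decide how to handle
// failed scans yourself.
function tagForScanStatus(status) {
  if (status === 0) return "clean";
  if (status === 1) return "dirty";
  return "error";
}
```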

Since this is for demonstration purposes, you can obviously customize this flow however you'd like. 😀
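For example, one way to customize the flow is to quarantine dirty files instead of tagging them. A minimal sketch, assuming a hypothetical quarantine bucket name (none of these identifiers come from the original post):

```javascript
// Hypothetical quarantine bucket - adjust to your own setup.
const QUARANTINE_BUCKET = "clambda-av-quarantine";

// Build the parameters for the two S3 calls involved in quarantining:
// a copy into the quarantine bucket, then a delete of the original.
function quarantineParams(srcBucket, key) {
  return {
    copy: {
      Bucket: QUARANTINE_BUCKET,
      CopySource: `${srcBucket}/${key}`,
      Key: key
    },
    del: { Bucket: srcBucket, Key: key }
  };
}

// Usage inside the handler's catch block (s3 is the AWS.S3 client):
//   const { copy, del } = quarantineParams(record.s3.bucket.name, record.s3.object.key);
//   await s3.copyObject(copy).promise();
//   await s3.deleteObject(del).promise();
```

You'd also need to grant the function s3:CopyObject/s3:DeleteObject-equivalent permissions on the relevant buckets in serverless.yml.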


Now that we have all of our files, we can do the following:

  1. Build the binaries by running the bash script above (make sure it's executable via chmod +x after creating it)
  2. Run sls deploy
joseph@bertha > sls deploy
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Excluding development dependencies...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service file to S3 (41 KB)...
Serverless: Uploading service file to S3 (222.57 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
Serverless: Stack update finished...
Service Information
service: clambda-av
stage: dev
region: us-east-1
stack: clambda-av-dev
resources: 9
api keys:
  None
functions:
  virusScan: clambda-av-dev-virusScan
layers:
  clamav: arn:aws:lambda:us-east-1:**********:layer:clamav:8

Now, let's test it by uploading a clean file to S3. I happen to have a PDF lying around:

joseph@bertha > aws s3 cp ~/document.pdf s3://clambda-av-files/
upload: ../../document.pdf to s3://clambda-av-files/document.pdf

The downside is that clamscan in the Lambda layer takes ~30 seconds or so to boot, load the virus definitions, and scan the file. We can check the tag after some time via:

joseph@bertha > aws s3api get-object-tagging --bucket clambda-av-files --key document.pdf

{
    "TagSet": [
        {
            "Key": "av-status",
            "Value": "clean"
        }
    ]
}

We can test the virus scanner with a test virus signature found at EICAR:

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

Save that text to a file called test-virus.pdf, then upload it and see what happens:

joseph@bertha > aws s3 cp ~/test-virus.pdf s3://clambda-av-files/
upload: ../../test-virus.pdf to s3://clambda-av-files/test-virus.pdf

After waiting another thirty seconds or so, let's check the tag on it:

joseph@bertha > aws s3api get-object-tagging --bucket clambda-av-files --key test-virus.pdf

{
    "TagSet": [
        {
            "Key": "av-status",
            "Value": "dirty"
        }
    ]
}

Drawbacks and potential problems

There are a few drawbacks to this, especially since the Lambda layer is quite large:

  1. File size constraints: because the file is written to /tmp, which is capped at 512MB, you cannot scan a file larger than that.
  2. ClamAV virus definitions will no doubt get larger, thus potentially interfering with the maximum deployment size (which is 250MB, including layers, as of 8/11/2021).
  3. Lambda code storage limitation may eventually be reached with consecutive deployments, although this can be mitigated with the serverless-prune-plugin.
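For the third point, a minimal serverless-prune-plugin setup in serverless.yml looks roughly like this (the values shown are illustrative defaults, not from the original post):

```yaml
plugins:
  - serverless-prune-plugin

custom:
  prune:
    automatic: true   # prune old versions after each deployment
    number: 3         # keep only the three most recent versions
```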


Thank y'all for reading. If you have any questions, feel free to ask in the comments! I also welcome suggestions. 🙂

Top comments (11)

Matt Morgan

I googled this in a hackathon when I had no idea how I was gonna solve this problem, so big thanks ;)

BTW, I had to change freshclam.conf so DatabaseDirectory is /home/build/var/lib/clamav in order to get the Docker build to work. /opt/var/lib/clamav is still used in the Lambda function.

Joseph Sutton

Ah, good catch! Sorry, I've been inactive here in the midst of the holidays and switching jobs. Thanks for that, I'll update the code.

rajashekhar29

Hi @matt Morgan
I still see /home/build/opt/var/lib/clamav in DatabaseDirectory, should we change this to /home/build/var/lib/clamav ?

redstone78 • Edited

Awesome article!! Do you have anything around updating the virus definitions on a daily basis?

Joseph Sutton • Edited

Easily updating them would potentially be expensive (because of the whole versioning shenanigans), because you'd have to redeploy the lambda layer each time. I'd recommend looking into an EC2 / Fargate solution for that and I have an article outlined that I need to actually write and push out. It's just very lengthy, and it's a lot of Terraform work to explain -- I'm working on splitting it up.

muthu

Joseph Sutton

Yup, for a Python solution, that looks cool. Still limited by Lambda storage, unfortunately.

programkr19

Great post! It was extremely helpful. For me, the code didn’t work for a Lambda runtime of NodeJS 18 or higher (not terribly surprising given the age of this article). Do you happen to know how the Docker file or ClamAV binaries would need to change in order to work with those newer runtimes?

Marcel Heinrich

I'm having the same problem! hehe. In my case, clamscan can't be executed inside of Lambda (/bin/sh: clamscan: command not found), but it works when I try it in Docker.
If you can resolve it, please share here!

Comment deleted
Joseph Sutton

Looks like it can't find the file. Might have to reconfigure the path to get things to play nicely -- this article is over two years old, and I have no idea what OS you're running.