
Gordon Johnston for Lineup Ninja

Posted on • Edited on

Zip files on S3 with AWS Lambda and Node

This post was updated 20 Sept 2022 to improve reliability with large numbers of files.

  • Update the stream handling so streams are only opened to S3 when the file is ready to be processed by the Zip Archiver. This fixes timeouts that could be seen when processing a large number of files.
  • Use keep-alive with S3 and limit the number of connected sockets.

It's not an uncommon requirement to want to bundle files on S3 into a Zip file so a user can download multiple files in a single package. Maybe it's common enough for AWS to offer this functionality themselves one day. Until then you can write a short script to do it.

If you want to provide this service in a serverless environment such as AWS Lambda you have two main constraints that define the approach you can take.

1 - /tmp is only 512 MB. Your first idea might be to download the files from S3, zip them up, and upload the result. This will work fine until you fill up /tmp with the temporary files!

2 - Memory is constrained to 3 GB. You could store the temporary files on the heap, but again you are constrained to 3 GB. Even in a regular server environment you're not going to want a simple zip function to take 3 GB of RAM!

So what can you do? The answer is to stream the data from S3, through an archiver and back onto S3.

Fortunately this Stack Overflow post and its comments pointed the way and this post is basically a rehash of it!

The code below is TypeScript, but the JavaScript is just the same with the types removed.

Start with the imports you need

import * as Archiver from 'archiver';
import * as AWS from 'aws-sdk';
import * as https from 'https';
import * as lazystream from 'lazystream';
import { Readable, Stream } from 'stream';

First, configure the aws-sdk so that it uses keep-alive connections when communicating with S3, and limit the maximum number of sockets. This improves efficiency and helps avoid hitting an unexpected connection limit. Instead of this section you could set AWS_NODEJS_CONNECTION_REUSE_ENABLED in your Lambda environment.

    // Set the S3 config to use keep-alives
    const agent = new https.Agent({ keepAlive: true, maxSockets: 16 });

    AWS.config.update({ httpOptions: { agent } });

Let's start by creating the streams to fetch the data from S3. To prevent timeouts to S3 the streams are wrapped with 'lazystream', which delays the actual opening of the stream until the archiver is ready to read the data.

Let's assume you have a list of keys in keys. For each key we need to create a ReadStream. To track the keys and streams let's create an S3DownloadStreamDetails type. The 'filename' will ultimately be the filename in the Zip, so you can do any transformation you need for that at this stage.

    type S3DownloadStreamDetails = { stream: Readable; filename: string };
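For example, if your keys include a prefix that you don't want to appear inside the archive, this is the point to derive the Zip filename from the key. A minimal sketch (the key layout and helper name here are just assumptions, not part of the original code):

    // Hypothetical: keys look like 'uploads/1234/report.pdf' but the zip entry
    // should just be 'report.pdf'. Adjust to match your own key structure.
    const zipFilenameForKey = (key: string): string => key.split('/').pop() || key;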

Now, for our array of keys, we can iterate over it to create the S3DownloadStreamDetails objects.

    const s3DownloadStreams: S3DownloadStreamDetails[] = keys.map((key: string) => {
        return {
            stream: new lazystream.Readable(() => {
                console.log(`Creating read stream for ${key}`);
                return s3.getObject({ Bucket: 'Bucket Name', Key: key }).createReadStream();
            }),
            filename: key,
        };
    });

Now prepare the upload side by creating a Stream.PassThrough object and assigning that as the Body of the params for an S3.PutObjectRequest.


    const streamPassThrough = new Stream.PassThrough();
    const params: AWS.S3.PutObjectRequest = {
        ACL: 'private',
        Body: streamPassThrough,
        Bucket: 'Bucket Name',
        ContentType: 'application/zip',
        Key: 'The Key on S3',
        StorageClass: 'STANDARD_IA', // Or as appropriate
    };


Now we can start the upload process.

    const s3Upload = s3.upload(params, (error: Error): void => {
        if (error) {
            console.error(`Got error creating stream to s3 ${error.name} ${error.message} ${error.stack}`);
            throw error;
        }
    });


If you want to monitor the upload process, for example to give feedback to users, you can attach a handler to httpUploadProgress like this.

    s3Upload.on('httpUploadProgress', (progress: { loaded: number; total: number; part: number; key: string }): void => {
        console.log(progress); // { loaded: 4915, total: 192915, part: 1, key: 'foo.jpg' }
    });
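For example, you could turn that into a rough percentage for user feedback. Note that total may not be known while the SDK is still streaming data of unknown length, so guard against it. A small sketch, not part of the original code:

    s3Upload.on('httpUploadProgress', (progress: { loaded: number; total?: number }): void => {
        if (progress.total) {
            console.log(`Uploaded ${Math.round((progress.loaded / progress.total) * 100)}%`);
        } else {
            // total is unknown until the SDK can determine the full size
            console.log(`Uploaded ${progress.loaded} bytes so far`);
        }
    });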

Now create the archiver

    const archive = Archiver('zip');
    archive.on('error', (error: Archiver.ArchiverError) => { throw new Error(`${error.name} ${error.code} ${error.message} ${error.path} ${error.stack}`); });
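If you want to trade CPU time for a smaller archive you can also pass a compression level when creating the archiver, and it can be worth listening for 'warning' events, which archiver uses for non-fatal problems. A sketch of an alternative setup based on the archiver API; level 9 is just an example value:

    // Alternative: specify the compression level explicitly (0 = store only, 9 = maximum)
    const archive = Archiver('zip', { zlib: { level: 9 } });

    // archiver reports non-fatal problems as warnings rather than errors
    archive.on('warning', (warning: Archiver.ArchiverError) => {
        console.warn(`Archiver warning: ${warning.code} ${warning.message}`);
    });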

Now we can connect the archiver to pipe data to the upload stream and append all the download streams to it

    await new Promise((resolve, reject) => {

        console.log('Starting upload');

        streamPassThrough.on('close', resolve);
        streamPassThrough.on('end', resolve);
        streamPassThrough.on('error', reject);

        archive.pipe(streamPassThrough);
        s3DownloadStreams.forEach((streamDetails: S3DownloadStreamDetails) => archive.append(streamDetails.stream, { name: streamDetails.filename }));
        archive.finalize();
    }).catch((error: { code: string; message: string; data: string }) => { throw new Error(`${error.code} ${error.message} ${error.data}`); });

Finally wait for the uploader to finish

    await s3Upload.promise();

and you're done.

I've tested this with archives over 10 GB and it works like a charm. I hope this has helped you out.
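For reference, here's roughly how the pieces above fit together in a single Lambda handler. The bucket names, key list and output key are placeholders you'd replace with your own values; treat this as a condensed sketch of the steps above rather than a drop-in implementation.

    import * as Archiver from 'archiver';
    import * as AWS from 'aws-sdk';
    import * as https from 'https';
    import * as lazystream from 'lazystream';
    import { Stream } from 'stream';

    // Keep-alive agent shared by all S3 calls
    const agent = new https.Agent({ keepAlive: true, maxSockets: 16 });
    AWS.config.update({ httpOptions: { agent } });
    const s3 = new AWS.S3();

    export const handler = async (): Promise<void> => {
        // Placeholders: supply your own source bucket, keys and destination
        const sourceBucket = 'source-bucket';
        const keys = ['file1.bin', 'file2.bin'];

        const streamPassThrough = new Stream.PassThrough();
        const s3Upload = s3.upload({
            Bucket: 'destination-bucket',
            Key: 'archive.zip',
            Body: streamPassThrough,
            ContentType: 'application/zip',
        });

        const archive = Archiver('zip');
        archive.on('error', (error: Archiver.ArchiverError) => { throw new Error(`${error.code} ${error.message}`); });
        archive.pipe(streamPassThrough);

        // Streams are opened lazily, only when the archiver actually reads them
        for (const key of keys) {
            const stream = new lazystream.Readable(() =>
                s3.getObject({ Bucket: sourceBucket, Key: key }).createReadStream());
            archive.append(stream, { name: key });
        }

        await archive.finalize();
        await s3Upload.promise();
    };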

Latest comments (36)

Sunandini

Thanks for the post

How can I zip a folder which contains folders and files on s3?

kang

Hello,

I recently came across your blog post on using lambda to zip S3 files, and I wanted to thank you for sharing such a helpful resource! While testing out the example code, I noticed a few typos, so I took the liberty of fixing them and adapting the code to my needs. I'm happy to report that the lambda function works perfectly now and has saved me a lot of time and effort.

If anyone is interested, I've created a GitHub repository with my updated code that you can check out here: github.com/yufeikang/serverless-zi.... I hope this will be helpful to others who may be looking for a more reliable solution.

Thank you again for your excellent work!

Luke Cartwright

Is there a limit on the number of files that can be zipped?

RatkoD

Can I get some help here? Since I'm not a programmer myself, do I just need to add all of this in a .js file and upload it to Lambda, or is there something more?

Damiano Bertuna • Edited

Hi everybody,

we tried the solution suggested here but we are facing the following problem.
Suppose we want to zip these files:

{
"files": [
    {
      "fileName": "File1_1GB.bin",
      "key": "File1_1GB.bin"
    },
    {
      "fileName": "File2_1GB.bin",
      "key": "File2_1GB.bin"
    },
    {
      "fileName": "File3_1GB.bin",
      "key": "File3_1GB.bin"
    },
    {
      "fileName": "File4_1GB.bin",
      "key": "File4_1GB.bin"
    },
    {
      "fileName": "File5_1GB.bin",
      "key": "File5_1GB.bin"
    }
],
  "bucketRegion": "REGION_NAME",
  "originBucketName": "BUCKET_NAME",
  "destBucketName": "DESTBUCKET",
  "zipName": "ZippedFiles.zip"
}

In the ZippedFiles.zip that is created we correctly have 5 files, but they are not the correct size, like:

  • File1 1GB;
  • File2 1GB;
  • File3 1GB;
  • File4 34KB;
  • File5 34KB;

Our configuration is a 15 minute timeout and 10 GB of memory.

What can be the problem?

Thanks in advance.

Regards.

dakebusi

Did you get an answer on how to fix your problem? I'm facing exactly the same issue.

Venkat Koushik Muthyapu

Can we use this if we have folders in the folder that we zip?

lj91421

This has been really useful and straightforward to get working, but I am having issues with unit testing.
Has anyone been able to write Jest unit tests for this? I am trying to use AWSMock for AWS and I am struggling to get a test working at the moment.

Kevin Kirchner

I was really hoping to find a PHP version of this solution! I'm happy it's possible in node at least!

Pedro Rosón Fdez

These handlers are wrong:

        s3Upload.on('close', resolve);
        s3Upload.on('end', resolve);
        s3Upload.on('error', reject);

They have to be on streamPassThrough:

        streamPassThrough.on('close', resolve);
        streamPassThrough.on('end', resolve);
        streamPassThrough.on('error', reject);
Victor S'mith

Hello, it worked for me:

s3Upload.on('close', resolv());
s3Upload.on('end', resolve());
s3Upload.on('error', reject());

Etienne Fontaine

Hi! Thank you for your article.
Do you have a benchmark for this? Like how long does it take to zip 100 files of 1 MB or 50 files of 2 MB, for example?

Thank you