DEV Community

Zip files on S3 with AWS Lambda and Node

Gordon Johnston on September 11, 2019

This post was updated 20 Sept 2022 to improve reliability with large numbers of files. Update the stream handling so streams are only opened to ...
 
Samet Gunaydin • Edited

Thanks for the post!

Here are the typos.

Now for our array of keys/keys, we can iterate ofter/after it to create the S3StreamDetails objects

Now we can connect the archiver to pipe date/data to the upload stream and append all the download streams to it

With s3StreamUpload variable, you mean s3Upload?

arautela

Hi Samet,

I am getting an error which says "The request signature we calculated does not match the signature you provided. Check your key and signing method." when I execute await s3Upload.promise() in the end. Any help will be highly appreciated.

My code is below

var aws = require("aws-sdk");
const s3 = new aws.S3();
aws.config.update({
    accessKeyId: 'my-access-key',
    secretAccessKey: 'my-secret'
});

const _archiver = require('archiver');
var stream = require('stream');

const bucketName = 'myBucket';
const zipFileName = 'zipper.zip';

const streamPassThrough = new stream.PassThrough();
var params = {
    ACL: 'private',
    Body: streamPassThrough,
    Bucket: bucketName,
    ContentType: 'application/zip',
    Key: zipFileName
};

// This returns us a stream. Consider it as a real pipe sending fluid to the S3 bucket. Don't forget it.
const s3Upload = s3.upload(params,
    (err, resp) => {
        if (err) {
            // Backticks are needed here so the ${...} placeholders are interpolated
            console.error(`Got error creating stream to s3 ${err.name} ${err.message} ${err.stack}`);
            throw err;
        }
        console.log(resp);
    });

exports.handler = async (_req, _ctx, _cb) => {
    var _keys = ['PDF/00CRO030.pdf', 'PDF/MM07200231.pdf'];

    var _s3DownloadStreams = await Promise.all(_keys.map(_key => new Promise((_resolve, _reject) => {
        s3.getObject({ Bucket: bucketName, Key: _key }).promise()
            .then(_data => _resolve(
                { data: _data.Body, name: `${_key.split('/').pop()}` })
            );
    }
    ))).catch(_err => { throw new Error(_err) });

    await new Promise((_resolve, _reject) => {
        // var _myStream = s3Upload(bucketName, zipFileName);       // Now we instantiate that pipe...
        var _archive = _archiver('zip');
        _archive.on('error', err => { throw new Error(err); });

        // Your promise gets resolved when the fluid stops running, so that's when you get to close and resolve
        s3Upload.on('close', _resolve());
        s3Upload.on('end', _resolve());
        s3Upload.on('error', _reject());

        _archive.pipe(streamPassThrough);   // Pass that pipe to _archive so it can push the fluid straight down to the S3 bucket
        _s3DownloadStreams.forEach(_itm => _archive.append(_itm.data, { name: _itm.name }));   // And then we start adding files to it
        _archive.finalize();    // Tell it that's all we want to add. Then when it finishes, the promise will resolve in one of those events up there
    }).catch(_err => {
        throw new Error(_err)
    });
    console.log(params);
    await s3Upload.promise().catch(_err => {
        console.log('Error Is : ' + _err)
    });
    //_cb(null, {});        // Handle response back to server

};

Samet Gunaydin

Did you double check your accessKeyId and secretAccessKey? There might be a space before or after your keys (access key or secret key).

arautela

Hi Samet, yes I checked and the credentials are correct. However, I am able to download the bytes array from Keys array using s3.getObject.

Is there something which I am missing?

Samet Gunaydin

Could you compare your implementation with this?

github.com/rokumatsumoto/aws-node-...

Also please share your implementation with gist link (gist.github.com/)

Pedro Rosón Fdez

The difference is this one I posted here dev.to/prosonf/comment/18d6d, the handlers on close, end and error.

byvalue-yogev • Edited

I've just finished implementing with the help of Samet <3, thank you both!

I had to change:

s3Upload.on('close', resolve);
s3Upload.on('end', resolve);
s3Upload.on('error', reject);

to:

s3Upload.on('close', resolve());
s3Upload.on('end', resolve());
s3Upload.on('error', reject());

Otherwise the promise doesn't resolve and the code after it,

await s3Upload.promise();

doesn't execute.
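
A note on why the `resolve()` variant appears to work: writing `resolve()` calls the function right away, at registration time, and hands its return value (`undefined`) to `.on(...)`. The surrounding promise therefore settles before any data has moved, and it is the later `await s3Upload.promise()` that does the real waiting. This seems consistent with the TypeScript error quoted later in this thread, which lists only `"httpUploadProgress"` as an accepted event name. The sketch below illustrates the difference with a hypothetical stand-in object (`makeFakeUpload` is not part of the AWS SDK); it only assumes that `.on` stores whatever it is given.

```javascript
// Hypothetical stand-in for an object with an .on() method; NOT the AWS SDK.
function makeFakeUpload() {
  const listeners = {};
  return {
    listeners,
    on(event, listener) {
      listeners[event] = listener; // stores whatever it is given, even undefined
      return this;
    },
    emit(event) {
      if (typeof listeners[event] === 'function') listeners[event]();
    },
  };
}

// Registers `done` either by reference or by calling it, and reports what happened.
function register(style) {
  const upload = makeFakeUpload();
  let calls = 0;
  const done = () => { calls += 1; };

  if (style === 'reference') {
    upload.on('close', done);   // done runs later, when 'close' is emitted
  } else {
    upload.on('close', done()); // done runs NOW; the stored listener is undefined
  }

  const callsBeforeEvent = calls;
  upload.emit('close');
  return {
    callsBeforeEvent,
    callsAfterEvent: calls,
    stored: typeof upload.listeners.close,
  };
}

console.log(register('reference'));
console.log(register('call'));
```

With the reference form, the promise only settles if the object really emits that event, so which object you listen on matters as much as how the handler is passed.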

Benedikt Schmeitz

s3StreamUpload should be replaced by streamPassThrough.

Gordon Johnston

Many thanks. Post updated!

Paweł Kowalski

Great post. I was just trying to do the same thing a couple of days back (at the moment I'm keeping things in memory before uploading) and failed. I don't need more than ~500MB, but I believe streams are more efficient anyway. Better safe than sorry.

BTW. When you operate on a lot of files, using keepAlive can help a lot - theburningmonk.com/2019/02/lambda-...

Also, importing only s3 client from SDK and bundling lambda with webpack makes its cold start much faster - theburningmonk.com/2019/03/just-ho...

Paweł Kowalski

Ah, I just noticed that I have the opposite case. I have a zip file that I want to extract, which seems to be a little bit different: you have to stream the zip file, but to upload a file to S3 you need a key, and that's a problem ;-)

bonwon

That case is detailed at medium.com/@johnpaulhayes/how-extr....

Paweł Kowalski • Edited

Read the zip file from S3 using the Boto3 S3 resource Object into a BytesIO buffer object

Yeah, buffer.

That's what I've got. I wanted streams so I could support big files, not just files that fit into memory.

This guy is calling 500MB huge because that's the max temp size on Lambda (which would be OK, but realistically, saving extracted files to tmp just to upload them to S3 is kind of wasteful and nobody should do that anyway). Well, for me that's not huge at all; I was aiming at a couple of GBs for good measure.

Also, when he writes "This method does not use up disk space and therefore is not limited by size." he is wrong. The size is limited, and the limit would be far less than 3GB (the max possible memory on Lambda), depending on file types (binary/text) and their number.

Dimas Crocco

Hello, thanks for sharing the solution. For those who want to understand what is going on under the hood or if you are facing issues (Timeout errors, memory issues, etc.) please, take some time to READ THIS excellent issue on github.
There you can find a client/server example to reproduce all this operation and also different directions to take down this kind of problem (zip generation/download on s3).

Pedro Rosón Fdez

These handlers are wrong:

        s3Upload.on('close', resolve);
        s3Upload.on('end', resolve);
        s3Upload.on('error', reject);

They have to be over streamPassThrough:

        streamPassThrough.on('close', resolve);
        streamPassThrough.on('end', resolve);
        streamPassThrough.on('error', reject);
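
As a rough sketch of waiting on the stream itself, Node's built-in `stream.finished` can wrap the end/close/error bookkeeping in a single promise. `waitForStream` is a hypothetical helper name, and the `PassThrough` below is fed by hand rather than by archiver and read by a plain `data` listener rather than by S3, so this only shows the shape of the idea.

```javascript
const { PassThrough, finished } = require('stream');

// Hypothetical helper: resolves once the stream has fully ended, rejects on error.
function waitForStream(stream) {
  return new Promise((resolve, reject) => {
    // Handlers are passed by reference inside the callback, not invoked here.
    finished(stream, (err) => (err ? reject(err) : resolve()));
  });
}

// Stand-in for the zip pipeline: something writes in, something reads out.
function demo() {
  const passThrough = new PassThrough();
  const chunks = [];
  passThrough.on('data', (chunk) => chunks.push(chunk)); // plays the role of the uploader
  const done = waitForStream(passThrough).then(() => Buffer.concat(chunks).toString());
  passThrough.write('hello ');
  passThrough.end('world'); // plays the role of archive.finalize()
  return done;
}

demo().then((text) => console.log(text));
```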
Victor S'mith

Hello, it worked for me:

s3Upload.on('close', resolve());
s3Upload.on('end', resolve());
s3Upload.on('error', reject());

Rakesh P • Edited

Hi All,

I know people need to work it out on their own, but I am hoping this will save some time for others. Here's plain Node.js code in JavaScript. I have tested this with around 1GB so far for my requirement. It worked like a charm. :-)

// create readstreams for all the output files and store them
lReqFiles.forEach(function(tFileKey){
    s3FileReadStreams.push({
        "stream": UTIL_S3.S3.getObject({
            Bucket: CFG.aws.s3.bucket,
            Key: tFileKey
        }).createReadStream(),
        "filename": tFileKey
    });
});
//
const Stream = STREAM.Stream;
const streamPassThrough = new Stream.PassThrough();
// Create a zip archive using streamPassThrough style for the linking request in s3bucket
outputFile = `${CFG.aws.s3.outputDir}/archives/${lReqId}.zip`;
const params = {
    ACL: 'private',
    Body: streamPassThrough,
    Bucket: CFG.aws.s3.bucket,
    ContentType: 'application/zip',
    Key: outputFile
};
const s3Upload = UTIL_S3.S3.upload(params, (err, resp) => {
    if (err) {
        console.error(`Got error creating stream to s3 ${err.name} ${err.message} ${err.stack}`);
        throw err;
    }
    console.log(resp);
}).on('httpUploadProgress', (progress) => {
    console.log(progress); // { loaded: 4915, total: 192915, part: 1, key: 'foo.jpg' }
});
// create the archiver
const archive = Archiver('zip');
archive.on('error', (error) => {
    throw new Error(`${error.name} ${error.code} ${error.message} ${error.path} ${error.stack}`);
});
// connect the archiver to upload streamPassThrough and pipe all the download streams to it
await new Promise((resolve, reject) => {
    console.log("Starting upload of the output Files Zip Archive");
    //
    s3Upload.on('close', resolve());
    s3Upload.on('end', resolve());
    s3Upload.on('error', reject());
    //
    archive.pipe(streamPassThrough);
    s3FileReadStreams.forEach((s3FileDwnldStream) => {
        archive.append(s3FileDwnldStream.stream, { name: s3FileDwnldStream.filename })
    });
    archive.finalize();
    //
}).catch((error) => {
    throw new Error(`${error.code} ${error.message} ${error.data}`);
});
//
// Finally wait for the uploader to finish
await s3Upload.promise();
//

Rakesh P

Adding this helped for a stable connection as @pawal-kowalski suggested

// set keepAlive to true on the SSL agent in the AWS SDK config for stable results
const AWS = require('aws-sdk');
const https = require('https');
const sslAgent = new https.Agent({
    keepAlive: true, // the option name is lowercase; `KeepAlive` is ignored by Node
    rejectUnauthorized: true
});
sslAgent.setMaxListeners(0);
AWS.config.update({
    httpOptions: {
        agent: sslAgent,
    }
});
//

Yasen Velichkov • Edited

Hi Gordon,

do you have your code somewhere? Neither your steps, nor the code from stackoverflow.com/questions/386335... works for me and I am afraid it's beyond my knowledge to make it work in a Lambda.

Thanks!

Samet Gunaydin

Hi Yasen,

github.com/rokumatsumoto/aws-node-...

I can help, if you have any questions.

Bimal Kumar

This doesn't work if I have more than 2K small files(each file is size <100KB) on S3. I tried increasing lambda RAM to maximum, but upload doesn't start and lambda times out after 15 minutes. I'm guessing it's due to so many readable streams getting created or some issue with archiver package. Can you suggest something for this case?
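
One thing that may be worth trying with thousands of keys, offered as a sketch and not as a confirmed fix, is to avoid opening every download stream up front and instead open each one only after the previous one has been fully consumed. The helper below shows that pattern with plain Node streams; `processSequentially`, `openStream`, and `consume` are hypothetical names, and the in-memory `Readable.from` sources stand in for S3 downloads.

```javascript
const { Readable } = require('stream');

// Opens one stream at a time: the next source is not created until the
// previous one has been fully consumed.
async function processSequentially(keys, openStream, consume) {
  for (const key of keys) {
    const source = openStream(key); // created only when its turn comes
    await consume(key, source);     // wait before moving on to the next key
  }
}

// Demo with in-memory streams standing in for S3 downloads.
async function demo() {
  let openCount = 0;
  let maxOpen = 0;
  const sizes = {};

  const openStream = (key) => {
    openCount += 1;
    maxOpen = Math.max(maxOpen, openCount);
    return Readable.from([Buffer.from(`contents of ${key}`)]);
  };

  const consume = async (key, source) => {
    let bytes = 0;
    for await (const chunk of source) bytes += chunk.length;
    sizes[key] = bytes;
    openCount -= 1; // this stream is finished
  };

  await processSequentially(['a.txt', 'b.txt', 'c.txt'], openStream, consume);
  return { maxOpen, sizes };
}

demo().then((result) => console.log(result));
```

In the zip case, `consume` would hand the stream to the archiver and wait until that entry has been written. Archiver's documentation describes an `entry` event that could serve for this, though that wiring is not shown or tested here.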

Dimas Crocco

please check this issue to understand what is going on and how to solve it

williamlecable

Hi everyone,
I have some problems with Lambda. First, compilation is blocked by:
s3Upload.on('close', resolve());
s3Upload.on('end', resolve());
s3Upload.on('error', reject());

'"close"' is not assignable to parameter of type '"httpUploadProgress"'
'"end"' is not assignable to parameter of type '"httpUploadProgress"'
'"error"' is not assignable to parameter of type '"httpUploadProgress"'

And when I try to call the Lambda, CloudWatch shows this error:
{
    "errorType": "Runtime.UnhandledPromiseRejection",
    "errorMessage": "TypeError: archiver_1.Archiver is not a function",
    "reason": {
        "errorType": "TypeError",
        "errorMessage": "archiver_1.Archiver is not a function",
        "stack": [
            "TypeError: archiver_1.Archiver is not a function",
            " at Function.generatePackage (/var/task/dist/src/service/package.service.js:77:36)",
            " at /var/task/dist/src/service/package.service.js:40:34",
            " at Array.forEach (<anonymous>)",
            " at /var/task/dist/src/service/package.service.js:39:17",
            " at processTicksAndRejections (internal/process/task_queues.js:94:5)"
        ]
    }
}

Do you have any advice on how to resolve these two errors?
Thanks a lot.

Aroh Sunday

Import archiver this way:

import Archiver from "archiver"

With that, you will not get "Archiver is not a function" again.

Damiano Bertuna • Edited

Hi everybody,

we tried with the solution suggested here but we are facing the following problem.
Suppose we want to zip these files:

{
  "files": [
    {
      "fileName": "File1_1GB.bin",
      "key": "File1_1GB.bin"
    },
    {
      "fileName": "File2_1GB.bin",
      "key": "File2_1GB.bin"
    },
    {
      "fileName": "File3_1GB.bin",
      "key": "File3_1GB.bin"
    },
    {
      "fileName": "File4_1GB.bin",
      "key": "File4_1GB.bin"
    },
    {
      "fileName": "File5_1GB.bin",
      "key": "File5_1GB.bin"
    }
  ],
  "bucketRegion": "REGION_NAME",
  "originBucketName": "BUCKET_NAME",
  "destBucketName": "DESTBUCKET",
  "zipName": "ZippedFiles.zip"
}

The resulting ZippedFiles.zip correctly contains 5 files, but they are not all the correct size:

  • File1 1GB;
  • File2 1GB;
  • File3 1GB;
  • File4 34KB;
  • File5 34KB;

Our Lambda is configured with a 15-minute timeout and 10GB of memory.

What could the problem be?

Thanks in advance.

Regards.

dakebusi

Did you get an answer on how to fix your problem? I'm facing exactly the same issue.

lj91421

This has been really useful and straightforward to get working, but I am having issues with unit testing.
Has anyone been able to write Jest unit tests for this? I am trying to use AWSMock for AWS and I am struggling to get a test working at the moment.

Kevin Kirchner

I was really hoping to find a PHP version of this solution! I'm happy it's possible in node at least!

kang

Hello,

I recently came across your blog post on using lambda to zip S3 files, and I wanted to thank you for sharing such a helpful resource! While testing out the example code, I noticed a few typos, so I took the liberty of fixing them and adapting the code to my needs. I'm happy to report that the lambda function works perfectly now and has saved me a lot of time and effort.

If anyone is interested, I've created a GitHub repository with my updated code that you can check out here: github.com/yufeikang/serverless-zi.... I hope this will be helpful to others who may be looking for a more reliable solution.

Thank you again for your excellent work!

Etienne Fontaine

Hi! Thank you for your article.
Do you have a benchmark for this? For example, how long does it take to zip 100 files of 1 MB or 50 files of 2 MB?

Thank you

jsheflin

How do I use this for an entire directory (I know it's not really a directory) with hundreds of files?
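
Since an S3 "directory" is just a shared key prefix, one approach is to list every key under the prefix and feed that list to the zipping code. Listing is paginated, so the sketch below loops until no continuation token comes back. `listPage` is a hypothetical function you would supply yourself (for example a thin wrapper around the SDK's list-objects call); the `keys` and `nextToken` field names are assumptions of this sketch, not SDK names.

```javascript
// Collects all keys under a prefix by following continuation tokens.
// `listPage(prefix, token)` is a caller-supplied (hypothetical) function that
// resolves to { keys: string[], nextToken: string | undefined }.
async function listAllKeys(prefix, listPage) {
  const keys = [];
  let token;
  do {
    const page = await listPage(prefix, token);
    keys.push(...page.keys);
    token = page.nextToken;
  } while (token);
  return keys;
}

// In-memory stand-in for a paginated bucket listing, two keys per page.
function makeFakeLister(allKeys, pageSize = 2) {
  return async (prefix, token) => {
    const matching = allKeys.filter((k) => k.startsWith(prefix));
    const start = token ? Number(token) : 0;
    const end = start + pageSize;
    return {
      keys: matching.slice(start, end),
      nextToken: end < matching.length ? String(end) : undefined,
    };
  };
}

const fakeLister = makeFakeLister(['pdf/a.pdf', 'pdf/b.pdf', 'pdf/sub/c.pdf', 'img/x.png']);
listAllKeys('pdf/', fakeLister).then((keys) => console.log(keys));
```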

Sunandini

Thanks for the post

How can I zip a folder which contains folders and files on s3?
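
For nested folders, a zip archive generally records hierarchy through slashes in the entry names, so one option is to use each key's path relative to the chosen prefix as the entry name. The helper below is a sketch of that mapping; `toZipEntries` is a hypothetical name, and skipping keys that end in `/` assumes those are empty folder placeholder objects.

```javascript
// Maps S3 keys under `prefix` to zip entry names that preserve sub-folders.
function toZipEntries(prefix, keys) {
  const base = prefix.endsWith('/') ? prefix : `${prefix}/`;
  return keys
    .filter((key) => key.startsWith(base)) // only keys inside the folder
    .filter((key) => !key.endsWith('/'))   // assumed folder placeholders
    .map((key) => ({ key, name: key.slice(base.length) }));
}

const entries = toZipEntries('reports', [
  'reports/',
  'reports/summary.pdf',
  'reports/2022/q1.pdf',
  'other/readme.txt',
]);
console.log(entries);
```

Each `name` could then be passed as the `name` option to `archive.append(stream, { name })`, the same call the code earlier in this thread uses for file names.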

Luke Cartwright

Is there a limit on the number of files that can be zipped?

RatkoD

Can I get some help here? Since I'm not a programmer myself, do I just need to add all of this to a .js file and upload it to Lambda, or is there something more?

Aroh Sunday • Edited

I tried the zipping operation with the archiver module, following the way you did it, but I always get a memory overuse error.

Venkat Koushik Muthyapu

Can we use this if we have folders in the folder that we zip?