DEV Community

Dina

AWS S3 Iterator in TypeScript

Amazon Simple Storage Service (Amazon S3) is a very popular object storage service that offers industry-leading scalability, data availability, security, and performance. It lets you store a virtually unlimited number of files (objects). AWS provides a JavaScript SDK for interacting with the S3 APIs. You can use aws-sdk to do a variety of things, such as listing the objects in an S3 bucket, uploading an object, or downloading one.

While working with it recently, I figured it would be handy to use the Iterator pattern to fetch files from S3 in a loop using for await...of, so I came up with the basic implementation below. Please share your feedback.

    import * as AWS from 'aws-sdk';
    import {GetObjectRequest, ListObjectsV2Request} from 'aws-sdk/clients/s3';

    // basic aws configuration stuff
    AWS.config.update({
      region: 'ap-southeast-2',
    });

    // using this interface to store all the keys
    interface S3Key {
      key: string;
      etag: string;
    }

    class S3FileIterator {
      private s3 = new AWS.S3({apiVersion: '2006-03-01'});
      maxKeys = 1000; // Sets the maximum number of keys returned in the response. By default the API returns up to 1,000 key names.
      bucketName: string;
      prefix: string;
      keysList: S3Key[] = []; // state to store list of keys that came back for the object key prefix

      constructor(bucketName: string, prefix: string) {
        this.bucketName = bucketName;
        this.prefix = prefix;
      }

      /**
       * This loads all the keys matching the prefix
       * Should be called after object is created and before iterating
       */
      async loadKeys() {
        let listObjectParams: ListObjectsV2Request = {
          Bucket: this.bucketName,
          MaxKeys: this.maxKeys,
          Prefix: this.prefix,
        };
        let isTruncated: boolean | undefined = true;
        let continuationToken: string | undefined = '';
        while (isTruncated) {
          if (continuationToken) {
            listObjectParams = {
              ...listObjectParams,
              ContinuationToken: continuationToken,
            };
          }
          const data = await this.s3.listObjectsV2(listObjectParams).promise();
          if (data.Contents) {
            data.Contents.forEach(item => {
              if (item.Key && item.ETag) {
                this.keysList.push({etag: item.ETag, key: item.Key});
              }
            });
          }
          // update the paging state even when Contents is missing,
          // otherwise a response without Contents would loop forever
          isTruncated = data.IsTruncated;
          continuationToken = data.NextContinuationToken;
        }
        return this.keysList;
      }

      /**
       * The Symbol.asyncIterator well-known symbol specifies the default AsyncIterator
       * for an object. If this property is set on an object, it is
       * an async iterable and can be used in a for await...of loop.
       */
      [Symbol.asyncIterator]() {
        let index = 0;
        return {
          next: async () => {
            if (index < this.keysList.length) {
              const getObjectParams: GetObjectRequest = {
                Bucket: this.bucketName,
                Key: this.keysList[index].key,
              };
              const object = await this.s3.getObject(getObjectParams).promise();
              index += 1;
              return {
                value: object,
                done: false,
              };
            }
            // no keys left: signal the end of the iteration
            return {
              value: undefined,
              done: true,
            };
          },
        };
      }
    }

    // instantiate the object
    const myS3Files = new S3FileIterator('bucket-name', 'filePrefix');

    async function main() {
      // load all keys
      // TODO: load only as required (anyone got good ideas?)
      await myS3Files.loadKeys();
      // best part: iterate through the filtered array of s3 objects
      for await (const s3File of myS3Files) {
        console.log(s3File?.ETag);
      }
    }

    // surface listing or download errors instead of leaving the promise unhandled
    main().catch(err => console.error(err));
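
The `[Symbol.asyncIterator]()` method in the gist builds `next()` and its `{value, done}` results by hand. The same protocol can probably be expressed more compactly with an async generator method. This is only a sketch: `KeyedFileIterator`, `fetchObject` and `collect` are hypothetical names, and `fetchObject` stands in for `s3.getObject(...).promise()` so the example runs without AWS access.

```typescript
// Hypothetical class for illustration; not part of aws-sdk or the gist.
// `fetchObject` stands in for s3.getObject(...).promise().
class KeyedFileIterator<T> {
  constructor(
    private readonly keys: readonly string[],
    private readonly fetchObject: (key: string) => Promise<T>,
  ) {}

  // An async generator method returns an object that already has next(),
  // so the class is async iterable without hand-written {value, done} results.
  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    for (const key of this.keys) {
      // Still one download at a time, exactly like the hand-written version.
      yield await this.fetchObject(key);
    }
  }
}

// Usage: iterate with for await...of and collect what comes back.
async function collect(): Promise<string[]> {
  const files = new KeyedFileIterator(
    ['a.txt', 'b.txt'],
    async key => `body of ${key}`, // fake download
  );
  const bodies: string[] = [];
  for await (const body of files) {
    bodies.push(body);
  }
  return bodies;
}
```

Behaviour should match the manual iterator: each `for await` step triggers one download and waits for it before moving on.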

The downside of this pattern, or at least of the code above, is that only one file is processed at a time: each iteration awaits the download of a file and its processing before moving on to the next. We could use Promise.all with regular loops instead. Do you think that's doable in this pattern?
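
One possible answer to the Promise.all question is to keep the async iterator but let it download a small batch of files concurrently and then hand the results out one by one. The sketch below is an assumption, not a tested S3 implementation: `mapInBatches`, `fakeDownload` and `demo` are hypothetical names, and `fakeDownload` stands in for `s3.getObject(...).promise()`.

```typescript
// Hypothetical helper; not part of aws-sdk or the gist.
// Runs `worker` on up to `batchSize` items at once and yields results in input order.
async function* mapInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): AsyncGenerator<R> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Start every download in the batch at once, then wait for all of them.
    const results = await Promise.all(batch.map(item => worker(item)));
    for (const result of results) {
      yield result;
    }
  }
}

// Counters used only to observe how many fake downloads overlap.
let inFlight = 0;
let maxInFlight = 0;

// Stand-in for s3.getObject({Bucket, Key}).promise().
async function fakeDownload(key: string): Promise<string> {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await Promise.resolve(); // pretend to wait for the network
  inFlight -= 1;
  return `contents of ${key}`;
}

// Usage: the consumer still writes a plain for await...of loop.
async function demo(): Promise<string[]> {
  const bodies: string[] = [];
  for await (const body of mapInBatches(['a', 'b', 'c', 'd', 'e'], 2, fakeDownload)) {
    bodies.push(body);
  }
  return bodies;
}
```

The trade-off in this design is that a whole batch has to finish before its first result is yielded, and a single failed download rejects the batch because that is how Promise.all behaves. Promise.allSettled might be a better fit if partial failures should be tolerated.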

One improvement I definitely want to see is loading the next set of keys only when we are running out of entries in keysList, instead of loading them all up front (as the loadKeys() function currently does). Can someone fork and build that out? 😉 ciao!!
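
A rough idea for that improvement: an async generator can request the next page of keys only after the current page has been fully consumed. In the sketch below `KeyPage`, `FetchPage`, `lazyKeys` and `fakeFetchPage` are hypothetical names. `fakeFetchPage` stands in for a `listObjectsV2` call, with `nextToken` playing the role of `NextContinuationToken`.

```typescript
// Hypothetical shape of one page of listing results.
interface KeyPage {
  keys: string[];
  nextToken?: string; // absent on the last page
}

// Hypothetical page loader. With the SDK this could wrap, as an untested assumption:
//   s3.listObjectsV2({Bucket, Prefix, ContinuationToken: token}).promise()
type FetchPage = (token?: string) => Promise<KeyPage>;

// Yields keys one by one and fetches the next page only when the current one runs out.
async function* lazyKeys(fetchPage: FetchPage): AsyncGenerator<string> {
  let token: string | undefined;
  do {
    const page = await fetchPage(token);
    for (const key of page.keys) {
      yield key;
    }
    token = page.nextToken;
  } while (token);
}

// Fake three-page listing so the sketch runs without AWS access.
const fakePages: Record<string, KeyPage> = {
  start: {keys: ['a.txt', 'b.txt'], nextToken: 'page-2'},
  'page-2': {keys: ['c.txt'], nextToken: 'page-3'},
  'page-3': {keys: ['d.txt']},
};

let fetchCount = 0; // how many pages have been requested so far

const fakeFetchPage: FetchPage = async token => {
  fetchCount += 1;
  return fakePages[token ?? 'start'];
};

// Usage: stop after the first key; only the first page is ever requested.
async function firstKey(): Promise<string | undefined> {
  for await (const key of lazyKeys(fakeFetchPage)) {
    return key;
  }
  return undefined;
}
```

Because the generator is paused between keys, breaking out of the loop early means the remaining pages are never listed. Downloading each object could then happen inside the consumer's loop or in a second generator layered on top of this one.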
