
Chathra Serasinghe


S3 Batch Operations: Copy a large number of objects from one bucket to another

What can you do with S3 Batch operations?

You can perform a single selected operation (copy, replace tags, ...) on a large number of objects in an S3 bucket with a single request.

How do S3 Batch Operations work?

1.) Choose the objects you want to operate on
You can provide the list using:
- An S3 Inventory report
- A CSV manifest
- An S3 Replication configuration (S3 can generate a manifest from it)
(Note: in this case the only available operation is Replicate)
2.) Select an operation
- Copy
- Invoke Lambda function
- Replace all object tags
- Delete all object tags
- Replace access control list (ACL)
- Restore archived objects
- Enable Object Lock
- Enable legal hold (same protection as a retention period, but it has no expiration date)
- Replicate (this option is only available when you choose objects using a manifest based on an S3 Replication configuration)

3.) Run the job, view progress, and get reports

Once you run the job, you can track its progress, and the completion report is written to the S3 bucket you specified for reporting. You can ask S3 Batch Operations to report on all objects or only on failed objects (my preference in most cases).
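The three steps above map fairly directly onto the parameters of the S3 Control `CreateJob` API. The following is a minimal sketch, not a definitive implementation: it builds the request for a Copy job with a CSV manifest and a report covering failed tasks only. The function names, the `batch-reports` prefix, and the priority value are assumptions made for illustration; the ARNs and the manifest ETag must come from your own account.

```python
import uuid


def build_copy_job_request(account_id, role_arn, manifest_arn, manifest_etag,
                           dest_bucket_arn, report_bucket_arn):
    """Build keyword arguments for s3control.create_job (Copy operation)."""
    return {
        "AccountId": account_id,
        "ConfirmationRequired": False,  # run without a manual confirmation step
        "Priority": 10,                 # arbitrary value chosen for this sketch
        "RoleArn": role_arn,            # role assumed by S3 Batch Operations
        "ClientRequestToken": str(uuid.uuid4()),  # idempotency token
        # Step 1: which objects (CSV manifest with Bucket,Key columns)
        "Manifest": {
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        # Step 2: which operation (Copy into the destination bucket)
        "Operation": {"S3PutObjectCopy": {"TargetResource": dest_bucket_arn}},
        # Step 3: where the completion report goes and what it covers
        "Report": {
            "Enabled": True,
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Prefix": "batch-reports",  # hypothetical prefix
            "ReportScope": "FailedTasksOnly",
        },
    }


def submit_job(request):
    """Submit the job and return its ID (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without AWS access

    return boto3.client("s3control").create_job(**request)["JobId"]
```

Progress can then be polled with the `describe_job` call of the same client, using the returned job ID.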

What options can you use to copy objects from one bucket to another?

  • Amazon S3 batch operations
  • AWS SDK
  • cross-Region replication or same-Region replication
  • S3DistCp with Amazon EMR
  • AWS DataSync
  • Run parallel uploads using the AWS Command Line Interface (AWS CLI)

Too many options. Which one to choose?

AWS CLI ---> Not efficient when transferring a large number of objects.

According to AWS,

custom application using an AWS SDK might be more efficient
at performing a transfer at the scale of hundreds of millions
of objects than AWS CLI

  • S3DistCp with Amazon EMR ---> Too expensive

  • AWS DataSync ---> Still expensive, and if you have files with special characters you may run into issues. (I faced one: invalid UTF-8 characters in a file name.)
    Refer to the link below:
    https://forums.aws.amazon.com/thread.jspa?threadID=337210

  • Cross-Region replication or same-Region replication --->
    It replicates new objects and changes to existing objects. If you need to transfer existing objects, it's not ideal.

  • AWS SDK ---> Extremely powerful when transferring large objects, but you may need to design the application to scale.

  • S3 Batch Operations ---> Extremely powerful for performing a task on many objects with a single request. However, if you use the Copy operation of S3 Batch Operations alone, there are some limitations, such as object size: each object to be copied can be at most 5 GB.
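The 5 GB ceiling comes from the single-request copy; an SDK gets past it by copying the object in parts (multipart copy), each part naming a byte range of the source. The arithmetic can be sketched as follows. This is an illustration only: the helper names and the 1 GiB default part size are assumptions, and in practice Boto3's managed `copy` does this splitting for you.

```python
GIB = 1024 ** 3
MAX_SINGLE_COPY = 5 * GIB  # largest object a single copy request can handle


def needs_multipart_copy(object_size):
    """Return True when the object is too large for a single copy request."""
    return object_size > MAX_SINGLE_COPY


def copy_part_ranges(object_size, part_size=GIB):
    """Return the inclusive byte ranges a multipart copy would request."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    ranges = []
    start = 0
    while start < object_size:
        # The range end is inclusive, hence the -1
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges
```

Each returned string has the `bytes=first-last` form that a part-copy request expects as its source range.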

How about combining S3 Batch Operations + SDK (Invoke Lambda)?

Yes. This is a great choice (at least to me). Lambda gives you the flexibility to handle things your own way (customizations) and lets you use the powerful SDK, which can copy files larger than 5 GB. S3 Batch Operations then adds the convenience of running it across all your objects.

In simple terms, you create an S3 Batch Operations job that uses the Invoke Lambda function operation.
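A minimal sketch of such a Lambda function is shown below; it is not the code from my repository. It assumes the version 1.0 invocation schema (each task carries `s3BucketArn` and a URL-encoded `s3Key`), a hypothetical `DESTINATION_BUCKET` environment variable, and it keeps the same key in the destination. Object versions are ignored for brevity.

```python
import os
from urllib.parse import unquote


def process_batch_event(event, copy_fn, dest_bucket):
    """Copy every task's object with copy_fn and build the Batch Operations response."""
    results = []
    for task in event["tasks"]:
        # "arn:aws:s3:::my-bucket" -> "my-bucket"
        src_bucket = task["s3BucketArn"].split(":::")[-1]
        # Keys arrive URL-encoded, matching the manifest
        key = unquote(task["s3Key"])
        try:
            copy_fn({"Bucket": src_bucket, "Key": key}, dest_bucket, key)
            code, message = "Succeeded", f"Copied {key}"
        except Exception as exc:  # report the failure instead of crashing the batch
            code, message = "PermanentFailure", str(exc)
        results.append(
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        )
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }


def lambda_handler(event, context):
    import boto3  # imported here so process_batch_event can be tested offline

    s3 = boto3.client("s3")
    # s3.copy is Boto3's managed copy; it switches to multipart copy for large objects
    return process_batch_event(event, s3.copy, os.environ["DESTINATION_BUCKET"])
```

Keeping the copy call injectable (`copy_fn`) is a design choice made for this sketch so the response handling can be exercised without AWS access.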

What do you need to create?
1) A role for Lambda
2) A role for S3 batch operations
3) A Lambda function (using Python Boto3)
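The two roles differ mainly in who is allowed to assume them. As a rough sketch, the trust policies can be generated like this; the helper name is an assumption, and the permission policies (read access to the source and manifest, write access to the destination and report buckets, permission to invoke the function) still have to be attached separately.

```python
import json


def trust_policy(service_principal):
    """Return an IAM trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Role for the Lambda function
LAMBDA_TRUST = trust_policy("lambda.amazonaws.com")
# Role for S3 Batch Operations
BATCH_OPS_TRUST = trust_policy("batchoperations.s3.amazonaws.com")

if __name__ == "__main__":
    print(json.dumps(BATCH_OPS_TRUST, indent=2))
```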

Pre-requisites:

  • Source bucket
  • Destination bucket
  • Manifest file and a bucket to place it in. If you are using a CSV file, make sure of the following:
    • Note:- Object keys must be URL-encoded
  • Bucket(s) for the manifest and the S3 Batch Operations completion reports
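To illustrate the URL-encoding requirement, here is a small sketch that turns a list of object keys into a two-column `Bucket,Key` CSV manifest. The function name is hypothetical and this is not the generator from my repository; it assumes keys are percent-encoded with `/` left as-is, so a key such as `my file.txt` becomes `my%20file.txt`.

```python
import csv
import io
from urllib.parse import quote


def build_manifest_csv(bucket, keys):
    """Return CSV manifest text with one 'bucket,url-encoded-key' row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        # Percent-encode the key; keep "/" so prefixes stay readable
        writer.writerow([bucket, quote(key, safe="/")])
    return buffer.getvalue()
```

The resulting text can be uploaded to the manifest bucket as a `.csv` object and referenced when creating the job.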

You can use my solution to generate CSV manifest files and create S3 Batch Operations jobs.

Please clone my Git repository and refer to the README.MD file for detailed steps: code

Note:- This code was motivated by the blog below as well as my own experience with S3 batch operations.

References:
https://aws.amazon.com/blogs/storage/copying-objects-greater-than-5-gb-with-amazon-s3-batch-operations/
