1. Introduction
AWS S3 (Simple Storage Service) is an object storage service. It allows users to store and retrieve data from anywhere on the web. S3 is designed for businesses of all sizes and can store a virtually unlimited number of objects, including photos, videos, log files, backups, and other types of data.
S3 multipart upload is a feature that allows you to upload large objects in parts (i.e., chunks) instead of uploading the entire object in a single HTTP request. It is particularly useful when uploading very large files. With multipart upload, you can upload individual parts of the object in parallel, which can significantly speed up the overall upload process.
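Under the hood, a multipart upload is a three-step protocol: you initiate the upload, upload the individual parts (each at least 5 MB, except the last), and complete the upload so S3 assembles them into one object. The TransferManager we use later in this post drives these steps for us, but the low-level API is worth seeing once; here is a minimal sketch (the bucket, key, and file path are hypothetical):
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

public class LowLevelMultipartSketch {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucket = "my-bucket", key = "big-file.bin"; // hypothetical names
        File file = new File("/tmp/big-file.bin");         // hypothetical path
        long partSize = 5 * 1024 * 1024;                   // 5 MB, the S3 minimum part size

        // Step 1: initiate the multipart upload and receive an upload ID
        InitiateMultipartUploadResult init =
                s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));

        // Step 2: upload the parts one by one, collecting the returned ETags
        List<PartETag> partETags = new ArrayList<>();
        long position = 0;
        for (int partNumber = 1; position < file.length(); partNumber++) {
            long size = Math.min(partSize, file.length() - position);
            UploadPartRequest request = new UploadPartRequest()
                    .withBucketName(bucket).withKey(key)
                    .withUploadId(init.getUploadId())
                    .withPartNumber(partNumber)
                    .withFileOffset(position)
                    .withFile(file)
                    .withPartSize(size);
            partETags.add(s3.uploadPart(request).getPartETag());
            position += size;
        }

        // Step 3: complete the upload so S3 stitches the parts together
        s3.completeMultipartUpload(
                new CompleteMultipartUploadRequest(bucket, key, init.getUploadId(), partETags));
    }
}
This sketch uploads the parts sequentially, but each part is independent, so they can be uploaded in parallel; that is exactly where the speedup we measure later comes from.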
2. Implementation
2.1. Provide access and create S3 bucket
First, we need to give the IDE where we are implementing this example access to AWS. If you haven't set this up yet, you can refer to section 2.1 of my previous post.
Next, we create an S3 bucket to store some files. I will name my bucket multipart-uploading0924; you will need to choose a different name for yours, because S3 bucket names are globally unique.
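If you would rather create the bucket from code than from the AWS console, the SDK can do that too; a minimal sketch (remember to substitute your own globally unique name):
// Uses your default AWS credentials and region (see section 2.3 for a custom client)
AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
// Bucket names are globally unique, so check before creating
String bucketName = "multipart-uploading0924"; // substitute your own name
if (!s3.doesBucketExistV2(bucketName)) {
    s3.createBucket(bucketName);
}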
2.2. Set up Spring Boot project
In this demo, we will create an API that can upload files to an S3 bucket. You can use the Spring Starter Project wizard in Eclipse or Spring Initializr to create a Spring Boot project. After that, we need to add this dependency to the POM file.
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.12.424</version>
</dependency>
2.3. Create API endpoints to upload files
I want to show you the performance gain from using the S3 multipart upload feature, so we will create two functions: one for normal uploading and another for multipart uploading.
First, we create an S3 client that will interact with the bucket we created. In the code below, you must change the profile name to the one you set up and the region to wherever you placed your bucket. In my case, longngo0924 is the profile I use to create the S3 client, and my bucket is in the ap-southeast-2 region (Sydney).
private AmazonS3 getS3ClientInstance() {
    // Reuse the client if it has already been created
    if (s3client != null)
        return s3client;
    // Build a new client from the named credentials profile, pointed at the bucket's region
    return AmazonS3ClientBuilder.standard().withCredentials(new ProfileCredentialsProvider("longngo0924"))
            .withRegion(Regions.AP_SOUTHEAST_2).build();
}
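Note that the service snippets in this post reference a few fields that are not shown (s3client, bucketName, and log). Assuming Lombok's @Slf4j for the logger and a bucket name injected from configuration (the property key below is hypothetical), the enclosing service class might look like this:
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import com.amazonaws.services.s3.AmazonS3;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Service
public class UploadFileService {

    // Cached S3 client, created lazily by getS3ClientInstance()
    private AmazonS3 s3client;

    // Bucket name injected from configuration; the property key is hypothetical
    @Value("${s3.bucket-name}")
    private String bucketName;

    // ... getS3ClientInstance(), uploadFileV1(), uploadFileV2() as shown ...
}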
We create a function in the service layer named uploadFileV1. This function handles the normal upload type, and we write it like this:
public Map<String, String> uploadFileV1(MultipartFile multipartFile) throws IllegalStateException, IOException {
    Map<String, String> map = new HashMap<>();
    s3client = getS3ClientInstance();
    // Write the incoming multipart payload to a local file
    File file = convertToFile(multipartFile);
    PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, file.getName(), file);
    // Time the single-request upload
    long start = System.currentTimeMillis();
    PutObjectResult result = s3client.putObject(putObjectRequest);
    long end = System.currentTimeMillis();
    log.info("Complete Normal Uploading {}s", (end - start) / 1000);
    if (result != null) {
        map.put("fileSize", String.valueOf(multipartFile.getSize() / 1000000) + "MB");
        map.put("time", String.valueOf((end - start) / 1000) + "s");
    } else {
        map.put("message", "Upload Failed");
    }
    return map;
}
In the above function, we receive a multipart file from the controller layer. Then we convert it to a File and send a put-object request to S3.
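The convertToFile helper is not shown in the post; a minimal sketch of what it might look like, writing the uploaded bytes to a temporary file, is:
// Hypothetical helper: writes the incoming multipart payload to a temp file on disk
private File convertToFile(MultipartFile multipartFile) throws IllegalStateException, IOException {
    File file = File.createTempFile("upload-", multipartFile.getOriginalFilename());
    multipartFile.transferTo(file);
    return file;
}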
The second function is named uploadFileV2. It is similar to the first; the difference is that the file is uploaded in parallel by multiple threads, each thread uploading one part of the file. The implementation looks like this:
public Map<String, String> uploadFileV2(MultipartFile multipartFile)
        throws IOException, AmazonServiceException, AmazonClientException, InterruptedException {
    Map<String, String> map = new HashMap<>();
    s3client = getS3ClientInstance();
    File file = convertToFile(multipartFile);
    // Files larger than the 50 MB threshold are uploaded as multipart uploads
    TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3client)
            .withMultipartUploadThreshold((long) (50 * 1024 * 1024)).build();
    long start = System.currentTimeMillis();
    // upload() is non-blocking; waitForCompletion() blocks until the transfer finishes
    Upload result = tm.upload(bucketName, file.getName(), file);
    result.waitForCompletion();
    long end = System.currentTimeMillis();
    log.info("Complete Multipart Uploading {}s", (end - start) / 1000);
    map.put("fileSize", String.valueOf(multipartFile.getSize() / 1000000) + "MB");
    map.put("time", String.valueOf((end - start) / 1000) + "s");
    return map;
}
With this configuration, any file larger than the 50 MB threshold is split into parts and uploaded in parallel; the SDK chooses the part size unless you configure it yourself. One point to note here: the upload method of TransferManager is non-blocking and returns immediately, so we call the waitForCompletion method when we need to wait for the result.
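If you also want to control the part size rather than leaving it to the SDK's default, TransferManagerBuilder exposes withMinimumUploadPartSize; for example (the 10 MB value is just an illustration):
// Sketch: configure both the multipart threshold and the minimum part size
TransferManager tm = TransferManagerBuilder.standard()
        .withS3Client(getS3ClientInstance())
        .withMultipartUploadThreshold((long) (50 * 1024 * 1024)) // switch to multipart above 50 MB
        .withMinimumUploadPartSize((long) (10 * 1024 * 1024))    // upload in ~10 MB parts
        .build();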
Finally, we add some endpoints in the controller layer to receive the upload requests.
@PostMapping("/v1/uploading")
public Map<String, String> uploadFileV1(@RequestParam MultipartFile file)
throws IllegalStateException, IOException {
return uploadFileService.uploadFileV1(file);
}
@PostMapping("/v2/uploading")
public Map<String, String> uploadFileV2(@RequestParam MultipartFile file)
throws IllegalStateException, IOException,
AmazonServiceException, AmazonClientException, InterruptedException {
return uploadFileService.uploadFileV2(file);
}
2.4. Upload file to S3 bucket
For testing purposes, we use Postman to exercise the implemented APIs. First, we call the endpoint for the normal upload function and upload a 200 MB file.
Next, we test the endpoint for multipart upload.
We can see that with the same 200 MB file, the multipart upload function performs better than the normal upload function; in this example it is almost twice as fast. And here are the files we uploaded through the APIs.
3. Summary
Using the S3 multipart upload feature, the upload process performs better, and this benefit is significant for applications that need to upload large files regularly. Additionally, if any part of the upload fails, you only need to re-upload that specific part rather than the entire object, which can save time and bandwidth.
The implementation of all these examples can be found in my GitHub.
Happy Coding :)
Top comments (1)
Hey, nicely put together. The same code when deployed into some staging environment, does it still upload the file in 47s and 28s? For me, Spring takes a significant amount of time to load the file into memory before it reaches the controller. Have you experienced the same? Please let me know.