dev_chiku

Posted on Nov 24, 2023

20 times faster data submission to DynamoDB by parallel processing with Rust.

#rust #dynamodb #parallel #programming

Introduction

Rust is claimed to be a fast language, and we compared the speed of data registration to DynamoDB between serial processing and multiple parallel algorithm processing using Rust.

DynamoDB

DynamoDB is officially claimed as follows.

Amazon DynamoDB has single-digit millisecond response times and can consistently deliver this performance for the most demanding application. To illustrate, on 2022 Amazon Prime Day, Amazon DynamoDB reliably handled 105.2 million requests/second across trillions of API calls with single-digit millisecond performance.

The above alone may seem like a blast, but there are the following limitations for both reading and writing
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ServiceQuotas.html#limits-api

There is a limit of 25 requests at a time for BatchWriteItem data submission and 1MB per call for Query data retrieval. This makes it extremely fast for small amounts of data, but it is not suitable for OLTP systems that handle large amounts of data.

When we selected the DynamoDB technology in the past, we wondered how fast it would be if we parallelized it. I had some doubts about the speed of DynamoDB when I was selecting the technology, so I decided to try it out this time because I had some time to spare.

Assumptions

This time, 25 data will be submitted by BatchWriteItem.
The data structure is minimal, with only the id column (primary key) and the sork key column.
The mode is pay-as-you-go, and result consistency (transactions) is not considered.

There are no shared variable references from each parallel process.
(No variable references with exclusive control locks)

Please note that performance verification is performed on processes that may not meet the requirements of actual operations, such as not considering transactions and not having exclusive control.

Parallel Processing

For parallel processing, the following processes are tested.

Channels
- Create channels based on the number of items and send a message for each chunk (25 items each). Receivers receive messages in aggregate and in parallel; the CSP (Communicating Sequential Processes) model corresponds to this.
Fork-Join
- A task is split (forked) into smaller subtasks and processed in parallel. This is used in Java's java.util.concurrent package.
Parallel Loops
- Executes each iteration of a loop in parallel, supported by parallel processing libraries such as OpenMP.
MapReduce
- The Reduce step integrates the results generated by the Map step to produce the final result; widely used in big data processing, such as Hadoop; supported by parallel processing libraries, such as OpenMP.
Worker/Master Pattern
- A master process manages tasks and a worker process executes these tasks.

Source.

https://github.com/chikugoy/dynamodb_client_rs/

Result

The following results were obtained for the data submissions. The average time (ms) for each of the 10 attempts to submit 1000 data items.

Processing	Execution Time (ms)
Serial processing	619
Channel parallel processing	71
Fork join parallel processing	38
Parallel loop processing	146
Worker/Master Pattern	46

If it is the fastest Fork join, it is about 20 times faster than serial processing. If it takes 38 ms for 1,000 records, it takes 370 ms for 10,000 records, and 38 seconds for 1 million records, which is much faster than RDB!

However, when we tried to submit 10,000 cases of data, we found that about 20% of the data was dropped.

It is possible to retrieve the dropped code and reexecute it as shown in the code below, but if there are unprocessed_items, it will be about 10 times slower than the serial one because it references the data to AWS again.

https://github.com/chikugoy/dynamodb_client_rs/blob/5cda81ed5af4c70ade090706124514313157547b/src/module/aggregate_result.rs#L36-L51

Summary

As a result, it seems that it is safer to use the series method when submitting a large amount of data in DynamoDB. However, it may be possible to control the number of parallel processes to be executed at the same time, or to adjust the number of data to be submitted in one parallel process, etc., and it may be possible to use parallel processing.

The parallel processing by Rust has achieved a speedup of up to 20 times faster than serial processing, which is a dream come true. However, since this was verified in a windless environment with no other processes using resources such as CPUs, it may be necessary to verify the performance by executing parallel processes in even greater parallelism in order to assume production operation. I think it will be necessary to verify the performance of the system in parallel.

Also, since this verification was not done with other languages under the same conditions, it cannot be said that parallel processing is faster because of Rust. However, I think that Rust is a language suitable for parallel processing because it is easy to safely perform exclusion control, as claimed in official documents and books.

DEV Community