
Lorenzo Hidalgo Gadea for AWS Community Builders

Originally published at lhidalgo.dev

Mastering DynamoDB: Batch Operations Explained

TL;DR: This article covers the usage of the DynamoDB BatchWrite and BatchGet operations, and how implementing them can help you improve efficiency by reducing the number of requests your workload needs.

Introduction

Have you ever developed any type of workload that interacts with DynamoDB?

If so, you have probably encountered the requirement of retrieving or inserting multiple specific records, whether from a single DynamoDB table or from several.

This article aims to help with exactly that by providing the resources and knowledge required to implement DynamoDB batch operations and, as a bonus, increase the efficiency of your current workloads.

What are Batch Operations?

Introduction

When talking about batch operations or batch processing we refer to the action of aggregating a set of instructions in a single request for them to be executed all at once. In terms of interacting with DynamoDB, we could see it as sending a single request that would allow us to retrieve or insert multiple records at once.

Common Bad Practices

Continuing with the sample situation mentioned in the introduction, you may face the requirement of having to retrieve or store multiple records at once.


For that scenario, a junior developer might rely on looping over a set of keys and sending the GetItem requests in sequence, while a mid-level developer might propose parallelizing those requests with, for example, a Promise.all. Both approaches are flawed and won't scale well.
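To make both approaches concrete, here is a minimal sketch of each, assuming the AWS SDK for JavaScript v3 DocumentClient (the table and key names are made up for illustration):

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Anti-pattern 1: one GetItem request per key, awaited inside a loop.
async function getItemsSequentially(keys) {
  const items = [];
  for (const key of keys) {
    // Each iteration has to wait for the previous request to finish.
    const { Item } = await client.send(
      new GetCommand({ TableName: "products", Key: key })
    );
    items.push(Item);
  }
  return items;
}

// Anti-pattern 2: the same requests fired in parallel with Promise.all.
async function getItemsInParallel(keys) {
  const responses = await Promise.all(
    keys.map((key) =>
      client.send(new GetCommand({ TableName: "products", Key: key }))
    )
  );
  return responses.map(({ Item }) => Item);
}
```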

On one side, the for-loop will even be flagged by some linters (with rules like no-await-in-loop), since awaiting each request in sequence makes the execution time grow linearly with the number of items.

On the other side, the Promise.all approach is a tad more efficient thanks to the parallel requests, but under heavy workloads developers will end up facing issues such as "maximum connection limit reached" errors.

Recommended Implementation

Now that we have gone over some common bad practices, and you have probably thought of a few projects that could be improved, we'll dive into how to get the most out of batch operations.

DynamoDB offers two batch operations, BatchGetItem and BatchWriteItem, which we will take a look at in this article.

There is also BatchExecuteStatement for those using PartiQL, but we will leave that one for a future article to cover PartiQL in detail.

BatchGetItem

This operation type will allow us to aggregate up to the equivalent of 100 GetItem requests in a single request.

For example, a single `BatchGetCommand` can fetch items from two different tables.
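A minimal sketch of such a request, assuming the AWS SDK for JavaScript v3 DocumentClient (table names and keys are illustrative):

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, BatchGetCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// One request, keys for two different tables.
const { Responses, UnprocessedKeys } = await client.send(
  new BatchGetCommand({
    RequestItems: {
      products: { Keys: [{ productId: "P-1" }, { productId: "P-2" }] },
      stock: { Keys: [{ productId: "P-1" }, { productId: "P-2" }] },
    },
  })
);

// Responses is keyed by table name: Responses.products, Responses.stock.
console.log(Responses, UnprocessedKeys);
```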

This means that with a single request we could retrieve up to 100 records or 16 MB of data, from one table or from several at once.

BatchWriteItem

💡PutRequests will overwrite any existing records with the provided keys.

This operation, even though only "Write" appears in its name, allows us to aggregate up to 25 PutItem and DeleteItem operations in a single request.

For example, a single `BatchWriteCommand` can include request items for two different tables.
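A minimal sketch of such a request, again assuming the DocumentClient (table, key, and attribute names are illustrative):

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, BatchWriteCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Puts and deletes for two different tables, aggregated in one request.
const { UnprocessedItems } = await client.send(
  new BatchWriteCommand({
    RequestItems: {
      products: [
        { PutRequest: { Item: { productId: "P-1", name: "Blue widget" } } },
        { DeleteRequest: { Key: { productId: "P-2" } } },
      ],
      stock: [{ PutRequest: { Item: { productId: "P-1", quantity: 42 } } }],
    },
  })
);

// Anything DynamoDB could not process is returned here for us to retry.
console.log(UnprocessedItems);
```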

Similar to the previous operation, we'll still be limited by the 16 MB maximum, but we would theoretically be able to replace 25 sequential or parallel requests with a single one.

Pagination for Batch operations

Pagination only applies to the 16 MB limit; if a request doesn't respect the 100-record read or 25-record write limit, DynamoDB will throw a ValidationException instead.

Similar to the Scan and Query operations, any of the above Batch*Item operations can run into the scenario where the 16 MB maximum is reached and some form of pagination is required.

For example, an asynchronous `executeRequest` function can check the response for `UnprocessedItems` and, if there are any, recursively call itself with a new batch write command, logging any errors to the console.
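A minimal sketch of that idea, shown here with the DocumentClient's `BatchWriteCommand` (a production version would also cap the number of retries and add backoff, omitted here for brevity):

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, BatchWriteCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Sends a batch write and recursively retries whatever comes back unprocessed.
// `payload` has the usual shape: { RequestItems: { tableName: [...] } }.
async function executeRequest(payload) {
  try {
    const { UnprocessedItems = {} } = await client.send(
      new BatchWriteCommand(payload)
    );

    if (Object.keys(UnprocessedItems).length > 0) {
      await executeRequest({ RequestItems: UnprocessedItems });
    }
  } catch (error) {
    console.error("Batch write failed", error);
    throw error;
  }
}
```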

For Batch* operations, this pagination comes in the form of the UnprocessedKeys (for reads) or UnprocessedItems (for writes) attribute that can be part of the response.

Developers are expected to check for this attribute in the response and, if desired, implement a recursive function to process the remaining items automatically.

Full examples for Retrieving, Inserting, and Deleting records using BatchOperations with a recursive implementation to automatically handle the UnprocessedKeys can be found here.

Real-world Use Cases

Now that we are aware of the options and limitations for processing records in batches in DynamoDB, let's look at some scenarios that showcase real-life improvements.

Scenario 1: Retrieving Data from Multi-table Design Architecture

For this first scenario, let's imagine we are looking to improve the performance of a REST API that, given an array of productIds, returns the list of requested product details with their respective stock and exact warehouse location. The data is stored in multiple tables, one for each data model (products, stock tracking, and warehouse product location).

Before


The initial implementation relied on a for-loop that goes over all the provided productIds and sequentially retrieves the required data from the different tables.
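A minimal sketch of what that initial implementation might look like (table and attribute names are illustrative):

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Original approach: three sequential GetItem calls per product ID.
async function getProductDetails(productIds) {
  const products = [];

  for (const productId of productIds) {
    const { Item: product } = await client.send(
      new GetCommand({ TableName: "products", Key: { productId } })
    );
    const { Item: stock } = await client.send(
      new GetCommand({ TableName: "stock", Key: { productId } })
    );
    const { Item: location } = await client.send(
      new GetCommand({ TableName: "warehouse-locations", Key: { productId } })
    );

    products.push({ ...product, stock, location });
  }

  return products;
}
```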

After

From that initial implementation, you should be able to detect two distinct flaws:

  • no-await-in-loop - There is a loop with asynchronous operations inside, which is usually a bad practice, as all the requests for a given product need to complete before the next product can be processed.

  • Sequential await getItem requests - This is also a bad practice, as the three operations are independent from each other and we'd ideally not want them to block each other.

A better approach would look something like this:

The improved handler takes four steps: 1) it checks whether `idList` contains more than 33 items and throws an error if so, 2) it builds a payload with `buildPayload(idList)`, 3) it awaits a recursive batch get with `recursiveBatchGet(payload)`, and 4) it maps the responses to products with `mapResponse(batchGetResponses)` and returns them. A code sketch follows the step-by-step breakdown below.

  1. Input Validation - Set a maximum number of items that can be requested at once, to avoid needing multiple parallel BatchGetItem requests.

    For example: with a maximum of 100 keys per BatchGetItem request, and each product requiring 3 GetItem requests, a single BatchGetItem request can retrieve up to 33 product details.

  2. Build Payloads - a helper function will be needed to programmatically build the required payload for the BatchGetItem operations taking into consideration the different tables that need to be accessed for each product ID.

  3. Recursive BatchGetItem - a helper function that recursively calls itself to ensure that all UnprocessedKeys are retried.

  4. Response parsing - a helper function that transforms the BatchGetItem response into the schema that the consumers of this API expect.
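Putting those four steps together, a minimal sketch could look like this (the helper names follow the steps above; the function name, table names, key schema, and response mapping are illustrative):

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, BatchGetCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Illustrative table names, one per data model.
const TABLES = ["products", "stock", "warehouse-locations"];

// 2. Build a single BatchGetItem payload covering every table for every ID.
function buildPayload(idList) {
  const RequestItems = {};
  for (const table of TABLES) {
    RequestItems[table] = { Keys: idList.map((productId) => ({ productId })) };
  }
  return { RequestItems };
}

// 3. Execute the request and recursively retry any UnprocessedKeys.
async function recursiveBatchGet(payload, accumulated = {}) {
  const { Responses = {}, UnprocessedKeys = {} } = await client.send(
    new BatchGetCommand(payload)
  );

  for (const [table, items] of Object.entries(Responses)) {
    accumulated[table] = [...(accumulated[table] ?? []), ...items];
  }

  if (Object.keys(UnprocessedKeys).length > 0) {
    return recursiveBatchGet({ RequestItems: UnprocessedKeys }, accumulated);
  }
  return accumulated;
}

// 4. Map the raw responses to the schema the API consumers expect.
function mapResponse(responses, idList) {
  const byId = (items, productId) =>
    (items ?? []).find((item) => item.productId === productId);

  return idList.map((productId) => ({
    ...byId(responses["products"], productId),
    stock: byId(responses["stock"], productId),
    location: byId(responses["warehouse-locations"], productId),
  }));
}

export async function getProductDetails(idList) {
  // 1. Input validation: 100 keys per request / 3 tables = 33 products max.
  if (idList.length > 33) {
    throw new Error("A maximum of 33 product IDs can be requested at once");
  }
  const payload = buildPayload(idList);
  const batchGetResponses = await recursiveBatchGet(payload);
  return mapResponse(batchGetResponses, idList);
}
```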

Applying all these changes should significantly increase the efficiency and performance of the API.

Scenario 2: Inserting Data in a Single-table Design Architecture

The second scenario involves a DynamoDB single-table design architecture, where one table stores all the information needed by a dashboard that analyzes racehorses' historical data. Records such as basic horse information, performance statistics, and race results are stored in the same table.

Before


Similar to the first scenario, we can see that the initial implementation is based on a set of sequential PutItem requests.
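A minimal sketch of what that initial implementation might look like (the single-table name and key schema are illustrative):

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Original approach: one PutItem request per record, awaited in sequence.
async function storeHorseData(horse, stats, races) {
  await client.send(
    new PutCommand({
      TableName: "racehorses",
      Item: { pk: `HORSE#${horse.id}`, sk: "DETAILS", ...horse },
    })
  );

  await client.send(
    new PutCommand({
      TableName: "racehorses",
      Item: { pk: `HORSE#${horse.id}`, sk: "STATS", ...stats },
    })
  );

  // Race results stored one by one inside a loop.
  for (const race of races) {
    await client.send(
      new PutCommand({
        TableName: "racehorses",
        Item: { pk: `HORSE#${horse.id}`, sk: `RACE#${race.id}`, ...race },
      })
    );
  }
}
```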

After

From that initial implementation, you should be able to detect two distinct flaws:

  • no-await-in-loop - There is a loop with asynchronous operations inside, which is usually a bad practice, as each request needs to complete before the next one can start.

  • Sequential await putItem requests - This is also a bad practice, as the three operations are independent from each other and we'd ideally not want them to block each other.

A better approach would look something like this:

The improved version takes two steps: 1) it builds a payload with `buildPayload(horse, stats, races)`, and 2) it performs a recursive batch write with `recursiveBatchWrite(payload)`. A code sketch follows the steps below.

  1. Build Payloads - a helper function will be needed to programmatically build the required payload for the BatchWriteItem operation, taking into consideration the different record types (horse details, statistics, and race results) that need to be stored.

  2. Recursive BatchWriteItem - a helper function that recursively calls itself to ensure that all UnprocessedItems are retried.
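A minimal sketch of those two steps (single-table name and key schema are illustrative; note that a single BatchWriteItem request still caps out at 25 items, so larger payloads would need chunking):

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, BatchWriteCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

const TABLE_NAME = "racehorses"; // illustrative single-table name

// 1. Build a single BatchWriteItem payload containing every record type.
//    Remember: one BatchWriteItem request accepts at most 25 items.
function buildPayload(horse, stats, races) {
  const putRequests = [
    { PutRequest: { Item: { pk: `HORSE#${horse.id}`, sk: "DETAILS", ...horse } } },
    { PutRequest: { Item: { pk: `HORSE#${horse.id}`, sk: "STATS", ...stats } } },
    ...races.map((race) => ({
      PutRequest: {
        Item: { pk: `HORSE#${horse.id}`, sk: `RACE#${race.id}`, ...race },
      },
    })),
  ];
  return { RequestItems: { [TABLE_NAME]: putRequests } };
}

// 2. Write everything, recursively retrying any UnprocessedItems.
async function recursiveBatchWrite(payload) {
  const { UnprocessedItems = {} } = await client.send(
    new BatchWriteCommand(payload)
  );
  if (Object.keys(UnprocessedItems).length > 0) {
    await recursiveBatchWrite({ RequestItems: UnprocessedItems });
  }
}

export async function storeHorseData(horse, stats, races) {
  const payload = buildPayload(horse, stats, races);
  await recursiveBatchWrite(payload);
}
```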

Applying all these changes should significantly reduce the required time to upload all information.

Conclusion

Utilizing batch operations in DynamoDB is a powerful strategy to optimize your database interactions. By aggregating multiple requests into a single operation, you can improve performance, reduce latency, and manage resources more effectively. Whether you're dealing with multi-table architectures or single-table designs, batch operations offer a scalable solution to handle large volumes of data efficiently. As you continue to work with DynamoDB, consider integrating batch operations into your workflows to maximize the potential of your applications.

Recap of key points

  • BatchGetItem can retrieve up to 100 records or 16 MB of data in a single request.

  • BatchWriteItem can be used to insert or delete up to 25 records or 16 MB of data in a single request.

  • Using Batch* operations can help you reduce the execution time considerably by aggregating requests that would otherwise be made in sequence.

