This is my second blog in the Master Dynamo Series. In my previous blog, we explored what DynamoDB is and how it can be effectively leveraged for building scalable, high-performance applications. In this blog, we will dive deeper into the scalability and performance aspects by utilizing batch operations.
Imagine wrestling with DynamoDB intricacies while navigating the world of AWS Lambda functions. You're trying to handle a massive load of data, but there's a catch – you've got to cram a ton of read and write operations into a tight 15-minute window. It's like tiptoeing on the edge, risking mission failure if you hit that 15-minute limit before your 100k+ records are processed.
We've all been there in the chaos of serverless DynamoDB struggles. But fear not! Today, I'm unveiling the secret sauce that not only solves this puzzle but also cuts down your Lambda costs. Say hello to DynamoDB Batch Operations – the superhero that makes data handling in serverless setups not just possible but surprisingly easy and fun.
Join me on a journey to explore the magic of DynamoDB Batch Operations, where efficiency meets simplicity, turning your serverless data tasks into a breeze.
Table of Contents
- Introduction
- BatchGet Operation
- BatchWrite Operation
- Concurrent Calls and Parallelization
- Gotchas
- Best Practices
- Save Costs
- Conclusion
- References
Introduction
DynamoDB, a cornerstone of AWS's vast array of services, is a highly scalable and fully managed NoSQL database service. It is designed to deliver fast and predictable performance, making it an ideal choice for applications that require consistent, single-digit millisecond latency at any scale. Its flexible data model and reliable performance make it a popular choice for mobile, web, gaming, ad tech, IoT, and many other applications that need to operate at large scale.
DynamoDB plays a vital role in our infrastructure, and optimizing its operations is essential for maintaining cost-effectiveness and performance. Individual read and write operations can become bottlenecks, increasing Lambda runtime and cost, and sometimes even hitting Lambda's 15-minute hard execution timeout. Batch operations offer a solution by consolidating multiple requests into a single call, reducing the overhead associated with individual operations.
BatchGet Operation
BatchGet is a DynamoDB operation that allows fetching multiple items from one or more tables using a single call. This operation is particularly beneficial when there is a need to retrieve several items in parallel, minimizing the time it takes to complete multiple GetItem requests.
Benefits of BatchGet:
- Reduced request/response overhead
- Faster data retrieval for multiple items
- Improved parallelization
Example: Retrieving multiple items with BatchGet in JavaScript
// Importing required modules from AWS SDK for JavaScript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, BatchGetCommand } from '@aws-sdk/lib-dynamodb';
// Creating a DynamoDB Document client (a wrapper that accepts plain JavaScript objects) for interacting with the service.
// The region is specified where your DynamoDB resources reside.
const dynamoDbClient = DynamoDBDocumentClient.from(new DynamoDBClient({ region: 'your_region' }));
// Name of the DynamoDB table you want to interact with.
const tableName = 'your_table_name';
// Array of keys which you want to retrieve from the table.
// Each key is an object that represents the Primary Key (Partition Key i.e., PK and Sort Key i.e., SK) of the item in your table.
const keysToGet = [
  { PK: 'value1', SK: 'Abc' },
  { PK: 'value2', SK: 'Bcd' },
  { PK: 'value3', SK: 'Def' },
];
// Creating a new command representing a BatchGet operation.
// It allows retrieval of multiple items from one or more tables.
const command = new BatchGetCommand({
RequestItems: {
[tableName]: {
Keys: keysToGet, // Providing keys of the items to be retrieved
},
},
});
// Sending the command to DynamoDB service using the client.
// Response will contain the information about the items successfully retrieved.
const response = await dynamoDbClient.send(command);
// Printing the retrieved items on the console.
// "Responses" attribute of the response contains the items retrieved, indexed by table name.
console.log(response.Responses[tableName]);
This JavaScript code utilizes the AWS SDK to interact with Amazon's DynamoDB service. It starts by creating a DynamoDB Document client instance and defines the table name from which data is to be fetched. It then outlines a set of primary keys (partition key and sort key) in the keysToGet array for the items to retrieve from the DynamoDB table. The BatchGetCommand is created with these specified keys and passed to the send method of the client. The resultant response contains the requested items from DynamoDB, which are logged to the console.
BatchWrite Operation
BatchWrite is a powerful operation designed for putting new items or deleting existing items in a single call. It is not intended for updating existing items. For updating existing items, we recommend using the UpdateItem operation.
Advantages of BatchWrite:
- Reduced request/response overhead.
- Faster data writing for multiple items.
- Efficient handling of large-scale data changes.
Limitation of BatchWrite for Updates:
- BatchWrite can't be used to update existing items directly.
Simple Hack: To compensate for this limitation, we can make concurrent UpdateItem calls instead. Although this might not be as performant as a true batch operation, it is still a much better alternative than the delay introduced by issuing update operations one at a time.
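Here's a minimal sketch of that idea, assuming the same document-client setup used throughout this post; the table name, key attributes, and the status field being updated are placeholders for your own schema.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand } from '@aws-sdk/lib-dynamodb';
const dynamoDbClient = DynamoDBDocumentClient.from(new DynamoDBClient({ region: 'your_region' }));
const tableName = 'your_table_name';
// Hypothetical items whose "status" attribute we want to update
const itemsToUpdate = [
  { PK: 'value1', SK: 'abc1', status: 'PROCESSED' },
  { PK: 'value2', SK: 'abc2', status: 'PROCESSED' },
];
// Fire one UpdateCommand per item and wait for all of them to complete.
// Each call is still a single-item update, but they run concurrently instead of sequentially.
await Promise.all(
  itemsToUpdate.map((item) =>
    dynamoDbClient.send(
      new UpdateCommand({
        TableName: tableName,
        Key: { PK: item.PK, SK: item.SK },
        UpdateExpression: 'SET #status = :status',
        ExpressionAttributeNames: { '#status': 'status' }, // "status" is a reserved word, hence the alias
        ExpressionAttributeValues: { ':status': item.status },
      })
    )
  )
);
Keep in mind that Promise.all fires every update at once; for large arrays you will want to cap the number of in-flight requests, which is exactly what the Bluebird-based approach later in this post does.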
Example: Writing multiple items with BatchWrite in JavaScript
// Importing required modules from AWS SDK for JavaScript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, BatchWriteCommand } from '@aws-sdk/lib-dynamodb';
// Creating a DynamoDB Document client instance with the desired region
const dynamoDbClient = DynamoDBDocumentClient.from(new DynamoDBClient({ region: 'your_region' }));
// Defining the name of the DynamoDB table you want to interact with
const tableName = 'your_table_name';
// Defining the array of items you want to write into the DynamoDB table
// Each item is an object, which includes primary keys (Partition Key i.e., PK and Sort Key i.e., SK) and other attributes
const itemsToWrite = [
{ PK: 'value1', SK: 'new_value1', attribute1: 'abc1'},
{ PK: 'value2', SK: 'new_value2', attribute1: 'abc2'},
{ PK: 'value3', SK: 'new_value3', attribute1: 'abc3'},
];
// Creating a BatchWriteCommand with the defined items
// The map function is used to format each item in the itemsToWrite array into an appropriate format for a PutRequest
const command = new BatchWriteCommand({
RequestItems: {
[tableName]: itemsToWrite.map((item) => ({
PutRequest: { Item: item },
})),
},
});
// Sending the command to the DynamoDB Client using the 'send' method and waiting for the response
const response = await dynamoDbClient.send(command);
// Outputting the result of the operation to the console
console.log(response);
This JavaScript code uses the AWS SDK to write data into Amazon's DynamoDB service. A DynamoDB Document client instance is created initially and the table name where the data should be written is defined. An array itemsToWrite is declared with several objects, each containing primary keys (PK and SK) and an additional attribute. A BatchWriteCommand is constructed with these items transformed into the format expected by a PutRequest. This command is then passed to the send method of the client. The response from this operation, which contains details of the write operation, is logged to the console.
Concurrent Calls and Parallelization
To further enhance performance, consider making multiple BatchGet or BatchWrite calls concurrently. This approach takes advantage of parallelization, allowing DynamoDB to process requests more efficiently. The bluebird npm package provides a utility method, map, for making concurrent calls with a configurable concurrency limit. We can use it to make concurrent BatchGet and BatchWrite requests, or we can implement similar concurrent execution logic ourselves in vanilla JavaScript. However, that's up to you to decide how you want to work it out.
Example: Implementing concurrent BatchGet calls in JavaScript using Bluebird.map
// Import required AWS SDK modules
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, BatchGetCommand } from '@aws-sdk/lib-dynamodb';
import BlueBird from "bluebird"; // import Bluebird for utilizing its concurrency feature
// Create a DynamoDB Document client instance for a particular region
const dynamoDbClient = DynamoDBDocumentClient.from(new DynamoDBClient({ region: 'your_region' }));
const tableName = 'your_table_name'; // The table to fetch data from
// An array containing sets of primary keys (partition key and sort key) for fetching items from DynamoDB. Each set holds up to 100 keys.
const keysSets = [
[
{ PK: 'value1', SK: 'abc1' },
{ PK: 'value2', SK: 'abc2' },
// ... add more items as needed up to 100 items per set
{ PK: 'value100', SK: 'abc100' }
],
[
{ PK: 'value101', SK: 'abc101' },
{ PK: 'value102', SK: 'abc102' },
// upto 100 items
{ PK: 'value200', SK: 'abc200' }
],
//... add more sets as needed
];
// Function to make BatchGet requests to DynamoDB with provided keys
const batchGet = async (keysToGet) => {
const command = new BatchGetCommand({
RequestItems: {
[tableName]: {
Keys: keysToGet,
},
},
});
// Send the BatchGet request and retrieve its response.
const response = await dynamoDbClient.send(command);
return response.Responses[tableName]; // Return fetched items
};
// This array stores final results of all batch get operations
const finalResult = []
// Loop through keysSet executing batchGet function concurrently for each set using Bluebird's map method
await BlueBird.map(
keysSets,
async (keysToGet) => {
const batchResult = await batchGet(keysToGet); // Perform BatchGet operation
finalResult.push(...batchResult); // Add fetched items to the final result
},
{ concurrency: 10 } // Limit concurrency to 10 - adjust as per your use-case
);
// Console log the final result after all batch get operations are done.
console.log(finalResult);
This JavaScript code uses the AWS SDK to read multiple items from a DynamoDB table concurrently. Initially, it imports the necessary modules and creates a DynamoDB Document client for a specific AWS region. It then specifies the table name and defines a nested array keysSets, which comprises primary keys (PK) and sort keys (SK). Following the AWS limitation, each sub-array holds up to 100 keys. A function batchGet is defined that uses BatchGetCommand to fetch data with the provided keys from the specified table. Then, using Bluebird's map method, the code performs concurrent batch get operations on the keysSets array, with a controlled concurrency level of 10 requests at a time. The retrieved results for each operation are stored in the finalResult array, which is finally logged to the console.
Gotchas
Until now, we've delved into the potential of batch operations and concurrent executions against DynamoDB. However, there are a few pitfalls we must be mindful of to ensure seamless operations; otherwise, we risk encountering issues.
Let's explore key considerations when implementing Batch Operations:
Maximum Payload Size: DynamoDB imposes limits on the size of the payload for batch operations. The total size of the request payload, including all items, attribute names, and attribute values, must be within the allowed limits. Ensure that your batch operations comply with the following constraints:
- BatchGet Limit: The total size of all items retrieved in a BatchGet operation must not exceed 16 MB. For example, if you ask to retrieve 100 items, but each individual item is 300 KB in size, the system returns 52 items (so as not to exceed the 16 MB limit). It also returns an appropriate UnprocessedKeys value so you can get the next page of results. If desired, your application can include its own logic to assemble the pages of results into one dataset.
- BatchWrite Limit: The total size of all items written in a BatchWrite operation must not exceed 16 MB. Individual items can be up to 400 KB once stored, but note that an item's representation might be greater than 400 KB while being sent in DynamoDB's JSON format for the API call.
Maximum Items in the Collection: In addition to payload size limitations, DynamoDB has constraints on the number of items processed in a single batch operation:
- BatchGet Limit: Each BatchGet request can retrieve up to 100 items. If you need to retrieve more items, you must make multiple requests.
- BatchWrite Limit: Each BatchWrite request can include up to 25 items. If you need to write more items, you must split the operation into multiple requests (see the chunking sketch below).
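A minimal chunking sketch, assuming the dynamoDbClient, tableName, and itemsToWrite variables from the BatchWrite example above; the helper simply slices the array into batches of at most 25 items (use 100 when preparing keys for BatchGet).
// Split an array into chunks of at most `size` elements
const chunk = (items, size) => {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
};
// Write the items one 25-item batch at a time
for (const batch of chunk(itemsToWrite, 25)) {
  await dynamoDbClient.send(
    new BatchWriteCommand({
      RequestItems: {
        [tableName]: batch.map((item) => ({ PutRequest: { Item: item } })),
      },
    })
  );
}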
Unprocessed Items: Batch operations may return unprocessed items, especially in scenarios where the payload size or item count exceeds DynamoDB limits. It's crucial to handle unprocessed items by implementing retries or adjusting batch sizes to ensure all items are processed successfully.
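Here's a hedged sketch of that retry pattern for BatchWrite, reusing the dynamoDbClient and BatchWriteCommand from earlier; the retry count and backoff delays are arbitrary starting points, and the same idea applies to the UnprocessedKeys returned by BatchGet.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
// Write a batch and keep retrying whatever DynamoDB reports back as unprocessed
const batchWriteWithRetry = async (requestItems, maxRetries = 5) => {
  let pending = requestItems;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await dynamoDbClient.send(new BatchWriteCommand({ RequestItems: pending }));
    // UnprocessedItems is empty when everything succeeded
    if (!response.UnprocessedItems || Object.keys(response.UnprocessedItems).length === 0) {
      return;
    }
    pending = response.UnprocessedItems;
    await sleep(100 * 2 ** attempt); // exponential backoff: 100ms, 200ms, 400ms, ...
  }
  throw new Error('Some items were still unprocessed after all retries');
};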
Atomicity for BatchWrite: BatchWrite is not atomic as a whole. Each individual PutRequest or DeleteRequest within the batch is atomic, but the batch itself is not: some items may succeed while others fail (for example, due to throttling), and the failed operations are returned in the UnprocessedItems field of the response instead of failing the entire request. Ensure proper error handling and retry mechanisms to address these partial failures.
Consistency: Batch operations do not provide the same consistency guarantees as strongly consistent single-item reads. By default, BatchGet performs eventually consistent reads, so retrieved items may not reflect the latest changes in the database. If you need strong consistency, you can set ConsistentRead to true for the tables in the request.
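For example, reusing tableName and keysToGet from the BatchGet example above, the per-table ConsistentRead flag opts those reads into strong consistency (at the cost of consuming twice the read capacity):
const command = new BatchGetCommand({
  RequestItems: {
    [tableName]: {
      Keys: keysToGet,
      ConsistentRead: true, // request strongly consistent reads for this table
    },
  },
});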
Conditional Writes Not Supported: Unlike individual PutItem and DeleteItem calls, BatchWrite does not allow condition expressions on the put and delete requests within the batch. If you rely on conditional writes, fall back to individual write operations (or TransactWriteItems) for those items.
Performance Considerations: While batch operations can improve performance, excessive use or misuse may lead to throttling or increased latency. Monitor DynamoDB performance metrics and adjust batch sizes accordingly to maintain optimal performance.
Partition Key Considerations: DynamoDB internally stores data across multiple partitions. If a single batch happens to use the same partition key for all of its records, it might lead to throttling, as each partition in DynamoDB has a maximum read and write capacity allocated to it. It is always suggested to choose a partition key pattern that distributes data across multiple partitions to take maximum benefit of the distributed system.
Best Practices
Having delved into the potential pitfalls, it is advisable to follow these best practices when implementing BatchGet and BatchWrite operations:
- Batch operations have limits on the number of items and payload size; be aware of and plan for these limitations. Find your sweet spot by analyzing your payload sizes and carefully crafting your primary key so that the load is distributed across multiple partitions, extracting the best performance out of the distributed system. Monitor and adjust batch sizes based on DynamoDB performance metrics and table characteristics.
- Handle errors and retries appropriately to ensure robustness. It is always recommended to perform bulk reads or writes in a loop with exponential backoff, so that exceptions are handled and unprocessed items are eventually processed.
Save Costs
You may be wondering, "How do we truly save costs on read and write operations when the overall cost remains constant compared to individual requests?" You're correct in pointing that out. However, have you considered the substantial reduction in Lambda execution time achieved through the implementation of batch operations?
Let me illustrate this with a practical example from my professional experience. In scenarios where reading 80k records would typically consume over 12 minutes of Lambda execution time, the introduction of batching brought about a remarkable transformation. The same workload was efficiently processed in a little over 40 seconds. The impact on execution time alone is a compelling reason to embrace batch operations, contributing not only to cost savings but also significantly optimizing overall performance.
Conclusion
In the realm of serverless DynamoDB struggles, where orchestrating massive data loads within tight timeframes feels like walking on a tightrope, DynamoDB Batch Operations emerge as the unsung heroes. As we've uncovered the intricacies of BatchGet and BatchWrite operations, along with strategies for concurrent calls and parallelization, a new paradigm for efficient and cost-effective data handling in serverless architectures unfolds.
In conclusion, DynamoDB Batch Operations don't just solve a puzzle; they redefine the rules of the game. By seamlessly blending efficiency and simplicity, they empower you to conquer the challenges of serverless DynamoDB interactions, making data handling not just possible but downright enjoyable. As you embark on your journey with DynamoDB Batch Operations, may your serverless endeavors be swift, cost-effective, and devoid of unnecessary complexities.
If you liked this blog and found it helpful, do check out my other blogs on AWS:
- Lost in AWS CDK? Here's Your Map for Debugging Like a Pro
- AWS Step Functions: The Conductor of Your Microservices Orchestra
- Building Event-Driven Serverless Architecture with AWS EventBridge and Serverless Framework
See you next time. Happy coding!