Amazon's DynamoDB is promoted as a zero-maintenance NoSQL database with virtually unlimited throughput and scale. Because of its very low administrative overhead and serverless billing model, it has long been my data store of choice when developing cloud applications and services.
When working with any tool, it is important to understand its limits so you can build a mental model for planning and implementing your usage of that tool effectively. Because DynamoDB bills itself as offering "unlimited throughput and scale," it is tempting to assume you never need to think about read and write throughput limits. However, as we will explore in this post, there are well-defined limits that we need to take into account when implementing solutions with DynamoDB.
Partition limits
Amazon documents these partition-level limits: 3,000 RCU/s and 1,000 WCU/s. Let's verify these limits and put them to the test. We want to understand how these limits apply to an UpdateItem command, and whether they are any different for direct PutItem commands.
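As a quick refresher on how write capacity is counted: one standard write request unit covers an item of up to 1 KB, larger items consume one WCU per additional 1 KB (rounded up), and transactional writes cost double. The small helper below is just an illustrative sketch of that arithmetic (the function name is mine, not from the repo); we will lean on it later when reasoning about theoretical ceilings.

// WCU cost of a single write, per the DynamoDB capacity documentation:
// one WCU per 1 KB (rounded up), doubled for transactional writes.
function wcusForWrite(itemSizeBytes: number, transactional = false): number {
  const base = Math.ceil(itemSizeBytes / 1024);
  return transactional ? base * 2 : base;
}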
Test setup
CDK code can be found here: https://github.com/rogerchi/ddb-writes-model
We create Items of a predefined size by filling them with random binary data. We then set up Lambda functions that spam our DynamoDB Item with either 10,000 atomic updates or 10,000 full puts. We also set up a DynamoDB Streams handler (sketched below) to capture the times of the first and last updates, which gives us a good idea of how long the 10,000 operations take to complete.
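The repository has its own stream handler; the following is only a minimal sketch of the idea, assuming a standard DynamoDB Streams event source mapping and using the ApproximateCreationDateTime on each record to track the first and last write seen for an item.

import { DynamoDBStreamHandler } from 'aws-lambda';

// Track the earliest and latest write times seen per item key (in memory,
// per Lambda container - good enough for a rough timing measurement).
const firstSeen: Record<string, number> = {};
const lastSeen: Record<string, number> = {};

export const handler: DynamoDBStreamHandler = async (event) => {
  for (const record of event.Records) {
    const pk = record.dynamodb?.Keys?.pk?.S;
    const ts = record.dynamodb?.ApproximateCreationDateTime; // epoch seconds
    if (!pk || ts === undefined) continue;

    if (firstSeen[pk] === undefined || ts < firstSeen[pk]) firstSeen[pk] = ts;
    if (lastSeen[pk] === undefined || ts > lastSeen[pk]) lastSeen[pk] = ts;

    console.log(`${pk}: elapsed ~${lastSeen[pk] - firstSeen[pk]}s so far`);
  }
};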
UpdateItem code
We first initialize an Item with a count of 0 and then run the Lambda function below against it.
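The repository's setup code handles that initialization; as a rough sketch (assuming the same pk/sk key schema and a client configured like the one in the update code), it could be as simple as:

import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!; // environment variable name assumed

// Seed the counter item so the arithmetic in the UpdateExpression has a value to add to.
async function initItem(id: string) {
  await client.send(
    new PutItemCommand({
      TableName: TABLE_NAME,
      Item: {
        pk: { S: id },
        sk: { S: 'ITEM' },
        count: { N: '0' },
      },
    }),
  );
}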
import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!; // table name provided via the Lambda environment

async function updateItem(id: string) {
  const params = {
    TableName: TABLE_NAME,
    Key: {
      pk: { S: id },
      sk: { S: 'ITEM' },
    },
    // Atomically increment the counter attribute in place.
    UpdateExpression: 'set #count = #count + :one',
    ExpressionAttributeNames: {
      '#count': 'count',
    },
    ExpressionAttributeValues: {
      ':one': { N: '1' },
    },
  };

  let attempt = 0;
  // Retry forever: when the partition throttles us we simply try again,
  // so every one of the 10,000 updates eventually lands.
  while (true) {
    try {
      const command = new UpdateItemCommand(params);
      await client.send(command);
      return;
    } catch (error) {
      attempt++;
      console.error(`Update failed for item ${id}. Attempt ${attempt}.`);
    }
  }
}

interface Event {
  id: string;
}

exports.handler = async (event: Event) => {
  const id = event.id;
  if (!id) {
    throw new Error('Event does not contain an ID.');
  }

  const totalUpdates = 10000;
  const commands = [];
  // Fire all 10,000 updates concurrently and wait for them to settle.
  for (let i = 0; i < totalUpdates; i++) {
    commands.push(updateItem(id));
  }
  await Promise.all(commands);
  console.log('All updates complete.');
};
UpdateItem results (10,000 updates)
Item Size (bytes) | Total time (seconds) | Op/s | WCU/s |
---|---|---|---|
1000 | 10 | 1000 | 1000 |
2000 | 20 | 500 | 1000 |
4000 | 40 | 250 | 1000 |
As we can see, the results match the write throughput described in the documentation. Does the throughput change if we use the PutItem command instead of the UpdateItem command, given that the underlying service doesn't need to retrieve the existing item in order to overwrite it?
PutItem code
import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';
import { marshall } from '@aws-sdk/util-dynamodb';
// generateData is a helper from the repo (import path assumed) that returns `size` bytes of random binary data.
import { generateData } from './generateData';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!; // table name provided via the Lambda environment

async function putItem(id: string, itemBody: any, count: number) {
  // Overwrite the whole item each time, bumping the count attribute.
  const updatedItem = { ...itemBody, count: { N: `${count}` } };
  const params = {
    TableName: TABLE_NAME,
    Item: updatedItem,
  };

  let attempt = 0;
  // Retry forever so that throttled puts are eventually written.
  while (true) {
    try {
      const command = new PutItemCommand(params);
      await client.send(command);
      return;
    } catch (error) {
      attempt++;
      console.error(`Put failed for item ${id}. Attempt ${attempt}.`, error);
    }
  }
}

interface Event {
  id: string;
  size: number;
}

exports.handler = async (event: Event) => {
  const { id, size } = event;
  const data = generateData(id, size);
  const item = marshall({
    pk: id,
    sk: 'PUTS',
    size,
    data: Uint8Array.from(data),
  });

  const totalUpdates = 10000;
  const commands = [];
  // Fire all 10,000 puts concurrently and wait for them to settle.
  for (let i = 0; i < totalUpdates; i++) {
    commands.push(putItem(id, item, i));
  }
  await Promise.all(commands);
  console.log('All updates complete.');
};
PutItem results
Item Size (bytes) | Total time (seconds) | Op/s | WCU/s |
---|---|---|---|
1000 | 10 | 1000 | 1000 |
2000 | 20 | 500 | 1000 |
4000 | 40 | 250 | 1000 |
Our PutItem results are identical to our UpdateItem results, supporting the idea that the maximum write throughput for a single Item is 1,000 WCU/s.
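Putting that together with the WCU helper sketched earlier, the theoretical per-item ceiling for standard writes works out as follows. This is back-of-the-envelope arithmetic based on the observed 1,000 WCU/s per-item limit, not an official formula.

// Theoretical per-item ceiling for standard (non-transactional) writes,
// assuming the 1,000 WCU/s per-item limit observed above.
function maxWritesPerSecond(itemSizeBytes: number): number {
  return Math.floor(1000 / wcusForWrite(itemSizeBytes));
}

maxWritesPerSecond(1000); // 1000 writes/s
maxWritesPerSecond(2000); // 500 writes/s
maxWritesPerSecond(4000); // 250 writes/s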
TransactWriteItems
TransactWriteItems costs two WCUs per 1 KB to write an Item to the table: one WCU to prepare the transaction and one to commit it. Given what we learned above, you might assume that for a 1 KB Item we should be able to perform 500 transactional writes per second. Does that prove out?
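Expressed with the same helper as before (again just an illustrative sketch), that naive expectation looks like this:

// Naive expectation: the 1,000 WCU/s per-item limit divided by the
// doubled WCU cost of a transactional write.
function expectedTransactionalWritesPerSecond(itemSizeBytes: number): number {
  return Math.floor(1000 / wcusForWrite(itemSizeBytes, true));
}

expectedTransactionalWritesPerSecond(1000); // 500 tx/s, if the WCU limit were the only factor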
Code
import { DynamoDBClient, TransactWriteItemsCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!; // table name provided via the Lambda environment

async function transactUpdateItem(id: string) {
  const params = {
    TableName: TABLE_NAME,
    Key: {
      pk: { S: id },
      sk: { S: 'TRXS' },
    },
    UpdateExpression: 'set #count = #count + :one',
    ExpressionAttributeNames: {
      '#count': 'count',
    },
    ExpressionAttributeValues: {
      ':one': { N: '1' },
    },
  };

  let attempt = 0;
  // Retry forever so that cancelled or throttled transactions are eventually committed.
  while (true) {
    try {
      // A single-item transaction: the same atomic increment, wrapped in TransactWriteItems.
      const command = new TransactWriteItemsCommand({
        TransactItems: [{ Update: { ...params } }],
      });
      await client.send(command);
      return;
    } catch (error) {
      attempt++;
      console.error(`Update failed for item ${id}. Attempt ${attempt}.`);
    }
  }
}

interface Event {
  id: string;
}

exports.handler = async (event: Event) => {
  const id = event.id;
  if (!id) {
    throw new Error('Event does not contain an ID.');
  }

  const totalUpdates = 10000;
  const commands = [];
  // Fire all 10,000 transactional updates concurrently and wait for them to settle.
  for (let i = 0; i < totalUpdates; i++) {
    commands.push(transactUpdateItem(id));
  }
  await Promise.all(commands);
  console.log('All updates complete.');
};
TransactWriteItems results
Item Size (bytes) | Total time (seconds) | Op/s | WCU/s |
---|---|---|---|
1000 | 60 | 167 | 333 |
2000 | 90 | 111 | 444 |
4000 | 130 | 77 | 615 |
8000 | 188 | 53 | 851 |
16000 | 370 | 27 | 864 |
We see from the TransactWriteItems results that we cannot achieve the full 1,000 WCU/s through transactions. There appears to be some fixed per-transaction overhead: smaller items fall far short of the 1,000 WCU/s throughput, and it is not until item sizes grow beyond 8 KB that we get closer to (but still never quite reach) the 1,000 WCU/s partition throughput.
Our mental model for DynamoDB Writes
So what's our foundational mental model for DynamoDB write limits? Every write targets a specific Item, and that Item can absorb up to 1,000 WCU/s of Put or Update actions. This means an item of up to 1 KB can be written or updated 1,000 times per second, an item of 2 KB can be written or updated 500 times per second, and an item of 200 KB can only be written or updated 5 times per second. To exceed these limits, you must shard your writes (a sketch of this pattern follows below). And because of instant adaptive capacity, you can think of each Item as effectively having its own partition, a whole set of infrastructure dedicated to that single Item, which is pretty darn amazing. It's why no other service shards quite like DynamoDB, and why it is the ultimate multi-tenant service, a real example of the power of scale in the cloud.
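As an illustration of what write sharding can look like (a generic pattern, not code from the repo; the shard count and key layout are assumptions), you can spread a hot counter across several Items and sum them on read:

import { DynamoDBClient, UpdateItemCommand, GetItemCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!; // environment variable name assumed
const SHARD_COUNT = 10; // 10 shards -> roughly 10x the single-item write ceiling

// Write path: pick a random shard so each underlying Item stays under its own 1,000 WCU/s limit.
async function incrementShardedCounter(id: string) {
  const shard = Math.floor(Math.random() * SHARD_COUNT);
  await client.send(
    new UpdateItemCommand({
      TableName: TABLE_NAME,
      Key: { pk: { S: `${id}#SHARD${shard}` }, sk: { S: 'COUNTER' } },
      // ADD creates the attribute on first use, so no separate initialization is needed.
      UpdateExpression: 'ADD #count :one',
      ExpressionAttributeNames: { '#count': 'count' },
      ExpressionAttributeValues: { ':one': { N: '1' } },
    }),
  );
}

// Read path: fetch every shard and sum the counts.
async function readShardedCounter(id: string): Promise<number> {
  const reads = Array.from({ length: SHARD_COUNT }, (_, shard) =>
    client.send(
      new GetItemCommand({
        TableName: TABLE_NAME,
        Key: { pk: { S: `${id}#SHARD${shard}` }, sk: { S: 'COUNTER' } },
      }),
    ),
  );
  const results = await Promise.all(reads);
  return results.reduce((sum, r) => sum + Number(r.Item?.count?.N ?? '0'), 0);
}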
The caveat is that write throughput drops significantly once transactions are introduced, ranging from around 167 operations per second for a 1 KB item down to 27 operations per second for a 16 KB item.
About me
I am a Staff Engineer @ Veho with a passion for serverless.