Hey wonderful readers and knowledge pursuers!
Today I want to talk about DynamoDB and its composite primary and sort keys. I am 100% aware that there is a huge amount of well-written articles, on top of great and verbose documentation for AWS DynamoDB, which I use every day. But we all know there are situations that are not always described in the books and docs.
With all that said, here I am telling a short story about one of these cases.
Once upon a time a team of architects designed a software architecture, built according to well-written and coordinated documentation. But a third-party player failed to mention a specific key factor, and clouds gathered in the blue sky.
OK! Enough mystery tales!
The design included the use of DynamoDB, and one of the tasks was to create a table. The design called for a composite primary key - a partition key plus a sort key - and also a very smart separation: every piece of data adjacent to the main data object was pushed into the Dynamo table as a separate record. The code base used DynamoDB Toolbox, schema validators, etc., etc... all the goodies available out there.
All done! All set! Start....!
After a while.....
The key factor the third party had skipped came into play and forced us to change the composite sort key. Changing the sort key programmatically in the schema used with DynamoDB Toolbox was an easy one-line change. The problem was that after we changed the sort key, the existing data would continue to exist with the old sort key, and any update operation on those records would fail miserably. We had to find a way to update the sort key on the existing data.
Now! The real story begins....
Obstacles: Primary partition keys and sort keys are immutable in DynamoDB - you cannot just run a trivial update against a composite sort key.
Understanding the core components of DynamoDB - Link to AWS documentation
Solution: The solution we worked out was:
- Get the items with the old sort key
- Delete the items with the old sort key
- Re-create the items with the new sort key
The Delete and Put must all run in one transaction.
Now the question is how to implement it.
We decided the best approach was to run a one-time script to fix the discrepancy.
There are two options: create a bash script using AWS CLI commands, or use the AWS SDK.
We have decided to use AWS SDK for JavaScript v3.
And there we go:
Looking at the process, the first thing to do is extract the records with the old sort key. You cannot query, simply because you do not know the primary keys of the affected records, so you need to scan the whole table:
import { DynamoDBClient, ScanCommand } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient();
const ddbDocClient = DynamoDBDocumentClient.from(client)

const input = {
  TableName: "your table name",
  FilterExpression: "begins_with(#sk, :sk_portion_you_know)",
  ExpressionAttributeNames: {
    "#sk": "sk"
  },
  ExpressionAttributeValues: {
    ":sk_portion_you_know": {
      S: "The Portion You know and it is static"
    }
  }
}

const response = await client.send(new ScanCommand(input))
I know, I know - do not panic. A Scan is huge and consumes a lot of reads, but done once it is not overkill.
Here is the pricing from the AWS DynamoDB documentation link to DynamoDB pricing. Briefly (it depends on the region), scanning around 2 million records would cost just over $1.00.
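To get a rough feel for the numbers, here is a back-of-the-envelope estimate. This is only a sketch: the average item size, the use of eventually consistent reads (0.5 read request units per 4 KB) and the roughly $0.25 per million on-demand read request units are my assumptions - check the pricing page for your region.

```typescript
// Rough on-demand cost estimate for a full table scan.
// Assumptions (verify against your region's pricing page):
// - eventually consistent reads: 0.5 read request units per 4 KB scanned
// - on-demand price: ~$0.25 per million read request units
function estimateScanCostUSD(
  itemCount: number,
  avgItemBytes: number,
  pricePerMillionRRU = 0.25
): number {
  const totalBytes = itemCount * avgItemBytes;
  const fourKbUnits = Math.ceil(totalBytes / 4096);
  const readRequestUnits = fourKbUnits * 0.5; // eventually consistent
  return (readRequestUnits / 1_000_000) * pricePerMillionRRU;
}

// e.g. 2 million records at ~8 KB each
console.log(estimateScanCostUSD(2_000_000, 8192)); // 0.5
```

Note that the scanned data volume, not the item count, is what dominates the cost, so tweak avgItemBytes to match your table.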
So far so good.
Anything I am missing?
Yes! And there it is, from the AWS documentation link to AWS documentation.
A single Scan operation first reads up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then applies any filtering to the results if a FilterExpression is provided.
With an operation that scans the whole production table, the payload would exceed 1 MB.
So the next important bit is to implement pagination.
To leverage pagination in DynamoDB you need to add a few important params to your input: ExclusiveStartKey, LastEvaluatedKey, and Limit. OK, so let's see how our input would look:
const input = {
  TableName: "your table name",
  FilterExpression: "begins_with(#sk, :sk_portion_you_know)",
  Limit: 100,
  ExpressionAttributeNames: {
    "#sk": "sk"
  },
  ExpressionAttributeValues: {
    ":sk_portion_you_know": {
      S: "The Portion You know and it is static"
    }
  },
  ExclusiveStartKey: lastEvaluatedKey
}
Now that the input is all set, how would we achieve the pagination?
First let's briefly (it is a complex and long topic) explain what pagination in DynamoDB is: Pagination is the process of sending subsequent requests to continue where a previous request left off. A Query or Scan operation in DynamoDB might return results that are incomplete and require subsequent requests to get the entire result set.
The Limit parameter controls the maximum number of items returned per page, optimising throughput and reducing resource consumption. If the value of LastEvaluatedKey is undefined, the initial set of items will be returned according to the specified limit. However, if a valid value is provided, the query will return a set of items beginning from the value of LastEvaluatedKey, which contains the primary key of the last evaluated item.
Examples of how the response from scan is constructed and where the LastEvaluatedKey is placed.
As I love to say, this can be implemented in a million and one ways, but this simple approach also works.
The main thing to take away is that you need to keep querying/scanning for as long as you get a LastEvaluatedKey back.
So if you are using TypeScript like me, you can simply use a do...while loop:
do {
  const data = await client.send(new ScanCommand(input))
  lastEvaluatedKey = data.LastEvaluatedKey
  // feed the pointer back into the input for the next page
  input.ExclusiveStartKey = lastEvaluatedKey
} while (lastEvaluatedKey)
Now your scan has a pointer from which to start the read of the next batch of data, and it keeps going for as long as there is data left to query/scan - that is, as long as a LastEvaluatedKey is returned.
And all put together, it would look like this:
const allRecords: Record<string, AttributeValue>[] = []
const transactions: Record<string, AttributeValue>[] = []
let lastEvaluatedKey: Record<string, AttributeValue> | undefined

const input = {
  TableName: "your table name",
  FilterExpression: "begins_with(#sk, :sk_portion_you_know)",
  Limit: 100,
  ExpressionAttributeNames: {
    "#sk": "sk"
  },
  ExpressionAttributeValues: {
    ":sk_portion_you_know": {
      S: "The Portion You know and it is static"
    }
  },
  ExclusiveStartKey: lastEvaluatedKey
}

let data: ScanCommandOutput
try {
  do {
    data = await client.send(new ScanCommand(input))
    lastEvaluatedKey = data.LastEvaluatedKey
    if (data.Items?.length) {
      allRecords.push(...data.Items)
    }
    // feed the pointer back into the input for the next page
    input.ExclusiveStartKey = lastEvaluatedKey
  } while (lastEvaluatedKey)

  if (allRecords.length) {
    allRecords.forEach((item) => {
      if (item.sk.S === 'your old key') {
        transactions.push(item)
      }
    })
  }
} catch (error) {
  throw new Error('Error thrown during scan operation')
}
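As a side note, the SDK can do the pagination bookkeeping for you: @aws-sdk/client-dynamodb ships an async paginator, paginateScan, which feeds LastEvaluatedKey back in as ExclusiveStartKey under the hood. A minimal sketch, reusing the placeholder table and key names from the examples above:

```typescript
import { DynamoDBClient, paginateScan } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient();

// The paginator yields one page per iteration and handles
// LastEvaluatedKey / ExclusiveStartKey internally.
const paginator = paginateScan(
  { client, pageSize: 100 },
  {
    TableName: "your table name",
    FilterExpression: "begins_with(#sk, :sk_portion_you_know)",
    ExpressionAttributeNames: { "#sk": "sk" },
    ExpressionAttributeValues: {
      ":sk_portion_you_know": { S: "The Portion You know and it is static" },
    },
  }
);

for await (const page of paginator) {
  console.log(page.Items?.length ?? 0);
}
```

The manual do...while version is still worth understanding, but the paginator removes the easy-to-make mistake of forgetting to update ExclusiveStartKey.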
There are a number of expressions you can use in FilterExpression to filter your search. We had to use begins_with. Link to good examples on how to use different condition expressions
Next, let's dive into the last part - dealing with the changed composite sort key.
We obviously cannot update it with a trivial update, because composite primary keys are immutable, as I already mentioned. We will delete the record and put it back with the new sort key. Let's do it!
async function transactWrite(records: Record<string, AttributeValue>[]) {
  for (const transaction of records) {
    // we are getting the records via ScanCommand marshalled - item: {S: 'field value'} -
    // but the DynamoDBDocumentClient Put uses high-level JSON - {item: value} -
    // so we need to unmarshall before putting
    const unmarshalled = unmarshall(transaction);
    const input = {
      TransactItems: [
        {
          Delete: {
            TableName: tableName,
            Key: {
              pk: `${unmarshalled.pk}`,
              sk: `${unmarshalled.sk}`,
            },
            // guard: only delete a record that actually exists
            ConditionExpression: 'attribute_exists(pk)',
          },
        },
        {
          Put: {
            TableName: tableName,
            Item: {
              ...unmarshalled,
              sk: `The new key to update with put`,
            },
          },
        },
      ],
    };
    try {
      const response = await docClient.send(new TransactWriteCommand(input));
      console.log('Item transactWrite successfully.', response);
    } catch (error) {
      console.log('Error transactWrite:', error);
    }
  }
}

Note the existence check lives on the Delete itself: a separate ConditionCheck item on the same pk/sk would be rejected, because a transaction cannot include multiple operations on one item.
One very important note here: ScanCommand returns results marshalled like item: {S: 'field value'}. You must unmarshall() each record before using it in Put. Fail to do so and you will end up with a schema validation error from the AWS SDK - ValidationException: The provided key element does not match the schema - simply because Put via the document client validates its input as high-level JSON, in this manner: {"item": "value"}.
Documentation with example about marshalling and unmarshalling your records.
Having the Delete and Put in one transaction adds confidence, simply because if something goes wrong, the transaction will not complete and will roll back.
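One practical detail worth planning for: TransactWriteItems accepts at most 100 actions per call. Each record here needs only 2 actions (Delete + Put), so one record per transaction fits comfortably; but if you ever wanted to group several records into one transaction, a small chunking helper keeps you under the limit. A sketch - the chunk size of 50 (50 records x 2 actions = 100) is my assumption:

```typescript
// TransactWriteItems allows at most 100 actions per call.
// With 2 actions per record (Delete + Put), we can group
// up to 50 records per transaction.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

const records = Array.from({ length: 120 }, (_, i) => `record-${i}`);
const groups = chunk(records, 50);
console.log(groups.length); // 3
console.log(groups[2].length); // 20
```

Keep in mind that each transaction still succeeds or fails as a unit, so larger groups mean a single bad record can roll back its whole group.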
Little advice:
It is also good to take a few extra steps: save the data you want to update in a separate file, and extract the modified records into another file, so you can compare the changes side by side with the same records from before the transaction. Again, there are multiple ways and different precautions to take to make sure everything goes smoothly and as planned.
It is not the greatest experience, but it happens. Sometimes internal documentation is not updated often enough, leading to design holes. Often, though not always, when other parties are involved, miscommunication creeps in - and many other reasons, you name it.
Takeaway:
The most important takeaway is that anything is possible, anything can be fixed, and there is always a way - especially as long as you have a great team to support and back you up. Here is also the right place to mention that the smart and great design of DynamoDB made this possible, along with the extensive and comprehensive AWS SDK guides.
A huge thank you to the AWS DynamoDB and AWS SDK teams. Without the extensive suite of functions in the AWS SDK, and without the verbose and amazing examples in its documentation, we would have been lost and stuck in a forest of trial and error.
Thank you to you, my dear reader, for sticking with me to the end. I hope this helps other people in a similar or the same situation.
Let's connect!
LinkedIn
Would love to hear from you!