Jason Butz for AWS Community Builders

Posted on Nov 24 • Originally published at jasonbutz.info on Nov 24

New DynamoDB Key Feature & Why It Matters

#aws #nosql

AWS announced multi-attribute composite keys for DynamoDB global secondary indexes on November 19th. That's a lot of words, but what does it actually mean? It means it's easier to pull data out of DynamoDB in ways you hadn't planned initially, without a bunch of extra work. To fully explain why, I need to explain keys in DynamoDB. I'll assume you know that DynamoDB is a NoSQL database, that the "database" is called a table, and the table is a collection of items, which you can think of as records or rows.

Primary Key, Partition Key, Sort Key

You'll hear many names used for different keys in DynamoDB. It all makes sense, but it can be confusing. Every item in your table has a primary key, as you are probably used to with relational databases such as MySQL, PostgreSQL, Microsoft SQL Server, Oracle, and many others. That primary key is a unique value you can use to retrieve a particular item. In DynamoDB, the primary key can consist of either a single attribute or two attributes.

Your primary key always has a partition key (PK), sometimes called a hash key or hash attribute. The partition key is used by DynamoDB to distribute your items across partitions, enabling the performance for which DynamoDB is known. There are a whole lot of best practices for choosing a partition key, but you don't need to know them right now.

Your primary key might include a sort key (SK), sometimes called a range key or range attribute. If your primary key has a sort key, then it is known as a composite primary key. With a composite primary key, multiple items can have the same partition key. The combination of the partition key and sort key is what must be unique. The sort key determines how values returned are sorted when you query for the items with a given partition key.

Below is a screenshot using sample data AWS provides that shows an example table with only a partition key. In the example, the partition key is the user's username, LoginAlias, as shown in the screenshot. This allows you to easily retrieve a user's information if you know their login alias.

The following screenshot shows the same data, with the partition key as the user's first name and the sort key as their last name. This allows you to easily find users by name, for example, all users named Jane.
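As a rough sketch, the two layouts described above could be written as table definitions like the ones below. The objects follow the shape of a DynamoDB CreateTable request. The table names and the FirstName and LastName attribute names are assumptions made for illustration; only LoginAlias comes from the sample data mentioned above.

```javascript
// Sketch of CreateTable inputs for the two layouts described above.
// Table names and the FirstName/LastName attribute names are assumptions.

// Simple primary key: a partition key only.
const usersByAlias = {
  TableName: 'UsersByAlias', // hypothetical table name
  AttributeDefinitions: [{ AttributeName: 'LoginAlias', AttributeType: 'S' }],
  KeySchema: [{ AttributeName: 'LoginAlias', KeyType: 'HASH' }], // HASH = partition key
  BillingMode: 'PAY_PER_REQUEST',
};

// Composite primary key: a partition key plus a sort key.
const usersByName = {
  TableName: 'UsersByName', // hypothetical table name
  AttributeDefinitions: [
    { AttributeName: 'FirstName', AttributeType: 'S' },
    { AttributeName: 'LastName', AttributeType: 'S' },
  ],
  KeySchema: [
    { AttributeName: 'FirstName', KeyType: 'HASH' }, // partition key
    { AttributeName: 'LastName', KeyType: 'RANGE' }, // RANGE = sort key
  ],
  BillingMode: 'PAY_PER_REQUEST',
};

console.log(usersByAlias.KeySchema.length, usersByName.KeySchema.length);
```

The older "hash" and "range" names survive in the API as the KeyType values, which is why you still see them in documentation and tooling.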

Get, Query, Scan

Before I get into key structures, I have to cover the three primary operations for reading data from DynamoDB. You can use the GetItem action to read a single item, the Query operation to retrieve multiple items that share a partition key, or the Scan operation to read through the table without knowing any key values.

For GetItem, you provide the entire primary key, and DynamoDB returns your item. It's very straightforward and very fast.

For a query, you must know the partition key value, but you can perform more complex comparisons for the sort key value, as well as for other attributes on the items. Using the screenshot above as an example, you could query for all users with the first name "Jane" and a last name that begins with "R", or for all users with the first name "Jane" who have "software" as a skill. Query operations are generally very fast, thanks to knowing exactly which partition to access for the data.
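The two example queries above might look something like this when written as Query inputs. This is a sketch, not a tested integration: the table name, the FirstName, LastName, and Skills attribute names, and the assumption that Skills is a list or set of strings are all made up for illustration. The objects use plain JavaScript values, the form accepted by QueryCommand in @aws-sdk/lib-dynamodb.

```javascript
// Query inputs for the two examples above.
// Table and attribute names are assumptions, not taken from the AWS sample data.

// All users named Jane whose last name begins with "R".
// Both conditions are on key attributes, so they go in KeyConditionExpression.
const janeWithR = {
  TableName: 'UsersByName', // hypothetical
  KeyConditionExpression: 'FirstName = :first AND begins_with(LastName, :prefix)',
  ExpressionAttributeValues: { ':first': 'Jane', ':prefix': 'R' },
};

// All users named Jane with "software" as a skill.
// Skills is not a key attribute, so it goes in FilterExpression,
// which is applied after the matching items have been read.
const janeWithSoftwareSkill = {
  TableName: 'UsersByName', // hypothetical
  KeyConditionExpression: 'FirstName = :first',
  FilterExpression: 'contains(Skills, :skill)',
  ExpressionAttributeValues: { ':first': 'Jane', ':skill': 'software' },
};

console.log(janeWithR.KeyConditionExpression);
console.log(janeWithSoftwareSkill.FilterExpression);
```

Note that the partition key condition is always an equality check. Only the sort key supports operators such as begins_with and between.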

Scan operations don't require you to know the primary key, but they are much slower and far less efficient than other operations. You can apply filter expressions to limit the data returned, but this happens after the data is read. Each scan request reads up to 1 MB of data from your table, then applies any filter expression. If that page doesn't contain everything you need, you have to paginate and scan the next batch of data. In effect, you are reading your entire table to get the data you want. You should avoid scans whenever possible.
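To make the pagination cost concrete, here is a minimal sketch of a scan loop. The fetchPage parameter is a stand-in for a real Scan call and makeFakeFetch exists only so the example runs without AWS; both are hypothetical helpers. The LastEvaluatedKey and ExclusiveStartKey fields are the ones the Scan API uses for paging.

```javascript
// Keep requesting pages until the response no longer includes LastEvaluatedKey.
// fetchPage is a hypothetical function: Scan input in, promise of a page out.
async function scanAll(fetchPage, baseInput) {
  const items = [];
  let startKey; // undefined on the first request
  do {
    const page = await fetchPage({ ...baseInput, ExclusiveStartKey: startKey });
    items.push(...(page.Items ?? []));
    startKey = page.LastEvaluatedKey; // absent once the table is exhausted
  } while (startKey !== undefined);
  return items;
}

// Fake pages so the sketch runs locally. The middle page shows a request whose
// items were all removed by the filter, yet it still counts as a round trip.
function makeFakeFetch(pages) {
  let calls = 0;
  return async () => pages[calls++];
}

const pages = [
  { Items: [{ LoginAlias: 'janedoe' }], LastEvaluatedKey: { LoginAlias: 'janedoe' } },
  { Items: [], LastEvaluatedKey: { LoginAlias: 'johnroe' } },
  { Items: [{ LoginAlias: 'jroe' }] },
];

scanAll(makeFakeFetch(pages), { TableName: 'UsersByAlias' })
  .then((items) => console.log(`matched ${items.length} items in ${pages.length} requests`));
```

Every request in that loop consumes read capacity for the data it read, not for the items the filter let through, which is why a filtered scan is still expensive.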

After reviewing these operations, you can see why we want to focus on GetItem and query operations, but both require us to know the partition key. Now you can see why key structure matters. There is one more topic to cover before we really focus on key structures: indexes.

Secondary Indexes

DynamoDB has a feature called secondary indexes. They are effectively copies of your table that you read from the same way, but organized differently. There are two types: global secondary indexes (GSI) and local secondary indexes (LSI). A local secondary index uses the same partition key as your base table, but you can configure a different sort key. A global secondary index can have a different partition key and sort key from your base table.

There are additional considerations with secondary indexes, but those are outside the scope of this discussion.

Key Structures

You've seen how important the partition key is for your data. A standard line you used to hear from AWS about DynamoDB performance was the importance of "well-structured queries". If you don't know your partition key, your query is not well-structured. GSIs let you define new partition keys and offer a lot of flexibility, but you still need to know the partition key for each GSI.

The importance of your partition key is why you need to understand your data and data access patterns before developing your DynamoDB table. If you will always know your user's username once they log in, then it may be a good partition key to use. If you have a multi-tenant application and will always know the tenant ID associated with a user, then that may be a good partition key.

Sort keys are where things get interesting. Using someone's last name as a sort key works for a basic example, but real applications are rarely that simple. Perhaps you are tracking details and audit records for different pieces of equipment. That's two different kinds of data about your equipment, in the same table. Your equipment ID could serve as your partition key, and you could use "details" as your sort key for the equipment information. For your audit items, you can use version numbers or timestamps as part of your sort key to make them sortable, something like audit_v1 or audit_v20251121T121314Z. This lets you keep track of your past audits while also being able to retrieve the most recent one by sorting values. You do need to be careful of possible issues when sorting numbers as strings. For example, audit_v10 will come after audit_v1 but before audit_v2.
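The string-sorting pitfall is easy to reproduce locally. The sketch below uses JavaScript's default string sort, which orders plain ASCII values like these the same way DynamoDB does. Zero-padding is one common workaround; the padding width of 5 is an arbitrary choice for illustration.

```javascript
// Unpadded version numbers sort as strings, not as numbers.
const naive = ['audit_v2', 'audit_v10', 'audit_v1'].sort();
console.log(naive); // [ 'audit_v1', 'audit_v10', 'audit_v2' ]

// Zero-padding to a fixed width keeps string order and numeric order aligned.
// The width of 5 is an assumption; pick one larger than you will ever need.
const auditKey = (version) => `audit_v${String(version).padStart(5, '0')}`;

const padded = [2, 10, 1].map(auditKey).sort();
console.log(padded); // [ 'audit_v00001', 'audit_v00002', 'audit_v00010' ]
```

ISO 8601 timestamps in a fixed format, like the one in the example above, avoid the problem entirely because every component already has a fixed width.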

With complex use cases, you may end up using a composite sort key that isn't a single value but instead multiple values concatenated. AWS's documentation uses the following as an example for a table listing geographical data.

[country]#[region]#[state]#[county]#[city]#[neighborhood]

This kind of structure lets you use begins_with, between, and greater than/less than operators to retrieve related groups of data. You can use your partition key and query for all data related to King County, Washington (where Amazon's HQ is) by querying for a sort key beginning with usa#northwest#wa#king# (USA for the country, Northwest for the region, WA meaning Washington state, and King meaning King County). Note that I included the # at the end of the value, which prevents us from picking up extra values due to partial matches.
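The effect of the trailing # can be simulated with a few sample sort keys and a prefix match. The data here is invented for illustration, and "kingston" is a hypothetical county that exists only to show the partial-match problem.

```javascript
// Sample sort keys in the [country]#[region]#[state]#[county]#[city]#[neighborhood] format.
// All values are made up; "kingston" is a hypothetical county.
const sortKeys = [
  'usa#northwest#wa#king#seattle#ballard',
  'usa#northwest#wa#king#bellevue#downtown',
  'usa#northwest#wa#kingston#exampleville#central',
  'usa#northwest#or#multnomah#portland#pearl',
];

// Local stand-in for a begins_with condition on the sort key.
const beginsWith = (prefix) => sortKeys.filter((key) => key.startsWith(prefix));

const withDelimiter = beginsWith('usa#northwest#wa#king#');
const withoutDelimiter = beginsWith('usa#northwest#wa#king');

console.log(withDelimiter.length);    // 2, only King County
console.log(withoutDelimiter.length); // 3, also picks up "kingston"
```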

By combining these ideas, you can have multiple formats for sort keys in the same table. I had one application with a single DynamoDB table containing 12-15 different item types. We had numerous organizations in the same table, with their organization ID as the partition key and sort keys such as organization, user#[user guid]#detail, user#[user guid]#permissions, and more.

The downside to these composite sort keys is that you have to build them yourself when you create your items, setting them as an attribute. AWS doesn't do it for you. Now imagine you're adding a new GSI and want a new composite sort key. Now you have to update every item in your table with the correct value for that sort key. It becomes very challenging; hence, you generally want an in-depth understanding of how your data will be accessed before defining the key structures for your DynamoDB tables.
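To give a sense of what that backfill involves, here is a sketch of the update you would need to issue for every existing item. Everything named here is hypothetical: the table name, the orgId and SK key attributes, the status and createdAt attributes being combined, and the GSI1_SK attribute the new index would use. The returned object follows the shape of an UpdateCommand input in @aws-sdk/lib-dynamodb.

```javascript
// Build the update needed to add a new composite sort key to one existing item.
// All table and attribute names are hypothetical.
function buildBackfillUpdate(item) {
  return {
    TableName: 'AppTable',
    Key: { orgId: item.orgId, SK: item.SK }, // the item's existing primary key
    UpdateExpression: 'SET GSI1_SK = :sk',
    ExpressionAttributeValues: { ':sk': `${item.status}#${item.createdAt}` },
  };
}

const existingItem = {
  orgId: 'org-123',
  SK: 'user#42#detail',
  status: 'active',
  createdAt: '2025-11-21T12:13:14Z',
};

console.log(buildBackfillUpdate(existingItem).ExpressionAttributeValues[':sk']);
// active#2025-11-21T12:13:14Z
```

One update is simple. The hard part is running it for every item in the table, which means reading the whole table first, handling failures, and making sure new writes also set the attribute while the backfill is in progress.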

Enter Multi-Attribute Key Schemas

This is where the AWS announcement comes in. Multi-attribute key schemas are a new feature of GSIs. Instead of needing to manually concatenate and backfill values, the DynamoDB service can do that for you. Instead of writing a process to iterate over your table and update every item with your new attribute for a new partition key and sort key in a new GSI, you tell AWS which attributes you want used in the keys and in what order, and AWS handles it. AWS doesn't actually add a new attribute, but instead lets you retrieve items based on multiple attributes.

AWS uses TOURNAMENT#WINTER2024#REGION#NA-EAST as an example of a manually built composite key, representing scores from a game tournament. With a multi-attribute key schema, you skip building that string. Instead, you specify up to four attributes for your partition key and up to four for your sort key. In this example, tournamentId and region make up the partition key. When reading data from the GSI, instead of referencing an attribute you defined with a composite value, you specify the individual attributes as part of your query. This keeps your data looking cleaner and saves you a headache.

Without this feature, you would define an item like the code below, with composite keys stored as extra attributes.

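// Values used to build both the item and its synthetic keys
const matchId = 'match-001';
const tournamentId = 'WINTER2024';
const region = 'NA-EAST';
const round = 'SEMIFINALS';
const bracket = 'UPPER';

const item = {
  matchId,
  tournamentId,
  region,
  round,
  bracket,
  player1Id: '101',
  // Synthetic keys needed for GSI
  GSI_PK: `TOURNAMENT#${tournamentId}#REGION#${region}`,
  GSI_SK: `${round}#${bracket}#${matchId}`,
};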

With multi-attribute keys, you define the key attributes on your GSI and use your item as-is, without extra concatenation.

const item = {
  matchId: 'match-001',
  tournamentId: 'WINTER2024',
  region: 'NA-EAST',
  round: 'SEMIFINALS',
  bracket: 'UPPER',
  player1Id: '101',
  matchDate: '2024-01-18',
};

You are limited to four attributes in the partition key and four attributes in the sort key. This applies only to global secondary indexes (GSIs). It is not available for your base table or local secondary indexes (LSIs).
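Here is a sketch of what the index definition and a query against it might look like for the tournament example. It reflects my understanding of the feature: the GSI key schema lists several HASH and several RANGE entries in order, and the query names each attribute in the key condition. Treat the exact request shape as an assumption and check it against the current AWS documentation. The index name, table name, and choice of sort key attributes (mirroring the round, bracket, matchId order from the concatenated version) are hypothetical.

```javascript
// GSI definition with a multi-attribute key schema (shape is an assumption
// based on the announcement; verify against the current API reference).
const tournamentRegionIndex = {
  IndexName: 'TournamentRegionIndex', // hypothetical
  KeySchema: [
    { AttributeName: 'tournamentId', KeyType: 'HASH' }, // partition key, part 1
    { AttributeName: 'region', KeyType: 'HASH' },       // partition key, part 2
    { AttributeName: 'round', KeyType: 'RANGE' },       // sort key, part 1
    { AttributeName: 'bracket', KeyType: 'RANGE' },     // sort key, part 2
    { AttributeName: 'matchId', KeyType: 'RANGE' },     // sort key, part 3
  ],
  Projection: { ProjectionType: 'ALL' },
};

// Query for all semifinal matches in one tournament and region.
// Name placeholders are used for every attribute to stay clear of reserved words.
const semifinalsQuery = {
  TableName: 'TournamentMatches', // hypothetical
  IndexName: 'TournamentRegionIndex',
  KeyConditionExpression: '#t = :t AND #r = :r AND #round = :round',
  ExpressionAttributeNames: { '#t': 'tournamentId', '#r': 'region', '#round': 'round' },
  ExpressionAttributeValues: { ':t': 'WINTER2024', ':r': 'NA-EAST', ':round': 'SEMIFINALS' },
};

// Check the four-attribute limits described above.
const count = (type) => tournamentRegionIndex.KeySchema.filter((k) => k.KeyType === type).length;
console.log(count('HASH') <= 4 && count('RANGE') <= 4); // true
```

As with a concatenated sort key, the order of the sort key attributes matters: you query them left to right, so put the attributes you always know first.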

If you want to learn more about what this means for key structures, AWS added a page to their documentation covering multi-attribute key patterns.

Wrapping Up

I'm excited about multi-attribute key schemas because they make DynamoDB easier to use and more flexible without requiring complex backfill work. I've always had to caution people about DynamoDB because of the importance of well-designed key structures. They're still important, but adding new GSIs with different key structures is easier now, and that's a significant improvement.
