This blog post introduces and explores a way of documenting access patterns for DynamoDB in JSON format, for easy updating and use in NodeJS lambdas. I haven't seen this structure used elsewhere, but if it's out there and I missed it, I would love to talk about it!
As I write this on the last day of 2024, I hope those reading will have (or have had) a happy new year!
I've been working with DynamoDB for years now. It seems there is always more to learn and always more to improve when it comes to DynamoDB and using it to ever greater success. I can't give enough praise to people like Alex DeBrie, Rick Houlihan, Khawaja Shams, and Pete Naylor for sharing their knowledge over the years through blog posts, re:Invent presentations, books, chats, and in-person conversations.
Anyone who has been introduced to DynamoDB through the awesome materials already out there probably knows about access patterns. Access patterns are the ways we need to access our data, and they strongly influence how we structure and store that data for both read and write operations. If you have seen any content on this process, you'll probably recognize spreadsheets of DynamoDB data modeling for those access patterns. I started my DynamoDB journey using spreadsheets to model my data, but moved toward JSON instead because it was easier for me to move between the code and the modeling. It also felt a lot easier to refactor the access patterns and move things around in JSON format than in spreadsheets. I am not sharing this JSON-style modeling to replace spreadsheets or advocate against them, but it works better for me, so I thought I would share it and see what others think on the topic. I love sharing knowledge about using technology, and this fits with that. I don't think I can create new content that compares to the mountain of content out there on access patterns and how to create them, so I will level set and say that this blog post is for those who already have a good understanding of access patterns and how to model in DynamoDB.
Without further ado, onto the main content.
First up, we need some data to model. I went back and forth on this, but I think a nationwide kayak rental company might give us some easy wins for modeling access patterns and showcasing the method of documenting them in JSON.
Let's assume there is a kayak rental location in almost all of the states in the United States of America; we'll say 40 locations. Each location has an address, and let's assume these addresses are constant through time and won't be changing. Each location has on average 5 employees; each employee has a job title, compensation, name, address, and other details we won't worry about at this time. Employees are people, and people are likely to move, so their addresses are likely to change over time. Each store has an inventory that changes day to day, and each store has rental records. There are also customers, along with the rentals they currently have out. We could get into a lot more that is needed for actually running a kayak rental franchise, but let's stop here, as this gives us more than enough to start asking questions of our data, and therefore to define our access patterns.
I prefer to have the access pattern questions as close to the data modeling as possible so things aren't spread out and hard to find. I'll switch to JSON for this next part, and have the content implemented there:
{
  "Access Patterns": [
    "get the location of all rental stores",
    "get the inventory of a store",
    "get the current employees of a store",
    "get all employees who have worked at a store",
    "get all stores an employee has worked at",
    "get all rentals a customer has out",
    "get customer rental history for a location",
    "get customer rental history for all locations"
  ]
}
For this blog post, I don't think we will get through everything, so let's pick some topics and start modeling! First up, let's talk about Single Table Design, where all our items live in the same DynamoDB table. This means we need to track our data modeling in a way that lets all our items exist at the same time without conflicting. Being thorough in documenting the access patterns and the data models will be key to keeping things working well.

The access patterns we have defined need a brief introduction to cardinality. Traditional DynamoDB data modeling is extremely good at creating patterns for data with 1:1, many:1, and many:many relationships, given that on the many:many relationships each pairing of items only occurs once. Item collections model 1:1, many:1, and many(1):(1)many relationships well, and there's a lot out there on them. Relationships in many(1..n):(1..m)many situations, where each item can have any number of relationships to any other item one or numerous times, are really hard to model with item collections, index overloading, or global secondary indexes on single items.

This is getting complex to talk about abstractly, so let's give a real-world example: an employee has worked for the company for 20 years. They have moved to a different location about once a year, across 10 different locations through promotions or lateral moves, and they have worked at some of the same locations a few times; say they have worked at location "A" 5 different times. In another situation, an employee could live close to 2 stores and might be working at both at the same time. This presents an issue when you need to define the same relationship more than once.

Using reverse PK/SK lookups (SK as the partition key and PK as the sort key in a global secondary index) can work quite well for many(1):(1)many relationships. I've had issues with them in many(1..n):(1..m)many situations and have moved on to other patterns. For those, I have found Materialized Graphs are an awesome pattern that keeps things generic, easy to understand, and powerful for querying. In Alex DeBrie's "The DynamoDB Book" there is a section on Materialized Graphs, and I would highly recommend checking it out if you haven't already. For the modeling today, we'll make use of them for some situations, and keep to traditional item collections and index overloading for others.
Long ago I made the decision to keep the indexes and keys generic. This cuts off some access patterns that are perfectly valid, but it also allows me to set things up once and stay on that path without having to redo things or add customizations and worry about how they will affect previous data modeling. Thus, I have the primary index (what uniquely identifies each item) set to use "PK" as the partition key string and "SK" as the sort key string. I don't use local secondary indexes. I set up my tables to start with 6 Global Secondary Indexes, as this is generally 1 more GSI than I have needed in most projects as they grow. Each Global Secondary Index is set up to use PK{x} and SK{x} as its partition and sort keys. Because Global Secondary Indexes are sparse, and will only be populated if the item contains the PK{x} and SK{x} attributes, I can add them up front without being penalized in performance or cost when starting out, and they will be there when I need them. Because I overload the Global Secondary Indexes, I don't project specific fields into each one; I use the default of all fields being projected into the Global Secondary Index shadow table.
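As a sketch of that setup, here is roughly what the CreateTableCommand input for @aws-sdk/client-dynamodb could look like, abbreviated to the first two of the six GSIs (the table name and billing mode are placeholder assumptions, not part of the pattern):

```javascript
// Sketch of a CreateTableCommand input following the generic PK/SK and
// PK{x}/SK{x} conventions. Only GSI1 and GSI2 are shown; GSI3 through GSI6
// follow the exact same shape. Table name and billing mode are assumptions.
const buildTableDefinition = (tableName) => ({
  TableName: tableName,
  BillingMode: "PAY_PER_REQUEST",
  AttributeDefinitions: [
    { AttributeName: "PK", AttributeType: "S" },
    { AttributeName: "SK", AttributeType: "S" },
    { AttributeName: "PK1", AttributeType: "S" },
    { AttributeName: "SK1", AttributeType: "S" },
    { AttributeName: "PK2", AttributeType: "S" },
    { AttributeName: "SK2", AttributeType: "S" },
  ],
  KeySchema: [
    { AttributeName: "PK", KeyType: "HASH" },
    { AttributeName: "SK", KeyType: "RANGE" },
  ],
  GlobalSecondaryIndexes: [1, 2].map((x) => ({
    IndexName: `GSI${x}`,
    KeySchema: [
      { AttributeName: `PK${x}`, KeyType: "HASH" },
      { AttributeName: `SK${x}`, KeyType: "RANGE" },
    ],
    // Default projection: all fields copied into the GSI shadow table.
    Projection: { ProjectionType: "ALL" },
  })),
});
```

Because the GSIs are sparse, defining all of them up front costs nothing until items actually carry the PK{x}/SK{x} attributes.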
Alright, now that we have those two paragraphs of initial explanations and level setting out of the way, let's start modeling some basic entities.
{
  "store": [
    {
      "PK": "${version}#store#storeULID#${storeULID}",
      "SK": {
        "metadata": {
          "entityType": "storeMetadata"
        }
      }
    }
  ],
  "person": [
    {
      "PK": "${version}#person#personULID#${personULID}",
      "SK": {
        "metadata": {
          "entityType": "personMetadata"
        }
      }
    }
  ]
}
Here we have store and person. I decided to go with person as an entity because we have both employees and customers, so this allows us to model both. We may or may not stay with this, but for starters, this is a decision I can support. I will also be using ULIDs. ULIDs are like UUIDs, but encode time in a way that makes them sortable based on when they were created.
For each PK, I have made a habit of including a version in case I need to switch to a new version of the data in the future while still working through the old data. I have yet to really need this, but I've kept it. There are situations where you can model in the version to keep the latest items as they are updated, applying changes to each one by changing the version on the fly, but I won't be covering that here.
For each category, often entities, I create an entry in a JSON dictionary or object with a title. This is something we can refer to, but it's not too strict on naming and could change on a whim if something else makes more sense. For each category of item (e.g. store and person) I have an array of objects that are one PK and at least one SK. The PK is a string; the SK is a dictionary, with each unique SK pattern as a key. For now there are only two unique PK/SK pairing patterns, but we'll build more. Each unique PK/SK pair pattern is one entity, and we have the "entityType" field to give it a unique entity name. At one point in time I would run queries, parse the PK and SK, and use logic to figure out which entity each item was. This worked well, but it was pretty confusing to read through, had a lot of boilerplate code, and didn't help much if you were working with the items in the AWS DynamoDB Web Console, without the assistance of those helper functions. Thus, I switched to the rule that all items in the DynamoDB table will have an entity type. We now have a storeMetadata entity, as well as a personMetadata entity.
Next up, let's add employees.
{
  "store": [
    {
      "PK": "${version}#store#storeULID#${storeULID}",
      "SK": {
        "metadata": {
          "entityType": "storeMetadata"
        },
        "employee#metadata#personULID#${personULID}": {
          "entityType": "storeEmployee"
        }
      }
    }
  ]
}
Here we use the personULID, and we'll populate the item with just a few fields like name and id, things that won't change often (and when they do, it's a quick update). We will add this item as employees are hired and remove it when they no longer work at the store.
Next let's add inventory. It's very similar to employees, but we'll go ahead and take the hit and store all the metadata for each item, even if duplicated, for each item in inventory. Later we could move these out into their own items, like people, but for now, for a rental company, there are benefits to tracking details on each item and having some duplication of those details. Having previously worked in retail for 11 years, I can tell you that even within specific products there are often countless variations that can make a single deduplicated lookup item useless.
{
  "store": [
    {
      "PK": "${version}#store#storeULID#${storeULID}",
      "SK": {
        "metadata": {
          "entityType": "storeMetadata"
        },
        "employee#metadata#personULID#${personULID}": {
          "entityType": "storeEmployee"
        },
        "inventory#metadata#inventoryULID#${inventoryULID}": {
          "entityType": "storeInventoryItem"
        }
      }
    }
  ]
}
Let's review the access patterns and see how the current model matches up and which queries it supports. So far it's 2 out of 8: "get the inventory of a store" and "get the current employees of a store". For the inventory of a store, we query with PK equal to "${version}#store#storeULID#${storeULID}" and SK begins with "inventory#metadata#inventoryULID#". For this query we need the version and the storeULID. The query for the current employees of a store is very similar: PK equal to "${version}#store#storeULID#${storeULID}" and SK begins with "employee#metadata#personULID#".
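In a NodeJS lambda, those two queries share a shape, so a sketch of a helper that builds the query input might look like this (the table name "kayak-rentals" is a placeholder assumption; the resulting object would be passed to QueryCommand from @aws-sdk/lib-dynamodb):

```javascript
// Build a DynamoDB Query input for an item-collection lookup on a store.
// The same helper serves both "get the inventory of a store" and
// "get the current employees of a store" by swapping the SK prefix.
// TableName is a placeholder; wire the result into QueryCommand from
// @aws-sdk/lib-dynamodb in a real handler.
function storeItemsByPrefix({ version, storeULID, skPrefix }) {
  return {
    TableName: "kayak-rentals",
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :sk)",
    ExpressionAttributeValues: {
      ":pk": `${version}#store#storeULID#${storeULID}`,
      ":sk": skPrefix,
    },
  };
}

// "get the inventory of a store" (hypothetical ULID for illustration)
const inventoryQuery = storeItemsByPrefix({
  version: "v1",
  storeULID: "exampleStoreULID",
  skPrefix: "inventory#metadata#inventoryULID#",
});

// "get the current employees of a store"
const employeeQuery = storeItemsByPrefix({
  version: "v1",
  storeULID: "exampleStoreULID",
  skPrefix: "employee#metadata#personULID#",
});
```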
Now, let's use a Global Secondary Index (GSI) to grab all the stores. By adding PK1 and SK1 to the storeMetadata, we can query by PK1 on the GSI1 index and it will give us all the stores. This is where the JSON really starts to shine over spreadsheets for my style: each entity has its Global Secondary Indexes explicitly defined as they might appear (templated) in DynamoDB with JSON view on. To me it makes it very easy to see what the raw data will look like, and we can copy and paste many parts into our code where needed. Often, we can paste the template strings directly into NodeJS template literals and define the variables they use; e.g. "storeULID#${storeULID}" just requires the variable storeULID to be in scope for the string to be completed. Notice that now only the metadata entity has the GSI1 values; the employee and inventory items do not.
{
  "store": [
    {
      "PK": "${version}#store#storeULID#${storeULID}",
      "SK": {
        "metadata": {
          "entityType": "storeMetadata",
          "PK1": "${version}#stores",
          "SK1": "storeULID#${storeULID}",
          "_commentGSI1": "GSI to grab all the stores at once."
        },
        "employee#metadata#personULID#${personULID}": {
          "entityType": "storeEmployee"
        },
        "inventory#metadata#inventoryULID#${inventoryULID}": {
          "entityType": "storeInventoryItem"
        }
      }
    }
  ]
}
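The corresponding query for "get the location of all rental stores" can be sketched like this (again, the table name is a placeholder; the object would feed a QueryCommand from @aws-sdk/lib-dynamodb):

```javascript
// "get the location of all rental stores": every storeMetadata item shares
// the GSI1 partition key "${version}#stores", so one Query on GSI1 returns
// them all. TableName is a placeholder assumption.
function allStoresQuery(version) {
  return {
    TableName: "kayak-rentals",
    IndexName: "GSI1",
    KeyConditionExpression: "PK1 = :pk1",
    ExpressionAttributeValues: { ":pk1": `${version}#stores` },
  };
}
```

Because SK1 is "storeULID#${storeULID}" and ULIDs are time-sortable, the results come back ordered by when each store was created.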
Now let's get into how we might solve the many(1..n):(1..m)many problem with the Materialized Graph pattern. In this pattern we create edges between nodes: a relationship item in the database that points to both items and can have some properties on it. The nodes are the items we want to relate together, e.g. a person and a company, and the edges are the relationships, e.g. Person A worked at Company B. This pattern is set up using a pair of Global Secondary Indexes, where each one allows you to query the inverse of the other. Let's look at how this might be set up with a relationship grouping.
{
  "relationships": [
    {
      "PK": "${version}#employment#employmentULID#${employmentULID}",
      "SK": {
        "metadata": {
          "entityType": "employmentRelationship",
          "PK2": "${version}#employment#${xULID}",
          "SK2": "${version}#employment#${yULID}",
          "_commentGSI2": "Can query by xULID, and get all employment for x. Company location is x, person is y",
          "PK3": "${version}#employment#${yULID}",
          "SK3": "${version}#employment#${xULID}",
          "_commentGSI3": "Can query by yULID, and get all employment for y. Company location is x, person is y"
        }
      }
    }
  ]
}
The key part of this is the two Global Secondary Indexes, GSI2 and GSI3. Each is a flipped version of the other. We could put these into a specific Global Secondary Index that just swaps them for us, but I find this pattern is just fine until you get to large scale. We define that in employment relationships, x will be the company location and y will be the person. Any time a person starts working at a Kayak Rental location, they will have one of these items created. The item can also contain many fields such as job position, salary, date range of employment, etc. For now we won't worry about filling those in; however, because we are using ULIDs, we can sort on them and know when an employee started at a location compared to other locations, or, vice versa, the general order in which employees at a location were hired.
Now that we have this Materialized Graph pattern, we can build some queries to satisfy more of our access patterns. For "get all employees who have worked at a store", we query GSI2 with PK2 "${version}#employment#${xULID}", with xULID being the store location. This gives us all the employment records of that store. We could duplicate fields from person and store on the relationship object (things unlikely to change much) and that could be the end of it. The other option is to loop through the results and run a query for each person to get their full information. This can get expensive at scale, but it's still very efficient and does scale, and DynamoDB lets you send numerous queries in parallel, so our options are open here. For "get all stores an employee has worked at", we do the exact same thing but use GSI3 with PK3 "${version}#employment#${yULID}", where yULID is the person's ULID. For this one, we could also just query all stores once and loop through them locally for lookups, bypassing the need to make 40+ queries for each store's information.
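Those two inverse lookups can be sketched as a pair of query builders (table name is a placeholder; the objects would feed QueryCommand from @aws-sdk/lib-dynamodb):

```javascript
// Materialized-graph lookups on the employment relationship items.
// GSI2 answers "get all employees who have worked at a store" (x = store);
// GSI3 answers "get all stores an employee has worked at" (y = person).
// TableName is a placeholder assumption.
function employmentByStore(version, storeULID) {
  return {
    TableName: "kayak-rentals",
    IndexName: "GSI2",
    KeyConditionExpression: "PK2 = :pk",
    ExpressionAttributeValues: { ":pk": `${version}#employment#${storeULID}` },
  };
}

function employmentByPerson(version, personULID) {
  return {
    TableName: "kayak-rentals",
    IndexName: "GSI3",
    KeyConditionExpression: "PK3 = :pk",
    ExpressionAttributeValues: { ":pk": `${version}#employment#${personULID}` },
  };
}
```

The symmetry is the point: the two functions differ only in which index they target and which ULID goes into the partition key.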
We have satisfied 5 of the 8 access patterns. Now we will focus on the rental invoices and history. We will essentially duplicate the employment relationship, and change it to meet the needs of rentals. We will also go back to item collections of the store, where the current rentals out will be kept. First, here is the update to the store item collection.
{
  "store": [
    {
      "PK": "${version}#store#storeULID#${storeULID}",
      "SK": {
        "metadata": {
          "entityType": "storeMetadata",
          "PK1": "${version}#stores",
          "SK1": "storeULID#${storeULID}",
          "_commentGSI1": "GSI to grab all the stores at once."
        },
        "employee#metadata#personULID#${personULID}": {
          "entityType": "storeEmployee"
        },
        "inventory#metadata#inventoryULID#${inventoryULID}": {
          "entityType": "storeInventoryItem"
        },
        "activeRentals#personULID#${personULID}#inventoryULID#${inventoryULID}": {
          "entityType": "storeActiveRental"
        }
      }
    }
  ]
}
Each time a person rents a kayak, the system will create a record with a PK of "${version}#store#storeULID#${storeULID}" and an SK of "activeRentals#personULID#${personULID}#inventoryULID#${inventoryULID}". This allows us to query with the PK and SK begins with "activeRentals#personULID#${personULID}#inventoryULID#" to see all the rentals a person has out, satisfying our access pattern. But wait: this is only for one location. We need to know for all locations, as location isn't a requirement of the access pattern. We can solve this with a GSI. We'll use GSI4 on the same item so we can query across all locations.
{
  "store": [
    {
      "PK": "${version}#store#storeULID#${storeULID}",
      "SK": {
        "metadata": {
          "entityType": "storeMetadata",
          "PK1": "${version}#stores",
          "SK1": "storeULID#${storeULID}",
          "_commentGSI1": "GSI to grab all the stores at once."
        },
        "employee#metadata#personULID#${personULID}": {
          "entityType": "storeEmployee"
        },
        "inventory#metadata#inventoryULID#${inventoryULID}": {
          "entityType": "storeInventoryItem"
        },
        "activeRentals#personULID#${personULID}#inventoryULID#${inventoryULID}": {
          "entityType": "storeActiveRental",
          "PK4": "${version}#activeRentals#personULID#${personULID}",
          "SK4": "inventoryULID#${inventoryULID}",
          "_commentGSI4": "GSI for looking up all rentals across all locations a person has out."
        }
      }
    }
  ]
}
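Here is a sketch of both flavors of "get all rentals a customer has out": the per-store item-collection query and the cross-store GSI4 query (table name is a placeholder; the objects would feed QueryCommand from @aws-sdk/lib-dynamodb):

```javascript
// Active rentals for one person at ONE store: query the store's item
// collection with a begins_with on the SK. TableName is a placeholder.
function activeRentalsAtStore(version, storeULID, personULID) {
  return {
    TableName: "kayak-rentals",
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :sk)",
    ExpressionAttributeValues: {
      ":pk": `${version}#store#storeULID#${storeULID}`,
      ":sk": `activeRentals#personULID#${personULID}#inventoryULID#`,
    },
  };
}

// Active rentals for one person across ALL stores: GSI4 collects the
// storeActiveRental items under a per-person partition key, regardless of
// which store's item collection each one lives in.
function activeRentalsEverywhere(version, personULID) {
  return {
    TableName: "kayak-rentals",
    IndexName: "GSI4",
    KeyConditionExpression: "PK4 = :pk",
    ExpressionAttributeValues: {
      ":pk": `${version}#activeRentals#personULID#${personULID}`,
    },
  };
}
```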
Now remember, we could start to overload each Global Secondary Index and use it for different things, as long as the key values are distinct and we never need both uses on one entity. It's tempting to start doing that, but I generally find that going up to 6 Global Secondary Indexes, without reusing each one, satisfies enough access patterns.
All that is left in our initial access patterns is to set up rental history. We can again use the Materialized Graph pattern to meet this. A person at a location may rent out the same kayak countless times over the years. That kayak could also be moved to another location, and the person loves it so much they drive there to rent it. More than one person can rent the same kayak, although not at the same time. So we have our many-to-many relationship, with a cardinality on both sides of 0 to n (a database entry wouldn't be created at 0). Back to the rental records and access patterns; we can use this to handle it:
{
  "relationships": [
    {
      "PK": "${version}#rental#rentalULID#${rentalULID}",
      "SK": {
        "metadata": {
          "entityType": "rentalRelationship",
          "PK2": "${version}#rentalLocationPerson#${xULID}",
          "SK2": "${version}#rentalPersonLocation#${yULID}",
          "_commentGSI2": "Can query by xULID, and get all rentals for x. Company location is x, person is y",
          "PK3": "${version}#rentalPersonLocation#${yULID}",
          "SK3": "${version}#rentalLocationPerson#${xULID}",
          "_commentGSI3": "Can query by yULID, and get all rentals for y. Company location is x, person is y",
          "PK5": "${version}#rentalInventoryPerson#${xULID}",
          "SK5": "${version}#rentalPersonInventory#${yULID}",
          "_commentGSI5": "Can query by xULID, and get all rentals for x. Inventory is x, person is y",
          "PK6": "${version}#rentalPersonInventory#${yULID}",
          "SK6": "${version}#rentalInventoryPerson#${xULID}",
          "_commentGSI6": "Can query by yULID, and get all rentals for y. Inventory is x, person is y"
        }
      }
    }
  ]
}
In this case, we have added two more Global Secondary Indexes to handle the second set of relationships in a given rental transaction. This allows us to link a person, a company location, and the inventory being rented.
Let's go ahead and define the queries for the last two access patterns. "get customer rental history for a location" can be satisfied by querying Global Secondary Index 3 with PK3 "${version}#rentalPersonLocation#${yULID}" and SK3 "${version}#rentalLocationPerson#${xULID}", where yULID is the person and xULID is the company location. For "get customer rental history for all locations", we simply change our strategy to using just PK3 "${version}#rentalPersonLocation#${yULID}", where yULID is the person; this returns all rentals by a person across all locations.
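Those last two patterns differ only in whether the sort key is pinned, which a sketch makes explicit (table name is a placeholder; the objects would feed QueryCommand from @aws-sdk/lib-dynamodb):

```javascript
// "get customer rental history for a location": pin BOTH GSI3 keys so only
// rentals by this person at this location come back. TableName is a
// placeholder assumption.
function rentalHistoryAtLocation(version, personULID, storeULID) {
  return {
    TableName: "kayak-rentals",
    IndexName: "GSI3",
    KeyConditionExpression: "PK3 = :pk AND SK3 = :sk",
    ExpressionAttributeValues: {
      ":pk": `${version}#rentalPersonLocation#${personULID}`,
      ":sk": `${version}#rentalLocationPerson#${storeULID}`,
    },
  };
}

// "get customer rental history for all locations": drop the sort key
// condition and the same index returns every rental by the person.
function rentalHistoryEverywhere(version, personULID) {
  return {
    TableName: "kayak-rentals",
    IndexName: "GSI3",
    KeyConditionExpression: "PK3 = :pk",
    ExpressionAttributeValues: {
      ":pk": `${version}#rentalPersonLocation#${personULID}`,
    },
  };
}
```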
Wrapping up, here's all the JSON put together:
{
  "Access Patterns": [
    "get the location of all rental stores",
    "get the inventory of a store",
    "get the current employees of a store",
    "get all employees who have worked at a store",
    "get all stores an employee has worked at",
    "get all rentals a customer has out",
    "get customer rental history for a location",
    "get customer rental history for all locations"
  ],
  "store": [
    {
      "PK": "${version}#store#storeULID#${storeULID}",
      "SK": {
        "metadata": {
          "entityType": "storeMetadata",
          "PK1": "${version}#stores",
          "SK1": "storeULID#${storeULID}",
          "_commentGSI1": "GSI to grab all the stores at once."
        },
        "employee#metadata#personULID#${personULID}": {
          "entityType": "storeEmployee"
        },
        "inventory#metadata#inventoryULID#${inventoryULID}": {
          "entityType": "storeInventoryItem"
        },
        "activeRentals#personULID#${personULID}#inventoryULID#${inventoryULID}": {
          "entityType": "storeActiveRental",
          "PK4": "${version}#activeRentals#personULID#${personULID}",
          "SK4": "inventoryULID#${inventoryULID}",
          "_commentGSI4": "GSI for looking up all rentals across all locations a person has out."
        }
      }
    }
  ],
  "person": [
    {
      "PK": "${version}#person#personULID#${personULID}",
      "SK": {
        "metadata": {
          "entityType": "personMetadata"
        }
      }
    }
  ],
  "relationships": [
    {
      "PK": "${version}#employment#employmentULID#${employmentULID}",
      "SK": {
        "metadata": {
          "entityType": "employmentRelationship",
          "PK2": "${version}#employment#${xULID}",
          "SK2": "${version}#employment#${yULID}",
          "_commentGSI2": "Can query by xULID, and get all employment for x. Company location is x, person is y",
          "PK3": "${version}#employment#${yULID}",
          "SK3": "${version}#employment#${xULID}",
          "_commentGSI3": "Can query by yULID, and get all employment for y. Company location is x, person is y"
        }
      }
    },
    {
      "PK": "${version}#rental#rentalULID#${rentalULID}",
      "SK": {
        "metadata": {
          "entityType": "rentalRelationship",
          "PK2": "${version}#rentalLocationPerson#${xULID}",
          "SK2": "${version}#rentalPersonLocation#${yULID}",
          "_commentGSI2": "Can query by xULID, and get all rentals for x. Company location is x, person is y",
          "PK3": "${version}#rentalPersonLocation#${yULID}",
          "SK3": "${version}#rentalLocationPerson#${xULID}",
          "_commentGSI3": "Can query by yULID, and get all rentals for y. Company location is x, person is y",
          "PK5": "${version}#rentalInventoryPerson#${xULID}",
          "SK5": "${version}#rentalPersonInventory#${yULID}",
          "_commentGSI5": "Can query by xULID, and get all rentals for x. Inventory is x, person is y",
          "PK6": "${version}#rentalPersonInventory#${yULID}",
          "SK6": "${version}#rentalInventoryPerson#${xULID}",
          "_commentGSI6": "Can query by yULID, and get all rentals for y. Inventory is x, person is y"
        }
      }
    }
  ]
}
There is much more we could model, and we could most certainly model some of these better, but it is New Year's Eve and I have run out of time to publish this in 2024 as I promised myself I would. I shared what I wanted to share about putting access patterns in JSON format, and hopefully the benefits of doing it that way come through. I find it works best for me, but I don't claim it is the best way. The current structure still lacks some possible (and common) patterns, so it could be improved if desired. This structure has served me well as it has evolved over the last 7 years, and it works quite well for me! I hope those who have made it this far have also appreciated the walkthrough of another example of evaluating a project, producing the access patterns needed, and implementing those as concrete, DynamoDB-ready implementations.
Please feel free to connect with me on LinkedIn, or to join the Believe in Serverless discord where there are over 1000 serverless enthusiasts ranging from just a few days in to people who literally wrote the book on the subject.
Happy New Year!