DynamoDB: The 'Access Patterns' Mindset

#dynamodb #aws #serverless

Software development is a lifelong learning process. Early in my career, most of my database work involved SQL databases, and I used many SQL administration tools, such as DBeaver, SQLiteStudio, and MySQL Workbench. When I first began working with DynamoDB, I didn’t realise early enough that I needed a mindset shift, and I am sure quite a few developers have fallen into the same trap. In designing my DynamoDB tables, I treated queries and access patterns as afterthoughts and focused on mappings and relationships between entities, just as I would with a relational database. Persistently applying these old SQL habits often left me with inefficient scans, which was deeply frustrating.

When working with DynamoDB, you need to shift your mindset, especially if you have prior experience with relational database systems. When creating a DynamoDB table, the focus should be on access patterns first and data structure second.

The goal of this article is to shed light on what “designing for access patterns” means. I will explain its importance and why you cannot tap into the numerous benefits of DynamoDB without this mindset shift.

How does DynamoDB differ from relational database systems?

For starters, and for the benefit of those who do not know what DynamoDB is, Amazon DynamoDB is a fully managed NoSQL database that is highly available and performant, and it is capable of processing millions of requests per second. As a user, you do not have to worry about patching and maintenance; AWS handles this.

With relational databases like PostgreSQL, MySQL, and MariaDB, data can be queried in many ways. SQL-based databases rely heavily on JOINs and indexes, which lets you normalise your data and avoid duplication. DynamoDB operates differently. As mentioned earlier, DynamoDB is for workloads that need a high level of performance and scalability. Reads and writes must stay fast regardless of how many items the table holds (it can scale to trillions). This is achieved by looking items up directly by their Primary Key and avoiding unnecessary scans, which read every item in the table and can be costly.
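To make the difference between a key lookup and a scan concrete, here is a small sketch. It is a toy in-memory model written for this article, not DynamoDB itself: the `get_item` and `scan` helpers and the sample data are made up for illustration.

```python
# Toy in-memory "table": items stored in a dict keyed by their primary key.
# This only illustrates the idea; it is not how DynamoDB is implemented.
table = {
    f"user-{i}": {"user_id": f"user-{i}", "age": 20 + i % 50}
    for i in range(1000)
}


def get_item(table, user_id):
    """Key lookup: jumps straight to a single item by its primary key."""
    return table.get(user_id)


def scan(table, predicate):
    """Scan: examines every item and keeps the ones that match."""
    examined = 0
    matches = []
    for item in table.values():
        examined += 1
        if predicate(item):
            matches.append(item)
    return matches, examined


item = get_item(table, "user-7")
matches, examined = scan(table, lambda it: it["user_id"] == "user-7")
print(item == matches[0], examined)
```

Both calls find the same item, but the scan had to examine every item in the table to do it, and that work grows with the size of the table.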

Look at it this way: in relational databases, you create tables, and the database adjusts to whatever queries you feed it to give you a result. In DynamoDB, on the flip side, your table won’t conform to your queries; your queries must fit the data model and access patterns you defined up front.

DynamoDB’s access pattern mindset:

“These are the queries I have: what access patterns and data models should I create?”

Relational database:

“What queries can I write with the table/tables I have?”

When working with DynamoDB, it is best practice to list our access patterns (the entry points into our data) before building. Without identifying these patterns, we wouldn't be able to decide on the Partition Key, Sort Key, Local Secondary Indexes (LSI), and Global Secondary Indexes (GSI) that we need.
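One way to do this is to write the access patterns down as data before touching the table. The sketch below is only a rule of thumb, and every name in it is an assumption made for illustration: it supposes a users-orders table keyed on `user_id` and `order_id`, plus an `order_item` attribute. A pattern that matches the table keys is served by the base table, one that keeps the partition key but changes the sort key is a candidate for an LSI, and one that needs a different partition key is a candidate for a GSI.

```python
# Assumed base table keys: (partition key, sort key).
TABLE_KEYS = ("user_id", "order_id")

# Hypothetical access patterns, each mapped to the keys it needs to query on.
ACCESS_PATTERNS = {
    "Get a user's orders": ("user_id", "order_id"),
    "Get the items a user has ordered": ("user_id", "order_item"),
    "Get the items in an order": ("order_id", "order_item"),
}


def served_by(pattern_keys, table_keys=TABLE_KEYS):
    """Rule of thumb for which structure can answer a pattern with a query."""
    if pattern_keys == table_keys:
        return "base table"
    if pattern_keys[0] == table_keys[0]:
        return "LSI"  # same partition key, different sort key
    return "GSI"      # different partition key


plan = {name: served_by(keys) for name, keys in ACCESS_PATTERNS.items()}
for name, structure in plan.items():
    print(f"{name}: {structure}")
```

A GSI could also serve the second pattern; the point is only that the list of patterns drives the choice of keys and indexes, not the other way round.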

Understanding Primary Keys and Indexes

The Primary Key and Indexes are at the core of DynamoDB access patterns. For example, if we were to build a simple e-commerce backend of users and orders using a relational database, we would need to set up various tables for Users, Orders, Items, etc. With DynamoDB, we can model this efficiently in just one table thanks to the Primary Keys and Indexes.

Primary Key

You must first understand how to choose a Primary Key for your DynamoDB table before any further discussions on access patterns. In choosing a Primary Key, there are two options to select from.

Partition Key: When the partition key is used on its own as the primary key, it has to be unique for every item in the table. DynamoDB uses the partition key to decide where an item is stored, so a high-cardinality partition key helps spread data evenly across partitions. E.g. for a user table, a good choice for a partition key would be the user_id (it could be a random UUID) because of its high cardinality. We can have several other attributes, such as name, age, address, etc.

Partition Key + Sort Key: The combination has to be unique for every item in the table. For example, in a users-orders table, the user_id is an ideal candidate for the partition key, and the order_id can serve as the sort key. However, it is important to emphasise that when we have a composite primary key, items are grouped by the partition key and ordered within each group by the sort key. So, a user can have numerous orders, as seen with user_id 5456789094 below.

So, we see the second and third items have the same partition key but different sort keys, and their combination creates a unique primary key.
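As a rough sketch of the composite key, the dictionary below holds the parameters in the shape that boto3's `create_table` call accepts (it can be passed as `client.create_table(**TABLE_DEFINITION)`). The table name, the string key types, and the on-demand billing mode are assumptions for this example, and apart from user_id 5456789094 the sample items are invented.

```python
from collections import defaultdict

# Parameters shaped for boto3's create_table; nothing is sent to AWS here.
TABLE_DEFINITION = {
    "TableName": "users-orders",  # assumed name
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},    # partition key
        {"AttributeName": "order_id", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",  # assumed: on-demand capacity
}

# Made-up sample items: the second and third share a partition key.
items = [
    {"user_id": "1234567890", "order_id": "order-001"},
    {"user_id": "5456789094", "order_id": "order-002"},
    {"user_id": "5456789094", "order_id": "order-003"},
]


def primary_key(item):
    """The composite primary key is the (partition key, sort key) pair."""
    return (item["user_id"], item["order_id"])


# Group the items the way the partition key groups them.
orders_by_user = defaultdict(list)
for item in items:
    orders_by_user[item["user_id"]].append(item["order_id"])

print(len({primary_key(item) for item in items}))  # number of distinct primary keys
print(orders_by_user["5456789094"])                # all orders for one user
```

Every item has a distinct primary key even though two of them share a user_id, and fetching one user's orders only touches that user's group.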

LSI (Local Secondary Index)

This is the first of the two kinds of indexes. It gives us another sort key for querying our DynamoDB table while keeping the same partition key. An LSI must be defined when the table is created; we cannot add one afterwards, and we can have up to 5 LSIs per table. At the moment, we can only query on user_id and order_id. We have no way to query on user_id and order_item. An LSI with order_item as its sort key makes this query possible, so we can get all the items that a particular user has ordered.
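In request form, an LSI for this query might look like the sketch below. The dictionaries follow the shape of the `LocalSecondaryIndexes` parameter of boto3's `create_table` and of the low-level client's `query` call; the index name, table name, string attribute types, and the `ALL` projection are assumptions. The `order_item` attribute would also have to be listed in the table's `AttributeDefinitions`.

```python
# Key schema of the base table (assumed, matching the users-orders example).
TABLE_KEY_SCHEMA = [
    {"AttributeName": "user_id", "KeyType": "HASH"},
    {"AttributeName": "order_id", "KeyType": "RANGE"},
]

# Passed to create_table as LocalSecondaryIndexes, at table creation time.
LOCAL_SECONDARY_INDEXES = [
    {
        "IndexName": "user_id-order_item-index",  # hypothetical name
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},      # same partition key
            {"AttributeName": "order_item", "KeyType": "RANGE"},  # alternative sort key
        ],
        "Projection": {"ProjectionType": "ALL"},  # assumed: copy all attributes
    }
]

# Parameters for client.query(**QUERY_PARAMS): everything one user has ordered,
# returned in order_item order.
QUERY_PARAMS = {
    "TableName": "users-orders",
    "IndexName": "user_id-order_item-index",
    "KeyConditionExpression": "user_id = :uid",
    "ExpressionAttributeValues": {":uid": {"S": "5456789094"}},
}


def partition_key_of(key_schema):
    """Return the name of the HASH (partition) key in a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


# An LSI must reuse the table's partition key, and a table allows at most 5.
for index in LOCAL_SECONDARY_INDEXES:
    assert partition_key_of(index["KeySchema"]) == partition_key_of(TABLE_KEY_SCHEMA)
assert len(LOCAL_SECONDARY_INDEXES) <= 5
```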

GSI (Global Secondary Index)

What if we wanted to get the items that are in a particular order? At the moment, there is no way to do this without a scan, and that’s where GSIs come in. A GSI gives us an alternate primary key that is different from the one we defined on the base table. We can create a GSI after initial table creation, and it lets us query efficiently on attributes that are not part of the base table's primary key. To get the items in an order, we can specify the order_id as the new partition key and the order_item as the new sort key. The user_id, order_quantity, name, and address become regular, non-key attributes in the index.
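The sketch below shows what this could look like. `GSI_UPDATE` follows the shape of boto3's `update_table` parameters for adding an index to an existing on-demand table; the table name, index name, string attribute types, and `ALL` projection are assumptions. The `build_index` helper is a toy in-memory stand-in for what the index does (regroup the same items under a different key), and the sample items are made up.

```python
from collections import defaultdict

# Parameters for client.update_table(**GSI_UPDATE). An on-demand table is
# assumed, so no ProvisionedThroughput is given for the index.
GSI_UPDATE = {
    "TableName": "users-orders",
    "AttributeDefinitions": [
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "order_item", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [
        {
            "Create": {
                "IndexName": "order_id-order_item-index",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "order_id", "KeyType": "HASH"},
                    {"AttributeName": "order_item", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        }
    ],
}

# Made-up items as they sit in the base table (keyed on user_id + order_id).
items = [
    {"user_id": "1234567890", "order_id": "order-001", "order_item": "mouse", "order_quantity": 1},
    {"user_id": "5456789094", "order_id": "order-002", "order_item": "keyboard", "order_quantity": 1},
    {"user_id": "5456789094", "order_id": "order-003", "order_item": "monitor", "order_quantity": 2},
]


def build_index(items, partition_key, sort_key):
    """Toy index: group items by one attribute, sort each group by another."""
    index = defaultdict(list)
    for item in items:
        index[item[partition_key]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[sort_key])
    return index


gsi = build_index(items, "order_id", "order_item")
print(gsi["order-003"])  # looked up by order_id, no user_id needed
```

The lookup by `order_id` no longer needs the `user_id`, which is exactly what the base table's key could not offer.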

Wrapping Up!

When working with DynamoDB, data modelling is more about access patterns and query entry points than tables and how they are related. It's more about how your application reads and writes data. Adopting the access pattern mindset makes it possible to enjoy the immense benefits of DynamoDB as a serverless and scalable database.