DEV Community

Cover image for DynamoDB Professional - part 2 - sparse index
Jakub Stanisławczyk
Jakub Stanisławczyk

Posted on

DynamoDB Professional - part 2 - sparse index

In the previous part, I talked about indexing mechanisms in DynamoDB. I mentioned that there are two types of indexes: Local Secondary Index (LSI) and Global Secondary Index (GSI). I lied to you a little, because there are actually three types. But wait, how is that possible? Is the AWS documentation wrong?

How does a GSI work?

Before I answer that question, let's first go back to the basics. Let's recall how a GSI works. I covered its mechanism in the first part of this series. In short, this index operates under the hood like a separate table. Data is copied into it only when needed. And it is precisely this mechanism that forms the foundation of what can be called a Sparse Index.

Sparse Index

A sparse index is a type of database index where we don't create an entry for every single record in the table - only for the ones we care about. In DynamoDB, this isn't something that's directly supported. But, using the mechanisms we've already talked about, we can simulate it pretty easily.

Standard GSI Index example

Let's take an example. Imagine we're building an app that processes events from temperature sensors. Say each sensor emits an event every second. Each event has an ID, the sensor ID, a createdAt, a value, and status. The status is just a simple enum: OK if the value is within the expected range, or ALARM if it's outside.

Now, suppose we want to add a section to our dashboard that shows only the alarms for a given sensor. It's easy, we create a new GSI called SENSOR_STATUS_GSI, where the partition key is {sensorId}#{status} and the sort key is {createdAt}.

Sensor Status GSI console

This is how it looks in the code. We create separate SENSOR_STATUS_GSI_PK property.

Sensor event create

With this setup, querying events by status becomes super straightforward. When we scan index we can see all the events. In this example I created 10 events for 2 sensors (5 events per sensor)

Sensor status GSI scan

When we want to fetch all the alarms for the sensor, we only need to provide expected GSI PK

Query index

But this isn't really the best solution. After all, what we actually need is to query only the events that are alarms. Now we're storing everything. In a system that might generate millions of events, this would quickly increase the cost of our DynamoDB table.

Instead, we can tweak our GSI a little bit.

Sparse Index example

Let's create new ALARM_GSI index. This one uses {sensorId} as PK and {createdAt} as SK.

Console GSI alarm

I also updated the code with the new GSI PK.

Alarm save

When we query our index by sensor ID, we'll only get corresponding alarms.

Alarm GSI alarms

The magic really starts to happen when we scan our index.

Alarm GSI scan

The difference is immediate. It turns out the index now stores only the alarms, no events with an OK status in sight. The index already contains only the data you want.

Summary

A Sparse Index in DynamoDB is basically a clever hack. Even though DynamoDB doesn't give us native support for this feature, with a bit of creativity we can simulate it using GSIs.

This solution is very useful for large data sets that need to be searched binarily. For example:

  • A job queue where we need to search for jobs with a status of PENDING to process
  • Searching only for available products

The downside of this solution is that future changes must be anticipated. If the business also wants to display recent events with a status of OK, our sparse index won't be able to handle it. This will require either adding another index or modifying the current index and migrating the data.

Sparse indexes are a neat DynamoDB trick. They're not always the right fit, but in scenarios where you only care about a subset of your data, they can save you money and speed up your queries.


Links

👋 LinkedIn
💻 Github

Top comments (0)