DEV Community

V for Volisoft


DynamoDB design patterns considered harmful

Originally published at Volisoft


Overview

Key Takeaway: DynamoDB design patterns are helpful illustrations, not rigid rules.

Effective DynamoDB design requires quantitative analysis of your data and access patterns.
Applying patterns without this analysis risks increased costs and degraded performance.
This article presents an automated DynamoDB design approach to address these critical challenges.

Introduction

The official AWS documentation offers design patterns to guide users migrating from relational databases to DynamoDB.
However, these patterns are technique demonstrations, not prescriptive solutions.
Truly efficient and cost-effective DynamoDB design depends on a deep, quantifiable understanding of your data and anticipated access patterns.

Deeper Dive: Quantification is Key

“Understanding” here means quantification.
For data, this involves knowing the volume of each entity type and the distribution of key values.
For access patterns, it means determining data retrieval volumes and the frequency of each query.
Ignoring these quantitative factors can lead to higher operational costs and reduced application performance.

Case Study: Online Team Game - Different Data, Different Designs

To illustrate the importance of data characteristics, let’s consider an online team game example.
We’ll model two entity types: Game and Stats.
Let’s define their attributes and expected volumes:

Table 1: Game and Stats Entities: Data Volumes and Attributes
| Entity | Count | time+team/name[id] | time | team/name | archived? | game/data | stats/data |
|---|---|---|---|---|---|---|---|
| Game | 1000 | 1 | 15 | 30 | 500 | 2 | 0 |
| Stats | 1000 | 1 | 15 | 30 | 0 | 0 | 2 |

In this online team game scenario, 'time' represents the game timestamp.
Read each attribute column as the average number of records that share one value of that attribute: for example, each team has 30 Game records and 30 Stats records.
Together, the fields 'time' and 'team/name' (shown as 'time+team/name[id]') uniquely identify both Game and Stats entities.
'archived?', 'game/data', and 'stats/data' are additional attributes associated with each entity type.

The application needs to support the following queries.
Understanding the frequency and expected return size of each query is crucial for optimal schema design:

Table 2: Application Query Profile: Frequency and Expected Return Sizes
| Query Name | Entity | Partition Key | Sort Key | Frequency | Return Count |
|---|---|---|---|---|---|
| time->games | Game | time | | 1 | 5 |
| team+time->games | Game | team/name | time | 20 | 1 |
| time+archived?->game | Game | time | archived? | 1 | 5 |
| time->stats | Stats | time | | 1 | 5 |
| team+time->stats | Stats | team/name | time | 1 | 1 |
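
The query profile in Table 2 can be captured as plain data so that frequency and return size are available for analysis. The following is a minimal Python sketch, not the article's tooling: the `QueryProfile` class and the `traffic_share` helper are hypothetical names introduced for illustration, and the Frequency values are used in whatever unit the table intends.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class QueryProfile:
    """One row of Table 2 (class name is illustrative, not from the article)."""
    name: str
    entity: str
    partition_key: str
    sort_key: Optional[str]
    frequency: int
    return_count: int


# Values copied from Table 2; query names follow the spelling used in Table 4.
QUERIES: List[QueryProfile] = [
    QueryProfile("time->games", "Game", "time", None, 1, 5),
    QueryProfile("team+time->games", "Game", "team/name", "time", 20, 1),
    QueryProfile("time+archived?->game", "Game", "time", "archived?", 1, 5),
    QueryProfile("time->stats", "Stats", "time", None, 1, 5),
    QueryProfile("team+time->stats", "Stats", "team/name", "time", 1, 1),
]


def traffic_share(queries: List[QueryProfile]) -> Dict[str, float]:
    """Return each query's share of the total request frequency."""
    total = sum(q.frequency for q in queries)
    return {q.name: q.frequency / total for q in queries}


if __name__ == "__main__":
    shares = traffic_share(QUERIES)
    # Print the most frequent queries first.
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:22s} {share:.0%}")
```

With the frequencies above, team+time->games accounts for 20 of the 24 requests, which is the kind of skew a design decision should take into account.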

Based on these data characteristics and query patterns, a read-optimized indexing schema could be structured as follows:

Table 3: Read-Optimized Schema (Initial Data Assumptions)
| Item count | Table/Index | Partition key | Sort key | Entity |
|---|---|---|---|---|
| 2000 | MAIN | time | team/name | Game |
| 2000 | MAIN | time | team/name | Stats |
| 2000 | GSI1 | team/name | | Game |
| 2000 | GSI1 | team/name | | Stats |

Table 4: Query Costs (Initial Data Assumptions)
| Query | Table/Index | Cost |
|---|---|---|
| time->games | MAIN | 15 |
| time+archived?->game | MAIN | 5 |
| time->stats | MAIN | 15 |
| team+time->games | GSI1 | 20 |
| team+time->stats | GSI1 | 1 |

The Twist: Evolving Data Changes Everything

Software applications evolve, and so does their data.
Initial assumptions about data distribution may become outdated as requirements change.
Optimization priorities can also shift, perhaps focusing on outlier cases rather than typical scenarios.
Let’s examine how revised data assumptions impact database design.

Revised Assumptions: Stats are Scarce

Previously, we assumed an average of 30 Stats records per team.
Now, let's assume we still have 30 Game records per team, but reduce the Stats records tenfold, to just 3 per team.
Changing this single number has significant design implications.

Table 5: Read-Optimized Schema (Revised Data Assumptions)
| Item count | Table/Index | Partition key | Sort key | Entity |
|---|---|---|---|---|
| 2000 | MAIN | time | team/name | Game |
| 2000 | MAIN | team/name | time | Stats |
| 2000 | GSI1 | team/name | | Game |
| 2000 | GSI1 | time | | Stats |

Table 6: Queries (Revised Data Assumptions)
| Query | Table/Index | Cost |
|---|---|---|
| time->games | MAIN | 15 |
| time+archived?->game | MAIN | 5 |
| team+time->stats | MAIN | 1 |
| team+time->games | GSI1 | 20 |
| time->stats | GSI1 | 15 |
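
Comparing Table 4 with Table 6 shows which queries are served by a different table after the revision. The sketch below is one way to make that comparison mechanical; the `moved_queries` helper is a hypothetical name, and the dictionaries simply restate the two tables.

```python
from typing import Dict, Tuple

# Query -> (table or index, cost), copied from Table 4 (initial assumptions).
INITIAL: Dict[str, Tuple[str, int]] = {
    "time->games": ("MAIN", 15),
    "time+archived?->game": ("MAIN", 5),
    "time->stats": ("MAIN", 15),
    "team+time->games": ("GSI1", 20),
    "team+time->stats": ("GSI1", 1),
}

# Query -> (table or index, cost), copied from Table 6 (revised assumptions).
REVISED: Dict[str, Tuple[str, int]] = {
    "time->games": ("MAIN", 15),
    "time+archived?->game": ("MAIN", 5),
    "team+time->stats": ("MAIN", 1),
    "team+time->games": ("GSI1", 20),
    "time->stats": ("GSI1", 15),
}


def moved_queries(
    before: Dict[str, Tuple[str, int]],
    after: Dict[str, Tuple[str, int]],
) -> Dict[str, Tuple[str, str]]:
    """Return {query: (old_table, new_table)} for queries whose table changed."""
    return {
        name: (before[name][0], after[name][0])
        for name in before
        if name in after and before[name][0] != after[name][0]
    }


if __name__ == "__main__":
    for name, (old, new) in moved_queries(INITIAL, REVISED).items():
        print(f"{name}: {old} -> {new}")
```

Only the two Stats queries change tables: time->stats moves from MAIN to GSI1 and team+time->stats moves from GSI1 to MAIN, while the Game queries stay where they were.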

This shift in Stats record volume suggests a revised indexing strategy.
Using 'team/name' as the partition key for Stats records becomes more efficient because the key is now more selective: each team has only 3 Stats records.
A more selective partition key (higher cardinality, with fewer items per key value) improves DynamoDB's ability to distribute data evenly.
Consequently, the query mapping adapts: retrieving an individual Stats record by 'team/name' [PK] and 'time' [SK] can now be executed efficiently on the MAIN table.
Conversely, retrieving Stats records by 'time' is now better served by the GSI1 index.
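
To make the revised mapping concrete, the sketch below builds request parameters in the shape of the DynamoDB Query API for the two Stats queries. It rests on several assumptions that are not in the article: the table name `game-app`, the generic key attribute names `pk`, `sk`, and `gsi1pk`, string-typed keys, and key values stored without any entity prefix. It only constructs dictionaries and does not call AWS.

```python
from typing import Any, Dict, Optional

# Hypothetical names: the article does not specify table, index, or key attributes.
TABLE_NAME = "game-app"
GSI1_NAME = "GSI1"
MAIN_PK, MAIN_SK = "pk", "sk"
GSI1_PK = "gsi1pk"


def build_query(
    pk_attr: str,
    pk_value: str,
    sk_attr: Optional[str] = None,
    sk_value: Optional[str] = None,
    index_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments shaped like a DynamoDB Query request."""
    params: Dict[str, Any] = {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "#pk = :pk",
        # Placeholders avoid clashes with reserved words in attribute names.
        "ExpressionAttributeNames": {"#pk": pk_attr},
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
    }
    if sk_attr is not None and sk_value is not None:
        params["KeyConditionExpression"] += " AND #sk = :sk"
        params["ExpressionAttributeNames"]["#sk"] = sk_attr
        params["ExpressionAttributeValues"][":sk"] = {"S": sk_value}
    if index_name is not None:
        params["IndexName"] = index_name
    return params


def stats_by_team_and_time(team: str, time: str) -> Dict[str, Any]:
    """team+time->stats: served by MAIN (PK = team/name, SK = time)."""
    return build_query(MAIN_PK, team, MAIN_SK, time)


def stats_by_time(time: str) -> Dict[str, Any]:
    """time->stats: served by GSI1 (PK = time, no sort key)."""
    return build_query(GSI1_PK, time, index_name=GSI1_NAME)
```

With boto3 installed and credentials configured, these dictionaries could be passed to the low-level client, for example `boto3.client("dynamodb").query(**stats_by_time(some_time))`, where `some_time` is a placeholder for a stored timestamp value.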

Conclusion: Data-Driven Design is Key

Key Takeaway: Data-driven design is critical.

Different data characteristics suggest different design choices.
Blindly applying patterns can be costly and inefficient.
Embracing a data-centric approach, especially with automated analysis, leads to efficient, cost-effective, and performant DynamoDB database designs.
