Originally published at Volisoft
Table of Contents
- Overview
- Introduction
- Deeper Dive: Quantification is Key
- Case Study: Online Team Game - Different Data, Different Designs
- The Twist: Evolving Data Changes Everything
- Revised Assumptions: Stats are Scarce
- Conclusion: Data-Driven Design is Key
Overview
Key Takeaway: DynamoDB design patterns are helpful illustrations, not rigid rules.
Effective DynamoDB design requires quantitative analysis of your data and access patterns.
Applying patterns without this analysis risks increased costs and degraded performance.
This article presents an automated DynamoDB design approach to address these critical challenges.
Introduction
Official AWS documentation offers DynamoDB design patterns to guide users migrating from relational databases to NoSQL, and to DynamoDB in particular.
However, these patterns are technique demonstrations, not prescriptive solutions.
Truly efficient and cost-effective DynamoDB design depends on a deep, quantifiable understanding of your data and anticipated access patterns.
Deeper Dive: Quantification is Key
“Understanding” here means quantification.
For data, this involves knowing the volume of each entity type and the distribution of key values.
For access patterns, it means determining data retrieval volumes and the frequency of each query.
Ignoring these quantitative factors can lead to higher operational costs and reduced application performance.
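As a rough illustration of what "quantification" can mean in practice, here is a minimal sketch that counts records per entity type and computes, for each attribute, the average number of records sharing one value. It assumes records are available as Python dicts with an "entity" field; the `profile` helper and the sample data are hypothetical, not part of any AWS tooling.

```python
from collections import Counter, defaultdict


def profile(items, attributes):
    """Count records per entity and the average number of records sharing each attribute value."""
    by_entity = defaultdict(list)
    for item in items:
        by_entity[item["entity"]].append(item)

    report = {}
    for entity, rows in by_entity.items():
        summary = {"count": len(rows)}
        for attr in attributes:
            # How many records carry each distinct value of this attribute.
            groups = Counter(row[attr] for row in rows if attr in row)
            # Average group size; 0 when this entity never has the attribute.
            summary[attr] = sum(groups.values()) / len(groups) if groups else 0
        report[entity] = summary
    return report


# Hypothetical sample: 6 games (2 timestamps x 3 teams) and a single Stats record.
games = [
    {"entity": "Game", "time": t, "team/name": team, "archived?": t == 100}
    for t in (100, 200)
    for team in ("red", "blue", "green")
]
stats = [{"entity": "Stats", "time": 100, "team/name": "red"}]
report = profile(games + stats, ["time", "team/name", "archived?"])
```

A small average group size means a key value is very specific (few records share it); a large one means a query on that value may touch many records.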
Case Study: Online Team Game - Different Data, Different Designs
To illustrate the importance of data characteristics, let’s consider an online team game example.
We’ll model two entity types: Game and Stats.
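Let’s define their attributes and expected volumes. Count is the number of records of each entity type; each attribute column gives the average number of records that share a single value of that attribute, with 0 meaning the entity does not have that attribute: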
Entity | Count | time+team/name[id] | time | team/name | archived? | game/data | stats/data |
---|---|---|---|---|---|---|---|
Game | 1000 | 1 | 15 | 30 | 500 | 2 | 0 |
Stats | 1000 | 1 | 15 | 30 | 0 | 0 | 2 |
In this online team game scenario, 'time' represents the game timestamp.
On average, each team generates 30 Game records and 30 Stats records.
Together, the fields 'time' and 'team/name' (shown as 'time+team/name[id]') uniquely identify each Game and each Stats record.
'archived?', 'game/data', and 'stats/data' are additional attributes associated with each entity type.
The application needs to support the following queries.
Understanding the frequency and expected return size of each query is crucial for optimal schema design:
Query Name | Entity | Partition Key | Sort Key | Frequency | Return Count |
---|---|---|---|---|---|
time->games | Game | time | | 1 | 5 |
team+time->games | Game | team/name | time | 20 | 1 |
time+archived?->game | Game | time | archived? | 1 | 5 |
time->stats | Stats | time | | 1 | 5 |
team+time->stats | Stats | team/name | time | 1 | 1 |
Based on these data characteristics and query patterns, a read-optimized indexing schema could be structured as follows:
:table-cnt | :table | :pk | :sk | :entity |
---|---|---|---|---|
2000 | MAIN | time | team/name | Game |
2000 | MAIN | time | team/name | Stats |
2000 | GSI1 | team/name | | Game |
2000 | GSI1 | team/name | | Stats |
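To make the indexing schema concrete, here is a minimal sketch of DynamoDB CreateTable parameters for a MAIN table with one global secondary index, GSI1. It assumes overloaded, string-typed key attributes named PK, SK, GSI1PK and GSI1SK (hypothetical names; the article does not specify attribute names), on-demand billing, and an ALL projection. The schema table leaves GSI1's sort key blank; this sketch assumes one anyway, because the team+time queries served by GSI1 need a 'time' sort key to fetch a single item.

```python
def main_table_definition(table_name):
    """Build CreateTable parameters: MAIN table keyed on PK/SK plus a GSI1 keyed on GSI1PK/GSI1SK."""
    key_attributes = ["PK", "SK", "GSI1PK", "GSI1SK"]
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand, so no ProvisionedThroughput is needed
        # Only key attributes are declared; all other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"} for name in key_attributes
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
            {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GSI1",
                "KeySchema": [
                    {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI1SK", "KeyType": "RANGE"},  # assumed; blank in the schema table
                ],
                "Projection": {"ProjectionType": "ALL"},  # assumed projection
            }
        ],
    }


params = main_table_definition("GameData")  # "GameData" is a hypothetical table name
```

The resulting dictionary could be passed to boto3 as `boto3.client("dynamodb").create_table(**params)`. Because Game and Stats share the table, their key values need distinct prefixes or formats so the two entity types do not collide.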
:query | :query-tbl | :query-cost |
---|---|---|
time->games | MAIN | 15 |
time+archived?->game | MAIN | 5 |
time->stats | MAIN | 15 |
team+time->games | GSI1 | 20 |
team+time->stats | GSI1 | 1 |
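The query-cost column can be approximated with a simple model: cost = frequency x items read, where a query without a sort-key condition reads every record of its entity under the partition-key value, and a query with a sort-key condition reads only its return count. This is a hypothetical model, not the article's design tool. With the frequencies and return counts from the query table and the per-value counts from the entity table, it yields the same numbers as the query-cost column above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    name: str
    entity: str
    pk: str
    sk: Optional[str]  # None when the query has no sort-key condition
    frequency: int
    return_count: int


# Average records per key value, per entity (from the entity table).
ITEMS_PER_VALUE = {
    "Game": {"time": 15, "team/name": 30},
    "Stats": {"time": 15, "team/name": 30},
}


def query_cost(q):
    """Hypothetical read cost: frequency x items read per query."""
    # Without a sort-key condition, assume the whole entity group under the partition key is read.
    items_read = ITEMS_PER_VALUE[q.entity][q.pk] if q.sk is None else q.return_count
    return q.frequency * items_read


QUERIES = [
    Query("time->games", "Game", "time", None, 1, 5),
    Query("team+time->games", "Game", "team/name", "time", 20, 1),
    Query("time+archived?->game", "Game", "time", "archived?", 1, 5),
    Query("time->stats", "Stats", "time", None, 1, 5),
    Query("team+time->stats", "Stats", "team/name", "time", 1, 1),
]

costs = {q.name: query_cost(q) for q in QUERIES}
```

The model is deliberately crude: it ignores item size, read-capacity rounding, and the cost of maintaining GSI1 on writes.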
The Twist: Evolving Data Changes Everything
Software applications evolve, and so does their data.
Initial assumptions about data distribution may become outdated as requirements change.
Optimization priorities can also shift, perhaps focusing on outlier cases rather than typical scenarios.
Let’s examine how revised data assumptions impact database design.
Revised Assumptions: Stats are Scarce
Previously, we assumed an average of 30 Stats records per team.
Now, let’s assume we still have 30 Game records per team, but dramatically reduce the Stats records to just 3 per team.
This change to a single input assumption has significant design implications.
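One way to see the implication is a simplified "most specific key" heuristic: for each entity, pick as the MAIN partition key the candidate attribute that the fewest records share per value. This is a hypothetical intuition aid, not the article's automated method. It assumes 'time' still groups about 15 records per value in both scenarios, and 'team/name' drops from 30 to 3 for Stats only.

```python
def most_specific_key(items_per_value, candidates):
    """Return the candidate attribute shared by the fewest records per value (most specific)."""
    return min(candidates, key=lambda attr: items_per_value[attr])


CANDIDATES = ["time", "team/name"]

# Average records per key value in the original and revised scenarios.
original = {"Game": {"time": 15, "team/name": 30}, "Stats": {"time": 15, "team/name": 30}}
revised = {"Game": {"time": 15, "team/name": 30}, "Stats": {"time": 15, "team/name": 3}}

main_pk_original = {e: most_specific_key(v, CANDIDATES) for e, v in original.items()}
main_pk_revised = {e: most_specific_key(v, CANDIDATES) for e, v in revised.items()}
```

In both scenarios the result agrees with the MAIN-table partition keys in the schema tables, but the heuristic ignores query frequencies and costs, so it explains the direction of the change rather than replacing a full cost analysis.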
:table-cnt | :table | :pk | :sk | :entity |
---|---|---|---|---|
2000 | MAIN | time | team/name | Game |
2000 | MAIN | team/name | time | Stats |
2000 | GSI1 | team/name | | Game |
2000 | GSI1 | time | | Stats |
:query | :query-tbl | :query-cost |
---|---|---|
time->games | MAIN | 15 |
time+archived?->game | MAIN | 5 |
team+time->stats | MAIN | 1 |
team+time->games | GSI1 | 20 |
time->stats | GSI1 | 15 |
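A DynamoDB table has a single key schema, so storing Game records under 'time' and Stats records under 'team/name' in the same MAIN table implies generic, overloaded key attributes. The sketch below assumes hypothetical attributes PK, SK, GSI1PK and GSI1SK with entity prefixes (TIME#, TEAM#, GAME#, STATS#). The GSI1 sort-key values are also an assumption, since the schema table leaves that column blank.

```python
def to_item(record):
    """Attach hypothetical overloaded keys for the revised design to a Game or Stats record."""
    entity, time, team = record["entity"], record["time"], record["team/name"]
    if entity == "Game":
        # MAIN: pk = time, sk = team/name. GSI1: pk = team/name (sort key assumed: time).
        keys = {"PK": f"TIME#{time}", "SK": f"GAME#{team}",
                "GSI1PK": f"TEAM#{team}", "GSI1SK": f"GAME#{time}"}
    elif entity == "Stats":
        # MAIN: pk = team/name, sk = time. GSI1: pk = time (sort key assumed: team/name).
        keys = {"PK": f"TEAM#{team}", "SK": f"STATS#{time}",
                "GSI1PK": f"TIME#{time}", "GSI1SK": f"STATS#{team}"}
    else:
        raise ValueError(f"unknown entity: {entity!r}")
    # Keep the original attributes alongside the key attributes.
    return {**record, **keys}


# Time values are assumed to be ISO-8601 strings so they sort correctly as text.
game_item = to_item({"entity": "Game", "time": "2024-05-01T18:00Z", "team/name": "red"})
stats_item = to_item({"entity": "Stats", "time": "2024-05-01T18:00Z", "team/name": "red"})
```

The prefixes keep each entity's keys distinguishable within a shared partition, which also allows `begins_with` sort-key conditions to select one entity type.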
This shift in how Stats records are distributed across teams suggests a revised indexing strategy.
For Stats records, 'team/name' is now the more specific partition key: each team has only 3 Stats records, while each 'time' value still groups about 15.
A more specific partition key (higher cardinality, so fewer items share each key value) helps DynamoDB distribute data more evenly across partitions.
Consequently, the query mapping adapts: retrieving individual Stats records by 'team/name' [PK] and 'time' [SK] can now be executed efficiently on the MAIN table.
Conversely, retrieving Stats records by 'time' is now better served by querying the GSI1 index.
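Under hypothetical overloaded keys (PK, SK, GSI1PK, GSI1SK, with prefixes such as TEAM#, TIME# and STATS#), the revised query mapping could be expressed as low-level DynamoDB Query request parameters. Only the key layout and table name are assumptions; `KeyConditionExpression`, `IndexName`, typed `ExpressionAttributeValues` and `begins_with` are standard Query features.

```python
def team_time_stats_request(table_name, team, time):
    """team+time->stats: one Stats item by team and time, served by the MAIN table."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND SK = :sk",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"TEAM#{team}"},
            ":sk": {"S": f"STATS#{time}"},
        },
    }


def time_stats_request(table_name, time):
    """time->stats: all Stats items for one timestamp, served by GSI1."""
    return {
        "TableName": table_name,
        "IndexName": "GSI1",
        # begins_with restricts the query to Stats items if other entities share the partition.
        "KeyConditionExpression": "GSI1PK = :pk AND begins_with(GSI1SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"TIME#{time}"},
            ":prefix": {"S": "STATS#"},
        },
    }


# "GameData" is a hypothetical table name.
by_team = team_time_stats_request("GameData", "red", "2024-05-01T18:00Z")
by_time = time_stats_request("GameData", "2024-05-01T18:00Z")
```

Either dictionary could be passed as `boto3.client("dynamodb").query(**request)`. Omitting `IndexName` targets the base (MAIN) table.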
Conclusion: Data-Driven Design is Key
Key Takeaway: Data-driven design is critical.
Different data characteristics suggest different design choices.
Blindly applying patterns can be costly and inefficient.
Embracing a data-centric approach, especially with automated analysis, leads to efficient, cost-effective, and performant DynamoDB database designs.