DynamoDB is an amazingly powerful and performant database, best known for its low latency and elastic scaling characteristics. But there is one trap it is super easy to fall into, especially if you have any background at all in more traditional relational database systems (think MySQL, Postgres, Oracle, and SQL Server).
That's Not Normal, Man!
In the relational database world, the dependable best practice is to normalize your data model. It's a fairly academic topic, but the short version is that every piece of data should have one home (i.e., one table which is its canonical location), and any references to that data in another table will be in the form of a foreign key, which is a pointer to its "true" location.
In this way of storing data, an entity can be assembled from all the different rows in all the different tables by asking for a JOIN operation.
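To make that concrete, here is a minimal sketch of the normalized approach using Python's built-in sqlite3 module (the table names and data are invented for illustration): orders point at customers via a foreign key, and the full "customer with orders" entity only exists after a JOIN.

```python
import sqlite3

# Hypothetical normalized schema: customers and orders live in separate
# tables, and each order references its customer via a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (100, 1, 42.5);
""")

# Reassemble the entity at read time by asking the database to JOIN.
rows = conn.execute("""
    SELECT c.name, o.id, o.total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()
print(rows)  # [('Ada', 100, 42.5)]
```

Every read of the combined entity pays this reassembly cost, which is exactly the tradeoff discussed below.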
If you've ever done a data modeling exercise in this paradigm, you might have started with a table-per-entity. It might feel natural to follow this same process with DynamoDB. Stop!
Don't Be a Joiner
If you have taken DynamoDB for a spin, you may have noticed that there aren't any JOIN operations. This is a feature, not a bug! Let's talk a bit about why JOIN was invented in the first place. At the dawn of SQL databases (let's go back to 1979), storage was scarce and expensive. A database join saves storage at the expense of computation, a tradeoff which made sense for a long time, but doesn't anymore. In the present day, the cost equation is completely flipped: storage is millions of times cheaper, and while computation has also improved (a lot), it hasn't improved by the same orders of magnitude, which means computation is now the bottleneck for cost and performance. DynamoDB achieves remarkable performance by never incurring the computation cost of doing all those joins.
If you make this common mistake (and I did, a bunch!) and continue modeling entities the old way, entity-per-table, you will end up doing all the joins yourself, in your application code. This is the worst of both worlds, because you're giving up the expressive flexibility of SQL while still paying the cost of joins.
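Here is a sketch of what that anti-pattern looks like in practice. The dicts below are stand-ins for the results of separate DynamoDB reads against two hypothetical entity-per-table tables (the key names are invented); notice how the application code is doing the join the database used to do.

```python
# Stand-ins for two separate per-entity DynamoDB tables (hypothetical
# keys and data); imagine each lookup below is a GetItem or Query call.
customers_table = {"CUST#1": {"name": "Ada"}}
orders_table = {
    "ORDER#100": {"customer_id": "CUST#1", "total": 42.5},
    "ORDER#101": {"customer_id": "CUST#1", "total": 7.0},
}

def customer_with_orders(customer_id):
    # A hand-rolled join: one read for the customer, another pass over
    # the orders, then stitching the pieces together in application code.
    customer = dict(customers_table[customer_id])
    customer["orders"] = [
        {"id": order_id, "total": order["total"]}
        for order_id, order in orders_table.items()
        if order["customer_id"] == customer_id
    ]
    return customer

print(customer_with_orders("CUST#1"))
```

Each of those simulated lookups would be a separate network round trip against real DynamoDB tables, so this pattern gets slower and more expensive as entities grow.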
So Now What?
What do you do instead? This problem is solvable with a little planning up-front. The best way I know to describe the new way of working is to imagine the query result you want, and store the rows in that form, completely denormalized. There's an old piece of wisdom about performance: the less work the database has to do, the faster it can be. With denormalized rows in your DynamoDB table, the only work is fetching by your chosen partition key (and optionally a little extra to apply attribute constraints before returning).
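A minimal sketch of that idea, simulated with plain Python (the PK/SK layout and values are hypothetical, not a prescribed schema): the customer profile and its orders are stored together as an item collection under one partition key, so one query by partition key returns the entire pre-joined entity.

```python
# A simulated single DynamoDB table (hypothetical layout): every item
# belonging to one entity shares a partition key (PK), and the sort
# key (SK) distinguishes the item's role within the collection.
table = [
    {"PK": "CUST#1", "SK": "PROFILE",   "name": "Ada"},
    {"PK": "CUST#1", "SK": "ORDER#100", "total": 42.5},
    {"PK": "CUST#1", "SK": "ORDER#101", "total": 7.0},
]

def query(pk):
    # Stand-in for a DynamoDB Query on the partition key: one fetch
    # returns the whole denormalized entity, with no join work at all.
    return [item for item in table if item["PK"] == pk]

print(query("CUST#1"))
```

The rows come back already in the shape you imagined up front, which is the whole point of denormalizing around your access patterns.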
The Next Level
When you start modeling in this way, the surprising result is that most applications can be implemented with a single database table. This is like waking up from living in the Matrix. When you're ready for more, start with this post by Alex Debrie, and then read everything else he has written! Absorb all of this documentation. And when you're done with that and you're ready to level up again, go find everything by Rick Houlihan, like this talk from re:Invent entitled "Amazon DynamoDB advanced design patterns."
Go With the Flow
When you start using DynamoDB the way it was designed, it will blow your mind. Have fun on the journey!
Top comments (1)
Awesome writeup!
DynamoDB is great, as you have summarised here. The problem I have in the industry is that not many people understand (or want to understand) that the "correct" way to use DynamoDB is a SINGLE database table as much as possible. Then you'll also realise that there is already a proliferation of applications that do this the relational way, one-entity-per-table.
Also, not many people like the "restriction" of having to know all your access patterns up front, which is quite important before you even start.
This information is, I think, a prerequisite before you allow anyone near a DynamoDB database!