Everything You’ve Always Wanted to Know About DynamoDB But Didn’t Know to Ask

#aws #database

What are the benefits to DynamoDB early in a project?

Instant setup and a near-instant ability to create new tables and add columns
No charge unless you break out of the generous free tier
No need to pay attention to scaling, since the relatively new pay-per-request (on-demand) billing mode handles everything
DynamoDB streams are very powerful compared to a relational DB’s triggers
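To make the streams point concrete, here is a minimal sketch of a Lambda-style function that reacts to a batch of stream records. The record layout (`Records`, `eventName`, `dynamodb.Keys`) follows the DynamoDB stream event format; the string hash key named `id`, the function names, and the sample event are assumptions made for illustration only.

```python
def describe_change(record):
    """Turn one DynamoDB stream record into a short human-readable message."""
    event_name = record["eventName"]  # "INSERT", "MODIFY" or "REMOVE"
    keys = record["dynamodb"]["Keys"]  # typed values, e.g. {"id": {"S": "c1"}}
    item_id = keys["id"]["S"]  # assumption: a string hash key named "id"
    if event_name == "INSERT":
        return f"created {item_id}"
    if event_name == "MODIFY":
        return f"updated {item_id}"
    return f"deleted {item_id}"


def handler(event, context=None):
    """Lambda-style entry point: one invocation receives a batch of records."""
    return [describe_change(record) for record in event.get("Records", [])]


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"id": {"S": "c1"}}, "NewImage": {"id": {"S": "c1"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"id": {"S": "c2"}}}},
    ]
}
messages = handler(sample_event)
```

Unlike a trigger, this code runs outside the database and after the write has already succeeded, so it can call other services without holding up the original request.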

What are the costs of DynamoDB early in a project?

Eventual consistency forces you to think about long-term scaling problems immediately. You can use strongly consistent reads, but they cost twice the read capacity, may have higher latency, and are not available on global secondary indexes (a GSI is an index other than the main key structure of your table).
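As a rough sketch of that restriction, the helper below builds the keyword arguments for the low-level DynamoDB Query operation and refuses to combine a consistent read with a GSI. The parameter names in the returned dictionary (`TableName`, `IndexName`, `KeyConditionExpression`, `ConsistentRead` and so on) are the real Query parameters; the helper itself, the table name `comments`, and the string-typed key are assumptions for illustration.

```python
def build_query_request(table_name, key_name, key_value,
                        gsi_name=None, consistent_read=False):
    """Build keyword arguments for the low-level DynamoDB Query operation."""
    if gsi_name is not None and consistent_read:
        # DynamoDB rejects strongly consistent reads on a global secondary
        # index, so fail early instead of waiting for a validation error.
        raise ValueError("ConsistentRead is not supported on a GSI")
    request = {
        "TableName": table_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": {"S": key_value}},  # assumes a string key
        "ConsistentRead": consistent_read,
    }
    if gsi_name is not None:
        request["IndexName"] = gsi_name
    return request


# Strongly consistent read against the table's own key: allowed.
table_request = build_query_request("comments", "id", "c1", consistent_read=True)

# Read through the GSI: only eventually consistent reads are possible.
index_request = build_query_request("comments", "market_id", "m1",
                                    gsi_name="market_id_index")
```

With boto3 installed and credentials configured, either dictionary could be passed along as `boto3.client("dynamodb").query(**table_request)`.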

For instance, say you run this query (written in PynamoDB):

comments = CommentModel.market_id_index.query(
    hash_key=market_id,
    attributes_to_get=[
        CommentModel.id, CommentModel.investible_id,
        CommentModel.comment_type, CommentModel.created_by,
        CommentModel.updated_at, CommentModel.resolved,
        CommentModel.version, CommentModel.uploaded_files,
    ],
)

That won’t necessarily get you all the comments, just the comments currently in the market_id_index GSI. Eventually every comment will be in that GSI, but not until DynamoDB has asynchronously propagated the latest writes from the table to the index.
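The lag can be pictured with a toy in-memory model. This is not how DynamoDB is implemented; the class, its method names, and the explicit `propagate` step are all invented here purely to show the window in which a comment exists in the table but is missing from the index.

```python
class ToyTable:
    """Toy model: table writes are immediate, index updates are deferred."""

    def __init__(self):
        self.items = {}    # comment id -> item (stands in for the table)
        self.index = {}    # market_id -> list of comment ids (stands in for the GSI)
        self.pending = []  # writes not yet applied to the index

    def put(self, item):
        self.items[item["id"]] = item
        self.pending.append(item)

    def propagate(self):
        """Apply all outstanding writes to the index."""
        for item in self.pending:
            self.index.setdefault(item["market_id"], []).append(item["id"])
        self.pending.clear()

    def query_index(self, market_id):
        return list(self.index.get(market_id, []))


table = ToyTable()
table.put({"id": "c1", "market_id": "m1"})
table.propagate()
table.put({"id": "c2", "market_id": "m1"})

before = table.query_index("m1")  # c2 is in the table but not yet in the index
table.propagate()
after = table.query_index("m1")   # now both comments are visible
```

In the real service you do not control when propagation happens, which is why code that writes a comment and immediately queries the GSI cannot assume it will see that comment.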

Using DynamoDB streams means you will have to tackle asynchronous APIs early on. Also, streams are not globally FIFO, only FIFO per shard, and shards are determined by your hash keys.
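A small simulation shows what per-shard ordering does and does not promise. The shard count, the CRC-based routing function, and the sample writes are assumptions for illustration; DynamoDB manages real shard assignment itself.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical; the real shard count is decided by DynamoDB


def shard_for(hash_key):
    """Deterministically map a hash key to a shard (toy routing rule)."""
    return zlib.crc32(hash_key.encode("utf-8")) % NUM_SHARDS


def route(records):
    """Append each (hash_key, change) record to its shard in arrival order."""
    shards = defaultdict(list)
    for hash_key, change in records:
        shards[shard_for(hash_key)].append((hash_key, change))
    return shards


def changes_for(shards, hash_key):
    """Changes for one hash key, in the order its shard delivers them."""
    shard = shards[shard_for(hash_key)]
    return [change for key, change in shard if key == hash_key]


writes = [
    ("market-1", "create comment"),
    ("market-2", "create comment"),
    ("market-1", "resolve comment"),
    ("market-2", "edit comment"),
]
shards = route(writes)
market_1_changes = changes_for(shards, "market-1")
```

Changes that share a hash key always land in the same shard and keep their relative order. Nothing orders one shard against another, so a consumer should not assume that the changes to `market-1` are seen before or after the changes to `market-2`.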
Speaking of keys, you will need to choose them carefully. You get a hash key and a range key that must combine to something unique, and the hash key must be random enough to guarantee even distribution across partitions. Unlike a relational database, you also want to maximize the data stored in a single table; so long as your keys are useful, your columns can be sparsely populated. Then the table will “heat up” and so be warm for retrieval. See adjacency graph patterns.
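One way to picture several kinds of data sharing a table with sparse columns is the sketch below. The `pk`/`sk` attribute names, the `TYPE#id` key format, and the helper function are illustrative assumptions, not a description of how any particular application lays out its keys.

```python
def make_item(parent_type, parent_id, child_type, child_id, **attributes):
    """Build one item for a shared table keyed by parent (hash) and child (range)."""
    item = {
        "pk": f"{parent_type}#{parent_id}",  # hash key: groups items by parent
        "sk": f"{child_type}#{child_id}",    # range key: unique within the parent
    }
    # Sparse columns: attributes with no value are simply left off the item.
    item.update({name: value for name, value in attributes.items()
                 if value is not None})
    return item


items = [
    make_item("MARKET", "m1", "COMMENT", "c1", comment_type="ISSUE", resolved=None),
    make_item("MARKET", "m1", "INVESTIBLE", "i1", name="Ship the beta"),
]
```

Both items live under the same hash key, so a single query on `pk` can return a market's comments and investibles together, and each item carries only the columns that apply to it.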

What are the benefits to DynamoDB as your usage scales?

The simple SQL queries that were coded while ignoring overall usage of the relational database and the data growth of any one table will eventually require an entire operational team to simulate the out-of-the-box behavior of DynamoDB.
The early push toward eventual consistency and asynchronous processing now pays off: you don’t have APIs promising synchronous results that you increasingly cannot deliver.
Upgrades are much easier, as you don’t have to play tricks replicating data between database instances and, unless DynamoDB transactions were used, your tables are likely more independent. A SQL “join” is the opposite of a service-based solution.
Your transactional and reporting databases are probably already separate so you don’t need to painstakingly split them. Using NoSQL and DynamoDB streams you most likely copied the data over into an entirely different reporting or search database.
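As a sketch of that copy step, the function below turns the `NewImage` of each stream record into a plain row that could be handed to a reporting or search store. Only a few DynamoDB type codes are handled, and the function names and sample event are assumptions for illustration; where the rows go next is left open.

```python
from decimal import Decimal


def plain_value(typed):
    """Convert one typed DynamoDB value; only S, N, BOOL and NULL are handled."""
    type_code, raw = next(iter(typed.items()))
    if type_code == "S":
        return raw
    if type_code == "N":
        return Decimal(raw)  # numbers arrive as strings in stream records
    if type_code == "BOOL":
        return raw
    if type_code == "NULL":
        return None
    raise ValueError(f"unhandled DynamoDB type: {type_code}")


def reporting_rows(event):
    """Collect a flat row for every inserted or modified item in the batch."""
    rows = []
    for record in event.get("Records", []):
        new_image = record["dynamodb"].get("NewImage")
        if record["eventName"] in ("INSERT", "MODIFY") and new_image:
            rows.append({name: plain_value(value)
                         for name, value in new_image.items()})
    return rows


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"NewImage": {"id": {"S": "c1"},
                                   "version": {"N": "3"},
                                   "resolved": {"BOOL": False}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "c2"}}}},
    ]
}
rows = reporting_rows(sample_event)
```

`NewImage` is only present when the stream is configured to include new images. For the full set of DynamoDB types, boto3 ships a `TypeDeserializer` in `boto3.dynamodb.types` that can replace the hand-written `plain_value`.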

We will be covering how Uclusion coded for eventual consistency and asynchronous APIs in later blog posts.

Why is MySQL still so much more popular than DynamoDB?

See, for instance, this discussion. I believe that MySQL remains the devil developers know. Pay-per-request billing, transactions, encryption at rest, and point-in-time recovery for DynamoDB only came out in 2018. As with a lot of AWS, the decision to use DynamoDB relies as much on where the technology is going as on where it is now. In this sense there is really no such thing as avoiding database vendor lock-in: your choice of database technology will determine a lot of your application’s current architecture, and your roadmap and your database’s roadmap will be on the same map.