Sebastian Zakłada 🧛

Posted on Nov 22, 2024 • Edited on Nov 27, 2024

DynamoDB and the Art of Knowing Your Limits 💥 When Database Bites Back 🧛‍♂️

#dynamodb #aws #database #serverless

Sometimes, the best demonstration of DynamoDB expertise is knowing when not to use it

Time and again, developers reach for familiar but ill-suited technologies, driven by everything from career-building tech choices to corporate mandates to simple comfort with the known.

No service exemplifies this better than DynamoDB - while powerful in the right hands, it's no silver bullet, but unlike some 🧛‍♂️s it won't suck the life out of your application... if you use it wisely! Which means not falling into the traps of:

Ad-Hoc Querying
Filtering
Joining Tables
Counting Items
Analytics
SQL

just to name a few...

⚠️ Spoiler alert ⚠️

Let me set the record straight about what I think about DynamoDB

It is a terrifyingly terrific tool.
You simply cannot go wrong with it as your primary data storage and main CRUD data source.
While DynamoDB is undoubtedly the unspoken 🐐 of the serverless world, it can be pretty terrible at tasks you might consider basic - like counting or filtering, just to name a few.
When someone tells you DynamoDB is good at everything, politely excuse yourself and come talk to me instead. I'll explain why that's far from true.

I love ❤️ DynamoDB

I've been using it for years now and cannot emphasize enough how much it's transformed the way I approach data in my service design. Pair it up with Lambda, wrap nicely with API Gateway and AppSync, and sprinkle some EventBridge magic dust – and you have a powerful, scalable, and cost-effective serverless system that just works.

It's also been a tough love. I've quickly realized that what makes DynamoDB so powerful is also why it can be super frustrating to work with when designing web or mobile applications.

If you've read my previous writeup (thank you!) you may remember how I said that Redshift was a freight train. Following that analogy, DynamoDB is like a Formula 1 🏎️ car - or better yet, a Formula E racer, those are way WAY cooler! 😎 - it's super fast but also very limited in what it can do.

It moves at an incredible speed on a pre-planned circuit (known access patterns)
Focuses on performance with minimal latency (consistently 🏎️💨💨💨)
Built for specific purpose while absolutely nailing it (highly focused)

BUT...

Not unlike an F1 car, it's not-so-great at taking off-track excursions (struggles with unplanned query patterns)
Has no passenger seats and zero trunk cargo space (unable to JOIN with other data - no trips to the mall 🛒!)
One-way speed - want to run a simple "count all cars" query? You have to drive the entire circuit (full table scan) and count manually, paying a fortune for fuel (read capacity units) while suddenly crawling at truck speeds

Eventually, trying to force DynamoDB beyond its comfort zone feels like pushing that race car through pit lane. No amount of tuning – tires, engines, or aerodynamics – can change its fundamental nature. Beyond basic CRUD operations, you will eventually outgrow DynamoDB's intended use cases.

Which is perfectly fine 😄

The real challenges emerge when developers, unaware of system limitations, find themselves fighting against these constraints. This struggle leads to problematic implementations – and from there, little monsters 👾 are born in the code...

The classic case of database-as-swiss-army-knife immediately comes to mind. A developer thinks, "I have my data and SQL, so why not..." and proceeds to write a nightmare of a query: convoluted and difficult to explain, unoptimized, slow, expensive and resource-draining. Even worse, these solutions often pass initial testing (if tested at all... 😶‍🌫️) and crumble only much later, when faced with real-world scale, leading to significant production issues.

Day of the Tentacle, anyone...?

Not thinking ahead about 'boring' but crucial aspects of system design - scaling, high availability, cost efficiency - is a common sin in the developer community. Too many of us put our blinders on and do just what the Jira ticket tells us to do. Out of acceptance criteria, out of mind, let's bring the next one in.

One of the reasons why tools like Lambda or DynamoDB are THE perfect building blocks and became so wildly popular is that they take care of many of those 'boring' aspects for you without you having to lift a finger. Both scale pretty much infinitely with your traffic, offer built-in resilience, and remain cost-effective to run.

With tools of the past it was super easy to mess things up - whether by applying the wrong pattern, using something for unintended purposes, or skipping entire architectural layers simply because business requirements didn't explicitly ask for them (yes, this happens more often than I'd like to admit). Can you mess things up with Lambda or DynamoDB? Sure you can! Long-running functions, badly designed keys, larger-than-life lambdaliths - these can still bite 🧛‍♂️ you. But even with these potential pitfalls, there is less room for error, leading to growth-ready apps even if they were created by less experienced teams.

Which I think is great!

DynamoDB takes an interesting approach to database design by intentionally making certain operations costly or cumbersome to steer developers away from anti-patterns. There is a method to this apparent madness - by making inefficient patterns painful to implement, DynamoDB nudges teams toward thinking carefully about their access patterns upfront.

It's a clever bit of behavioral design that saves many applications from scalability problems down the road

Rather than giving developers enough rope to hang themselves with complex queries and full table scans, it guides them toward sustainable NoSQL patterns from day one.

The problem with people, and especially software engineers, is that we are a lazy bunch. And don't get me wrong - lazy can be good. It propels progress when approached with the right mindset. But it can also be a huge wall that you build around yourself, preventing you from succeeding at anything.

The best in our field of work have mastered the art of productive laziness - it's practically an unofficial part of the job description for senior software engineers. This instinct to find the most efficient solution has given us everything from automations to reusable libraries, from informative CLI prompts to safe data repopulation scripts.

BUT (again)...

There's a crucial difference between the laziness that drives innovation ("I'll spend three hours automating a five-minute task") and the laziness that holds us back ("I'll just copy-paste this solution I don't fully understand"). The former pushes us to create better tools and elegant solutions, while the latter builds technical debt, limits our growth and introduces a real risk of an app imploding in production. Being lazy because you don't feel like learning is a very dangerous state of mind to be in, and with the rise of AI-assisted programming tools, I feel like many of us are slowly taking the "engineer" out of "software engineer" and becoming mindless copy-paste machines.

The real skill lies in recognizing which type of laziness we're embracing at any given moment

Are we cutting corners to avoid getting out of our comfort zone, or are we streamlining processes to focus on what truly matters?

Choosing the right tool for the job remains one of software engineering's fundamental challenges. Yet time and again, developers reach for familiar but ill-suited technologies, driven by everything from career-building tech choices to corporate mandates to simple comfort with the known. These misaligned choices eventually extract their toll - in maintenance nightmares, performance bottlenecks, inability to handle traffic surges and ultimately frustration stemming from countless hours spent wrestling with a tool that was never right to begin with.

It's the technological equivalent of using a sledgehammer to hang a picture frame - sure, it might work, but at what cost?

Don't be one of these guys...

Enough with the rant... 😄 Let's get back to the star of the show!

DynamoDB and Where it Falls Short

When implementing modern, data-driven applications, certain DynamoDB design choices can and will cause friction. Understanding these limitations is crucial for architects and developers alike to make informed implementation decisions.

Problem 1: No Ad-Hoc Queries

DynamoDB is very stubborn and set in its ways 😄 taking a strict, opinionated approach to data access. While it's amazing at delivering blazing-fast queries, it comes with a significant constraint - you need to know exactly how you'll query your data before you even start. As it requires predefined access patterns through its keys and indexes, you typically cannot perform efficient ad-hoc queries outside these established paths.

For example, consider this SQL query (shown for illustration, as DynamoDB doesn't use SQL):

SELECT id, mpn, name FROM Projects WHERE mpn = '00638FOO'

If your table only has an id as the primary key and no secondary index on mpn, this query will require a full table scan - an operation that is both costly and inefficient for large datasets. To query efficiently, you would need to redesign your primary key, create a secondary index, or change your access patterns.
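To make the difference concrete, here is a minimal sketch of the two request shapes for that lookup. It only builds the parameter objects, in the same style as the params example later in this article, and sends nothing to AWS. The index name `mpn-index` is hypothetical: it assumes a global secondary index with mpn as its partition key that also projects id and name.

```javascript
// Sketch only: builds request parameters, does not call AWS.
// 'mpn-index' is a hypothetical GSI (partition key: mpn) projecting id and name.

// Without an index on mpn: a Scan reads every item, then filters.
function buildScanByMpn(mpn) {
  return {
    TableName: 'Projects',
    FilterExpression: '#mpn = :mpn',
    ProjectionExpression: '#id, #mpn, #name',
    // 'name' is a DynamoDB reserved word, hence the placeholders.
    ExpressionAttributeNames: { '#id': 'id', '#mpn': 'mpn', '#name': 'name' },
    ExpressionAttributeValues: { ':mpn': mpn },
  };
}

// With the index: a Query reads only the items whose key matches.
function buildQueryByMpn(mpn) {
  return {
    TableName: 'Projects',
    IndexName: 'mpn-index',
    KeyConditionExpression: '#mpn = :mpn',
    ProjectionExpression: '#id, #mpn, #name',
    ExpressionAttributeNames: { '#id': 'id', '#mpn': 'mpn', '#name': 'name' },
    ExpressionAttributeValues: { ':mpn': mpn },
  };
}

console.log(buildScanByMpn('00638FOO'));
console.log(buildQueryByMpn('00638FOO'));
```

The two objects look almost identical, which is exactly the trap: the only differences are the index and the key condition, yet one touches a handful of items and the other touches the whole table.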

That sounds like a lot of work, doesn't it?

Problem 2: No Filtering (almost)

This problem stems from the previous one. While DynamoDB has support for filter parameters, these filters are applied after the data is retrieved from the table, not during the initial query. This means you still pay for the read capacity of all the data fetched before filtering!

For example:

SELECT * 
FROM Products 
WHERE category = 'Electronics' 
  AND price < 500 
  AND rating > 4

Unless you have an index specifically designed to support this access pattern, implementing such an ad-hoc request in DynamoDB would result in:

retrieving all items
applying the filters
returning the filtered results

It's effectively the same as performing a full table scan and applying an in-memory filter on the entire result set:

const filteredResult = items.filter((i) => 
  i.category === 'Electronics' 
  && i.price < 500 
  && i.rating > 4
)

This approach can become inefficient, especially with large datasets or complex filtering requirements.
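The billing side of this can be approximated with simple arithmetic. One read capacity unit covers one strongly consistent read of up to 4 KB, or two eventually consistent ones, and a Scan or Query is charged for the data it reads, not for what survives the filter. The sketch below is a rough estimate only: the table size, item size and category share are made-up numbers, and it ignores pagination and per-page rounding.

```javascript
// Rough estimate of read capacity consumed by one Scan or Query.
// Defaults to eventually consistent reads (half the cost per 4 KB block)
// and ignores pagination and per-page rounding.
function estimateReadUnits(bytesRead, stronglyConsistent = false) {
  const fourKbBlocks = Math.ceil(bytesRead / 4096);
  return stronglyConsistent ? fourKbBlocks : fourKbBlocks / 2;
}

// Hypothetical table: 10,000 products of about 1 KB each, 500 of them Electronics.
const ITEM_BYTES = 1024;

// Scan + filter: every item is read, whatever the filter keeps.
const scanCost = estimateReadUnits(10000 * ITEM_BYTES);

// Query on a (hypothetical) index keyed by category: only Electronics items
// are read; the price and rating filters still run after the read.
const queryCost = estimateReadUnits(500 * ITEM_BYTES);

console.log({ scanCost, queryCost });
```

Even with the index, the price and rating conditions are still paid for as filters, so the estimate is about the category partition, not about the final result size.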

Problem 3: No Table Joins

DynamoDB doesn't support table joins like traditional relational databases. Each query can only access a single table at a time. For example, this type of SQL join operation is not supported:

SELECT P.name, O.quantity 
FROM Products P
INNER JOIN Orders O ON O.productId = P.id

Instead, you would need to either denormalize your data by embedding related information in a single JSON document or perform multiple separate queries in your application code to fetch and combine the data.

Again, this was a conscious design choice that AWS made to maintain consistent performance at scale, but it also means you'll need to carefully plan your data modeling to avoid the need for joins, which can significantly limit your flexibility, productivity and ultimately - time to market.
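For a feel of what "combine the data in your application code" means, here is a minimal sketch of the join above done by hand. Plain arrays stand in for the results of two separate DynamoDB reads, and the sample records are invented for illustration.

```javascript
// Application-side replacement for:
//   SELECT P.name, O.quantity FROM Products P INNER JOIN Orders O ON O.productId = P.id
// Plain arrays stand in for the results of two separate DynamoDB reads.
function joinOrdersWithProducts(orders, products) {
  // Index products by id so each order is matched in constant time.
  const productsById = new Map(products.map((p) => [p.id, p]));
  return orders
    .filter((o) => productsById.has(o.productId)) // INNER JOIN drops unmatched orders
    .map((o) => ({
      name: productsById.get(o.productId).name,
      quantity: o.quantity,
    }));
}

// Invented sample data.
const products = [
  { id: 'p1', name: 'Keyboard' },
  { id: 'p2', name: 'Mouse' },
];
const orders = [
  { productId: 'p1', quantity: 2 },
  { productId: 'p2', quantity: 1 },
  { productId: 'p9', quantity: 5 }, // no matching product
];

const joined = joinOrdersWithProducts(orders, products);
console.log(joined);
```

The merge itself is short. The real cost sits around it: two round trips instead of one, pagination on both sides, and nothing that keeps the two reads consistent with each other.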

Problem 4: No Analytics

Like nothing. Nada. Zero. None.

DynamoDB's analytical capabilities are severely restricted compared to traditional relational databases. It lacks fundamental analytical functions that are standard in SQL databases:

No COUNT
No aggregations or window functions
No GROUP BY
No subqueries, CTEs
No math functions
Frankly not much beyond simple CRUD operations is available

Even the simplest of use cases, such as SELECT COUNT(*), becomes super complex and expensive in DynamoDB.

Consider this basic SQL example:

SELECT COUNT(*) 
FROM Products

While it's trivial in relational databases, it becomes an utter nightmare to develop and maintain if you want to support such a scenario in DynamoDB. Even a Scan that asks only for a count still reads, and bills for, every item in the table. Take my word for it, there are countless examples available online discussing different ways to approach this requirement, which in the DynamoDB world becomes surprisingly complex and resource intensive.
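One of the workarounds you will run into is keeping a running count in a dedicated item and updating it in the same transaction as every insert. The sketch below only builds the transaction parameters and calls nothing. It assumes the Products table uses id as its partition key; the counter key `COUNTER#Products` and the attribute `itemCount` are hypothetical names picked for illustration.

```javascript
// Sketch of one possible workaround: a counter item maintained on every insert.
// Assumes Products has 'id' as its partition key.
// 'COUNTER#Products' and 'itemCount' are hypothetical names.
function buildPutProductWithCounter(product) {
  return {
    TransactItems: [
      {
        Put: {
          TableName: 'Products',
          Item: product,
          // Reject overwrites so an existing product is not counted twice.
          ConditionExpression: 'attribute_not_exists(id)',
        },
      },
      {
        Update: {
          TableName: 'Products',
          Key: { id: 'COUNTER#Products' },
          // ADD creates the attribute on first use and increments it afterwards.
          UpdateExpression: 'ADD itemCount :one',
          ExpressionAttributeValues: { ':one': 1 },
        },
      },
    ],
  };
}

const params = buildPutProductWithCounter({ id: 'p1', name: 'Keyboard' });
console.log(JSON.stringify(params, null, 2));
```

Reading the count then becomes a single item lookup, but every delete needs a mirrored decrement, and every write path that skips the transaction silently corrupts the number. That is a lot of ceremony for what SQL does in one line.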

Problem 5: No SQL

There's something special about SQL that we've all grown to love over decades. It's not just about familiarity - it's about expressiveness, power, and simplicity that comes with it.

With DynamoDB, you're forced to learn and use a completely different query language and SDKs. Instead of a simple, readable SQL statement, you're now deep in the world of expressions, attribute values, and other verbose SDK constructs. Even with PartiQL (AWS's attempt at SQL "compatibility"), you're still heavily restricted - no JOINs, no COUNT, nothing that makes SQL... well, SQL.

The lack of SQL support isn't just about syntax - it's about losing decades of tooling, knowledge, and ecosystem that make data work easier and more productive.

Compare this simple query:

SELECT * 
FROM Users 
WHERE status = 'active' 
  AND lastLogin > '2024-01-01'

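with its potential DynamoDB equivalent:

const params = {
  TableName: 'Users',
  // Assumption: a GSI keyed on status (hypothetical name). A key condition
  // only works on the partition key of the table or of the chosen index.
  IndexName: 'status-index',
  KeyConditionExpression: '#status = :status',
  // Applied after the read, so every 'active' item is still paid for
  FilterExpression: '#lastLogin > :date',
  ExpressionAttributeNames: {
    '#status': 'status',
    '#lastLogin': 'lastLogin'
  },
  ExpressionAttributeValues: {
    ':status': 'active',
    ':date': '2024-01-01'
  }
};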

Not exactly what you'd call developer-friendly, is it?

And this is just a simple query - imagine dealing with more complex needs. Sure, there are tools like dynamodb-toolbox that try to make this better by providing a friendlier API. They are great at what they do - they make code look better and easier to work with - but they can't change DynamoDB's fundamental capabilities. They're band-aids that make development less painful, but don't address the fundamental limitations.
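For completeness, this is roughly what the PartiQL route looks like for the same Users lookup. It is a sketch that only builds the statement input and runs nothing. The index name `status-index` is hypothetical (a GSI with status as its partition key), and the plain JavaScript values in Parameters assume a document-style client that marshals them for you.

```javascript
// Sketch only: builds the input for a PartiQL statement, does not call AWS.
// "status-index" is a hypothetical GSI with status as its partition key.
// Plain values in Parameters assume a document-style client.
function buildActiveUsersStatement(since) {
  return {
    Statement:
      'SELECT * FROM "Users"."status-index" WHERE "status" = ? AND "lastLogin" > ?',
    Parameters: ['active', since],
  };
}

const statement = buildActiveUsersStatement('2024-01-01');
console.log(statement);
```

It reads like SQL, but it is still bound by the same rules: without that index the statement falls back to a full table scan, and there is still no JOIN or COUNT to reach for.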

Combined: Even more problems

Add all of the above together

const moreProblems = simpleQuery
  .createGlobalSecondaryIndex()
  .addCompositeKey()
  .denormalizeEverything()
  .sacrificeGoatToNoSQLGods()
  .pleaseJustWorkThisTime();

and you will quickly realize that any moderately complex data access need turns into a massive engineering effort. Each of these limitations alone is manageable, but combined they create a perfect storm of complexity that can seriously slow down development. You'll find yourself writing tons of application code, managing complex data access patterns, and building intricate workarounds just to get basic functionality that comes out of the box with SQL databases.

Working with DynamoDB can be like planning a train network for a city. Sure, it's great for the routes you planned, but what happens when your needs evolve? A rigid foundation doesn't play well with the needs of modern software development, which prioritizes agility and rapid iteration cycles.

Mini Metro is such a terrific and cleverly designed game!

Any change may turn into a major engineering effort, like trying to retrofit a new line through downtown just because you didn't predict today's traffic patterns years ago.

Good luck explaining that to the affected homeowners... 😄

Why Not Just Use a SQL Database Then?

Fair question! Thanks for asking!

For many use cases, you absolutely should. But as with many things in this world - you gain something and lose something with each choice you make. While traditional relational databases such as PostgreSQL give you the power of querying flexibility, they also have their own limitations and best use-case applications.

DynamoDB shines in specific scenarios - when you need consistent single-digit millisecond performance at any scale, with minimal operational overhead. It's particularly great for known access patterns like user profiles, session management, or real-time game states.

The issue arises when your application grows and you need both: the rock-solid performance of DynamoDB for your core operations AND the flexibility of SQL for analytics, complex queries and rapid feature development.

The DynamoDB Tunnel Vision

Consider yourself spending months or years working with DynamoDB and suddenly seeing everything through NoSQL-tinted glasses. Every new project becomes a nail for your perfectly-tuned DynamoDB hammer. "Of course we'll use DynamoDB - we always use DynamoDB!" becomes your unofficial motto.

This tunnel vision is particularly dangerous because it often comes from a place of genuine expertise and success. You've mastered the intricacies of partition keys, you can design access patterns in your sleep, and you've successfully launched multiple DynamoDB-powered applications. Your comfort zone has become a fortress, and stepping outside feels unnecessary, even risky.

The symptoms are subtle but telling

You start force-fitting complex analytical queries into DynamoDB, creating elaborate workarounds for things that would otherwise be trivial
Your application code grows increasingly complex as you build client-side solutions for basic database operations
You find yourself defending DynamoDB's limitations as "features" that force better design (while spending hours implementing basic counting functionality)
The phrase "we'll just denormalize the data" becomes your default solution to every data modeling challenge

The dangerous part is that this kind of thinking can feel like you're just being really good at your job. "I know DynamoDB's limitations!" you might say, "and I can work around them!"

But there's a fine line between working around limitations and fighting against the fundamental nature of your tools

Remember our Formula 1 car analogy? This is exactly like insisting on using your racing car for grocery shopping, daily commutes, and moving furniture, then spending countless hours engineering elaborate solutions to make it work. Do you really think strapping a cargo container to your F1 car is a good idea? 😅

The real danger isn't just the technical debt you're accumulating - it's the cost of not exploring better solutions. While you're busy implementing complex DynamoDB workarounds, you could be leveraging purpose-built tools that solve your problems out of the box. That analytics query that took you three days to optimize in DynamoDB? It could have been a simple SQL statement in Tinybird.

Breaking free from this mindset requires humility and a willingness to admit that sometimes, just sometimes, your favorite tool isn't the right one for the job. It means acknowledging that expertise in one technology shouldn't limit your architectural choices. Most importantly, it means remembering that our job isn't to use DynamoDB

Our job is to solve problems effectively

The next time you find yourself reflexively reaching for DynamoDB, pause and ask:

Am I choosing this because it's the best tool for the job, or because it's the tool I know best?
What would this solution look like with a different database?
Does this solution need a database at all (S3, anyone?)
Am I building workarounds for functionality that comes standard in other databases?
Is my comfort with DynamoDB blinding me to better alternatives?

Sometimes, the best demonstration of DynamoDB expertise is knowing when not to use it.

(to be continued...)

Have you faced similar challenges with DynamoDB? How did you solve them? Share your experiences in the comments!

Next Up

The landscape of data engineering has evolved. Tools like OpenSearch (or Algolia), Redshift (or Databricks), Tinybird (or Rockset... oh, wait) have emerged to bridge the gap between purpose-built databases.

In the upcoming article I will explore how we can leverage these specialized tools alongside DynamoDB to build robust, scalable systems without breaking the bank and spending months implementing features.

Time-to-Market is king! 👑

Disclaimer

This article is an independent developer guide. All views, rants and recommendations expressed are my own.

No databases were harmed in the making of this article. Just some egos and a few AWS accounts over the years of me using DynamoDB.

I took this article as an excuse to dive into my beloved World of Words 😄

Gosh, I never realized how much I missed writing...

Top comments (4)

Seth Orell • Nov 27 '24

You make a great point about the tradeoffs of time-to-market vs. I-may-need-hyper-scaling-someday. Question everything.

I never realized how much I missed writing...

I am glad you took the time to write this. I look forward to more.

Sebastian Zakłada 🧛 • Nov 27 '24

Hey @setho, long time no see! 🥁 Thanks for reading through this wall of text 😜 I promise to use fewer words eventually... though the expected latency for reaching that state may be a bit on the longer side 😜

Mahdi Azarboon • Nov 27 '24

Very good post. Thank you. I'm going to follow you. Please continue writing such posts.