<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Uriel Bitton</title>
    <description>The latest articles on DEV Community by Uriel Bitton (@urielbitton).</description>
    <link>https://dev.to/urielbitton</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F300278%2Fb747ae00-0959-48dd-bed2-e7d478c72ed5.jpg</url>
      <title>DEV Community: Uriel Bitton</title>
      <link>https://dev.to/urielbitton</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/urielbitton"/>
    <language>en</language>
    <item>
      <title>Using Conditional Writes to Enforce Business Logic in DynamoDB</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Fri, 23 May 2025 12:23:21 +0000</pubDate>
      <link>https://dev.to/urielbitton/using-conditional-writes-to-enforce-business-logic-in-dynamodb-8ak</link>
      <guid>https://dev.to/urielbitton/using-conditional-writes-to-enforce-business-logic-in-dynamodb-8ak</guid>
      <description>&lt;p&gt;One of the most underrated features in DynamoDB is conditional writes.&lt;/p&gt;

&lt;p&gt;Most users treat DynamoDB as a simple key-value datastore: put an item in, get an item out.&lt;/p&gt;

&lt;p&gt;But DynamoDB has a wealth of powerful features to support most modern workloads.&lt;/p&gt;

&lt;p&gt;One of these is the ability to enforce business logic directly at the database level (in a single atomic operation) without extra read operations or middleware logic.&lt;/p&gt;

&lt;p&gt;This does a lot for performance and consistency in your database.&lt;/p&gt;

&lt;p&gt;Let’s break this down.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are conditional writes?
&lt;/h2&gt;

&lt;p&gt;In DynamoDB, you can use a ConditionExpression within the following write operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PutItem&lt;/li&gt;
&lt;li&gt;UpdateItem&lt;/li&gt;
&lt;li&gt;DeleteItem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tells DynamoDB to only perform the write if the given condition is met. Otherwise the write is rejected.&lt;/p&gt;

&lt;p&gt;This acts as a logic gate for writes.&lt;/p&gt;

&lt;p&gt;Imagine the following example: you want to create a user with a unique username.&lt;/p&gt;

&lt;p&gt;Usually to do this, you would need to have a read to check if the username exists and then do a write if it doesn’t (and return an error if it does).&lt;/p&gt;

&lt;p&gt;But with conditional writes, you can skip the read entirely.&lt;/p&gt;

&lt;p&gt;Here’s the code in a DynamoDB PutItem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;await dynamodbClient.send(new PutItemCommand({
  TableName: "users",
  Item: marshall({
     username: "jason123",
     email: "jason@gmail.com",
  }),
  ConditionExpression: 'attribute_not_exists(username)'
}))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This write of a new user will only succeed if no item with that username already exists in the table, efficiently enforcing uniqueness in one atomic operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some use cases for conditional writes
&lt;/h2&gt;

&lt;p&gt;Some practical use cases for conditional writes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preventing duplicate orders&lt;/li&gt;
&lt;li&gt;making sure a user doesn’t apply a coupon twice&lt;/li&gt;
&lt;li&gt;enforcing account limits (e.g. max 5 devices per user)&lt;/li&gt;
&lt;li&gt;avoiding overwrites in race conditions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are all business logic constraints that commonly appear in applications.&lt;/p&gt;

&lt;p&gt;Conditional writes allow you to support these constraints with minimal latency and complexity.&lt;/p&gt;
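
&lt;p&gt;For instance, the coupon case can be handled with a single conditional PutItem. Here’s a sketch (the table, key and attribute names are hypothetical):&lt;/p&gt;

```javascript
// Sketch: record a coupon redemption exactly once.
// The item's key encodes the user + coupon pair, so a second
// redemption attempt fails the condition.
const redeemCouponParams = {
  TableName: "redemptions",
  Item: {
    pk: { S: "user#101#coupon#SAVE10" },
    redeemedAt: { S: "2025-05-23" },
  },
  // only write if no redemption item with this key exists yet
  ConditionExpression: "attribute_not_exists(pk)",
};
```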

&lt;h2&gt;
  
  
  Handling conditional failures
&lt;/h2&gt;

&lt;p&gt;If a condition fails, DynamoDB will throw a &lt;strong&gt;ConditionalCheckFailedException&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can catch this error and return a more readable message to the user like:&lt;/p&gt;

&lt;p&gt;“This coupon has already been used”&lt;/p&gt;

&lt;p&gt;Remember, a conditional failure is expected behaviour, not a bug. So handle it as part of the normal flow when an item already exists.&lt;/p&gt;
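
&lt;p&gt;Here’s a sketch of that handling pattern; the client and parameters are placeholders, and only the error’s name (which the SDK sets to ConditionalCheckFailedException) is relied upon:&lt;/p&gt;

```javascript
// Sketch: map a conditional-write failure to a user-friendly message.
// "client" stands in for a DynamoDB client; "params" for a write command.
async function applyCoupon(client, params) {
  try {
    await client.send(params);
    return { ok: true };
  } catch (err) {
    if (err.name === "ConditionalCheckFailedException") {
      // expected behaviour, not a bug: surface it as a normal outcome
      return { ok: false, message: "This coupon has already been used" };
    }
    throw err; // anything else is a real error
  }
}
```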

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Condition expressions in DynamoDB are powerful and make your writes more efficient.&lt;/p&gt;

&lt;p&gt;They let you enforce rules like uniqueness, resource limits, idempotency and concurrency control, all at the database write layer with full atomicity.&lt;/p&gt;

&lt;p&gt;Whenever your application needs a read-then-write check, first verify whether the condition can be expressed directly on the DynamoDB write itself.&lt;/p&gt;




&lt;p&gt;👋 My name is Uriel Bitton and I’m committed to helping you master AWS, serverless and DynamoDB.&lt;/p&gt;

&lt;p&gt;🚀 Build the database your business needs with DynamoDB — subscribe to my email newsletter &lt;a href="https://urielbitton.substack.com/" rel="noopener noreferrer"&gt;The Serverless Spotlight&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading and see you in the next one!&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>database</category>
      <category>nosql</category>
      <category>conditionalwrites</category>
    </item>
    <item>
      <title>AWS Lambda Best Practices For Performant &amp; Scalable Serverless Functions</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Tue, 13 May 2025 14:36:56 +0000</pubDate>
      <link>https://dev.to/urielbitton/aws-lambda-best-practices-for-performant-scalable-serverless-functions-3ccp</link>
      <guid>https://dev.to/urielbitton/aws-lambda-best-practices-for-performant-scalable-serverless-functions-3ccp</guid>
      <description>&lt;p&gt;Serverless functions help you build powerful systems quickly and at a low initial cost.&lt;/p&gt;

&lt;p&gt;They are versatile in that they can fit small MVP workloads and large scalable enterprise systems.&lt;/p&gt;

&lt;p&gt;However, like any other tool, they must be used properly and with best practices.&lt;/p&gt;

&lt;p&gt;I’ve been building backend systems for large and small companies for over 6 years now, and I’ll share some best practices and tips I’ve learned along the way.&lt;/p&gt;

&lt;p&gt;Use this article as a guide for every time you create a new Lambda function, to make sure it follows the best practices in this list.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Memory &amp;amp; Timeout
&lt;/h2&gt;

&lt;p&gt;The first thing you should be configuring as soon as you create a new Lambda function is the memory and timeout settings.&lt;/p&gt;

&lt;p&gt;These settings can be found in the General Configuration tab under the main Configuration tab.&lt;/p&gt;

&lt;p&gt;99.9% of your functions should have at least 500MB of memory. Anything lower will usually take longer than needed to run, or time out.&lt;/p&gt;

&lt;p&gt;The sweet spot for most functions is between 500MB and 1GB. If you have longer running workloads you can bring that up to 2–3GB.&lt;/p&gt;

&lt;p&gt;For the timeout, set a default of 3–5 seconds, depending on how long your functions tend to run. Monitor the timeout for your functions in CloudWatch and determine the best timeout for each function.&lt;/p&gt;

&lt;p&gt;That will optimize costs and performance.&lt;/p&gt;
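
&lt;p&gt;As a sketch, these settings map directly to the UpdateFunctionConfiguration API; the function name and exact values below are hypothetical:&lt;/p&gt;

```javascript
// Sketch: memory/timeout settings expressed as parameters for the
// Lambda UpdateFunctionConfiguration API call.
const configParams = {
  FunctionName: "my-function", // hypothetical name
  MemorySize: 1024, // MB; within the 500MB–1GB sweet spot for most functions
  Timeout: 5,       // seconds; tune per function from CloudWatch data
};
```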

&lt;h2&gt;
  
  
  2. Permissions
&lt;/h2&gt;

&lt;p&gt;Your Lambda function should respect the principle of least privilege. In other words, it should not be overly permissive; it should have access only to the services it communicates with.&lt;/p&gt;

&lt;p&gt;Rather than having an IAM role like “admin” (read/write any AWS service), or a role which has access to several different services, each function should have a role for the one thing it performs.&lt;/p&gt;

&lt;p&gt;For example, if your function is responsible for fetching items from DynamoDB, it should possess only a DynamoDB query role. Not a DynamoDB full access, read or read+write.&lt;/p&gt;

&lt;p&gt;The tighter the permissions control is on your Lambdas, the less room there is for error.&lt;/p&gt;
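
&lt;p&gt;Here’s a sketch of what such a least-privilege policy could look like (the account ID and table name are placeholders):&lt;/p&gt;

```javascript
// Sketch of a least-privilege IAM policy document: this role can only
// Query one specific DynamoDB table, nothing more.
const queryOnlyPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: ["dynamodb:Query"], // not full access, not read+write
      Resource: "arn:aws:dynamodb:us-east-1:123456789012:table/users",
    },
  ],
};
```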

&lt;h2&gt;
  
  
  3. Concurrency
&lt;/h2&gt;

&lt;p&gt;Concurrency dictates how many times your Lambda function can be invoked at the same time.&lt;/p&gt;

&lt;p&gt;For higher traffic usage, you may need a higher amount.&lt;/p&gt;

&lt;p&gt;In the Configuration settings, under Concurrency and recursion detection, you can set the concurrency for each function individually.&lt;/p&gt;

&lt;p&gt;The way it works is your account has a total quota of concurrency for all the Lambdas in it.&lt;/p&gt;

&lt;p&gt;Think of it as a pool of concurrency. If you need a certain amount of concurrency for a popular function, you can reserve 100 concurrency units for it.&lt;/p&gt;

&lt;p&gt;This guarantees the function always has exactly 100 concurrent executions available: no fewer (other functions can’t starve it) and no more (it is also capped at 100).&lt;/p&gt;

&lt;p&gt;This is what reserved concurrency means.&lt;/p&gt;

&lt;p&gt;If you leave the default (unreserved concurrency), the function takes as much concurrency as it needs, without regard to other functions.&lt;/p&gt;

&lt;p&gt;So imagine your account has a total concurrency of 100 and one function is using all 100 concurrent executions at the same time; your other functions won’t be able to run, as no concurrency is available to them.&lt;/p&gt;

&lt;p&gt;Setting reserved concurrency guarantees that the function is never starved of concurrency.&lt;/p&gt;
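
&lt;p&gt;As a sketch, reserving that concurrency maps to the PutFunctionConcurrency API (the function name is hypothetical):&lt;/p&gt;

```javascript
// Sketch: reserving 100 concurrent executions for a busy function
// via the Lambda PutFunctionConcurrency API.
const concurrencyParams = {
  FunctionName: "popular-function", // hypothetical name
  ReservedConcurrentExecutions: 100, // carved out of the account pool
};
```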

&lt;h2&gt;
  
  
  4. Versions
&lt;/h2&gt;

&lt;p&gt;Lambda versions let you publish immutable “snapshots” of your function code and configurations.&lt;/p&gt;

&lt;p&gt;This is particularly useful when you need to perform rollbacks or A/B testing of your function.&lt;/p&gt;

&lt;p&gt;A best practice with versions is to publish a new version after every stable code change and use aliases like dev, staging and prod to manage deployment stages safely.&lt;/p&gt;
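
&lt;p&gt;As a sketch, that workflow maps to the PublishVersion and CreateAlias APIs (the names and version number below are hypothetical):&lt;/p&gt;

```javascript
// Sketch: publish an immutable version, then point a "prod" alias at it.
const publishParams = {
  FunctionName: "my-function", // hypothetical name
  Description: "stable release",
};
const aliasParams = {
  FunctionName: "my-function",
  Name: "prod",
  FunctionVersion: "3", // version number returned by PublishVersion
};
```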

&lt;h2&gt;
  
  
  5. Triggers
&lt;/h2&gt;

&lt;p&gt;Triggers come in handy when working with other AWS services like DynamoDB or S3.&lt;/p&gt;

&lt;p&gt;If you need to react to changes in S3 or DynamoDB and execute code with your Lambda function in response, triggers are what you need.&lt;/p&gt;

&lt;p&gt;For example, you can set up a DynamoDB or S3 trigger on your Lambda to do things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;calculate data in one table and write it to another DynamoDB table&lt;/li&gt;
&lt;li&gt;update an external search index whenever new data is written to your DynamoDB table&lt;/li&gt;
&lt;li&gt;write a new item to your DynamoDB table every time a file is uploaded to S3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The process is simple: add a trigger on the Lambda function, configure the trigger, write your Lambda code to capture the upstream data and finally push that data to a downstream service.&lt;/p&gt;
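
&lt;p&gt;Here’s a minimal sketch of a handler for the third example; the actual write is elided, and the event shape follows the standard S3 notification format:&lt;/p&gt;

```javascript
// Sketch: on each S3 upload event, build a DynamoDB item for the
// uploaded file. The downstream write itself is elided; the
// event-to-item mapping is the point here.
async function handler(event) {
  const items = event.Records.map((record) => ({
    pk: `file#${record.s3.object.key}`,
    bucket: record.s3.bucket.name,
    size: record.s3.object.size,
  }));
  // ...push "items" to DynamoDB here (e.g. with BatchWriteItem)...
  return { written: items.length };
}
```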

&lt;h2&gt;
  
  
  6. Environment Variables
&lt;/h2&gt;

&lt;p&gt;Environment variables are useful and important to set for hidden values and configuration variables.&lt;/p&gt;

&lt;p&gt;Typically, they are used to store values such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 bucket names&lt;/li&gt;
&lt;li&gt;DynamoDB table names&lt;/li&gt;
&lt;li&gt;API base URLs&lt;/li&gt;
&lt;li&gt;Config keys &amp;amp; non-sensitive credentials&lt;/li&gt;
&lt;li&gt;feature toggles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxl4xjqyvfl4c3f11zdwr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxl4xjqyvfl4c3f11zdwr.png" alt="Image description" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the screenshot above, you can see I use them for an Algolia integration. I store my app ID as an environment variable, as well as the index name, which is a configuration value.&lt;/p&gt;

&lt;p&gt;To add environment variables, select the Configuration tab and select environment variables in the left sidebar. From there you can add your variables.&lt;/p&gt;

&lt;p&gt;It is important not to use environment variables for secret keys or sensitive values. Those should be stored securely in AWS Secrets Manager.&lt;/p&gt;
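
&lt;p&gt;Here’s a small sketch of reading such values inside a Node.js Lambda (the variable names are hypothetical):&lt;/p&gt;

```javascript
// Sketch: reading non-sensitive configuration from environment
// variables, with defaults for local development.
function loadConfig(env) {
  return {
    appId: env.ALGOLIA_APP_ID ?? "",        // hypothetical variable name
    indexName: env.SEARCH_INDEX ?? "dev-index", // config value with a default
  };
}

const config = loadConfig(process.env);
```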

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article we went through several best practices of using AWS Lambda. These include configuring memory and timeout, concurrency, working with permissions and versions as well as setting triggers and environment variables.&lt;/p&gt;

&lt;p&gt;By following these best practices you can make sure your Lambda functions are secure, efficient and production-ready.&lt;/p&gt;

&lt;p&gt;Keep this guide as a checklist whenever you build or update your serverless functions.&lt;/p&gt;




&lt;p&gt;👋 My name is Uriel Bitton and I’m committed to helping you master AWS, serverless and DynamoDB.&lt;/p&gt;

&lt;p&gt;🚀 Build the database your business needs with DynamoDB — subscribe to my email newsletter &lt;a href="https://urielbitton.substack.com/" rel="noopener noreferrer"&gt;The Serverless Spotlight&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading and see you in the next one!&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>serverless</category>
      <category>aws</category>
      <category>bestpractices</category>
    </item>
    <item>
      <title>How to Think in Access Patterns First With DynamoDB</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Tue, 06 May 2025 14:34:17 +0000</pubDate>
      <link>https://dev.to/urielbitton/how-to-think-in-access-patterns-first-with-dynamodb-116a</link>
      <guid>https://dev.to/urielbitton/how-to-think-in-access-patterns-first-with-dynamodb-116a</guid>
      <description>&lt;p&gt;If you’ve spent years building apps on relational databases, switching to DynamoDB can feel very strange.&lt;/p&gt;

&lt;p&gt;I often tell folks to “forget everything you know” about relational databases.&lt;/p&gt;

&lt;p&gt;Designing data on DynamoDB requires a very different mindset.&lt;/p&gt;

&lt;p&gt;The biggest shift is that you don’t start with your data structure, you start with your access patterns.&lt;/p&gt;

&lt;p&gt;Understanding this concept will catapult you ahead of most folks building with DynamoDB.&lt;/p&gt;

&lt;p&gt;It’s a mindset shift from “what data do I have?” to “how will users access my data?”&lt;/p&gt;

&lt;p&gt;Let’s break down this concept in this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL vs. DynamoDB: The Mental Model Shift
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvp4e4jx60kcxmk1vpfxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvp4e4jx60kcxmk1vpfxz.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In SQL, you define your schema first and normalize data into tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;users&lt;/li&gt;
&lt;li&gt;posts&lt;/li&gt;
&lt;li&gt;comments&lt;/li&gt;
&lt;li&gt;likes&lt;/li&gt;
&lt;li&gt;shares&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You then query across tables using joins to fetch the data you need. This flexibility is simultaneously SQL’s strength and performance bottleneck.&lt;/p&gt;

&lt;p&gt;DynamoDB is drastically different.&lt;/p&gt;

&lt;p&gt;You have to think differently.&lt;/p&gt;

&lt;p&gt;You start with your use cases:&lt;/p&gt;

&lt;p&gt;How your application will write/read data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What are the questions your application needs to answer”?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get all posts by a user within a specified date range&lt;/li&gt;
&lt;li&gt;fetch orders made by a user this month&lt;/li&gt;
&lt;li&gt;show a feed of user’s posts sorted by date published&lt;/li&gt;
&lt;li&gt;Get comments on a given post.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these is an access pattern, and you model your table specifically around these patterns.&lt;/p&gt;
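
&lt;p&gt;As a sketch, the first pattern above can be answered with a single Query; the table name and PK/SK layout here are hypothetical:&lt;/p&gt;

```javascript
// Sketch: "get all posts by a user within a specified date range"
// as one Query, using a date-prefixed sort key.
const postsByDateRange = {
  TableName: "app-table", // hypothetical single table
  KeyConditionExpression: "PK = :user AND SK BETWEEN :from AND :to",
  ExpressionAttributeValues: {
    ":user": { S: "user#101" },
    ":from": { S: "post#2025-01-01" },
    ":to": { S: "post#2025-01-31" },
  },
};
```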

&lt;h2&gt;
  
  
  Think in access patterns instead of queries
&lt;/h2&gt;

&lt;p&gt;Instead of starting with:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What tables do I need”?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ask yourself:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What queries will my app need to support for its users (at scale and performance)”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, imagine you are building a marketplace application.&lt;/p&gt;

&lt;p&gt;Your app needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Show all listings by a seller&lt;/li&gt;
&lt;li&gt;Fetch a listing by its ID&lt;/li&gt;
&lt;li&gt;Display orders for a buyer&lt;/li&gt;
&lt;li&gt;Show order status by orderID&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are your app’s main access patterns, and each one should be satisfied efficiently with a single query operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the single table design pattern
&lt;/h2&gt;

&lt;p&gt;DynamoDB’s performance comes (in part) from avoiding joins and sticking to primary key lookups.&lt;/p&gt;

&lt;p&gt;This is why the single table design is the recommended approach.&lt;/p&gt;

&lt;p&gt;In this design strategy, you store different entity types (e.g. users, posts, comments, likes) in the same table.&lt;/p&gt;

&lt;p&gt;Each entity item is differentiated by a “PK” and “SK” composite key format. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PK        | SK
-------------------------
user#101  | profile
user#101  | post#201
user#101  | order#303
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This data model makes it easy to query the related data on your table with a single query.&lt;/p&gt;

&lt;p&gt;You can also avoid joins and multiple queries.&lt;/p&gt;
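
&lt;p&gt;For example, here’s a sketch of a single Query that fetches all of user 101’s posts from the table above (the table name is hypothetical):&lt;/p&gt;

```javascript
// Sketch: query one item collection from the single table using
// begins_with on the sort key — no joins, one request.
const userPosts = {
  TableName: "app-table", // hypothetical single table
  KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
  ExpressionAttributeValues: {
    ":pk": { S: "user#101" },
    ":prefix": { S: "post#" },
  },
};
```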

&lt;p&gt;If you’re curious about the single table design, I recommend this &lt;a href="https://medium.com/aws-in-plain-english/a-beginners-guide-to-the-single-table-design-in-dynamodb-79aa0e079484" rel="noopener noreferrer"&gt;article&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to get started with DynamoDB Design
&lt;/h2&gt;

&lt;p&gt;Here’s a breakdown on getting started with your first DynamoDB database design process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;List every access pattern your app needs. (Don’t think about tables, just access patterns).&lt;/li&gt;
&lt;li&gt;Group them by common entity or user context.&lt;/li&gt;
&lt;li&gt;For each one, decide the ideal PK/SK structure that makes queries simple and efficient.&lt;/li&gt;
&lt;li&gt;Identify where you’ll need GSIs to support alternate lookups. (For each access pattern add a “table” or “GSI1” in parentheses after it)&lt;/li&gt;
&lt;li&gt;Once you’ve done all that, create a single table and store all your data on it. (read this article on how to do this)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Working with DynamoDB requires a large mindset shift.&lt;/p&gt;

&lt;p&gt;Forget what you know about SQL and learn to embrace access patterns first. This will help you build systems that are faster, cheaper, and scale effortlessly.&lt;/p&gt;

&lt;p&gt;The shift isn’t always easy, but it’s worth it and gets easier the more you work with DynamoDB.&lt;/p&gt;




&lt;p&gt;👋 My name is Uriel Bitton and I’m committed to helping you master AWS, serverless and DynamoDB.&lt;/p&gt;

&lt;p&gt;🚀 Build the database your business needs with DynamoDB — subscribe to my email newsletter &lt;a href="https://urielbitton.substack.com" rel="noopener noreferrer"&gt;The Serverless Spotlight&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading and see you in the next one!&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>database</category>
      <category>aws</category>
      <category>serverless</category>
    </item>
    <item>
      <title>How I Migrated An SQL Database To DynamoDB</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Wed, 16 Apr 2025 12:40:15 +0000</pubDate>
      <link>https://dev.to/urielbitton/how-i-migrated-an-sql-database-to-dynamodb-1io0</link>
      <guid>https://dev.to/urielbitton/how-i-migrated-an-sql-database-to-dynamodb-1io0</guid>
      <description>&lt;p&gt;Migrating an SQL database to NoSQL is always challenging.&lt;/p&gt;

&lt;p&gt;SQL models relational data while NoSQL can be designed for a very wide variety of data storage use cases.&lt;/p&gt;

&lt;p&gt;I recently had to migrate a legal database from MySQL to Amazon’s key-value database, DynamoDB.&lt;/p&gt;

&lt;p&gt;Amidst the challenges, the migration also forced me to make some tradeoff decisions to better adapt to DynamoDB’s scalable architecture.&lt;/p&gt;

&lt;p&gt;I’ll break down the migration process into the following three elements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Understanding the use cases and access patterns&lt;/li&gt;
&lt;li&gt;Understanding the relationships between data&lt;/li&gt;
&lt;li&gt;Choosing the best tradeoffs with the existing constraints&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let me take you through how each one posed its own challenge and what I did to resolve them, in the hope of offering a broad guideline if you have a similar migration workload.&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding Use Cases &amp;amp; Access Patterns
&lt;/h2&gt;

&lt;p&gt;Since I was migrating the data to DynamoDB, the first step was to understand the nature of the application and its access patterns.&lt;/p&gt;

&lt;p&gt;The group of lawyers had various types of data stored on their MySQL database.&lt;/p&gt;

&lt;p&gt;Some of this data included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customers data&lt;/li&gt;
&lt;li&gt;dossiers data&lt;/li&gt;
&lt;li&gt;meetings data&lt;/li&gt;
&lt;li&gt;transactions data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main data access patterns were the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Getting a customer’s dossier(s)&lt;/li&gt;
&lt;li&gt;Getting customers with a given status (in progress, accepted, rejected)&lt;/li&gt;
&lt;li&gt;Getting all cases in the current month&lt;/li&gt;
&lt;li&gt;Getting invoices for a given customer (by month/year)&lt;/li&gt;
&lt;li&gt;Getting bill payments (transactions) for a given month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were a few more, but for the sake of brevity I’ll focus on these main ones.&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding the relationships between data
&lt;/h2&gt;

&lt;p&gt;With MySQL, the database fetches would make a few joins to query customers with dossiers, customers with meetings, and customers with transactions (for billing purposes).&lt;/p&gt;

&lt;p&gt;When a new customer dossier was created by a lawyer, a customer record, a dossier and some initial invoicing data were created.&lt;/p&gt;

&lt;p&gt;Then as the lawyer meets with the customer, some meeting records get added (these would have the total duration and hourly rate, amongst some other metadata).&lt;/p&gt;

&lt;p&gt;If there was a court case (quite frequent), a case record was created with additional invoices (to be added as transactions when paid).&lt;/p&gt;
&lt;h2&gt;
  
  
  Design the Data For DynamoDB
&lt;/h2&gt;

&lt;p&gt;To migrate the data from MySQL to DynamoDB, I started with carefully designing a data model based on the access patterns identified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Getting a customer’s dossier(s)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All customer dossier items in DynamoDB were transformed to have the following partition key (pk) and sort key (sk):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pk: “customer#101#dossiers”,
sk: “dossier#2025-01-01#201”,
entityType: “dossier”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MySQL, a dossier record had a foreign key with the customer’s ID to query for dossiers belonging to that customer.&lt;/p&gt;

&lt;p&gt;In DynamoDB, I created an item collection (e.g. customer#101#dossiers) for all dossier items for that customer.&lt;/p&gt;

&lt;p&gt;The sort key was prefixed with the date (to sort dossiers by date created) and suffixed with a random UUID.&lt;/p&gt;

&lt;p&gt;(Note: for userIds and dossierIds I’m using plain numbers like 101 and 201 for simplicity. I do the same for the rest of the data below.)&lt;/p&gt;
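
&lt;p&gt;As a sketch, the sort key format above could be assembled like this:&lt;/p&gt;

```javascript
// Sketch: build a dossier sort key like "dossier#2025-01-01#201"
// (date prefix for chronological sorting, ID suffix for uniqueness).
function dossierSortKey(createdAt, dossierId) {
  return `dossier#${createdAt}#${dossierId}`;
}
```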

&lt;p&gt;&lt;strong&gt;Getting customers with a given status&lt;/strong&gt;&lt;br&gt;
It was important for my client to be able to filter their customers by status. Each customer would have a status defined based on whether their dossier was accepted, rejected or in progress.&lt;/p&gt;

&lt;p&gt;In MySQL this was a plain value. Here’s how I designed it in DynamoDB.&lt;/p&gt;

&lt;p&gt;I created a GSI with attributes to satisfy this access pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pk: “customer#101#dossiers”,
sk: “dossier#2025-01-01#201”,
entityType: “dossier”,
GSI1PK: “dossiers#accepted”,
GSI1SK: “dossier#2025-01-01#201”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When creating a dossier item, I had to add the two attributes “GSI1PK” and “GSI1SK” — the first holds “dossiers#” followed by the status (e.g. “dossiers#accepted”), while the GSI1SK keeps the same value as the base table’s “sk”.&lt;/p&gt;

&lt;p&gt;Using this GSI, I could easily get all dossiers with a given status by passing the status into the “GSI1PK” value.&lt;/p&gt;
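
&lt;p&gt;Here’s a sketch of that GSI query (the table name is hypothetical; the index and key names match the item layout above):&lt;/p&gt;

```javascript
// Sketch: fetch all accepted dossiers in one Query on the GSI.
const acceptedDossiers = {
  TableName: "legal-table", // hypothetical single table
  IndexName: "GSI1",
  KeyConditionExpression: "GSI1PK = :status",
  ExpressionAttributeValues: {
    ":status": { S: "dossiers#accepted" },
  },
};
```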

&lt;p&gt;&lt;strong&gt;Getting all cases in the current month&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cases were the least frequently written items in the database, since they involved the lawyers going to court with their customers.&lt;/p&gt;

&lt;p&gt;Because of this, it was safe to store all case items in a partition prefixed by the current year.&lt;/p&gt;

&lt;p&gt;Here’s the primary key design:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pk: “cases#2025”,
sk: “case#2025-01-01#301”,
entityType: “cases”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Getting invoices for a given customer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Invoice items are part of the customer’s item collection. Here’s how they are represented in the new database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pk: “customer#101#invoices”,
sk: “invoice#2025-01-01#401”,
entityType: “invoices”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Getting bill payments for a given month&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bill payment records follow the same model as invoices. They are partitioned by customer since it is the customers that make the payments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pk: “customer#101#payments”,
sk: “payment#2025-01-01#501”,
entityType: “payments”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Choosing the best tradeoffs
&lt;/h2&gt;

&lt;p&gt;The main tradeoff was sacrificing some simplicity of join operations in MySQL with more effort on the database item primary key design.&lt;/p&gt;

&lt;p&gt;This involved more complex partition and sort key data modeling to “pre-join” items of different entities.&lt;/p&gt;

&lt;p&gt;For example, one access pattern would fetch a customer item, their invoices for the current month as well as any payments associated with it.&lt;/p&gt;

&lt;p&gt;Other tradeoffs included replicating items versus normalizing data.&lt;/p&gt;

&lt;p&gt;Much of the data that was normalized in the SQL database had to be denormalized and sometimes duplicated in DynamoDB.&lt;/p&gt;

&lt;p&gt;For example, if two lawyers worked on the same case, a case item had to be duplicated, one for each lawyer. This was designed as such to enable the access pattern where a lawyer could retrieve all cases assigned to them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Migration Process
&lt;/h2&gt;

&lt;p&gt;The migration went mostly smoothly after a lot of preparation.&lt;/p&gt;

&lt;p&gt;The data model for each record was carefully planned before the migration.&lt;/p&gt;

&lt;p&gt;Most of the normalized data was denormalized beforehand as well.&lt;/p&gt;

&lt;p&gt;For the actual migration, I wrote one script per table.&lt;/p&gt;

&lt;p&gt;I scanned each table, read each item and wrote it to my single DynamoDB table, adding the primary keys and other attributes.&lt;/p&gt;

&lt;p&gt;For the data that had been normalized, I had to account for that and create additional items in DynamoDB.&lt;/p&gt;
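
&lt;p&gt;Here’s a sketch of one such mapping step from the migration scripts (the column names are hypothetical):&lt;/p&gt;

```javascript
// Sketch: turn a scanned MySQL dossier row into a single-table
// DynamoDB item by adding the pk/sk and entityType attributes.
function dossierRowToItem(row) {
  return {
    ...row, // carry over the original columns
    pk: `customer#${row.customerId}#dossiers`,
    sk: `dossier#${row.createdAt}#${row.id}`,
    entityType: "dossier",
  };
}
```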

&lt;p&gt;Once all scripts ran successfully, I had some small modifications to do to make sure all items were consistent.&lt;/p&gt;

&lt;p&gt;Then came the post-migration phase, which required me to create serverless functions with AWS Lambda that satisfy the main access patterns and write new data to the DynamoDB table.&lt;/p&gt;

&lt;p&gt;The rest of that process can make for an interesting follow up article…&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Migrating my client’s database from SQL to DynamoDB required careful planning and tradeoff decisions to adapt to DynamoDB’s context.&lt;/p&gt;

&lt;p&gt;While the process involved tradeoffs like denormalizing data and designing complex primary key structures, it ultimately allowed for much more scalable and efficient data access patterns.&lt;/p&gt;

&lt;p&gt;The migration taught me a lot about how to think in NoSQL versus SQL and how data modeling in both systems involves different thinking and satisfying different problems.&lt;/p&gt;




&lt;p&gt;👋 My name is Uriel Bitton and I’m committed to helping you master AWS, serverless and DynamoDB.&lt;/p&gt;

&lt;p&gt;🚀 Build the database your business needs with DynamoDB — subscribe to my email newsletter &lt;a href="https://excelling-with-dynamodb.beehiiv.com/subscribe" rel="noopener noreferrer"&gt;Excelling With DynamoDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading and see you in the next one!&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>database</category>
      <category>migration</category>
      <category>sql</category>
    </item>
    <item>
      <title>What Is Split For Heat In Amazon DynamoDB?</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Fri, 11 Apr 2025 13:21:59 +0000</pubDate>
      <link>https://dev.to/urielbitton/what-is-split-for-heat-in-amazon-dynamodb-31i3</link>
      <guid>https://dev.to/urielbitton/what-is-split-for-heat-in-amazon-dynamodb-31i3</guid>
      <description>&lt;p&gt;It is well known that Amazon DynamoDB adapts to workloads of any scale while offering consistent performance.&lt;/p&gt;

&lt;p&gt;But a lesser-known concept among its adaptive capacity features is “split for heat”.&lt;/p&gt;

&lt;p&gt;To understand what “split for heat” is we must first understand the reason behind it.&lt;/p&gt;

&lt;p&gt;In DynamoDB, your data is written to partitions on AWS’s storage servers.&lt;/p&gt;

&lt;p&gt;While a partition has a physical size limit, you, as a customer, are not concerned with that, nor will it affect you in any way. This is because DynamoDB automatically manages partitions for you, creating new ones as your dataset grows.&lt;/p&gt;

&lt;p&gt;However, even though size limits are not an issue, hot partitions surely are.&lt;/p&gt;

&lt;p&gt;A hot partition is a partition that is consistently receiving high traffic. This can impact read and write performance to and from that partition (when reading or writing items stored with that partition key).&lt;/p&gt;

&lt;p&gt;The standard prevention for a hot partition is high partition key cardinality.&lt;/p&gt;

&lt;p&gt;But a hot partition can still occur when you have an overly popular partition.&lt;/p&gt;

&lt;p&gt;This is where DynamoDB applies a strategy called “split for heat”.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Split For Heat?
&lt;/h2&gt;

&lt;p&gt;Split for heat is a mechanism where DynamoDB detects a partition that is receiving high traffic and automatically splits it into two smaller partitions.&lt;/p&gt;

&lt;p&gt;This helps reduce throttling and alleviates the performance impacts of reads and writes to that partition as you now have double the throughput available for that data (since it’s split into two).&lt;/p&gt;

&lt;p&gt;This is better than traditional sharding as it requires no manual intervention. DynamoDB will also determine when it is best to split it based on usage patterns.&lt;/p&gt;

&lt;p&gt;When splitting a partition, DynamoDB will distribute the items based on their sort key. This effectively doubles the available read and write throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does Split For Heat Work?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;DynamoDB will monitor for read and write traffic to its partitions. When a partition is identified as receiving sustained high traffic, it is marked as a candidate for splitting.&lt;/li&gt;
&lt;li&gt;The partition is split into two, with items distributed according to their sort key. Each new partition gets around half of the original items.&lt;/li&gt;
&lt;li&gt;Once the partition has been split, the total read/write capacity doubles, preventing throttling.&lt;/li&gt;
&lt;li&gt;While DynamoDB provides no notification when a split occurs, users will notice the reduction in throttling and better overall request handling.&lt;/li&gt;
&lt;/ul&gt;
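&lt;p&gt;Conceptually, the split looks like the following sketch (an illustration only, not AWS internals): items ordered by sort key are divided into two roughly equal halves, which is why the throughput available for that data doubles.&lt;/p&gt;

```python
# Illustrative sketch only, not DynamoDB's real implementation.
# A hot partition's items, ordered by sort key, are split into two
# halves, doubling the throughput available for that data.

def split_partition(items):
    """Split (sort_key, value) pairs into two partitions by sort-key order."""
    ordered = sorted(items, key=lambda item: item[0])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

hot_partition = [
    ("time#2025-03-21T12:00:15Z", "reading-4"),
    ("time#2025-03-21T12:00:00Z", "reading-1"),
    ("time#2025-03-21T12:00:10Z", "reading-3"),
    ("time#2025-03-21T12:00:05Z", "reading-2"),
]
left, right = split_partition(hot_partition)
```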

&lt;h2&gt;
  
  
  Best Practices to maximize the benefits of Split For Heat
&lt;/h2&gt;

&lt;p&gt;Some best practices and design considerations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design partition keys carefully: even though DynamoDB can adapt to hot partitions, it’s always good to choose high cardinality partition keys to distribute writes evenly.&lt;/li&gt;
&lt;li&gt;Avoid time-based sort keys for high traffic items: if items in a partition have an ever-increasing sort key (e.g. timestamps), they will always be directed to a single partition, making the split for heat ineffective. Consider a more randomized sort key design in this case.&lt;/li&gt;
&lt;li&gt;Monitor and adjust: use Amazon CloudWatch to track read and write capacity usage and identify hot partitions before they impact your system’s performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Split for heat is one of DynamoDB’s most valuable features for managing unpredictable traffic use cases.&lt;/p&gt;

&lt;p&gt;While proper data modeling is crucial to prevent hot partitions, DynamoDB’s adaptive capacity features like split for heat make sure that even the highest traffic workloads don’t impact your system’s performance.&lt;/p&gt;




&lt;p&gt;👋 My name is Uriel Bitton and I’m committed to helping you master AWS, serverless and DynamoDB.&lt;/p&gt;

&lt;p&gt;🚀 Build the database your business needs with DynamoDB — subscribe to my email newsletter &lt;a href="https://excelling-with-dynamodb.beehiiv.com/subscribe" rel="noopener noreferrer"&gt;Excelling With DynamoDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;☕️ Need help with DynamoDB? Book a 1:1 with me &lt;a href="https://calendly.com/urielas1/dynamodb-consultations" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading and see you in the next one!&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>split</category>
      <category>heat</category>
      <category>database</category>
    </item>
    <item>
      <title>Designing Time-Series Data In DynamoDB</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Sun, 06 Apr 2025 13:50:49 +0000</pubDate>
      <link>https://dev.to/urielbitton/designing-time-series-data-in-dynamodb-kcj</link>
      <guid>https://dev.to/urielbitton/designing-time-series-data-in-dynamodb-kcj</guid>
      <description>&lt;p&gt;DynamoDB is excellent at efficiently storing and querying time-series data.&lt;/p&gt;

&lt;p&gt;However, that efficiency depends entirely on your ability to model your data properly to support the required access patterns.&lt;/p&gt;

&lt;p&gt;Improper data models can lead to higher costs and performance issues.&lt;/p&gt;

&lt;p&gt;Let’s explore some best practices and data models for storing and querying time-series data in DynamoDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing a table for time-series data
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choosing the right primary keys&lt;/strong&gt;&lt;br&gt;
Choosing the right partition key is essential for time-series data (and for any type of data in fact).&lt;/p&gt;

&lt;p&gt;A common design pattern is to build the partition key from an entity identifier combined with a high-level time component. Here’s an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pk: device#123#2025-03-21
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We add the “device” keyword followed by the device ID and then the date (without the time).&lt;/p&gt;

&lt;p&gt;The sort key should be a timestamp (in standardized ISO 8601 format) of the event to enable efficient range queries. Here’s an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sk: time#2025-03-21T12:30:00Z
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So each reading from our device would be stored in this key structure.&lt;/p&gt;
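&lt;p&gt;A small helper makes this key structure concrete. This is a hypothetical sketch; the &lt;code&gt;reading_keys&lt;/code&gt; name and item shape are illustrative, not part of any SDK:&lt;/p&gt;

```python
# Hypothetical helper mirroring the key structure above:
# pk = device#<id>#<date>, sk = time#<ISO 8601 timestamp>.

def reading_keys(device_id, timestamp):
    """Build the partition and sort key for one device reading.

    `timestamp` is an ISO 8601 string like "2025-03-21T12:30:00Z";
    the date portion (before "T") goes into the partition key.
    """
    date = timestamp.split("T")[0]
    return {"pk": f"device#{device_id}#{date}", "sk": f"time#{timestamp}"}

keys = reading_keys("123", "2025-03-21T12:30:00Z")
```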

&lt;h2&gt;
  
  
  Querying time-series data efficiently
&lt;/h2&gt;

&lt;p&gt;To fetch data for a specific time range, we can use the BETWEEN query method (rather than using inefficient Scan methods).&lt;/p&gt;

&lt;p&gt;Say we want to retrieve all readings from device 123 on March 21st 2025. We can run the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TableName: "devices",
KeyConditionExpression: "pk = :pk AND sk BETWEEN :start AND :end",
ExpressionAttributeValues: {
    ":pk": "device#123#2025-03-21",
    ":start": "time#2025-03-21T00:00:00Z",
    ":end": "time#2025-03-21T23:59:59Z"
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By extending the start and end date we can expand our filter to whatever range we need.&lt;/p&gt;
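&lt;p&gt;To see why this query is efficient, note that ISO 8601 timestamps sort lexicographically, so a BETWEEN condition reduces to two binary searches over the sorted sort keys. A minimal local simulation (illustrative only, not the DynamoDB engine):&lt;/p&gt;

```python
import bisect

# Local sketch of the BETWEEN query: because ISO 8601 timestamps sort
# lexicographically, a range query is two binary searches over sorted keys.

def query_between(sorted_keys, start, end):
    """Return every sort key in [start, end], like a BETWEEN condition."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_right(sorted_keys, end)
    return sorted_keys[lo:hi]

sks = [
    "time#2025-03-20T23:59:00Z",
    "time#2025-03-21T08:15:00Z",
    "time#2025-03-21T12:30:00Z",
    "time#2025-03-22T00:01:00Z",
]
march_21 = query_between(sks, "time#2025-03-21T00:00:00Z",
                         "time#2025-03-21T23:59:59Z")
```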

&lt;h2&gt;
  
  
  Handling high-traffic time-series workloads
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Partition Splitting to avoid hot keys&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A single partition can handle up to 3,000 read capacity units and 1,000 write capacity units per second.&lt;/p&gt;

&lt;p&gt;If your time-series data has a high ingestion rate you can run into hot partitions.&lt;/p&gt;

&lt;p&gt;The solution for hot partitions is typically to shard the keys by more specific time components.&lt;/p&gt;

&lt;p&gt;For example, if the “day” partitions experience heat, you can shard them by hour or minute.&lt;/p&gt;

&lt;p&gt;Rather than the partition key being “device#123#2025-03-21”, it can become “device#123#2025-03-21T12:00”.&lt;/p&gt;

&lt;p&gt;Now your data for that deviceID is partitioned by hour, reducing the risk of a hot partition.&lt;/p&gt;
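&lt;p&gt;A sketch of this hour-level sharding (the &lt;code&gt;sharded_pk&lt;/code&gt; helper is hypothetical):&lt;/p&gt;

```python
# Sketch: shard the partition key down to the hour so a high-ingestion
# device spreads its writes across 24 partitions per day instead of one.

def sharded_pk(device_id, timestamp):
    """Build an hour-level partition key, e.g. device#123#2025-03-21T12:00."""
    hour = timestamp[:13]  # "2025-03-21T12" (date + "T" + hour)
    return f"device#{device_id}#{hour}:00"

pk = sharded_pk("123", "2025-03-21T12:30:00Z")
```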

&lt;p&gt;&lt;strong&gt;Expiring Old Data&lt;/strong&gt;&lt;br&gt;
Time-series data often becomes obsolete after a certain amount of time.&lt;/p&gt;

&lt;p&gt;This is where using TTL is beneficial. Each item should have a TTL attribute so that it can automatically be removed by DynamoDB’s system.&lt;/p&gt;

&lt;p&gt;This optimizes storage costs and space on your table.&lt;/p&gt;
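&lt;p&gt;The TTL attribute is just a Unix epoch timestamp in seconds. A minimal sketch of computing it, assuming a 30-day retention window:&lt;/p&gt;

```python
import datetime

# Sketch: compute the TTL attribute (epoch seconds) so DynamoDB can
# expire a reading automatically, here 30 days after the event.

def ttl_for(event_time, days=30):
    """Return the Unix epoch (in seconds) at which the item should expire."""
    expiry = event_time + datetime.timedelta(days=days)
    return int(expiry.timestamp())

event = datetime.datetime(2025, 3, 21, 12, 30, tzinfo=datetime.timezone.utc)
ttl = ttl_for(event)
```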

&lt;p&gt;&lt;strong&gt;Storing Aggregated Data&lt;/strong&gt;&lt;br&gt;
Finally, if you often query summaries or aggregate totals, you can pre-aggregate data into a single item and query that item directly, which is far more efficient.&lt;/p&gt;

&lt;p&gt;For example, instead of storing every event, you can store hourly or daily aggregation values with minimum, maximum and average values.&lt;/p&gt;

&lt;p&gt;To achieve this, you can use DynamoDB Streams. Every time a new event is recorded, use the stream to update a “count” value.&lt;/p&gt;
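&lt;p&gt;The stream-handler logic can be sketched as a small fold over incoming readings (illustrative only; attribute names like &lt;code&gt;count&lt;/code&gt; and &lt;code&gt;avg&lt;/code&gt; are assumptions):&lt;/p&gt;

```python
# Sketch of the stream-handler logic: fold each new reading into a single
# pre-aggregated item holding count, min, max and average.

def update_aggregate(agg, value):
    """Return the aggregate item updated with one new reading."""
    count = agg["count"] + 1
    total = agg["sum"] + value
    return {
        "count": count,
        "sum": total,
        "min": min(agg["min"], value),
        "max": max(agg["max"], value),
        "avg": total / count,
    }

agg = {"count": 1, "sum": 10.0, "min": 10.0, "max": 10.0, "avg": 10.0}
agg = update_aggregate(agg, 20.0)
```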

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Time-series data is often a great fit for DynamoDB, but, as always, efficient data modeling is critical.&lt;/p&gt;

&lt;p&gt;By designing your data to accommodate BETWEEN queries, you can retrieve ranges of time-series data very efficiently.&lt;/p&gt;

&lt;p&gt;Implementing the best practices above will reduce query and storage costs while ensuring high performance, low latency, and scalability.&lt;/p&gt;




&lt;p&gt;👋 My name is Uriel Bitton and I’m committed to helping you master AWS, serverless and DynamoDB.&lt;/p&gt;

&lt;p&gt;🚀 Build the database your business needs with DynamoDB — subscribe to my email newsletter &lt;a href="https://excelling-with-dynamodb.beehiiv.com/subscribe" rel="noopener noreferrer"&gt;Excelling With DynamoDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading and see you in the next one!&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>timeseries</category>
      <category>database</category>
      <category>data</category>
    </item>
    <item>
      <title>How I Migrate My Clients’ SQL Databases To DynamoDB</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Thu, 20 Mar 2025 14:06:16 +0000</pubDate>
      <link>https://dev.to/urielbitton/how-i-migrate-my-clients-sql-databases-to-dynamodb-2odd</link>
      <guid>https://dev.to/urielbitton/how-i-migrate-my-clients-sql-databases-to-dynamodb-2odd</guid>
      <description>&lt;p&gt;Migrating from a SQL database to DynamoDB is not a one-size-fits-all decision.&lt;/p&gt;

&lt;p&gt;It requires a deep understanding of access patterns, data modeling techniques, and costs. I only recommend this migration when there are clear performance or cost benefits.&lt;/p&gt;

&lt;p&gt;Here’s the structured approach I take when helping clients transition from SQL to DynamoDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. I Define Access Patterns First
&lt;/h2&gt;

&lt;p&gt;In SQL, data is normalized into multiple tables with relationships managed with foreign keys.&lt;/p&gt;

&lt;p&gt;However, in DynamoDB, the design is actually based on how the application queries and writes data.&lt;/p&gt;

&lt;p&gt;Before starting any migration, I like to list out every access pattern in the application. In other words, how data will be read and written.&lt;/p&gt;

&lt;p&gt;This step ensures that the database structure supports efficient queries without increases in cost or latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. I Choose the Right Primary Keys
&lt;/h2&gt;

&lt;p&gt;DynamoDB does not support traditional SQL joins.&lt;/p&gt;

&lt;p&gt;Instead, it relies on partition and sort keys. The selection of these keys directly impacts performance and scalability.&lt;/p&gt;

&lt;p&gt;A poor key selection can lead to hot partitions, causing slowdowns and increased costs.&lt;/p&gt;

&lt;p&gt;Because of this, I spend most of my time in the migration/design process defining primary keys, to ensure an efficient and scalable data model.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. I Denormalize Data Where Needed
&lt;/h2&gt;

&lt;p&gt;Unlike SQL, where data is spread across multiple tables and joined together at query time, DynamoDB takes advantage of denormalization instead.&lt;/p&gt;

&lt;p&gt;This means I’ll embed related data into a single item to reduce the number of read operations, which means duplicating data when necessary.&lt;/p&gt;

&lt;p&gt;By doing so, queries become faster and more efficient, avoiding the expensive join operations that are a primary source of latency in SQL databases.&lt;/p&gt;
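&lt;p&gt;As a minimal sketch of this denormalization (the item shape and attribute names are hypothetical), a user item can embed its recent orders so one read replaces a join:&lt;/p&gt;

```python
# Sketch: denormalize a user and their recent orders into one item,
# so a single read replaces a SQL join across two tables.

def build_user_item(user, orders):
    """Embed order summaries directly in the user item (duplicating data)."""
    return {
        "pk": f"user#{user['id']}",
        "sk": "profile",
        "name": user["name"],
        # Duplicated from the orders data so one GetItem serves the page.
        "recent_orders": [{"id": o["id"], "total": o["total"]} for o in orders],
    }

item = build_user_item(
    {"id": "42", "name": "Ada"},
    [{"id": "o1", "total": 30}, {"id": "o2", "total": 12}],
)
```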

&lt;h2&gt;
  
  
  4. I Use Global Secondary Indexes (GSIs) Instead of SQL Indexes
&lt;/h2&gt;

&lt;p&gt;SQL databases allow multiple indexes, but DynamoDB takes a different approach with GSIs.&lt;/p&gt;

&lt;p&gt;If an application requires querying data based on different attributes, I strategically implement GSIs.&lt;/p&gt;

&lt;p&gt;To optimize costs and performance, I apply key overloading, which allows me to use a single index for multiple query patterns rather than creating multiple GSIs.&lt;/p&gt;
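&lt;p&gt;Key overloading can be sketched like this (the &lt;code&gt;gsi1pk&lt;/code&gt;/&lt;code&gt;gsi1sk&lt;/code&gt; naming and the prefix scheme are illustrative assumptions): different entity types write different prefixes into the same index attributes, so one GSI serves several query patterns:&lt;/p&gt;

```python
# Sketch of key overloading: one GSI (gsi1pk/gsi1sk) serves several query
# patterns because each entity type writes its own prefixes into it.

def gsi_keys(entity):
    """Populate overloaded GSI attributes based on entity type (hypothetical scheme)."""
    if entity["type"] == "order":
        # Pattern: all orders with a given status, sorted by date.
        return {"gsi1pk": f"status#{entity['status']}",
                "gsi1sk": f"order#{entity['date']}"}
    if entity["type"] == "invoice":
        # Pattern: all invoices for a given customer.
        return {"gsi1pk": f"customer#{entity['customer']}",
                "gsi1sk": f"invoice#{entity['id']}"}
    raise ValueError("unknown entity type")

order_keys = gsi_keys({"type": "order", "status": "shipped",
                       "date": "2025-03-20"})
```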

&lt;h2&gt;
  
  
  5. I Handle Transactions Differently
&lt;/h2&gt;

&lt;p&gt;DynamoDB supports transactions, but they come at twice the cost in terms of read and write capacity units.&lt;/p&gt;

&lt;p&gt;Instead of relying on transactions as frequently as in SQL, I use a single-table design with transactional writes only when necessary.&lt;/p&gt;

&lt;p&gt;This keeps operations efficient while maintaining data consistency.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. I Test and Optimize Queries
&lt;/h2&gt;

&lt;p&gt;Every access pattern must be tested to ensure that the new database performs efficiently.&lt;/p&gt;

&lt;p&gt;I evaluate query performance, monitor read and write capacity usage, and optimize indexing strategies before fully transitioning to DynamoDB.&lt;/p&gt;

&lt;p&gt;Sometimes I may have to change a primary key design or a GSI to support a better access pattern.&lt;/p&gt;

&lt;p&gt;Other times, my clients may present me with a revised or improved access pattern, and I will then adjust my data model accordingly.&lt;/p&gt;

&lt;p&gt;Once everything is accounted for, it is time for the actual migration process.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Preferred Migration Strategy: Dual-Write Approach
&lt;/h2&gt;

&lt;p&gt;To ensure a smooth transition with minimal downtime, I typically implement a dual-write strategy.&lt;/p&gt;

&lt;p&gt;Here’s the step-by-step flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both the SQL database and DynamoDB receive new data writes.&lt;/li&gt;
&lt;li&gt;Reads continue to be served from the SQL database.&lt;/li&gt;
&lt;li&gt;Once the DynamoDB setup is validated and optimized, I’ll start cutting over read operations to it.&lt;/li&gt;
&lt;li&gt;After confirming stable performance (may take a few days to a few weeks), the SQL database is disconnected.&lt;/li&gt;
&lt;li&gt;DynamoDB becomes the primary database.&lt;/li&gt;
&lt;/ul&gt;
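&lt;p&gt;The dual-write step can be sketched with plain dictionaries standing in for the two databases (an illustration of the pattern, not production code):&lt;/p&gt;

```python
# Sketch of the dual-write pattern: every write lands in both stores while
# reads still come from SQL; the read path is cut over after validation.

class DualWriter:
    def __init__(self, sql_store, dynamo_store):
        self.sql_store = sql_store        # stand-in for the SQL database
        self.dynamo_store = dynamo_store  # stand-in for DynamoDB
        self.read_from_dynamo = False     # flipped once DynamoDB is validated

    def write(self, key, value):
        """Write to both databases so they stay in sync during migration."""
        self.sql_store[key] = value
        self.dynamo_store[key] = value

    def read(self, key):
        """Serve reads from whichever store is currently primary."""
        store = self.dynamo_store if self.read_from_dynamo else self.sql_store
        return store.get(key)

db = DualWriter({}, {})
db.write("user#1", {"name": "Ada"})
```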

&lt;p&gt;This gradual cutover method makes sure the migration is as smooth as possible while allowing time to validate data consistency and query performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Migrating from SQL to DynamoDB is not just about moving data; it requires, as I often say, “a shift in mindset”.&lt;/p&gt;

&lt;p&gt;SQL focuses on normalization and relationships, whereas DynamoDB is built for scalability and optimized query performance.&lt;/p&gt;

&lt;p&gt;By carefully defining access patterns, choosing the right keys, denormalizing where necessary, using GSIs, and implementing an effective migration strategy, you can achieve a smooth and cost-efficient transition from any database to DynamoDB.&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>database</category>
      <category>migrate</category>
      <category>sql</category>
    </item>
    <item>
      <title>Why DynamoDB Doesn’t Let You Write A Bad Query</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Wed, 12 Mar 2025 14:49:03 +0000</pubDate>
      <link>https://dev.to/urielbitton/why-dynamodb-doesnt-let-you-write-a-bad-query-17k2</link>
      <guid>https://dev.to/urielbitton/why-dynamodb-doesnt-let-you-write-a-bad-query-17k2</guid>
      <description>&lt;p&gt;You’ve probably read this somewhere.&lt;/p&gt;

&lt;p&gt;“DynamoDB doesn’t let you write a bad query”.&lt;/p&gt;

&lt;p&gt;But what does it mean and how does DynamoDB accomplish this?&lt;/p&gt;

&lt;p&gt;To understand this we have to get into how DynamoDB works under the hood.&lt;/p&gt;

&lt;h2&gt;
  
  
  DynamoDB Data Structure
&lt;/h2&gt;

&lt;p&gt;DynamoDB is a distributed and fully managed NoSQL database that lets you easily scale.&lt;/p&gt;

&lt;p&gt;But how does it let you, the user, scale your database?&lt;/p&gt;

&lt;p&gt;Under the hood, DynamoDB stores your data across many storage nodes, in units called partitions.&lt;/p&gt;

&lt;p&gt;Your data is distributed across partitions, with the item’s partition key dictating the partition (using a hash function) where that item will be stored.&lt;/p&gt;
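&lt;p&gt;The idea can be sketched as follows (illustrative only; DynamoDB’s actual hash function is internal): hashing the partition key deterministically selects one of N partitions in constant time:&lt;/p&gt;

```python
import hashlib

# Sketch (not DynamoDB's actual hash): a partition key maps
# deterministically to one of N partitions in O(1).

def partition_for(pk, num_partitions=4):
    """Hash the partition key and pick a partition index."""
    digest = hashlib.sha256(pk.encode()).hexdigest()
    return int(digest, 16) % num_partitions

p1 = partition_for("user#123")
p2 = partition_for("user#123")
```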

&lt;p&gt;With this architecture, locating the partition for a query (using the partition key) is an O(1) operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feuxamb48iy5gyd9m587o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feuxamb48iy5gyd9m587o.png" alt="Image description" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So no matter how you query with a partition key, the query will always be efficient (can’t make a bad query).&lt;/p&gt;

&lt;p&gt;That’s fine for partition key queries, but what about composite key queries; when you provide a sort key?&lt;/p&gt;

&lt;p&gt;Well, inside each partition, DynamoDB stores your items in a B-tree data structure. Think of this as an upside-down tree where the root is the partition and all items inside it are leaves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mqfkx44ckhmrzzpwd7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mqfkx44ckhmrzzpwd7y.png" alt="Image description" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you add items to that partition, they are stored in an alphanumerically sorted order.&lt;/p&gt;

&lt;p&gt;For example, “ha” will be at the top of the structure, followed by “har”, then “hard”, “harm” and “harsh”. After all “ha…” items have been stored, items that start with “he…” will be stored (refer to screenshot above).&lt;/p&gt;

&lt;p&gt;Here’s what this means in terms of efficiency:&lt;/p&gt;

&lt;p&gt;Writing items to this partition will be done in worst case O(log n) — pretty fast.&lt;/p&gt;

&lt;p&gt;Reading data from this partition happens in the same O(log n) time complexity.&lt;/p&gt;

&lt;p&gt;From a high-level overview, this is how data is stored in DynamoDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  How DynamoDB doesn’t let you write a bad query
&lt;/h2&gt;

&lt;p&gt;So how does DynamoDB prevent you from writing bad or inefficient queries?&lt;/p&gt;

&lt;p&gt;It does this based on the data structure and some “clever limitations” it places around querying and writing.&lt;/p&gt;

&lt;p&gt;First, any single item write or read you make will remain efficient due to the data structure, as these operations would take O(log n) no matter how many items there are in the partition.&lt;/p&gt;

&lt;p&gt;Second, a query’s result set is limited to 1 MB. This limitation is designed to keep read latency low.&lt;/p&gt;

&lt;p&gt;While other NoSQL databases have a much higher limit, DynamoDB places a relatively low limit to keep queries fast.&lt;/p&gt;

&lt;p&gt;Additionally, for multiple item reads and writes, the partition key will greatly limit the search zone of your query.&lt;/p&gt;

&lt;p&gt;Similarly, the sort key is limited to a few query methods, again to ensure efficiency and low latency.&lt;/p&gt;

&lt;p&gt;These query methods are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;comparison operators (&amp;lt;, &amp;lt;=, &amp;gt;, &amp;gt;= and =): these let you query for number or string values that compare numerically or alphabetically against another value (e.g. 7 &amp;gt; 5, or “he” greater than “ha”).&lt;/li&gt;
&lt;li&gt;begins_with() method: this lets you say “get me all items whose sort key begins with the substring x” (e.g. begins_with(“ha”) will return “hard”, “harm”, “harsh”, etc.).&lt;/li&gt;
&lt;li&gt;BETWEEN method: this lets you say “get me all items whose sort keys are between x and y” (e.g. sort key BETWEEN “ha” and “hi” will return “hard”, “harm”, “harsh”, “hello”, “hey”, up to “hi”).&lt;/li&gt;
&lt;/ul&gt;
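&lt;p&gt;The methods above can be simulated locally on a sorted list of sort keys, which shows why they stay fast: each reduces to binary searches plus a short sequential scan (a sketch, not the storage engine):&lt;/p&gt;

```python
import bisect

# Sketch of why begins_with() and BETWEEN stay fast: over sort keys kept
# in sorted order, both reduce to binary searches plus a sequential scan.

KEYS = sorted(["ha", "har", "hard", "harm", "harsh", "hello", "hey", "hi"])

def begins_with(keys, prefix):
    """Return every key that starts with `prefix`."""
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_right(keys, prefix + "\uffff")  # just past the prefix range
    return keys[lo:hi]

def between(keys, start, end):
    """Return every key in [start, end]."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_right(keys, end)
    return keys[lo:hi]

ha_words = begins_with(KEYS, "ha")
```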

&lt;p&gt;Now here’s why these query methods remain efficient even if your database contains billions of items.&lt;/p&gt;

&lt;p&gt;Notice how these methods respect the B-tree data structure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeegb1pwlch21cccc971.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeegb1pwlch21cccc971.png" alt="Image description" width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Think of it as a dictionary. If you want to get all words that begin with the word “ha”, you could do it easily (and efficiently).&lt;/p&gt;

&lt;p&gt;You’d identify the “h” section (the partition) and find “ha”. From there you can sequentially trace through every word until you reach “he…”. (that’s the begins_with() method)&lt;/p&gt;

&lt;p&gt;You can also simulate the BETWEEN method with the dictionary approach. You can trace through every word that starts with “ha…” and ends with “hi”.&lt;/p&gt;

&lt;p&gt;However, in DynamoDB you cannot make a query in any other way, such as saying “get me all items that contain a particular string like ‘er’”. Just as with a dictionary, you wouldn’t be able to efficiently find all words that contain the letters “er”.&lt;/p&gt;

&lt;p&gt;Or you might be able to, but it would take a very long time and require a full dictionary scan.&lt;/p&gt;

&lt;p&gt;This is the strategy DynamoDB uses to make your queries scalable, fast and efficient.&lt;/p&gt;

&lt;p&gt;Now let’s say you designed a bad, inefficient table structure and attempted to make a bad query. Here’s what would happen.&lt;/p&gt;

&lt;p&gt;The query would still target a single partition (efficient).&lt;br&gt;
You could specify a sort key, but it can only operate on sequentially sorted data, which is fast.&lt;/p&gt;

&lt;p&gt;If all items were in one large partition, the query result would still be limited to 1 MB (so it wouldn’t take long to fetch).&lt;/p&gt;

&lt;p&gt;So no matter how you query your “badly designed” table, your query would remain relatively fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;DynamoDB’s architecture and query limitations are designed to enforce efficiency, and make sure that even a poorly designed schema won’t result in an inefficient query.&lt;/p&gt;

&lt;p&gt;Through the use of partitions, B-tree structures, and a limited set of query methods, DynamoDB guarantees predictable performance, preventing you from writing a “bad” query.&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>database</category>
      <category>aws</category>
      <category>query</category>
    </item>
    <item>
      <title>What Is Split For Heat In Amazon DynamoDB?</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Sat, 08 Mar 2025 00:05:41 +0000</pubDate>
      <link>https://dev.to/urielbitton/what-is-split-for-heat-in-amazon-dynamodb-bm8</link>
      <guid>https://dev.to/urielbitton/what-is-split-for-heat-in-amazon-dynamodb-bm8</guid>
      <description>&lt;p&gt;It is well known that Amazon DynamoDB adapts to workloads of any scale while offering consistent performance.&lt;/p&gt;

&lt;p&gt;But a lesser-known feature of its adaptive capacity is "split for heat".&lt;/p&gt;

&lt;p&gt;To understand what "split for heat" is, we must first understand the reason behind it.&lt;/p&gt;

&lt;p&gt;In DynamoDB, your data is written to partitions on AWS's storage servers.&lt;/p&gt;

&lt;p&gt;While a partition has a physical size limit, you, as a customer, don't need to be concerned with it, nor will it affect you in any way. &lt;br&gt;
This is because DynamoDB automatically manages partitions for you, creating new ones as your dataset grows.&lt;/p&gt;

&lt;p&gt;However, even though partition size is not an issue, hot partitions certainly are.&lt;/p&gt;

&lt;p&gt;A hot partition is a partition that is consistently receiving high traffic. This can impact read and write performance to and from that partition (when reading or writing items stored with that partition key).&lt;/p&gt;

&lt;p&gt;The standard prevention for a hot partition is high partition key cardinality.&lt;/p&gt;

&lt;p&gt;But a hot partition can still occur when you have an overly popular partition.&lt;/p&gt;

&lt;p&gt;This is where DynamoDB applies a strategy called "split for heat".&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Split For Heat?
&lt;/h2&gt;

&lt;p&gt;Split for heat is a mechanism where DynamoDB detects a partition that is receiving high traffic and automatically splits it into two smaller partitions.&lt;/p&gt;

&lt;p&gt;This helps reduce throttling and alleviates the performance impacts of reads and writes to that partition as you now have double the throughput available for that data (since it's split into two).&lt;br&gt;
This is better than traditional sharding as it requires no manual intervention. DynamoDB will also determine when it is best to split it based on usage patterns.&lt;/p&gt;

&lt;p&gt;When splitting a partition, DynamoDB will distribute the items based on their sort key. This effectively doubles the available read and write throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does Split For Heat Work?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feb1ze831d8tc34a0k75v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feb1ze831d8tc34a0k75v.jpg" alt="Image description" width="800" height="775"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DynamoDB will monitor for read and write traffic to its partitions. When a partition is identified as receiving sustained high traffic, it is marked as a candidate for splitting.&lt;/p&gt;

&lt;p&gt;The partition is split into two, with items distributed according to their sort key. Each new partition gets around half of the original items.&lt;/p&gt;

&lt;p&gt;Once the partition has been split, the total read/write capacity doubles, preventing throttling.&lt;/p&gt;

&lt;p&gt;While DynamoDB provides no notification when a split occurs, users will notice the reduction in throttling and better overall request handling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices to maximize the benefits of Split For Heat
&lt;/h2&gt;

&lt;p&gt;Some best practices and design considerations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design partition keys carefully: even though DynamoDB can adapt to hot partitions, it's always good to choose high cardinality partition keys to distribute writes evenly.&lt;/li&gt;
&lt;li&gt;Avoid time-based sort keys for high traffic items: if items in a partition have an ever-increasing sort key (e.g. timestamps), they will always be directed to a single partition, making the split for heat ineffective. Consider a more randomized sort key design in this case.&lt;/li&gt;
&lt;li&gt;Monitor and adjust: use Amazon CloudWatch to track read and write capacity usage and identify hot partitions before they impact your system's performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Split for heat is one of DynamoDB's most valuable features for managing unpredictable traffic use cases.&lt;/p&gt;

&lt;p&gt;While proper data modeling is crucial to prevent hot partitions, DynamoDB's adaptive capacity features like split for heat make sure that even the highest traffic workloads don't impact your system's performance.&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>split</category>
      <category>heat</category>
      <category>database</category>
    </item>
    <item>
      <title>Why You Shouldn't Use DynamoDB Filter Expressions And What To Use Instead</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Wed, 26 Feb 2025 17:42:23 +0000</pubDate>
      <link>https://dev.to/urielbitton/why-you-shouldnt-use-dynamodb-filter-expressions-and-what-to-use-instead-4dj</link>
      <guid>https://dev.to/urielbitton/why-you-shouldnt-use-dynamodb-filter-expressions-and-what-to-use-instead-4dj</guid>
      <description>&lt;p&gt;Last Friday I talked about DynamoDB on Zac's AWS Show.&lt;/p&gt;

&lt;p&gt;Here's some of the value I shared 👇&lt;/p&gt;

&lt;h2&gt;
  
  
  👉 Why you don't want to use FilterExpressions
&lt;/h2&gt;

&lt;p&gt;FilterExpressions should be a last resort solution.&lt;/p&gt;

&lt;p&gt;They only filter the results after the items have been read (and paid for), losing the real benefit of filtering.&lt;/p&gt;

&lt;p&gt;How do you filter then?&lt;/p&gt;

&lt;p&gt;Use Sort keys...&lt;/p&gt;

&lt;h2&gt;
  
  
  👉 How to use powerful filtering on your data
&lt;/h2&gt;

&lt;p&gt;Sort keys give you the opportunity to design for powerful filtering and sorting.&lt;/p&gt;

&lt;p&gt;In the video below I explain briefly how to overload your sort key to allow for this efficient filtering.&lt;/p&gt;

&lt;h2&gt;
  
  
  👉 The 2 most important query methods to understand
&lt;/h2&gt;

&lt;p&gt;These are the:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;begins_with() method: the basis of filtering&lt;/li&gt;
&lt;li&gt;BETWEEN method: enables range type filtering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I show nice examples of these in the talk.&lt;/p&gt;

&lt;h2&gt;
  
  
  👉 How GSIs work in DynamoDB
&lt;/h2&gt;

&lt;p&gt;GSIs allow for additional access patterns.&lt;/p&gt;

&lt;p&gt;A typical use case for a GSI is to create an inverted index for 1-many relationships.&lt;/p&gt;

&lt;p&gt;E.g. If your base table lets you get all students enrolled in a course, your inverted GSI index lets you get all courses that a student is taking.&lt;/p&gt;
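&lt;p&gt;A tiny simulation of an inverted index (item shapes are illustrative): the GSI projects each base-table item with its pk and sk swapped, so the same data answers both questions:&lt;/p&gt;

```python
# Sketch of an inverted index: the GSI swaps the base table's pk and sk,
# turning "students in a course" into "courses for a student".

base_items = [
    {"pk": "course#math", "sk": "student#alice"},
    {"pk": "course#math", "sk": "student#bob"},
    {"pk": "course#art",  "sk": "student#alice"},
]

# The GSI projects each item with pk and sk reversed.
gsi_items = [{"pk": item["sk"], "sk": item["pk"]} for item in base_items]

def query(items, pk):
    """Return the sort keys of every item under the given partition key."""
    return sorted(item["sk"] for item in items if item["pk"] == pk)

math_students = query(base_items, "course#math")
alice_courses = query(gsi_items, "student#alice")
```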

&lt;h2&gt;
  
  
  👉 The single table design
&lt;/h2&gt;

&lt;p&gt;Finally, I teased the single table design, the motivation behind it and some good use cases.&lt;/p&gt;

&lt;p&gt;Using one table instead of many is usually a better idea with DynamoDB.&lt;/p&gt;

&lt;p&gt;It offers much faster queries. &lt;/p&gt;

&lt;p&gt;The data that you query together should be stored together.&lt;/p&gt;

&lt;p&gt;It also offers lower costs if you are using provisioned capacity.&lt;/p&gt;

&lt;p&gt;That's it!&lt;/p&gt;

&lt;p&gt;If you missed it you can watch the recording here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/live/s8aRy2XIG18" rel="noopener noreferrer"&gt;Youtube Link&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I Spent 5 Years Building With DynamoDB, Here Are My 3 Top Takeaways</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Sat, 22 Feb 2025 23:36:57 +0000</pubDate>
      <link>https://dev.to/urielbitton/i-spent-5-years-building-with-dynamodb-here-are-my-3-top-takeaways-4iin</link>
      <guid>https://dev.to/urielbitton/i-spent-5-years-building-with-dynamodb-here-are-my-3-top-takeaways-4iin</guid>
      <description>&lt;p&gt;Ispent the past 5 years building scalable databases for clients and organizations.&lt;/p&gt;

&lt;p&gt;Here are a few lessons I’ve learned along the way.&lt;/p&gt;

&lt;p&gt;Many of these are issues DynamoDB users commonly run into, and learning from them will set you up with an unfair advantage.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Understand your access patterns
&lt;/h2&gt;

&lt;p&gt;Before you start any database design with DynamoDB, stop and do this instead: identify and understand your data access patterns.&lt;/p&gt;

&lt;p&gt;What I mean by this is to take some time to find out how your users will be reading and writing data on your application.&lt;/p&gt;

&lt;p&gt;For example, are users browsing your products by category or rather by most popular? (it could be both)&lt;/p&gt;

&lt;p&gt;Do you allow sellers to add multiple products at a time or rather just one?&lt;/p&gt;

&lt;p&gt;The way your application most commonly fetches data should tell you how to model it.&lt;/p&gt;

&lt;p&gt;Rather than using a general data model, model your data (that is, your primary keys) based on how it will be queried.&lt;/p&gt;

&lt;p&gt;If your application often queries users’ data along with their orders and purchase history, then you should store user info, orders and purchases using the same partition key.&lt;/p&gt;
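&lt;p&gt;As a hypothetical key layout (the names are mine, for illustration), that looks like:&lt;/p&gt;

```python
# All three entity types share one partition key, so a single Query on
# PK = "USER#789" returns the profile, orders and purchases together.
items = [
    {"PK": "USER#789", "SK": "PROFILE",             "email": "a@example.com"},
    {"PK": "USER#789", "SK": "ORDER#2025-03-01",    "total": 59},
    {"PK": "USER#789", "SK": "PURCHASE#2025-03-02", "item": "book"},
]
same_partition = all(item["PK"] == "USER#789" for item in items)
```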

&lt;p&gt;DynamoDB optimizes for latency at scale and designing based on your application’s access patterns will make or break that speed and scalability.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Partition Key design lets you scale
&lt;/h2&gt;

&lt;p&gt;Always think high cardinality first.&lt;/p&gt;

&lt;p&gt;High cardinality is simply a measure of uniqueness of your partition keys.&lt;/p&gt;

&lt;p&gt;A high cardinality partition key is, for example, “userID” or “productID” — each value usually appears only once in your database.&lt;/p&gt;

&lt;p&gt;A low cardinality partition key is, for example, “status” or “isFeatured” — many items share the same value, like “active” or “true/false”.&lt;/p&gt;

&lt;p&gt;The higher the partition key cardinality the better your database can scale. Partition keys that have too many shared values will run into hot partitions — partitions that get too much traffic.&lt;/p&gt;

&lt;p&gt;Your partition key doesn’t always have to have a unique value; sometimes you have no choice but to give it a value that will be shared. The thing to keep in mind is that when a partition key value starts getting popular, you should look towards sharding it.&lt;/p&gt;

&lt;p&gt;Sharding partition keys can be as simple as adding a prefix or suffix to the partition key value, such as a date or a location. That raises the key’s cardinality.&lt;/p&gt;
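&lt;p&gt;One illustrative sharding scheme (the hash and the suffix format are my own choices — a date or location suffix works the same way):&lt;/p&gt;

```python
import hashlib

def sharded_pk(base_key, item_id, shard_count=10):
    """Append a deterministic shard suffix so writes for a hot key are
    spread across shard_count logical partitions."""
    digest = hashlib.md5(item_id.encode()).hexdigest()
    shard = int(digest, 16) % shard_count
    return f"{base_key}#{shard}"

pk = sharded_pk("STATUS#active", "order-123")
# Reading everything back means querying all shard_count key values:
all_shards = [f"STATUS#active#{i}" for i in range(10)]
```

&lt;p&gt;The suffix must be deterministic so an item always lands in the same shard; the trade-off is that full reads now fan out across every shard.&lt;/p&gt;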

&lt;h2&gt;
  
  
  3. Sort Key design lets you filter
&lt;/h2&gt;

&lt;p&gt;DynamoDB’s API has a “FilterExpression” parameter. This lets you filter items with high flexibility.&lt;/p&gt;

&lt;p&gt;Don’t use it.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Well, because a FilterExpression reads all of the data matched by the query and only then applies the filter. This essentially means you are spending the same capacity units and costs as you would without the filter operation.&lt;/p&gt;

&lt;p&gt;So how can you filter data in DynamoDB instead?&lt;/p&gt;

&lt;p&gt;Use the sort keys.&lt;/p&gt;

&lt;p&gt;Sort keys support the following query operations:&lt;/p&gt;

&lt;p&gt;= (equality)&lt;br&gt;
begins_with() (sort key starts with substring)&lt;br&gt;
&amp;lt;= (less than or equal to)&lt;br&gt;
&amp;gt;= (greater than or equal to)&lt;br&gt;
BETWEEN (sort key is between value a and b)&lt;/p&gt;

&lt;p&gt;Using the begins_with() operator in particular, we can perform some powerful filtering.&lt;/p&gt;

&lt;p&gt;For example, if we need to filter hotel rooms by features we can use the following sort key design:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;room#&amp;lt;view-type&amp;gt;#&amp;lt;num-of-guests&amp;gt;#&amp;lt;features&amp;gt;#&amp;lt;floor-number&amp;gt;#&amp;lt;room-number&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;e.g.: &lt;br&gt;
Room 1: "room#sea-view#2-guests#smoking#f1#101"&lt;br&gt;
Room 2: "room#sea-view#3-guests#smoking#f2#201"&lt;br&gt;
Room 3: "room#garden-view#4-guests#no-smoking#f3#301"&lt;br&gt;
With the data model above we can filter out rooms by their view, number of guests, features (like smoking allowed), floor number and room number.&lt;/p&gt;
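&lt;p&gt;As a sketch, here is the server-side begins_with() matching mimicked in plain Python against those three sort keys:&lt;/p&gt;

```python
ROOMS = [
    "room#sea-view#2-guests#smoking#f1#101",
    "room#sea-view#3-guests#smoking#f2#201",
    "room#garden-view#4-guests#no-smoking#f3#301",
]

def filter_rooms(prefix, sort_keys=ROOMS):
    """Mimic what begins_with(SK, :prefix) matches on the server."""
    return [sk for sk in sort_keys if sk.startswith(prefix)]

sea_view = filter_rooms("room#sea-view")                 # rooms 101 and 201
two_guest_sea = filter_rooms("room#sea-view#2-guests")   # room 101 only
```

&lt;p&gt;Note that the prefix has to follow the attribute order of the sort key: you can narrow by view, then guests, then features, but you can’t skip straight to the floor number.&lt;/p&gt;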

&lt;p&gt;In &lt;a href="https://medium.com/aws-in-plain-english/how-i-would-design-the-data-model-for-a-hotel-booking-app-with-dynamodb-98769452e89e" rel="noopener noreferrer"&gt;this&lt;/a&gt; article, I go more into detail on using the begins_with() method to perform these powerful filtering strategies.&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>aws</category>
      <category>database</category>
      <category>takeaways</category>
    </item>
    <item>
      <title>Amazon DynamoDB Vs Google Firebase: Which Database To Choose For Your App?</title>
      <dc:creator>Uriel Bitton</dc:creator>
      <pubDate>Tue, 11 Feb 2025 15:45:25 +0000</pubDate>
      <link>https://dev.to/urielbitton/amazon-dynamodb-vs-google-firebase-which-database-to-choose-for-your-app-4k80</link>
      <guid>https://dev.to/urielbitton/amazon-dynamodb-vs-google-firebase-which-database-to-choose-for-your-app-4k80</guid>
      <description>&lt;p&gt;When it comes to building modern applications, choosing the right database is crucial.&lt;/p&gt;

&lt;p&gt;Two popular options in the cloud database space are Amazon DynamoDB and Google Firebase.&lt;/p&gt;

&lt;p&gt;While both are powerful tools, they offer different features and come with their own strengths and limitations.&lt;/p&gt;

&lt;p&gt;In this article, I’ll break down the main differences you should be concerned with to help you decide which one might be the best fit for your next application.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Use Cases
&lt;/h2&gt;

&lt;p&gt;At their core, DynamoDB and Firebase serve different purposes.&lt;/p&gt;

&lt;p&gt;DynamoDB is a fully managed NoSQL database designed for high-performance, scalable applications.&lt;/p&gt;

&lt;p&gt;It’s ideal for use cases like gaming, e-commerce, IoT data storage, and any application where low latency and high data throughput are critical.&lt;/p&gt;

&lt;p&gt;On the other hand, Firebase is more than just a database — it’s an entire app development platform.&lt;/p&gt;

&lt;p&gt;Firebase Firestore is a NoSQL database that focuses on real-time data synchronization.&lt;/p&gt;

&lt;p&gt;It’s perfect for apps that require instant updates, like chat applications, collaborative tools, or live dashboards. Firebase also comes as a package with features like authentication, hosting, and analytics, making it quite versatile for various workload types.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Data Structure
&lt;/h2&gt;

&lt;p&gt;DynamoDB and Firebase handle data fetches and writes differently.&lt;/p&gt;

&lt;p&gt;DynamoDB stores data in a key-value and document format, which is highly flexible but requires careful planning of primary keys and indexes for efficient querying.&lt;/p&gt;

&lt;p&gt;It’s designed for predictable, fast access patterns, but can be challenging for complex queries and filtering.&lt;/p&gt;

&lt;p&gt;Firebase Firestore uses a document-based model with collections and documents. It’s more intuitive for developers who want to store and retrieve hierarchical data.&lt;/p&gt;

&lt;p&gt;Firestore also supports real-time queries, meaning your app can listen for changes and update instantly. This makes Firebase a better choice for apps that need real-time functionality with virtually no backend setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Scalability
&lt;/h2&gt;

&lt;p&gt;Both databases are scalable, but they approach scalability differently.&lt;/p&gt;

&lt;p&gt;DynamoDB is built for massive scale and can handle millions of requests per second with consistent performance.&lt;/p&gt;

&lt;p&gt;It automatically partitions data across multiple servers, ensuring low latency even as your data grows. However, this scalability comes at a cost — scale can drive up pricing and requires careful capacity planning.&lt;/p&gt;

&lt;p&gt;Firebase is also scalable, but it’s more suited for small to medium-sized applications.&lt;/p&gt;

&lt;p&gt;While it can handle real-time updates efficiently, it isn’t designed for extremely high write loads like DynamoDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Ease of Use and Developer Experience
&lt;/h2&gt;

&lt;p&gt;When it comes to developer-friendliness, Firebase dominates in this area.&lt;/p&gt;

&lt;p&gt;Firebase’s SDKs are easy to integrate and the real-time features mean you can build real-time apps with very little effort.&lt;/p&gt;

&lt;p&gt;Firebase also provides a generous free tier, making it accessible for startups and small projects.&lt;/p&gt;


&lt;p&gt;DynamoDB has a steeper learning curve.&lt;/p&gt;

&lt;p&gt;It requires a deeper understanding of database design, including what I call “re-wiring your SQL mind” to understand partition keys, sort keys, and secondary indexes.&lt;/p&gt;

&lt;p&gt;However, for developers who need fine-grained control over performance and scalability, DynamoDB’s raw power, scalability, and flexibility are a major advantage.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Pricing
&lt;/h2&gt;

&lt;p&gt;Pricing is another area where these two database services differ.&lt;/p&gt;

&lt;p&gt;DynamoDB charges based on read/write capacity units (RCUs/WCUs) or on-demand usage. This can get expensive for high-traffic applications, but in return you get minimal throttling and a stellar user experience.&lt;/p&gt;

&lt;p&gt;With DynamoDB, you pay for what you use.&lt;/p&gt;
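&lt;p&gt;To make the capacity-unit math concrete: one RCU covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one WCU covers one write per second of an item up to 1 KB. A quick sketch:&lt;/p&gt;

```python
import math

def rcus_per_read(item_kb, strongly_consistent=True):
    """RCUs consumed by a single read, rounded up in 4 KB steps."""
    units = math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2

def wcus_per_write(item_kb):
    """WCUs consumed by a single write, rounded up in 1 KB steps."""
    return math.ceil(item_kb)

reads = rcus_per_read(10)     # a 10 KB item costs 3 RCUs
writes = wcus_per_write(2.5)  # a 2.5 KB item costs 3 WCUs
```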

&lt;p&gt;Firebase uses a pay-as-you-go model based on the number of reads, writes, and storage.&lt;/p&gt;

&lt;p&gt;It has a generous free tier which makes it ideal for small projects. However, costs can add up quickly for apps with heavy real-time usage or large datasets.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Ecosystem and Integrations
&lt;/h2&gt;

&lt;p&gt;Both database services have great ecosystems — Firebase is part of Google’s cloud and DynamoDB is part of AWS’s cloud.&lt;/p&gt;

&lt;p&gt;Firestore integrates seamlessly with other Firebase services like Authentication, Cloud Functions, and Hosting. This is beneficial for full-stack app development.&lt;/p&gt;

&lt;p&gt;DynamoDB integrates well with other AWS services like Lambda, S3, and AppSync, amongst other widely used AWS services, making it a great fit for organizations already on AWS’s cloud platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Which One Should You Choose?
&lt;/h2&gt;

&lt;p&gt;As I’ve said before: the choice between DynamoDB and Firebase ultimately depends on your application’s needs.&lt;/p&gt;

&lt;p&gt;If you’re building a small to medium-sized real-time app with a focus on user interaction and ease of development, Firebase is probably a better choice.&lt;/p&gt;

&lt;p&gt;However, if you need a highly scalable, high-performance database for your application, DynamoDB is definitely your best bet. It’s a powerhouse for handling massive amounts of data with low latency, though it requires more upfront planning and expertise.&lt;/p&gt;

&lt;p&gt;The bottom line: it’s all about picking the database that aligns with your business and application’s needs, budget, and project requirements.&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>firebase</category>
      <category>aws</category>
      <category>database</category>
    </item>
  </channel>
</rss>
