Processing Data in Real-Time with DynamoDB Streams

Processing data in real-time is a common use case in serverless applications. Enabling DynamoDB Streams on a DynamoDB table opens up the possibility to process events (inserts, updates, and deletes) in real-time. Rather than having to poll the database for changes with user-written code, AWS handles the polling and pushes the data changes to a Lambda function for processing.


What goes into the Stream?

Every type of data event (insert, modification, and delete) is captured as a "stream record". The stream record contains the type of data event and the affected data. You can configure the StreamViewType when enabling a stream on a table to capture whatever is most suitable for your use case. Typically I go for "New and old images" to ensure I'll be able to capture the most information about changes in the table.
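For reference, here's a minimal sketch of enabling a stream with the "New and old images" view type using boto3 (the table name is just a placeholder):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable a stream on an existing table ("orders" is a placeholder name).
# NEW_AND_OLD_IMAGES captures both the pre- and post-change item state.
dynamodb.update_table(
    TableName="orders",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```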

Use Cases:

Archiving Data from DynamoDB to S3:

Storing frequently accessed data in DynamoDB is a great idea. Infrequently accessed data, not so much. If you know your access pattern no longer requires quick access to a subset of records, those records are perfect candidates for archival. One way to designate when records should be archived is to specify a time-to-live (TTL) value when inserting data into the table. AWS will automatically delete that record from the table within 48 hours of the specified time, and the deletion shows up in the stream as a delete event that your Lambda function can use to write the expiring record out to S3. If you need time-sensitive deletion, it may be better to set up a CloudWatch Event + Lambda function to regularly query your table for items that should be deleted based on some given parameters.
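Here's a rough sketch of what the archival Lambda might look like, assuming a placeholder bucket name and a single string partition key called pk. TTL expirations arrive in the stream as REMOVE events with a DynamoDB service principal in userIdentity, which lets you tell them apart from user-initiated deletes:

```python
import json

import boto3

s3 = boto3.client("s3")

ARCHIVE_BUCKET = "my-archive-bucket"  # placeholder bucket name


def handler(event, context):
    for record in event["Records"]:
        # TTL expirations arrive as REMOVE events; DynamoDB marks them
        # with a service principal in userIdentity.
        user_identity = record.get("userIdentity", {})
        is_ttl_delete = (
            record["eventName"] == "REMOVE"
            and user_identity.get("principalId") == "dynamodb.amazonaws.com"
        )
        if not is_ttl_delete:
            continue

        # OldImage holds the item as it looked before deletion.
        old_image = record["dynamodb"]["OldImage"]
        key = record["dynamodb"]["Keys"]["pk"]["S"]  # assumes a "pk" string key
        s3.put_object(
            Bucket=ARCHIVE_BUCKET,
            Key=f"archive/{key}.json",
            Body=json.dumps(old_image),
        )
```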

Data Replication

If you need to copy records from a DynamoDB table into another database, you can set up a Lambda function to capture inserts into the DynamoDB table and write the contents of those records into the second table. If you need to keep the second table fully in sync with the DynamoDB table, you can expand the replication rules to capture modification and delete data events too.
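A minimal sketch of a replication handler, assuming a placeholder replica table. Stream images are already in DynamoDB's attribute-value format, so they can be written to the second table as-is:

```python
import boto3

dynamodb = boto3.client("dynamodb")

REPLICA_TABLE = "orders-replica"  # placeholder table name


def handler(event, context):
    for record in event["Records"]:
        event_name = record["eventName"]  # INSERT, MODIFY, or REMOVE
        if event_name in ("INSERT", "MODIFY"):
            # NewImage is already in the DynamoDB wire format,
            # so it can be written to the replica directly.
            dynamodb.put_item(
                TableName=REPLICA_TABLE,
                Item=record["dynamodb"]["NewImage"],
            )
        elif event_name == "REMOVE":
            # Mirror deletes so the replica stays in sync.
            dynamodb.delete_item(
                TableName=REPLICA_TABLE,
                Key=record["dynamodb"]["Keys"],
            )
```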

Sending messages to employees or customers

As covered in a previous article, I used DynamoDB as part of an application that would send out SMS messages to customers whose contact info was stored in DynamoDB. The business required that we send out SMS messages in real-time, so DynamoDB Streams were a perfect fit for the use case. (Thanks real-time processing!)

Controller Patterns

The Simple Controller

[Diagram: The Simple Controller]
The Simple Controller is meant for simpler use cases where you may only be taking one or two actions on data events from a table. In this scenario, you can use a single Lambda function to process your stream data.
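A sketch of what the Simple Controller might look like, with placeholder handlers for each event type:

```python
def handler(event, context):
    # One function handles every event type from the stream.
    for record in event["Records"]:
        event_name = record["eventName"]
        if event_name == "INSERT":
            handle_insert(record)
        elif event_name == "MODIFY":
            handle_modify(record)
        elif event_name == "REMOVE":
            handle_remove(record)


def handle_insert(record):
    print("New item:", record["dynamodb"]["NewImage"])


def handle_modify(record):
    print("Before:", record["dynamodb"]["OldImage"])
    print("After:", record["dynamodb"]["NewImage"])


def handle_remove(record):
    print("Deleted item:", record["dynamodb"]["OldImage"])
```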

This is a great way to start out with DynamoDB streams when your use cases are limited in nature and you don't have multiple developers writing code for the same Lambda function. If you find yourself or your team struggling with adding new functionality to the Lambda function as time passes, it's worth considering the next option.

The Complex Controller

[Diagram: The Complex Controller]

[Diagram: The Complex Controller, vertical view for mobile]

The Complex Controller decouples your stream controller and stream processor actions from one another and may be suitable for larger applications and teams. Code changes can happen in parallel without affecting the code that processes other data event types. Additionally, each Stream Processor Function can be configured and scaled separately.
The Stream Controller:
You still have a single Lambda function (the Stream Controller) that will be invoked by the DynamoDB stream. However, instead of this Lambda function acting on the data in the stream, the stream controller will send the stream data to an SNS Topic with a specified message filter attribute.
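A minimal sketch of the Stream Controller, assuming the topic ARN is passed in via an environment variable:

```python
import json
import os

import boto3

sns = boto3.client("sns")

# Placeholder: topic ARN supplied through the function's environment.
TOPIC_ARN = os.environ["STREAM_TOPIC_ARN"]


def handler(event, context):
    for record in event["Records"]:
        # Forward the raw stream record; the eventName attribute is what
        # the SNS subscription filters will match on.
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps(record),
            MessageAttributes={
                "eventName": {
                    "DataType": "String",
                    "StringValue": record["eventName"],
                }
            },
        )
```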
The Stream Router:
We'll use an SNS Topic to fan out the messages to our Stream Processors. The SNS Topic will only pass a message on to the functions subscribed with a matching message filter attribute value. If a Lambda function is subscribed to the topic looking for Insert events, it'll only receive those kinds of events; functions subscribed to Delete events won't receive Insert events. When the SNS topic finds a match among its subscribers, it routes the message down to that Stream Processor.
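As an example, here's how a subscription filter policy might be set up with boto3 (the ARNs are placeholders):

```python
import json

import boto3

sns = boto3.client("sns")

# Subscribe a processor Lambda so it only receives INSERT events.
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:stream-topic",
    Protocol="lambda",
    Endpoint="arn:aws:lambda:us-east-1:123456789012:function:insert-processor",
    Attributes={
        "FilterPolicy": json.dumps({"eventName": ["INSERT"]}),
    },
)
```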
The Stream Processors:
The Stream Processors will then act on the stream data. As I mentioned earlier, each stream processor can be scaled separately based on the behavior of the system it's interacting with. For example, if you get throttled by a downstream service when sending too many requests at one time, you can set up SNS Topic -> SQS Queue -> Lambda Function with Reserved Concurrency enabled to reduce the number of concurrent requests made. For more on that, check out this great post by Alex DeBrie.
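A sketch of a Stream Processor subscribed through SNS. SNS wraps the original stream record inside the Sns.Message field of the event, so the processor unwraps it first (send_sms here is a placeholder for whatever downstream call you make):

```python
import json


def handler(event, context):
    # SNS delivers the stream record as a JSON string in Sns.Message.
    for sns_record in event["Records"]:
        stream_record = json.loads(sns_record["Sns"]["Message"])
        send_sms(stream_record["dynamodb"]["NewImage"])


def send_sms(new_image):
    # Placeholder for the downstream call (e.g. a throttled SMS provider).
    print("Sending SMS for:", new_image)
```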

The added flexibility of the Complex Controller does have tradeoffs. You'll be charged for the additional Lambda invocation and SNS message, so it's worth evaluating whether the approach is worth the cost. Based on my own experience, paying the extra usage costs made sense for complex use cases since it was simpler to test and troubleshoot the decoupled Stream Processors, but YMMV.

Controller Comparison

| | Simple Controller | Complex Controller |
| --- | --- | --- |
| Stream Processing | 1 or 2 simpler actions | 3+ actions, complex data processing |
| Cost | Charged for 1 Lambda invocation | Charged for 2 Lambda invocations and an SNS message |
| Maintainability | Easy to maintain with limited functionality | Easier to maintain with expanded functionality |
| Processor Scaling | Every event type is processed with the same scaling/concurrency settings | Stream Processors can be scaled based on expected load and the limitations of downstream systems |

What if I don't want to process my data until later?

If you need to hold off on processing the data for up to 15 minutes, I'd advise trying out SQS Delay Queues. The Stream Controller Lambda can insert a message into an SQS queue with a delay enabled and the stream record can be processed after the specified delay period.
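A sketch of how the Stream Controller might defer processing with a delay queue (the queue URL is a placeholder; DelaySeconds maxes out at 900 seconds, i.e. 15 minutes):

```python
import json

import boto3

sqs = boto3.client("sqs")

# Placeholder queue URL.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/delayed-stream"


def handler(event, context):
    for record in event["Records"]:
        # Each message becomes visible to consumers only after the delay.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps(record),
            DelaySeconds=900,  # maximum supported delay: 15 minutes
        )
```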
If your stream data needs to be processed the next day or some other time after that, I wouldn't recommend using DynamoDB Streams or SQS Message Delivery Delays as those are more suitable for real-time or near-real-time use cases. I'll be covering how I processed these kinds of records in a future article.

Conclusion

Thanks for reading this post! Please feel free to leave your questions or feedback in the comments. I noticed that the dev.to mobile app didn't have a great way to view images in landscape, so I created a vertical view of the Complex Controller diagram. Please let me know if you liked that view, or if there's another way you think the image could be better displayed in the mobile app.
