It is time to take the data Barx is generating and bring it to the barbers. My first option was to create a cron job that runs every Monday at 3 AM UTC. When the lambda triggers, it scans DynamoDB for all barbershops and starts the analytics calculation for each one.
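A rough sketch of what that first option would look like (the table name, the `calculate_analytics` helper, and the cron expression are made up for illustration):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("barbershops-table")  # made-up table name

def calculate_analytics(shop):
    ...  # placeholder for the heavy per-shop analytics work

# Invoked by an EventBridge cron rule, e.g. cron(0 3 ? * MON *)
def handler(event, context):
    # Scan the WHOLE table: this is the expensive part.
    response = table.scan()
    items = response["Items"]

    # scan() is paginated, so keep fetching until the table is exhausted.
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])

    # Runs for every barbershop, active or not.
    for shop in items:
        calculate_analytics(shop)
```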
However, this approach presents some resource and cost challenges:
- Scanning the entire DynamoDB table is resource-intensive and can be costly.
- A single lambda function handles a heavy workload, which may impact performance and scalability.
- Every Monday, the process attempts to retrieve all barbershops, regardless of their activity status.
- Analytics calculations are performed for all barbershops, including those without any activity in the previous week, leading to unnecessary resource consumption.
There are probably more problems, but I won't ask an AI to make it more dramatic...
The point is, I want to spend resources only on the barbershops that had activity in the last week, recalculating their analytics for the last 7, 30, 60, and 90 days.
The architecture proposal
With that in mind, I built this architecture:
It looks simple, but it lets me schedule a job only for the barbershops that had some activity: whenever a barbershop produces data, I create a record in DynamoDB with a TTL set to expire the next Monday at 3 AM UTC.
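A minimal sketch of that scheduling write, assuming a dedicated `scheduled-analytics` table with TTL enabled on a `ttl` attribute (the table name and key schema are invented for illustration):

```python
import boto3
from datetime import datetime, timedelta, timezone

table = boto3.resource("dynamodb").Table("scheduled-analytics")  # assumed name

def next_monday_3am_utc(now=None):
    """Epoch seconds for the next Monday at 3 AM UTC."""
    now = now or datetime.now(timezone.utc)
    days_ahead = (7 - now.weekday()) % 7 or 7  # Monday is weekday() == 0
    target = (now + timedelta(days=days_ahead)).replace(
        hour=3, minute=0, second=0, microsecond=0
    )
    return int(target.timestamp())

def schedule_analytics(barbershop_id):
    # Idempotent write: one schedule record per barbershop per week.
    table.put_item(
        Item={
            "pk": f"SCHEDULE#{barbershop_id}",  # assumed key schema
            "barbershopId": barbershop_id,
            "ttl": next_monday_3am_utc(),  # TTL attribute enabled on the table
        }
    )
```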
When the record is deleted by the TTL rule, the deletion event flows through DynamoDB Streams and triggers another lambda, which forwards that barbershop's data to a dedicated lambda that runs its analytics calculation.
This approach lets me spend resources only on the users who are producing data, not on those who aren't.
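A sketch of that stream-triggered dispatcher, assuming the analytics function is called `analytics-calculator` (names are illustrative). One useful detail: TTL deletions arrive in the stream flagged with a service `userIdentity`, so the handler can ignore ordinary deletes:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def is_ttl_delete(record):
    # Records removed by TTL carry this service identity in the stream.
    identity = record.get("userIdentity", {})
    return (
        record["eventName"] == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )

def handler(event, context):
    for record in event["Records"]:
        if not is_ttl_delete(record):
            continue

        # OldImage holds the expired schedule record (the stream view type
        # must include old images for this to be present).
        old_image = record["dynamodb"]["OldImage"]
        barbershop_id = old_image["barbershopId"]["S"]

        # Fan out: fire-and-forget async invoke of the analytics lambda.
        lambda_client.invoke(
            FunctionName="analytics-calculator",  # assumed function name
            InvocationType="Event",
            Payload=json.dumps({"barbershopId": barbershop_id}),
        )
```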
Benefits I see:
- Analytics calculations are performed only for barbershops with recent activity, optimizing resource usage.
- Scheduled records in DynamoDB are automatically cleaned up via TTL, reducing manual maintenance.
- The architecture scales with the number of active barbershops rather than the total number of users, without additional complexity.
Concerns I have:
- If many records expire at the same time, a large number of lambdas may be triggered simultaneously, leading to a spike in concurrent executions. (But I can handle this by tuning the DynamoDB Streams batch size and parallelization factor; see the sketch after this list.)
- There may be a delay between the scheduled expiration time and the actual triggering of the lambda: DynamoDB TTL doesn't delete items at the exact expiration time, only some time afterward (AWS only commits to deleting expired items typically within a few days). (This is acceptable for my use case since analytics don't need to be real-time.)
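For the concurrency concern, this is roughly how the event source mapping between the stream and the dispatcher lambda could be tuned (the values are illustrative starting points, not recommendations):

```python
import boto3

lambda_client = boto3.client("lambda")

# Larger batches plus a higher parallelization factor smooth out
# the Monday-morning spike of expired records.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:dynamodb:...:table/scheduled-analytics/stream/...",  # placeholder ARN
    FunctionName="ttl-dispatcher",      # assumed dispatcher lambda name
    StartingPosition="LATEST",
    BatchSize=100,                      # stream records per invocation
    MaximumBatchingWindowInSeconds=30,  # wait up to 30s to fill a batch
    ParallelizationFactor=5,            # concurrent batches per shard (max 10)
)
```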
My intention in posting this article is to share my idea and get feedback from the community. If you have any suggestions for improvement or alternative approaches, please let me know in the comments!
