DEV Community

Kenichiro Wada for AWS Community Builders

I created a Lambda loop detection verification environment with AWS CDK and verified it.

Nice to meet you!
I'm Kenichiro Wada, one of the AWS Community Builders in Japan.

This is my first post here.
I hope you will read it, even though it is a rambling article.


This article combines the following two articles, originally published in Japanese.

https://zenn.dev/keni_w/articles/acbfd69a2c7dbb

https://zenn.dev/keni_w/articles/ef6e8d39bb055f


I tried out "Detecting and stopping recursive loops in AWS Lambda functions", which was announced last month.

AWS Lambda now detects and stops recursive loops in Lambda functions

Those of us who have caused Lambda loops in the past (myself included) were surprised by the announcement and also slightly disappointed (see "What to Expect in the Future"),
but when I actually tried it, I was impressed that it stopped the loops properly.

For details on how it works, please click here.

Detecting and stopping recursive loops in AWS Lambda functions

However, others have already written articles about trying the feature, so I started by building a verification environment with the AWS CDK.

Build a verification environment with AWS CDK

The resources I created are available on GitHub.

https://github.com/Kenichiro-Wada/aws-lambda-recursion-detection

As described in the README, the following verification environments are deployed.

  • Loops in Amazon SQS With Dead Letter Queue
  • Looping with Amazon SQS Without Dead Letter Queue
  • Looping with Amazon SNS
  • Pattern combining Amazon SNS and Amazon SQS
  • Pattern Combining Amazon SQS and AWS Lambda
  • Looping with Amazon SQS and AWS Lambda in a Beaded Pattern
  • Looping with Amazon S3 (this is a bonus)

I prepared two Amazon SQS patterns because I wanted to know whether the behavior changes depending on whether a DLQ is used.

Amazon S3 is not covered by this feature, so that pattern is purely a bonus.
Its loop is not detected or stopped while it is running.

If left unchecked, it will continue to run forever, so be sure to stop it!

The code for the Lambda functions was generated by ChatGPT (the free version).
I have not yet mastered Amazon CodeWhisperer. I will try harder.

Verified.

My honest impression: oh, it really does stop after 16 invocations.
The Invocations metric also confirms that it stops after 16.

Lambda Invoke Metrics

I also wrote a message with console.log and searched for it in CloudWatch Logs.
For the execution, 16 messages are displayed.

Lambda Function CloudWatch Logs

(I should have used the TAIL function.)

The email and Health Dashboard notifications did not appear until about 3 hours later.
Below are the Health Dashboard notifications.
Loops that were executed close together in time were reported together in a single notification.
(Sorry that the screenshots are in Japanese.)

Health Dashboard notification 1

Health Dashboard notification 2

Health Dashboard notification 3

If you want to notice a loop sooner than that,
set up a DLQ and use an SQS trigger on it to send an error notification (this environment does so too).
If you can detect that a message has entered the DLQ, I think you can notice the loop much earlier.
This time, I send an email when a message arrives in the SQS DLQ.
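
As a rough illustration of that kind of DLQ notification, here is a minimal Python sketch of a Lambda handler attached to the DLQ as an SQS trigger. It is not the code from the repository: the environment variable ALERT_TOPIC_ARN, the subject text, and the function names are assumptions made up for this example, and the email is assumed to be delivered by an SNS topic with an email subscription.

```python
import os


def build_alert(event):
    """Summarize the SQS records delivered from the DLQ into one alert."""
    records = event.get("Records", [])
    lines = [f"{len(records)} message(s) arrived in the dead-letter queue."]
    for record in records:
        lines.append(
            f"- messageId={record.get('messageId')} "
            f"source={record.get('eventSourceARN')}"
        )
    return {
        "Subject": "Possible Lambda loop: messages in DLQ",  # hypothetical subject
        "Message": "\n".join(lines),
    }


def handler(event, context, publish=None):
    """Lambda entry point for the DLQ trigger.

    `publish` can be injected for local testing; when omitted, the alert is
    published to the SNS topic named by ALERT_TOPIC_ARN (an assumed variable).
    """
    alert = build_alert(event)
    if publish is None:
        import boto3  # bundled with the Lambda Python runtime

        topic_arn = os.environ["ALERT_TOPIC_ARN"]
        sns = boto3.client("sns")

        def publish(**kwargs):
            return sns.publish(TopicArn=topic_arn, **kwargs)

    publish(**alert)
    return alert
```

Because the publisher can be injected, the message formatting can be checked locally without touching AWS.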

The stopping behavior itself did not change between the SQS loop with a DLQ and the one without.
However, for the SQS loop without a DLQ, when I left the loop as it was, I kept receiving emails and Health Dashboard messages the next day and later.
Below is the email about the loop (without DLQ) that was executed on 2023/08/05, received on 2023/08/07.

Mail

It is interesting that the notifications differ with and without a DLQ.
Having a DLQ is generally recommended when using a queue, so my guess is that the notifications keep coming in the no-DLQ case because that is not a recommended setup to begin with, on top of the basic message of "don't let it loop".

As for the bonus S3 trigger,
this is what it looks like when I run it. (Even I, the person who built it, was nervous running it.)
I stopped it after about a minute, but it had still executed 27 times.

for s3 Loop

This one is not subject to detection, so if you try it,
please stop it as soon as you run it!

About Amazon SQS and AWS Lambda beaded pattern

I wondered whether the following pattern would also be stopped, so I verified it, although this pattern is not specifically mentioned in the official announcement.

Pythagorean switch
(This is an excerpt from a slide, so it is only in Japanese. Sorry.)

At the beginning of the verification, I thought this case would not be detected and stopped,
but it was detected and stopped properly.

This is the Invocations metric for a Lambda function executed in a Pythagorean switch pattern.
This one also stopped after 16 invocations. (I was really thrilled...)

Pythagorean switch Lambda Metrics

You can try this one with the "Looping with Amazon SQS and AWS Lambda in a Beaded Pattern" environment mentioned above.

My understanding is that the mechanism detects an event entering the same queue (or other supported service) in succession, so I suspect it would not work if the loop passes through something else.
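
For reference, the AWS documentation describes the detection as based on tracking metadata that supported AWS SDKs pass along with the event, rather than on which queue the event enters. The following Python sketch is only a toy model of that idea, not how Lambda is actually implemented: the function names are hypothetical, the per-function counter and the metadata_survives switch are assumptions for illustration, and the limit of 16 is simply the number observed in this verification.

```python
LOOP_LIMIT = 16  # the number of invocations observed before the loop stopped


def simulate_chain(functions, metadata_survives=True,
                   limit=LOOP_LIMIT, max_hops=100):
    """Pass one event around a ring of functions and count invocations.

    functions: hypothetical function names, invoked in order, wrapping around.
    metadata_survives: assumed True when every hop keeps the per-event
        counters, False when a hop drops them.
    Returns (invocations per function, whether the loop was stopped).
    """
    counters = {}  # per-event metadata: function name -> times invoked
    invocations = {name: 0 for name in functions}
    stopped = False
    for hop in range(max_hops):  # safety cap so the toy model always ends
        name = functions[hop % len(functions)]
        if counters.get(name, 0) >= limit:
            stopped = True  # the next invocation is refused
            break
        counters[name] = counters.get(name, 0) + 1
        invocations[name] += 1
        if not metadata_survives:
            counters = {}  # the next hop starts with no history
    return invocations, stopped


print(simulate_chain(["fn_a"]))
print(simulate_chain(["fn_a", "fn_b"]))
print(simulate_chain(["fn_s3"], metadata_survives=False))
```

In this model, a single function and a beaded chain of two functions both stop once a function has run 16 times, while a loop whose hops drop the counters only ends at the safety cap.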

What to Expect in the Future

I think it would be a very effective mechanism in a configuration using SQS/SNS.

However, as I have experienced myself in the past
and as several people have commented,
I think Lambda loops are most often caused by S3 triggers, like the following.

Lambda Loop
※This is a Loop pattern that I have caused in the past.
(This is also extracted from a slide, so it is only in Japanese. Sorry.)

So, my conclusion is:
I want this implemented for S3 triggers too!
I wonder if it will be difficult... but I hope it happens.

Don't forget.

One thing to remember: the ability to detect and stop loop execution is only a safety net (a last resort).
What we should not forget is that
we need to take care to avoid creating loops in the first place.

For the pattern under consideration, we should review it and ask, "Hey, isn't this a loop?"
I think it is necessary to construct the process in such a way that it does not become a loop (unless it is intentional).
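
For the S3 case, one common way to construct the process so that it does not loop is to write results under a different prefix (or to a different bucket) and to ignore events for objects the function wrote itself. Below is a minimal Python sketch, assuming a hypothetical output prefix "processed/" and the standard S3 event notification shape. It is an illustration, not code from the repository.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "processed/"  # hypothetical prefix where the function writes


def keys_to_process(event, output_prefix=OUTPUT_PREFIX):
    """Return the object keys in an S3 event that are safe to process.

    Keys under the output prefix are skipped, because they were written by
    the function itself and processing them again would form a loop.
    """
    keys = []
    for record in event.get("Records", []):
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(output_prefix):
            continue  # our own output: do not process it again
        keys.append(key)
    return keys


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "uploads/a+b.txt"}}},
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "processed/a+b.txt"}}},
    ]
}
print(keys_to_process(sample_event))
```

Restricting the S3 trigger itself to an input prefix in the event notification settings would be an additional safeguard on top of this check.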

In addition, it seems that you need to open a support case if you intentionally create loops.

Finally.

About the S3 trigger case in the sample published on GitHub:
if you try it, please stop it immediately after it starts!
If you get charged for it, that is your responsibility for not stopping it, and I am not responsible for it.
This is also stated in README.md!


I will continue to implement event processing in a way that relies on this feature as little as possible.
