Discussion on: Understanding the AWS Lambda SQS Integration

Replies for: Great article. I'm relatively new to this area but it appears that sqs->lambda integration is way more complex/subtle than it appears to most pr...

Unfortunately I could not find any documentation on this but I think you are correct. I assume that the message retention timer is not reset after changing the visibility.

The practical implication of this is that you might end up invoking your Lambda function over and over for a failing message, e.g. one that throws an exception in your Lambda code, until you "fix" your code.

However, the item age does seem to be reset: when one message is retried constantly, the ApproximateAgeOfOldestMessage graph looks like a sawtooth, which suggests that the message age is reset each time the message becomes visible again.

What you can do, however, to detect messages that are being retried over and over is to configure an alarm on ApproximateAgeOfOldestMessage based on the sawtooth pattern.
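To make the sawtooth idea concrete, here is a rough sketch in Python of how you could spot that pattern in a series of ApproximateAgeOfOldestMessage datapoints (for example, ones you exported from CloudWatch). The function names, the 50% drop ratio and the minimum of two resets are my own assumptions for illustration, not anything AWS provides, so treat it as a starting point only.

```python
from typing import Sequence


def count_sawtooth_resets(ages: Sequence[float], min_drop_ratio: float = 0.5) -> int:
    """Count sharp drops in a series of oldest-message ages (in seconds).

    A "reset" is any sample that falls below the previous sample by at
    least min_drop_ratio (hypothetical threshold, tune for your queue).
    """
    resets = 0
    for previous, current in zip(ages, ages[1:]):
        if previous > 0 and current < previous * (1 - min_drop_ratio):
            resets += 1
    return resets


def looks_like_retry_loop(ages: Sequence[float], min_resets: int = 2) -> bool:
    """Flag the series when the age has reset at least min_resets times."""
    return count_sawtooth_resets(ages) >= min_resets


if __name__ == "__main__":
    # Made-up samples: the age climbs, drops sharply, climbs again.
    retried = [10, 40, 70, 5, 35, 65, 4, 30]
    # Made-up samples for a queue that drains normally.
    healthy = [5, 6, 5, 7, 6]
    print(looks_like_retry_loop(retried))
    print(looks_like_retry_loop(healthy))
```

A single threshold alarm would not see this shape because the age keeps falling back down, which is why the sketch counts the drops rather than looking at the absolute value.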

Does it make sense?

harkinj • May 15 '20

Thanks for the reply and the information. I did some experiments and what I outlined is actually occurring. I think a potential way to handle this is to put a RedrivePolicy in place and control the number of retries via the maxReceiveCount setting. Unfortunately, in the system I've inherited, the suite of Lambda functions is not idempotent, so I may need to set maxReceiveCount to 1 and also batch_size to 1 (to remove partial failures) and get the message to the DLQ ASAP rather than retrying. lumigo.io/blog/sqs-and-lambda-the-... has some useful info. What we decide to do with messages in the DLQ will be fun :) but at least we have not lost messages that failed to be processed. Thanks for your time.
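For anyone following along, this is roughly what that setup could look like as parameters for boto3. It is only a sketch: the helper names and the ARNs are placeholders I made up, and the dictionaries are meant to be passed to `set_queue_attributes` on an SQS client and `create_event_source_mapping` on a Lambda client, which the sketch itself does not call.

```python
import json


def redrive_policy_attributes(dlq_arn: str, max_receive_count: int = 1) -> dict:
    """Build the Attributes dict for sqs.set_queue_attributes.

    With maxReceiveCount set to 1, a message that fails once is moved to
    the dead-letter queue instead of being retried.
    """
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the redrive policy as a JSON string, not a nested dict.
    return {"RedrivePolicy": json.dumps(policy)}


def event_source_mapping_kwargs(queue_arn: str, function_name: str,
                                batch_size: int = 1) -> dict:
    """Build keyword arguments for lambda.create_event_source_mapping.

    A batch size of 1 means one failing message cannot drag other
    messages in the same batch back onto the queue.
    """
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
    }


if __name__ == "__main__":
    # Placeholder ARNs and function name, for illustration only.
    attrs = redrive_policy_attributes(
        "arn:aws:sqs:us-east-1:123456789012:example-dlq")
    mapping = event_source_mapping_kwargs(
        "arn:aws:sqs:us-east-1:123456789012:example-queue",
        "example-function")
    print(attrs)
    print(mapping)
```

The trade-off is throughput: a batch size of 1 means one invocation per message, so it fits best where avoiding duplicate processing matters more than cost or speed.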