Pubudu Jayawardana

Originally published at pubudu.dev

Detect EventBridge target failure: Part 1 - with dead letter queue

Intro

When EventBridge delivers messages to a target, delivery can fail for many reasons: permission issues, rate limits, an unavailable target, or even a glitch in AWS itself, to name a few.

Whatever the reason, you want to be notified that a message could not be delivered, and why. In this blog post I will discuss how a dead letter queue can be used to get notified when EventBridge fails to deliver messages to its target.

Dead letter queues

Dead letter queues are the unsung heroes of event-driven architecture 😀. They are easy to set up and manage, yet greatly improve the resilience of a system. They are also very cost effective.

Let’s see how we can capture the target delivery failures in EventBridge using a dead letter queue.

Please note that EventBridge supports DLQs at a couple of “levels”. An EventBridge bus can have a DLQ itself, or you can set a DLQ on a per-target basis. Let’s discuss the differences.

DLQ on EventBridge bus level

An EventBridge bus can have a DLQ of its own. However, it only captures errors related to KMS encryption: EventBridge sends events that aren’t successfully encrypted to this DLQ.

This DLQ setting appears in the EventBridge section of the AWS console only when a customer managed KMS key is used to encrypt messages. In fact, it is part of the Encryption settings.

Image: DLQ for Event bus only available when customer managed KMS is in use.

However, this DLQ will NOT capture any target-related failures, so we cannot use it for our purpose.

DLQ on EventBridge target level

At the target level, we can configure an SQS queue that receives any message EventBridge cannot deliver to that target.

Image: DLQ on target.

Since one rule can have more than one target, each target can have a different DLQ as well. You can use the same SQS queue as the DLQ for all the targets, but you have to configure it on each target separately. It may sound like repetitive work, but with an infrastructure as code tool like CDK or CloudFormation, this is not complex.
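As a rough illustration of the per-target configuration, here is a minimal Python sketch, not the setup used in this post's SAM template. It builds a Targets list in the shape accepted by the boto3 put_targets call, where every target carries its own DeadLetterConfig pointing at the same queue, plus a queue policy that lets EventBridge send messages to that queue. The function names, target IDs and ARNs are hypothetical placeholders.

```python
import json


def targets_with_shared_dlq(target_arns: dict, dlq_arn: str) -> list:
    """Build a Targets list where each target points at the same DLQ.

    The DLQ is repeated on every entry because it is a per-target setting.
    """
    return [
        {
            "Id": target_id,
            "Arn": target_arn,
            "DeadLetterConfig": {"Arn": dlq_arn},
        }
        for target_id, target_arn in target_arns.items()
    ]


def dlq_send_policy(dlq_arn: str, rule_arn: str) -> str:
    """Queue policy allowing EventBridge to send failed events from one rule."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEventBridgeToSendToDlq",
                "Effect": "Allow",
                "Principal": {"Service": "events.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": dlq_arn,
                "Condition": {"ArnEquals": {"aws:SourceArn": rule_arn}},
            }
        ],
    }
    return json.dumps(policy)


def attach_targets(rule_name: str, event_bus_name: str, targets: list) -> dict:
    """Apply the targets to a rule. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builders above work without boto3

    events = boto3.client("events")
    return events.put_targets(
        Rule=rule_name, EventBusName=event_bus_name, Targets=targets
    )
```

Without a queue policy like the one above, EventBridge is not allowed to write to the DLQ, so failed events would not show up there.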

How it works

Image: High level architecture.

  1. The event bus tries to deliver a message to its target (here, an SQS queue) via an EventBridge rule.

  2. Let’s assume there is a permission issue and the message cannot be delivered.

  3. EventBridge then puts the message into the DLQ configured for this specific target.

  4. In CloudWatch, an alarm is set up to trigger whenever there is a message in the DLQ.

  5. When the failed message lands in the DLQ, the alarm triggers. An SNS topic is configured as the alarm action.

  6. When the alarm action publishes a message to the SNS topic, SNS sends a notification about the failure to all subscribers.
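The alarm in steps 4 to 6 can be sketched roughly as follows. This is an illustrative Python example, not a copy of the alarm defined in the repository's template: it assumes the alarm watches the standard SQS metric ApproximateNumberOfMessagesVisible on the DLQ, and the alarm name, default values and function names are my own placeholders.

```python
def dlq_alarm_params(
    queue_name: str,
    topic_arn: str,
    threshold: float = 1.0,
    period: int = 60,
    evaluation_periods: int = 1,
) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the DLQ holds at least `threshold` visible
    messages and then publishes to the SNS topic given in AlarmActions.
    """
    return {
        "AlarmName": f"{queue_name}-has-messages",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Treat periods without data points as healthy (an assumption).
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def create_dlq_alarm(params: dict) -> None:
    """Create or update the alarm. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Raising period or evaluation_periods makes the alarm slower to fire but less noisy when a single message briefly sits in the queue.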

Try this yourself

I have created an AWS SAM template to try this scenario in your AWS account.

  1. Clone the GitHub repository: https://github.com/pubudusj/event-bridge-target-failure-detection-with-dlq

  2. Deploy the stack using the command below:

    sam deploy \
    --template-file template.yaml \
    --stack-name event-bridge-target-failure-detection-with-dlq \
    --capabilities CAPABILITY_IAM \
    --no-confirm-changeset \
    --parameter-overrides NotificationEmail=[YourEmailAddress]
    
  3. Here, set NotificationEmail to your email address, so you get a notification in your inbox when delivery to the target fails.

  4. Once the stack is deployed, you will get an SNS subscription confirmation email. You need to confirm it in order to receive notifications.

  5. Then, publish a message to the created event bus with the source set to xyzcorp.

  6. This way the message will match the rule, and EventBridge will try to deliver it to the target.

  7. I have intentionally blocked the permission to publish to the target, to simulate the failure.

  8. In a moment, you should get an email with the alarm’s status.

  9. Further, if you check the messages in the DLQ, you can see the failed message. Its message attributes may show the reason for the failure, depending on what went wrong.

Image: Message attributes of a failed message in DLQ.

  10. You can configure the threshold, period and evaluation periods of the alarm as needed to control the frequency of the notifications in case of a failure. https://github.com/pubudusj/event-bridge-target-failure-detection-with-dlq/blob/main/template.yaml#L61-L63
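If you prefer a script over the console for steps 5 and 9, the sketch below shows one possible way in Python. It builds a put_events entry with the source xyzcorp and pulls the failure details out of a DLQ message as returned by the SQS receive_message call. The detail type, the detail payload and the function names are assumptions for illustration only. The attribute names are the ones EventBridge documents for messages it sends to a DLQ.

```python
import json

# Attribute names EventBridge sets on messages it puts into a DLQ.
FAILURE_ATTRIBUTES = (
    "ERROR_CODE",
    "ERROR_MESSAGE",
    "EXHAUSTED_RETRY_CONDITION",
    "RETRY_ATTEMPTS",
    "RULE_ARN",
    "TARGET_ARN",
)


def build_test_event(event_bus_name: str, source: str = "xyzcorp") -> dict:
    """Build one entry for the EventBridge put_events call."""
    return {
        "EventBusName": event_bus_name,
        "Source": source,
        "DetailType": "dlq-demo",  # hypothetical; assumes the rule matches on source
        "Detail": json.dumps({"message": "hello"}),  # hypothetical payload
    }


def failure_summary(sqs_message: dict) -> dict:
    """Extract the failure attributes that are present on a DLQ message."""
    attrs = sqs_message.get("MessageAttributes", {})
    return {
        name: attrs[name]["StringValue"]
        for name in FAILURE_ATTRIBUTES
        if name in attrs
    }


def publish_and_inspect(event_bus_name: str, dlq_url: str) -> list:
    """Publish a test event, then read the DLQ. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the helpers above work without boto3

    boto3.client("events").put_events(Entries=[build_test_event(event_bus_name)])
    response = boto3.client("sqs").receive_message(
        QueueUrl=dlq_url,
        MessageAttributeNames=["All"],  # attributes are not returned by default
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    return [failure_summary(m) for m in response.get("Messages", [])]
```

The failed event may take a little while to reach the DLQ, so an empty result from publish_and_inspect does not necessarily mean that delivery succeeded.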

Summary

  1. An EventBridge bus can have a DLQ, but it serves a different purpose and cannot capture target failures.

  2. You can use the target-level dead letter queue approach to capture any messages that cannot be delivered to the target. Based on the number of messages in the queue, you can get notified using a CloudWatch alarm and SNS. However, you need to configure the DLQ on each EventBridge target separately. Using an IaC tool makes this convenient.

  3. I will discuss another solution to achieve the same in part 2 of this blog post.

Resources

  1. Using dead-letter queues to process undelivered events in EventBridge https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-rule-dlq.html

👋 I regularly create content on AWS and Serverless, and if you're interested, feel free to follow/connect with me so you don't miss out on my latest posts!

LinkedIn: https://www.linkedin.com/in/pubudusj
Twitter/X: https://x.com/pubudusj
Medium: https://medium.com/@pubudusj
Personal blog: https://pubudu.dev
