DEV Community

k.goto for AWS Community Builders

CloudWatch Composite Alarm to detect ELB's other 5XX with AWS CDK

Summary

  • I want to be notified only when HTTPCode_ELB_5XX_Count is raised by a status code other than those covered by HTTPCode_ELB_[500|502|503|504]_Count.
    • That is, notify only for 501, 505, 561, and so on.
    • 500, 502, 503, and 504 can already be notified through their own per-status metrics.
    • If the 5XX alarm also notified for 500, 502, 503, and 504, the notifications would be duplicated, which I want to avoid.
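
To make the target concrete, here is a minimal sketch that classifies a status code the way the summary describes. The helper `isOther5xx` and the constant `CODES_WITH_OWN_METRIC` are hypothetical names used only for illustration; ELB exposes these counts as separate CloudWatch metrics, not as a function.

```typescript
// Status codes that have their own HTTPCode_ELB_<code>_Count metric in this article.
const CODES_WITH_OWN_METRIC: ReadonlySet<number> = new Set([500, 502, 503, 504]);

// Hypothetical helper: true when a code is a 5XX but is not one of the four
// codes that can be alarmed on through a dedicated per-status metric.
function isOther5xx(status: number): boolean {
  const is5xx = status >= 500 && status <= 599;
  return is5xx && !CODES_WITH_OWN_METRIC.has(status);
}

// Only the "other" codes remain after filtering.
console.log([500, 501, 503, 505, 561].filter(isOther5xx)); // [ 501, 505, 561 ]
```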

By using a CloudWatch composite alarm with its suppressor alarm feature, I built a detection and notification mechanism without duplicates: it notifies only for 501, 505, 561, and so on, as described above.

I then packaged the mechanism as a CDK construct and published it on Construct Hub.


Construct Hub

Construct Hub is where you can publish your own AWS CDK constructs as libraries.

The construct I created this time is published on Construct Hub under the name "elb-other-5xx-alarm".

Publishing on Construct Hub requires the package to be published to npm first, so it is also available as an npm package.

The source code is on GitHub as well, if you would like to take a look.


CloudWatch Composite Alarm + Suppressor Alarm

A CloudWatch composite alarm is an alarm whose state is determined dynamically by a rule expression that combines the states of multiple other alarms.

I thought that by combining alarms on HTTPCode_ELB_5XX_Count and HTTPCode_ELB_[500|502|503|504]_Count with the following alarm rule, I could in theory put the composite alarm into the ALARM state only when none of the [500|502|503|504] alarms is firing.

In plain words, the rule fires when "the 5XX alarm is firing and none of the [500|502|503|504] alarms is firing".

ALARM(HTTPCode_ELB_5XX_Count)
AND (
    NOT (
        ALARM(HTTPCode_ELB_500_Count)
        OR ALARM(HTTPCode_ELB_502_Count)
        OR ALARM(HTTPCode_ELB_503_Count)
        OR ALARM(HTTPCode_ELB_504_Count)
    )
)
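
As a rough illustration of what this rule is meant to express, the following sketch evaluates the same boolean logic over a snapshot of alarm states. The type and field names (`ElbAlarmStates`, `fiveXX`, `code503`, and so on) are made up for this example and are not part of CloudWatch or of the construct.

```typescript
type AlarmState = 'OK' | 'ALARM' | 'INSUFFICIENT_DATA';

// Hypothetical snapshot of the five metric alarms referenced by the rule.
interface ElbAlarmStates {
  fiveXX: AlarmState;
  code500: AlarmState;
  code502: AlarmState;
  code503: AlarmState;
  code504: AlarmState;
}

// Mirrors: ALARM(5XX) AND (NOT (ALARM(500) OR ALARM(502) OR ALARM(503) OR ALARM(504)))
function ruleIsTrue(s: ElbAlarmStates): boolean {
  const anySpecific = [s.code500, s.code502, s.code503, s.code504].some(
    (state) => state === 'ALARM',
  );
  return s.fiveXX === 'ALARM' && !anySpecific;
}

const allOk: ElbAlarmStates = {
  fiveXX: 'OK',
  code500: 'OK',
  code502: 'OK',
  code503: 'OK',
  code504: 'OK',
};

// A 501 raises only the catch-all 5XX alarm, so the rule is true.
console.log(ruleIsTrue({ ...allOk, fiveXX: 'ALARM' })); // true
// A 503 raises both the 5XX alarm and the 503 alarm, so the rule is false.
console.log(ruleIsTrue({ ...allOk, fiveXX: 'ALARM', code503: 'ALARM' })); // false
```

This is the intended steady-state behavior; it assumes all five alarms are observed at the same instant.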

In practice, however, this did not work reliably: depending on the timing of the errors, the composite alarm expression above could evaluate to either true or false.
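
One plausible way to picture the timing problem is that the 5XX alarm and a per-status alarm do not necessarily change state at the same moment. The sketch below replays a hypothetical timeline for a burst of 503 responses; the timestamps are invented for illustration and are not measurements.

```typescript
// One observation of the two alarms involved in this scenario.
interface Snapshot {
  second: number;        // seconds on an arbitrary clock (hypothetical values)
  fiveXXAlarm: boolean;  // is the HTTPCode_ELB_5XX_Count alarm in ALARM?
  code503Alarm: boolean; // is the HTTPCode_ELB_503_Count alarm in ALARM?
}

// The naive rule, reduced to the two alarms that matter here.
const naiveRule = (s: Snapshot): boolean => s.fiveXXAlarm && !s.code503Alarm;

// Assumed timeline: the 5XX alarm happens to transition before the 503 alarm.
const timeline: Snapshot[] = [
  { second: 0, fiveXXAlarm: false, code503Alarm: false },
  { second: 60, fiveXXAlarm: true, code503Alarm: false }, // only 5XX has flipped so far
  { second: 75, fiveXXAlarm: true, code503Alarm: true },  // the 503 alarm catches up
];

// Moments at which the naive rule is true, i.e. an unwanted notification would go out.
const spuriousAt = timeline.filter(naiveRule).map((s) => s.second);
console.log(spuriousAt); // [ 60 ]
```

If the two alarms had flipped in the opposite order, the same rule would never have been true, which is why the result depends on timing.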

After much investigation, I solved the problem with the "suppressor alarm" feature of CloudWatch composite alarms.

A suppressor alarm is a feature of CloudWatch composite alarms that suppresses the composite alarm's actions while the alarm designated as the suppressor is in the ALARM state.

In other words, the composite alarm waits a specified number of seconds for the suppressor alarm to fire and does not run its actions during that time, which makes delayed evaluation of the composite alarm possible.

This lets the composite alarm act only after waiting to see whether a specific error such as 500 has also occurred.

The CDK construct library published this time is built on this combination of a CloudWatch composite alarm and a suppressor alarm.
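
The delayed evaluation can be sketched as a small decision function. This is a simplified model, not the CloudWatch implementation: the name `actionFires` is hypothetical, it assumes the suppressor stays in ALARM once it gets there, and it ignores the separate period CloudWatch applies after a suppressor returns to OK.

```typescript
// Simplified model of a composite alarm with a suppressor and a wait period.
// Times are seconds on an arbitrary clock; null means the suppressor never went into ALARM.
function actionFires(
  compositeAlarmAt: number,
  suppressorAlarmAt: number | null,
  waitPeriodSeconds: number,
): boolean {
  // The composite alarm holds its actions until this moment at the latest.
  const deadline = compositeAlarmAt + waitPeriodSeconds;
  // Suppressed if the suppressor reached ALARM at or before the deadline
  // (and, by assumption, is still in ALARM).
  const suppressed = suppressorAlarmAt !== null && suppressorAlarmAt <= deadline;
  return !suppressed;
}

// 501 case: no per-status alarm ever fires, so the action runs after the wait.
console.log(actionFires(60, null, 60)); // true
// 503 case: the per-status alarm fires 15 seconds later, inside the wait window.
console.log(actionFires(60, 75, 60)); // false
```

Compared with the plain rule, the wait period gives a slower per-status alarm time to catch up before any notification is sent.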


How to use the construct

First, install the package in your CDK project.

npm install elb-other-5xx-alarm

The construct can then be used as follows.

import { Duration } from 'aws-cdk-lib'; // needed for the `period` prop below
import { ELBOther5XXAlarm } from 'elb-other-5xx-alarm';

new ELBOther5XXAlarm(this, 'ELBOther5XXAlarm', {
  alarmName: 'my-alarm',
  alarmActions: alarmActions, // e.g. [new SnsAction(new Topic(this, 'Topic', {}))]
  loadBalancerFullName: alb.loadBalancerFullName, // e.g. 'app/alb/123456789'
  period: Duration.seconds(60),
  threshold: 1,
  evaluationPeriods: 1,
});

For a practical example, there is a sample deployment.

It creates, in addition to a VPC and related resources, two ELBs, each with its own composite alarm; one ELB returns 501 as a fixed response and the other returns 503. After deployment, accessing the default endpoint of each ELB with curl or a similar tool returns 501 from one and 503 from the other.

The composite alarm for the ELB that returns 501 triggers its action. The composite alarm for the ELB that returns 503, which should not notify, temporarily enters the ALARM state, but its action is suppressed and the alarm immediately returns to the OK state. You can confirm this in the CloudWatch alarms console.


Finally

CloudWatch Composite Alarm is very useful!

Use this Construct if you like!
