Introduction
In part 1, we introduced the Amazon DevOps Guru service, described its value proposition and benefits, and explained how to configure it. You also need to go through all the steps in part 2 to set everything up. In the subsequent parts, we saw DevOps Guru in action, detecting anomalies on DynamoDB, API Gateway, and Lambda alone and also in conjunction with other AWS Serverless services like SQS, Kinesis, Step Functions, and Aurora Serverless v2.
In this part of the series, we'd like to explore whether DevOps Guru recognizes anomalies with Amazon Simple Notification Service (SNS).
Detecting anomalies with SNS
Let's enhance our architecture so that whenever a new product is created, we send a notification to an SNS topic, which then delivers this notification to other (external) HTTP(S) endpoints.
Let's imagine that such an HTTP(S) endpoint has been moved or answers with a 500 error code, so that SNS considers the notification as not delivered.
I was able to reproduce this scenario on AWS by deploying a temporary API Gateway endpoint and configuring it as an SNS subscription. I needed to confirm the subscription, so I put a Lambda function behind my temporary API Gateway endpoint, which was triggered by a POST request (this is what SNS sends to the configured HTTP(S) endpoint as a confirmation request). I then logged the whole HTTP body of the POST request in my Lambda function, copied the subscription URL (which is part of the HTTP body), and opened it in the browser. With the SNS subscription confirmed, I deleted my temporary API Gateway endpoint, so that SNS would still send notifications for the HTTP(S) subscription but could no longer deliver them to the endpoint.
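To make the confirmation step more concrete, here is a minimal sketch of such a temporary Lambda function in Python. It is not the function I actually deployed. It assumes that API Gateway uses the Lambda proxy integration (so the raw POST body arrives as a string in event["body"]), and the function names extract_subscribe_url and handler are made up for illustration. The fields Type and SubscribeURL are part of the JSON document that SNS sends as a subscription confirmation request.

```python
import base64
import json


def extract_subscribe_url(event):
    """Return the SubscribeURL from an SNS confirmation request, or None."""
    # Assumption: Lambda proxy integration, so the POST body is a string in event["body"].
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    print(body)  # log the whole HTTP body so it shows up in CloudWatch Logs
    message = json.loads(body)
    # SNS marks confirmation requests with Type == "SubscriptionConfirmation".
    if message.get("Type") == "SubscriptionConfirmation":
        return message.get("SubscribeURL")
    return None


def handler(event, context):
    """Hypothetical Lambda entry point behind the temporary API Gateway endpoint."""
    url = extract_subscribe_url(event)
    if url:
        print(f"SubscribeURL: {url}")
    # Answer with 200 so that SNS treats the confirmation request as received.
    return {"statusCode": 200, "body": ""}
```

The logged SubscribeURL can then be copied from CloudWatch Logs and opened in the browser to confirm the subscription.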
Then I sent several hundred "create product" requests with the hey tool, like this:
hey -q 1 -z 15m -c 1 -m PUT -d '{"id": 1, "name": "Print 10x13", "price": 0.15}' -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/products
The resulting SNS notifications all failed to be delivered and were retried (without success) 3 times by default; see Amazon SNS message delivery retries.
Although the NumberOfNotificationsFailed metric was clearly visible in CloudWatch (see the blue line), no DevOps Guru insight was created, even after repeating this experiment several times.
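If you prefer to check this metric programmatically instead of in the CloudWatch console, the following is a rough sketch using boto3. The topic name "product-topic" and both function names are assumptions for illustration only. AWS/SNS, NumberOfNotificationsFailed, and the TopicName dimension are the namespace, metric name, and dimension under which SNS publishes this metric to CloudWatch.

```python
from datetime import datetime, timedelta, timezone


def failed_notifications_query(topic_name, minutes=15, now=None):
    """Build the parameters for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SNS",
        "MetricName": "NumberOfNotificationsFailed",
        "Dimensions": [{"Name": "TopicName", "Value": topic_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one data point per minute
        "Statistics": ["Sum"],
    }


def fetch_failed_notifications(topic_name, minutes=15):
    """Return the data points sorted by time (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the query builder works without the AWS SDK

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **failed_notifications_query(topic_name, minutes)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


if __name__ == "__main__":
    # "product-topic" is a hypothetical topic name, replace it with your own.
    for point in fetch_failed_notifications("product-topic"):
        print(point["Timestamp"], point["Sum"])
```

A non-zero Sum for the minutes of the experiment confirms that SNS reported the failed deliveries, independently of whether DevOps Guru created an insight.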
Directly after this experiment, I started another one: fetching a non-existent product from the database, which caused an HTTP 404 (Not Found) error on the API Gateway. To my surprise, DevOps Guru created the following insight right away:
with the following anomalous metrics, NumberOfNotificationsFailed Average (for the anomaly with SNS) and 4XXError Average (for the anomaly with API Gateway):
and the following graphed anomalies:
Conclusion
In this article, we explored whether DevOps Guru recognizes anomalies with Amazon Simple Notification Service (SNS), such as an HTTP(S) subscription whose endpoint doesn't exist anymore (no connection can be established) or answers with HTTP 500. We saw that DevOps Guru did not seem to react to the anomalous metric NumberOfNotificationsFailed Average alone, apparently because it does not consider this to be an anomaly (which is wrong in my opinion). It only seems to create an insight when at least one other anomalous metric is detected. I will approach the DevOps Guru team with my findings so that they can verify the experiment, look behind the scenes at what's happening, and hopefully improve the DevOps Guru service to handle this SNS anomaly correctly.




