DEV Community

Vadym Kazulkin for AWS Community Builders


Amazon DevOps Guru for the Serverless applications - Part 4 Anomaly detection on API Gateway

Introduction

In part 1 of our series, we introduced the Amazon DevOps Guru service, described its value proposition and the benefits of using it, and explained how to configure it. In part 2, we went through all the steps needed to set everything up. In part 3, we saw DevOps Guru in action by generating anomalies on DynamoDB and explained the general capabilities of the DevOps Guru service. In this part of the series, we'll generate anomalies on API Gateway.

Anomaly Detection on API Gateway

There are two main kinds of anomalies we can experience with API Gateway: HTTP 4XX errors and HTTP 5XX errors. We'll see the latter in action when we provoke Lambda anomalies in the next part of the series. Here, let's check whether DevOps Guru recognizes HTTP 4XX errors as anomalies.
There are several kinds of 4XX errors. We'll look at the following ones:

  • HTTP 429 "Too Many Requests", returned when API Gateway throttles requests.

  • HTTP 404 "Not Found", returned when we ask for a non-existent product id.

Let's first look at the HTTP 429 error. The easiest and cheapest way to generate such errors is to set a low value for the request rate, burst, or quota of the DevOpsGuruDemoProductAPIUsagePlan associated with our DevOpsGuruDemoProductAPI. Here is an example where we set the quota to 500 requests per day.
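The same quota change can also be made from the command line with `aws apigateway update-usage-plan`. This is a sketch under two assumptions: the usage plan already has a quota configured (so that `replace` applies to it), and `USAGE_PLAN_ID` is a hypothetical placeholder for the ID of DevOpsGuruDemoProductAPIUsagePlan, which `aws apigateway get-usage-plans` can list.

```shell
# Hypothetical placeholder: replace with the ID of DevOpsGuruDemoProductAPIUsagePlan
USAGE_PLAN_ID="your-usage-plan-id"

# Set the usage plan quota to 500 requests per day
aws apigateway update-usage-plan \
  --usage-plan-id "$USAGE_PLAN_ID" \
  --patch-operations \
    op=replace,path=/quota/limit,value=500 \
    op=replace,path=/quota/period,value=DAY
```

Lowering `/throttle/rateLimit` or `/throttle/burstLimit` with the same kind of patch operation would provoke HTTP 429 errors through the rate and burst limits instead of the quota.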

Now let's run a stress test with the hey tool:

hey -q 10 -z 11m -c 5 -H "X-API-Key: a6ZbcDefQW12BN56WEN7" YOUR_API_GATEWAY_ENDPOINT/prod/products/1

With this command (each of 5 concurrent workers sending up to 10 requests per second for 11 minutes), we exhaust the quota of 500 requests per day on API Gateway very quickly and receive HTTP 429 responses from then on. DevOps Guru recognized the anomaly, as displayed in the image below.
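The arithmetic behind exhausting the quota can be sketched as a back-of-the-envelope calculation. It assumes that hey actually sustains its configured rate, that no requests were made with the API key earlier that day, and that only the daily quota (not the rate or burst limits) triggers HTTP 429. `quota_math` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope estimate for the hey run above.
# Assumptions: hey sustains the configured rate, the daily quota is fresh,
# and only the quota (not rate/burst limits) causes HTTP 429 responses.
set -euo pipefail

# quota_math QPS_PER_WORKER WORKERS DURATION_SECONDS DAILY_QUOTA
quota_math() {
  local qps_per_worker=$1 workers=$2 duration_s=$3 daily_quota=$4
  # hey's -q limit applies per worker, so the total rate is -q times -c
  local total_rate=$((qps_per_worker * workers))
  local total_requests=$((total_rate * duration_s))
  # Round up to whole seconds
  local seconds_to_exhaust=$(( (daily_quota + total_rate - 1) / total_rate ))
  # Every request beyond the quota is expected to be rejected with HTTP 429
  local over_quota=$(( total_requests > daily_quota ? total_requests - daily_quota : 0 ))
  echo "rate=${total_rate}/s total=${total_requests} exhausted_after=${seconds_to_exhaust}s over_quota=${over_quota}"
}

# -q 10 -c 5 -z 11m against a quota of 500 requests per day
quota_math 10 5 $((11 * 60)) 500
```

With the values from the command above, this prints `rate=50/s total=33000 exhausted_after=10s over_quota=32500`: at full rate, the quota is gone after about 10 seconds, and the rest of the 11-minute run is an upper bound on the number of throttled requests.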

We see that DevOps Guru rates this anomaly as only medium severity (which I personally disagree with).

"Aggregated metrics" shows that "4XXError Average" was correctly identified as the cause of the anomaly. Unfortunately, CloudWatch only provides the generic 4XXError metric and not the concrete HTTP 429 error, and DevOps Guru simply displays the CloudWatch graphs here. To identify the exact error, we need help from CloudWatch Logs.

"Graphed Anomalies" shows the exact number of throttled requests within the time range of the anomaly.

The insight also includes recommendations on how to fix this kind of anomaly.

To provoke HTTP 404 "Not Found" errors, we simply keep querying a non-existent product id, for example:

hey -q 3 -z 10m -c 5 -H "X-API-Key: a6ZbcDefQW12BN56WEN7" YOUR_API_GATEWAY_ENDPOINT/prod/products/200

After several minutes, DevOps Guru recognizes this anomaly and creates an insight. Because CloudWatch doesn't differentiate between individual HTTP 4XX status codes, the insight looks exactly like the one for the HTTP 429 errors described above.
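While the insights look identical, the load-testing client itself does see the concrete status codes: hey's default text summary contains a "Status code distribution" section with lines such as `[429]` followed by a response count. A minimal sketch of a helper that extracts the count for one status code from that summary could look like this; `count_status` is a hypothetical helper, and it assumes hey's default text output (not the CSV output produced with `-o csv`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_status CODE: reads hey's text summary on stdin and prints how many
# responses had the given HTTP status code (0 if the code does not appear).
count_status() {
  local code="$1"
  awk -v code="[$code]" '
    # Start parsing only inside the status code section of the summary
    /^Status code distribution:/ { in_dist = 1; next }
    # The section ends at an empty line or where the error section begins
    in_dist && (NF == 0 || /^Error distribution:/) { exit }
    # Lines look like: [CODE]<TAB>N responses
    in_dist && $1 == code { print $2; found = 1; exit }
    END { if (!found) print 0 }
  '
}
```

For example, `hey -q 3 -z 10m -c 5 -H "X-API-Key: a6ZbcDefQW12BN56WEN7" YOUR_API_GATEWAY_ENDPOINT/prod/products/200 | count_status 404` prints how many requests of the run ended with HTTP 404, which separates them from throttled HTTP 429 responses on the client side.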

This leaves room for improvement: HTTP 404 errors are application errors, whereas HTTP 429 errors are often more infrastructure-related, so more precise information from CloudWatch and DevOps Guru would lead to much quicker remediation.

Conclusion

In this article, we described how Amazon DevOps Guru detected anomalies on API Gateway, first caused by throttling after exceeding the quota of requests per day (in the real world, exceeding the rate and burst limits is the more likely scenario) and then by querying a non-existent product id. In the next part of this series, we'll explore anomaly detection on Lambda.
