Introduction
In part 1, we introduced the Amazon DevOps Guru service, described its value proposition and the benefits of using it, and explained how to configure it. To set everything up, you also need to go through all the steps described in part 2. In the subsequent parts, we saw DevOps Guru in action, detecting anomalies on DynamoDB, Aurora Serverless v2, API Gateway, and Lambda, both on their own and in conjunction with other AWS serverless services such as SQS, Kinesis, Step Functions, and SNS.
In this part of the series, I'd like to explore whether DevOps Guru recognizes anomalies on a Lambda function consuming from DynamoDB Streams.
Detecting anomalies on Lambda consuming from DynamoDB Streams
Let's extend our architecture so that whenever a new product is persisted in DynamoDB, the UpdateProduct Lambda function is invoked with the corresponding records from DynamoDB Streams. Here is the link to the AWS SAM template. The UpdateProduct Lambda function is defined with an event named DynamoStream of type DynamoDB. We also added a Dead Letter Queue as the Lambda failure destination in the DestinationConfig of the DynamoStream event.
```yaml
UpdatedProductFunction:
  Type: AWS::Serverless::Function
  Properties:
    FunctionName: UpdatedProduct
    .....
    Events:
      DynamoStream:
        Type: DynamoDB
        Properties:
          Stream: !GetAtt ProductsTable.StreamArn
          DestinationConfig:
            OnFailure:
              Type: SQS
              Destination: !GetAtt OnFailureQueue.Arn
          StartingPosition: LATEST
          BatchSize: 50
          MaximumRetryAttempts: 5
          MaximumRecordAgeInSeconds: 3600
```
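To make the retry behavior of this configuration concrete, here is a minimal Python model of how a stream event source mapping handles a batch that keeps failing. It is a simplified sketch under stated assumptions, not Lambda's actual implementation: it assumes a fixed delay between attempts (the real backoff is managed internally by Lambda), ignores options such as BisectBatchOnFunctionError, and the names `simulate_batch` and `BatchOutcome` are hypothetical. With MaximumRetryAttempts: 5 the batch is invoked at most 1 + 5 = 6 times before it is handed to the OnFailure destination, while MaximumRecordAgeInSeconds: 3600 can end the retries earlier once the records are older than one hour.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BatchOutcome:
    invocations: int          # how many times the handler was called for this batch
    succeeded: bool           # True if one invocation returned without raising
    sent_to_on_failure: bool  # True if the batch was handed to the OnFailure destination
    reason: str               # why processing of this batch stopped


def simulate_batch(
    handler: Callable[[List[dict]], None],
    batch: List[dict],
    max_retry_attempts: int = 5,           # MaximumRetryAttempts from the SAM template
    max_record_age_seconds: int = 3600,    # MaximumRecordAgeInSeconds from the SAM template
    oldest_record_age_seconds: float = 0.0,
    retry_delay_seconds: float = 1.0,      # assumption: fixed delay, real backoff is internal to Lambda
) -> BatchOutcome:
    """Hypothetical, simplified model of a stream event source mapping retrying one batch."""
    age = oldest_record_age_seconds
    invocations = 0
    # One initial attempt plus up to max_retry_attempts retries.
    for _ in range(max_retry_attempts + 1):
        if age > max_record_age_seconds:
            # Records are too old: stop retrying and report the batch as failed.
            return BatchOutcome(invocations, False, True, "record age exceeded")
        invocations += 1
        try:
            handler(batch)
            return BatchOutcome(invocations, True, False, "success")
        except Exception:
            age += retry_delay_seconds  # the records keep aging while we wait to retry
    return BatchOutcome(invocations, False, True, "retry attempts exhausted")


def always_fails(batch: List[dict]) -> None:
    raise RuntimeError("simulated processing error")


outcome = simulate_batch(always_fails, [{"eventName": "INSERT"}])
print(outcome.invocations, outcome.sent_to_on_failure, outcome.reason)
```

Passing a batch whose oldest record is already older than 3600 seconds shows the other exit path: the model sends it to the OnFailure destination without invoking the handler at all.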
Here is how the DynamoDB ProductsTable is defined, together with its StreamSpecification:
```yaml
ProductsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: "ProductsTable"
    AttributeDefinitions:
      - AttributeName: 'PK'
        AttributeType: 'S'
    KeySchema:
      - AttributeName: 'PK'
        KeyType: 'HASH'
    BillingMode: PAY_PER_REQUEST
    StreamSpecification:
      StreamViewType: NEW_IMAGE
```
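With StreamViewType: NEW_IMAGE, each stream record carries the item as it looks after the write, under dynamodb.NewImage, in DynamoDB's typed JSON format (there is no OldImage). The sketch below turns such an image into a plain Python dict. The record is a hand-written, abridged example: apart from PK, the attribute names (name, price) are hypothetical, since the template only defines the key, and the hypothetical helper `from_dynamodb_json` only handles the S and N types.

```python
from decimal import Decimal
from typing import Any, Dict

# Abridged, hand-written example of one record in a DynamoDB Streams event
# for a table with StreamViewType: NEW_IMAGE. Attribute names other than PK
# are hypothetical.
sample_record = {
    "eventName": "INSERT",
    "dynamodb": {
        "Keys": {"PK": {"S": "1"}},
        "NewImage": {
            "PK": {"S": "1"},
            "name": {"S": "Print 10x13"},
            "price": {"N": "0.15"},
        },
        "StreamViewType": "NEW_IMAGE",
    },
}


def from_dynamodb_json(image: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Convert a typed DynamoDB image into plain values (only S and N are supported here)."""
    result: Dict[str, Any] = {}
    for attribute, typed_value in image.items():
        # Each attribute is a single-entry map: {type_tag: raw_value}.
        type_tag, raw = next(iter(typed_value.items()))
        if type_tag == "S":
            result[attribute] = raw
        elif type_tag == "N":
            result[attribute] = Decimal(raw)  # numbers arrive as strings; Decimal keeps precision
        else:
            raise ValueError(f"unsupported type {type_tag} for attribute {attribute}")
    return result


product = from_dynamodb_json(sample_record["dynamodb"]["NewImage"])
print(product)
```

In production code, `TypeDeserializer` from `boto3.dynamodb.types` performs the same conversion for all DynamoDB types; the hand-rolled version just keeps the example dependency-free.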
This is how the final architecture looks:
Let's imagine that the UpdateProduct Lambda function always runs into an error while processing a DynamoDbStreamRecord, which is part of the DynamoDbEvent (simply throw an exception there). Lambda then retries the failed batch 5 times, according to the configuration of our DynamoStream event, as long as the records are not older than MaximumRecordAgeInSeconds, and afterwards sends information about the failed batch to the Dead Letter Queue.
```yaml
MaximumRetryAttempts: 5
MaximumRecordAgeInSeconds: 3600
```
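The failing function itself isn't shown here, so the following is a hypothetical Python stand-in for UpdateProduct that throws while processing a stream record. It assumes the standard Python Lambda handler signature `handler(event, context)` and the `Records` list of a DynamoDB Streams event; the exception class `ProductUpdateError` and its message are made up. Because the exception escapes the handler, the whole batch counts as failed and is retried by the event source mapping.

```python
from typing import Any, Dict


class ProductUpdateError(Exception):
    """Hypothetical error used to simulate a bug in UpdateProduct."""


def handler(event: Dict[str, Any], context: Any) -> None:
    # A DynamoDB Streams event delivers up to BatchSize (50 here) records in event["Records"].
    for record in event["Records"]:
        new_image = record["dynamodb"].get("NewImage", {})
        # Simulated bug: processing fails on the first record, so the batch never succeeds.
        raise ProductUpdateError(f"cannot process {record['eventName']} for {new_image.get('PK')}")


# Local smoke test with a minimal, hand-written event.
event = {"Records": [{"eventName": "INSERT", "dynamodb": {"NewImage": {"PK": {"S": "1"}}}}]}
try:
    handler(event, None)
except ProductUpdateError as error:
    print("handler failed:", error)
```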
We can reproduce the failure with curl or the hey tool, generating many failed UpdateProduct Lambda invocations:
```shell
hey -q 1 -z 15m -c 1 -m PUT -d '{"id": 1, "name": "Print 10x13", "price": 0.15}' -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/products
```
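As a quick sanity check on the load this generates: hey's -q flag is a rate limit in queries per second per worker, -c sets the number of workers, and -z the duration, so the command above sends up to about 1 × 1 × 900 s = 900 PUT requests. The hypothetical helpers below only do that arithmetic; the actual count can be somewhat lower if individual requests take longer than the rate-limit interval.

```python
import re


def parse_duration_seconds(value: str) -> int:
    """Parse a simple Go-style duration such as '15m', '30s' or '1h' (single unit only)."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if not match:
        raise ValueError(f"unsupported duration: {value}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * {"s": 1, "m": 60, "h": 3600}[unit]


def expected_requests(qps_per_worker: float, workers: int, duration: str) -> int:
    """Upper-bound estimate of requests sent by `hey -q <qps> -c <workers> -z <duration>`."""
    return int(qps_per_worker * workers * parse_duration_seconds(duration))


print(expected_requests(qps_per_worker=1, workers=1, duration="15m"))  # 900
```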
I wanted to find out whether DevOps Guru would detect this incident and what information it would give us.
First of all, DevOps Guru recognized the incident:
with the anomalous metrics "Errors Sum" and "IteratorAge Maximum" on the UpdateProduct Lambda function:
and the following graphed anomalies:
Interestingly, these anomalous metrics differ from the ones we saw when we reproduced a similar incident with Kinesis Data Streams (which works similarly to DynamoDB Streams) in the article "Amazon DevOps Guru for the Serverless applications - Part 6 Continuing with anomaly detection on Lambda invocations". There, DevOps Guru additionally reported anomalous metrics on Kinesis Data Streams such as "GetRecords.Byte Sum" and "GetRecords.Records Maximum", which both indicate that Kinesis Data Streams records remained unprocessed for a long period of time. CloudWatch also showed me that "GetRecords.Byte Sum" and "GetRecords.Records Maximum" increased on DynamoDB Streams during this incident, but they were not listed among the anomalous metrics. Strictly speaking, this is not wrong, as the error lies in the Lambda function itself and not in DynamoDB Streams.
The value of the "IteratorAge Maximum" anomalous metric increases when the Lambda function can't efficiently process the data written to the Kinesis or DynamoDB stream that invokes it. So there is enough information to investigate the incident and understand which AWS serverless services are involved, but DevOps Guru behaves slightly differently for incidents involving Kinesis Data Streams and DynamoDB Streams.
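Why "IteratorAge Maximum" keeps growing in this scenario: for stream event sources, IteratorAge is the age of the last record in the batch at the moment the event source mapping sends the batch to the function. While UpdateProduct keeps failing, the same batch is retried, so the age reported on each attempt grows with the elapsed time. The sketch below models this with assumed retry gaps of 1, 2, 4, 8 and 16 seconds (illustrative values, not Lambda's documented backoff); the function name `iterator_ages_ms` is hypothetical.

```python
from typing import List


def iterator_ages_ms(
    last_record_arrival_ms: int,
    first_attempt_ms: int,
    retry_delays_ms: List[int],
) -> List[int]:
    """Age of the batch's last record at each delivery attempt of the same failing batch.

    IteratorAge = time the event is sent to the function - time the stream received the record.
    retry_delays_ms are assumed gaps between attempts (illustrative, not Lambda's real backoff).
    """
    attempt_time = first_attempt_ms
    ages = [attempt_time - last_record_arrival_ms]
    for delay in retry_delays_ms:
        attempt_time += delay
        ages.append(attempt_time - last_record_arrival_ms)
    return ages


# One initial attempt plus MaximumRetryAttempts = 5 retries, with the assumed gaps.
ages = iterator_ages_ms(last_record_arrival_ms=0, first_attempt_ms=200,
                        retry_delays_ms=[1000, 2000, 4000, 8000, 16000])
print(ages)  # [200, 1200, 3200, 7200, 15200, 31200]
print("IteratorAge Maximum:", max(ages), "ms")  # IteratorAge Maximum: 31200 ms
```

Because a failing batch blocks further processing of its shard until it succeeds, expires, or exhausts its retries, the records written in the meantime are also older by the time they are delivered, which is why the metric stays elevated for the whole duration of the load test.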
I reproduced this type of anomaly several times, and DevOps Guru occasionally created another type of insight for the same anomaly instead:
pointing to different anomalous metrics, "NumberOfMessagesSent Sum" and "ApproximateAgeOfOldestMessage Maximum", on the Dead Letter Queue used as the Lambda failure destination, but, surprisingly, with no anomalous metrics listed for the UpdateProduct Lambda function:
and the following graphed anomalies:
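When investigating this second type of insight, it helps to know that for stream event sources, the message Lambda sends to the OnFailure queue describes the failed batch rather than containing the product records themselves. The sketch below parses such a message body. The sample is an abridged, hand-written example modeled on the invocation-record format described in the AWS Lambda documentation (requestContext.condition, DDBStreamBatchInfo); all values are placeholders and the helper `summarize_failed_batch` is hypothetical, so verify the fields against a real message from OnFailureQueue.

```python
import json
from typing import Any, Dict

# Abridged, hand-written sample of an on-failure invocation record for a DynamoDB stream batch.
# All values are placeholders.
sample_body = json.dumps({
    "requestContext": {"condition": "RetryAttemptsExhausted"},
    "responseContext": {"functionError": "Unhandled"},
    "DDBStreamBatchInfo": {
        "shardId": "shardId-placeholder",
        "startSequenceNumber": "100",
        "endSequenceNumber": "149",
        "batchSize": 50,
        "streamArn": "arn:aws:dynamodb:eu-central-1:123456789012:table/ProductsTable/stream/placeholder",
    },
})


def summarize_failed_batch(message_body: str) -> Dict[str, Any]:
    """Pick out the fields needed to locate the failed records in the stream."""
    record = json.loads(message_body)
    batch = record["DDBStreamBatchInfo"]
    return {
        "condition": record["requestContext"]["condition"],
        "shard_id": batch["shardId"],
        "sequence_range": (batch["startSequenceNumber"], batch["endSequenceNumber"]),
        "batch_size": batch["batchSize"],
    }


print(summarize_failed_batch(sample_body))
```

The shard ID and sequence numbers can then be used with the DynamoDB Streams GetShardIterator and GetRecords APIs to inspect the actual records, as long as they are still within the stream's 24-hour retention period.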
Conclusion
In this article, we explored whether DevOps Guru recognizes anomalies on a Lambda function consuming from DynamoDB Streams when this function runs into an error. The general answer was yes, but we observed 2 different flavors of anomalous metrics: "Errors Sum" and "IteratorAge Maximum" on the Lambda function, and "NumberOfMessagesSent Sum" and "ApproximateAgeOfOldestMessage Maximum" on the Dead Letter Queue used as the Lambda failure destination. I'd personally expect all these anomalous metrics to be presented together in one DevOps Guru insight rather than split across different insights.
I will share my findings with the DevOps Guru team so that they can verify the experiment, look behind the scenes at what's happening, and hopefully improve the service to handle this anomaly correctly.