Manusha Chethiyawardhana for AWS Community Builders

Posted on • Originally published at Medium

Troubleshooting Async AWS Lambda Flows

Asynchronous invocation of AWS Lambda functions is a powerful feature that allows event producers and consumers to be decoupled, facilitating efficient and scalable event processing.

It enables developers to create highly scalable and responsive applications. However, troubleshooting async Lambda flows can be complex and requires familiarity with the right techniques and tools.

This article will provide an overview of troubleshooting techniques for asynchronous AWS Lambda flows and discuss some best practices for handling errors and retries.

Asynchronous Invocation Overview

AWS services such as Amazon S3, Amazon SNS, and Amazon EventBridge can invoke Lambda functions asynchronously. When a Lambda function is invoked asynchronously, Lambda places the event in an internal queue and responds to the caller right away, so the caller does not wait for the function to finish. A separate process then reads events from the queue and runs the function. You can configure how Lambda handles errors, and you can have it send invocation records to a downstream resource, which lets you chain together the components of your application.

By setting the invocation type parameter to “Event,” you can enable asynchronous invocation. An example AWS CLI command to invoke an async function is shown below. If you are using AWS CLI version 2, the cli-binary-format option is required.

$ aws lambda invoke \
--function-name myAsyncFunction \
--invocation-type Event \
--cli-binary-format raw-in-base64-out \
--payload '{ "name": "value" }' response.json

And the response you get when the event is queued would look like this:

{
 "StatusCode": 202
}
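
The same call can be made from code. The following is a minimal Python sketch, assuming a boto3 Lambda client is passed in (for example `boto3.client("lambda")`); the helper name `invoke_async` is hypothetical. It treats anything other than a 202 status as a failure to queue the event.

```python
import json


def invoke_async(lambda_client, function_name, payload):
    """Queue an event for asynchronous processing and confirm Lambda accepted it."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # "Event" requests asynchronous invocation
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # 202 only means the event was queued, not that the function succeeded.
    if response["StatusCode"] != 202:
        raise RuntimeError(f"event was not queued: {response['StatusCode']}")
    return response["StatusCode"]
```

A call such as `invoke_async(client, "myAsyncFunction", {"name": "value"})` mirrors the CLI command above. The return value says nothing about whether the function later ran successfully.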

Challenges of Troubleshooting Asynchronous Invocations

Troubleshooting asynchronous AWS Lambda invocations can be challenging due to several factors:

  1. Limited Visibility: When an event is queued, you will receive a status code as confirmation. However, it’s important to note that a separate process handles the execution of your function by reading events from the queue. Therefore, you won’t receive immediate feedback confirming whether the event was successfully processed.
  2. Automatic Retries: If the function returns an error, Lambda by default retries it two more times, for three attempts in total. Lambda waits one minute before the first retry and two minutes before the second. Function errors include errors returned by the function’s code and runtime errors, such as timeouts.
  3. Delivery Failures: When Lambda encounters difficulties in sending a record to a configured destination, it reports DestinationDeliveryFailures to Amazon CloudWatch. This situation may arise if your configuration includes an unsupported destination type, such as an Amazon SQS FIFO queue or an Amazon SNS FIFO topic. Additionally, errors related to permissions and size limitations can also lead to delivery failures.
  4. Function Invocation Looping: One scenario where this can occur is when your function manages resources within the same AWS service that triggered it. For example, you might create a function that stores an object in an Amazon S3 bucket configured with a notification that triggers the function again, creating a loop of invocations.
  5. Concurrency Limits: Processing delays can arise due to concurrency limits. When a function does not have enough concurrency available to process all events, Lambda throttles the additional invocations and returns the events to the queue to be retried later, which delays processing.
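
One common way to avoid the invocation loop described in item 4 is to have the function ignore objects that it wrote itself. The sketch below is one possible guard, not the only approach; it assumes the function writes its results under a dedicated key prefix. The prefix `processed/` and the helper name `keys_to_process` are hypothetical.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "processed/"  # hypothetical prefix where the function writes results


def keys_to_process(event):
    """Return object keys from an S3 notification, skipping the function's own output."""
    keys = []
    for record in event.get("Records", []):
        # S3 URL-encodes object keys in notification events.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(OUTPUT_PREFIX):
            continue  # written by this function; processing it would loop
        keys.append(key)
    return keys
```

Restricting the bucket notification itself to a different prefix, or writing results to a separate bucket, achieves the same goal without any code.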

Let’s see how we can overcome these challenges.

Error Handling and Automatic Retries

It is important to understand the retry behavior of the invoking service and any other services involved in the request to ensure proper handling of errors in asynchronous invocations.

The get-function-event-invoke-config command can be used to obtain the configuration for the asynchronous invocation of a function. Below is an example response syntax you will receive.

{
   "DestinationConfig": { 
      "OnFailure": { 
         "Destination": "string"
      },
      "OnSuccess": { 
         "Destination": "string"
      }
   },
   "FunctionArn": "string",
   "LastModified": number,
   "MaximumEventAgeInSeconds": number,
   "MaximumRetryAttempts": number
}

Using the put-function-event-invoke-config command, you can configure a function with a maximum event age and maximum retry attempts. This command overwrites any existing configuration on the function.
For example, to configure a function with a maximum event age of two hours and no retries, you can use the AWS CLI command given below:

$ aws lambda put-function-event-invoke-config --function-name myAsyncFunction \
--maximum-event-age-in-seconds 7200 --maximum-retry-attempts 0

You can use the update-function-event-invoke-config command to configure an option without resetting others.
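
If you manage these settings from code, it can help to validate the values before calling the API. The following Python sketch builds the keyword arguments that boto3's `put_function_event_invoke_config` and `update_function_event_invoke_config` accept. The helper name `build_invoke_config` is hypothetical, and the limits it checks (60 to 21600 seconds, 0 to 2 retries) are the ranges Lambda documents for these settings.

```python
def build_invoke_config(function_name, max_age_seconds=None, max_retries=None):
    """Build keyword arguments for Lambda's event invoke config APIs."""
    kwargs = {"FunctionName": function_name}
    if max_age_seconds is not None:
        # Lambda accepts an event age between 60 seconds and 6 hours.
        if not 60 <= max_age_seconds <= 21600:
            raise ValueError("max_age_seconds must be between 60 and 21600")
        kwargs["MaximumEventAgeInSeconds"] = max_age_seconds
    if max_retries is not None:
        # Lambda accepts 0, 1, or 2 retries for asynchronous invocations.
        if max_retries not in (0, 1, 2):
            raise ValueError("max_retries must be 0, 1, or 2")
        kwargs["MaximumRetryAttempts"] = max_retries
    return kwargs
```

For example, `client.update_function_event_invoke_config(**build_invoke_config("myAsyncFunction", max_retries=0))` would change only the retry count, whereas passing the same arguments to `put_function_event_invoke_config` would replace the whole configuration.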

You can also use the AWS Lambda console to configure asynchronous invocation settings on a function.

[Image: AWS Lambda console asynchronous invocation overview]

You can select and edit asynchronous invocation configurations by navigating to the Configuration tab.

[Image: Using the AWS Lambda console to configure asynchronous invocation settings on a function]

Lambda Destinations

Lambda Destinations provide a way to send asynchronous invocation records to various services, including SQS queues, SNS topics, Lambda functions, or EventBridge. By utilizing Lambda Destinations, you can effectively handle successful and unsuccessful execution scenarios, enhancing monitoring capabilities and error handling without requiring additional code.

You can use either the AWS CLI or the Lambda console to configure destinations for asynchronous invocation. For instance, you can use the following command to add a failure destination to a function.

$ aws lambda update-function-event-invoke-config --function-name myAsyncFunction \
--destination-config '{"OnFailure":{"Destination": "arn:aws:sqs:us-east-2:112233445566:destination"}}'

The function execution result is a JSON document containing information about the event, the response, and the reason for sending the record. The JSON format varies depending on the destination.

An example function execution result would look like this:

{
    "version": "string",
    "timestamp": "string",
    "requestContext": {},
    "requestPayload": {
        "ORDER_IDS": []
    },
    "responseContext": {},
    "responsePayload": {}
}

You can now monitor the health of your serverless applications using execution status.
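
A consumer of these records usually needs only a few fields to triage a failure. The Python sketch below reduces a record to those fields. The helper name `summarize_record` is hypothetical, and the nested field names (`requestId`, `condition`, `approximateInvokeCount`, `functionError`) are assumptions based on the record format Lambda documents for destinations; they do not appear in the abbreviated example above.

```python
def summarize_record(record):
    """Reduce a destination invocation record to the fields useful for triage."""
    request = record.get("requestContext", {})
    response = record.get("responseContext", {})
    return {
        "request_id": request.get("requestId"),
        "condition": request.get("condition"),  # e.g. "RetriesExhausted"
        "attempts": request.get("approximateInvokeCount"),
        "function_error": response.get("functionError"),
        "payload": record.get("requestPayload"),
    }
```

Missing fields come back as `None`, so the same helper can be used for both success and failure records.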

Dead-Letter Queues

You can also configure a dead-letter queue on the Lambda function to capture events that were not successfully processed. A dead-letter queue is an alternative to an on-failure destination. Unlike a destination, which receives a full invocation record, a dead-letter queue receives only the content of the event, without details about the response. Dead-letter queues are useful for isolating problematic events and investigating why their processing failed.

Use the update-function-configuration command to set up a dead-letter queue with the AWS CLI.

$ aws lambda update-function-configuration --function-name myAsyncFunction \
--dead-letter-config TargetArn=arn:aws:sns:us-east-2:123456789012:my-topic

Lambda sends the event to the dead-letter queue as-is and attaches the request ID, the error code, and the error message as message attributes. This information can be used to determine the error returned by the function or to correlate the event with logs or an AWS X-Ray trace.
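
When the dead-letter queue is an SQS queue, these details can be read from each message's attributes. The Python sketch below assumes a message dictionary as returned by boto3's SQS `receive_message` called with `MessageAttributeNames=["All"]`, and assumes the attribute names `RequestID`, `ErrorCode`, and `ErrorMessage`. The helper name `dlq_failure_details` is hypothetical.

```python
def dlq_failure_details(message):
    """Extract Lambda's failure attributes from an SQS dead-letter queue message."""
    attributes = message.get("MessageAttributes", {})

    def value(name):
        # SQS stores String and Number attribute values under "StringValue".
        return attributes.get(name, {}).get("StringValue")

    return {
        "request_id": value("RequestID"),
        "error_code": value("ErrorCode"),
        "error_message": value("ErrorMessage"),
        "event_body": message.get("Body"),  # the original event, unchanged
    }
```

The returned request ID is the value to search for in CloudWatch Logs when correlating a failed event with the function's log output.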

Monitoring and Troubleshooting

Understanding behavior and errors cannot be based solely on error handling. Monitoring and observability, which include metrics, logs, and tracing, are also a major part of debugging and troubleshooting.

For example, monitoring a function’s concurrency can help you stay within its concurrency limit, preventing throttling and processing delays.

To make things even easier, Lambda integrates with monitoring and tracing tools such as Amazon CloudWatch, AWS X-Ray, and Helios to provide monitoring of async function invocations.

Amazon CloudWatch

When a function completes event processing, Lambda sends metrics about the invocation to Amazon CloudWatch. This includes metrics for errors that occur during the invocation and should be monitored and addressed:

  • Errors: the number of invocations that result in a function error (this includes exceptions thrown by both your code and the Lambda runtime).

  • Throttles: the number of invocation requests that are throttled (note that throttled requests and other invocation errors don’t count as errors in the previous metric).

AWS introduced three new Amazon CloudWatch metrics for asynchronous AWS Lambda function invocations:

  • AsyncEventsDropped: the number of events dropped without successfully running the function.
  • DestinationDeliveryFailures: the number of times Lambda attempts to send an event to a destination but fails (typically because of permissions, misconfigured resources, or size limits).
  • DeadLetterErrors: the number of times Lambda attempts to send an event to a dead-letter queue but fails (typically because of misconfigured resources or size limits).

These metrics provide visibility for asynchronous Lambda function invocations.

Alarming on the Errors metric is highly recommended for detecting function errors. Metrics also offer valuable insight into retry behavior, including the interval between retries. For instance, if a function fails because a downstream system is overloaded, metrics such as AsyncEventAge and ConcurrentExecutions can help you understand the situation.
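
To act on these metrics, you can create CloudWatch alarms for them. The Python sketch below builds arguments for boto3's CloudWatch `put_metric_alarm` call. The helper name `async_alarm_params`, the alarm naming scheme, the five-minute period, and the zero threshold are illustrative assumptions to adjust for your workload.

```python
def async_alarm_params(function_name, metric_name="AsyncEventsDropped", threshold=0):
    """Build arguments for CloudWatch put_metric_alarm on a Lambda metric."""
    return {
        "AlarmName": f"{function_name}-{metric_name}",
        "Namespace": "AWS/Lambda",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # evaluate in five-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data points means no failures
    }
```

For example, `boto3.client("cloudwatch").put_metric_alarm(**async_alarm_params("myAsyncFunction", "Errors"))` would create an alarm that fires when any function error is recorded in a five-minute window.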

AWS X-Ray

AWS X-Ray offers powerful visualization capabilities for your application’s components, helping you identify performance bottlenecks and troubleshoot error-causing requests. However, it’s crucial to remember that AWS X-Ray doesn’t track every request. For Lambda functions, X-Ray samples one request per second plus 5 percent of any additional requests, and this sampling rate cannot be configured. This means that relying solely on AWS X-Ray to troubleshoot a failed invocation may not be sufficient, as the failed invocation could be absent from the sampled traces.
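
To get a feel for how sparse the traces become under load, the sampling rule can be turned into a rough estimate. The Python sketch below is simple arithmetic under the assumption of a steady request rate; the function name `traced_fraction` is hypothetical and the result is an approximation, not a guarantee from X-Ray.

```python
def traced_fraction(requests_per_second):
    """Approximate share of requests traced at a steady request rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    if requests_per_second <= 1:
        return 1.0  # the first request each second is always sampled
    # One request per second, plus 5 percent of the remaining requests.
    sampled = 1 + 0.05 * (requests_per_second - 1)
    return sampled / requests_per_second
```

At 100 requests per second this works out to roughly 6 percent of requests, so a single failed invocation is far more likely to be missing from the traces than present.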

Helios

Helios is an OpenTelemetry-based tool that allows developers to set up distributed tracing easily. It enables you to gain end-to-end visibility for Lambda function invocations.

With Helios, you can enhance your monitoring capabilities by utilizing labels and alerts. These features allow you to save search queries and identify specific behaviors of interest for each Lambda function. This customization empowers you to focus on the precise data important to you, whether it be applicative events or AWS Lambda metrics. Furthermore, Helios conveniently provides automatic access to CloudWatch logs. With a simple button click, you can effortlessly retrieve the relevant logs from CloudWatch for each span, streamlining the troubleshooting process.

One notable advantage of Helios is its ability to offer users valuable insights through data and visualizations. For instance, you can easily access information such as the execution time, involved services, and error codes returned for each span, allowing for a comprehensive understanding of your application’s behavior.

Furthermore, Helios provides a convenient feature that enables the replay of specific flows. This functionality automatically generates the necessary code, configures and executes it in a different environment, and allows for a thorough investigation of the root cause of an issue. Finally, it enables verification to ensure that the solution is functioning correctly.

These powerful features offered by Helios significantly contribute to the effective troubleshooting of asynchronous AWS Lambda function invocations.

Conclusion

In conclusion, troubleshooting asynchronous AWS Lambda flows requires a comprehensive understanding of the error handling and automatic retry mechanisms provided by AWS Lambda and the available monitoring and troubleshooting tools. By utilizing a combination of dead-letter queues, Lambda Destinations, custom parameters, and adapted processing code, you can effectively handle errors and optimize your asynchronous Lambda flows to suit your specific use case.

I hope these tips and tools prove invaluable in troubleshooting your asynchronous Lambda functions easily and efficiently. Thank you for taking the time to read, and happy coding!
