In the world of cloud computing, managing API request rates efficiently can make or break your application's performance. Today, I'll share my journey of implementing a robust rate-limiting solution using AWS Step Functions and Lambda.
I have been working with external API calls for a while and have noticed that they can fail for various reasons, such as network issues, server downtime, or server-side rate limits. I built this solution to handle those failures reliably.
In this solution, we will use AWS Step Functions and Lambda functions to build a reliable retry mechanism. The state machine chains together a set of Lambda functions to produce the result. This article walks through the implementation step by step.
The main problem we are trying to solve:
While Step Functions inherently support retries within tasks, our specific challenge involves handling API rate limits from the server we are communicating with. The server imposes a rate limit and responds with a 429 status code if too many requests are made from the same IP address within a short period.
To keep the architecture simple, I have shown one retry using two Lambda functions; more retries can easily be added during implementation.
Workflow Explanation
- User Invokes Step Function State Machine: The process begins when a user initiates the step function state machine. This could be triggered through an API call, a scheduled event, or another AWS service.
- Step Function Invokes Lambda (1st Attempt): The step function invokes the first Lambda function (Lambda 1). This Lambda function is responsible for making the API call.
- Response Status (Lambda 1): Lambda 1 executes the API call and returns a status response. This response indicates whether the API call succeeded (status code 200) or failed (any status code other than 200).
- If Failure Status ≠ 200 (2nd Attempt): If the response from Lambda 1 indicates a failure (status code not equal to 200), the step function will proceed to invoke a retry mechanism. This could involve retrying the same Lambda function or invoking a different Lambda function (Lambda 2) to handle the retry attempt.
- Response Status (Lambda 2): Lambda 2 attempts the API call and returns a status response. As with the first attempt, this response indicates whether the retry succeeded.
- If Success Status = 200: If either Lambda 1 or Lambda 2 successfully executes the API call and returns a status code of 200, the step function completes successfully, and the user is notified of the success.
- If Failure Even After Retries: If every attempt fails, we fail the step function and forward the API error to the user with the appropriate status code.
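The workflow above can be simulated in plain Python before any AWS resources exist. This is only a sketch of the control flow, not the state machine itself: `run_with_fallbacks`, `lambda_1`, and `lambda_2` are hypothetical names, and each function stands in for a Lambda that reports the HTTP status code it received.

```python
from typing import Callable, Dict, List

# Each attempt stands in for one Lambda function that calls the API
# and returns a dict containing the HTTP status code it received.
Attempt = Callable[[], Dict[str, object]]


def run_with_fallbacks(attempts: List[Attempt]) -> Dict[str, object]:
    """Run the attempts in order and stop at the first 200 response."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    last: Dict[str, object] = {}
    for index, attempt in enumerate(attempts, start=1):
        last = dict(attempt())       # copy so the caller's dict is not mutated
        last["attempt"] = index      # record which attempt produced the result
        if last.get("statusCode") == 200:
            return {"succeeded": True, **last}
    # Every attempt failed: forward the last API error to the caller.
    return {"succeeded": False, **last}


def lambda_1() -> Dict[str, object]:
    # Hypothetical first attempt that hits the server's rate limit.
    return {"statusCode": 429, "body": "Too Many Requests"}


def lambda_2() -> Dict[str, object]:
    # Hypothetical second attempt that succeeds.
    return {"statusCode": 200, "body": "ok"}


result = run_with_fallbacks([lambda_1, lambda_2])
print(result)
```

Adding a third attempt is a matter of appending another callable to the list, which mirrors adding one more Lambda function to the state machine.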
To explain the architecture easily, I have created the above diagram with one retry only, but we will build the solution with two retries. Below is the state machine diagram.
Step-by-Step Guide
Create a base lambda orchestrator function:
This Lambda function orchestrates the state machine: it starts the execution and handles the logic based on the execution status.
Create a function URL for the lambda function:
Now that the Lambda function is ready, we can set up a function URL and use it to send requests to the function. Refer to the article below to turn any Lambda function into an API with a function URL.
How to use AWS Lambda to trigger “any” script as an API call | by Somil Gupta | Technology Hits | Medium
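A minimal sketch of what the orchestrator function behind the function URL might look like. It is not the author's code: it assumes a standard (asynchronous) state machine that is started with `start_execution` and polled with `describe_execution`, a hypothetical `STATE_MACHINE_ARN` environment variable, and a 502 response on failure, which is an arbitrary choice for illustration.

```python
import json
import os
import time


def run_state_machine(sfn_client, state_machine_arn, payload,
                      poll_seconds=1.0, max_polls=60):
    """Start an execution and poll until it is no longer RUNNING."""
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    for _ in range(max_polls):
        described = sfn_client.describe_execution(
            executionArn=started["executionArn"]
        )
        if described["status"] != "RUNNING":
            return described
        time.sleep(poll_seconds)
    raise TimeoutError("state machine execution did not finish in time")


def build_response(described):
    """Translate the execution status into an HTTP-style response."""
    if described["status"] == "SUCCEEDED":
        return {"statusCode": 200, "body": described.get("output", "{}")}
    # FAILED, TIMED_OUT, or ABORTED: report what Step Functions recorded.
    # The 502 status code is an assumption made for this sketch.
    body = {
        "status": described["status"],
        "error": described.get("error"),
        "cause": described.get("cause"),
    }
    return {"statusCode": 502, "body": json.dumps(body)}


def lambda_handler(event, context):
    import boto3  # imported lazily; provided by the Lambda Python runtime

    # Function URL events carry the request body as a string.
    payload = json.loads(event.get("body") or "{}")
    described = run_state_machine(
        boto3.client("stepfunctions"),
        os.environ["STATE_MACHINE_ARN"],  # hypothetical variable name
        payload,
    )
    return build_response(described)
```

Because the handler waits for the execution to finish, the function's timeout needs to be longer than the slowest expected run of the state machine.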
- Create child lambda functions: These will be simple lambda functions acting as a proxy; they will not handle any logic.
We have to create three identical Lambda functions from the step_function_child_lambda code.
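A minimal sketch of what such a proxy function could look like, using only the Python standard library. It is an illustration rather than the actual step_function_child_lambda code: the `url` key in the event and the 503 fallback for network errors are assumptions.

```python
import urllib.error
import urllib.request


def call_api(url, timeout=10, opener=urllib.request.urlopen):
    """Call the API once and report the status code without raising."""
    try:
        with opener(url, timeout=timeout) as response:
            return {
                "statusCode": response.status,
                "body": response.read().decode("utf-8"),
            }
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses, including 429 rate limits.
        return {
            "statusCode": err.code,
            "body": err.read().decode("utf-8", errors="replace"),
        }
    except urllib.error.URLError as err:
        # Connection-level failure (DNS, refused connection, and so on).
        # Reporting it as 503 is an assumption made for this sketch.
        return {"statusCode": 503, "body": str(err.reason)}


def lambda_handler(event, context):
    # "url" is a hypothetical input field; the state machine passes it in.
    return call_api(event["url"])
```

The function never raises for an HTTP error; it always returns a `statusCode`, so the state machine can decide whether to move on to the next attempt.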
- Define Step Function State Machine: Next, we'll create a Step Functions state machine with a retry mechanism. Here is an example definition in JSON.
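As a rough sketch, a definition of this shape can be generated from Python and printed as JSON. The state names, the placeholder ARNs, the `$.result` path, and the static `Fail` message are all assumptions for illustration, not the author's actual definition. Each attempt is a `Task` state followed by a `Choice` state that checks the returned status code.

```python
import json


def build_definition(lambda_arns):
    """Build an Amazon States Language definition with one Task per Lambda."""
    if not lambda_arns:
        raise ValueError("at least one Lambda ARN is required")
    states = {}
    for i, arn in enumerate(lambda_arns, start=1):
        is_last = i == len(lambda_arns)
        next_attempt = "Failed" if is_last else f"Attempt{i + 1}"
        states[f"Attempt{i}"] = {
            "Type": "Task",
            "Resource": arn,
            # Keep the original input and store the Lambda result next to it.
            "ResultPath": "$.result",
            # If the Lambda itself errors, move on to the next attempt.
            "Catch": [{
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": next_attempt,
            }],
            "Next": f"CheckAttempt{i}",
        }
        states[f"CheckAttempt{i}"] = {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.result.statusCode",
                "NumericEquals": 200,
                "Next": "Succeeded",
            }],
            "Default": next_attempt,
        }
    states["Succeeded"] = {"Type": "Succeed"}
    states["Failed"] = {
        "Type": "Fail",
        "Error": "ApiCallFailed",
        "Cause": "All attempts returned a non-200 status code",
    }
    return {
        "Comment": "API call with fallback Lambda functions",
        "StartAt": "Attempt1",
        "States": states,
    }


# Placeholder ARNs; replace with the ARNs of the real child functions.
example_arns = [
    f"arn:aws:lambda:us-east-1:123456789012:function:child_lambda_{n}"
    for n in (1, 2, 3)
]
print(json.dumps(build_definition(example_arns), indent=2))
```

Generating the definition from a list keeps the number of attempts in one place: adding a fourth child function means adding one ARN rather than editing the JSON by hand.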
Complete code to implement this is available here:
Testing the State Machine
Trigger the state machine execution using the orchestrator Lambda function URL and monitor it in the AWS Step Functions console. You should see the retries and the final result, whether it succeeds or fails.
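The same check can be scripted instead of done in the console. The sketch below reads an execution's history and lists which Task states were entered, which shows how many attempts were needed. It assumes a boto3 Step Functions client is passed in; the helper names are hypothetical.

```python
def fetch_history(sfn_client, execution_arn):
    """Collect all history events for one execution, following pagination."""
    events = []
    kwargs = {"executionArn": execution_arn}
    while True:
        page = sfn_client.get_execution_history(**kwargs)
        events.extend(page["events"])
        token = page.get("nextToken")
        if not token:
            return events
        kwargs["nextToken"] = token


def attempted_states(history_events):
    """Return the names of the Task states that were entered, in order."""
    return [
        event["stateEnteredEventDetails"]["name"]
        for event in history_events
        if event.get("type") == "TaskStateEntered"
    ]
```

With a real client this could be called as `attempted_states(fetch_history(boto3.client("stepfunctions"), execution_arn))`, where `execution_arn` is the ARN returned when the execution was started.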
Conclusion
Implementing an API retry mechanism with AWS Step Functions and Lambda is a powerful way to improve the reliability of integrations with unreliable external APIs. I have worked extensively with vendor APIs, and their reliability is not something you can count on: they have rate limits, IP-based wait times, and so on. Retrying through different Lambda functions is intended to send each attempt from a different source IP address, which avoids IP-based wait-time blocking on top of providing the retry itself. I hope my experience inspires you to explore innovative solutions for your own cloud computing challenges.
Top comments (2)
I like this strategy of using Step Functions, but the need to manually add the proxies doesn't make sense to me. We can use a map state with a predefined retry limit to control this and a choice state to finish the map based on the response from the external API (task state). This way, we won't have to change the state machine definition each time we want to increase or decrease the retries.
Thank you for your thoughtful suggestion. While your proposed strategy using a map state and choice state in Step Functions is elegant, it doesn't fully address our specific challenge. Let me clarify:
1. IP-based rate limiting: Our main issue is that the server blocks requests based on IP address. Simple retries from the same IP won't bypass this limitation.
2. Multiple Lambda functions: We're currently using multiple Lambda functions, each with a different IP address, to work around this IP-based rate limit.
3. Need for IP rotation: The core of our strategy is to rotate through different IP addresses for each request, not just retry with the same IP.
So while we can't eliminate the need for multiple "proxies" (Lambda functions with different IPs), your suggested Step Functions structure could indeed help us manage this process more efficiently.