Background
I worked for a food delivery service in Japan, developing APIs that exchange data with the systems of large restaurant chains. When a restaurant system was down or the network was unstable, API errors could prevent orders from completing, and the resulting cancellations reduced Gross Merchandise Value (GMV).
To reduce cancellations, I implemented a retry mechanism that reattempted requests automatically, improving order completion and service reliability.
Retry Strategies
There are several common retry strategies that can be used to handle failures in APIs. Understanding these strategies and selecting the appropriate one for your system is crucial:
- Cancel: Do not retry; treat the first failure as final.
- Retry: Attempt the request again immediately after failure.
- Retry with Same Interval: Retry at fixed intervals.
- Retry with Backoff: Increase the interval between retries exponentially to avoid overwhelming the system.
- Retry with Backoff + Jitter: Add randomness (jitter) to the backoff interval to prevent multiple systems from retrying at the same time.
Initially, I considered a retry strategy that combined backoff with jitter. However, I ultimately chose retry with backoff only. Each API request is triggered by an individual user's order, so requests already arrive at effectively random times. I judged that retries were unlikely to synchronize and that adding jitter was unnecessary.
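To make the difference between these strategies concrete, here is a minimal sketch of how the wait before each retry could be computed. The function names and the default values (1 second base, factor of 2, 30 second cap) are assumptions chosen for illustration, not the values used in the production system.

```python
import random


def fixed_delay(attempt: int, base: float = 1.0) -> float:
    """Retry with Same Interval: wait `base` seconds before every retry."""
    return base


def backoff_delay(attempt: int, base: float = 1.0,
                  factor: float = 2.0, cap: float = 30.0) -> float:
    """Retry with Backoff: base * factor**attempt, capped at `cap` seconds.

    `attempt` is 0 for the first retry, 1 for the second, and so on.
    """
    return min(cap, base * factor ** attempt)


def backoff_with_jitter(attempt: int, base: float = 1.0,
                        factor: float = 2.0, cap: float = 30.0) -> float:
    """Retry with Backoff + Jitter: pick a random wait between 0 and
    the backoff delay, so clients that failed together spread out."""
    return random.uniform(0.0, backoff_delay(attempt, base, factor, cap))
```

With these defaults, `backoff_delay` yields 1, 2, 4, 8, 16 seconds and then stays at the 30 second cap, while `backoff_with_jitter` returns a different value within that range on each call.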
Designing the Retry Mechanism
Once the retry strategy was chosen, several parameters needed to be considered when designing the retry mechanism. In addition to selecting a backoff strategy, the following parameters must be determined:
- Retry Limit: The maximum number of retry attempts before abandoning the request. Too many retries could lead to excessive load on the system.
- Retry Count: The number of retries attempted so far for a request. It is compared against the retry limit to decide whether another attempt is allowed.
- Retry Interval: The initial interval between retries, which will increase exponentially in a backoff strategy.
- Maximum Wait Time: A limit on the total time spent on retries to prevent excessively long delays.
- Error Types: Not all errors should trigger retries. It’s important to define which types of errors should be retried and which should result in an immediate failure.
When designing the retry strategy, it's important to consider how these parameters will impact system performance and user experience. In my case, I carefully adjusted the retry count and interval to ensure the system could maintain a high order completion rate without overloading the restaurant systems.
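The parameters above can be sketched as a small retry helper. This is an illustration under assumptions rather than the code I deployed: `call_with_retry`, `TransientError`, and the default values are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. timeouts)."""


def call_with_retry(
    func: Callable[[], T],
    retry_limit: int = 3,            # Retry Limit
    initial_interval: float = 0.5,   # Retry Interval (seconds)
    max_wait: float = 10.0,          # Maximum Wait Time (seconds)
    retryable: Tuple[Type[BaseException], ...] = (TransientError,),  # Error Types
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    retry_count = 0   # Retry Count: retries attempted so far
    waited = 0.0      # total time spent waiting between attempts
    while True:
        try:
            return func()
        except retryable:
            # Errors not listed in `retryable` are not caught here,
            # so they fail immediately without any retry.
            if retry_count >= retry_limit:
                raise
            delay = initial_interval * 2 ** retry_count  # exponential backoff
            if waited + delay > max_wait:
                raise
            sleep(delay)
            waited += delay
            retry_count += 1
```

The `sleep` parameter exists only so the waiting can be replaced in tests. With the defaults, a request that keeps failing is attempted four times in total, with waits of 0.5, 1, and 2 seconds in between.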
Limitations of Retry and Circuit Breaker
Frequent retries can add unnecessary load to the system and potentially hinder recovery from failures. For example, if the server is down and all requests continue to fail, retrying will not resolve the issue. To avoid this, it is effective to implement a circuit breaker.
A circuit breaker stops sending requests once failures reach a threshold, which reduces load on the failing system and gives it room to recover. After a cooldown period, it typically lets a trial request through and resumes normal traffic if that request succeeds. Combining a circuit breaker with the retry strategy prevents overload and helps maintain overall system health.
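As a rough illustration of the idea, a circuit breaker can be written as a small wrapper around the outgoing call. The class below is a simplified sketch: the names `CircuitBreaker` and `CircuitOpenError`, the threshold of 5 failures, and the 30 second cooldown are all assumptions for the example, and a production system would more likely use an existing library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # cooldown in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means closed

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: reject without contacting the failing system.
                raise CircuitOpenError("circuit open; request not sent")
            # Cooldown elapsed: let one trial request through (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

One way to combine the two mechanisms is to wrap each attempt made by the retry logic in `breaker.call(...)` and treat `CircuitOpenError` as non-retryable, so that retries stop as soon as the circuit opens.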
Conclusion
API retry strategies play an essential role in ensuring the reliability of services. In real-time systems like food delivery services, the design of the retry mechanism can significantly impact overall system performance.
By carefully configuring retry counts, intervals, and limits, I was able to improve order completion rates while preventing system overload. Additionally, when retries are ineffective, combining the retry mechanism with a circuit breaker can help avoid unnecessary requests and enhance system reliability.
Hopefully, this article will serve as a helpful guide when designing retry strategies, enabling you to create more resilient systems.
References
- Retry Pattern in Microservices - GeeksforGeeks
- Retry Pattern - DoorDash
- Resilience4j Backoff + Jitter - Baeldung
  - The diagram in this article was especially helpful in visualizing the concept of jitter.