Introduction
In this post, we will discuss the retry mechanism used to make systems more resilient and create a simple implementation in Go. The idea is to start from a few abstract requirements and develop the mechanism from them.
Background
There used to be a time when multiple instances of a monolith were enough to serve users. Applications today are a lot more complex: they move a lot of information and communicate with many other applications to provide users with a service. With so many moving parts, it becomes more important to make sure your application doesn't break when interacting with third party services. It's better to let the user know that the request cannot be processed than to make them wait.
That's why we strategically place timeouts. In Go, we do that using the context package. Once a call times out, the application's logic decides whether to try again or not.
We also don't want to retry too many times in a short period of time. The third party API may reject all the requests if their number goes beyond its threshold, or even block your application's IP from making any requests.
It is a good idea to retry for a fixed amount of time, with some pre-defined time intervals.
The most common way to do it is to retry every n seconds, where n is configured by the user. This value can be derived from the rate-limiter threshold of the third party API.
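One can also apply exponential backoff, where the wait grows after every failure. If there is an error, the next request is made after n seconds, the one after that after n*n seconds, and so on. Doubling the wait each time (n, 2n, 4n) is another common choice.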
But do you want the retry mechanism to kick in when the application makes a bad request?
Or when the third party API tells the system that the service will be unavailable for some time?
Design
So our retry package should have these configurations:
- Max number of retries
- Standard duration between each retry
- User Defined BackOffs (Exponential or Custom)
- Bad Errors: when one of these errors occurs, we stop retrying
- Retry Errors: a list of errors worth retrying; if an error outside of the list occurs, we stop retrying
One cannot have both bad errors and retry errors enabled at the same time. Similarly, if custom or exponential intervals are given, the setting for the maximum number of retries should be ignored, since the intervals themselves define how many retries there are.
Implementation
To use our retry package, the caller has to adhere to a function signature: the function passed in should take no arguments and return only an error.
This can be easily done using closures.
Let's say our retry package has a method called Run:
Run(fn func() error) error
And we want to call Run on our own function, ThirdPartyCall(input string) (string, error).
So a call should look like:
obj := retry.New()
err := obj.Run(func() error {
	resp, err := ThirdPartyCall("input String")
	if err != nil {
		return err
	}
	// code logic that uses resp
	_ = resp
	return nil
})
Run Function
For the purpose of this blog, I have not implemented separate functions for a normal retry method and the one with user specified intervals. So we will just do a check on the interval variable. If its length is zero, we will run our function in normal mode, or else we will run using the intervals specified.
These are a few things that we need to keep in mind before the implementation:
- Have a count of the number of retries done so it can be compared to the threshold.
- Put the time gap not at the start of the function but after an error occurs. You don't want the system to wait until the function is called for the first time.
- Check for bad errors. If the error is one of them, don't try again.
- Check for retry errors. If the error is not one of them, don't try again.
Another thing to keep in mind is that the whole request, including retries, might take a couple of seconds, and we don't want the configuration to change while the retry process is going on. So we read the configuration values into local variables at the start of the run. The extra space occupied isn't much, so it won't be a worry.
So the code may look like this:
// Run runs the user method wrapped inside Action
// with a set number of retries according to the configuration
func (r *Retrier) Run(fn Action) error {
	if len(r.intervals) > 0 {
		return r.RunWithIntervals(fn)
	}
	// Read the configuration once, before the first attempt.
	var (
		count       int
		badErrors   = r.badErrors
		be          = r.be
		maxRetries  = r.maxRetries
		re          = r.re
		retryErrors = r.retryErrors
		sleep       = r.sleep
	)
	var rn func(fn Action) error
	rn = func(fn Action) error {
		if err := fn(); err != nil {
			// A bad error means retrying is pointless.
			if be {
				if _, ok := badErrors[err]; ok {
					return err
				}
			}
			// Only errors in the retry list are worth another attempt.
			if re {
				if _, ok := retryErrors[err]; !ok {
					return err
				}
			}
			count++
			if count > maxRetries {
				return ErrNoResponse
			}
			// Wait only after a failure, never before the first attempt.
			time.Sleep(sleep)
			return rn(fn)
		}
		return nil
	}
	return rn(fn)
}
Run With Intervals
The code will be similar to our Run function. As we already keep count of the retries to compare it to the threshold, the same count can be used to pick how long to sleep before the next attempt. The code for this would look like:
// RunWithIntervals is similar to Run. The difference is that we have a slice
// of time durations corresponding to each retry here, instead of maxRetries
// and constant sleep gap.
func (r *Retrier) RunWithIntervals(fn Action) error {
	// Read the configuration once, before the first attempt.
	var (
		count       int
		badErrors   = r.badErrors
		be          = r.be
		re          = r.re
		retryErrors = r.retryErrors
		intervals   = r.intervals
	)
	var rn func(fn Action) error
	rn = func(fn Action) error {
		if err := fn(); err != nil {
			if be {
				if _, ok := badErrors[err]; ok {
					return err
				}
			}
			if re {
				if _, ok := retryErrors[err]; !ok {
					return err
				}
			}
			// One retry per interval: give up once all of them are used.
			// Checking against len(intervals) also keeps the index in range.
			if count >= len(intervals) {
				return ErrNoResponse
			}
			// Sleep before incrementing so that intervals[0] is used first.
			time.Sleep(intervals[count])
			count++
			return rn(fn)
		}
		return nil
	}
	return rn(fn)
}
And that's it. The retry package is ready to use. Configuration of the package is not in the scope of this post, but it can be found here. I have used the functional options pattern to set the configuration.
You can check out the package and its test cases here.