In my previous article, we created a custom HTTP client abstraction for making HTTP requests in a hypothetical service. Attentive readers may have noticed that the client was absolutely basic and by no means production ready. A real-world client would at least set a basic timeout on outgoing HTTP calls.
Don’t underestimate failure in distributed applications: anything that can go wrong will go wrong. If a component has a chance to fail, it will, and sooner than you imagine.
Circuit Breakers
Protect services by no longer sending requests during a prolonged period of failure.
The circuit breaker is one of the most important components for making applications more resilient to failure. In essence, it prevents the upstream service from failing when the downstream service is exhausted: it collects statistics on every outgoing request and, once failures cross a predefined threshold, stops sending requests for a while.
Relying only on timeouts is not enough, and many consider that approach an anti-pattern. Reducing the request timeout from 30s to 5s, for example, still requires the upstream service to spend CPU, memory, IO and bandwidth on every call, and it remains tightly coupled to the downstream service. If the latter is overloaded, requests queue up in the upstream service until both services are overloaded and neither is operating.
Many times in my career I have designed systems with, and witnessed, failures cascading across services, where no one even knew where to start troubleshooting because every service was exhausted.
This is unnecessary coupling that we should get rid of. Let’s stop designing obsolete applications.
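To make the mechanism concrete, here is a minimal sketch of a breaker as a three-state machine (closed, open, half-open). It is deliberately naive and is not the library used later in this article: `SimpleBreaker`, `ErrOpen` and `runDemo` are hypothetical names, it trips on failures since the last success rather than on a failure ratio, and it is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrOpen is a hypothetical sentinel returned while calls are being rejected.
var ErrOpen = errors.New("circuit breaker is open")

type state int

const (
	closed state = iota
	open
	halfOpen
)

// SimpleBreaker is an illustrative, non-thread-safe circuit breaker.
type SimpleBreaker struct {
	current     state
	failures    int           // failures since the last success
	maxFailures int           // threshold that trips the breaker
	openFor     time.Duration // how long to reject calls before probing again
	openedAt    time.Time
}

func NewSimpleBreaker(maxFailures int, openFor time.Duration) *SimpleBreaker {
	return &SimpleBreaker{maxFailures: maxFailures, openFor: openFor}
}

// Call runs fn unless the breaker is open, in which case it fails fast.
func (b *SimpleBreaker) Call(fn func() error) error {
	if b.current == open {
		if time.Since(b.openedAt) < b.openFor {
			return ErrOpen // do not touch the downstream at all
		}
		b.current = halfOpen // open period elapsed: let one probe through
	}
	if err := fn(); err != nil {
		b.failures++
		if b.current == halfOpen || b.failures >= b.maxFailures {
			b.current = open
			b.openedAt = time.Now()
		}
		return err
	}
	b.current = closed
	b.failures = 0
	return nil
}

// runDemo calls a failing downstream three times with a threshold of two.
func runDemo() []error {
	b := NewSimpleBreaker(2, time.Minute)
	failing := func() error { return errors.New("downstream unavailable") }
	var errs []error
	for i := 0; i < 3; i++ {
		errs = append(errs, b.Call(failing))
	}
	return errs
}

func main() {
	for _, err := range runDemo() {
		fmt.Println(err)
	}
}
```

The first two calls reach the failing downstream; the third is rejected immediately with `ErrOpen`, which is the fail-fast behaviour the pattern is after.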
Applying the pattern
Fail fast instead of wasting time waiting for negative replies.
In our example, we have a store service that depends on a users service. The store service executes synchronous HTTP requests to the users service to collect user data, and then stores it by sending another HTTP request. See below:
// Store responsible for storing the user.
func (s *StoreService) Store(user *User) error {
	userResourceV2 := fmt.Sprintf("%s/%s/%s", usersAPIV2, user.Email, user.Country)
	res, err := s.client.Get(userResourceV2)
	if err != nil {
		return fmt.Errorf("failed to fetch user: %v", err)
	}
	// code omitted
	res, err = s.client.Post(usersURL, "application/json", body)
	if err != nil {
		return fmt.Errorf("failed to store user: %v", err)
	}
	// code omitted
}
The http.Client is very extensible, so we are going to create our own transport layer by implementing the http.RoundTripper interface:
type RoundTripper interface {
	RoundTrip(*Request) (*Response, error)
}
As you may have noticed, small interfaces for the win.
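Because the interface has a single method, decorating a transport takes only a few lines. The sketch below is not part of the article's code: `headerTripper`, `roundTripFunc`, `fetchSource`, the `X-Request-Source` header and the `users.internal` host are all made up for illustration, and the base transport is a fake that echoes the header back so no network is needed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc adapts a plain function to http.RoundTripper (hypothetical helper).
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// headerTripper is a hypothetical decorator that sets one header and then
// delegates to the wrapped RoundTripper.
type headerTripper struct {
	next       http.RoundTripper
	key, value string
}

func (h headerTripper) RoundTrip(r *http.Request) (*http.Response, error) {
	// A RoundTripper should not modify the caller's request, so work on a clone.
	clone := r.Clone(r.Context())
	clone.Header.Set(h.key, h.value)
	return h.next.RoundTrip(clone)
}

// fetchSource sends a GET through the decorator and returns the echoed header.
func fetchSource() (string, error) {
	// Fake base transport: replies with the value of X-Request-Source as the body.
	echo := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader(r.Header.Get("X-Request-Source"))),
			Request:    r,
		}, nil
	})
	client := &http.Client{Transport: headerTripper{next: echo, key: "X-Request-Source", value: "store-service"}}

	res, err := client.Get("http://users.internal/")
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	return string(body), err
}

func main() {
	fmt.Println(fetchSource())
}
```

The same shape (hold a `next` RoundTripper, do something around the call, delegate) is what the circuit breaker transport in this article follows.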
Previously, the Request struct neither held nor received any dependencies; its main responsibility was to expose the Do method. Now it takes an http.RoundTripper implementation.
type (
	// RequestOption is the request options.
	RequestOption func(*Request)

	// Request is the application http request.
	Request struct {
		client *http.Client
	}
)

// WithRoundTripper receives the http.RoundTripper implementation.
func WithRoundTripper(rt http.RoundTripper) RequestOption {
	return func(r *Request) {
		r.client.Transport = rt
	}
}

// NewRequest returns a new configured Request.
func NewRequest(opts ...RequestOption) *Request {
	r := &Request{client: new(http.Client)}
	for _, o := range opts {
		o(r)
	}
	return r
}
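The functional options used by NewRequest make it cheap to add further knobs later without changing the constructor's signature. As a sketch of that design choice, the block below repeats the types above and adds a second option; `WithTimeout` and the `timeouts` helper are hypothetical and do not appear in the article's code.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// RequestOption and Request mirror the article's types.
type RequestOption func(*Request)

type Request struct {
	client *http.Client
}

// WithRoundTripper receives the http.RoundTripper implementation.
func WithRoundTripper(rt http.RoundTripper) RequestOption {
	return func(r *Request) { r.client.Transport = rt }
}

// WithTimeout is a hypothetical extra option that sets the client-wide timeout.
func WithTimeout(d time.Duration) RequestOption {
	return func(r *Request) { r.client.Timeout = d }
}

// NewRequest applies each option in order on top of a zero-value http.Client.
func NewRequest(opts ...RequestOption) *Request {
	r := &Request{client: new(http.Client)}
	for _, o := range opts {
		o(r)
	}
	return r
}

// timeouts returns the client timeout without options and with WithTimeout applied.
func timeouts() (byDefault, configured time.Duration) {
	plain := NewRequest()
	tuned := NewRequest(WithRoundTripper(http.DefaultTransport), WithTimeout(5*time.Second))
	return plain.client.Timeout, tuned.client.Timeout
}

func main() {
	byDefault, configured := timeouts()
	fmt.Println(byDefault, configured)
}
```

Callers that pass no options keep the defaults, and each new option stays a small, independently testable function.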
Before we jump into the transport implementation, let’s see what changed in the Do method:
// Do is a convenient method for executing http requests.
func (r *Request) Do(method, url, contentType string, body io.Reader) (*http.Response, error) {
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		return nil, fmt.Errorf("failed to create request: %v", err)
	}
	req.Header.Set("Content-Type", contentType)
	ctx, cancel := context.WithTimeout(req.Context(), reqTimeout)
	defer cancel()
	req = req.WithContext(ctx)
	return r.client.Do(req)
}
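It now uses a context with a timeout for HTTP requests, so any request that exceeds the defined timeout is canceled. One caveat: because cancel is deferred, the context is also canceled as soon as Do returns, and the request context covers reading the response body too, so a body that has not been fully received by then may fail to read with a context canceled error.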
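To see the timeout in action without a real server, the sketch below swaps in a fake transport that needs one second to answer and gives the request a 50ms budget. The names `slow`, `roundTripFunc` and `timesOut`, the durations and the `users.internal` host are assumptions made for this example only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// roundTripFunc adapts a plain function to http.RoundTripper (hypothetical helper).
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// slow simulates a downstream that needs one second to answer but stops
// waiting as soon as the request context is done.
var slow = roundTripFunc(func(r *http.Request) (*http.Response, error) {
	select {
	case <-time.After(time.Second):
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     make(http.Header),
			Body:       http.NoBody,
			Request:    r,
		}, nil
	case <-r.Context().Done():
		return nil, r.Context().Err()
	}
})

// timesOut reports whether a request with a 50ms budget fails with a deadline error.
func timesOut() bool {
	client := &http.Client{Transport: slow}
	req, err := http.NewRequest(http.MethodGet, "http://users.internal/", nil)
	if err != nil {
		return false
	}
	ctx, cancel := context.WithTimeout(req.Context(), 50*time.Millisecond)
	defer cancel()

	res, err := client.Do(req.WithContext(ctx))
	if err == nil {
		res.Body.Close()
	}
	// http.Client wraps transport errors in *url.Error, so unwrap with errors.Is.
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timed out:", timesOut())
}
```

The call returns after roughly 50ms instead of a full second, but the upstream still spent that time waiting, which is why the timeout alone is not enough.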
The transport implementation is designed as follows:
type (
	// Breaker is the http circuit breaker.
	Breaker interface {
		// Execute runs the given request if the circuit breaker is in the closed or half-open state.
		// An error is instantly returned when the circuit breaker is tripped.
		Execute(fn func() (interface{}, error)) (interface{}, error)
	}

	// Transport is the application http transport.
	Transport struct {
		tripper http.RoundTripper
		breaker Breaker
	}
)

// RoundTrip decorates tripper.RoundTrip with a circuit breaker.
// An error is returned if the circuit breaker rejects the request.
func (t *Transport) RoundTrip(r *http.Request) (*http.Response, error) {
	res, err := t.breaker.Execute(func() (interface{}, error) {
		res, err := t.tripper.RoundTrip(r)
		if err != nil {
			return nil, err
		}
		if res != nil && res.StatusCode >= http.StatusInternalServerError {
			// The response is discarded below, so close its body to avoid leaking the connection.
			res.Body.Close()
			return nil, fmt.Errorf("http response error: %v", res.StatusCode)
		}
		return res, nil
	})
	if err != nil {
		return nil, err
	}
	return res.(*http.Response), nil
}
Here we wrap the RoundTrip call to make it circuit breaker aware. When the http.Response status code is 5XX, we treat it as an error and let the circuit breaker know about it; otherwise the request counts as successful.
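A quick way to check which responses the breaker sees as failures is to plug a stand-in breaker into the transport. In this sketch `recordingBreaker`, `statusTripper`, `roundTripFunc` and `failuresAfter` are hypothetical test helpers, the stand-in never trips (it only counts), and the Transport is restated in compact form so the block is self-contained.

```go
package main

import (
	"fmt"
	"net/http"
)

// Breaker and Transport follow the article's transport, restated compactly.
type Breaker interface {
	Execute(fn func() (interface{}, error)) (interface{}, error)
}

type Transport struct {
	tripper http.RoundTripper
	breaker Breaker
}

func (t *Transport) RoundTrip(r *http.Request) (*http.Response, error) {
	res, err := t.breaker.Execute(func() (interface{}, error) {
		res, err := t.tripper.RoundTrip(r)
		if err != nil {
			return nil, err
		}
		if res.StatusCode >= http.StatusInternalServerError {
			res.Body.Close()
			return nil, fmt.Errorf("http response error: %v", res.StatusCode)
		}
		return res, nil
	})
	if err != nil {
		return nil, err
	}
	return res.(*http.Response), nil
}

// recordingBreaker is a hypothetical stand-in: it always runs fn and only
// counts how many executions returned an error.
type recordingBreaker struct{ failures int }

func (b *recordingBreaker) Execute(fn func() (interface{}, error)) (interface{}, error) {
	res, err := fn()
	if err != nil {
		b.failures++
	}
	return res, err
}

type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// statusTripper fakes a downstream that always answers with the given status.
func statusTripper(code int) http.RoundTripper {
	return roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{StatusCode: code, Header: make(http.Header), Body: http.NoBody, Request: r}, nil
	})
}

// failuresAfter sends one GET per status code and returns the failures the breaker saw.
func failuresAfter(codes ...int) int {
	b := &recordingBreaker{}
	for _, code := range codes {
		client := &http.Client{Transport: &Transport{tripper: statusTripper(code), breaker: b}}
		res, err := client.Get("http://users.internal/")
		if err == nil {
			res.Body.Close()
		}
	}
	return b.failures
}

func main() {
	fmt.Println(failuresAfter(200, 404, 500, 503)) // prints 2
}
```

Only the 500 and 503 responses count against the breaker; a 404 is the downstream working as intended, so it is reported as a success.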
Store Service
The StoreService exposes a Store method which executes HTTP calls to a hypothetical users service. Let’s see how it copes when the circuit breaker rejects requests:
// Store responsible for storing the user.
func (s *StoreService) Store(user *User) error {
	userResourceV2 := fmt.Sprintf("%s/%s/%s", usersAPIV2, user.Email, user.Country)
	res, err := s.client.Get(userResourceV2)
	if err != nil {
		// errors.Is also matches when the sentinel arrives wrapped (for example in a *url.Error).
		if errors.Is(err, srv.ErrCircuitBreakerOpen) {
			// fallback to apiv1
			userResourceV1 := fmt.Sprintf("%s?%s&%s", usersAPIV1, user.Email, user.Country)
			res, err = s.client.Get(userResourceV1)
			if err != nil {
				return fmt.Errorf("failed to fetch user: user service is unavailable: %v", err)
			}
		} else {
			return fmt.Errorf("failed to fetch user: %v", err)
		}
	}
	// code omitted
}
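One detail worth checking when detecting a rejected request: if the sentinel error travels through the standard *http.Client, it comes back wrapped in a *url.Error, so a direct comparison does not match while errors.Is does. The sketch below assumes a client built directly on *http.Client; `ErrCircuitBreakerOpen` here is a local stand-in for the article's srv.ErrCircuitBreakerOpen, and `rejecting`, `roundTripFunc` and `compare` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrCircuitBreakerOpen is a local stand-in for srv.ErrCircuitBreakerOpen.
var ErrCircuitBreakerOpen = errors.New("circuit breaker is open")

type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// rejecting simulates a transport whose circuit breaker is currently open.
var rejecting = roundTripFunc(func(*http.Request) (*http.Response, error) {
	return nil, ErrCircuitBreakerOpen
})

// compare reports whether the error returned by client.Get matches the
// sentinel by direct comparison and by errors.Is.
func compare() (direct, viaIs bool) {
	client := &http.Client{Transport: rejecting}
	_, err := client.Get("http://users.internal/v2/users")
	return err == ErrCircuitBreakerOpen, errors.Is(err, ErrCircuitBreakerOpen)
}

func main() {
	direct, viaIs := compare()
	fmt.Println("==:", direct, "errors.Is:", viaIs) // ==: false errors.Is: true
}
```

If the client in your code base unwraps or translates errors before returning them, a direct comparison may work as well, but errors.Is covers both cases.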
Circuit Breaker Setup
The cmd/main.go is where we tie all dependencies together and pass them downstream to the HTTP, message queue, CLI and other handlers, and circuit breakers are no exception:
func main() {
	breaker := newCircuitBreaker()
	transport := http.NewTransport(breaker)
	req := http.NewRequest(
		http.WithRoundTripper(transport),
	)
	client := http.NewClient(req)
	_ = http.NewStoreService(client)
	// pass storer downstream to http handler, messaging handler, etc...
}

func newCircuitBreaker() *gobreaker.CircuitBreaker {
	return gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:    "HTTP Client",
		Timeout: time.Second * 30,
		ReadyToTrip: func(counts gobreaker.Counts) bool {
			failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
			return counts.Requests >= 3 && failureRatio >= 0.6
		},
		OnStateChange: func(name string, from gobreaker.State, to gobreaker.State) {
			// do smth when circuit breaker trips.
		},
	})
}
I recommend starting with gobreaker: it is very simple and has a clean API. I am not going to repeat its docs here, so please check its usage. I will, however, describe our HTTP client circuit breaker settings:
- Timeout is the period of the open state.
- ReadyToTrip is called whenever a request fails in the closed state. If true is returned, the circuit breaker will be placed into the open state.
- OnStateChange is called whenever the state of the circuit breaker changes.
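To get a feel for the ReadyToTrip rule used in the setup above (at least 3 requests and a failure ratio of 60% or more), here it is extracted into a plain function and applied to a few sample counts. `Counts` is a local stand-in that mirrors only the two gobreaker.Counts fields the rule reads, and the sample values are made up.

```go
package main

import "fmt"

// Counts is a local stand-in with the two fields the rule needs.
type Counts struct {
	Requests      uint32
	TotalFailures uint32
}

// readyToTrip repeats the condition from newCircuitBreaker.
func readyToTrip(c Counts) bool {
	failureRatio := float64(c.TotalFailures) / float64(c.Requests)
	return c.Requests >= 3 && failureRatio >= 0.6
}

func main() {
	samples := []Counts{
		{Requests: 2, TotalFailures: 2},  // 100% failures, but too few requests
		{Requests: 3, TotalFailures: 1},  // 33% failures
		{Requests: 3, TotalFailures: 2},  // 67% failures
		{Requests: 10, TotalFailures: 5}, // 50% failures
		{Requests: 10, TotalFailures: 7}, // 70% failures
	}
	for _, c := range samples {
		fmt.Printf("requests=%d failures=%d trip=%v\n", c.Requests, c.TotalFailures, readyToTrip(c))
	}
}
```

The minimum request count keeps one or two early failures from opening the circuit; only the third and fifth samples trip it.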
Nevertheless, there are other circuit breaker libraries in Go that are definitely worth a look.
Conclusion
With this example in hand, I hope you start protecting your service’s external calls with circuit breakers. Bear in mind that circuit breakers are not bound to HTTP requests: any external call, especially to shared resources such as Redis, RabbitMQ and even databases, should be wrapped in a circuit breaker.
When well applied, the pattern allows applications to be more resilient and responsive. After all, it makes no sense to keep sending requests to a downstream service that is unable to handle them.
Designing for failure is a large topic and more complex than we can imagine. If you think that using circuit breakers is enough, I regret to tell you it’s absolutely not. Circuit breakers should be composed with retries and throttling.
Now imagine every external call in every service carrying such protection. In medium to large distributed systems this does not scale well, and I am not even mentioning deadlines, distributed tracing and instrumentation of every request.
This is where service mesh comes into the game.
Show me the code
You can check the source code containing the full implementation.
Acknowledgements
Thanks to my friends Ricardo Valeriano, Italo Vietro and Felipe for reviewing the article.