Modern web projects rarely exist without external service integrations: SMS gateways, partner APIs, ad pixels, and so on. But what risks do they bring?
Let's say we have a high-load online store that sends SMS notifications on order creation. The messages go through an external API, and that API has gone offline. We'll start getting timeouts from it, and how will our app handle that? It depends on the client configuration, but in any case it means slower responses, extra resource usage, and a growing queue of requests.
Another example: we have a personalized product feed on the main page, and we use Redis to keep response times low (since Redis responds in about 1 ms, the whole feed of 20 products takes about 20 ms). But if Redis starts rebalancing (or worse, loses a node), the response time can grow to 500 ms per product – 10 seconds for the whole feed.
Is there a way to prevent or mitigate such situations? Well, that's where the "Circuit Breaker" pattern comes in handy.
Pattern overview
The main principle of the pattern is as easy as pie:
If the external API is not available, it is useless to send requests to it – it won't respond anyway.
And vice versa: if the external API is available, you can request it; perhaps it will respond with something useful.
To visualize it, we'll stick to the SMS gateway example. Let's picture all that logic as a switch in an electrical circuit. While the service is up, we pass requests through to it; that state is called "Closed", just like a closed circuit that lets current flow.
And when it's down, we break the chain (the "Open" state).
What to do in the "Open" state is up to you and usually depends on the integration purpose. You can:
- Return the last successful response while it's fresh enough;
- Return default value;
- Fall back to a different strategy (a backup provider, a degraded feature);
- Just return an error.
Since we're going to notify users, we may implement an email notification for such situations.
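For instance, a fallback chain could look like this minimal sketch (Order, sendSms, and sendEmail are hypothetical stand-ins, not part of the demo we build below):

// When the primary channel fails, degrade to the secondary one
// instead of failing the whole operation.
func notify(o Order) error {
	if err := sendSms(o); err != nil {
		return sendEmail(o)
	}
	return nil
}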
And the last question to be answered: who'll change the state of the circuit? A "red button" to control it manually would be useful, but not as the only solution – nobody wants to keep a finger on the pulse all the time. So we need rules to change it automatically.
Let's say we're going to break the chain on every error from the API. But when do we get back to the "Closed" state? For that, we have to introduce one more state – "Half-open". Its purpose is to let a few requests through to check whether the API is alive again, so it works as an intermediate step on the path from the "Open" to the "Closed" state.
So the logic is simple: on any error we switch to Open and stop requesting the API. After some time has passed, we switch to the Half-open state and let a request through to check whether everything is OK. On error we switch back to Open, and on success – to Closed.
And to make it more efficient we need to add two things:
- Error policies – to ignore some of the expected errors;
- Thresholds, or the breaking strategy – to describe the rule for breaking the chain.
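To make the idea concrete before reaching for a library, here is a minimal hand-rolled sketch of this state machine. It is a naive illustration only – a fixed consecutive-failure threshold and no concurrency safety – not production code:

package main

import (
	"errors"
	"time"
)

type breakerState int

const (
	stateClosed breakerState = iota
	stateOpen
	stateHalfOpen
)

var errCircuitOpen = errors.New("circuit is open")

type miniBreaker struct {
	state        breakerState
	failures     int           // consecutive failures seen while Closed
	maxFailures  int           // threshold to trip from Closed to Open
	openedAt     time.Time     // when the chain was broken
	retryTimeout time.Duration // how long to stay Open before probing
}

func (b *miniBreaker) call(req func() error) error {
	if b.state == stateOpen {
		if time.Since(b.openedAt) < b.retryTimeout {
			return errCircuitOpen // don't even try the API
		}
		b.state = stateHalfOpen // time to probe the API
	}

	if err := req(); err != nil {
		b.failures++
		// Any error while Half-open, or too many while Closed, breaks the chain.
		if b.state == stateHalfOpen || b.failures >= b.maxFailures {
			b.state = stateOpen
			b.openedAt = time.Now()
		}
		return err
	}

	// Success: close the circuit and reset the counter.
	b.state = stateClosed
	b.failures = 0
	return nil
}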
Summarizing everything mentioned, let's look at the activity diagram:
Hands-on
API
Before we start, let's create a test environment. Our Go app will listen on two HTTP endpoints: the first will serve as our SMS gateway API mock, and the second will toggle the server's status between "good" and "broken". Let's do that in a separate file.
// server.go
package main

import (
	"log"
	"net/http"
	"os"
)

// ExampleServer is a test server to check the "CircuitBreaker" pattern
type ExampleServer struct {
	addr   string
	logger *log.Logger
	// isEnabled is read and written from different request goroutines;
	// we skip synchronization to keep the demo simple.
	isEnabled bool
}

// NewExampleServer creates the instance of our server
func NewExampleServer(addr string) *ExampleServer {
	return &ExampleServer{
		addr:      addr,
		logger:    log.New(os.Stdout, "Server\t", log.LstdFlags),
		isEnabled: true,
	}
}

// ListenAndServe starts listening on the address provided
// on creating the instance.
func (s *ExampleServer) ListenAndServe() error {
	// The main endpoint we will request
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if s.isEnabled {
			s.logger.Println("responded with OK")
			w.WriteHeader(http.StatusOK)
		} else {
			s.logger.Println("responded with Error")
			w.WriteHeader(http.StatusInternalServerError)
		}
	})

	// Toggle endpoint to switch the main one between OK and error responses
	http.HandleFunc("/toggle", func(w http.ResponseWriter, r *http.Request) {
		s.isEnabled = !s.isEnabled
		s.logger.Println("toggled. Is enabled:", s.isEnabled)
		w.WriteHeader(http.StatusOK)
	})

	return http.ListenAndServe(s.addr, nil)
}
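While the demo we assemble below is running, you can flip the server's behavior from another terminal with a plain HTTP request, for example:

curl http://localhost:8080/toggle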
Client
What we're going to do is create a simple Client structure with a single Send method. We'll send requests to the / endpoint to emulate an external integration, and visiting localhost:8080/toggle will switch the server's responses to errors. Let's create our client.
// client.go
package main

import (
	"errors"
	"net/http"
)

type NotificationClient interface {
	Send() error // We ignore all the arguments to simplify the demo
}

type SmsClient struct {
	baseUrl string
}

func NewSmsClient(baseUrl string) *SmsClient {
	return &SmsClient{
		baseUrl: baseUrl,
	}
}

func (s *SmsClient) Send() error {
	url := s.baseUrl + "/"

	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return errors.New("bad response")
	}

	return nil
}
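One caveat worth noting: http.Get uses http.DefaultClient, which has no timeout, so an API that hangs (rather than errors) would block Send indefinitely, and the circuit breaker we add later would never see a failure. A possible hardening, sketched as a variant of the same client (the 2-second value is an arbitrary assumption, and it needs "time" added to the imports):

// Timeouts surface as errors from Get, so a circuit breaker can
// count them like any other failure.
var httpClient = &http.Client{Timeout: 2 * time.Second}

func (s *SmsClient) sendWithTimeout() error {
	resp, err := httpClient.Get(s.baseUrl + "/")
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return errors.New("bad response")
	}

	return nil
}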
The client is ready. All we need to make it work is to initialize everything in our main.go file.
// main.go
package main

import (
	"log"
	"os"
	"time"
)

func main() {
	logger := log.New(os.Stdout, "Main\t", log.LstdFlags)

	server := NewExampleServer(":8080")
	go func() {
		_ = server.ListenAndServe()
	}()

	client := NewSmsClient("http://127.0.0.1:8080")

	for {
		err := client.Send()
		time.Sleep(1 * time.Second)
		if err != nil {
			logger.Println("caught an error", err)
		}
	}
}
And that's it! We've created the required environment, and if we run the code, it will log every response from the server to the standard output.
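One way to run it, assuming the three files sit in one directory as a single main package (with a go.mod alongside):

go run .

The output will look like this: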
Server 2021/09/22 21:51:30 responded with OK
Server 2021/09/22 21:51:31 responded with OK
Server 2021/09/22 21:51:32 responded with OK
Server 2021/09/22 21:51:32 toggled. Is enabled: false
Server 2021/09/22 21:51:33 responded with Error
Main 2021/09/22 21:51:34 caught an error bad response
Server 2021/09/22 21:51:34 responded with Error
Main 2021/09/22 21:51:35 caught an error bad response
Everything works as we expected. Now we have an "unstable" API and our unprotected client. To make things better, we can implement the "Circuit Breaker" pattern.
Circuit Breaker
As usually happens with useful patterns, there are plenty of implementations. If you want to build your own solution, you can follow the scheme above. But in this article we're going to use a great library from Sony: gobreaker, which implements the Circuit Breaker pattern in Go.
The struct CircuitBreaker is a state machine that prevents sending requests that are likely to fail, and the function NewCircuitBreaker creates a new CircuitBreaker:

func NewCircuitBreaker(st Settings) *CircuitBreaker

You can configure the CircuitBreaker via the Settings struct:

type Settings struct {
	Name          string
	MaxRequests   uint32
	Interval      time.Duration
	Timeout       time.Duration
	ReadyToTrip   func(counts Counts) bool
	OnStateChange func(name string, from State, to State)
	IsSuccessful  func(err error) bool
}

- Name is the name of the CircuitBreaker;
- MaxRequests is the maximum number of requests allowed to pass through when the CircuitBreaker is half-open (if MaxRequests is 0, only 1 request is allowed);
- Interval is the cyclic period of the Closed state, after which the CircuitBreaker clears its internal counts.
So to download it, just type:

go get github.com/sony/gobreaker
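Before wiring it into our client, here is a minimal standalone sketch of the library's usage with default settings (our demo server's URL is assumed). Execute runs the wrapped function only while the breaker allows it, and returns gobreaker.ErrOpenState without calling the function at all once the breaker is open:

package main

import (
	"fmt"
	"net/http"

	"github.com/sony/gobreaker"
)

func main() {
	// With default settings the breaker trips after more than
	// 5 consecutive failures.
	cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{Name: "demo"})

	status, err := cb.Execute(func() (interface{}, error) {
		resp, err := http.Get("http://127.0.0.1:8080/")
		if err != nil {
			return nil, err
		}
		resp.Body.Close()
		return resp.StatusCode, nil
	})
	fmt.Println(status, err)
}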
We could connect it right in the client's implementation, but that would be a bit messy. I prefer wrapping such structures with proxies. Let's create that proxy and implement the NotificationClient interface, so the proxy is a drop-in replacement for the plain client.
// circuit_breaker.go
package main

import (
	"log"
	"os"
	"time"

	"github.com/sony/gobreaker"
)

type ClientCircuitBreakerProxy struct {
	client NotificationClient
	logger *log.Logger
	gb     *gobreaker.CircuitBreaker // the library's structure
}

// shouldBeSwitchedToOpen checks if the circuit breaker should
// switch to the Open state
func shouldBeSwitchedToOpen(counts gobreaker.Counts) bool {
	failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)

	return counts.Requests >= 3 && failureRatio >= 0.6
}

func NewClientCircuitBreakerProxy(client NotificationClient) *ClientCircuitBreakerProxy {
	logger := log.New(os.Stdout, "CB\t", log.LstdFlags)

	// The circuit breaker configuration
	cfg := gobreaker.Settings{
		// When to flush counters in the Closed state
		Interval: 5 * time.Second,
		// Time to switch from Open to Half-open
		Timeout: 7 * time.Second,
		// Function that decides when to switch from Closed to Open
		ReadyToTrip: shouldBeSwitchedToOpen,
		OnStateChange: func(_ string, from gobreaker.State, to gobreaker.State) {
			// Handler for every state change; we'll use it for debugging
			logger.Println("state changed from", from.String(), "to", to.String())
		},
	}

	return &ClientCircuitBreakerProxy{
		client: client,
		logger: logger,
		gb:     gobreaker.NewCircuitBreaker(cfg),
	}
}

func (c *ClientCircuitBreakerProxy) Send() error {
	// We call the Execute method and wrap our client's call
	_, err := c.gb.Execute(func() (interface{}, error) {
		err := c.client.Send()
		return nil, err
	})

	return err
}
Let's take a look at our new proxy. Here are the two most important settings:
- ReadyToTrip defines the function that decides when the chain should be broken;
- Timeout describes how long to stay in the Open state before rechecking the API's health (i.e., switching to Half-open).
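One possible refinement, not used in this demo: gobreaker returns dedicated sentinel errors when it rejects a call without reaching the API at all, so the caller can tell "the API failed" apart from "the breaker refused to try". A sketch of caller-side handling (a fragment; it also needs the errors package imported):

err := client.Send()
switch {
case errors.Is(err, gobreaker.ErrOpenState):
	// The chain is broken: fall back to email, a cached value, a default.
case errors.Is(err, gobreaker.ErrTooManyRequests):
	// The Half-open probe limit was exceeded: treat it like the Open state.
case err != nil:
	// A genuine error from the API itself.
}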
After preparing the configuration, all we need to do is wrap the client's method. To start using the proxy, we add a few lines to the main.go file.
// main.go
package main

// ...

func main() {
	// ...

	var client NotificationClient
	// Create a common Client
	client = NewSmsClient("http://127.0.0.1:8080")
	// And then wrap it
	client = NewClientCircuitBreakerProxy(client)

	for {
		err := client.Send()
		time.Sleep(1 * time.Second)
		if err != nil {
			logger.Println("caught an error", err)
		}
	}
}
Everything is done! Let's run the code and give it a try. When the server is active it works as before.
Server 2021/09/22 22:09:32 responded with OK
Server 2021/09/22 22:09:33 responded with OK
Server 2021/09/22 22:09:34 responded with OK
But if we toggle it via the /toggle endpoint, our shouldBeSwitchedToOpen function enters the game.
Server 2021/09/22 22:11:12 responded with OK
Server 2021/09/22 22:11:12 toggled. Is enabled: false
Server 2021/09/22 22:11:13 responded with Error
Main 2021/09/22 22:11:14 caught an error bad response
Server 2021/09/22 22:11:14 responded with Error
Main 2021/09/22 22:11:15 caught an error bad response
Server 2021/09/22 22:11:15 responded with Error
Main 2021/09/22 22:11:16 caught an error bad response
Server 2021/09/22 22:11:16 responded with Error
Main 2021/09/22 22:11:17 caught an error bad response
Server 2021/09/22 22:11:17 responded with Error
CB 2021/09/22 22:11:17 state changed from closed to open
Main 2021/09/22 22:11:18 caught an error bad response
Main 2021/09/22 22:11:19 caught an error circuit breaker is open
Main 2021/09/22 22:11:20 caught an error circuit breaker is open
We configured the breaker's Timeout to 7 seconds, so it rechecks the API's health 7 seconds after opening. Let's verify that: toggle the server back to the healthy state and watch the log.
CB 2021/09/22 22:13:13 state changed from closed to open
Main 2021/09/22 22:13:14 caught an error bad response
Main 2021/09/22 22:13:15 caught an error circuit breaker is open
Server 2021/09/22 22:13:15 toggled. Is enabled: true
Main 2021/09/22 22:13:16 caught an error circuit breaker is open
Main 2021/09/22 22:13:17 caught an error circuit breaker is open
Main 2021/09/22 22:13:18 caught an error circuit breaker is open
Main 2021/09/22 22:13:19 caught an error circuit breaker is open
Main 2021/09/22 22:13:20 caught an error circuit breaker is open
CB 2021/09/22 22:13:20 state changed from open to half-open
Server 2021/09/22 22:13:20 responded with OK
And that's it! That simple.
Anything else?
Yes, there are a few tips for using this pattern. First of all, you need good monitoring: logs of all the client's requests, so you can reproduce errors, and metrics on how often state switches happen, so you can react in time.
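For example, the OnStateChange hook we already pass to gobreaker is a natural place to emit such a metric. A sketch, assuming the Prometheus client library (github.com/prometheus/client_golang), which is just one possible choice:

import "github.com/prometheus/client_golang/prometheus"

// Register once at startup: prometheus.MustRegister(breakerSwitches).
var breakerSwitches = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "circuit_breaker_state_switches_total",
		Help: "Number of circuit breaker state transitions.",
	},
	[]string{"breaker", "to"},
)

// ...and inside the gobreaker.Settings:
OnStateChange: func(name string, from, to gobreaker.State) {
	breakerSwitches.WithLabelValues(name, to.String()).Inc()
},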
The next important thing is choosing the right use cases. Some integrations call for the Retry pattern, some for simple error handling with default values; only the integrations that can slow your project down, or even kill it, deserve a Circuit Breaker.
Hope you'll find it handy. Enjoy coding! <3