In this article, we’ll explore the Circuit Breaker pattern — a crucial design pattern for building resilient microservices in Go.
Building Self-Healing and Fault-Tolerant Systems
When designing distributed systems, it’s not a question of if something will fail — it’s when.
Even the most stable services can occasionally hang, slow down, or fail under high load.
In production systems, retries and backoff strategies help, but sometimes they’re not enough.
That’s where the Circuit Breaker pattern comes in: it lets your system degrade gracefully instead of letting failures cascade.
💡 What Is a Circuit Breaker?
A Circuit Breaker acts like an electrical switch.
When a downstream service starts failing too often, the breaker “opens” — stopping further requests and giving the failing system time to recover.
It prevents resource exhaustion, retry storms, and latency chain reactions across your microservices.
Circuit Breaker States:
- Closed → Everything works fine. Requests flow normally.
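- Open → Too many failures occurred. Requests are rejected immediately, without calling the downstream service.
- Half-Open → After a cool-down period, a few trial requests are let through to check whether the service has recovered.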
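The three states and the transitions between them can be sketched as a small state machine. This is only an illustration of the list above, not the implementation used later in the article; the names `State`, `Event`, and `next` are made up for this sketch.

```go
package main

// State is one of the three breaker states.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

// String returns a readable name for the state.
func (s State) String() string {
	switch s {
	case Closed:
		return "closed"
	case Open:
		return "open"
	case HalfOpen:
		return "half-open"
	}
	return "unknown"
}

// Event is something that can move the breaker between states
// (hypothetical names chosen for this sketch).
type Event int

const (
	ThresholdReached Event = iota // too many failures while closed
	TimeoutElapsed                // cool-down period passed while open
	ProbeSucceeded                // trial request succeeded while half-open
	ProbeFailed                   // trial request failed while half-open
)

// next returns the state that follows s when ev occurs. Events that do not
// apply to the current state leave it unchanged.
func next(s State, ev Event) State {
	switch {
	case s == Closed && ev == ThresholdReached:
		return Open
	case s == Open && ev == TimeoutElapsed:
		return HalfOpen
	case s == HalfOpen && ev == ProbeSucceeded:
		return Closed
	case s == HalfOpen && ev == ProbeFailed:
		return Open
	}
	return s
}
```

Note that there is no direct path from Open back to Closed: recovery always goes through a Half-Open probe.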
⚙️ Example: Implementing a Simple Circuit Breaker in Go
Let’s simulate a production scenario: a Go service calling a flaky downstream API.
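```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// CircuitBreaker counts consecutive failures and blocks calls once the
// threshold is reached.
type CircuitBreaker struct {
	failureCount    int
	lastFailureTime time.Time
	state           string // "closed", "open", "half-open"
	mu              sync.Mutex
	threshold       int           // consecutive failures before opening
	resetTimeout    time.Duration // how long to stay open before probing
}

// NewCircuitBreaker returns a breaker that starts in the closed state.
func NewCircuitBreaker(threshold int, resetTimeout time.Duration) *CircuitBreaker {
	return &CircuitBreaker{
		threshold:    threshold,
		resetTimeout: resetTimeout,
		state:        "closed",
	}
}

// Call runs fn through the breaker. For simplicity the mutex is held while
// fn runs, so concurrent callers are serialized.
func (cb *CircuitBreaker) Call(fn func() error) error {
	cb.mu.Lock()
	defer cb.mu.Unlock()

	if cb.state == "open" {
		if time.Since(cb.lastFailureTime) <= cb.resetTimeout {
			return errors.New("circuit breaker: open (request blocked)")
		}
		// Reset timeout elapsed: let this request through as a probe.
		cb.state = "half-open"
	}

	err := fn()
	if err != nil {
		cb.failureCount++
		cb.lastFailureTime = time.Now()
		// failureCount is not reset when the breaker opens, so a failed
		// half-open probe reopens it immediately.
		if cb.failureCount >= cb.threshold {
			cb.state = "open"
			fmt.Println("⚠️ Circuit breaker opened!")
		}
		return err
	}

	// Success: reset the counter and close the breaker if it was probing.
	cb.failureCount = 0
	if cb.state == "half-open" {
		cb.state = "closed"
		fmt.Println("✅ Circuit breaker closed (service recovered)")
	}
	return nil
}
```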
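One limitation of the Call method above is that it holds its mutex while fn runs, so a slow downstream call makes every other caller wait, including callers that could have been rejected right away. A possible variant, sketched below under a few assumptions, locks only around the bookkeeping. The names `Breaker`, `allow`, `record`, and the sentinel `ErrOpen` are hypothetical, and the sketch lets every caller through in the half-open state rather than limiting the number of probes.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrOpen is a hypothetical sentinel error so callers can detect rejections.
var ErrOpen = errors.New("circuit breaker: open (request blocked)")

// Breaker is a variant of the article's CircuitBreaker that does not hold
// its lock while the protected function runs.
type Breaker struct {
	mu         sync.Mutex
	state      string // "closed", "open", "half-open"
	failures   int
	openedAt   time.Time
	threshold  int
	resetAfter time.Duration
}

func NewBreaker(threshold int, resetAfter time.Duration) *Breaker {
	return &Breaker{state: "closed", threshold: threshold, resetAfter: resetAfter}
}

// allow decides, under the lock, whether a request may proceed.
func (b *Breaker) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.state == "open" {
		if time.Since(b.openedAt) <= b.resetAfter {
			return false
		}
		b.state = "half-open" // cool-down elapsed: allow probing
	}
	return true
}

// record updates counters and state, under the lock, after fn has returned.
func (b *Breaker) record(err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.state == "open" {
		return // result of a call that started before the breaker opened
	}
	if err == nil {
		b.failures = 0
		b.state = "closed"
		return
	}
	b.failures++
	if b.state == "half-open" || b.failures >= b.threshold {
		b.state = "open"
		b.openedAt = time.Now()
	}
}

// Call runs fn without holding the mutex, so a slow fn does not block
// other callers from being admitted or rejected.
func (b *Breaker) Call(fn func() error) error {
	if !b.allow() {
		return ErrOpen
	}
	err := fn()
	b.record(err)
	return err
}
```

Returning a package-level sentinel also lets callers tell "rejected by the breaker" apart from "the downstream call failed" with `errors.Is(err, ErrOpen)`.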
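Usage Example
```go
// flakyService fails roughly half the time. The clock-parity check is a crude
// coin flip; on platforms with a coarse clock it can be heavily skewed.
func flakyService() error {
	if time.Now().UnixNano()%2 == 0 {
		return errors.New("temporary network error")
	}
	return nil
}

func main() {
	// The 2s reset timeout is shorter than the ~5s loop, so the breaker has
	// a chance to reach half-open before the demo ends.
	cb := NewCircuitBreaker(3, 2*time.Second)
	for i := 0; i < 10; i++ {
		err := cb.Call(flakyService)
		if err != nil {
			fmt.Println("Request failed:", err)
		} else {
			fmt.Println("Request succeeded!")
		}
		time.Sleep(500 * time.Millisecond)
	}
}
```
🧩 Output (simplified; one possible run, since flakyService is random):
```
Request succeeded!
Request failed: temporary network error
Request failed: temporary network error
⚠️ Circuit breaker opened!
Request failed: temporary network error
Request failed: circuit breaker: open (request blocked)
Request failed: circuit breaker: open (request blocked)
Request failed: circuit breaker: open (request blocked)
✅ Circuit breaker closed (service recovered)
Request succeeded!
```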
🔁 Integration with Retry and Backoff
Circuit breakers shine when combined with retry and backoff patterns (as covered in the previous article).
Example: you can wrap the breaker call with a retry function.
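```go
// resilientCall retries a breaker-protected call up to three times,
// doubling the delay between attempts (200ms, then 400ms).
func resilientCall(cb *CircuitBreaker, fn func() error) error {
	const maxAttempts = 3
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = cb.Call(fn); err == nil {
			return nil
		}
		if i < maxAttempts-1 { // no need to wait after the final attempt
			time.Sleep((200 * time.Millisecond) << i) // exponential backoff
		}
	}
	// Wrap the last error so callers can still inspect the cause.
	return fmt.Errorf("all retries failed: %w", err)
}
```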
This gives you layered resilience:
- Retry with exponential backoff for transient errors
- Circuit breaker for systemic failures
🧩 Production-Ready Alternatives
You don’t always need to reinvent the wheel.
Some great open-source implementations exist for Go:
- sony/gobreaker – production-proven, simple API
- afex/hystrix-go – Netflix-style circuit breaker
- go-resilience – includes retry, timeout, and breaker
Example using sony/gobreaker:
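```go
import (
	"time"

	"github.com/sony/gobreaker"
)

var cb *gobreaker.CircuitBreaker

func init() {
	st := gobreaker.Settings{
		Name:        "HTTPClient",
		MaxRequests: 5,                // requests allowed through while half-open
		Interval:    0,                // 0: counts are not cleared periodically while closed
		Timeout:     10 * time.Second, // time spent open before moving to half-open
		// Trip the breaker after more than 3 consecutive failures.
		ReadyToTrip: func(counts gobreaker.Counts) bool {
			return counts.ConsecutiveFailures > 3
		},
	}
	cb = gobreaker.NewCircuitBreaker(st)
}
```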
🔍 Observability and Metrics
To make circuit breakers production-grade, track:
- Success/failure rates
- Open/close transitions
- Latency during the half-open state
Also log every state change as a structured event so it can be monitored and alerted on.
Integration with Prometheus, Grafana, or OpenTelemetry gives you full visibility.
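As a rough starting point, the counters and transition events can be collected with a small recorder like the one below. It is an in-memory sketch using only the standard library; `BreakerMetrics` and its methods are hypothetical names, latency tracking is left out, and a real service would export the values to a metrics backend rather than keep them in a struct.

```go
package main

import (
	"log"
	"sync"
)

// BreakerMetrics is a hypothetical in-memory recorder for breaker activity.
type BreakerMetrics struct {
	mu          sync.Mutex
	successes   int
	failures    int
	transitions map[string]int // keyed by "from->to"
}

func NewBreakerMetrics() *BreakerMetrics {
	return &BreakerMetrics{transitions: make(map[string]int)}
}

// RecordCall counts the outcome of one call that went through the breaker.
func (m *BreakerMetrics) RecordCall(success bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if success {
		m.successes++
	} else {
		m.failures++
	}
}

// ObserveTransition counts a state change and logs it as key=value pairs.
func (m *BreakerMetrics) ObserveTransition(name, from, to string) {
	m.mu.Lock()
	m.transitions[from+"->"+to]++
	m.mu.Unlock()
	log.Printf("event=breaker_transition name=%s from=%s to=%s", name, from, to)
}

// FailureRate returns failures divided by total calls, or 0 when no calls
// have been recorded yet.
func (m *BreakerMetrics) FailureRate() float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	total := m.successes + m.failures
	if total == 0 {
		return 0
	}
	return float64(m.failures) / float64(total)
}
```

In the hand-rolled breaker, ObserveTransition would be called wherever cb.state changes. With sony/gobreaker, the `OnStateChange` field of `Settings` is the natural place for the same hook.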
💬 Key Takeaways
✅ Circuit breakers protect your systems from cascading failures.
✅ Combine them with retry, backoff, and rate limiting for full resilience.
✅ Use context cancellation to cleanly handle timeout or shutdown scenarios.
✅ Always monitor breaker state transitions in production.
Happy Coding! 🚀