The circuit breaker state machine
When a third-party API or downstream service fails, uncontrolled retries are dangerous. They trigger what is known as a "retry storm", which overwhelms the already unhealthy dependency and amplifies the outage.
To prevent cascading failures, Orkai includes a circuit breaker written from scratch, without external libraries. It tracks the outcome of each request and moves between three states:
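A rough worst-case calculation shows how quickly this compounds. Assume, purely for illustration, a call chain in which $n$ layers of callers each make up to $r$ attempts (the original call plus $r-1$ retries) against the layer below them, and every attempt fails. Each attempt at one layer fans out into $r$ attempts at the next, so the dependency at the bottom of the chain receives

$$
\text{calls per user request} = r^{n}
$$

With three layers that each try three times, one user request becomes $3^3 = 27$ calls against a service that is already struggling.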
- CLOSED: Normal operations. Calls flow freely. If the error threshold is reached, the breaker trips.
- OPEN: The breaker has concluded that the downstream service is unhealthy. Outbound requests are rejected instantly in memory with a fast-fail error, so nothing is sent over the network and the struggling infrastructure is protected.
- HALF-OPEN: After a configured cooldown period, the breaker lets a limited number of trial requests through. If any of them fails, it trips back to OPEN; if all of them succeed, it returns to the CLOSED state.
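The transitions above can be summarized as a pure function of the current state and what the breaker has just observed. This is a minimal sketch, not Orkai's code: it ignores timing and counting entirely, and the `Event` type and its values (`ThresholdReached`, `CooldownExpired`, `TrialFailed`, `TrialsPassed`) are hypothetical labels for the conditions described in the list.

```go
package main

import "fmt"

// State mirrors the three breaker states described above.
type State string

const (
	Closed   State = "CLOSED"
	Open     State = "OPEN"
	HalfOpen State = "HALF-OPEN"
)

// Event is a hypothetical summary of what the breaker has just observed.
type Event string

const (
	ThresholdReached Event = "threshold reached"
	CooldownExpired  Event = "cooldown expired"
	TrialFailed      Event = "trial failed"
	TrialsPassed     Event = "all trials passed"
)

// next returns the state the breaker moves to; any pair not listed keeps the current state.
func next(s State, e Event) State {
	switch {
	case s == Closed && e == ThresholdReached:
		return Open
	case s == Open && e == CooldownExpired:
		return HalfOpen
	case s == HalfOpen && e == TrialFailed:
		return Open
	case s == HalfOpen && e == TrialsPassed:
		return Closed
	}
	return s
}

func main() {
	// Walk through one full outage: trip, probe, fail, probe again, recover.
	s := Closed
	for _, e := range []Event{ThresholdReached, CooldownExpired, TrialFailed, CooldownExpired, TrialsPassed} {
		s = next(s, e)
		fmt.Printf("%-18s -> %s\n", e, s)
	}
}
```

Keeping the transition rules in one place like this makes them easy to unit-test separately from the clock and the counters.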
To stay thread-safe under heavy concurrent access, all state transitions and counters are guarded by a single mutual-exclusion lock (sync.Mutex). The lock is released while the wrapped operation runs, so one slow call does not block every other caller.
[DIAGRAM_CIRCUIT_BREAKER]
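```go
package breaker

import (
	"fmt"
	"sync"
	"time"
)

type CircuitBreakerState int
const (
	StateClosed CircuitBreakerState = iota
	StateOpen
	StateHalfOpen
)

type CircuitBreaker struct {
	mu        sync.Mutex
	state     CircuitBreakerState
	failures  int           // Consecutive failures while CLOSED
	successes int           // Passed trial calls while HALF-OPEN
	trials    int           // Trial calls admitted while HALF-OPEN
	threshold int           // Failures needed to trip; also the HALF-OPEN trial budget
	openTime  time.Time     // When the breaker last tripped
	cooldown  time.Duration // How long to stay OPEN before probing
}

func (cb *CircuitBreaker) Execute(operation func() error) error {
	cb.mu.Lock()
	if cb.state == StateOpen && time.Since(cb.openTime) > cb.cooldown {
		cb.state, cb.trials, cb.successes = StateHalfOpen, 0, 0 // Cooldown expired, transition to test mode
	}
	if cb.state == StateOpen || (cb.state == StateHalfOpen && cb.trials >= cb.threshold) {
		cb.mu.Unlock()
		return fmt.Errorf("circuit breaker is OPEN") // Fast-fail in memory, no network call
	}
	if cb.state == StateHalfOpen {
		cb.trials++ // Count this call against the limited trial budget
	}
	cb.mu.Unlock()

	err := operation() // Runs without the lock so a slow call doesn't block other callers

	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err != nil {
		cb.recordFailure()
		return err
	}
	cb.recordSuccess()
	return nil
}

// recordFailure is called with cb.mu held.
func (cb *CircuitBreaker) recordFailure() {
	cb.failures++
	if cb.state == StateHalfOpen || (cb.state == StateClosed && cb.failures >= cb.threshold) {
		cb.state, cb.openTime, cb.failures = StateOpen, time.Now(), 0 // Trip: failed trial, or threshold hit while CLOSED
	}
}

// recordSuccess is called with cb.mu held.
func (cb *CircuitBreaker) recordSuccess() {
	switch cb.state {
	case StateClosed:
		cb.failures = 0 // Only consecutive failures count toward the threshold
	case StateHalfOpen:
		cb.successes++
		if cb.successes >= cb.threshold { // Every admitted trial passed
			cb.state, cb.failures = StateClosed, 0
		}
	}
}
```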
Exponential backoff with progressive delays
For short-lived network glitches, retrying immediately rarely helps, because the fault usually has not cleared yet. Exponential backoff instead increases the delay between consecutive retries geometrically: $\text{delay} = \text{base} \times 2^{\text{attempt}}$, where attempt counts from 0.
With a 100 ms base, for example, the first retry waits 100 ms, the second 200 ms and the third 400 ms, giving the downstream server or network path room to recover.
```go
package retry

import "time"

// ExecuteWithRetry calls op up to maxRetries times in total (the first call included),
// waiting baseDelay * 2^attempt between consecutive attempts.
func ExecuteWithRetry(maxRetries int, baseDelay time.Duration, op func() error) error {
	var err error
	for attempt := 0; attempt < maxRetries; attempt++ {
		err = op()
		if err == nil {
			return nil // Operation completed successfully
		}
		if attempt == maxRetries-1 {
			break // Last attempt failed: return now instead of sleeping for nothing
		}
		// Progressive exponential delay: base * 2^attempt (a bit shift avoids float math)
		time.Sleep(baseDelay << attempt)
	}
	return err
}
```
Technical Terms Explained
- Cascading Failure: A domino effect in which the failure of one component exhausts shared resources (such as database connections or worker threads) and spreads errors across the rest of the architecture.
- Fast-Fail: A design pattern that rejects an operation immediately when the target dependency is known to be unavailable, instead of tying up resources while the call waits to time out.
- Exponential Backoff: A retry strategy in which the delay before each retry grows geometrically (doubling, in the formula above), giving the target service time to recover.