Your service calls an external API. It goes down. Your threads pile up waiting for timeouts.
Your whole app dies. The Circuit Breaker pattern exists to prevent exactly this.
What We're Building
A production-grade Circuit Breaker with three states, configurable thresholds, and zero unsafe code.
                            failures >= threshold
┌─────────────────────────┐                         ┌─────────────────────┐
│         CLOSED          │ ──────────────────────▶ │        OPEN         │
│ (requests pass through) │                         │ (requests rejected) │
└─────────────────────────┘                         └─────────────────────┘
             ▲                                            │         ▲
             │ call succeeds              timeout expires │         │ call fails
             │                                            ▼         │
             │                                      ┌─────────────────────┐
             └──────────────────────────────────────┤      HALF-OPEN      │
                                                    │  (one probe call)   │
                                                    └─────────────────────┘
Three states:
- Closed — normal operation, requests flow through, failures are counted
- Open — service is assumed down, requests are rejected immediately (fail fast)
- Half-Open — after a timeout, one probe request goes through to check recovery
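The three states and the arrows in the diagram can be read as a small transition table. The sketch below is only an illustration: the `Event` names are hypothetical labels for the diagram's arrows, and the enum omits the payload that the real state type carries later in the article.

```rust
/// The three states from the list above (payloads omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical event names, introduced only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    FailureThresholdReached,
    RecoveryTimeoutElapsed,
    ProbeSucceeded,
    ProbeFailed,
}

/// Returns the next state; any pair not listed leaves the state unchanged.
pub fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        (State::Closed, Event::FailureThresholdReached) => State::Open,
        (State::Open, Event::RecoveryTimeoutElapsed) => State::HalfOpen,
        (State::HalfOpen, Event::ProbeSucceeded) => State::Closed,
        (State::HalfOpen, Event::ProbeFailed) => State::Open,
        (unchanged, _) => unchanged,
    }
}
```

Keeping the table as a pure function makes each arrow easy to check in isolation, separate from locking and timing concerns.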
Step 1 — Model the State Machine
Start with the types. In Rust, states map naturally to an enum.
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}
Open carries an Instant so we know when to transition to Half-Open. No booleans, no stringly-typed states.
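To illustrate why carrying the timestamp inside `Open` is useful, here is a minimal sketch of a helper that reports how long an open circuit still has to wait before probing. `time_until_probe` is a hypothetical name and not part of the article's API. It takes `now` as a parameter so the result is deterministic.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

/// Hypothetical helper: time left before an Open circuit may probe again.
/// Returns None when the circuit is not Open.
pub fn time_until_probe(
    state: &CircuitState,
    recovery_timeout: Duration,
    now: Instant,
) -> Option<Duration> {
    match state {
        CircuitState::Open { opened_at } => {
            // Both operations saturate at zero instead of panicking.
            let elapsed = now.saturating_duration_since(*opened_at);
            Some(recovery_timeout.saturating_sub(elapsed))
        }
        _ => None,
    }
}
```

A value like this could feed a `Retry-After` style hint to callers, though whether to expose it is a design choice outside the scope of this article.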
Step 2 — Define the Configuration
Separate config from runtime state — single responsibility, easy to test.
#[derive(Debug)]
pub struct CircuitBreakerConfig {
    /// How many consecutive failures before opening the circuit
    pub failure_threshold: u32,
    /// How long to wait in Open before probing again
    pub recovery_timeout: Duration,
    /// How many consecutive successes in Half-Open to close again
    pub success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            recovery_timeout: Duration::from_secs(30),
            success_threshold: 2,
        }
    }
}
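Because the config is a plain struct, it can be checked before a breaker is built from it. The sketch below adds a hypothetical `validate` method, which is not part of the original struct. Treating zero values as errors is an assumption made for illustration, and other policies are reasonable.

```rust
use std::time::Duration;

#[derive(Debug)]
pub struct CircuitBreakerConfig {
    pub failure_threshold: u32,
    pub recovery_timeout: Duration,
    pub success_threshold: u32,
}

impl CircuitBreakerConfig {
    /// Hypothetical sanity check; rejects values that make little sense.
    pub fn validate(&self) -> Result<(), String> {
        if self.failure_threshold == 0 {
            return Err("failure_threshold must be at least 1".to_string());
        }
        if self.success_threshold == 0 {
            return Err("success_threshold must be at least 1".to_string());
        }
        if self.recovery_timeout.is_zero() {
            return Err("recovery_timeout must be greater than zero".to_string());
        }
        Ok(())
    }
}
```

Calling a check like this once at startup would surface a bad per-service override early instead of during an outage.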
Step 3 — Build the Circuit Breaker
use std::sync::{Arc, Mutex};

#[derive(Debug)]
pub struct CircuitBreaker {
    config: CircuitBreakerConfig,
    state: Mutex<CircuitState>,
    failure_count: Mutex<u32>,
    success_count: Mutex<u32>,
}

impl CircuitBreaker {
    pub fn new(config: CircuitBreakerConfig) -> Arc<Self> {
        Arc::new(Self {
            config,
            state: Mutex::new(CircuitState::Closed),
            failure_count: Mutex::new(0),
            success_count: Mutex::new(0),
        })
    }
}
We wrap it in Arc<Self> immediately — a Circuit Breaker is always shared across threads.
Step 4 — The Core: call()
This is where the state machine lives.
#[derive(Debug, thiserror::Error)]
pub enum CircuitError<E> {
    #[error("Circuit is open — service unavailable")]
    Open,
    #[error("Service call failed: {0}")]
    Inner(E),
}
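impl CircuitBreaker {
    pub fn call<F, T, E>(&self, f: F) -> Result<T, CircuitError<E>>
    where
        F: FnOnce() -> Result<T, E>,
    {
        // 1. Check if we should allow the request.
        //    check_state() returns CircuitError<()>, which `?` cannot convert
        //    into CircuitError<E>, so map the rejection explicitly.
        if self.check_state().is_err() {
            return Err(CircuitError::Open);
        }
        // 2. Execute the call
        match f() {
            Ok(value) => {
                self.on_success();
                Ok(value)
            }
            Err(e) => {
                self.on_failure();
                Err(CircuitError::Inner(e))
            }
        }
    }
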
    fn check_state(&self) -> Result<(), CircuitError<()>> {
        let mut state = self.state.lock().unwrap();
        match &*state {
            CircuitState::Closed => Ok(()),
            CircuitState::Open { opened_at } => {
                // Has the recovery timeout elapsed?
                if opened_at.elapsed() >= self.config.recovery_timeout {
                    *state = CircuitState::HalfOpen;
                    *self.success_count.lock().unwrap() = 0;
                    Ok(()) // allow the probe request
                } else {
                    Err(CircuitError::Open)
                }
            }
            // Every caller passes while Half-Open: this version does not
            // limit the number of concurrent probes.
            CircuitState::HalfOpen => Ok(()),
        }
    }

    fn on_success(&self) {
        // Hold the state lock for the whole transition. Locks are always
        // taken in the order state -> counters (as in check_state), which
        // avoids a lock-order deadlock between concurrent callers.
        let mut state = self.state.lock().unwrap();
        match &*state {
            CircuitState::HalfOpen => {
                let mut successes = self.success_count.lock().unwrap();
                *successes += 1;
                if *successes >= self.config.success_threshold {
                    *state = CircuitState::Closed;
                    *self.failure_count.lock().unwrap() = 0;
                    *successes = 0;
                    tracing::info!("Circuit breaker closed — service recovered");
                }
            }
            CircuitState::Closed => {
                // Reset failure count on success
                *self.failure_count.lock().unwrap() = 0;
            }
            // A call admitted earlier can finish after the circuit re-opened
            CircuitState::Open { .. } => {}
        }
    }

    fn on_failure(&self) {
        // Same lock order as on_success: state first, then counters.
        let mut state = self.state.lock().unwrap();
        match &*state {
            CircuitState::Closed => {
                let mut failures = self.failure_count.lock().unwrap();
                *failures += 1;
                if *failures >= self.config.failure_threshold {
                    *state = CircuitState::Open {
                        opened_at: Instant::now(),
                    };
                    tracing::warn!(
                        failures = *failures,
                        "Circuit breaker opened — too many failures"
                    );
                }
            }
            CircuitState::HalfOpen => {
                // Probe failed: back to Open
                *state = CircuitState::Open {
                    opened_at: Instant::now(),
                };
                tracing::warn!("Circuit breaker re-opened — probe failed");
            }
            CircuitState::Open { .. } => {}
        }
    }
}
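The breaker above spreads its runtime state over three mutexes, so the order in which they are locked matters. One alternative, sketched here under the assumption that a single lock is acceptable for the workload, keeps the state and both counters behind one `Mutex`. `SingleLockBreaker`, `allow`, and `record` are hypothetical names used only for this sketch. Logging and the error type are left out to keep it self-contained.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

// State and both counters sit behind one lock, so a transition and the
// counter updates that belong to it are always observed together.
#[derive(Debug)]
struct Inner {
    state: State,
    failures: u32,
    successes: u32,
}

pub struct SingleLockBreaker {
    failure_threshold: u32,
    success_threshold: u32,
    recovery_timeout: Duration,
    inner: Mutex<Inner>,
}

impl SingleLockBreaker {
    pub fn new(failure_threshold: u32, success_threshold: u32, recovery_timeout: Duration) -> Self {
        Self {
            failure_threshold,
            success_threshold,
            recovery_timeout,
            inner: Mutex::new(Inner { state: State::Closed, failures: 0, successes: 0 }),
        }
    }

    /// Returns true if a request may proceed right now.
    pub fn allow(&self) -> bool {
        let mut inner = self.inner.lock().unwrap();
        let state = inner.state; // State is Copy
        match state {
            State::Closed | State::HalfOpen => true,
            State::Open { opened_at } => {
                if opened_at.elapsed() >= self.recovery_timeout {
                    inner.state = State::HalfOpen;
                    inner.successes = 0;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// Records the outcome of a call that was previously allowed.
    pub fn record(&self, success: bool) {
        let mut inner = self.inner.lock().unwrap();
        let state = inner.state;
        match (state, success) {
            (State::Closed, true) => inner.failures = 0,
            (State::Closed, false) => {
                inner.failures += 1;
                if inner.failures >= self.failure_threshold {
                    inner.state = State::Open { opened_at: Instant::now() };
                }
            }
            (State::HalfOpen, true) => {
                inner.successes += 1;
                if inner.successes >= self.success_threshold {
                    inner.state = State::Closed;
                    inner.failures = 0;
                }
            }
            (State::HalfOpen, false) => {
                inner.state = State::Open { opened_at: Instant::now() };
            }
            // Late result from a call admitted before the circuit opened.
            (State::Open { .. }, _) => {}
        }
    }

    pub fn is_open(&self) -> bool {
        matches!(self.inner.lock().unwrap().state, State::Open { .. })
    }
}
```

The trade-off is one point of contention instead of three, which is usually fine because each critical section is only a few instructions long.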
Step 5 — Async Support
Real services use async. Add an async variant next to the sync one; it reuses the same check_state, on_success, and on_failure helpers:
use std::future::Future;

impl CircuitBreaker {
    pub async fn call_async<F, Fut, T, E>(&self, f: F) -> Result<T, CircuitError<E>>
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = Result<T, E>>,
    {
        // No lock is held across the .await below: check_state() releases
        // its guard before returning.
        self.check_state().map_err(|_| CircuitError::Open)?;
        match f().await {
            Ok(value) => {
                self.on_success();
                Ok(value)
            }
            Err(e) => {
                self.on_failure();
                Err(CircuitError::Inner(e))
            }
        }
    }
}
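One property of the async variant worth knowing: if the future returned by call_async is dropped before it completes (for example by a surrounding timeout), neither on_success nor on_failure runs, so the breaker never sees that call. A possible way to account for this, sketched below with hypothetical names, is a guard that records a failure when dropped without a reported outcome. `OutcomeCounter` is a stand-in for the breaker, and counting a cancelled call as a failure is an assumption, not something the article prescribes.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Stand-in for the breaker; it only counts outcomes in this sketch.
#[derive(Default)]
pub struct OutcomeCounter {
    pub successes: AtomicU32,
    pub failures: AtomicU32,
}

/// Hypothetical guard: records a failure on drop unless an outcome
/// was reported first.
pub struct CallGuard<'a> {
    counter: &'a OutcomeCounter,
    reported: bool,
}

impl<'a> CallGuard<'a> {
    pub fn new(counter: &'a OutcomeCounter) -> Self {
        Self { counter, reported: false }
    }

    pub fn success(mut self) {
        self.reported = true;
        self.counter.successes.fetch_add(1, Ordering::Relaxed);
    }

    pub fn failure(mut self) {
        self.reported = true;
        self.counter.failures.fetch_add(1, Ordering::Relaxed);
    }
}

impl Drop for CallGuard<'_> {
    fn drop(&mut self) {
        if !self.reported {
            // Dropped mid-call, e.g. the surrounding future was cancelled.
            self.counter.failures.fetch_add(1, Ordering::Relaxed);
        }
    }
}
```

In an async method the guard would be created before the `.await` and resolved with `success()` or `failure()` afterwards. If the future is cancelled in between, the `Drop` impl takes over.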
Step 6 — Wire It Up
A real example — wrapping an HTTP client call:
use std::sync::Arc;

// ChargeResponse and AppError are application types defined elsewhere.
#[derive(Clone)]
pub struct PaymentGatewayClient {
    http: reqwest::Client,
    breaker: Arc<CircuitBreaker>,
    base_url: String,
}

impl PaymentGatewayClient {
    pub fn new(base_url: String) -> Self {
        let config = CircuitBreakerConfig {
            failure_threshold: 3,
            recovery_timeout: Duration::from_secs(60),
            success_threshold: 1,
        };
        Self {
            http: reqwest::Client::new(),
            breaker: CircuitBreaker::new(config),
            base_url,
        }
    }

    pub async fn charge(&self, amount: u64, token: &str) -> Result<ChargeResponse, AppError> {
        self.breaker
            .call_async(|| async {
                self.http
                    .post(format!("{}/charge", self.base_url))
                    .json(&serde_json::json!({ "amount": amount, "token": token }))
                    .send()
                    .await?
                    // send() succeeds on 4xx/5xx responses; turn them into
                    // errors so the breaker counts them as failures
                    .error_for_status()?
                    .json::<ChargeResponse>()
                    .await
            })
            .await
            .map_err(|e| match e {
                CircuitError::Open => AppError::ServiceUnavailable("payment gateway"),
                CircuitError::Inner(e) => AppError::HttpError(e),
            })
    }
}
Callers never deal with circuit breaker logic — it's fully encapsulated. ✅
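In the wiring above, every `Err` from the closure counts toward the failure threshold. Depending on the API, that may include errors caused by the request itself rather than by the service. The following sketch shows one way such a distinction could be drawn from an HTTP status code. `Outcome`, `classify_status`, and `should_trip` are hypothetical names, and the choice to treat 408 and 429 as service failures is an assumption for illustration.

```rust
/// Hypothetical classification of an HTTP outcome for breaker purposes.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success,
    ClientError,
    ServiceFailure,
}

pub fn classify_status(status: u16) -> Outcome {
    match status {
        100..=399 => Outcome::Success,
        // Timeouts and rate limiting suggest the service is struggling.
        408 | 429 => Outcome::ServiceFailure,
        // Other 4xx codes point at the request, not at service health.
        400..=499 => Outcome::ClientError,
        // 5xx and anything out of range counts against the service.
        _ => Outcome::ServiceFailure,
    }
}

/// Only service failures should move the breaker toward Open.
pub fn should_trip(outcome: &Outcome) -> bool {
    matches!(outcome, Outcome::ServiceFailure)
}
```

With a filter like this, a burst of invalid card tokens would not open the circuit for every other customer.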
Step 7 — Observability
A Circuit Breaker is useless if you can't see it. Expose state via metrics:
impl CircuitBreaker {
    pub fn state_label(&self) -> &'static str {
        match *self.state.lock().unwrap() {
            CircuitState::Closed => "closed",
            CircuitState::Open { .. } => "open",
            CircuitState::HalfOpen => "half_open",
        }
    }
}

// In your metrics handler (e.g. Prometheus via metrics crate)
fn record_circuit_state(name: &str, breaker: &CircuitBreaker) {
    let state = match breaker.state_label() {
        "closed" => 0.0,
        "half_open" => 0.5,
        "open" => 1.0,
        _ => -1.0,
    };
    // Value-as-argument form from metrics releases before 0.22;
    // newer releases use gauge!(...).set(state) instead.
    metrics::gauge!("circuit_breaker_state", state, "name" => name.to_string());
}
Alert when circuit_breaker_state == 1.0 for more than 2 minutes. That's your on-call trigger.
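The metrics handler above maps states to numbers by matching on label strings, which needs a catch-all arm for strings that can never occur. A small variation, sketched here with a hypothetical `StateKind` enum that mirrors the three states without payloads, derives both the label and the gauge value from the enum so the compiler checks exhaustiveness.

```rust
/// Hypothetical payload-free mirror of the breaker state, for metrics only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StateKind {
    Closed,
    HalfOpen,
    Open,
}

impl StateKind {
    pub fn label(self) -> &'static str {
        match self {
            StateKind::Closed => "closed",
            StateKind::HalfOpen => "half_open",
            StateKind::Open => "open",
        }
    }

    /// Same numeric encoding as the gauge above: 0.0, 0.5, 1.0.
    pub fn gauge_value(self) -> f64 {
        match self {
            StateKind::Closed => 0.0,
            StateKind::HalfOpen => 0.5,
            StateKind::Open => 1.0,
        }
    }
}
```

Adding a fourth state later would then fail to compile until both mappings are updated.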
Testing the State Transitions
#[cfg(test)]
mod tests {
    use super::*;

    fn failing_call() -> Result<(), &'static str> {
        Err("timeout")
    }

    fn succeeding_call() -> Result<&'static str, ()> {
        Ok("ok")
    }

    #[test]
    fn opens_after_threshold_failures() {
        let cb = CircuitBreaker::new(CircuitBreakerConfig {
            failure_threshold: 3,
            ..Default::default()
        });
        for _ in 0..3 {
            let _ = cb.call(|| failing_call());
        }
        // Next call should be rejected immediately
        let result = cb.call(|| succeeding_call());
        assert!(matches!(result, Err(CircuitError::Open)));
    }

    #[test]
    fn recovers_after_timeout() {
        let cb = CircuitBreaker::new(CircuitBreakerConfig {
            failure_threshold: 1,
            recovery_timeout: Duration::from_millis(10), // short for tests
            success_threshold: 1,
        });
        let _ = cb.call(|| failing_call());
        assert_eq!(cb.state_label(), "open");
        std::thread::sleep(Duration::from_millis(20));
        // Probe call succeeds → back to Closed
        let _ = cb.call(|| succeeding_call());
        assert_eq!(cb.state_label(), "closed");
    }
}
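The recovery test above relies on a real `sleep`, which works but ties the test to wall-clock time. A common alternative is to inject the clock. The sketch below is an assumption about how that could look and is not part of the breaker as written: `Clock`, `SystemClock`, `FakeClock`, and `recovery_due` are hypothetical names.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical abstraction over "what time is it now".
pub trait Clock {
    fn now(&self) -> Instant;
}

/// Production clock: defers to the OS.
pub struct SystemClock;

impl Clock for SystemClock {
    fn now(&self) -> Instant {
        Instant::now()
    }
}

/// Test clock: only moves when told to.
pub struct FakeClock {
    current: Mutex<Instant>,
}

impl FakeClock {
    pub fn new() -> Self {
        Self { current: Mutex::new(Instant::now()) }
    }

    pub fn advance(&self, by: Duration) {
        *self.current.lock().unwrap() += by;
    }
}

impl Clock for FakeClock {
    fn now(&self) -> Instant {
        *self.current.lock().unwrap()
    }
}

/// The Open -> Half-Open check, written against the injected clock.
pub fn recovery_due(clock: &dyn Clock, opened_at: Instant, recovery_timeout: Duration) -> bool {
    clock.now().saturating_duration_since(opened_at) >= recovery_timeout
}
```

A test can then call `advance(Duration::from_secs(31))` and assert the transition immediately, with no sleeping and no timing flakiness.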
The Full Picture
┌──────────────────────────────────────┐
│             Your Service             │
└──────────────┬───────────────────────┘
               │ call()
┌──────────────▼───────────────────────┐
│           Circuit Breaker            │
│  ┌────────┐ ┌──────┐ ┌───────────┐   │
│  │ Closed │→│ Open │→│ Half-Open │   │
│  └────────┘ └──────┘ └───────────┘   │
└──────────────┬───────────────────────┘
               │ (if Closed or Half-Open)
┌──────────────▼───────────────────────┐
│           External Service           │
│       (Payment API, DB, etc.)        │
└──────────────────────────────────────┘
Key Takeaways
- Fail fast — don't waste threads waiting for a dead service
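- State is an enum, so invalid states cannot be represented and every match has to handle all three (the transitions themselves are still enforced by the code, not the types)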
- Config is separate — easy to tune per-service without changing logic
- Observability first — a silent Circuit Breaker is a dangerous one
- Wrap at the boundary — one Circuit Breaker per external dependency, not per call site
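The last takeaway, one breaker per external dependency, can be sketched as a small registry keyed by dependency name. Everything here is hypothetical: `Breaker` is a placeholder for the CircuitBreaker type from this article, and `BreakerRegistry` and `get_or_create` are names invented for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Placeholder for the CircuitBreaker type built in this article.
#[derive(Debug)]
pub struct Breaker {
    pub name: String,
}

/// Hypothetical registry: hands out one shared breaker per dependency.
#[derive(Default)]
pub struct BreakerRegistry {
    breakers: Mutex<HashMap<String, Arc<Breaker>>>,
}

impl BreakerRegistry {
    /// Returns the existing breaker for `dependency`, creating it on first use.
    pub fn get_or_create(&self, dependency: &str) -> Arc<Breaker> {
        let mut map = self.breakers.lock().unwrap();
        map.entry(dependency.to_string())
            .or_insert_with(|| Arc::new(Breaker { name: dependency.to_string() }))
            .clone()
    }
}
```

Every call site that talks to the same dependency then shares failure counts, which is what lets the circuit open for all of them at once.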
What's Next?
- Add retry with backoff inside the Closed state before counting as a failure
- Combine with Bulkhead (separate thread pools per dependency)
- Use tokio::sync::RwLock instead of Mutex for better async throughput
- Persist state to Redis for multi-instance deployments
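As a starting point for the first item, here is a minimal sketch of retry with exponential backoff. `backoff_delay` and `retry_with_backoff` are hypothetical helpers. The sleep function is passed in so the example stays synchronous and testable, and jitter is left out for brevity.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_shl returns None once the shift would overflow u32.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(max)
}

/// Runs `op` until it succeeds or `max_attempts` is reached.
/// Always makes at least one attempt. Returns the last error on exhaustion.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

Wrapped inside the closure passed to the breaker, a helper like this would make the breaker see one failure per exhausted retry sequence rather than one per attempt.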
Part of the Architecture Patterns series.