Your service calls an external API. It goes down. Your threads pile up waiting for timeouts.
Your whole app dies. The Circuit Breaker pattern exists to prevent exactly this.
What We're Building
A production-grade Circuit Breaker with three states, configurable thresholds, and zero unsafe code.
                           call succeeds
             ┌───────────────────────────────────────┐
             ▼                                       │
┌──────────────────────────┐           ┌─────────────┴──────────┐
│          CLOSED          │           │       HALF-OPEN        │
│ (requests pass through)  │           │    (one probe call)    │
└────────────┬─────────────┘           └──────┬─────────────────┘
             │                                │          ▲
             │ failures >= threshold          │ call     │ timeout
             │                                │ fails    │ expires
             │                                ▼          │
             │                         ┌─────────────────┴──────┐
             └────────────────────────►│          OPEN          │
                                       │  (requests rejected)   │
                                       └────────────────────────┘
Three states:
- Closed — normal operation, requests flow through, failures are counted
- Open — service is assumed down, requests are rejected immediately (fail fast)
- Half-Open — after a timeout, one probe request goes through to check recovery
Step 1 — Model the State Machine
Start with the types. In Rust, states map naturally to an enum.
use std::time::{Duration, Instant};
#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
Closed,
Open { opened_at: Instant },
HalfOpen,
}
Open carries an Instant so we know when to transition to Half-Open. No booleans, no stringly-typed states.
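To make the role of opened_at concrete, here is a minimal sketch of the timeout check written as a free function. The name should_probe is hypothetical (the article performs this check inside check_state later on), and the enum is repeated only so the snippet compiles on its own.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

/// Hypothetical helper: true when an Open circuit has waited at least
/// `recovery_timeout` and may move to Half-Open.
pub fn should_probe(state: &CircuitState, recovery_timeout: Duration) -> bool {
    match state {
        // The timestamp is only reachable in the Open arm, so it cannot be
        // read in a state where it has no meaning.
        CircuitState::Open { opened_at } => opened_at.elapsed() >= recovery_timeout,
        CircuitState::Closed | CircuitState::HalfOpen => false,
    }
}
```

Because the timestamp lives inside the variant, there is no separate Option<Instant> field that could drift out of sync with the state.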
Step 2 — Define the Configuration
Separate config from runtime state — single responsibility, easy to test.
#[derive(Debug)]
pub struct CircuitBreakerConfig {
/// How many consecutive failures before opening the circuit
pub failure_threshold: u32,
/// How long to wait in Open before probing again
pub recovery_timeout: Duration,
/// How many consecutive successes in Half-Open to close again
pub success_threshold: u32,
}
impl Default for CircuitBreakerConfig {
fn default() -> Self {
Self {
failure_threshold: 5,
recovery_timeout: Duration::from_secs(30),
success_threshold: 2,
}
}
}
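A short sketch of how the Default impl pays off when tuning a single field. The function aggressive_config and the validate method are hypothetical additions, not part of the article's code; the struct is repeated so the snippet stands alone.

```rust
use std::time::Duration;

#[derive(Debug)]
pub struct CircuitBreakerConfig {
    pub failure_threshold: u32,
    pub recovery_timeout: Duration,
    pub success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            recovery_timeout: Duration::from_secs(30),
            success_threshold: 2,
        }
    }
}

/// Hypothetical preset: override one field, inherit the rest from Default.
pub fn aggressive_config() -> CircuitBreakerConfig {
    CircuitBreakerConfig {
        failure_threshold: 2,
        ..Default::default()
    }
}

impl CircuitBreakerConfig {
    /// Hypothetical sanity check. With the article's `>=` comparisons a
    /// threshold of 0 behaves like 1, which is probably not what was meant.
    pub fn validate(&self) -> Result<(), String> {
        if self.failure_threshold == 0 {
            return Err("failure_threshold must be at least 1".to_string());
        }
        if self.success_threshold == 0 {
            return Err("success_threshold must be at least 1".to_string());
        }
        Ok(())
    }
}
```

Keeping validation on the config type means the breaker itself can assume sane values.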
Step 3 — Build the Circuit Breaker
use std::sync::{Arc, Mutex};
#[derive(Debug)]
pub struct CircuitBreaker {
config: CircuitBreakerConfig,
state: Mutex<CircuitState>,
failure_count: Mutex<u32>,
success_count: Mutex<u32>,
}
impl CircuitBreaker {
pub fn new(config: CircuitBreakerConfig) -> Arc<Self> {
Arc::new(Self {
config,
state: Mutex::new(CircuitState::Closed),
failure_count: Mutex::new(0),
success_count: Mutex::new(0),
})
}
}
We wrap it in Arc<Self> immediately — a Circuit Breaker is always shared across threads.
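The sharing pattern can be sketched with a stripped-down stand-in. SharedCounter and hammer are hypothetical names used only for illustration; the point is that each thread gets its own Arc clone while all of them touch the same instance.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for CircuitBreaker: one piece of state behind a Mutex.
pub struct SharedCounter {
    failures: Mutex<u32>,
}

impl SharedCounter {
    /// Mirrors the article's constructor by returning Arc<Self> directly.
    pub fn new() -> Arc<Self> {
        Arc::new(Self { failures: Mutex::new(0) })
    }

    pub fn record_failure(&self) {
        *self.failures.lock().unwrap() += 1;
    }

    pub fn failures(&self) -> u32 {
        *self.failures.lock().unwrap()
    }
}

/// Spawns `workers` threads that each record `per_worker` failures on the
/// same instance, then returns the final count.
pub fn hammer(workers: usize, per_worker: u32) -> u32 {
    let shared = SharedCounter::new();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc copies a pointer and bumps a reference count;
            // the counter itself is not duplicated.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    shared.record_failure();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    shared.failures()
}
```

Calling hammer(4, 100) yields 400, since the Mutex serializes every increment.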
Aside: Interior Mutability and Choosing the Right Primitive
CircuitBreaker is shared between threads via Arc<Self>. But Arc only grants immutable access to its contents — that is its safety contract. To still modify internal state, we use interior mutability: wrappers that shift the "only one writer at a time" rule from the compiler to runtime.
Compile time                      Runtime
Borrow rule: &mut T exclusive  →  Mutex<T>   (blocks if already locked)
                               →  RwLock<T>  (N readers OR 1 writer)
                               →  Atomic*    (CPU-level atomic ops)
Why Mutex<u32> and not AtomicU32?
Atomics (AtomicU32, AtomicBool) would be faster for simple counters. But here, failure_count and state must change together: opening the circuit means reading failure_count and then writing Open { opened_at } into state. With separate atomics, another thread could observe or modify one value between those two operations. A lock held across both steps makes the pair atomic from every other thread's perspective. That guarantee only holds while the lock covers both steps; with a separate Mutex per field, as in the struct above, the locks must be taken in a consistent order and held for the whole transition.
Rule of thumb: atomics for independent counters, Mutex whenever you need consistency across multiple fields.
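One way to get multi-field consistency by construction is to put every field that changes together behind a single lock. The sketch below is an alternative layout, not the article's struct: Inner and SingleLockBreaker are hypothetical names, and only the failure path is shown.

```rust
use std::sync::Mutex;
use std::time::Instant;

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

/// Hypothetical grouping: everything that must change together.
struct Inner {
    state: CircuitState,
    failure_count: u32,
}

pub struct SingleLockBreaker {
    failure_threshold: u32,
    inner: Mutex<Inner>,
}

impl SingleLockBreaker {
    pub fn new(failure_threshold: u32) -> Self {
        Self {
            failure_threshold,
            inner: Mutex::new(Inner { state: CircuitState::Closed, failure_count: 0 }),
        }
    }

    /// Counts a failure and, if the threshold is reached, opens the circuit.
    /// Both steps happen under one lock, so no thread sees a half-done update.
    pub fn record_failure(&self) {
        let mut inner = self.inner.lock().unwrap();
        if inner.state != CircuitState::Closed {
            return; // failures are only counted while Closed
        }
        inner.failure_count += 1;
        if inner.failure_count >= self.failure_threshold {
            inner.state = CircuitState::Open { opened_at: Instant::now() };
        }
    }

    pub fn is_open(&self) -> bool {
        let inner = self.inner.lock().unwrap();
        matches!(inner.state, CircuitState::Open { .. })
    }

    pub fn failure_count(&self) -> u32 {
        self.inner.lock().unwrap().failure_count
    }
}
```

With one lock there is no lock ordering to get wrong; the trade-off is slightly coarser contention, which rarely matters for a handful of integers.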
Step 4 — Define Errors with thiserror
Before defining our errors, add the thiserror crate to Cargo.toml:
[dependencies]
thiserror = "1"
thiserror automatically generates implementations of the std::fmt::Display trait (which defines how the error prints) and the std::error::Error trait (which enables interoperability with the rest of the Rust ecosystem). Without it, you would write this by hand:
// What thiserror generates for you — boilerplate you no longer have to maintain:
impl<E: std::fmt::Display> std::fmt::Display for CircuitError<E> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
CircuitError::Open => write!(f, "Circuit is open — service unavailable"),
CircuitError::Inner(e) => write!(f, "Service call failed: {}", e),
}
}
}
impl<E: std::error::Error + 'static> std::error::Error for CircuitError<E> {
fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
match self {
CircuitError::Inner(e) => Some(e),
_ => None,
}
}
}
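With thiserror, that collapses to the enum below. One difference from the hand-written version: thiserror implements source() only for a field marked #[source] or #[from] (or a field named source), so as written Inner reports no source.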
#[derive(Debug, thiserror::Error)]
pub enum CircuitError<E> {
#[error("Circuit is open — service unavailable")]
Open,
#[error("Service call failed: {0}")]
Inner(E),
}
thiserror is the ecosystem convention for libraries. Its counterpart anyhow is preferred in application binaries where you just want to propagate errors without modeling them precisely.
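What implementing std::error::Error buys the caller can be sketched without the crate. The enum and impls below are written by hand so the snippet compiles with the standard library only; load and describe are hypothetical helpers, and the message strings are simplified.

```rust
use std::error::Error;
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum CircuitError<E> {
    Open,
    Inner(E),
}

impl<E: fmt::Display> fmt::Display for CircuitError<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CircuitError::Open => write!(f, "circuit is open"),
            CircuitError::Inner(e) => write!(f, "service call failed: {}", e),
        }
    }
}

impl<E: Error + 'static> Error for CircuitError<E> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            CircuitError::Inner(e) => Some(e),
            CircuitError::Open => None,
        }
    }
}

/// Hypothetical caller: `?` converts CircuitError into Box<dyn Error>
/// because the enum implements the Error trait.
pub fn load(fail: bool) -> Result<u32, Box<dyn Error>> {
    let result: Result<u32, CircuitError<io::Error>> = if fail {
        Err(CircuitError::Inner(io::Error::new(
            io::ErrorKind::TimedOut,
            "gateway timed out",
        )))
    } else {
        Ok(42)
    };
    Ok(result?)
}

/// Hypothetical reporter: prints the error and, if present, its cause.
pub fn describe(err: &(dyn Error + 'static)) -> String {
    match err.source() {
        Some(cause) => format!("{} (caused by: {})", err, cause),
        None => err.to_string(),
    }
}
```

Generic error reporters work the same way as describe: they only need the Error trait, not knowledge of CircuitError itself.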
Step 5 — The Core: call()
This is where the state machine lives. Before introducing the final generic signature, let's start with a naive version to understand the structure:
// Naive version — only works with closures that return String
fn call_naive(&self, f: impl FnOnce() -> Result<String, String>) -> Result<String, String> {
self.check_state().map_err(|_| "Circuit open".to_string())?;
f()
}
This is too restrictive: it only works with String. We want a circuit breaker that is agnostic to the return type. We introduce type parameters T (success value) and E (error type):
impl CircuitBreaker {
pub fn call<F, T, E>(&self, f: F) -> Result<T, CircuitError<E>>
where
F: FnOnce() -> Result<T, E>,
// ^^^^^^^
// FnOnce, not Fn: the closure may consume resources
// (a connection, a buffer) that can only be used once.
// Fn would require the closure to be callable repeatedly, which
// rules out closures that move a captured value out of their environment.
{
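// 1. Check if we should allow the request.
//    check_state returns CircuitError<()>, so map it to this call's E;
//    a bare `?` would need a From conversion that does not exist.
self.check_state().map_err(|_| CircuitError::Open)?;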
// 2. Execute the call
match f() {
Ok(value) => {
self.on_success();
Ok(value)
}
Err(e) => {
self.on_failure();
Err(CircuitError::Inner(e))
}
}
}
fn check_state(&self) -> Result<(), CircuitError<()>> {
let mut state = self.state.lock().unwrap();
match &*state {
CircuitState::Closed => Ok(()),
CircuitState::Open { opened_at } => {
// Has the recovery timeout elapsed?
if opened_at.elapsed() >= self.config.recovery_timeout {
*state = CircuitState::HalfOpen;
*self.success_count.lock().unwrap() = 0;
Ok(()) // allow the probe request
} else {
Err(CircuitError::Open)
}
}
CircuitState::HalfOpen => Ok(()), // let probes through (note: this admits every caller, not just one)
}
}
fn on_success(&self) {
// Take the state lock first and hold it for the whole transition. Locking
// in the order state -> counters everywhere (check_state does the same)
// avoids a lock-order deadlock between concurrent callers.
let mut state = self.state.lock().unwrap();
match &*state {
CircuitState::HalfOpen => {
let mut successes = self.success_count.lock().unwrap();
*successes += 1;
if *successes >= self.config.success_threshold {
*state = CircuitState::Closed;
*self.failure_count.lock().unwrap() = 0;
*successes = 0;
tracing::info!("Circuit breaker closed: service recovered");
}
}
CircuitState::Closed => {
// Reset failure count on success
*self.failure_count.lock().unwrap() = 0;
}
// A call admitted earlier can finish after another thread re-opened the circuit
CircuitState::Open { .. } => {}
}
}
fn on_failure(&self) {
// Same lock order as on_success: state first, then the counter.
let mut state = self.state.lock().unwrap();
match &*state {
CircuitState::Closed => {
let mut failures = self.failure_count.lock().unwrap();
*failures += 1;
if *failures >= self.config.failure_threshold {
*state = CircuitState::Open {
opened_at: Instant::now(),
};
tracing::warn!(
failures = *failures,
"Circuit breaker opened: too many failures"
);
}
}
CircuitState::HalfOpen => {
// Probe failed: back to Open
*state = CircuitState::Open {
opened_at: Instant::now(),
};
tracing::warn!("Circuit breaker re-opened: probe failed");
}
CircuitState::Open { .. } => {}
}
}
}
FnOnce vs Fn in one sentence: an Fn closure can be called any number of times through a shared borrow of what it captured, while an FnOnce closure can be called only once and may consume what it captured. Use FnOnce whenever the closure may capture something that can only be used once (a network handle, a buffer).
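The difference is easiest to see with two tiny helpers. Everything here (run_once, run_twice, Handle, demo) is hypothetical scaffolding for illustration.

```rust
/// Accepts any closure, including ones that give away what they captured.
pub fn run_once<F, T>(f: F) -> T
where
    F: FnOnce() -> T,
{
    f()
}

/// Accepts only closures that can be called repeatedly.
pub fn run_twice<F, T>(f: F) -> (T, T)
where
    F: Fn() -> T,
{
    (f(), f())
}

/// Hypothetical single-use resource (stands in for a connection or buffer).
pub struct Handle {
    pub payload: String,
}

pub fn demo() -> (String, (usize, usize)) {
    let handle = Handle { payload: String::from("charge-request") };
    // This closure moves `payload` out of the captured handle, so it
    // implements FnOnce but not Fn. run_twice would reject it at compile time.
    let consumed = run_once(move || handle.payload);

    let shared = String::from("ping");
    // This closure only borrows `shared`, so it implements Fn
    // (and every Fn closure is also usable where FnOnce is expected).
    let lengths = run_twice(|| shared.len());

    (consumed, lengths)
}
```

Asking for FnOnce in call is therefore the least restrictive bound: it accepts both kinds of closure, while Fn would turn away the first one.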
Step 6 — Async Support
Real services use async. The async variant introduces a new concept: the Future trait.
use std::future::Future; // in the prelude from the 2024 edition; required on earlier editions
impl CircuitBreaker {
pub async fn call_async<F, Fut, T, E>(&self, f: F) -> Result<T, CircuitError<E>>
where
F: FnOnce() -> Fut,
// ^^^
// F returns a Future (not T or E directly)
Fut: Future<Output = Result<T, E>>,
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// That Future, once resolved (awaited), yields Result<T, E>
{
self.check_state().map_err(|_| CircuitError::Open)?;
match f().await {
// ^^^^^
// Call f() to get the Future, then await it
Ok(value) => {
self.on_success();
Ok(value)
}
Err(e) => {
self.on_failure();
Err(CircuitError::Inner(e))
}
}
}
}
Why not write f: impl AsyncFnOnce() -> Result<T, E>? Async closures and the AsyncFn* traits were stabilized in Rust 1.85, so that form works on recent toolchains. The F: FnOnce() -> Fut, Fut: Future<Output = ...> pattern also compiles on older compilers and remains a widely used idiom for accepting async closures.
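The two-parameter bound can be exercised without pulling in a runtime. In this sketch, block_on is a deliberately naive busy-polling executor and guarded is a stand-in for call_async; both are hypothetical, and real code would rely on a runtime such as tokio. The allow flag plays the role of the state check, and Err(None) plays the role of CircuitError::Open.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; sufficient for a busy-polling demo executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical minimal executor: polls the future until it completes.
pub fn block_on<Fut: Future>(fut: Fut) -> Fut::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

/// Same bounds as call_async: F builds the future, Fut resolves to Result<T, E>.
pub async fn guarded<F, Fut, T, E>(allow: bool, f: F) -> Result<T, Option<E>>
where
    F: FnOnce() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    if !allow {
        // f is never called, so the future is never even created.
        return Err(None);
    }
    f().await.map_err(Some)
}
```

Taking a closure rather than a ready-made future is what lets the breaker reject a request before any work is set up.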
Step 7 — Wire It Up
A real example: wrapping an HTTP client call. ChargeResponse and AppError stand for application-defined types that are not shown here.
use std::sync::Arc;
#[derive(Clone)]
pub struct PaymentGatewayClient {
http: reqwest::Client,
breaker: Arc<CircuitBreaker>,
base_url: String,
}
impl PaymentGatewayClient {
pub fn new(base_url: String) -> Self {
let config = CircuitBreakerConfig {
failure_threshold: 3,
recovery_timeout: Duration::from_secs(60),
success_threshold: 1,
};
Self {
http: reqwest::Client::new(),
breaker: CircuitBreaker::new(config),
base_url,
}
}
pub async fn charge(&self, amount: u64, token: &str) -> Result<ChargeResponse, AppError> {
self.breaker
.call_async(|| async {
self.http
.post(format!("{}/charge", self.base_url))
.json(&serde_json::json!({ "amount": amount, "token": token }))
.send()
.await?
.json::<ChargeResponse>()
.await
})
.await
.map_err(|e| match e {
CircuitError::Open => AppError::ServiceUnavailable("payment gateway"),
CircuitError::Inner(e) => AppError::HttpError(e),
})
}
}
Callers never deal with circuit breaker logic — it's fully encapsulated. ✅
Step 8 — Observability
A Circuit Breaker is useless if you can't see it. Expose state via metrics:
impl CircuitBreaker {
pub fn state_label(&self) -> &'static str {
match *self.state.lock().unwrap() {
CircuitState::Closed => "closed",
CircuitState::Open { .. } => "open",
CircuitState::HalfOpen => "half_open",
}
}
}
// In your metrics handler (e.g. Prometheus via metrics crate)
fn record_circuit_state(name: &str, breaker: &CircuitBreaker) {
let state = match breaker.state_label() {
"closed" => 0.0,
"half_open" => 0.5,
"open" => 1.0,
_ => -1.0,
};
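// Value-as-argument form from older metrics releases. Recent releases (0.22+)
// return a handle instead:
// metrics::gauge!("circuit_breaker_state", "name" => name.to_string()).set(state);
metrics::gauge!("circuit_breaker_state", state, "name" => name.to_string());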
}
Alert when circuit_breaker_state == 1.0 for more than 2 minutes. That's your on-call trigger.
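Matching on label strings forces a catch-all arm (the -1.0 case above) that the compiler cannot check. A possible alternative, sketched here with a hypothetical StateLabel enum, keeps the label and the gauge value in one exhaustive place.

```rust
/// Hypothetical payload-free mirror of CircuitState for metrics export.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StateLabel {
    Closed,
    HalfOpen,
    Open,
}

impl StateLabel {
    /// Label value for dashboards and logs.
    pub fn as_str(self) -> &'static str {
        match self {
            StateLabel::Closed => "closed",
            StateLabel::HalfOpen => "half_open",
            StateLabel::Open => "open",
        }
    }

    /// Numeric encoding for a gauge, using the same values as the article.
    /// Adding a variant without updating this match is a compile error.
    pub fn gauge_value(self) -> f64 {
        match self {
            StateLabel::Closed => 0.0,
            StateLabel::HalfOpen => 0.5,
            StateLabel::Open => 1.0,
        }
    }
}
```

The alert rule stays the same: fire when the gauge reads 1.0 for longer than the chosen window.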
Testing the State Transitions
#[cfg(test)]
mod tests {
use super::*;
fn failing_call() -> Result<(), &'static str> {
Err("timeout")
}
fn succeeding_call() -> Result<&'static str, ()> {
Ok("ok")
}
#[test]
fn opens_after_threshold_failures() {
let cb = CircuitBreaker::new(CircuitBreakerConfig {
failure_threshold: 3,
..Default::default()
});
for _ in 0..3 {
let _ = cb.call(|| failing_call());
}
// Next call should be rejected immediately
let result = cb.call(|| succeeding_call());
assert!(matches!(result, Err(CircuitError::Open)));
}
#[test]
fn recovers_after_timeout() {
let cb = CircuitBreaker::new(CircuitBreakerConfig {
failure_threshold: 1,
recovery_timeout: Duration::from_millis(10), // short for tests
success_threshold: 1,
});
let _ = cb.call(|| failing_call());
assert_eq!(cb.state_label(), "open");
std::thread::sleep(Duration::from_millis(20));
// Probe call succeeds → back to Closed
let _ = cb.call(|| succeeding_call());
assert_eq!(cb.state_label(), "closed");
}
}
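Sleeping in tests works but makes them slow and occasionally flaky on loaded CI machines. One common alternative is to pass the current time in as a parameter. The sketch below uses a hypothetical, heavily simplified TimedGate rather than the article's CircuitBreaker, purely to show the technique.

```rust
use std::time::{Duration, Instant};

/// Hypothetical simplified breaker whose methods take `now` explicitly,
/// so tests can move time forward without sleeping.
pub struct TimedGate {
    opened_at: Option<Instant>,
    recovery_timeout: Duration,
}

impl TimedGate {
    pub fn new(recovery_timeout: Duration) -> Self {
        Self { opened_at: None, recovery_timeout }
    }

    /// Records that the gate opened at `now`.
    pub fn open(&mut self, now: Instant) {
        self.opened_at = Some(now);
    }

    /// True if a request may pass at `now`: either the gate never opened,
    /// or the recovery timeout has elapsed since it did.
    pub fn allows(&self, now: Instant) -> bool {
        match self.opened_at {
            None => true,
            Some(opened_at) => {
                now.saturating_duration_since(opened_at) >= self.recovery_timeout
            }
        }
    }
}
```

A test can then check the boundary exactly, for example that a 30-second timeout still rejects at 29 seconds and admits at 30, with no real waiting involved.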
The Full Picture
┌──────────────────────────────────────┐
│ Your Service │
└──────────────┬───────────────────────┘
│ call()
┌──────────────▼───────────────────────┐
│ Circuit Breaker │
│ ┌─────────┐ ┌──────┐ ┌──────────┐ │
│ │ Closed │→│ Open │→│ Half-Open│ │
│ └─────────┘ └──────┘ └──────────┘ │
└──────────────┬───────────────────────┘
│ (if Closed or Half-Open)
┌──────────────▼───────────────────────┐
│ External Service │
│ (Payment API, DB, etc.) │
└──────────────────────────────────────┘
Key Takeaways
- Fail fast — don't waste threads waiting for a dead service
- State is an enum — Rust's type system enforces valid transitions
- Config is separate — easy to tune per-service without changing logic
- Observability first — a silent Circuit Breaker is a dangerous one
- Wrap at the boundary — one Circuit Breaker per external dependency, not per call site
- FnOnce over Fn for closures that consume single-use resources
- Mutex over atomics whenever you need consistency across multiple fields
- thiserror to avoid hand-rolling Display and Error implementations
What's Next?
- Add retry with backoff inside the Closed state before counting as a failure
- Combine with Bulkhead (separate thread pools per dependency)
- Use tokio::sync::RwLock instead of Mutex for better async throughput
- Persist state to Redis for multi-instance deployments
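For the retry item, the core of a backoff policy is just a delay calculation. The function below is a hypothetical sketch of plain exponential backoff with a cap; production policies usually add random jitter, which is omitted here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt,
/// capped at `max`. Arithmetic overflow also falls back to `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    2u32.checked_pow(attempt)
        .and_then(|factor| base.checked_mul(factor))
        .map_or(max, |delay| delay.min(max))
}
```

With a 100 ms base and a 2 s cap, the delays run 100 ms, 200 ms, 400 ms, 800 ms, 1.6 s and then stay at 2 s.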
Further Reading
- Learn Rust in a Month of Lunches (MacLeod) — hands-on Rust from scratch, perfect complement to the async patterns used in this circuit breaker.
- Designing Data-Intensive Applications (Kleppmann) — deep coverage of fault tolerance, retries, and resilience patterns across distributed systems.
Part of the Architecture Patterns series.
Top comments (1)
The tutorial explains the circuit breaker pattern and the design decisions behind it well. However, several advanced Rust concepts are used without explanation, which makes the "zero to production" title misleading for newer Rust developers.
- The call<F, T, E> signature is introduced without scaffolding. There is no explanation of why FnOnce is used instead of Fn, or what the where clause means. Starting with a simpler non-generic version and then refactoring to generics would bridge this gap.
- Mutex and interior mutability are used without discussion. Why Mutex<u32> over atomics? What is interior mutability? These are not obvious to the target audience the title implies.
- thiserror derive macros appear without context - no mention of which crate they come from or why they are preferred over manually implementing Display.
- The async variant assumes familiarity with Future trait bounds, which deserves at least a brief explanation.
The tutorial is strong on architecture - why the state machine is modeled this way, why Arc<Self> is the right constructor pattern. It would be equally strong on language mechanics with a bit more scaffolding around the generics, concurrency primitives, and error handling.