Dylan Dumont

Building a Circuit Breaker in Rust: From Zero to Production

Your service calls an external API. It goes down. Your threads pile up waiting for timeouts.
Your whole app dies. The Circuit Breaker pattern exists to prevent exactly this.


What We're Building

A production-grade Circuit Breaker with three states, configurable thresholds, and zero unsafe code.

    ┌──────────────────────────┐   failures >= threshold   ┌────────────────────────┐
    │         CLOSED           │ ────────────────────────► │         OPEN           │
    │ (requests pass through)  │                           │  (requests rejected)   │
    └──────────────────────────┘                           └────────────────────────┘
                 ▲                                               │            ▲
                 │                               timeout expires │            │ probe fails
                 │                                               ▼            │
                 │                                         ┌──────────────────┴─────┐
                 │             probe succeeds              │       HALF-OPEN        │
                 └─────────────────────────────────────────┤     (probe calls)      │
                                                           └────────────────────────┘

Three states:

  • Closed: normal operation, requests flow through, failures are counted
  • Open: service is assumed down, requests are rejected immediately (fail fast)
  • Half-Open: after the recovery timeout, probe requests are let through to check recovery; enough consecutive successes close the circuit, a single failure re-opens it
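
As a rough illustration, the transitions between these three states can be written as a pure function. This is only a sketch: `State`, `Event` and `next_state` are hypothetical names that do not appear in the implementation built in this article, and thresholds and timestamps are left out on purpose.

```rust
/// Simplified states (no timestamps), used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical events that drive the state machine.
#[derive(Debug, Clone, Copy)]
enum Event {
    FailureThresholdReached,
    RecoveryTimeoutExpired,
    ProbeSucceeded,
    ProbeFailed,
}

/// Returns the next state; an event that does not apply leaves the state unchanged.
fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        (State::Closed, Event::FailureThresholdReached) => State::Open,
        (State::Open, Event::RecoveryTimeoutExpired) => State::HalfOpen,
        (State::HalfOpen, Event::ProbeSucceeded) => State::Closed,
        (State::HalfOpen, Event::ProbeFailed) => State::Open,
        (unchanged, _) => unchanged,
    }
}
```

There is no direct path from Open back to Closed: recovery always goes through Half-Open.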

Step 1 — Model the State Machine

Start with the types. In Rust, states map naturally to an enum.

use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

Open carries an Instant so we know when to transition to Half-Open. No booleans, no stringly-typed states.
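
To see what the embedded Instant buys us, here is a minimal sketch of the timeout check in isolation. The enum is repeated so the snippet compiles on its own; `ready_to_probe` is a hypothetical helper, not part of the final implementation, and it takes `now` as a parameter so it can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

/// Hypothetical helper: true once an Open circuit has waited out `recovery_timeout`.
pub fn ready_to_probe(state: &CircuitState, recovery_timeout: Duration, now: Instant) -> bool {
    match state {
        CircuitState::Open { opened_at } => {
            // Saturates to zero if `now` is earlier than `opened_at`.
            now.saturating_duration_since(*opened_at) >= recovery_timeout
        }
        // Closed and HalfOpen carry no timestamp, so there is nothing to wait for.
        CircuitState::Closed | CircuitState::HalfOpen => false,
    }
}
```

Because the timestamp only exists inside the Open variant, there is no way to read a stale `opened_at` while the circuit is Closed.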


Step 2 — Define the Configuration

Separate config from runtime state — single responsibility, easy to test.

#[derive(Debug)]
pub struct CircuitBreakerConfig {
    /// How many consecutive failures before opening the circuit
    pub failure_threshold: u32,
    /// How long to wait in Open before probing again
    pub recovery_timeout: Duration,
    /// How many consecutive successes in Half-Open to close again
    pub success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            recovery_timeout: Duration::from_secs(30),
            success_threshold: 2,
        }
    }
}

Step 3 — Build the Circuit Breaker

use std::sync::{Arc, Mutex};

#[derive(Debug)]
pub struct CircuitBreaker {
    config: CircuitBreakerConfig,
    state: Mutex<CircuitState>,
    failure_count: Mutex<u32>,
    success_count: Mutex<u32>,
}

impl CircuitBreaker {
    pub fn new(config: CircuitBreakerConfig) -> Arc<Self> {
        Arc::new(Self {
            config,
            state: Mutex::new(CircuitState::Closed),
            failure_count: Mutex::new(0),
            success_count: Mutex::new(0),
        })
    }
}

We wrap it in Arc<Self> immediately — a Circuit Breaker is always shared across threads.
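
A small sketch of what that sharing looks like in practice. `SharedCounter` is a hypothetical stand-in for the breaker (just one counter behind a Mutex), so the example stays self-contained; the constructor follows the same "return Arc<Self>" shape as above.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for the breaker: a single failure counter behind a Mutex.
struct SharedCounter {
    failures: Mutex<u32>,
}

impl SharedCounter {
    fn new() -> Arc<Self> {
        Arc::new(Self { failures: Mutex::new(0) })
    }

    fn record_failure(&self) {
        *self.failures.lock().unwrap() += 1;
    }

    fn failures(&self) -> u32 {
        *self.failures.lock().unwrap()
    }
}

/// Spawns `workers` threads that each record one failure on the same instance.
fn record_from_threads(workers: u32) -> u32 {
    let shared = SharedCounter::new();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc copies a pointer; every thread sees the same counter.
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.record_failure())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    shared.failures()
}
```

Note that `record_failure` takes `&self`, not `&mut self`: the mutation goes through the Mutex, which is the topic of the aside that follows.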

Aside: Interior Mutability and Choosing the Right Primitive

CircuitBreaker is shared between threads via Arc<Self>. But Arc only hands out shared references (&T) to its contents; that is its safety contract. To still modify internal state, we use interior mutability: wrappers that move the "only one writer at a time" check from compile time to runtime.

                    Compile time             Runtime
Borrow rule:       &mut T exclusive  →   Mutex<T>   (blocks if already locked)
                                      →   RwLock<T>  (N readers OR 1 writer)
                                      →   Atomic*    (CPU-level atomic ops)

Why Mutex<u32> and not AtomicU32?

Atomics (AtomicU32, AtomicBool) would be faster for simple counters. But here failure_count and state must change together: opening the circuit means reading failure_count and then writing Open { opened_at } into state. With separate atomics, another thread could observe or modify one value between those two operations. A Mutex closes that gap, provided the state lock is held for the whole read-modify-write and every code path takes the locks in the same order.

Rule of thumb: atomics for independent counters, Mutex whenever you need consistency across multiple fields.
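
The rule of thumb can be sketched as follows. This is an alternative layout for illustration, not the struct used in this article: `Stats`, `Inner` and `Phase` are hypothetical names, and the two related fields are grouped behind one Mutex while an unrelated counter stays atomic.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Closed,
    Open,
}

/// Fields that must change together live behind one lock.
struct Inner {
    phase: Phase,
    failures: u32,
}

struct Stats {
    /// Independent counter: nothing else depends on it, so an atomic is enough.
    total_calls: AtomicU64,
    /// `phase` and `failures` are always read and written as a pair.
    inner: Mutex<Inner>,
}

impl Stats {
    fn new() -> Self {
        Self {
            total_calls: AtomicU64::new(0),
            inner: Mutex::new(Inner { phase: Phase::Closed, failures: 0 }),
        }
    }

    /// Records a failed call and opens once `threshold` failures accumulate.
    /// Returns the phase after the update.
    fn record_failure(&self, threshold: u32) -> Phase {
        self.total_calls.fetch_add(1, Ordering::Relaxed);
        let mut inner = self.inner.lock().unwrap();
        inner.failures += 1;
        if inner.failures >= threshold {
            // Same critical section as the increment: no thread can see
            // failures >= threshold together with Phase::Closed.
            inner.phase = Phase::Open;
        }
        inner.phase
    }
}
```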


Step 4 — Define Errors with thiserror

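Before defining our errors, add the thiserror crate to Cargo.toml. The tracing crate is listed too, because the code in Step 5 logs state changes with it:

[dependencies]
thiserror = "1"
tracing = "0.1"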

thiserror automatically generates implementations of the std::fmt::Display trait (which defines how the error prints) and the std::error::Error trait (which enables interoperability with the rest of the Rust ecosystem). Without it, you would write this by hand:

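// Roughly the boilerplate thiserror saves you from writing and maintaining.
// Note: the derive only emits a source() override when the field is marked
// #[source] or #[from]; without that attribute, source() returns None.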
impl<E: std::fmt::Display> std::fmt::Display for CircuitError<E> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            CircuitError::Open => write!(f, "Circuit is open — service unavailable"),
            CircuitError::Inner(e) => write!(f, "Service call failed: {}", e),
        }
    }
}
impl<E: std::error::Error + 'static> std::error::Error for CircuitError<E> {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            CircuitError::Inner(e) => Some(e),
            _ => None,
        }
    }
}

With thiserror, that collapses to:

#[derive(Debug, thiserror::Error)]
pub enum CircuitError<E> {
    #[error("Circuit is open — service unavailable")]
    Open,
    #[error("Service call failed: {0}")]
    Inner(E),
}

thiserror is the ecosystem convention for libraries. Its counterpart anyhow is preferred in application binaries where you just want to propagate errors without modeling them precisely.


Step 5 — The Core: call()

This is where the state machine lives. Before introducing the final generic signature, let's start with a naive version to understand the structure:

// Naive version — only works with closures that return String
fn call_naive(&self, f: impl FnOnce() -> Result<String, String>) -> Result<String, String> {
    self.check_state().map_err(|_| "Circuit open".to_string())?;
    f()
}

This is too restrictive: it only works with String. We want a circuit breaker that is agnostic to the return type. We introduce type parameters T (success value) and E (error type):

impl CircuitBreaker {
    pub fn call<F, T, E>(&self, f: F) -> Result<T, CircuitError<E>>
    where
        F: FnOnce() -> Result<T, E>,
        //  ^^^^^^^
        //  FnOnce, not Fn: we call the closure exactly once, so FnOnce is
        //  the least restrictive bound. It also accepts closures that
        //  consume a captured value (a connection, a buffer), which an
        //  Fn bound would reject.
    {
        // 1. Check if we should allow the request.
        //    check_state returns CircuitError<()>, so map it to this call's E.
        self.check_state().map_err(|_| CircuitError::Open)?;

        // 2. Execute the call
        match f() {
            Ok(value) => {
                self.on_success();
                Ok(value)
            }
            Err(e) => {
                self.on_failure();
                Err(CircuitError::Inner(e))
            }
        }
    }

    fn check_state(&self) -> Result<(), CircuitError<()>> {
        let mut state = self.state.lock().unwrap();

        match &*state {
            CircuitState::Closed => Ok(()),

            CircuitState::Open { opened_at } => {
                // Has the recovery timeout elapsed?
                if opened_at.elapsed() >= self.config.recovery_timeout {
                    *state = CircuitState::HalfOpen;
                    *self.success_count.lock().unwrap() = 0;
                    Ok(()) // allow the probe request
                } else {
                    Err(CircuitError::Open)
                }
            }

            CircuitState::HalfOpen => Ok(()), // let probes through (concurrent probes are not limited)
        }
    }

    fn on_success(&self) {
        // Hold the state lock for the whole transition so the read and the
        // write form one critical section. Lock order is always state first,
        // then counters, matching check_state (this avoids a deadlock).
        let mut state = self.state.lock().unwrap();

        match *state {
            CircuitState::HalfOpen => {
                let mut successes = self.success_count.lock().unwrap();
                *successes += 1;

                if *successes >= self.config.success_threshold {
                    *state = CircuitState::Closed;
                    *self.failure_count.lock().unwrap() = 0;
                    *successes = 0;
                    tracing::info!("Circuit breaker closed: service recovered");
                }
            }
            CircuitState::Closed => {
                // Reset failure count on success
                *self.failure_count.lock().unwrap() = 0;
            }
            // A call admitted earlier can finish after another thread
            // re-opened the circuit; its result is ignored.
            CircuitState::Open { .. } => {}
        }
    }

    fn on_failure(&self) {
        // Same locking discipline as on_success: state first, then counters,
        // with the state lock held across the read and the write.
        let mut state = self.state.lock().unwrap();

        match *state {
            CircuitState::Closed => {
                let mut failures = self.failure_count.lock().unwrap();
                *failures += 1;

                if *failures >= self.config.failure_threshold {
                    *state = CircuitState::Open {
                        opened_at: Instant::now(),
                    };
                    tracing::warn!(
                        failures = *failures,
                        "Circuit breaker opened: too many failures"
                    );
                }
            }
            CircuitState::HalfOpen => {
                // Probe failed: back to Open
                *state = CircuitState::Open {
                    opened_at: Instant::now(),
                };
                tracing::warn!("Circuit breaker re-opened: probe failed");
            }
            CircuitState::Open { .. } => {}
        }
    }
}

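FnOnce vs Fn in one sentence: an Fn closure can be called any number of times through a shared reference, while an FnOnce closure may consume what it captured and is only guaranteed to be callable once. Every closure implements FnOnce, so it is the least restrictive bound when you call the closure exactly once, as call() does.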
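
A minimal sketch of the difference, using only the standard library. `run_once`, `Connection` and `demo_fn_once` are hypothetical names for illustration; `Connection` stands in for any single-use resource.

```rust
/// Runs `f` exactly once. FnOnce accepts closures that move captured
/// values out as well as closures that only borrow.
fn run_once<F, T>(f: F) -> T
where
    F: FnOnce() -> T,
{
    f()
}

/// Stand-in for a single-use resource such as a connection handle.
struct Connection {
    id: u32,
}

impl Connection {
    /// Consumes the connection; it cannot be used again afterwards.
    fn close(self) -> u32 {
        self.id
    }
}

fn demo_fn_once() -> (u32, usize) {
    let conn = Connection { id: 7 };
    // This closure moves `conn` out when called, so it implements only FnOnce.
    // With a bound of `F: Fn() -> T`, this line would fail to compile.
    let closed_id = run_once(move || conn.close());

    let name = String::from("payments");
    // This closure only borrows `name`; it implements Fn, FnMut and FnOnce.
    let len = run_once(|| name.len());

    (closed_id, len)
}
```

Both closures are accepted by the same FnOnce bound, which is why call() can wrap either kind.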


Step 6 — Async Support

Real services use async. The async variant introduces a new concept: the Future trait.

use std::future::Future;

impl CircuitBreaker {
    pub async fn call_async<F, Fut, T, E>(&self, f: F) -> Result<T, CircuitError<E>>
    where
        F: FnOnce() -> Fut,
        //             ^^^
        //             F returns a Future (not T or E directly)
        Fut: Future<Output = Result<T, E>>,
        //   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        //   That Future, once resolved (awaited), yields Result<T, E>
    {
        self.check_state().map_err(|_| CircuitError::Open)?;

        match f().await {
        //         ^^^^^
        //         Call f() to get the Future, then await it
            Ok(value) => {
                self.on_success();
                Ok(value)
            }
            Err(e) => {
                self.on_failure();
                Err(CircuitError::Inner(e))
            }
        }
    }
}

Why not write f: impl AsyncFn() -> Result<T, E>? Async closures and the AsyncFn traits were only stabilized in Rust 1.85. The F: FnOnce() -> Fut, Fut: Future<Output = ...> pattern also works on older toolchains and remains a widely used idiom for accepting async closures.
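
To try the F / Fut bound pair in isolation, here is a sketch that needs no async runtime. `run_guarded`, `block_on`, `NoopWake` and `demo_async` are hypothetical names; the tiny `block_on` is a demonstration-only executor that busy-polls, so it is only suitable for futures that complete without real I/O. In a real service you would use a runtime such as tokio.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Same bound shape as call_async: `f` builds a future,
/// and that future resolves to Result<T, E>.
async fn run_guarded<F, Fut, T, E>(f: F) -> Result<T, E>
where
    F: FnOnce() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    f().await
}

/// Waker that does nothing; enough for futures that never really suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Demonstration-only executor: polls in a loop until the future is ready.
fn block_on<Fut: Future>(fut: Fut) -> Fut::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn demo_async() -> Result<u32, String> {
    let amount = 40;
    // The closure returns an async block, i.e. a Future, not the Result itself.
    block_on(run_guarded(|| async move { Ok::<u32, String>(amount + 2) }))
}
```

Nothing runs until the future is polled: calling `f()` only builds the future, and `.await` drives it.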


Step 7 — Wire It Up

A real example — wrapping an HTTP client call:

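use std::sync::Arc;
use std::time::Duration;

// ChargeResponse and AppError are application types defined elsewhere.
// reqwest needs its "json" feature enabled for the .json() calls below.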

#[derive(Clone)]
pub struct PaymentGatewayClient {
    http: reqwest::Client,
    breaker: Arc<CircuitBreaker>,
    base_url: String,
}

impl PaymentGatewayClient {
    pub fn new(base_url: String) -> Self {
        let config = CircuitBreakerConfig {
            failure_threshold: 3,
            recovery_timeout: Duration::from_secs(60),
            success_threshold: 1,
        };

        Self {
            http: reqwest::Client::new(),
            breaker: CircuitBreaker::new(config),
            base_url,
        }
    }

    pub async fn charge(&self, amount: u64, token: &str) -> Result<ChargeResponse, AppError> {
        self.breaker
            .call_async(|| async {
                self.http
                    .post(format!("{}/charge", self.base_url))
                    .json(&serde_json::json!({ "amount": amount, "token": token }))
                    .send()
                    .await?
                    .json::<ChargeResponse>()
                    .await
            })
            .await
            .map_err(|e| match e {
                CircuitError::Open => AppError::ServiceUnavailable("payment gateway"),
                CircuitError::Inner(e) => AppError::HttpError(e),
            })
    }
}

Callers never deal with circuit breaker logic — it's fully encapsulated. ✅


Step 8 — Observability

A Circuit Breaker is useless if you can't see it. Expose state via metrics:

impl CircuitBreaker {
    pub fn state_label(&self) -> &'static str {
        match *self.state.lock().unwrap() {
            CircuitState::Closed => "closed",
            CircuitState::Open { .. } => "open",
            CircuitState::HalfOpen => "half_open",
        }
    }
}

// In your metrics handler (e.g. Prometheus via metrics crate)
fn record_circuit_state(name: &str, breaker: &CircuitBreaker) {
    let state = match breaker.state_label() {
        "closed"    => 0.0,
        "half_open" => 0.5,
        "open"      => 1.0,
        _           => -1.0,
    };

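    // Macro syntax of metrics 0.21 and earlier. From 0.22 on the macro returns a handle:
    // metrics::gauge!("circuit_breaker_state", "name" => name.to_string()).set(state);
    metrics::gauge!("circuit_breaker_state", state, "name" => name.to_string());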
}

Alert when circuit_breaker_state == 1.0 for more than 2 minutes. That's your on-call trigger.
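
The string round-trip in record_circuit_state can be avoided by mapping the enum to a number directly. The sketch below repeats the enum so it compiles on its own; `gauge_value` is a hypothetical method, and it uses the same 0.0 / 0.5 / 1.0 encoding as above.

```rust
use std::time::Instant;

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

impl CircuitState {
    /// Hypothetical helper: numeric encoding of the state for a gauge metric.
    pub fn gauge_value(&self) -> f64 {
        // The match is exhaustive, so no fallback value is needed and
        // adding a new state later becomes a compile error here.
        match self {
            CircuitState::Closed => 0.0,
            CircuitState::HalfOpen => 0.5,
            CircuitState::Open { .. } => 1.0,
        }
    }
}
```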


Testing the State Transitions

#[cfg(test)]
mod tests {
    use super::*;

    fn failing_call() -> Result<(), &'static str> {
        Err("timeout")
    }

    fn succeeding_call() -> Result<&'static str, ()> {
        Ok("ok")
    }

    #[test]
    fn opens_after_threshold_failures() {
        let cb = CircuitBreaker::new(CircuitBreakerConfig {
            failure_threshold: 3,
            ..Default::default()
        });

        for _ in 0..3 {
            let _ = cb.call(|| failing_call());
        }

        // Next call should be rejected immediately
        let result = cb.call(|| succeeding_call());
        assert!(matches!(result, Err(CircuitError::Open)));
    }

    #[test]
    fn recovers_after_timeout() {
        let cb = CircuitBreaker::new(CircuitBreakerConfig {
            failure_threshold: 1,
            recovery_timeout: Duration::from_millis(10), // short for tests
            success_threshold: 1,
        });

        let _ = cb.call(|| failing_call());
        assert_eq!(cb.state_label(), "open");

        std::thread::sleep(Duration::from_millis(20));

        // Probe call succeeds → back to Closed
        let _ = cb.call(|| succeeding_call());
        assert_eq!(cb.state_label(), "closed");
    }
}

The Full Picture

                    ┌──────────────────────────────────────┐
                    │           Your Service               │
                    └──────────────┬───────────────────────┘
                                   │ call()
                    ┌──────────────▼───────────────────────┐
                    │         Circuit Breaker              │
                    │  ┌─────────┐ ┌──────┐ ┌──────────┐  │
                    │  │ Closed  │→│ Open │→│ Half-Open│  │
                    │  └─────────┘ └──────┘ └──────────┘  │
                    └──────────────┬───────────────────────┘
                                   │ (if Closed or Half-Open)
                    ┌──────────────▼───────────────────────┐
                    │         External Service             │
                    │       (Payment API, DB, etc.)        │
                    └──────────────────────────────────────┘

Key Takeaways

  • Fail fast: don't waste threads waiting for a dead service
  • State is an enum: invalid states (such as Open without a timestamp) cannot be represented
  • Config is separate: easy to tune per-service without changing logic
  • Observability first: a silent Circuit Breaker is a dangerous one
  • Wrap at the boundary: one Circuit Breaker per external dependency, not per call site
  • FnOnce over Fn for closures that consume single-use resources
  • Mutex over atomics whenever you need consistency across multiple fields
  • thiserror to avoid hand-rolling Display and Error implementations

What's Next?

  • Add retry with backoff inside the Closed state before counting as a failure
  • Combine with Bulkhead (separate thread pools per dependency)
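  • Consider tokio::sync::Mutex or RwLock only if a lock ever has to be held across an .await; for the short critical sections used here, std::sync::Mutex is usually sufficient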
  • Persist state to Redis for multi-instance deployments

Further Reading

Part of the Architecture Patterns series.

Top comments (1)

sayanb

The tutorial explains the circuit breaker pattern and the design decisions behind it well. However, several advanced Rust concepts are used without explanation, which makes the "zero to production" title misleading for newer Rust developers.

  • The call<F, T, E> signature is introduced without scaffolding. There is no explanation of why FnOnce is used instead of Fn, or what the where clause means. Starting with a simpler non-generic version and then refactoring to generics would bridge this gap.
  • Mutex and interior mutability are used without discussion. Why Mutex<u32> over atomics? What is interior mutability? These are not obvious to the target audience the title implies.
  • The thiserror derive macros appear without context - no mention of which crate they come from or why they are preferred over manually implementing Display.
  • The async variant assumes familiarity with Future trait bounds, which deserves at least a brief explanation.

The tutorial is strong on architecture - why the state machine is modeled this way, why Arc<Self> is the right constructor pattern. It would be equally strong on language mechanics with a bit more scaffolding around the generics, concurrency primitives, and error handling.