Dylan Dumont

Building a Circuit Breaker in Rust: From Zero to Production

Your service calls an external API. It goes down. Your threads pile up waiting for timeouts.
Your whole app dies. The Circuit Breaker pattern exists to prevent exactly this.


What We're Building

A production-grade Circuit Breaker with three states, configurable thresholds, and zero unsafe code.

    ┌──────────────────────────┐                    ┌──────────────────────────┐
    │                          │   probes succeed   │                          │
    │          CLOSED          │◀───────────────────│        HALF-OPEN         │
    │ (requests pass through)  │                    │     (probe requests)     │
    └────────────┬─────────────┘                    └───────┬──────────▲───────┘
                 │ failures >= threshold        probe fails │          │ timeout expires
                 ▼                                          ▼          │
    ┌──────────────────────────────────────────────────────────────────┴───────┐
    │                                   OPEN                                   │
    │                           (requests rejected)                            │
    └──────────────────────────────────────────────────────────────────────────┘

Three states:

  • Closed: normal operation, requests flow through, failures are counted
  • Open: the service is assumed down, requests are rejected immediately (fail fast)
  • Half-Open: after the recovery timeout, probe requests go through until enough succeed (success_threshold) to close the circuit, or one fails and reopens it

Step 1 — Model the State Machine

Start with the types. In Rust, states map naturally to an enum.

use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

Open carries an Instant so we know when to transition to Half-Open. No booleans, no stringly-typed states.
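
To make the role of that Instant concrete, here is a minimal sketch. The helper name `ready_to_probe` is an assumption for illustration and is not part of the breaker built in this article; it only shows how the timestamp stored in Open answers "has the recovery timeout elapsed?".

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

/// Hypothetical helper (not in the article's code): true only when the
/// circuit is Open and has been open for at least `recovery_timeout`.
pub fn ready_to_probe(state: &CircuitState, recovery_timeout: Duration) -> bool {
    match state {
        CircuitState::Open { opened_at } => opened_at.elapsed() >= recovery_timeout,
        // Closed and HalfOpen carry no timestamp, so there is nothing to wait for.
        CircuitState::Closed | CircuitState::HalfOpen => false,
    }
}
```

Because the timestamp lives inside the Open variant, there is no way to read a stale `opened_at` while the circuit is Closed or Half-Open.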


Step 2 — Define the Configuration

Separate config from runtime state — single responsibility, easy to test.

#[derive(Debug)]
pub struct CircuitBreakerConfig {
    /// How many consecutive failures before opening the circuit
    pub failure_threshold: u32,
    /// How long to wait in Open before probing again
    pub recovery_timeout: Duration,
    /// How many consecutive successes in Half-Open to close again
    pub success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            recovery_timeout: Duration::from_secs(30),
            success_threshold: 2,
        }
    }
}
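
Because the config implements Default, a per-service preset only has to name the fields it changes. A small sketch follows; `strict_config` is a hypothetical preset name and its value is illustrative, not a recommendation.

```rust
use std::time::Duration;

#[derive(Debug)]
pub struct CircuitBreakerConfig {
    pub failure_threshold: u32,
    pub recovery_timeout: Duration,
    pub success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            recovery_timeout: Duration::from_secs(30),
            success_threshold: 2,
        }
    }
}

/// Hypothetical preset: trips sooner than the default, keeps the other fields.
pub fn strict_config() -> CircuitBreakerConfig {
    CircuitBreakerConfig {
        failure_threshold: 2,
        // Struct update syntax fills the remaining fields from Default.
        ..Default::default()
    }
}
```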

Step 3 — Build the Circuit Breaker

use std::sync::{Arc, Mutex};

#[derive(Debug)]
pub struct CircuitBreaker {
    config: CircuitBreakerConfig,
    state: Mutex<CircuitState>,
    failure_count: Mutex<u32>,
    success_count: Mutex<u32>,
}

impl CircuitBreaker {
    pub fn new(config: CircuitBreakerConfig) -> Arc<Self> {
        Arc::new(Self {
            config,
            state: Mutex::new(CircuitState::Closed),
            failure_count: Mutex::new(0),
            success_count: Mutex::new(0),
        })
    }
}

We wrap it in Arc<Self> immediately — a Circuit Breaker is always shared across threads.
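
As a rough illustration of why the constructor hands back an Arc, the sketch below shares one value across several threads. `SharedCounter` is a stand-in type invented for this example (it keeps only a failure counter), and `record_from_threads` is likewise hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for the article's CircuitBreaker: only the shared counter.
pub struct SharedCounter {
    failure_count: Mutex<u32>,
}

impl SharedCounter {
    pub fn new() -> Arc<Self> {
        Arc::new(Self { failure_count: Mutex::new(0) })
    }

    pub fn record_failure(&self) {
        *self.failure_count.lock().unwrap() += 1;
    }

    pub fn failures(&self) -> u32 {
        *self.failure_count.lock().unwrap()
    }
}

/// Spawns `workers` threads that each record one failure on the same
/// instance, then returns the total seen by the caller.
pub fn record_from_threads(workers: u32) -> u32 {
    let shared = SharedCounter::new();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own Arc handle to the same value.
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.record_failure())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    shared.failures()
}
```

Every thread updates the same counter, which is the property a breaker needs: one shared view of the dependency's health.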


Step 4 — The Core: call()

This is where the state machine lives.

#[derive(Debug, thiserror::Error)]
pub enum CircuitError<E> {
    #[error("Circuit is open — service unavailable")]
    Open,
    #[error("Service call failed: {0}")]
    Inner(E),
}

impl CircuitBreaker {
    pub fn call<F, T, E>(&self, f: F) -> Result<T, CircuitError<E>>
    where
        F: FnOnce() -> Result<T, E>,
    {
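        // 1. Check if we should allow the request. check_state() returns
        //    CircuitError<()>, so rebuild the variant for this call's error type E.
        self.check_state().map_err(|_| CircuitError::Open)?;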

        // 2. Execute the call
        match f() {
            Ok(value) => {
                self.on_success();
                Ok(value)
            }
            Err(e) => {
                self.on_failure();
                Err(CircuitError::Inner(e))
            }
        }
    }

    fn check_state(&self) -> Result<(), CircuitError<()>> {
        let mut state = self.state.lock().unwrap();

        match &*state {
            CircuitState::Closed => Ok(()),

            CircuitState::Open { opened_at } => {
                // Has the recovery timeout elapsed?
                if opened_at.elapsed() >= self.config.recovery_timeout {
                    *state = CircuitState::HalfOpen;
                    *self.success_count.lock().unwrap() = 0;
                    Ok(()) // allow the probe request
                } else {
                    Err(CircuitError::Open)
                }
            }

            CircuitState::HalfOpen => Ok(()), // probes pass until success_threshold is met or one fails
        }
    }

    fn on_success(&self) {
        // Hold the state lock for the whole update and always take it before
        // any counter lock, the same order check_state() uses. Cloning the
        // state and re-locking later would let another thread change it in
        // between, and mixed lock orders can deadlock.
        let mut state = self.state.lock().unwrap();

        match *state {
            CircuitState::HalfOpen => {
                let mut successes = self.success_count.lock().unwrap();
                *successes += 1;

                if *successes >= self.config.success_threshold {
                    *state = CircuitState::Closed;
                    *self.failure_count.lock().unwrap() = 0;
                    *successes = 0;
                    tracing::info!("Circuit breaker closed: service recovered");
                }
            }
            CircuitState::Closed => {
                // Reset failure count on success
                *self.failure_count.lock().unwrap() = 0;
            }
            // A call admitted earlier can finish after the circuit opened
            CircuitState::Open { .. } => {}
        }
    }

    fn on_failure(&self) {
        // Same lock order as on_success(): state first, then counters
        let mut state = self.state.lock().unwrap();

        match *state {
            CircuitState::Closed => {
                let mut failures = self.failure_count.lock().unwrap();
                *failures += 1;

                if *failures >= self.config.failure_threshold {
                    *state = CircuitState::Open {
                        opened_at: Instant::now(),
                    };
                    tracing::warn!(
                        failures = *failures,
                        "Circuit breaker opened: too many failures"
                    );
                }
            }
            CircuitState::HalfOpen => {
                // Probe failed, back to Open
                *state = CircuitState::Open {
                    opened_at: Instant::now(),
                };
                tracing::warn!("Circuit breaker re-opened: probe failed");
            }
            CircuitState::Open { .. } => {}
        }
    }
}
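
The breaker above spreads its state and counters over three mutexes, so every method has to take them in a consistent order. One possible simplification, sketched here under assumptions, folds the counters into the state enum and guards everything with a single lock. `SingleLockBreaker`, its constructor arguments, and the `Option<E>` error shape are all invented for this example, and logging is left out to keep it dependency-free.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed { failures: u32 },
    Open { opened_at: Instant },
    HalfOpen { successes: u32 },
}

/// Hypothetical variant of the breaker: state and counters share one lock.
pub struct SingleLockBreaker {
    failure_threshold: u32,
    success_threshold: u32,
    recovery_timeout: Duration,
    state: Mutex<State>,
}

impl SingleLockBreaker {
    pub fn new(failure_threshold: u32, success_threshold: u32, recovery_timeout: Duration) -> Self {
        Self {
            failure_threshold,
            success_threshold,
            recovery_timeout,
            state: Mutex::new(State::Closed { failures: 0 }),
        }
    }

    /// Err(None) means the breaker rejected the call; Err(Some(e)) means `f` failed.
    pub fn call<T, E>(&self, f: impl FnOnce() -> Result<T, E>) -> Result<T, Option<E>> {
        if !self.allow() {
            return Err(None);
        }
        let result = f(); // the lock is not held while the call runs
        self.record(result.is_ok());
        result.map_err(Some)
    }

    fn allow(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        if let State::Open { opened_at } = *state {
            if opened_at.elapsed() < self.recovery_timeout {
                return false;
            }
            *state = State::HalfOpen { successes: 0 };
        }
        true
    }

    fn record(&self, success: bool) {
        let mut state = self.state.lock().unwrap();
        *state = match (*state, success) {
            (State::Closed { .. }, true) => State::Closed { failures: 0 },
            (State::Closed { failures }, false) if failures + 1 >= self.failure_threshold => {
                State::Open { opened_at: Instant::now() }
            }
            (State::Closed { failures }, false) => State::Closed { failures: failures + 1 },
            (State::HalfOpen { successes }, true) if successes + 1 >= self.success_threshold => {
                State::Closed { failures: 0 }
            }
            (State::HalfOpen { successes }, true) => State::HalfOpen { successes: successes + 1 },
            (State::HalfOpen { .. }, false) => State::Open { opened_at: Instant::now() },
            // A call admitted earlier finished after the circuit had opened.
            (open @ State::Open { .. }, _) => open,
        };
    }

    pub fn is_open(&self) -> bool {
        matches!(*self.state.lock().unwrap(), State::Open { .. })
    }
}
```

With one lock, each transition is a single read-modify-write, and a counter can never belong to a state that has already changed. The trade-off is that all callers contend on the same mutex, which is usually acceptable because the critical sections are tiny.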

Step 5 — Async Support

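Real services use async. The async variant mirrors call() and awaits a future instead of calling a closure directly. No lock is held across the .await, so std::sync::Mutex is safe to use here:

use std::future::Future;

impl CircuitBreaker {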
    pub async fn call_async<F, Fut, T, E>(&self, f: F) -> Result<T, CircuitError<E>>
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = Result<T, E>>,
    {
        self.check_state().map_err(|_| CircuitError::Open)?;

        match f().await {
            Ok(value) => {
                self.on_success();
                Ok(value)
            }
            Err(e) => {
                self.on_failure();
                Err(CircuitError::Inner(e))
            }
        }
    }
}

Step 6 — Wire It Up

A real example — wrapping an HTTP client call:

use std::sync::Arc;

#[derive(Clone)]
pub struct PaymentGatewayClient {
    http: reqwest::Client,
    breaker: Arc<CircuitBreaker>,
    base_url: String,
}

impl PaymentGatewayClient {
    pub fn new(base_url: String) -> Self {
        let config = CircuitBreakerConfig {
            failure_threshold: 3,
            recovery_timeout: Duration::from_secs(60),
            success_threshold: 1,
        };

        Self {
            http: reqwest::Client::new(),
            breaker: CircuitBreaker::new(config),
            base_url,
        }
    }

    pub async fn charge(&self, amount: u64, token: &str) -> Result<ChargeResponse, AppError> {
        self.breaker
            .call_async(|| async {
                self.http
                    .post(format!("{}/charge", self.base_url))
                    .json(&serde_json::json!({ "amount": amount, "token": token }))
                    .send()
                    .await?
                    .json::<ChargeResponse>()
                    .await
            })
            .await
            .map_err(|e| match e {
                CircuitError::Open => AppError::ServiceUnavailable("payment gateway"),
                CircuitError::Inner(e) => AppError::HttpError(e),
            })
    }
}

Callers never deal with circuit breaker logic — it's fully encapsulated. ✅


Step 7 — Observability

A Circuit Breaker is useless if you can't see it. Expose state via metrics:

impl CircuitBreaker {
    pub fn state_label(&self) -> &'static str {
        match *self.state.lock().unwrap() {
            CircuitState::Closed => "closed",
            CircuitState::Open { .. } => "open",
            CircuitState::HalfOpen => "half_open",
        }
    }
}

// In your metrics handler (e.g. Prometheus via metrics crate)
fn record_circuit_state(name: &str, breaker: &CircuitBreaker) {
    let state = match breaker.state_label() {
        "closed"    => 0.0,
        "half_open" => 0.5,
        "open"      => 1.0,
        _           => -1.0,
    };

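    // Macro form used by `metrics` 0.21 and earlier; from 0.22 the macro
    // returns a handle: gauge!("circuit_breaker_state", "name" => ...).set(state)
    metrics::gauge!("circuit_breaker_state", state, "name" => name.to_string());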
}

Alert when circuit_breaker_state == 1.0 for more than 2 minutes. That's your on-call trigger.
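
Matching on the label strings needs a catch-all arm that can silently hide a new state. A possible alternative, sketched below, derives the gauge value from the enum itself so the compiler checks exhaustiveness. `gauge_value` is a hypothetical method name; the numeric encoding is the same one used in record_circuit_state.

```rust
use std::time::Instant;

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open { opened_at: Instant },
    HalfOpen,
}

impl CircuitState {
    /// Hypothetical helper: numeric encoding for a metrics gauge.
    /// Adding a variant without handling it here is a compile error.
    pub fn gauge_value(&self) -> f64 {
        match self {
            CircuitState::Closed => 0.0,
            CircuitState::HalfOpen => 0.5,
            CircuitState::Open { .. } => 1.0,
        }
    }
}
```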


Testing the State Transitions

#[cfg(test)]
mod tests {
    use super::*;

    fn failing_call() -> Result<(), &'static str> {
        Err("timeout")
    }

    fn succeeding_call() -> Result<&'static str, ()> {
        Ok("ok")
    }

    #[test]
    fn opens_after_threshold_failures() {
        let cb = CircuitBreaker::new(CircuitBreakerConfig {
            failure_threshold: 3,
            ..Default::default()
        });

        for _ in 0..3 {
            let _ = cb.call(|| failing_call());
        }

        // Next call should be rejected immediately
        let result = cb.call(|| succeeding_call());
        assert!(matches!(result, Err(CircuitError::Open)));
    }

    #[test]
    fn recovers_after_timeout() {
        let cb = CircuitBreaker::new(CircuitBreakerConfig {
            failure_threshold: 1,
            recovery_timeout: Duration::from_millis(10), // short for tests
            success_threshold: 1,
        });

        let _ = cb.call(|| failing_call());
        assert_eq!(cb.state_label(), "open");

        std::thread::sleep(Duration::from_millis(20));

        // Probe call succeeds → back to Closed
        let _ = cb.call(|| succeeding_call());
        assert_eq!(cb.state_label(), "closed");
    }
}

The Full Picture

                    ┌──────────────────────────────────────┐
                    │           Your Service               │
                    └──────────────┬───────────────────────┘
                                   │ call()
                    ┌──────────────▼───────────────────────┐
                    │         Circuit Breaker              │
                    │  ┌─────────┐ ┌──────┐ ┌──────────┐   │
                    │  │ Closed  │→│ Open │→│ Half-Open│   │
                    │  └─────────┘ └──────┘ └──────────┘   │
                    └──────────────┬───────────────────────┘
                                   │ (if Closed or Half-Open)
                    ┌──────────────▼───────────────────────┐
                    │         External Service             │
                    │       (Payment API, DB, etc.)        │
                    └──────────────────────────────────────┘

Key Takeaways

  • Fail fast — don't waste threads waiting for a dead service
  • State is an enum, so invalid states are unrepresentable, while the transitions themselves are enforced by the breaker's code
  • Config is separate — easy to tune per-service without changing logic
  • Observability first — a silent Circuit Breaker is a dangerous one
  • Wrap at the boundary — one Circuit Breaker per external dependency, not per call site

What's Next?

  • Add retry with backoff inside the Closed state before counting as a failure
  • Combine with Bulkhead (separate thread pools per dependency)
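  • Move the counters to atomics (std::sync::atomic) if lock contention shows up; tokio::sync::RwLock is unlikely to help much here because almost every call writes and no lock is held across an .await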
  • Persist state to Redis for multi-instance deployments
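
For the first item, retry with backoff, a minimal sketch could look like the following. It assumes plain exponential backoff with a cap and no jitter; `backoff_delay` and `retry_with_backoff` are hypothetical names, and the blocking sleep would need an async timer in an async service.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Runs `f` at least once and at most `max_attempts` times, sleeping between
/// failed attempts. Only the final error is returned to the caller.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut f: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match f() {
            Ok(value) => return Ok(value),
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
        }
    }
}
```

Passing a closure like this to the breaker's call() means the breaker sees one success or one failure per logical request, so transient blips absorbed by the retries do not count toward failure_threshold.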

Part of the Architecture Patterns series.
