A Circuit Breaker at the Go Adapter: Fail Fast, Leave the Domain Alone

#go #architecture #hexagonal #concurrency

Book: Hexagonal Architecture in Go
Also by me: The Complete Guide to Go Programming — the companion book in the Thinking in Go series
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

A downstream payment provider starts timing out. Not failing fast — timing out, at 30 seconds a call. Your checkout handler holds a goroutine and a connection for each of those 30-second waits. Traffic keeps coming. Within a minute every worker is parked on a call that was never going to return, and a slow dependency has turned into a full outage on your side.

This is the failure a circuit breaker exists to stop. After enough failures it stops calling the sick dependency at all, returns an error immediately, and gives the downstream time to recover. In a Go service built around ports and adapters, the right place for that logic is the adapter — the outbound edge that talks to the network. The use case behind the port should never know a breaker is there.

Where the breaker goes

Start with the port. It is an interface the use case owns, phrased in the domain's words, with no mention of HTTP, retries, or breakers.

// port/payment.go
package port

import "context"

type Charge struct {
    OrderID string
    Cents   int64
}

type PaymentGateway interface {
    Authorize(ctx context.Context, c Charge) (string, error)
}

The use case depends on PaymentGateway and nothing else.

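// app/checkout.go
package app

import (
    "context"
    "fmt"

    "yourapp/port"
)

func (s *Checkout) Pay(
    ctx context.Context, c port.Charge,
) error {
    ref, err := s.gateway.Authorize(ctx, c)
    if err != nil {
        return fmt.Errorf("authorize: %w", err)
    }
    // If MarkPaid can fail, handle its error here: by this
    // point the charge has already been authorized.
    s.orders.MarkPaid(ctx, c.OrderID, ref)
    return nil
}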

There is one real adapter that speaks HTTP to the provider. The breaker is a second adapter that wraps the first and also satisfies PaymentGateway. The use case is wired with the wrapper. It calls Authorize, gets a value or an error, and cannot tell whether the call went to the network or bounced off an open breaker. That is the whole point: the resilience policy lives at the boundary, not in the domain.

The three states

A breaker is a small state machine with three states.

Closed — calls pass through. Failures are counted. Past a threshold, the breaker trips to open.
Open — calls fail immediately without touching the dependency. After a cooldown, the breaker moves to half-open.
Half-open — one trial call is allowed through. If it succeeds, the breaker closes. If it fails, it goes back to open and the cooldown restarts.

Closed is the healthy path. Open is the protective path. Half-open is the probe that decides whether the dependency has come back, without dropping full traffic on it the instant the cooldown ends.

// adapter/breaker/state.go
package breaker

type State int

const (
    Closed State = iota
    Open
    HalfOpen
)
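
A breaker's state tends to end up in log lines and test failure messages, where a bare integer is hard to read. The sketch below adds a String method to the State type above. The method and the label strings are additions for illustration, not part of the breaker as written in this post.

```go
package main

import "fmt"

type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

// String makes State satisfy fmt.Stringer so it prints as a word.
// The labels are arbitrary choices for this sketch.
func (s State) String() string {
	switch s {
	case Closed:
		return "closed"
	case Open:
		return "open"
	case HalfOpen:
		return "half-open"
	default:
		return fmt.Sprintf("State(%d)", int(s))
	}
}

func main() {
	fmt.Println(Closed, Open, HalfOpen) // closed open half-open
}
```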

The breaker itself

Keep it plain: a mutex, a failure counter with its threshold, and a timestamp recording when the breaker opened. No goroutine, no ticker. State transitions happen on the calling goroutine when a call arrives, which keeps the breaker easy to reason about.

// adapter/breaker/breaker.go
package breaker

import (
    "errors"
    "sync"
    "time"
)

var ErrOpen = errors.New("circuit breaker is open")

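type Breaker struct {
    mu          sync.Mutex
    state       State
    failures    int              // consecutive failures while closed
    maxFailures int              // threshold that trips the breaker
    cooldown    time.Duration    // how long to fail fast before probing
    openedAt    time.Time        // start of the current cooldown window
    now         func() time.Time // injectable clock
}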

func New(maxFailures int, cooldown time.Duration) *Breaker {
    return &Breaker{
        state:       Closed,
        maxFailures: maxFailures,
        cooldown:    cooldown,
        now:         time.Now,
    }
}

The now field is a function so tests can control the clock without sleeping. Default it to time.Now and override it in a test with a one-line setter:

func (b *Breaker) SetClock(f func() time.Time) {
    b.now = f
}
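
One way to drive that setter from a test is a small fake clock with an Advance method, sketched below. The fakeClock type and its method names are hypothetical helpers, not part of the breaker package. Because clock.Now is a method value of type func() time.Time, it has the same shape as the now field and could be passed straight to SetClock. The sketch is not safe for concurrent use, which is fine for a single-goroutine test.

```go
package main

import (
	"fmt"
	"time"
)

// fakeClock is a hypothetical test helper: a clock that only moves
// when the test tells it to.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clock := &fakeClock{t: time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)}

	// now has the same type as the breaker's clock field.
	var now func() time.Time = clock.Now

	openedAt := now()
	clock.Advance(11 * time.Second)
	fmt.Println(now().Sub(openedAt)) // 11s
}
```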

allow decides whether a call may proceed and folds the timing logic into the state read:

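func (b *Breaker) allow() error {
    b.mu.Lock()
    defer b.mu.Unlock()

    if b.state == Closed {
        return nil
    }
    // Open or HalfOpen: fail fast until the cooldown window ends.
    if b.now().Sub(b.openedAt) < b.cooldown {
        return ErrOpen
    }
    // Window over: admit this caller as the probe and start a new
    // window, so concurrent callers keep getting ErrOpen until the
    // probe reports back (or another cooldown passes without a result).
    b.state = HalfOpen
    b.openedAt = b.now()
    return nil
}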

When the breaker is open and the cooldown has not passed, the caller gets ErrOpen and never touches the network. When the cooldown has passed, the breaker flips to half-open and lets this one call through as the probe.

Two more methods record the result:

func (b *Breaker) success() {
    b.mu.Lock()
    defer b.mu.Unlock()
    b.failures = 0
    b.state = Closed
}

func (b *Breaker) failure() {
    b.mu.Lock()
    defer b.mu.Unlock()

    if b.state == HalfOpen {
        b.state = Open
        b.openedAt = b.now()
        return
    }
    b.failures++
    if b.failures >= b.maxFailures {
        b.state = Open
        b.openedAt = b.now()
    }
}

A failure in half-open goes straight back to open — the probe told you the dependency is still sick, so there is no reason to let more traffic through. A failure in closed increments the counter and trips only at the threshold.

Do ties them together:

func (b *Breaker) Do(fn func() error) error {
    if err := b.allow(); err != nil {
        return err
    }
    if err := fn(); err != nil {
        b.failure()
        return err
    }
    b.success()
    return nil
}
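
Do returns ErrOpen unchanged, and the use case wraps whatever it receives with %w, so a caller further out can still recognise an open breaker with errors.Is. The sketch below shows how an inbound HTTP adapter might turn that into a status code. The statusFor helper and the choice of 503 and 502 are assumptions for illustration, and ErrOpen is redeclared locally only so the snippet runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrOpen mirrors breaker.ErrOpen so this sketch is self-contained.
var ErrOpen = errors.New("circuit breaker is open")

// statusFor is a hypothetical helper for an inbound HTTP handler.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrOpen):
		// Failing fast: tell the client to come back later.
		return http.StatusServiceUnavailable
	default:
		return http.StatusBadGateway
	}
}

func main() {
	// The same wrapping the use case applies in Pay.
	wrapped := fmt.Errorf("authorize: %w", ErrOpen)

	fmt.Println(statusFor(wrapped))            // 503
	fmt.Println(statusFor(errors.New("boom"))) // 502
	fmt.Println(statusFor(nil))                // 200
}
```

The mapping lives in the inbound adapter, so the use case still has no breaker-specific branch.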

Wrapping the adapter

Now the wrapper adapter. It holds the real gateway and a breaker, and it satisfies the same PaymentGateway interface.

// adapter/breaker/gateway.go
package breaker

import (
    "context"

    "yourapp/port"
)

type Gateway struct {
    inner   port.PaymentGateway
    breaker *Breaker
}

func Wrap(
    inner port.PaymentGateway, b *Breaker,
) *Gateway {
    return &Gateway{inner: inner, breaker: b}
}

func (g *Gateway) Authorize(
    ctx context.Context, c port.Charge,
) (string, error) {
    var ref string
    err := g.breaker.Do(func() error {
        var e error
        ref, e = g.inner.Authorize(ctx, c)
        return e
    })
    return ref, err
}
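
A compile-time assertion can confirm that the wrapper really satisfies the port, and a fake inner gateway can show that an open breaker never reaches it. The sketch below is self-contained, so it redeclares the port types locally and replaces the real Breaker with a stubBreaker that can be forced open. stubBreaker and fakeGateway are hypothetical stand-ins, not part of the code in this post.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Charge struct {
	OrderID string
	Cents   int64
}

type PaymentGateway interface {
	Authorize(ctx context.Context, c Charge) (string, error)
}

var errOpen = errors.New("circuit breaker is open")

// stubBreaker stands in for the real Breaker: it has Do and a switch.
type stubBreaker struct{ open bool }

func (b *stubBreaker) Do(fn func() error) error {
	if b.open {
		return errOpen
	}
	return fn()
}

type Gateway struct {
	inner   PaymentGateway
	breaker *stubBreaker
}

// Fails to compile if *Gateway ever stops satisfying the port.
var _ PaymentGateway = (*Gateway)(nil)

func (g *Gateway) Authorize(ctx context.Context, c Charge) (string, error) {
	var ref string
	err := g.breaker.Do(func() error {
		var e error
		ref, e = g.inner.Authorize(ctx, c)
		return e
	})
	return ref, err
}

// fakeGateway counts how often the "network" is reached.
type fakeGateway struct{ calls int }

func (f *fakeGateway) Authorize(_ context.Context, c Charge) (string, error) {
	f.calls++
	return "ref-" + c.OrderID, nil
}

func main() {
	inner := &fakeGateway{}
	g := &Gateway{inner: inner, breaker: &stubBreaker{}}

	ref, err := g.Authorize(context.Background(), Charge{OrderID: "42", Cents: 1999})
	fmt.Println(ref, err, inner.calls) // ref-42 <nil> 1

	g.breaker.open = true
	ref, err = g.Authorize(context.Background(), Charge{OrderID: "43", Cents: 500})
	fmt.Printf("%q %v %d\n", ref, err, inner.calls) // "" circuit breaker is open 1
}
```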

Wiring happens in main, the only place that knows a breaker exists:

// main.go
raw := httpgw.New(providerURL, apiKey)
b := breaker.New(5, 10*time.Second)
gateway := breaker.Wrap(raw, b)

checkout := app.NewCheckout(gateway, orders)

Swap gateway for raw and the breaker is gone with no other change. The use case, the port, and the domain are untouched. That swap-in-one-place property is what you get for keeping the policy at the adapter.

Not every error should trip it

A 400 Bad Request for a malformed charge is not the provider being down. Counting it as a failure would trip the breaker on your own bug and take out a healthy dependency. Only count errors that mean the dependency is unhealthy: timeouts, connection refused, 5xx, context deadlines.

Have the inner adapter classify, and let the wrapper only trip on the ones that count:

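func (g *Gateway) Authorize(
    ctx context.Context, c port.Charge,
) (string, error) {
    var (
        ref     string
        callErr error
    )
    err := g.breaker.Do(func() error {
        ref, callErr = g.inner.Authorize(ctx, c)
        if callErr != nil && !isDownstreamFault(callErr) {
            // client-side error: do not count it
            // against the breaker
            return nil
        }
        return callErr
    })
    if err != nil {
        // ErrOpen, or a downstream fault the breaker counted
        return ref, err
    }
    // callErr is non-nil only for a client-side error
    return ref, callErr
}

The closure returns nil for a client-side error, so the breaker records the call as a success and the failure counter stays clean. The real error is captured in callErr, declared outside the closure, and handed back to the caller after Do returns. If the closure kept the error in a local variable and returned nil, the error would be lost and the caller would see a successful call. A 4xx from a bad request should not push you one step closer to an open circuit.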
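
The classifier itself, isDownstreamFault, depends on what the inner adapter returns. Here is a minimal sketch, assuming the HTTP adapter reports non-2xx responses as a StatusError carrying the status code. StatusError is a hypothetical type invented for this example, and treating unknown errors as "not a fault" is also an assumption you may want to flip. A cancelled context is handled separately from a deadline, since a caller giving up says nothing about the provider's health.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// StatusError is a hypothetical error the inner HTTP adapter could
// return for a non-2xx response. It is not part of the article's code.
type StatusError struct{ Code int }

func (e *StatusError) Error() string {
	return fmt.Sprintf("provider returned status %d", e.Code)
}

// isDownstreamFault reports whether err suggests the provider itself
// is unhealthy, as opposed to a bad request or a caller that gave up.
func isDownstreamFault(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.Canceled) {
		return false // the caller cancelled; says nothing about the provider
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var se *StatusError
	if errors.As(err, &se) {
		return se.Code >= 500
	}
	var ne net.Error
	if errors.As(err, &ne) {
		return true // dial failures, resets, network timeouts
	}
	return false // unknown errors are not counted (an assumption)
}

func main() {
	bad := fmt.Errorf("authorize: %w", &StatusError{Code: 400})
	down := fmt.Errorf("authorize: %w", &StatusError{Code: 503})
	refused := &net.OpError{Op: "dial", Net: "tcp", Err: errors.New("connection refused")}

	fmt.Println(isDownstreamFault(bad))                      // false
	fmt.Println(isDownstreamFault(down))                     // true
	fmt.Println(isDownstreamFault(refused))                  // true
	fmt.Println(isDownstreamFault(context.DeadlineExceeded)) // true
	fmt.Println(isDownstreamFault(context.Canceled))         // false
}
```

Because the checks use errors.Is and errors.As, they keep working when the adapter or the use case wraps the error with %w on the way up.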

Testing it without sleeping

Because the clock is injectable, a state-machine test runs instantly and deterministically. No time.Sleep, no flakiness.

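// adapter/breaker/breaker_test.go
package breaker_test

import (
    "errors"
    "testing"
    "time"

    "yourapp/adapter/breaker" // path assumed from the layout above
)

func TestBreakerOpensThenRecovers(t *testing.T) {
    now := time.Now()
    b := breaker.New(2, 10*time.Second)
    b.SetClock(func() time.Time { return now })

    fail := func() error { return errors.New("boom") }
    _ = b.Do(fail)
    _ = b.Do(fail) // second failure trips it

    // open: the call is rejected without running fail
    if err := b.Do(fail); !errors.Is(err, breaker.ErrOpen) {
        t.Fatalf("want ErrOpen, got %v", err)
    }

    now = now.Add(11 * time.Second) // cooldown passes
    ok := func() error { return nil }
    if err := b.Do(ok); err != nil {
        t.Fatalf("probe should pass, got %v", err)
    }

    // success in half-open closes the breaker, so the next
    // call reaches fn instead of bouncing with ErrOpen
    if err := b.Do(fail); errors.Is(err, breaker.ErrOpen) {
        t.Fatalf("breaker should be closed, got %v", err)
    }
}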

SetClock is the one-line helper from earlier that assigns b.now. Advancing now by hand walks the state machine through open, half-open, and closed in a few microseconds. That is the payoff of not spawning a background goroutine: the whole thing is a pure function of calls and time, and both are in the test's hands.

What you actually shipped

The domain still speaks in charges and orders. The use case still calls Authorize and handles an error. The breaker is a wrapper adapter you can add, remove, or tune from one place in main. When the payment provider starts timing out, the first few calls still wait for their timeouts. Once enough of them have failed to trip the breaker, your service returns an error in microseconds instead of parking a goroutine for 30 seconds, and it probes for recovery on its own. The failure mode from the top of this post (a slow dependency dragging your whole service down) is contained at the adapter. The code that models your business never had to learn why.

If this was useful

A breaker is a small state machine, and getting it right leans on Go fundamentals: mutex discipline, an injectable clock for testable time, error wrapping so classification survives the call stack. The Complete Guide to Go Programming covers those runtime and stdlib pieces in depth. Hexagonal Architecture in Go is about the other half of this post — keeping resilience at the adapter boundary so the use case and domain stay unaware of it.