DEV Community

kamal namdeo
Bulkhead Pattern - Go

Go Bulkhead Pattern — Complete Reference


Table of Contents

  1. What is the Bulkhead Pattern
  2. Three Mechanisms
  3. HTTP Transport Parameters — Deep Dive
  4. Little's Law — Sizing Semaphores Correctly
  5. Full Production Implementation
  6. Wiring It Up
  7. Correctly Sized Real-World Configs
  8. Mental Model Summary

1. What is the Bulkhead Pattern

Named after watertight compartments in ship hulls. If one compartment floods, the others stay dry.

In software: isolate resources per dependency so a failure or slowdown in one cannot exhaust shared resources and take down unrelated parts of the system.

Without bulkheads:

Auth service slows down
  → goroutines pile up waiting
  → shared thread/connection pool exhausted
  → Payment service also fails (never touched auth)
  → Entire service down

With bulkheads:

Auth service slows down
  → only auth's semaphore/pool fills up
  → auth calls get fast 503s
  → Payment service unaffected
  → Ship stays afloat

2. Three Mechanisms

Mechanism 1 — Semaphore (Concurrency Bulkhead)

Caps how many concurrent in-flight calls exist to a dependency at any moment.

var sem = semaphore.NewWeighted(20) // max 20 concurrent calls

func call(ctx context.Context) error {
    // sem.TryAcquire(1) would reject instantly when full (strict bulkhead).
    // sem.Acquire waits for a slot until acquireCtx expires (50ms here).
    acquireCtx, cancel := context.WithTimeout(ctx, 50*time.Millisecond)
    defer cancel()

    if err := sem.Acquire(acquireCtx, 1); err != nil {
        return fmt.Errorf("bulkhead full: %w", err) // fast 503
    }
    defer sem.Release(1)

    // make the actual call
    return nil
}

TryAcquire vs Acquire:

Method          | Behaviour               | Use when
────────────────┼─────────────────────────┼─────────────────────────
TryAcquire(1)   | Instant reject if full  | Strict load shedding
Acquire(ctx, 1) | Wait up to ctx deadline | Brief queuing acceptable

Each dependency gets its own independent semaphore — they do not share state.
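
The strict variant from the table can be sketched as follows. To keep the example stdlib-only, a buffered channel stands in for semaphore.Weighted; the names chanSem, callStrict and ErrBulkheadFull are illustrative assumptions, not part of any library.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBulkheadFull is a hypothetical sentinel error for shed calls.
var ErrBulkheadFull = errors.New("bulkhead full")

// chanSem is a counting semaphore backed by a buffered channel.
// It stands in for semaphore.Weighted so the sketch needs only the stdlib.
type chanSem chan struct{}

// TryAcquire takes a slot without blocking and reports whether it succeeded.
func (s chanSem) TryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot previously taken by TryAcquire.
func (s chanSem) Release() { <-s }

// callStrict runs fn only if a slot is free right now; otherwise it sheds the call.
func callStrict(sem chanSem, fn func() error) error {
	if !sem.TryAcquire() {
		return ErrBulkheadFull
	}
	defer sem.Release()
	return fn()
}

func main() {
	sem := make(chanSem, 2)
	sem.TryAcquire()
	sem.TryAcquire() // both slots are now held by "in-flight" calls

	err := callStrict(sem, func() error { return nil })
	fmt.Println(errors.Is(err, ErrBulkheadFull)) // true

	sem.Release() // one in-flight call finishes
	fmt.Println(callStrict(sem, func() error { return nil })) // <nil>
}
```

With semaphore.Weighted the same shape applies: check sem.TryAcquire(1) and return the rejection error instead of waiting.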


Mechanism 2 — Connection Pool (TCP Bulkhead)

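Caps how many TCP connections exist to each host. Without this, every dependency called through http.DefaultClient shares http.DefaultTransport, which sets no per-host connection ceiling (MaxConnsPerHost is 0, meaning unlimited), so one slow host can open connections without bound and churn the shared idle pool.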

transport := &http.Transport{
    MaxConnsPerHost:     20, // hard ceiling on total connections
    MaxIdleConnsPerHost: 20, // warm connections kept alive
    MaxIdleConns:        20, // global idle pool
}

Mechanism 3 — Timeouts (Blast Radius Bulkhead)

Even if concurrency is high, tight timeouts ensure slots are released quickly when a dependency misbehaves. This is your blast radius control knob.

client := &http.Client{
    Transport: transport,
    Timeout:   600 * time.Millisecond, // absolute end-to-end deadline
}

3. HTTP Transport Parameters — Deep Dive

transport := &http.Transport{
    MaxIdleConnsPerHost:   N,
    MaxConnsPerHost:       N,
    MaxIdleConns:          N,
    DialContext:           ...,
    TLSHandshakeTimeout:   ...,
    ResponseHeaderTimeout: ...,
    IdleConnTimeout:       ...,
}
client := &http.Client{
    Transport: transport,
    Timeout:   ...,
}

MaxConnsPerHost — Hard Ceiling

What:    Maximum total TCP connections (active + idle) to a single host.
         New requests BLOCK when this is hit (until a connection is freed or context times out).
Effect:  This IS the TCP-level bulkhead. Beyond this number, no new connections are opened.

How to set it: Match your semaphore's MaxConcurrent. There is no point allowing 50 concurrent calls if you only have 10 TCP connections — requests would block waiting for a free connection anyway.

MaxConnsPerHost: cfg.MaxConcurrent // keep these equal
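
To see the ceiling in action, here is a small experiment against a local httptest server (HTTP/1.1). The helper name peakInFlight and the chosen numbers are assumptions for illustration; the server records how many requests it is handling at once.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// peakInFlight fires n concurrent GETs through a client whose transport allows
// at most maxConns connections per host, and returns the highest number of
// requests the test server saw in flight at once.
func peakInFlight(maxConns, n int) int64 {
	var inFlight, peak atomic.Int64

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		cur := inFlight.Add(1)
		defer inFlight.Add(-1)
		for { // record the high-water mark
			old := peak.Load()
			if cur <= old || peak.CompareAndSwap(old, cur) {
				break
			}
		}
		time.Sleep(30 * time.Millisecond) // simulate a slow dependency
	}))
	defer srv.Close()

	client := &http.Client{
		Transport: &http.Transport{MaxConnsPerHost: maxConns},
		Timeout:   5 * time.Second,
	}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Get(srv.URL)
			if err != nil {
				return
			}
			io.Copy(io.Discard, resp.Body) // drain so the connection is reused
			resp.Body.Close()
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	// Eight callers, two connections: the server never sees more than two at once.
	fmt.Println(peakInFlight(2, 8) <= 2) // true
}
```

The remaining callers block inside the transport until a connection frees up, which is why an equal-sized semaphore in front of it adds the fast-rejection behaviour the transport alone does not give you.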

MaxIdleConnsPerHost — Warm Connection Cache

What:    How many idle (reusable) connections to keep open to a single host after a request completes.
Effect:  Avoids TCP+TLS handshake overhead on the next request. Higher = faster reuse, more FDs held open.

How to set it:

  • Set equal to or slightly below MaxConnsPerHost
  • For internal services (fast, high RPS): match MaxConnsPerHost exactly — you want all connections warm
  • For external APIs (slow, low RPS): can be lower since connections are held for longer and fewer are needed warm simultaneously
MaxIdleConnsPerHost = MaxConnsPerHost       // internal, high-throughput
MaxIdleConnsPerHost = MaxConnsPerHost / 2   // external, slow/bursty

MaxIdleConns — Global Idle Pool

What:    Total idle connections across ALL hosts combined.
Effect:  If this is too low, connections for one host get evicted to make room for another,
         causing unexpected TCP reconnects even though per-host limits haven't been hit.

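How to set it: Sum of the per-host idle limits for every host served by this Transport. If several bulkheads share one Transport, add their limits up; when each bulkhead owns its Transport (as in section 5), the sum is just that bulkhead's own MaxIdleConnsPerHost.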

// If you have 3 bulkheads: auth(20) + payment(10) + inventory(15)
MaxIdleConns: 20 + 10 + 15, // = 45, avoids cross-host eviction

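Common mistake: Leaving this at 100 (the value http.DefaultTransport uses) while raising MaxIdleConnsPerHost to 100 for several hosts. The global cap then silently limits what you think are independent pools. In a hand-built Transport, a zero MaxIdleConns means no limit.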


ResponseHeaderTimeout — First Byte Timeout

What:    Time allowed between fully writing the request (including its body) and receiving the response headers.
Effect:  The most important timeout for catching slow/hung servers.
         Does NOT include time to read the response body.

How to set it: Slightly above your p99 latency for this dependency. This is the "server is stuck" detector.

External API p99 = 500ms → ResponseHeaderTimeout = 600ms
Internal svc p99 = 5ms   → ResponseHeaderTimeout = 20ms
Database p99     = 20ms  → ResponseHeaderTimeout = 50ms

http.Client.Timeout — Absolute End-to-End Deadline

What:    Time from request initiation to reading the last byte of response body.
         Includes: DNS + TCP dial + TLS + write request + wait for headers + read body.
Effect:  The outer hard deadline. Cancels the entire request if exceeded.

How to set it: ResponseHeaderTimeout + estimated body read time. Always set this — without it you can leak goroutines indefinitely.

ResponseHeaderTimeout = 600ms  (detect slow server)
body read estimate    = 200ms  (for your expected payload size)
Client.Timeout        = 800ms  (their sum; add a small margin if body size varies)
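
The difference between the two timeouts shows up when a server answers with headers promptly and then stalls on the body. The sketch below reproduces that against a local httptest server; fetchStalledBody and the durations are illustrative assumptions.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchStalledBody calls a test server that sends headers at once and then
// stalls before writing any body. It reports whether headers arrived and
// returns the error (if any) from reading the body.
func fetchStalledBody(headerTimeout, clientTimeout time.Duration) (gotHeaders bool, readErr error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		w.(http.Flusher).Flush() // headers reach the client immediately
		select {                 // stall the body until the client gives up
		case <-r.Context().Done():
		case <-time.After(2 * time.Second):
		}
	}))
	defer srv.Close()

	client := &http.Client{
		Transport: &http.Transport{ResponseHeaderTimeout: headerTimeout},
		Timeout:   clientTimeout,
	}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	// ResponseHeaderTimeout is already satisfied; only Client.Timeout
	// can interrupt this read.
	_, readErr = io.ReadAll(resp.Body)
	return true, readErr
}

func main() {
	ok, err := fetchStalledBody(200*time.Millisecond, 400*time.Millisecond)
	fmt.Println(ok, err != nil) // true true
}
```

Headers arrive well inside ResponseHeaderTimeout, so Get succeeds; the body read is then cut off by Client.Timeout, which is the only one of the two that covers it.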

DialContext.Timeout — TCP Handshake Limit

What:    Time allowed to establish the TCP connection itself.
Effect:  Catches unreachable hosts fast. Completely independent of request latency.

How to set it: 1–3 seconds universally. TCP handshake should be near-instant on healthy networks; longer means routing problem, not slowness.

DialContext: (&net.Dialer{
    Timeout:   2 * time.Second,  // TCP dial hard limit
    KeepAlive: 30 * time.Second, // TCP keepalive probes
}).DialContext,

TLSHandshakeTimeout — TLS Negotiation Limit

What:    Time allowed for TLS handshake after TCP is established.
Effect:  Catches TLS issues (expired certs, overloaded TLS terminator) separately from request latency.

How to set it: 3–5 seconds. A TLS handshake costs one or two round trips plus crypto; 5s is generous but safe.

TLSHandshakeTimeout: 5 * time.Second,

IdleConnTimeout — Stale Connection Eviction

What:    How long an idle connection is kept in the pool before being closed.
Effect:  Prevents holding open TCP connections that the server has already closed (common after 60–90s).

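How to set it: Below the idle timeout of the server (or the load balancer in front of it), so the client drops a connection before the far side does. That prevents "connection reset" errors on reuse. Server-side idle timeouts are commonly in the 60–90s range, so check the actual value for each dependency.

IdleConnTimeout: 90 * time.Second, // the http.DefaultTransport value; lower it if the server's idle timeout is shorter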

QueueTimeout — Semaphore Wait Budget

What:    How long a request waits for a semaphore slot before being rejected.
Effect:  Controls the queue depth in time. Even if a slot frees in 10ms, you might not want to wait.

How to set it: Based on your own SLA minus downstream latency.

Your SLA:              500ms total
Downstream p99:        200ms
Budget for queuing:    500 - 200 - 50(overhead) = 250ms
QueueTimeout:          ~100ms (conservative, leaves room for retries)

  • For slow external APIs: lower QueueTimeout (fail fast, don't queue)
  • For fast internal services: even lower (if pool is full, something is wrong)
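
The budget arithmetic above can be written down as a small helper. This is a sketch only: queueBudget, queueTimeout and the 40% safety fraction are assumptions chosen to reproduce the 250ms and ~100ms figures in the example.

```go
package main

import (
	"fmt"
	"time"
)

// queueBudget returns the time left for waiting on a semaphore slot once the
// downstream p99 and a fixed overhead are subtracted from the caller's SLA.
// It returns 0 when there is no room to queue at all.
func queueBudget(sla, downstreamP99, overhead time.Duration) time.Duration {
	budget := sla - downstreamP99 - overhead
	if budget < 0 {
		return 0
	}
	return budget
}

// queueTimeout keeps only a percentage of the budget (a hypothetical safety
// fraction) so that the rest stays available for retries.
func queueTimeout(budget time.Duration, percent int64) time.Duration {
	return budget * time.Duration(percent) / 100
}

func main() {
	b := queueBudget(500*time.Millisecond, 200*time.Millisecond, 50*time.Millisecond)
	fmt.Println(b)                   // 250ms
	fmt.Println(queueTimeout(b, 40)) // 100ms
}
```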


All Parameters at a Glance

Parameter             | Controls                  | Rule of Thumb
──────────────────────┼───────────────────────────┼───────────────────────────────────────────
MaxConnsPerHost       | TCP connection ceiling    | MaxConcurrent
MaxIdleConnsPerHost   | Warm connection cache     | MaxConnsPerHost (internal), ÷2 (external)
MaxIdleConns          | Global idle pool          | = sum of all per-host idle limits
ResponseHeaderTimeout | "Server is slow" detector | p99 latency × 1.2
Client.Timeout        | Absolute end-to-end limit | ResponseHeaderTimeout + body read estimate
DialContext.Timeout   | TCP dial limit            | 1–3s universally
TLSHandshakeTimeout   | TLS negotiation limit     | 3–5s universally
IdleConnTimeout       | Stale connection eviction | 60–90s (below server's idle timeout)
QueueTimeout          | Semaphore wait budget     | Your SLA − p99 − overhead

4. Little's Law — Sizing Semaphores Correctly

L = λ × W

L = concurrent requests in-flight (what your semaphore controls)
λ = throughput (requests per second you want to sustain)
W = average time per request (latency in seconds); the examples below plug in p99 for a conservative estimate

Key Insight

Slower dependency = MORE semaphore slots needed to sustain the same RPS.

Target: 100 rps

Auth svc  p99=100ms → L = 100 × 0.1 = 10 concurrent  (fast, slots free quickly)
Payment   p99=500ms → L = 100 × 0.5 = 50 concurrent  (slow, slots held longer)

This is counterintuitive but correct: a slow service holds each semaphore slot for longer, so you need more slots to keep throughput flowing.

What Happens When Semaphore Is Too Small

Payment svc p99=500ms, target 100rps, but MaxConcurrent=10:

Max throughput = 10 / 0.5 = 20 rps   ← you're throttling yourself to 20%

What Happens When Semaphore Is Too Large

More goroutines can pile up waiting on a misbehaving dep, so more memory and connections are tied up.
How long they stay stuck is bounded by the TIMEOUT, so that is the primary blast radius knob; concurrency only scales it.

The Correct Formula

min_concurrent = target_rps × p99_latency_seconds
semaphore_limit = min_concurrent × 1.3   // +30% headroom for bursts
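
As a worked version of the formula, the sketch below computes the limit with integer math and rounds up. The function names and the floor/cap helper are assumptions for illustration; the inputs are the numbers used elsewhere in this reference.

```go
package main

import (
	"fmt"
	"time"
)

// semaphoreLimit applies Little's Law (L = λ × W) with burst headroom:
// ceil(targetRPS × p99 × (1 + headroomPct/100)). Integer math avoids
// floating-point surprises around exact multiples.
func semaphoreLimit(targetRPS int64, p99 time.Duration, headroomPct int64) int64 {
	num := targetRPS * p99.Microseconds() * (100 + headroomPct)
	den := int64(1_000_000 * 100)
	return (num + den - 1) / den // ceiling division
}

// clamp enforces a practical floor and a hard cap (for example a per-instance
// share of a database's connection limit).
func clamp(limit, floor, hardCap int64) int64 {
	if limit < floor {
		limit = floor
	}
	if limit > hardCap {
		limit = hardCap
	}
	return limit
}

// maxThroughput is Little's Law rearranged: λ = L / W, in requests per second.
func maxThroughput(limit int64, latency time.Duration) float64 {
	return float64(limit) / latency.Seconds()
}

func main() {
	fmt.Println(semaphoreLimit(30, 500*time.Millisecond, 30)) // 20
	fmt.Println(semaphoreLimit(500, 5*time.Millisecond, 30))  // 4
	fmt.Println(clamp(4, 10, 1000))                           // 10
	fmt.Println(maxThroughput(10, 500*time.Millisecond))      // 20
}
```

Rounding up is a deliberate choice here: a fractional slot cannot be allocated, and rounding down would undersize the bulkhead.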

NumCPU() is Wrong for IO-Bound Bulkheads

runtime.NumCPU() is the correct pool size only for CPU-bound work (hashing, compression, image processing) where goroutines burn CPU the entire time.

For IO-bound calls (HTTP, DB, Kafka): goroutines spend ~99% of their time parked waiting on the network. The Go scheduler puts them to sleep and runs other goroutines on the same OS thread. You can sustain thousands of concurrent IO goroutines on a 4-core machine.

IO call timeline:
[send ~50µs] [====== blocked waiting ~200ms ======] [read ~50µs]
                        ↑
              goroutine is PARKED here, OS thread runs other work

→ semaphore limit is a downstream capacity question, not a CPU question
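
A quick way to convince yourself of this is to park a few thousand goroutines at once. In the sketch below time.Sleep stands in for a blocked network call (an assumption made so the example needs no server); runParked is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runParked starts n goroutines that each "wait on the network" for d
// (time.Sleep stands in for a blocked IO call) and returns the wall-clock
// time until all of them have finished.
func runParked(n int, d time.Duration) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(d) // the goroutine is parked; it does not occupy an OS thread
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	elapsed := runParked(5000, 100*time.Millisecond)
	// Run one after another this would take 500s; parked goroutines overlap
	// almost entirely, regardless of the number of CPU cores.
	fmt.Println(elapsed < 5*time.Second)
}
```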

5. Full Production Implementation

package bulkhead

import (
    "context"
    "fmt"
    "net"
    "net/http"
    "sync/atomic"
    "time"

    "golang.org/x/sync/semaphore"
)

// Config defines the bulkhead parameters for one downstream dependency.
type Config struct {
    Name           string
    MaxConcurrent  int64         // semaphore size — derived from Little's Law: target_rps × p99_latency
    MaxConnections int           // TCP connection ceiling — keep equal to MaxConcurrent
    RequestTimeout time.Duration // absolute end-to-end HTTP deadline (blast radius knob)
    QueueTimeout   time.Duration // how long to wait for a semaphore slot before rejecting
}

// Metrics holds live counters. All fields use atomic ops — safe to read from any goroutine.
type Metrics struct {
    Accepted atomic.Int64 // requests that acquired a semaphore slot
    Rejected atomic.Int64 // requests rejected because bulkhead was full
    Errors   atomic.Int64 // requests that acquired a slot but the HTTP call failed
}

// Bulkhead combines a per-dependency semaphore and an isolated HTTP client.
// Each instance is independent — a saturated bulkhead has zero effect on others.
type Bulkhead struct {
    cfg     Config
    sem     *semaphore.Weighted // Gate 1: concurrency cap
    client  *http.Client        // Gate 2: connection pool cap + timeouts
    Metrics Metrics
}

// New creates a Bulkhead. Call once per downstream dependency at startup.
func New(cfg Config) *Bulkhead {
    transport := &http.Transport{
        // --- Gate 2: TCP-level bulkhead ---
        MaxConnsPerHost:     cfg.MaxConnections,        // hard ceiling: new reqs block when hit
        MaxIdleConnsPerHost: cfg.MaxConnections,        // warm connections: avoids TCP+TLS overhead
        MaxIdleConns:        cfg.MaxConnections,        // global pool: this transport serves one dependency, so the per-host value is enough

        // --- TCP dial ---
        DialContext: (&net.Dialer{
            Timeout:   2 * time.Second,  // TCP handshake hard limit (unreachable host detection)
            KeepAlive: 30 * time.Second, // TCP keepalive probe interval
        }).DialContext,

        // --- TLS ---
        TLSHandshakeTimeout: 5 * time.Second, // TLS negotiation limit

        // --- Request lifecycle ---
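        // ResponseHeaderTimeout: time between fully writing the request and receiving the response headers.
        // This is the "server is hung" detector. Set to p99 × 1.2.
        // Note: this simplified Config reuses RequestTimeout for Client.Timeout as well, so no extra
        // budget is left for reading the body; add a separate field if bodies are large.
        ResponseHeaderTimeout: cfg.RequestTimeout,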

        // Evict idle connections before the server closes them (~90s on most servers).
        IdleConnTimeout: 90 * time.Second,
    }

    return &Bulkhead{
        cfg: cfg,
        sem: semaphore.NewWeighted(cfg.MaxConcurrent),
        client: &http.Client{
            Transport: transport,
            // Absolute deadline covering DNS+TCP+TLS+write+headers+body.
            // Always set — without this, goroutines can leak indefinitely.
            Timeout: cfg.RequestTimeout,
        },
    }
}

// Do executes an HTTP request through the bulkhead.
// Returns an error if no semaphore slot frees up within QueueTimeout.
func (b *Bulkhead) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
    // --- Gate 1: Semaphore ---
    // Use a child context with QueueTimeout so we don't wait forever for a slot.
    acquireCtx, cancel := context.WithTimeout(ctx, b.cfg.QueueTimeout)
    defer cancel()

    if err := b.sem.Acquire(acquireCtx, 1); err != nil {
        b.Metrics.Rejected.Add(1)
        return nil, fmt.Errorf("bulkhead [%s] rejected (semaphore full after %s): %w",
            b.cfg.Name, b.cfg.QueueTimeout, err)
    }
    defer b.sem.Release(1) // always release, even on HTTP error; note this runs when Do returns, before the caller reads the body

    b.Metrics.Accepted.Add(1)

    // --- Gate 2: HTTP client (connection pool + timeouts) ---
    resp, err := b.client.Do(req.WithContext(ctx))
    if err != nil {
        b.Metrics.Errors.Add(1)
        return nil, fmt.Errorf("bulkhead [%s] call failed: %w", b.cfg.Name, err)
    }

    return resp, nil
}

// Stats returns a human-readable snapshot of metrics.
func (b *Bulkhead) Stats() string {
    return fmt.Sprintf("[%s] accepted=%d rejected=%d errors=%d",
        b.cfg.Name,
        b.Metrics.Accepted.Load(),
        b.Metrics.Rejected.Load(),
        b.Metrics.Errors.Load(),
    )
}
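
One caveat with Do as written: the deferred Release runs when Do returns, so the slot is free again while the caller is still reading the response body. If the slot should cover the body read too, one possible variation is to wrap the body and release on Close. The sketch below is an assumption-laden illustration, not part of the implementation above; releaseOnClose, holdUntilClosed and newDemoBody are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// releaseOnClose wraps a response body so that a bulkhead slot is released
// when the caller closes the body rather than when the request function returns.
type releaseOnClose struct {
	io.ReadCloser
	once    sync.Once
	release func() // e.g. func() { sem.Release(1) }
}

// Close closes the underlying body and releases the slot exactly once,
// even if Close is called several times.
func (r *releaseOnClose) Close() error {
	err := r.ReadCloser.Close()
	r.once.Do(r.release)
	return err
}

// holdUntilClosed returns body wrapped so that release runs on Close.
func holdUntilClosed(body io.ReadCloser, release func()) io.ReadCloser {
	return &releaseOnClose{ReadCloser: body, release: release}
}

// newDemoBody builds an in-memory body for the demo below.
func newDemoBody(s string) io.ReadCloser {
	return io.NopCloser(strings.NewReader(s))
}

func main() {
	released := 0
	body := holdUntilClosed(newDemoBody("payload"), func() { released++ })

	data, _ := io.ReadAll(body)
	fmt.Println(string(data), released) // payload 0
	body.Close()
	body.Close() // a second Close must not release twice
	fmt.Println(released) // 1
}
```

In Do, this would mean releasing directly on the error paths and, on success, assigning the wrapped body to resp.Body instead of deferring the Release. The trade-off is that a caller who forgets to close the body now leaks a slot as well as a connection.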

Why a Separate Semaphore per Dependency is Correct

authBulkhead.sem   ──► controls only auth calls
paymentBulkhead.sem ──► controls only payment calls

Auth saturated:
  authBulkhead.sem full → auth calls rejected
  paymentBulkhead.sem unaffected → payment calls proceed normally

A shared semaphore across all dependencies defeats the purpose entirely.
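
The contrast between isolated and shared semaphores can be reproduced in a few lines. As an assumption for the sake of a stdlib-only example, buffered channels stand in for semaphore.Weighted; the slots type and saturate helper are illustrative.

```go
package main

import "fmt"

// slots is a counting semaphore backed by a buffered channel
// (a stdlib stand-in for semaphore.Weighted).
type slots chan struct{}

// tryAcquire takes a slot without blocking and reports whether it succeeded.
func (s slots) tryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// saturate takes every remaining slot, simulating calls stuck on a hung dependency.
func saturate(s slots) {
	for s.tryAcquire() {
	}
}

func main() {
	auth, payment := make(slots, 2), make(slots, 2)
	shared := make(slots, 4) // anti-pattern: one pool for both dependencies

	saturate(auth)   // auth hangs: only its own pool fills up
	saturate(shared) // with a shared pool, the same hang fills everything

	fmt.Println(auth.tryAcquire())    // false: auth calls are rejected
	fmt.Println(payment.tryAcquire()) // true: payment still has capacity
	fmt.Println(shared.tryAcquire())  // false: payment would be starved too
}
```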


6. Wiring It Up

package main

import (
    "context"
    "fmt"
    "io"
    "net/http"
    "time"

    "yourmodule/bulkhead"
)

// Declare one bulkhead per downstream dependency at package level.
// These are long-lived, safe to use concurrently.
// authConfig and paymentConfig return bulkhead.Config values (see section 7 for
// examples); they are assumed to live elsewhere in this file, which is why "time" is imported.
var (
    authBulkhead    = bulkhead.New(authConfig())
    paymentBulkhead = bulkhead.New(paymentConfig())
)

func HandleOrder(ctx context.Context) error {
    // Auth call: isolated behind its own semaphore + connection pool
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://auth.internal/verify", nil)
    if err != nil {
        return fmt.Errorf("build auth request: %w", err)
    }
    resp, err := authBulkhead.Do(ctx, req)
    if err != nil {
        // Bulkhead full or auth down: fail this request fast.
        // Payment bulkhead is completely unaffected.
        return fmt.Errorf("auth unavailable: %w", err)
    }
    defer resp.Body.Close()
    io.Copy(io.Discard, resp.Body) // always drain body to return connection to pool

    // Payment call: completely isolated from auth
    req2, err := http.NewRequestWithContext(ctx, http.MethodPost, "https://payments.internal/charge", nil)
    if err != nil {
        return fmt.Errorf("build payment request: %w", err)
    }
    resp2, err := paymentBulkhead.Do(ctx, req2)
    if err != nil {
        return fmt.Errorf("payment unavailable: %w", err)
    }
    defer resp2.Body.Close()
    io.Copy(io.Discard, resp2.Body)

    return nil
}

Important: Always io.Copy(io.Discard, resp.Body) before closing. If you close an HTTP/1.x response without draining it, Go cannot reuse the TCP connection; it is discarded, which forces a new TCP (and TLS) handshake on a later request.
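
Draining has a cost of its own when the unread body is large, so a bounded drain is a common compromise. The helper below is a sketch under that assumption; drainAndClose, the 64 KiB cap and newDemoBody are illustrative names and values, not part of net/http.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// maxDrain is a hypothetical cap on how much of an unread body we are willing
// to read just to make the connection reusable.
const maxDrain = 64 << 10 // 64 KiB

// drainAndClose discards up to maxDrain bytes of body and then closes it.
// It returns the number of bytes discarded.
func drainAndClose(body io.ReadCloser) (int64, error) {
	n, copyErr := io.Copy(io.Discard, io.LimitReader(body, maxDrain))
	closeErr := body.Close()
	if copyErr != nil {
		return n, copyErr
	}
	return n, closeErr
}

// newDemoBody builds an in-memory body of the given size for the demo below.
func newDemoBody(size int) io.ReadCloser {
	return io.NopCloser(strings.NewReader(strings.Repeat("x", size)))
}

func main() {
	n, err := drainAndClose(newDemoBody(14))
	fmt.Println(n, err) // 14 <nil>

	n, _ = drainAndClose(newDemoBody(1 << 20)) // 1 MiB left unread
	fmt.Println(n)                             // 65536
}
```

Used as `defer drainAndClose(resp.Body)`, small leftovers are drained and the connection goes back to the pool; for bodies beyond the cap the connection is simply dropped, which is usually cheaper than reading megabytes nobody needs.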


7. Correctly Sized Real-World Configs

Sizing Formula

min_concurrent  = target_rps × p99_latency_seconds
semaphore_limit = min_concurrent × 1.3   // +30% burst headroom
MaxConnections  = semaphore_limit        // keep aligned
RequestTimeout  = p99_latency × 1.2     // slightly above p99, blast radius knob
QueueTimeout    = your_SLA - p99 - overhead_budget

External HTTP API (slow, p99=500ms, target 30 rps)

// Little's Law: 30 × 0.5 = 15, +30% = 20
// Blast radius: controlled by tight RequestTimeout (600ms), NOT by low concurrency
// Slower dep → more slots needed to sustain throughput
func externalAPIConfig() bulkhead.Config {
    return bulkhead.Config{
        Name:           "external-auth-api",
        MaxConcurrent:  20,                       // L = 30rps × 0.5s = 15, +30% = 20
        MaxConnections: 20,
        RequestTimeout: 600 * time.Millisecond,   // p99(500ms) × 1.2 — blast radius knob
        QueueTimeout:   100 * time.Millisecond,   // don't queue long for slow dep
    }
}

Internal Microservice (fast, p99=5ms, target 500 rps)

// Little's Law: 500 × 0.005 = 2.5 → floor at ~10 (connection overhead)
// Faster dep → naturally fewer slots needed, slots free in milliseconds
func internalServiceConfig() bulkhead.Config {
    return bulkhead.Config{
        Name:           "inventory-internal",
        MaxConcurrent:  10,                       // L = 500rps × 0.005s = 2.5, floor at 10
        MaxConnections: 10,
        RequestTimeout: 20 * time.Millisecond,    // tight: it's internal, same DC
        QueueTimeout:   5 * time.Millisecond,     // fail fast: if pool busy, something is wrong
    }
}

PostgreSQL (p99=20ms, target 200 rps, 5 app instances)

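// Little's Law: 200 × 0.02 = 4 per instance
// Hard constraint: Postgres max_connections=100, shared across 5 instances → 20 per instance
// Use: max(Little's Law result × 1.3, safety floor) but never exceed hard cap
// Note: Postgres is not called through http.Client; with database/sql the same numbers map to
// db.SetMaxOpenConns(MaxConnections) and per-query context deadlines.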
func postgresConfig() bulkhead.Config {
    return bulkhead.Config{
        Name:           "postgres",
        MaxConcurrent:  10,                       // L=4, ×1.3=5.2, safety floor=10, hard cap=20 ✓
        MaxConnections: 10,
        RequestTimeout: 100 * time.Millisecond,   // DB should be fast; slow DB = problem
        QueueTimeout:   20 * time.Millisecond,
    }
}

Kafka Producer (p99=2ms, target 2000 rps)

// Little's Law: 2000 × 0.002 = 4 concurrent
// Kafka batches internally — keep TCP connections low, semaphore has headroom for bursts
func kafkaConfig() bulkhead.Config {
    return bulkhead.Config{
        Name:           "kafka-producer",
        MaxConcurrent:  10,                       // L=4, headroom for burst
        MaxConnections: 3,                        // Kafka multiplexes internally; few TCP conns enough
        RequestTimeout: 50 * time.Millisecond,
        QueueTimeout:   5 * time.Millisecond,
    }
}

Summary Table

Dependency   | p99   | Target RPS | L=λ×W | +30% | MaxConcurrent | RequestTimeout | QueueTimeout
─────────────┼───────┼────────────┼───────┼──────┼───────────────┼────────────────┼─────────────
External API | 500ms | 30         | 15    | 20   | 20            | 600ms          | 100ms
Internal svc | 5ms   | 500        | 2.5   | 4*   | 10*           | 20ms           | 5ms
Postgres     | 20ms  | 200        | 4     | 5**  | 10**          | 100ms          | 20ms
Kafka        | 2ms   | 2000       | 4     | 5    | 10            | 50ms           | 5ms

*  floor applied: connection establishment overhead makes sub-10 impractical
** floor applied, hard cap from Postgres max_connections / num_instances

8. Mental Model Summary

Two Laws, Two Knobs

Little's Law          → sets MaxConcurrent
  slow dep = MORE slots (slots held longer)
  fast dep = FEWER slots (slots released quickly)

Blast Radius Control  → sets RequestTimeout + QueueTimeout
  NOT concurrency — tight timeouts release slots fast
  even if MaxConcurrent is high, a 600ms timeout limits damage

The Failure Isolation Guarantee

Normal:                         Auth saturated:

User → Auth    ✓               User → Auth    ✗ (semaphore full → instant 503)
User → Payment ✓               User → Payment ✓ (own semaphore, unaffected)
User → S3      ✓               User → S3      ✓ (own semaphore, unaffected)

What Each Layer Protects Against

Layer                   | Protects against
────────────────────────┼──────────────────────────────────────────
Semaphore               | Goroutine pile-up from slow/hung deps
Connection pool         | TCP connection exhaustion from one host
ResponseHeaderTimeout   | Server that accepts connection but never responds
Client.Timeout          | Goroutine leak from infinite response body reads
QueueTimeout            | Cascading slowdowns from queue buildup

Bulkhead vs Circuit Breaker

These are complementary:

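Pattern         | Question it answers                | Action
────────────────┼────────────────────────────────────┼────────────────────────────────────────
Bulkhead        | "How much capacity do I allocate?" | Limits concurrent slots per dep
Circuit Breaker | "Should I even try this call?"     | Opens when error rate exceeds threshold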

Use both together: bulkhead limits blast radius, circuit breaker stops calling a dep that's known to be down.


Reference covers: golang.org/x/sync/semaphore, net/http.Transport, Little's Law (L=λW), Go scheduler IO parking, per-dependency isolation.

Note

So the minimal complete config for a single-host (not shared among multiple deps) bulkhead transport is:

transport := &http.Transport{
    MaxConnsPerHost:     20,               // bulkhead ceiling
    MaxIdleConnsPerHost: 20,               // full connection reuse
    IdleConnTimeout:     90 * time.Second, // evict before server does

    DialContext: (&net.Dialer{
        Timeout:   2 * time.Second,
        KeepAlive: 30 * time.Second,
    }).DialContext,
    TLSHandshakeTimeout:   5 * time.Second,
    ResponseHeaderTimeout: cfg.RequestTimeout,
}

Everything else (notably MaxIdleConns, the global limit without the PerHost suffix) is either irrelevant for single-host transports or has a safe default for this use case.
