Kamal Namdeo

Posted on Jun 14

Bulkhead Pattern - Go

#architecture #distributedsystems #go #systemdesign

Go Bulkhead Pattern — Complete Reference

What is the Bulkhead Pattern
Three Mechanisms
HTTP Transport Parameters — Deep Dive
Little's Law — Sizing Semaphores Correctly
Full Production Implementation
Wiring It Up
Correctly Sized Real-World Configs
Mental Model Summary

1. What is the Bulkhead Pattern

Named after watertight compartments in ship hulls. If one compartment floods, the others stay dry.

In software: isolate resources per dependency so a failure or slowdown in one cannot exhaust shared resources and take down unrelated parts of the system.

Without bulkheads:

Auth service slows down
  → goroutines pile up waiting
  → shared thread/connection pool exhausted
  → Payment service also fails (never touched auth)
  → Entire service down

With bulkheads:

Auth service slows down
  → only auth's semaphore/pool fills up
  → auth calls get fast 503s
  → Payment service unaffected
  → Ship stays afloat

2. Three Mechanisms

Mechanism 1 — Semaphore (Concurrency Bulkhead)

Caps how many concurrent in-flight calls exist to a dependency at any moment.

var sem = semaphore.NewWeighted(20) // max 20 concurrent calls

func call(ctx context.Context) error {
    // TryAcquire = instant rejection if full (true bulkhead)
    // Acquire    = waits up to QueueTimeout for a slot
    acquireCtx, cancel := context.WithTimeout(ctx, 50*time.Millisecond)
    defer cancel()

    if err := sem.Acquire(acquireCtx, 1); err != nil {
        return fmt.Errorf("bulkhead full: %w", err) // fast 503
    }
    defer sem.Release(1)

    // make the actual call here and return its error
    return nil
}

TryAcquire vs Acquire:

Method	Behaviour	Use when
`TryAcquire(1)`	Instant reject if full	Strict load shedding
`Acquire(ctx, 1)`	Wait up to ctx deadline	Brief queuing acceptable

Each dependency gets its own independent semaphore — they do not share state.
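As a rough, dependency-free illustration of the strict (TryAcquire-style) mode and of per-dependency isolation, the sketch below swaps semaphore.Weighted for a buffered channel. The type slots and the variables auth and payment are made up for this example; they are not part of golang.org/x/sync/semaphore.

```go
package main

import "fmt"

// slots is a hypothetical stand-in for semaphore.Weighted: a buffered
// channel whose capacity is the concurrency limit.
type slots chan struct{}

// tryAcquire mirrors TryAcquire(1): it never blocks and reports
// whether a slot was taken.
func (s slots) tryAcquire() bool {
    select {
    case s <- struct{}{}:
        return true
    default:
        return false // full: shed the call immediately
    }
}

// release frees one previously acquired slot.
func (s slots) release() { <-s }

func main() {
    auth := make(slots, 2)    // one semaphore per dependency
    payment := make(slots, 2) // shares no state with auth

    auth.tryAcquire()
    auth.tryAcquire() // auth is now saturated

    fmt.Println("auth accepts:", auth.tryAcquire())       // false
    fmt.Println("payment accepts:", payment.tryAcquire()) // true
}
```

Saturating auth changes nothing for payment because the two channels share no state, which is the whole point of one semaphore per dependency.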

Mechanism 2 — Connection Pool (TCP Bulkhead)

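Caps how many TCP connections exist to each host. Without this, every call that goes through http.DefaultTransport shares one pool with no per-host ceiling (MaxConnsPerHost defaults to 0, meaning unlimited), so one slow host can keep opening connections until file descriptors or ephemeral ports run out for everyone.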

transport := &http.Transport{
    MaxConnsPerHost:     20, // hard ceiling on total connections
    MaxIdleConnsPerHost: 20, // warm connections kept alive
    MaxIdleConns:        20, // global idle pool
}

Mechanism 3 — Timeouts (Blast Radius Bulkhead)

Even if concurrency is high, tight timeouts ensure slots are released quickly when a dependency misbehaves. This is your blast radius control knob.

client := &http.Client{
    Transport: transport,
    Timeout:   600 * time.Millisecond, // absolute end-to-end deadline
}

3. HTTP Transport Parameters — Deep Dive

transport := &http.Transport{
    MaxIdleConnsPerHost:   N,
    MaxConnsPerHost:       N,
    MaxIdleConns:          N,
    DialContext:           ...,
    TLSHandshakeTimeout:   ...,
    ResponseHeaderTimeout: ...,
    IdleConnTimeout:       ...,
}
client := &http.Client{
    Transport: transport,
    Timeout:   ...,
}

`MaxConnsPerHost` — Hard Ceiling

What:    Maximum total TCP connections (active + idle) to a single host.
         New requests BLOCK when this is hit (until a connection is freed or context times out).
Effect:  This IS the TCP-level bulkhead. Beyond this number, no new connections are opened.

How to set it: Match your semaphore's MaxConcurrent. There is no point allowing 50 concurrent calls if you only have 10 TCP connections — requests would block waiting for a free connection anyway.

MaxConnsPerHost: cfg.MaxConcurrent // keep these equal

`MaxIdleConnsPerHost` — Warm Connection Cache

What:    How many idle (reusable) connections to keep open to a single host after a request completes.
Effect:  Avoids TCP+TLS handshake overhead on the next request. Higher = faster reuse, more FDs held open.

How to set it:

Set equal to or slightly below MaxConnsPerHost
For internal services (fast, high RPS): match MaxConnsPerHost exactly — you want all connections warm
For external APIs (slow, low RPS): can be lower since connections are held for longer and fewer are needed warm simultaneously

MaxIdleConnsPerHost = MaxConnsPerHost       // internal, high-throughput
MaxIdleConnsPerHost = MaxConnsPerHost / 2   // external, slow/bursty

`MaxIdleConns` — Global Idle Pool

What:    Total idle connections across ALL hosts combined.
Effect:  If this is too low, connections for one host get evicted to make room for another,
         causing unexpected TCP reconnects even though per-host limits haven't been hit.

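How to set it: Sum of the per-host idle limits of every host served by this Transport. The limit is per http.Transport, so it only bites when several bulkheads share one Transport; with one Transport per dependency it can simply equal MaxIdleConnsPerHost.

// If 3 bulkheads share one Transport: auth(20) + payment(10) + inventory(15)
MaxIdleConns: 20 + 10 + 15, // = 45, avoids cross-host eviction

Common mistake: Copying http.DefaultTransport's MaxIdleConns of 100 while raising MaxIdleConnsPerHost to 100 for several hosts. The global cap then silently limits what you think are independent pools. (In a Transport literal, leaving MaxIdleConns at zero means no limit.)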

`ResponseHeaderTimeout` — First Byte Timeout

What:    Time allowed between fully writing the request (including its body, if any) and receiving the response headers.
Effect:  The most important timeout for catching slow/hung servers.
         Does NOT include time to read the response body.

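How to set it: Slightly above your p99 latency for this dependency. This is the "server is stuck" detector. For very fast dependencies, use a small absolute floor instead of the multiplier (20ms for the 5ms p99 below), because a few milliseconds of scheduling or GC jitter would otherwise trip the timeout.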

External API p99 = 500ms → ResponseHeaderTimeout = 600ms
Internal svc p99 = 5ms   → ResponseHeaderTimeout = 20ms
Database p99     = 20ms  → ResponseHeaderTimeout = 50ms

`http.Client.Timeout` — Absolute End-to-End Deadline

What:    Time from request initiation to reading the last byte of response body.
         Includes: DNS + TCP dial + TLS + write request + wait for headers + read body.
Effect:  The outer hard deadline. Cancels the entire request if exceeded.

How to set it: ResponseHeaderTimeout + estimated body read time. Always set this — without it you can leak goroutines indefinitely.

ResponseHeaderTimeout = 600ms  (detect slow server)
body read estimate    = 200ms  (for your expected payload size)
Client.Timeout        = 800ms  (their sum; round up if you want extra margin)
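The arithmetic above can be written down as two small helpers. This is only a sketch: headerTimeout and clientTimeout are hypothetical names, and the 1.2 multiplier is the rule of thumb from this article, not anything net/http enforces.

```go
package main

import (
    "fmt"
    "time"
)

// headerTimeout applies the "p99 × 1.2" rule of thumb using integer math.
func headerTimeout(p99 time.Duration) time.Duration {
    return p99 * 12 / 10
}

// clientTimeout adds the body-read budget on top of the header timeout.
func clientTimeout(header, bodyRead time.Duration) time.Duration {
    return header + bodyRead
}

func main() {
    header := headerTimeout(500 * time.Millisecond)
    total := clientTimeout(header, 200*time.Millisecond)
    fmt.Println(header, total) // 600ms 800ms
}
```

Feed header into Transport.ResponseHeaderTimeout and total into Client.Timeout so the outer deadline always leaves room for the body.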

`DialContext.Timeout` — TCP Handshake Limit

What:    Time allowed to establish the TCP connection itself.
Effect:  Catches unreachable hosts fast. Completely independent of request latency.

How to set it: 1–3 seconds universally. TCP handshake should be near-instant on healthy networks; longer means routing problem, not slowness.

DialContext: (&net.Dialer{
    Timeout:   2 * time.Second,  // TCP dial hard limit
    KeepAlive: 30 * time.Second, // TCP keepalive probes
}).DialContext,

`TLSHandshakeTimeout` — TLS Negotiation Limit

What:    Time allowed for TLS handshake after TCP is established.
Effect:  Catches TLS issues (expired certs, overloaded TLS terminator) separately from request latency.

How to set it: 3–5 seconds. A TLS handshake costs one or two network round trips (TLS 1.3 vs 1.2) plus crypto; 5s is generous but safe.

TLSHandshakeTimeout: 5 * time.Second,

`IdleConnTimeout` — Stale Connection Eviction

What:    How long an idle connection is kept in the pool before being closed.
Effect:  Prevents reusing TCP connections that the server or a load balancer has already closed (many close idle connections after roughly 60–90s).

How to set it: 60–90 seconds, and strictly below the idle timeout of whatever sits on the other end. Staying below their limit prevents "connection reset" errors on reuse.

IdleConnTimeout: 90 * time.Second, // http.DefaultTransport's value; lower it if the server's idle timeout is 90s or less

`QueueTimeout` — Semaphore Wait Budget

What:    How long a request waits for a semaphore slot before being rejected.
Effect:  Controls the queue depth in time. Even if a slot frees in 10ms, you might not want to wait.

How to set it: Based on your own SLA minus downstream latency.

Your SLA:              500ms total
Downstream p99:        200ms
Budget for queuing:    500 - 200 - 50(overhead) = 250ms
QueueTimeout:          ~100ms (conservative, leaves room for retries)

For slow external APIs: lower QueueTimeout (fail fast, don't queue)
For fast internal services: even lower (if pool is full, something is wrong)
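The budget calculation above, as a sketch. queueBudget and queueTimeout are hypothetical helpers, and the 40% factor is an assumption chosen only so the result matches the ~100ms figure in the example; pick your own fraction.

```go
package main

import (
    "fmt"
    "time"
)

// queueBudget returns the time left for waiting on a semaphore slot:
// SLA minus downstream p99 minus fixed overhead, never negative.
func queueBudget(sla, p99, overhead time.Duration) time.Duration {
    budget := sla - p99 - overhead
    if budget < 0 {
        return 0
    }
    return budget
}

// queueTimeout keeps only part of the budget (40% here, an assumed
// safety factor) so that a retry can still fit inside the SLA.
func queueTimeout(budget time.Duration) time.Duration {
    return budget * 4 / 10
}

func main() {
    budget := queueBudget(500*time.Millisecond, 200*time.Millisecond, 50*time.Millisecond)
    fmt.Println(budget, queueTimeout(budget)) // 250ms 100ms
}
```

A zero budget means the SLA cannot absorb any queuing at all, in which case a non-blocking TryAcquire is the better fit.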

All Parameters at a Glance

Parameter	Controls	Rule of Thumb
`MaxConnsPerHost`	TCP connection ceiling	= `MaxConcurrent`
`MaxIdleConnsPerHost`	Warm connection cache	= `MaxConnsPerHost` (internal), ÷2 (external)
`MaxIdleConns`	Global idle pool	= sum of all per-host idle limits
`ResponseHeaderTimeout`	"Server is slow" detector	p99 latency × 1.2
`Client.Timeout`	Absolute end-to-end limit	`ResponseHeaderTimeout` + body read estimate
`DialContext.Timeout`	TCP dial limit	1–3s universally
`TLSHandshakeTimeout`	TLS negotiation limit	3–5s universally
`IdleConnTimeout`	Stale connection eviction	60–90s (below server's idle timeout)
`QueueTimeout`	Semaphore wait budget	Your SLA − p99 − overhead

4. Little's Law — Sizing Semaphores Correctly

L = λ × W

L = concurrent requests in-flight (what your semaphore controls)
λ = throughput (requests per second you want to sustain)
W = average time per request (latency in seconds); the examples below plug in p99 instead, which deliberately oversizes L as a safety margin

Key Insight

Slower dependency = MORE semaphore slots needed to sustain the same RPS.

Target: 100 rps

Auth svc  p99=100ms → L = 100 × 0.1 = 10 concurrent  (fast, slots free quickly)
Payment   p99=500ms → L = 100 × 0.5 = 50 concurrent  (slow, slots held longer)

This is counterintuitive but correct: a slow service holds each semaphore slot for longer, so you need more slots to keep throughput flowing.

What Happens When Semaphore Is Too Small

Payment svc p99=500ms, target 100rps, but MaxConcurrent=10:

Max throughput = 10 / 0.5 = 20 rps   ← you're throttling yourself to 20%

What Happens When Semaphore Is Too Large

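More goroutines can be stuck on a misbehaving dep at once: up to semaphore_limit of them, each for at most the timeout.
How long they stay stuck is set by the TIMEOUT, so keep timeouts tight rather than starving throughput with a low limit.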

The Correct Formula

min_concurrent = target_rps × p99_latency_seconds
semaphore_limit = min_concurrent × 1.3   // +30% headroom for bursts
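A minimal sketch of the formula, plus Little's Law rearranged to show the throughput ceiling of an undersized semaphore. semaphoreLimit and maxRPS are hypothetical helper names; integer math is used so that rounding up behaves predictably.

```go
package main

import (
    "fmt"
    "time"
)

// semaphoreLimit returns ceil(rps × p99 × (1 + headroomPct/100)).
// Integer math avoids floating-point rounding surprises.
func semaphoreLimit(rps int64, p99 time.Duration, headroomPct int64) int64 {
    num := rps * int64(p99) * (100 + headroomPct)
    den := int64(time.Second) * 100
    return (num + den - 1) / den // round up
}

// maxRPS is Little's Law rearranged: λ = L / W.
func maxRPS(slots int64, latency time.Duration) int64 {
    return slots * int64(time.Second) / int64(latency)
}

func main() {
    fmt.Println(semaphoreLimit(100, 100*time.Millisecond, 30)) // 13
    fmt.Println(semaphoreLimit(100, 500*time.Millisecond, 30)) // 65
    fmt.Println(maxRPS(10, 500*time.Millisecond))              // 20
}
```

The last line reproduces the "too small" case: 10 slots against a 500ms dependency cap throughput at 20 rps no matter how much traffic arrives.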

NumCPU() is Wrong for IO-Bound Bulkheads

runtime.NumCPU() is the correct pool size only for CPU-bound work (hashing, compression, image processing) where goroutines burn CPU the entire time.

For IO-bound calls (HTTP, DB, Kafka): goroutines spend ~99% of their time parked waiting on the network. The Go scheduler puts them to sleep and runs other goroutines on the same OS thread. You can sustain thousands of concurrent IO goroutines on a 4-core machine.

IO call timeline:
[send ~50µs] [====== blocked waiting ~200ms ======] [read ~50µs]
                        ↑
              goroutine is PARKED here, OS thread runs other work

→ semaphore limit is a downstream capacity question, not a CPU question
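To see the parking effect for yourself, here is a small experiment. It is a sketch under one loud assumption: time.Sleep stands in for a blocked network read, which parks the goroutine in a similar way but is not real IO. simulateIO is a made-up helper name.

```go
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// simulateIO runs n goroutines that each "wait on the network" for d
// (time.Sleep stands in for a blocked socket read) and returns the
// wall-clock time until all of them finish.
func simulateIO(n int, d time.Duration) time.Duration {
    start := time.Now()
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            time.Sleep(d) // goroutine is parked; it does not occupy an OS thread
        }()
    }
    wg.Wait()
    return time.Since(start)
}

func main() {
    elapsed := simulateIO(2000, 100*time.Millisecond)
    // Run one after another, these waits would add up to 200s.
    fmt.Printf("cores=%d goroutines=2000 elapsed=%s\n",
        runtime.NumCPU(), elapsed.Round(10*time.Millisecond))
}
```

On any core count the elapsed time should land close to a single 100ms wait, which is why the limit comes from what the downstream can take and not from runtime.NumCPU().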

5. Full Production Implementation

package bulkhead

import (
    "context"
    "fmt"
    "net"
    "net/http"
    "sync/atomic"
    "time"

    "golang.org/x/sync/semaphore"
)

// Config defines the bulkhead parameters for one downstream dependency.
type Config struct {
    Name           string
    MaxConcurrent  int64         // semaphore size — derived from Little's Law: target_rps × p99_latency
    MaxConnections int           // TCP connection ceiling — keep equal to MaxConcurrent
    RequestTimeout time.Duration // absolute end-to-end HTTP deadline (blast radius knob)
    QueueTimeout   time.Duration // how long to wait for a semaphore slot before rejecting
}

// Metrics holds live counters. All fields use atomic ops — safe to read from any goroutine.
type Metrics struct {
    Accepted atomic.Int64 // requests that acquired a semaphore slot
    Rejected atomic.Int64 // requests rejected because bulkhead was full
    Errors   atomic.Int64 // requests that acquired a slot but the HTTP call failed
}

// Bulkhead combines a per-dependency semaphore and an isolated HTTP client.
// Each instance is independent — a saturated bulkhead has zero effect on others.
type Bulkhead struct {
    cfg     Config
    sem     *semaphore.Weighted // Gate 1: concurrency cap
    client  *http.Client        // Gate 2: connection pool cap + timeouts
    Metrics Metrics
}

// New creates a Bulkhead. Call once per downstream dependency at startup.
func New(cfg Config) *Bulkhead {
    transport := &http.Transport{
        // --- Gate 2: TCP-level bulkhead ---
        MaxConnsPerHost:     cfg.MaxConnections,        // hard ceiling: new reqs block when hit
        MaxIdleConnsPerHost: cfg.MaxConnections,        // warm connections: avoids TCP+TLS overhead
        MaxIdleConns:        cfg.MaxConnections,        // global pool: set >= sum of all per-host

        // --- TCP dial ---
        DialContext: (&net.Dialer{
            Timeout:   2 * time.Second,  // TCP handshake hard limit (unreachable host detection)
            KeepAlive: 30 * time.Second, // TCP keepalive probe interval
        }).DialContext,

        // --- TLS ---
        TLSHandshakeTimeout: 5 * time.Second, // TLS negotiation limit

        // --- Request lifecycle ---
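        // ResponseHeaderTimeout: time between writing the request and receiving the response headers.
        // This is the "server is hung" detector. Set to p99 × 1.2.
        // NOTE: this Config reuses RequestTimeout for both limits, which leaves no separate
        // budget for reading the body; add a dedicated field if responses are large.
        ResponseHeaderTimeout: cfg.RequestTimeout,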

        // Evict idle connections before the server closes them (~90s on most servers).
        IdleConnTimeout: 90 * time.Second,
    }

    return &Bulkhead{
        cfg: cfg,
        sem: semaphore.NewWeighted(cfg.MaxConcurrent),
        client: &http.Client{
            Transport: transport,
            // Absolute deadline covering DNS+TCP+TLS+write+headers+body.
            // Always set — without this, goroutines can leak indefinitely.
            Timeout: cfg.RequestTimeout,
        },
    }
}

// Do executes an HTTP request through the bulkhead.
// It returns an error if no semaphore slot frees up within QueueTimeout.
// The slot is released when Do returns, so it does not cover the caller's body read.
func (b *Bulkhead) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
    // --- Gate 1: Semaphore ---
    // Use a child context with QueueTimeout so we don't wait forever for a slot.
    acquireCtx, cancel := context.WithTimeout(ctx, b.cfg.QueueTimeout)
    defer cancel()

    if err := b.sem.Acquire(acquireCtx, 1); err != nil {
        b.Metrics.Rejected.Add(1)
        return nil, fmt.Errorf("bulkhead [%s] rejected (semaphore full after %s): %w",
            b.cfg.Name, b.cfg.QueueTimeout, err)
    }
    defer b.sem.Release(1) // always release, even on HTTP error

    b.Metrics.Accepted.Add(1)

    // --- Gate 2: HTTP client (connection pool + timeouts) ---
    resp, err := b.client.Do(req.WithContext(ctx))
    if err != nil {
        b.Metrics.Errors.Add(1)
        return nil, fmt.Errorf("bulkhead [%s] call failed: %w", b.cfg.Name, err)
    }

    return resp, nil
}

// Stats returns a human-readable snapshot of metrics.
func (b *Bulkhead) Stats() string {
    return fmt.Sprintf("[%s] accepted=%d rejected=%d errors=%d",
        b.cfg.Name,
        b.Metrics.Accepted.Load(),
        b.Metrics.Rejected.Load(),
        b.Metrics.Errors.Load(),
    )
}
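One caveat with the implementation above: Do releases the semaphore slot when it returns, which is as soon as the response headers arrive, while the caller may still be streaming the body. If you want the slot to cover the body read too, one option is to wrap the body so the slot is released on Close. The sketch below is an illustration, not part of the bulkhead package; releaseOnClose, holdUntilClosed and demo are hypothetical names.

```go
package main

import (
    "fmt"
    "io"
    "strings"
    "sync"
)

// releaseOnClose wraps a response body and calls release exactly once,
// when the caller closes the body.
type releaseOnClose struct {
    io.ReadCloser
    once    sync.Once
    release func()
}

// Close closes the underlying body, then frees the slot (first call only).
func (r *releaseOnClose) Close() error {
    err := r.ReadCloser.Close()
    r.once.Do(r.release)
    return err
}

// holdUntilClosed returns body wrapped so that release runs on Close.
func holdUntilClosed(body io.ReadCloser, release func()) io.ReadCloser {
    return &releaseOnClose{ReadCloser: body, release: release}
}

// demo closes a wrapped in-memory body twice and reports how often release ran.
func demo() int {
    released := 0
    body := holdUntilClosed(io.NopCloser(strings.NewReader("ok")), func() { released++ })
    io.Copy(io.Discard, body)
    body.Close()
    body.Close() // a second Close must not release again
    return released
}

func main() {
    fmt.Println("release calls:", demo()) // release calls: 1
}
```

To use this in Do, you would drop the deferred Release, call b.sem.Release(1) explicitly on the error path, and on success set resp.Body = holdUntilClosed(resp.Body, func() { b.sem.Release(1) }). The trade-off is that a caller who forgets to close the body now leaks a slot as well as a connection.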

Why a Separate Semaphore per Dependency is Correct

authBulkhead.sem   ──► controls only auth calls
paymentBulkhead.sem ──► controls only payment calls

Auth saturated:
  authBulkhead.sem full → auth calls rejected
  paymentBulkhead.sem unaffected → payment calls proceed normally

A shared semaphore across all dependencies defeats the purpose entirely.

6. Wiring It Up

package main

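import (
    "context"
    "fmt"
    "io"
    "net/http"

    "yourmodule/bulkhead"
)

// Declare one bulkhead per downstream dependency at package level.
// These are long-lived and safe to use concurrently.
// authConfig and paymentConfig are assumed to be defined elsewhere in this
// package and to return a bulkhead.Config like the examples in section 7.
var (
    authBulkhead    = bulkhead.New(authConfig())
    paymentBulkhead = bulkhead.New(paymentConfig())
)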

func HandleOrder(ctx context.Context) error {
    // Auth call: isolated behind its own semaphore + connection pool
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://auth.internal/verify", nil)
    if err != nil {
        return fmt.Errorf("build auth request: %w", err)
    }
    resp, err := authBulkhead.Do(ctx, req)
    if err != nil {
        // Bulkhead full or auth down: fail this request fast.
        // Payment bulkhead is completely unaffected.
        return fmt.Errorf("auth unavailable: %w", err)
    }
    io.Copy(io.Discard, resp.Body) // drain so the connection can go back to the pool
    resp.Body.Close()              // close now rather than holding it until HandleOrder returns

    // Payment call: completely isolated from auth
    req2, err := http.NewRequestWithContext(ctx, http.MethodPost, "https://payments.internal/charge", nil)
    if err != nil {
        return fmt.Errorf("build payment request: %w", err)
    }
    resp2, err := paymentBulkhead.Do(ctx, req2)
    if err != nil {
        return fmt.Errorf("payment unavailable: %w", err)
    }
    defer resp2.Body.Close()
    io.Copy(io.Discard, resp2.Body)

    return nil
}

Important: Always io.Copy(io.Discard, resp.Body) before closing. If you close an HTTP/1.x response without reading the body to EOF, Go cannot reuse the TCP connection: it is discarded, and the next request pays for a new TCP (and TLS) handshake.
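If a dependency can return very large bodies, draining without a bound can cost more than the handshake it saves. The helper below is a sketch of a bounded drain; drainAndClose, demoDrain and the byte limit are assumptions for illustration, not something net/http provides.

```go
package main

import (
    "fmt"
    "io"
    "strings"
)

// drainAndClose discards at most limit bytes of body, then closes it.
// It returns how many bytes were discarded.
func drainAndClose(body io.ReadCloser, limit int64) int64 {
    n, _ := io.CopyN(io.Discard, body, limit) // io.EOF only means the body was shorter than limit
    body.Close()
    return n
}

// demoDrain drains a 5-byte in-memory body with the given limit.
func demoDrain(limit int64) int64 {
    return drainAndClose(io.NopCloser(strings.NewReader("hello")), limit)
}

func main() {
    fmt.Println(demoDrain(1024)) // 5: whole body drained
    fmt.Println(demoDrain(3))    // 3: stopped at the limit
}
```

When the body is longer than the limit, the connection is given up rather than reused, which is the intended trade-off for oversized responses.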

7. Correctly Sized Real-World Configs

Sizing Formula

min_concurrent  = target_rps × p99_latency_seconds
semaphore_limit = min_concurrent × 1.3   // +30% burst headroom
MaxConnections  = semaphore_limit        // keep aligned
RequestTimeout  = p99_latency × 1.2     // slightly above p99, blast radius knob
QueueTimeout    = your_SLA - p99 - overhead_budget

External HTTP API (slow, p99=500ms, target 30 rps)

// Little's Law: 30 × 0.5 = 15, +30% = 20
// Blast radius: controlled by tight RequestTimeout (600ms), NOT by low concurrency
// Slower dep → more slots needed to sustain throughput
func externalAPIConfig() bulkhead.Config {
    return bulkhead.Config{
        Name:           "external-auth-api",
        MaxConcurrent:  20,                       // L = 30rps × 0.5s = 15, +30% = 20
        MaxConnections: 20,
        RequestTimeout: 600 * time.Millisecond,   // p99(500ms) × 1.2 — blast radius knob
        QueueTimeout:   100 * time.Millisecond,   // don't queue long for slow dep
    }
}

Internal Microservice (fast, p99=5ms, target 500 rps)

// Little's Law: 500 × 0.005 = 2.5 → floor at ~10 (connection overhead)
// Faster dep → naturally fewer slots needed, slots free in milliseconds
func internalServiceConfig() bulkhead.Config {
    return bulkhead.Config{
        Name:           "inventory-internal",
        MaxConcurrent:  10,                       // L = 500rps × 0.005s = 2.5, floor at 10
        MaxConnections: 10,
        RequestTimeout: 20 * time.Millisecond,    // tight: it's internal, same DC
        QueueTimeout:   5 * time.Millisecond,     // fail fast: if pool busy, something is wrong
    }
}

PostgreSQL (p99=20ms, target 200 rps, 5 app instances)

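// Little's Law: 200 × 0.02 = 4 per instance
// Hard constraint: Postgres max_connections=100, shared across 5 instances → 20 per instance
// Use: max(Little's Law result × 1.3, safety floor) but never exceed hard cap
// In a real service these numbers size the database pool (for example via
// database/sql's SetMaxOpenConns) rather than an HTTP transport.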
func postgresConfig() bulkhead.Config {
    return bulkhead.Config{
        Name:           "postgres",
        MaxConcurrent:  10,                       // L=4, ×1.3=5.2, safety floor=10, hard cap=20 ✓
        MaxConnections: 10,
        RequestTimeout: 100 * time.Millisecond,   // DB should be fast; slow DB = problem
        QueueTimeout:   20 * time.Millisecond,
    }
}
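The clamping rule from the comment on postgresConfig can be sketched as a tiny function. poolLimit is a hypothetical helper, and the even split of max_connections across instances is the same simplification the comment makes.

```go
package main

import "fmt"

// poolLimit takes the larger of the Little's Law estimate and a safety
// floor, but never exceeds this instance's share of the database's
// connection limit.
func poolLimit(estimate, floor, dbMaxConns, instances int) int {
    hardCap := dbMaxConns / instances
    limit := estimate
    if limit < floor {
        limit = floor
    }
    if limit > hardCap {
        limit = hardCap
    }
    return limit
}

func main() {
    fmt.Println(poolLimit(6, 10, 100, 5))  // 10: floor wins
    fmt.Println(poolLimit(40, 10, 100, 5)) // 20: hard cap wins
}
```

The hard cap is applied last on purpose: exceeding the database's connection limit fails for every instance, while an undersized pool only throttles this one.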

Kafka Producer (p99=2ms, target 2000 rps)

// Little's Law: 2000 × 0.002 = 4 concurrent
// Kafka batches internally — keep TCP connections low, semaphore has headroom for bursts
func kafkaConfig() bulkhead.Config {
    return bulkhead.Config{
        Name:           "kafka-producer",
        MaxConcurrent:  10,                       // L=4, headroom for burst
        MaxConnections: 3,                        // Kafka multiplexes internally; few TCP conns enough
        RequestTimeout: 50 * time.Millisecond,
        QueueTimeout:   5 * time.Millisecond,
    }
}

Summary Table

Dependency	p99	Target RPS	L=λ×W	+30%	MaxConcurrent	RequestTimeout	QueueTimeout
External API	500ms	30	15	20	20	600ms	100ms
Internal svc	5ms	500	2.5	4*	10*	20ms	5ms
Postgres	20ms	200	4	6**	10**	100ms	20ms
Kafka	2ms	2000	4	6	10	50ms	5ms

* floor applied: connection establishment overhead makes sub-10 impractical
** floor applied, hard cap from Postgres max_connections / num_instances

8. Mental Model Summary

Two Laws, Two Knobs

Little's Law          → sets MaxConcurrent
  slow dep = MORE slots (slots held longer)
  fast dep = FEWER slots (slots released quickly)

Blast Radius Control  → sets RequestTimeout + QueueTimeout
  NOT concurrency — tight timeouts release slots fast
  even if MaxConcurrent is high, a 600ms timeout limits damage

The Failure Isolation Guarantee

Normal:                         Auth saturated:

User → Auth    ✓               User → Auth    ✗ (semaphore full → instant 503)
User → Payment ✓               User → Payment ✓ (own semaphore, unaffected)
User → S3      ✓               User → S3      ✓ (own semaphore, unaffected)

What Each Layer Protects Against

Layer                   | Protects against
────────────────────────┼──────────────────────────────────────────
Semaphore               | Goroutine pile-up from slow/hung deps
Connection pool         | TCP connection exhaustion from one host
ResponseHeaderTimeout   | Server that accepts connection but never responds
Client.Timeout          | Goroutine leak from infinite response body reads
QueueTimeout            | Cascading slowdowns from queue buildup

Bulkhead vs Circuit Breaker

These are complementary:

Pattern	Question it answers	Action
Bulkhead	"How much capacity do I allocate?"	Limits concurrent slots per dep
Circuit Breaker	"Should I even try this call?"	Opens when error rate exceeds threshold

Use both together: bulkhead limits blast radius, circuit breaker stops calling a dep that's known to be down.

Reference covers: golang.org/x/sync/semaphore, net/http.Transport, Little's Law (L=λW), Go scheduler IO parking, per-dependency isolation.

Note

So the minimal complete config for a single-host bulkhead transport (one that is not shared among multiple deps) is:

transport := &http.Transport{
    MaxConnsPerHost:     20,               // bulkhead ceiling
    MaxIdleConnsPerHost: 20,               // full connection reuse
    IdleConnTimeout:     90 * time.Second, // evict before server does

    DialContext: (&net.Dialer{
        Timeout:   2 * time.Second,
        KeepAlive: 30 * time.Second,
    }).DialContext,
    TLSHandshakeTimeout:   5 * time.Second,
    ResponseHeaderTimeout: cfg.RequestTimeout,
}

Everything else (notably MaxIdleConns, the global idle limit without the PerHost suffix) is either irrelevant for single-host transports or has safe defaults for this use case: left at zero, MaxIdleConns means no global limit, so the per-host value governs.