A team I know had a clean shutdown story on paper. SIGTERM came
in, they called srv.Shutdown(ctx), the function returned nil,
the pod exited. Their dashboards said zero failed requests on
deploy. Their customers said otherwise.
Two things were happening that the textbook shutdown did not catch.
WebSocket connections from a long-poll feature stayed open through
the entire shutdown window because Shutdown does not interrupt
hijacked connections. And every handler that started a background
goroutine (audit logging, cache warming, outbound webhook
fire-and-forget) was getting cut off mid-flight because Shutdown
only tracks goroutines the server itself spawned, not the ones
your handlers spin up.
Server.Shutdown is not one operation. It is three stages, and
each stage has a failure mode that does not show up in a happy-path
test.
What Shutdown actually does, stage by stage
The implementation in net/http/server.go does roughly this:
- Close the listener so no new connections get accepted.
- Walk the set of idle keepalive connections and close each one immediately.
- Poll for the active connection count to drop to zero, with backoff capped at 500ms, while honouring the context deadline.
That is the entire algorithm. It returns nil when active
connections hit zero before the context expires, or
context.DeadlineExceeded if the timeout fires first.
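For orientation, here is that loop as a condensed paraphrase. closeListeners and closeIdleConns stand in for unexported net/http internals, and the real backoff is jittered; read the actual code in net/http/server.go:

// A condensed paraphrase of the stdlib's Shutdown loop, not the real
// source. closeListeners and closeIdleConns stand in for unexported
// net/http internals.
func shutdownSketch(
    ctx context.Context,
    closeListeners func() error,
    closeIdleConns func() bool, // true once no active conns remain
) error {
    _ = closeListeners() // stage 1: stop accepting new connections

    pollInterval := time.Millisecond
    timer := time.NewTimer(pollInterval)
    defer timer.Stop()
    for {
        if closeIdleConns() { // stage 2: close idle keepalives
            return nil
        }
        select {
        case <-ctx.Done(): // stage 3: bounded wait for active conns
            return ctx.Err()
        case <-timer.C:
            pollInterval *= 2 // backoff, capped like the stdlib's
            if pollInterval > 500*time.Millisecond {
                pollInterval = 500 * time.Millisecond
            }
            timer.Reset(pollInterval)
        }
    }
}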
What Shutdown does not do is just as important. It will not
cancel handler contexts. Hijacked connections (WebSocket upgrades,
raw TCP) get no signal. Goroutines your handlers spawned with
go doSomething() are invisible to it. Application-owned
in-memory buffers will not be flushed. Each of those is your
problem. Here is the textbook baseline:
package main

import (
    "context"
    "errors"
    "log/slog"
    "net/http"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    ctx, stop := signal.NotifyContext(
        context.Background(),
        syscall.SIGINT, syscall.SIGTERM,
    )
    defer stop()

    srv := &http.Server{
        Addr:    ":8080",
        Handler: routes(), // routes() builds the app mux; defined elsewhere
    }

    go func() {
        if err := srv.ListenAndServe(); err != nil &&
            !errors.Is(err, http.ErrServerClosed) {
            slog.Error("listen failed", "err", err)
        }
    }()

    <-ctx.Done()
    slog.Info("shutdown signal received")

    shutdownCtx, cancel := context.WithTimeout(
        context.Background(), 30*time.Second,
    )
    defer cancel()

    if err := srv.Shutdown(shutdownCtx); err != nil {
        slog.Error("shutdown error", "err", err)
    }
}
This is the canonical pattern. It works for a vanilla REST API
with short handlers and no upgrades. Add a single WebSocket
upgrade and the picture breaks.
Stage 1: closing the listener, and the keepalive pool surprise
Stage 1 — closing the listener — is the part nobody gets wrong.
The kernel stops handing out new sockets to your process. Any
client that had not yet connected gets a connection refused. Any
client mid-handshake races against the close.
Stage 2 is where reading the source pays off. Go's HTTP server
keeps a pool of idle keepalive connections waiting for a follow-up
request. When Shutdown runs, it walks that pool and closes each
idle connection right away. From the client's perspective, the
TCP connection drops. Most HTTP clients (including Go's own) treat
that as a transport error and retry on a fresh connection.
The retry on a fresh connection is the part that bites. You just
closed the listener. The retry hits a different replica behind your
load balancer if you are running more than one, or it hits a
"connection refused" if you are not. If the LB has not yet
de-registered the pod, retries to this replica fail loudly.
The fix is on the load-balancer side, not in your Go code.
Kubernetes ships a preStop hook for exactly this reason:
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
When the pod is deleted, the kubelet runs the preStop hook first and
sends SIGTERM only after it completes. During those ten seconds the
endpoints controller removes the pod from the Service, so new traffic
stops being routed to it before your Go process even sees the signal.
Then Shutdown can close the listener safely. Without the preStop
sleep, you race the endpoint removal and lose.
If you run on ECS, Nomad, or a hand-rolled LB, the same idea applies:
deregister the instance first, wait long enough for the balancer to
actually stop routing traffic to it, and only then start the
in-process shutdown.
Stage 3: in-flight handlers — and the WebSocket trap
Stage 3 is the wait. Shutdown polls (with backoff capped at
500ms) for active connections to hit zero, bounded by your context
deadline. Active means a connection currently servicing a request,
neither idle nor hijacked.
That last word matters. A WebSocket handshake calls
http.ResponseWriter.(http.Hijacker).Hijack() which detaches the
connection from the server's tracking. Shutdown no longer counts
it. Your WebSocket connections stay open through the shutdown
window. Your context deadline fires. Shutdown returns
DeadlineExceeded. Your pod exits. The WebSocket clients see a
TCP RST and reconnect to whichever replica took over.
If you want to drain WebSockets gracefully, you have to do it
yourself: track every upgraded connection, fan a "please disconnect"
signal out to each one, give them a budget to finish the close
handshake, then force-close whatever remains on timeout.
// Assumes gorilla/websocket. Construct with:
//     hub := &WSHub{conns: map[*websocket.Conn]struct{}{}}
import (
    "context"
    "sync"
    "time"

    "github.com/gorilla/websocket"
)

// WSHub tracks every upgraded (hijacked) connection so shutdown can
// drain them, since Shutdown itself no longer sees them.
type WSHub struct {
    mu    sync.Mutex
    conns map[*websocket.Conn]struct{}
}

func (h *WSHub) Add(c *websocket.Conn) {
    h.mu.Lock()
    h.conns[c] = struct{}{}
    h.mu.Unlock()
}

func (h *WSHub) Remove(c *websocket.Conn) {
    h.mu.Lock()
    delete(h.conns, c)
    h.mu.Unlock()
}

// DrainClose asks every client to close, polls until they have all
// gone away, and force-closes whatever remains when ctx expires.
func (h *WSHub) DrainClose(ctx context.Context) error {
    h.mu.Lock()
    snapshot := make([]*websocket.Conn, 0, len(h.conns))
    for c := range h.conns {
        snapshot = append(snapshot, c)
    }
    h.mu.Unlock()

    for _, c := range snapshot {
        _ = c.WriteControl(
            websocket.CloseMessage,
            websocket.FormatCloseMessage(
                websocket.CloseGoingAway, "server shutdown",
            ),
            time.Now().Add(2*time.Second),
        )
    }

    poll := time.NewTicker(200 * time.Millisecond)
    defer poll.Stop()
    for {
        select {
        case <-ctx.Done():
            h.mu.Lock()
            for c := range h.conns {
                _ = c.Close()
            }
            h.mu.Unlock()
            return ctx.Err()
        case <-poll.C:
            h.mu.Lock()
            empty := len(h.conns) == 0
            h.mu.Unlock()
            if empty {
                return nil
            }
        }
    }
}
The pattern: send a CloseGoingAway frame to every client, give
them a few seconds to round-trip the close handshake, then force.
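For completeness, registration might look like this sketch; the upgrader configuration and readLoop are placeholders for whatever your app already has:

var upgrader = websocket.Upgrader{} // configure CheckOrigin etc. as needed

func wsHandler(hub *WSHub, w http.ResponseWriter, r *http.Request) {
    conn, err := upgrader.Upgrade(w, r, nil) // hijacks the connection
    if err != nil {
        return // Upgrade already wrote the error response
    }
    hub.Add(conn)
    defer func() {
        hub.Remove(conn)
        _ = conn.Close()
    }()
    readLoop(conn) // your message loop; returns on close or error
}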
Long-poll handlers follow the same rule from a different angle.
They have to read r.Context().Done() inside their loop and
return when it fires. If your long-poll never checks the request
context, Shutdown will time out waiting for it.
func longPoll(w http.ResponseWriter, r *http.Request) {
    ticker := time.NewTicker(500 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case <-r.Context().Done():
            return
        case <-ticker.C:
            // tryRead is your queue/poll source;
            // returns (msg, true) when something is available.
            if msg, ok := tryRead(r); ok {
                _, _ = w.Write(msg)
                return
            }
        }
    }
}
r.Context() is cancelled when the client disconnects or when
ServeHTTP returns. What it does not do, by default, is fire when
Shutdown starts: as noted above, Shutdown does not cancel handler
contexts. If you want long-poll handlers to unblock when draining
begins, you have to wire that in yourself through Server.BaseContext,
as sketched below. Either way, use it: a handler that ignores its
request context is a handler that deadlocks shutdown.
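A minimal way to do that wiring, slotted into the earlier main(); inflightCtx and beginDrain are my names, not stdlib:

// inflightCtx is cancelled when we decide to start draining. Every
// request context derives from it via BaseContext, so long-poll and
// streaming handlers see Done() fire.
inflightCtx, beginDrain := context.WithCancel(context.Background())

srv := &http.Server{
    Addr:    ":8080",
    Handler: routes(),
    BaseContext: func(net.Listener) context.Context {
        return inflightCtx
    },
}

// later, on SIGTERM:
beginDrain()                     // long-poll handlers unblock and return
err := srv.Shutdown(shutdownCtx) // then the normal three stages run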
The background-goroutine trap
This is the one that costs the most to debug. A handler does this:
func auditAndRespond(w http.ResponseWriter, r *http.Request) {
    result := doWork(r)
    go writeAudit(result) // fire-and-forget
    _, _ = w.Write([]byte("ok"))
}
The handler returns. From Shutdown's perspective, this connection
is now idle. The audit goroutine is still running, holding a
half-finished write to your audit DB. SIGTERM arrives.
Shutdown returns immediately because no requests are active. The
process exits. The audit row never lands.
Shutdown does not know about the goroutine you spawned inside
your handler. It only tracks goroutines the server's accept loop
created. Anything your code launches is invisible to it.
The fix is to track those goroutines yourself with a WaitGroup
or errgroup, and to wait on them after Shutdown returns.
type App struct {
    bg sync.WaitGroup
}

func (a *App) auditAndRespond(
    w http.ResponseWriter, r *http.Request,
) {
    result := doWork(r)
    a.bg.Add(1)
    go func() {
        defer a.bg.Done()
        writeAudit(result)
    }()
    _, _ = w.Write([]byte("ok"))
}

// WaitBackground blocks until every tracked goroutine finishes or ctx
// expires, whichever comes first.
func (a *App) WaitBackground(ctx context.Context) error {
    done := make(chan struct{})
    go func() {
        a.bg.Wait()
        close(done)
    }()
    select {
    case <-ctx.Done():
        return ctx.Err()
    case <-done:
        return nil
    }
}
Now your shutdown sequence is "stop accepting → finish in-flight
handlers → wait for background goroutines → exit". Three stages
become four, and you stop dropping audit rows.
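Wired into the earlier main(), the tail of the sequence looks like this sketch, where app is the *App from above and shutdownCtx is the same 30-second drain context:

if err := srv.Shutdown(shutdownCtx); err != nil {
    slog.Error("shutdown error", "err", err) // stages 1 through 3
}
// stage 4: wait for handler-spawned goroutines on the same budget
if err := app.WaitBackground(shutdownCtx); err != nil {
    slog.Error("abandoned background work", "err", err)
}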
If you go this route, also propagate a context to the goroutine so
it can give up gracefully on a long shutdown. Use
context.WithoutCancel(r.Context()) (Go 1.21+) when you want to
keep request-scoped values like trace ID without inheriting the
request's cancellation.
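Concretely, a sketch of that propagation; writeAuditCtx is a hypothetical context-aware variant of writeAudit, and the five-second budget is arbitrary:

a.bg.Add(1)
// Go 1.21+: keep trace IDs and other values from the request context,
// drop its cancellation, and give the write a deadline of its own.
auditCtx := context.WithoutCancel(r.Context())
go func() {
    defer a.bg.Done()
    ctx, cancel := context.WithTimeout(auditCtx, 5*time.Second)
    defer cancel()
    writeAuditCtx(ctx, result) // hypothetical context-aware writeAudit
}()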
A production drain pattern with errgroup
Putting it together, the pattern that handles the three stdlib
stages plus WebSockets plus background goroutines:
// Requires golang.org/x/sync/errgroup.
func runWithGracefulShutdown(
    ctx context.Context,
    srv *http.Server,
    hub *WSHub,
    bg *sync.WaitGroup,
) error {
    serverErr := make(chan error, 1)
    go func() {
        err := srv.ListenAndServe()
        if errors.Is(err, http.ErrServerClosed) {
            err = nil
        }
        serverErr <- err
    }()

    select {
    case err := <-serverErr:
        return err // the listener failed before any shutdown signal
    case <-ctx.Done():
    }

    drainCtx, cancel := context.WithTimeout(
        context.Background(), 30*time.Second,
    )
    defer cancel()

    g, gctx := errgroup.WithContext(drainCtx)
    g.Go(func() error {
        return srv.Shutdown(gctx) // the three stdlib stages
    })
    g.Go(func() error {
        return hub.DrainClose(gctx) // hijacked WebSocket connections
    })
    g.Go(func() error {
        done := make(chan struct{})
        go func() { bg.Wait(); close(done) }()
        select {
        case <-gctx.Done():
            return gctx.Err()
        case <-done:
            return nil
        }
    })
    return g.Wait()
}
errgroup.WithContext runs the three drain operations in parallel
under a single deadline. If the deadline fires, every goroutine sees
gctx.Done() and returns, and g.Wait() returns the first error. One
caveat: errgroup also cancels gctx as soon as any goroutine returns
an error, so one failed drain cuts the others short. That is
acceptable here, because by that point you are abandoning the drain
anyway. Order does not matter. The three goroutines all need to
finish before the process exits, and they do not depend on each
other.
The 30-second budget should be tuned to your environment.
Kubernetes defaults terminationGracePeriodSeconds to 30; if your
drain takes longer than that, the kubelet sends SIGKILL and you
lose the rest. Match your drain budget to your grace period minus
the preStop sleep, with a margin for the runtime to actually exit.
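In Kubernetes terms, the arithmetic might look like this (values illustrative):

spec:
  terminationGracePeriodSeconds: 45   # total budget, Terminating to SIGKILL
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]   # LB deregistration window
      # SIGTERM lands after the sleep, leaving ~35s:
      # a 30s drain context plus ~5s margin for the runtime to exit.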
What to do with this on Monday
Audit your shutdown code for four things, in this order:
- Do you have a load-balancer deregistration delay before SIGTERM reaches your process? Without it, Stage 1 races the LB and you lose.
- Do any of your handlers hijack the connection for WebSockets, SSE, or raw TCP? If yes, you need a hub that tracks them and a drain that closes them.
- Do any of your handlers spawn goroutines that outlive the response? If yes, track them in a WaitGroup and wait on it after Shutdown returns.
- Do your long-poll or streaming handlers actually read r.Context().Done()? If not, they will hang Shutdown until your context deadline fires.
The tutorial says Shutdown(ctx) and you are done. Production
says otherwise.
If this was useful
The Hexagonal Architecture in Go
book covers the boundary between your HTTP adapter and the rest of
your application — where shutdown signals propagate, where
background work lives, and how to keep your domain code free of
the runtime concerns this post is about. It is part of Thinking
in Go, the 2-book series — paired with
Complete Guide to Go Programming
for the language, runtime, and concurrency primitives that make
patterns like this one possible.
