context.WithoutCancel: When You Need Work That Outlives Its Caller

#go #concurrency #stdlib #backend

Book: The Complete Guide to Go Programming
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

A team I worked with had a billing service. Every successful
checkout fired one final write: an audit-log row, async, after the
HTTP response had already gone out. The handler had what looked
like the right shape: spawn a goroutine, pass the request context,
write the audit row.

Then they turned on aggressive client-side cancellation. The
mobile client tore down the request the instant it had its 200,
because mobile networks. The HTTP server cancelled the request
context the moment the client hung up. The goroutine inherited
that cancellation and the audit-log write returned
context.Canceled before it ever hit the database.

The audit log started missing rows. Nobody noticed for two weeks
because the rows that were there looked normal. The auditor
noticed.
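The failure is easy to reproduce without a mobile client. The sketch below is a minimal reproduction, not the team's actual code: it relies on net/http's documented behaviour of cancelling the request context when the client connection closes or when the handler returns, and it uses the handler's return as the trigger. The function name and the channel standing in for the database call are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// lateWriteCanceled reproduces the failure: the handler passes its request
// context to a goroutine and returns. It reports whether the goroutine
// observed context.Canceled, which is what the audit write would have seen.
func lateWriteCanceled() bool {
	seen := make(chan error, 1)

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context() // the request context, as in the story above
		go func() {
			select {
			case <-ctx.Done():
				seen <- ctx.Err() // stand-in for a database call that honours ctx
			case <-time.After(2 * time.Second):
				seen <- nil // the context stayed alive for the whole wait
			}
		}()
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return false
	}
	_, _ = io.Copy(io.Discard, resp.Body)
	_ = resp.Body.Close()

	return errors.Is(<-seen, context.Canceled)
}

func main() {
	fmt.Println("write saw context.Canceled:", lateWriteCanceled())
}
```

Any database driver that checks its context before or during the call would return the same error at this point.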

The fix is context.WithoutCancel, added in Go 1.21. It returns
a new context that carries the same values as its parent but is
never cancelled by the parent's cancellation. That is the right
tool for any work that needs to outlive the caller: final writes
on shutdown, fire-and-forget audit logs, the last span you want
to ship to your tracer before the process exits.

Why context.Background is the wrong reflex

The reflex most engineers reach for, when they realise the
request-context-cancelled-the-write bug exists, is to swap the
context for context.Background(). That stops the cancellation
from propagating, which is the correct half of the fix. It also
throws away every value the parent context was carrying, which is
the wrong half.

Look at what a typical request context carries by the time it gets
to your audit handler:

The trace ID and span context (OTel propagation).
The authenticated user / tenant ID injected by middleware.
A request-scoped logger pre-populated with the request ID.
Any feature-flag overrides resolved at the edge.
Deadlines (you want to drop these, but not the rest).

context.Background() drops all of it. The audit row goes out
without a tenant ID, the trace span has no parent, and your
on-call engineer cannot grep the log line back to the request that
caused it. You have replaced one bug with two.

context.WithoutCancel(ctx) keeps every value. It only severs the
cancellation signal and the deadline (see the next section). The
trace ID, user ID, logger, and feature flags all survive.
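The difference is easy to check in isolation. This is a small sketch using only the standard library; the key type and the "acme" tenant value are made up for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is an unexported key type, the usual way to avoid key collisions.
type ctxKey string

// tenantKey is a hypothetical key, used here only for illustration.
const tenantKey ctxKey = "tenant"

// tenantAfterCancel cancels the parent, then reports what each replacement
// context still knows about the tenant and whether the detached one is
// cancelled.
func tenantAfterCancel() (detachedTenant, backgroundTenant any, detachedErr error) {
	parent, cancel := context.WithCancel(
		context.WithValue(context.Background(), tenantKey, "acme"))
	cancel() // the client hung up

	detached := context.WithoutCancel(parent)
	return detached.Value(tenantKey),
		context.Background().Value(tenantKey),
		detached.Err()
}

func main() {
	d, b, err := tenantAfterCancel()
	fmt.Println(d, b, err) // acme <nil> <nil>
}
```

The detached context still returns the tenant and reports no error, while context.Background() has nothing to return.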

package audit

import (
    "context"
    "log/slog"
)

// Event is the audit payload; writeAudit persists it to the
// audit table. Imports omitted in subsequent snippets for brevity.
type Event struct{ ID string }

func writeAudit(ctx context.Context, e Event) error { return nil }

// inside the HTTP handler, after the response has been written
func fireAndForgetAudit(ctx context.Context, e Event) {
    detached := context.WithoutCancel(ctx)
    go func() {
        if err := writeAudit(detached, e); err != nil {
            slog.ErrorContext(detached, "audit write failed",
                "err", err, "event", e.ID)
        }
    }()
}

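The goroutine survives the request being cancelled, whether the
client hung up or the handler simply returned (net/http cancels
the request context in both cases). The audit row still carries
the trace ID. The logger inside writeAudit reads the request ID
off the detached context and tags the log line with it. Nothing
else changes.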

The deadline trap

WithoutCancel strips the deadline along with the cancellation.
That is the contract. The detached context will never be cancelled
by the parent, and Deadline() returns ok=false.
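A quick check of that contract, as a standard-library-only sketch. The function name is hypothetical; the behaviour it reports is what the context package documents for WithoutCancel.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// detachedHasNoBound reports whether the detached context still has a
// deadline, and whether its Done channel is nil (a nil channel never fires).
func detachedHasNoBound() (hasDeadline bool, doneIsNil bool) {
	parent, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	detached := context.WithoutCancel(parent)
	_, hasDeadline = detached.Deadline()
	return hasDeadline, detached.Done() == nil
}

func main() {
	hasDeadline, doneIsNil := detachedHasNoBound()
	fmt.Println(hasDeadline, doneIsNil) // false true
}
```

A select on detached.Done() would block forever, which is exactly why the missing bound matters.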

This is what you want for the cancellation half of the bug. It is
also where the next bug hides. A goroutine that depends on its
context to bound runtime now has no bound. If the database hangs
because the connection pool is saturated, the audit-log goroutine
hangs forever. You ship a goroutine leak.

The fix is to put the deadline back, on purpose, with the budget
that the detached work actually needs:

func fireAndForgetAudit(ctx context.Context, e Event) {
    detached := context.WithoutCancel(ctx)
    bounded, cancel := context.WithTimeout(detached, 5*time.Second)
    go func() {
        defer cancel()
        if err := writeAudit(bounded, e); err != nil {
            slog.ErrorContext(bounded, "audit write failed",
                "err", err, "event", e.ID)
        }
    }()
}

Now the goroutine can run past the request's cancellation but not
past five seconds. The trace context is still there. The user ID
is still there. The deadline is the one this work needs, not the
one the caller happened to set.

The pattern is two lines: detach to drop cancellation, then
re-apply the timeout that fits the detached work. Skip the second
line and you trade a missing-row bug for a leaking-goroutine bug.
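If the two lines show up in many places, one option is to fold them into a helper so the second line cannot be forgotten. The helper below is a hypothetical sketch, not a standard-library function; detachWithTimeout and the 200ms budget are assumptions for the demonstration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// detachWithTimeout is a hypothetical helper: keep the parent's values, drop
// its cancellation and deadline, then apply a fresh budget.
func detachWithTimeout(parent context.Context, d time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.WithoutCancel(parent), d)
}

// survivesThenExpires checks both halves of the pattern: the context outlives
// the parent's cancellation, and it still expires on its own budget.
func survivesThenExpires() (aliveAfterParentCancel, expiredOnBudget bool) {
	parent, cancelParent := context.WithCancel(context.Background())
	ctx, cancel := detachWithTimeout(parent, 200*time.Millisecond)
	defer cancel()

	cancelParent() // the request goes away
	aliveAfterParentCancel = ctx.Err() == nil

	<-ctx.Done() // wait for the fresh budget to run out
	expiredOnBudget = errors.Is(ctx.Err(), context.DeadlineExceeded)
	return aliveAfterParentCancel, expiredOnBudget
}

func main() {
	alive, expired := survivesThenExpires()
	fmt.Println(alive, expired)
}
```

Returning the CancelFunc keeps the caller responsible for releasing the timer, the same as with a plain context.WithTimeout.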

Final-metric-on-shutdown

The same shape solves a related problem. You want to flush a
buffered counter or ship a last span when the process is shutting
down. The shutdown context you were handed usually carries its own
deadline and is already counting down. You do not want the metric
flush cut short by that countdown; you want it to get a budget of
its own.

package telemetry

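import (
    "context"
    "log/slog"
    "time"
)

// Meter is assumed to be defined elsewhere in this package and to
// expose Flush(ctx context.Context) error.
func FlushOnShutdown(ctx context.Context, m *Meter) {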
    detached := context.WithoutCancel(ctx)
    bounded, cancel := context.WithTimeout(detached,
        2*time.Second)
    defer cancel()

    if err := m.Flush(bounded); err != nil {
        slog.ErrorContext(bounded, "flush failed", "err", err)
    }
}

If the shutdown context already had a one-second deadline, this
function still gets two seconds because WithoutCancel strips the
parent deadline. That is the point: the flush gets two seconds
regardless of the caller's remaining budget.
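The contrast with a plain child context is worth seeing once. In this sketch (the function name is hypothetical), asking for two seconds under a parent that has one second left gets you the parent's deadline, while asking for two seconds on the detached context gets you the full two.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// compareBudgets gives a parent one second, then asks for two seconds both
// with and without detaching. It reports whether the plain child is capped at
// the parent's deadline and whether the detached child outlasts it.
func compareBudgets() (plainCapped, detachedLonger bool) {
	parent, cancelParent := context.WithTimeout(context.Background(), time.Second)
	defer cancelParent()
	parentDeadline, _ := parent.Deadline()

	plain, cancelPlain := context.WithTimeout(parent, 2*time.Second)
	defer cancelPlain()
	plainDeadline, _ := plain.Deadline()

	detached, cancelDetached := context.WithTimeout(
		context.WithoutCancel(parent), 2*time.Second)
	defer cancelDetached()
	detachedDeadline, _ := detached.Deadline()

	return plainDeadline.Equal(parentDeadline),
		detachedDeadline.After(parentDeadline)
}

func main() {
	plainCapped, detachedLonger := compareBudgets()
	fmt.Println(plainCapped, detachedLonger) // true true
}
```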

The standard library leaves this job to you. net/http's
Server.Shutdown waits for active connections to go idle, up to the
deadline on the context you pass it. It knows nothing about
goroutines a handler spawned, so a background goroutine doing a
final write has to detach and re-time-box itself.

Where the values matter the most

There is one more place WithoutCancel earns its keep:
distributed tracing. OpenTelemetry carries the active span on
context.Context, and tracers such as Datadog's and Honeycomb's
follow the same convention. Any goroutine that runs without that
context has no parent span, and your trace tree shows the audit
write as an orphan, untied to the request that triggered it.

context.Background() orphans. context.WithoutCancel(ctx)
inherits.

// tracer is assumed to be a package-level OpenTelemetry
// trace.Tracer set up elsewhere.
func emitFinalSpan(ctx context.Context, kind string) {
    detached := context.WithoutCancel(ctx)
    bounded, cancel := context.WithTimeout(detached, time.Second)
    defer cancel()

    _, span := tracer.Start(bounded, "final."+kind)
    defer span.End()
    // ... record attributes, ship the span
}

The span starts as a child of the request span because the trace
context survived the detach. Any call that takes the bounded
context, such as an explicit exporter flush, is capped at one
second and detached from the request lifecycle. If the request was
cancelled ten milliseconds ago, the span still ships and still
appears under the right trace tree.

That is the whole trade. WithoutCancel keeps the values that make
the goroutine traceable and drops only the cancellation and the
deadline, which leaves bounding the runtime to you and a fresh
timeout.

context.WithDeadlineCause and friends

Go 1.21 shipped two related additions worth pairing with this
pattern. context.WithDeadlineCause and
context.WithTimeoutCause let you attach a cause error that is
returned by context.Cause(ctx) when the deadline fires. ctx.Err()
still returns context.DeadlineExceeded, but context.Cause(ctx)
gives you the specific reason you set.

This matters in the detached-and-rebounded pattern because you
want the log line to say "audit-flush deadline" rather than the
generic context deadline exceeded that every timeout in your
program produces:

import "errors"

var errAuditDeadline = errors.New("audit flush deadline (5s)")

func fireAndForgetAudit(ctx context.Context, e Event) {
    detached := context.WithoutCancel(ctx)
    bounded, cancel := context.WithTimeoutCause(detached,
        5*time.Second, errAuditDeadline)
    go func() {
        defer cancel()
        if err := writeAudit(bounded, e); err != nil {
            slog.ErrorContext(bounded, "audit write failed",
                "err", err,
                "cause", context.Cause(bounded))
        }
    }()
}

When the deadline fires you get a log line tagged
cause=audit flush deadline (5s). That tells your on-call
engineer the audit write is the bug. Without Cause, they're
grepping seventeen identical context deadline exceeded lines
for the right one.
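The split between Err and Cause can be checked on its own. The sketch below uses only the standard library; the function names are hypothetical. It also shows one detail that is easy to miss: the cause is recorded only when the deadline actually fires. If cancel runs first, context.Cause reports context.Canceled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errAuditDeadline = errors.New("audit flush deadline")

// deadlineReportsCause lets a short deadline fire and checks that Err stays
// generic while Cause returns the specific error.
func deadlineReportsCause() bool {
	ctx, cancel := context.WithTimeoutCause(context.Background(),
		10*time.Millisecond, errAuditDeadline)
	defer cancel()

	<-ctx.Done()
	return errors.Is(ctx.Err(), context.DeadlineExceeded) &&
		errors.Is(context.Cause(ctx), errAuditDeadline)
}

// earlyCancelReportsCanceled cancels before the deadline and checks that the
// custom cause is not applied in that case.
func earlyCancelReportsCanceled() bool {
	ctx, cancel := context.WithTimeoutCause(context.Background(),
		time.Hour, errAuditDeadline)
	cancel()
	return errors.Is(context.Cause(ctx), context.Canceled)
}

func main() {
	fmt.Println(deadlineReportsCause(), earlyCancelReportsCanceled()) // true true
}
```

In the audit example that means a nil or context.Canceled cause in the log line points away from the timeout and toward the write itself.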

When NOT to use WithoutCancel

The function is narrow on purpose. Reach for it only when the work
genuinely should outlive the caller. Two anti-patterns:

1. Hiding a slow handler. If your HTTP handler is taking too
long and you "fix" it by WithoutCancel-ing the slow call to make
it survive the client hanging up, you have not fixed anything. The
call is just as slow; it now also keeps burning server resources
after the client has gone and nobody is left to read the response.
Either the work belongs in the response (fix the slowness) or it
belongs in a job queue (do that properly).

2. Avoiding shutdown drain. If your handler spawns a long-running
goroutine and detaches it to escape Server.Shutdown's drain,
your shutdown is now lying about whether the service is finished.
Use a separate lifecycle (an errgroup at the service root, an
explicit WaitGroup) to track background work. Detach to keep
context values alive across a short final write, not to dodge
shutdown semantics.

If the work is "small, final, must complete, must keep the trace
context," WithoutCancel is the right reach. If the work is
"large, ongoing, parallel to the request," it belongs in a queue
or a separately-supervised goroutine pool, not behind a detached
context.
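One way to give small detached writes a lifecycle of their own is a WaitGroup owned by the service. The Tracker type below is a hypothetical sketch, not a library API: Go starts detached, time-boxed work, and Wait lets the shutdown path block until that work has finished or the shutdown budget runs out.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Tracker is a hypothetical service-level registry for detached work.
type Tracker struct{ wg sync.WaitGroup }

// Go runs fn on a context that keeps ctx's values, ignores its cancellation,
// and expires after budget.
func (t *Tracker) Go(ctx context.Context, budget time.Duration, fn func(context.Context)) {
	bounded, cancel := context.WithTimeout(context.WithoutCancel(ctx), budget)
	t.wg.Add(1)
	go func() {
		defer t.wg.Done()
		defer cancel()
		fn(bounded)
	}()
}

// Wait blocks until all tracked work is done or ctx expires.
func (t *Tracker) Wait(ctx context.Context) error {
	done := make(chan struct{})
	go func() { t.wg.Wait(); close(done) }()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// trackedWorkFinishes cancels the request while a write is in flight and
// checks that the write still completes and that Wait observes it.
func trackedWorkFinishes() bool {
	var t Tracker
	reqCtx, cancelReq := context.WithCancel(context.Background())

	wrote := false
	t.Go(reqCtx, time.Second, func(ctx context.Context) {
		time.Sleep(20 * time.Millisecond) // stand-in for the audit write
		wrote = ctx.Err() == nil
	})
	cancelReq() // the request ends while the write is in flight

	shutdownCtx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return t.Wait(shutdownCtx) == nil && wrote
}

func main() {
	fmt.Println(trackedWorkFinishes())
}
```

In a real service, Wait would typically run right after Server.Shutdown returns, so the process exits only once both the connections and the tracked writes are done.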

What to do with this on Monday

Three things to grep your codebase for this week:

go func() inside HTTP handlers where the goroutine writes to a database, ships a metric, or emits a span. Check whether it passes the request context. If yes, ask whether that context being cancelled would lose data. If yes again, swap to WithoutCancel plus an explicit timeout.
context.Background() calls inside request-scoped code. Each one is a candidate for WithoutCancel(reqCtx) instead. The trace ID and tenant ID will start showing up in the downstream logs and your on-call rotation will thank you.
Shutdown paths that flush metrics or close exporters. The shutdown context's deadline is for the drain, not the flush. Detach and re-bound.

Pick one of those three this week and try the two-line pattern.
The trace IDs will start showing up in places they were missing,
and you will stop losing rows you thought you were writing.
