Gabriel Anhaia

Defer Has 3 Performance Cliffs. Here's How to See Them


You profile a hot path. The flame graph shows runtime.deferproc and runtime.deferreturn near the top, eating 8% of the CPU. You did not write either of those. You wrote defer mu.Unlock() and defer f.Close() and went home. Now someone on the team is asking why a function that does almost nothing is the most expensive line in the trace.

The answer is that defer has three implementations, not one. The fast one is open-coded defer. It landed in Go 1.14 and runs in around 6ns. The slow one routes through runtime.deferproc, allocates a _defer record, and costs about 35ns. Most of the time you are on the fast path. The compiler has rules for when it bails out, and once you fall off the cliff, every iteration of a tight loop pays the full cost.

Three patterns flip the switch. Each has a tool that makes it visible and a rewrite that gets you back.

What "open-coded defer" actually means

Before Go 1.14, every defer allocated a _defer record (on the heap, or on the stack after Go 1.13's optimization) and linked it onto the goroutine's defer chain; deferreturn walked that chain at function exit. The proposal that landed in 1.14 (issue 34481) replaced that for the common case: the compiler inlines the deferred call directly into each return path, tracks which defers fired with an 8-bit bitmap, and skips deferproc entirely. The cost drops from ~35ns to ~6ns per defer.

The optimization has hard preconditions. From the proposal:

  1. The defer must not appear in a loop in the control-flow graph.
  2. The function must contain at most 8 defers (one bit per defer in the deferBits byte).
  3. The function must not have too many exit points (the compiler emits the inlined return sequence at every exit).

Hit any of those and the function reverts to deferproc for every defer in the function, not just the offending one. That is the cliff. Walking off it is silent. The benchmark just gets slower.

Cliff 1: defer inside a hot loop

The most common version of this is a function that opens N files in a loop and defers Close each time:

func processAll(paths []string) error {
    for _, p := range paths {
        f, err := os.Open(p)
        if err != nil {
            return err
        }
        defer f.Close()
        if err := process(f); err != nil {
            return err
        }
    }
    return nil
}

The famous bug is that no file closes until processAll returns. Open 50,000 paths and you exhaust the file-descriptor table. Fine, you knew that.

The quieter bug is that the defer sits inside the for loop. That disqualifies the whole function from open-coded defer. Every defer in processAll (including any other ones higher up) now goes through deferproc, allocates a _defer record, and runs the slow exit path. You can see it in go test -bench:

BenchmarkLoopDefer-8       1652  698.4 ns/op  240 B/op  10 allocs/op
BenchmarkLoopNoDefer-8     5142  234.1 ns/op    0 B/op   0 allocs/op

Numbers vary by machine and Go version. The shape is the point: the version with defer inside the loop allocates per iteration; the rewrite has zero allocations.

The rewrite lifts cleanup into a helper that runs per iteration:

func processAll(paths []string) error {
    for _, p := range paths {
        if err := processOne(p); err != nil {
            return err
        }
    }
    return nil
}

func processOne(p string) error {
    f, err := os.Open(p)
    if err != nil {
        return err
    }
    defer f.Close()
    return process(f)
}

processOne is a small function with one defer at function scope. It qualifies for open-coded defer. The file closes at the end of each iteration, and processAll itself has no defer at all, so it pays nothing.

Cliff 2: too many defers in one function

The bitmap that tracks which defers fired is one byte. That is the constraint. The compiler-generated deferBits byte is a uint8: eight bits, eight defers. Add a ninth and the compiler falls back to the heap path for all of them.

A function that grew defers organically over time is the typical victim:

func handleRequest(ctx context.Context, w http.ResponseWriter,
                   r *http.Request) error {
    span, ctx := tracer.Start(ctx, "handleRequest")
    defer span.End()                  // 1

    timer := metrics.NewTimer("req")
    defer timer.Stop()                // 2

    conn, err := pool.Get(ctx)
    if err != nil { return err }
    defer pool.Put(conn)              // 3

    tx, err := conn.Begin(ctx)
    if err != nil { return err }
    defer tx.Rollback()               // 4

    lockCtx := lock.Acquire(r.UserID)
    defer lock.Release(lockCtx)       // 5

    cache := localCache.Snapshot()
    defer cache.Free()                // 6

    f, err := openTempFile(r)
    if err != nil { return err }
    defer os.Remove(f.Name())         // 7
    defer f.Close()                   // 8

    audit := audit.New(r)
    defer audit.Flush()               // 9 — cliff
    // ...
}

Every defer here looks reasonable in isolation. Together, they push the function past the threshold and silently move it onto the slow path. Nobody counts defers per function in code review.

The compiler will tell you which kind it picked, with the right flag:

$ go build -gcflags='-d=defer'
./req.go:42:2: open-coded defer
./req.go:45:2: open-coded defer
./req.go:62:2: stack-allocated defer

stack-allocated defer is the fallback. Once you see it, you know the threshold tripped.

The rewrite is to extract a coherent group of defers into a helper. The natural one here is the request-scoped resource bundle:

type reqResources struct {
    span   trace.Span
    timer  *metrics.Timer
    conn   *pool.Conn
    tx     *db.Tx
}

func acquireReqResources(
    ctx context.Context, r *http.Request,
) (*reqResources, func(), error) {
    rr := &reqResources{}
    cleanup := func() {
        if rr.tx != nil    { rr.tx.Rollback() }
        if rr.conn != nil  { pool.Put(rr.conn) }
        if rr.timer != nil { rr.timer.Stop() }
        if rr.span != nil  { rr.span.End() }
    }
    // ... acquire each, set fields, return on error
    return rr, cleanup, nil
}

The handler now has one defer (cleanup) plus a couple of small ones it actually needs at the top level. Both functions stay under 8 and both qualify for open-coded defer.

Cliff 3: closures that capture loop variables

This one is sneakier. You are not deferring inside a loop. You are deferring a function literal that closes over a loop variable. Escape analysis sees the closure outlives the iteration and moves the captured variable to the heap. The _defer record itself may also escape because the closure is stored on the defer chain.

func processItems(items []*Item) {
    for _, it := range items {
        defer logCompletion(it)   // (a) plain call — fine
    }
    for _, it := range items {
        defer func() {            // (b) closure over `it`
            log.Printf("done: %s", it.ID)
        }()
    }
}

(a) is on the slow path because of cliff 1, but its argument is at least evaluated and captured by value when the defer statement runs. (b) adds a problem: the closure forces the captured variable onto the heap. Run with -gcflags='-m=2' and you get the receipt:

$ go build -gcflags='-m=2' ./...
./loop.go:14:6: moved to heap: it
./loop.go:15:9: func literal escapes to heap
./loop.go:16:25: it.ID escapes to heap

Three allocations per iteration where there should be zero. On a 10K-item slice that is 30K extra heap allocations.
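The semantic gap between (a) and (b) is easy to see in isolation: a plain deferred call evaluates its arguments at the defer statement, while a closure reads the variable when it finally runs. A minimal sketch:

```go
package main

import "fmt"

// defer evaluates its arguments at the defer statement, not when the
// deferred call runs. A closure instead reads the variable later.
func snapshot() (plain, closed int) {
    i := 1
    defer func(v int) { plain = v }(i) // i evaluated now: 1
    defer func() { closed = i }()      // i read when the defer runs
    i = 2
    return
}

func main() {
    p, c := snapshot()
    fmt.Println(p, c) // 1 2
}
```

The plain-call form sees the value at defer time; the closure sees the final value, and it is the closure form that drags the variable to the heap.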

The fix is the same shape as cliff 1: pull the body into a function so the defer lives at function scope and there is no closure left to capture anything.

func processItems(items []*Item) {
    for _, it := range items {
        processOne(it)
    }
}

func processOne(it *Item) {
    defer logCompletion(it)
    // ...
}

processOne has one defer at function scope. Open-coded. No loop. No closure. it stays on the stack as a parameter. Run -gcflags='-m=2' again and the "moved to heap" lines are gone.

How to see the cliff before production does

Three flags get you everything:

go build -gcflags='-d=defer'      # which kind of defer
go build -gcflags='-m'            # escape analysis decisions
go test -bench=. -benchmem        # allocs and ns/op per case

Run the first two on the package you suspect. Look for stack-allocated defer lines. Look for moved to heap lines that point at variables captured by deferred closures. Then write the benchmark that compares the suspect path with the rewritten one. The -benchmem column tells you whether the rewrite worked.

The cost of getting this wrong is not academic. A function that does 35ns of useful work and 35ns of deferproc overhead is half as fast as it should be. Multiply by a few thousand QPS and you have lost a CPU core.

defer is still the right primitive. The runtime is built around it, and the open-coded path is effectively free in most code. The discipline is keeping each function small enough, loop-free enough, and closure-light enough for the compiler to take the optimization. Three rules:

  1. Defer at function scope, never inside a loop. Push the body into a helper.
  2. Stay under 8 defers per function. Bundle related cleanups behind one closer.
  3. Defer a plain call, not a closure that captures loop variables.

Apply those three and your pprof profile stops showing deferproc near the top.


If this saved you a regression

The Complete Guide to Go Programming covers all three end-to-end: escape-analysis output, the open-coded defer rules, and the runtime mechanics that decide stack vs. heap. Most performance surprises in Go come from the same place this one does: a small change in a function's shape flipping the compiler off a fast path you did not know you were on.

The companion book, Hexagonal Architecture in Go, takes the same care to the design layer: how to structure services so the small, hot helpers stay small and the large, mixed-concern handlers do not grow nine defers by accident.

Thinking in Go — the 2-book series on Go programming and hexagonal architecture
