A new hire on a Go team opens a PR. Title: "Pool everything." The diff wraps every short-lived struct allocation in a sync.Pool Get/Put. The benchmark in the PR description shows a small win on one endpoint. Other endpoints regress. None of it is caught in review, because the pooled-everything code looks like exactly the kind of thing senior reviewers skim past.
That is the sync.Pool story in most Go codebases. Someone reads a blog post about how bytes.Buffer reuse cut allocation pressure for fmt, decides allocation is the enemy, and starts pooling things that should never have been pooled. The result is code that is harder to read and slower than the version it replaced.
sync.Pool is a useful primitive. It is also one of the most misapplied tools in the standard library. When does it earn its keep, when does it not, and what is the runtime actually doing with the objects you put in it? Those are the three questions worth answering before you reach for it.
What sync.Pool Actually Is
A sync.Pool is a free list with two unusual properties.
First, it is per-P (per logical processor in the Go scheduler). Get and Put usually hit the calling P's private slot or local queue with no locking and no contention. That is what makes it fast.
Second, the runtime clears pools as part of garbage collection. Read that twice. The standard library docs say it plainly: pooled objects "may be removed automatically at any time without notification" (pkg.go.dev/sync#Pool). In practice, since Go 1.13 each GC moves a pool's contents into a victim cache and drops whatever the victim cache held from the cycle before, so anything you Put survives at most two GC cycles.
That second property is a feature. The pool is not a leak waiting to happen. The GC keeps it bounded. Pooling buys you reuse between GC cycles, nothing more. It is not a long-lived cache.
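You can watch the clearing happen. This is a sketch, and the exact output depends on GC scheduling and the Go version (the victim cache landed in Go 1.13), but on a quiet machine it prints the expected counts:
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	newCalls := 0
	pool := sync.Pool{
		New: func() any {
			newCalls++
			return new(int)
		},
	}

	pool.Put(new(int))
	runtime.GC() // first GC: the pooled object moves to the victim cache
	pool.Get()
	fmt.Println("New calls after one GC:", newCalls) // expect 0: victim cache hit

	pool.Put(new(int))
	runtime.GC()
	runtime.GC() // second GC: the victim cache itself is dropped
	pool.Get()
	fmt.Println("New calls after two GCs:", newCalls) // expect 1: pool was empty
}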
The objects you should pool are the ones that fit all four of these:
- Allocated and discarded many times per second.
- Roughly the same size each time (so reuse actually matches).
- Big enough that the allocation cost dominates the pool overhead.
- Not retained anywhere outside the Get/Put scope.
If any one of those fails, the pool is probably costing you throughput.
Three Benchmarks, Three Outcomes
Same machine, same Go version, same GOMAXPROCS. Three workloads, three different verdicts on whether pooling helps. The code below is what you would write to compare them yourself with go test -bench=. -benchmem.
Case 1: bytes.Buffer reuse — pool wins
This is the canonical good case. A buffer that grows to a few KB on every request, gets read once, and is thrown away. Reusing the underlying slice avoids both the allocation and the zeroing the runtime would otherwise do.
package poolbench
import (
"bytes"
"fmt"
"sync"
"testing"
)
var bufPool = sync.Pool{
New: func() any { return new(bytes.Buffer) },
}
func renderNoPool(n int) []byte {
var buf bytes.Buffer
for i := 0; i < n; i++ {
fmt.Fprintf(&buf, "row=%d val=%d\n", i, i*7)
}
return append([]byte(nil), buf.Bytes()...)
}
func renderPooled(n int) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // length back to zero; the grown capacity is kept
	defer bufPool.Put(buf)
for i := 0; i < n; i++ {
fmt.Fprintf(buf, "row=%d val=%d\n", i, i*7)
}
	// Copy the bytes out: the buffer is about to go back to the pool.
	return append([]byte(nil), buf.Bytes()...)
}
func BenchmarkRenderNoPool(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = renderNoPool(200)
}
}
func BenchmarkRenderPooled(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = renderPooled(200)
}
}
Expect the pooled version to be measurably faster on this kind of formatted-write loop, and to cut the allocation count substantially; run the benchmark to see the magnitude on your hardware. The buffer grows once to its steady-state size on its first use, then every subsequent Get returns a buffer that already has the right capacity. Reset only moves the length back to zero; it does not free the underlying slice.
This is the case the standard library has in mind. fmt, encoding/json, and net/http all pool buffers internally for exactly this reason.
Case 2: small struct — pool loses
Now the same pattern with a struct that is small enough that the allocator handles it on a fast path. The pool overhead becomes pure cost.
type Event struct {
ID int64
Kind uint8
Source uint16
}
var eventPool = sync.Pool{
New: func() any { return new(Event) },
}
func makeNoPool(id int64) *Event {
e := &Event{ID: id, Kind: 1, Source: 42}
return e
}
func makePooled(id int64) *Event {
e := eventPool.Get().(*Event)
e.ID, e.Kind, e.Source = id, 1, 42
return e
}
func BenchmarkEventNoPool(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// Allocate, use, discard. No Put here: that would force the Event
		// to escape, and the point is to let escape analysis do its job.
		e := makeNoPool(int64(i))
		_ = e.ID
	}
}
func BenchmarkEventPooled(b *testing.B) {
for i := 0; i < b.N; i++ {
e := makePooled(int64(i))
eventPool.Put(e)
}
}
For a struct this small, the allocator serves the request from a per-P size-class cache with no locking (see the comment block in runtime/malloc.go for how small allocations are handled), so expect the pooled path to be slower than the plain allocation, often by a meaningful margin. Escape analysis frequently puts a struct of this shape on the stack anyway, where the cost is effectively zero.
Run the benchmark with -gcflags='-m' to see escape analysis decisions. If the no-pool version reports "does not escape", the allocation never happened on the heap to begin with. Pooling cannot beat free.
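To see the decision in isolation, a minimal probe looks like this; the function names are mine, and the verdicts show up when you build with go build -gcflags='-m':
// The pointer never leaves the function, so the compiler can keep the
// Event on the stack.
func useLocally(id int64) int64 {
	e := &Event{ID: id, Kind: 1, Source: 42}
	return e.ID
}

// The pointer is returned to the caller, so the Event must be heap
// allocated, unless the call is inlined somewhere it does not escape.
func leakEvent(id int64) *Event {
	return &Event{ID: id, Kind: 1, Source: 42}
}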
Case 3: variable-size []byte — depends
This is the case where most readers get burned. A workload like decoding network frames, where the allocation looks regular at first glance but the sizes drift over time.
var bytePool = sync.Pool{
	// Store *[]byte rather than []byte: a bare slice header would be
	// copied into the pool's interface value on every Put, which itself
	// allocates (staticcheck flags this as SA6002).
	New: func() any {
		b := make([]byte, 0, 1024)
		return &b
	},
}
func decodeNoPool(payload []byte) int {
buf := make([]byte, 0, len(payload))
buf = append(buf, payload...)
return len(buf)
}
func decodePooled(payload []byte) int {
	bp := bytePool.Get().(*[]byte)
	buf := (*bp)[:0] // reuse the capacity, length back to zero
	buf = append(buf, payload...)
	n := len(buf)
	*bp = buf // append may have grown buf into a new, larger array; keep it
	bytePool.Put(bp)
return n
}
If payload is reliably under 1 KiB, the pool wins because every Get returns a buffer with enough capacity. If payload occasionally hits 1 MiB, the pool now holds a 1 MiB buffer that gets reused for tiny payloads, wasting memory until the GC finally clears it. Worse, if you do not bound the size on Put, a handful of outlier requests can inflate every pooled buffer, and they stay inflated until the GC wipes the pool.
The fix in production is to drop oversized buffers before returning them to the pool:
const maxPoolBuf = 64 * 1024
func decodePooledSafe(payload []byte) int {
bp := bytePool.Get().(*[]byte)
buf := (*bp)[:0]
buf = append(buf, payload...)
n := len(buf)
	// Pool only reasonably sized buffers; oversized ones are left for
	// the GC so a single outlier cannot inflate the pool.
	if cap(buf) <= maxPoolBuf {
		*bp = buf
		bytePool.Put(bp)
	}
return n
}
The standard library applies the same defense: fmt drops print buffers whose capacity exceeds 64 KiB instead of returning them to its pool, with a comment in fmt/print.go pointing at golang.org/issue/23199, exactly to avoid the inflation problem.
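A related defense, when sizes cluster into a few distinct ranges rather than around one value, is to keep one pool per size class. The sketch below is illustrative, not lifted from any library; the bucket boundaries and names are mine:
// One pool per capacity class. Buffers come back to the bucket whose
// capacity they match exactly; everything else is dropped.
var bucketCaps = []int{1 << 10, 16 << 10, 64 << 10}

var buckets = func() []*sync.Pool {
	ps := make([]*sync.Pool, len(bucketCaps))
	for i := range bucketCaps {
		c := bucketCaps[i] // capture this bucket's capacity
		ps[i] = &sync.Pool{New: func() any {
			b := make([]byte, 0, c)
			return &b
		}}
	}
	return ps
}()

// getBuf returns a pooled buffer with at least n bytes of capacity, or a
// fresh slice when n exceeds the largest bucket (those are never pooled).
func getBuf(n int) *[]byte {
	for i, c := range bucketCaps {
		if n <= c {
			return buckets[i].Get().(*[]byte)
		}
	}
	b := make([]byte, 0, n)
	return &b
}

// putBuf resets a buffer and returns it to its matching bucket, silently
// dropping anything that grew past its original class.
func putBuf(bp *[]byte) {
	for i, c := range bucketCaps {
		if cap(*bp) == c {
			*bp = (*bp)[:0]
			buckets[i].Put(bp)
			return
		}
	}
}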
What to Check Before You Reach for Pool
Before adding sync.Pool to a hot path, run through these four checks in order. Each one produces a real number; stop at the first one that rules the pool out.
- Does the allocation actually escape to the heap? Build with -gcflags='-m'. If the variable does not escape, the allocator is not your problem.
- What does the allocation profile look like? Run go test -bench=. -memprofile mem.out and then go tool pprof -alloc_objects mem.out. The path you are about to pool should be in the top 5. If it is in the long tail, pooling will not move the throughput needle.
- Is the allocation size large and consistent? Print the size distribution. If 95% of allocations are within 2x of each other, pooling can help. If the spread is 100x, pool sizing becomes its own bug source.
- Is there a simpler fix? Pre-sizing a slice with make([]T, 0, n), reusing a bytes.Buffer inside a loop, or restructuring to avoid the allocation entirely will often beat a pool with less code (see the sketch after this list).
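For concreteness, here is what that last check usually looks like in practice. collectIDs is a made-up example, not code from the benchmarks above:
// Pre-sizing does one allocation up front instead of repeated
// grow-and-copy steps, with no pool and no object lifetime to get wrong.
func collectIDs(events []Event) []int64 {
	ids := make([]int64, 0, len(events))
	for _, e := range events {
		ids = append(ids, e.ID)
	}
	return ids
}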
Pool only after those four are exhausted. The runtime's allocator is good, the GC has been engineered for low pause times since the concurrent collector landed in Go 1.5, and typical pauses on modern releases are well under a millisecond. The bar for "this allocation is hurting me enough to add a pool" is higher than most people assume.
The Mental Model
Think of sync.Pool as a hint to the runtime: "I am about to allocate something that I just discarded a moment ago and that has the same shape." The runtime takes that hint, holds the object on a per-P local store, and gives it back to you on a later Get if the GC has not cleared the pool in between. Once enough GC cycles have run (two, counting the victim cache), the hint expired and you allocate fresh.
That mental model gets the cases right. A buffer reused inside one request fits the hint and pooling wins. A 16-byte struct is too small for the round trip to matter. A buffer whose size varies wildly burns memory because the hint is wrong half the time.
Measure before and after. When you do pool something, write down in a comment why the alternative did not work. Future-you will need that note when the workload shifts and the pool becomes the wrong shape.
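The note does not need to be long. Something like this, with numbers and names invented purely for illustration, is enough:
// framePool exists because alloc_objects showed frame scratch buffers as
// the top allocation site under load. Pre-sizing did not help: frame
// sizes vary per connection. Buffers above maxPoolBuf are dropped on Put
// (see decodePooledSafe). Re-measure if typical frame sizes change.
var framePool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}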
If this saved you a regression
Allocation, escape analysis, and the runtime details that decide whether your code uses the stack or the heap are covered end-to-end in The Complete Guide to Go Programming. If you have ever stared at a pprof profile and wondered why your "obviously cheap" function shows up at the top, that is the chapter to read.
The companion book, Hexagonal Architecture in Go, takes the same care to the design layer: how to structure services so that the hot paths you eventually need to optimize are isolated from the domain logic that should never know about a pool.
