- Book: The Complete Guide to Go Programming
- Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
You added a tiny functional helpers package to the repo last
quarter. Map, Filter, Reduce, GroupBy. Twelve lines
each, all generic. The team loved it. Six months later the CI
job that used to finish in a handful of minutes is now taking
nearly twice that. Nobody admits to a culprit because nobody
changed anything obvious. You start bisecting and the line
you land on is the import of that helpers package in the
third service that pulled it in.
Generics in Go are not free. They look free at the call site,
which is the whole point. The bill arrives later, in go build
wall-clock time and in the size of your release binaries. If
you ship a library that is generic in many places and is
imported into many packages, the bill grows with both numbers.
The fix is usually not to delete the generic. It is to put a
non-generic boundary between the generic and the rest of the
code, so the compiler stops paying for the type parameter at
every import site.
The shape Go's compiler picks
Go does not do full monomorphisation the way C++ and Rust do.
It does not give every distinct instantiation its own machine
code. It also does not do full dictionary passing the way OCaml
or some Haskell implementations do. It picks a hybrid that the
proposal calls GC-shape stenciling with dictionaries. The
design doc lives in the proposal repo at
generics-implementation-gcshape.md
and is the source of truth for what the compiler actually
does.
The short version. The compiler groups instantiations by their
GC shape. Two types share a GC shape if they have the same
size, alignment, and pointer layout. int32 and int32
obviously share. *Order and *Invoice share, because both
are single-word pointers as far as the GC is concerned. A
struct with two ints and a struct with one int and a float32
of the same width also share, if the pointer maps line up.
For each GC shape used by any instantiation in your program,
the compiler emits one chunk of code. It also emits a
dictionary per concrete instantiation, which carries the
information the shared code body cannot see: the type
descriptor, method tables for type-asserting calls, and so on.
Two consequences fall out of this. First, instantiating
Filter[int] and Filter[int32] produces two stencils,
because they are different shapes (size 8 vs 4 on most
platforms). Second, instantiating Filter[*Order] and
Filter[*Invoice] shares one stencil but two dictionaries.
The shared body is small. The per-shape body is not. If your
helpers package is generic over five or six shapes (int sizes,
strings, pointer-to-struct, struct-of-two-pointers,
struct-of-one-pointer-and-one-non-pointer) and is imported
into thirty packages, the compiler is doing real work.
Measuring the actual cost
Two numbers matter, and they are easy to read. Compile time
and binary size. Run a clean build with the timer on:
go clean -cache && time go build ./...
Then a stripped-binary size:
go build -ldflags='-w -s' -o /tmp/svc ./cmd/svc
ls -l /tmp/svc
For the per-instantiation breakdown, the tool you want is
go tool objdump. Build with -gcflags='-m=2' first to see
what got inlined and what did not:
go build -gcflags='-m=2' ./... 2> build.log
grep -E '^.*: inlining call to|^.*: cannot inline' build.log
Then disassemble and grep for the generic function name:
go tool objdump -s 'pkg/helpers.Filter' /tmp/svc | head -40
Each distinct function symbol that comes back is a stencil the
compiler had to emit. If you see one symbol, you got the share.
If you see five, you got five stencils. Multiply that by the
helper functions in your library and you have a number you
can defend in code review.
A tiny benchmark module lets you watch the compile time move
in isolation:
package helpers
func Map[T any, U any](xs []T, f func(T) U) []U {
out := make([]U, len(xs))
for i, x := range xs {
out[i] = f(x)
}
return out
}
func Filter[T any](xs []T, p func(T) bool) []T {
out := make([]T, 0, len(xs))
for _, x := range xs {
if p(x) {
out = append(out, x)
}
}
return out
}
func Reduce[T any, A any](
xs []T, seed A, f func(A, T) A,
) A {
a := seed
for _, x := range xs {
a = f(a, x)
}
return a
}
Add a _test.go that calls each helper with a different
concrete type per test. Then build the test binary:
go clean -cache
time go test -c -o /tmp/helpers.test ./helpers
ls -l /tmp/helpers.test
go tool objdump -s 'helpers.(Filter|Map|Reduce)' \
/tmp/helpers.test | grep '^TEXT' | wc -l
Illustrative numbers from a synthetic benchmark on a local
machine (not a peer-reviewed measurement — yours will differ
in absolute terms; the shape of the change is what matters):
go test -c 3.42s user 0.61s system 192% cpu 2.094 total
-rwxr-xr-x 1 user staff 3884240 /tmp/helpers.test
12
Twelve TEXT entries for three generic helpers means four
shapes in use, each emitting its three helpers. Rerun after
adding a fourth concrete shape in the tests and the count
climbs in step. That is the bill.
Pattern 1: do not propagate the parameter past the boundary
Most generic-heavy libraries land in one of two designs. The
first is a top-level helper that takes [T any] and never
itself stores anything generic. It does work and returns. The
second is a generic container that survives across calls,
holds state, and exposes methods. The second is more expensive
because every method on the container is itself generic and
re-stencilled per shape.
The fix for the second one is an interface boundary that does
not carry the type parameter. The cache below is generic in
the value type, but readers and writers cross a non-generic
seam:
type cache[V any] struct {
m map[string]V
}
func (c *cache[V]) Get(k string) (V, bool) {
v, ok := c.m[k]
return v, ok
}
func (c *cache[V]) Put(k string, v V) {
c.m[k] = v
}
type AnyCache interface {
GetAny(k string) (any, bool)
}
Internal callers still use cache[V] and pay nothing for
the boxing. External callers (observability, debugging, an
admin endpoint, a test harness) go through AnyCache and the
compiler stops emitting per-shape code at every one of those
import sites. The price is a single any at the seam, paid
once on the way out.
This is the same trick the standard library uses around
encoding/json. The decoder is concrete; the value comes back
as any because the wire shape is genuinely open. You are
applying the same pattern internally, on a much smaller scale,
to keep compile time linear in your usage.
Pattern 2: constrain to one concrete type when the generic is theatre
Half of the generics in real codebases are used at exactly one
type. Repository[Order]. Service[User]. Handler[Request].
The square brackets are there because somebody anticipated
future polymorphism that never showed up. The compiler still
stencils. The reader still pays the cognitive overhead.
type Repository[T any] struct {
db *sql.DB
}
func (r *Repository[T]) FindByID(id int) (T, error) {
var zero T
return zero, nil
}
If T is Order everywhere and only Order, drop the type
parameter and write the concrete version. The diff is small.
The compile time is smaller. The next reader does not have to
trace the parameter through three layers of code to confirm
that yes, it really is always Order:
type OrderRepository struct {
db *sql.DB
}
func (r *OrderRepository) FindByID(id int) (Order, error) {
return Order{}, nil
}
The audit for this one is a short pipeline. Find every use
site of the generic type, strip the type-parameter declaration
lines ([T any], [T comparable], etc.), and see how many
distinct concrete instantiations remain:
git grep -nE '\bRepository\[' -- '*.go' | \
awk -F'[][]' '{print $2}' | \
grep -vE '\b(any|comparable)\b' | \
sort -u
If the output is one line, the generic is theatre. Delete it.
If it is two lines and the second is a test stub, the generic
is theatre. Delete it.
Pattern 3: use any deliberately at known-broad seams
The careful exception. There are seams in a service where you
genuinely do not know the type and pretending otherwise costs
more than it saves. The wire boundary into a JSON service. A
generic logging sink. An event bus that carries dozens of
event shapes nobody wants to enumerate as a type set.
At those seams, any is the honest answer. The compiler emits
one path. The runtime pays a single boxing per call. You
recover the type with a type assertion or a slog-style
attribute set, and you do not pay per-shape stencilling on the
way through.
The mistake is using any inside a generic. The mistake is
also using a generic at a wire seam. Both are the same bug
in opposite directions: the type parameter and the empty
interface are tools for different problems and they fight each
other when you mix them.
A working rule. If the function knows what T is, write the
generic and constrain it tightly. If the function is on a
boundary where the type is genuinely open, take any and
move on. Do not let a generic API leak across a boundary just
because the call site can spell [Order].
What the bill looks like after
Run the same time go build ./... and objdump | grep TEXT | before and after adding the non-generic seam. On the
wc -l
synthetic benchmark above, dropping from four shapes to one
shape across the import graph cut the stencil count from 12 to
3 and shaved roughly 10–25% off the cold-build wall clock for
the test binary. Workload-specific — the only number that
matters is the diff between your two runs.
If this was useful
Compile time, binary size, and the way the runtime ferries
type information are the kind of thing that go from invisible
to load-bearing the moment a service grows past a single
binary. The Complete Guide to Go Programming covers the
generics implementation, the constraint vocabulary, and the
escape-analysis behaviour that interacts with generics in
enough detail that you can reason about a time go build
regression without bisecting commits at random. Hexagonal
Architecture in Go is the design-side counterpart: how to
draw the seams in a service so the generic helpers stay in the
inner hexagon and the boundaries stay narrow, which is the
same trick this post recommends scaled to whole modules.

Top comments (0)