Gabriel Anhaia
Reading -gcflags='-m=2' Output: What the Go Compiler Tells You About Inlining


You have a Go service that is "fast enough" until the day a profile says otherwise. Someone on the team says PGO will fix it. Someone else says the function should already be inlined. A third person says the compiler probably can't inline through the interface call. The tool that settles all three sits in the toolchain you've been ignoring: -gcflags='-m=2'.

The flag is older than most Go careers, and the output is unforgiving. The lines look like the compiler muttering to itself, because in a sense it is — those are the inliner's notes about what got inlined and what didn't, plus the escape-analysis decisions that share the stream. Reading them once changes how you write Go forever, mostly because you stop guessing.

What the flag actually does

-gcflags='-m' asks the compiler to print its optimization decisions. -m=2 doubles the verbosity: it adds the reasons the inliner kept or rejected each call, plus the inlining-cost budget for every function it considered. -m=3 exists and adds devirtualization decisions and PGO call-site weights, but you rarely need it on a normal day.

Run it on the smallest possible package:

go build -gcflags='-m=2' ./pkg/cache 2>&1 | head -40

The output mixes three kinds of lines. Once you can tell them apart, the rest is mechanical.

./cache.go:14:6: can inline (*Cache).Get with cost 18 as: ...
./cache.go:23:6: cannot inline (*Cache).Set: function too complex (cost 102 exceeds budget 80)
./cache.go:31:7: inlining call to (*Cache).Get
./cache.go:42:13: parameter k leaks to {storage for ...}
./cache.go:55:9: moved to heap: buf
./cache.go:60:6: devirtualizing fn.(io.Writer).Write to *bytes.Buffer

can inline X with cost N is the inliner saying this function is small enough to be a candidate. The cost is a rough proxy for AST node count; the inliner's default budget is 80 (inlineMaxBudget in cmd/compile), which is why "cost 18" is a green light and "cost 102" is the rejection. inlining call to X is the inliner actually performing it at a specific call site. moved to heap: x is escape analysis admitting a local has to live longer than its frame. devirtualizing fn is PGO turning an interface call into a direct one.

Three different passes share a single output stream with nothing separating them. That is most of what makes -m=2 look impenetrable.

Pattern 1: PGO devirtualization, made visible

PGO has a user-facing story (collect a profile, rebuild, watch the binary get faster). -m=2 is where you watch it work.

Take a small program with an interface call site that almost always points at the same concrete type:

package main

import (
    "fmt"
    "os"
)

type Writer interface {
    Write(p []byte) (int, error)
}

type counter struct {
    n int
}

func (c *counter) Write(p []byte) (int, error) {
    c.n += len(p)
    return len(p), nil
}

func emit(w Writer, msg string) {
    w.Write([]byte(msg))
}

func main() {
    var w Writer = &counter{}
    for i := 0; i < 1_000_000; i++ {
        emit(w, "x\n")
    }
    fmt.Fprintln(os.Stderr, "done")
}

Without PGO, the call inside emit goes through the interface's dispatch table on every iteration. Build with -m=2:

go build -gcflags='-m=2' ./...

You will see can inline emit with cost ... but no devirtualizing line on w.Write. The compiler does not know which concrete type w is at compile time.

Now collect a CPU profile from a run, drop it next to the package as default.pgo, rebuild with -m=2. The line you are looking for shows up:

./main.go:18:9: devirtualizing w.Write to *main.counter

That is the inliner saying: the profile shows this call site is dominated by *counter, so the interface dispatch is replaced with a direct call to (*counter).Write plus a type-check fallback. Once the call is direct, the inliner can reason about whether to inline it. With a small concrete method like the one above, you may also see inlining call to (*counter).Write on the next line. Devirtualization opens the door; whether inlining follows depends on the method's cost.

Two things to watch. The PGO devirtualizer is conservative about which sites it touches; the Go PGO guide notes that devirtualization fires only when one type accounts for a large share of the call site's traffic in the profile. And the fallback branch still pays a comparison on the type, so devirtualization speeds up the hot case at a small cost to the cold one. If your interface call is roughly 50/50 between two types, PGO will leave it alone and you will see no devirtualizing line.

Pattern 2: closures and the escape-analysis surprise

Write a function that builds a closure, and you have written something the inliner and the escape analyzer have to agree on. They often do not.

package count

func makeAdder(x int) func(int) int {
    return func(y int) int {
        return x + y
    }
}

func sum(n int) int {
    add := makeAdder(10)
    total := 0
    for i := 0; i < n; i++ {
        total += add(i)
    }
    return total
}

Build with -m=2:

./count.go:3:6: can inline makeAdder with cost 10 as: ...
./count.go:4:9: can inline makeAdder.func1 with cost 6 as: ...
./count.go:4:9: func literal escapes to heap
./count.go:3:14: moved to heap: x
./count.go:9:6: can inline sum with cost 31 as: ...
./count.go:10:18: inlining call to makeAdder
./count.go:10:18: inlining call to count.makeAdder.func1
./count.go:13:13: inlining call to count.makeAdder.func1

The interesting part is the third line. func literal escapes to heap means the closure value is heap-allocated. The fourth line, moved to heap: x, means the captured variable is too. Both are forced because, in general, a returned closure outlives the frame that built it. The escape analyzer cannot reason about the specific call site in sum where the closure obviously does not escape — that level of context-sensitivity is beyond what the analyzer attempts.

But notice the inlining call to makeAdder further down. Once the inliner inlines makeAdder into sum, the closure and its captured x are no longer crossing a frame boundary, and a later pass can stack-allocate them. Whether it actually does depends on the Go version and the surrounding context, which is why you check with -m=2 and -m=3 instead of guessing.

The rule of thumb that survives most refactors: closures returned from one function and called inside the same package, in a tight loop, can avoid heap allocation if the compiler decides to inline the constructor. If you write the same closure pattern across package boundaries, the inliner is much more likely to give up, and the heap cost stays. -m=2 is the cheapest way to confirm which path the compiler took on your code.
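When the closure exists only to carry one captured value, there is a refactor that removes the escape question entirely: pass the value as a plain parameter so nothing can outlive its frame. A sketch (names here are illustrative, not from the example above):

```go
package main

import "fmt"

// addTo does the closure's work with x as an ordinary argument,
// so there is no func literal to allocate and nothing for escape
// analysis to move to the heap.
func addTo(x, y int) int { return x + y }

// sumDirect matches sum from the example, minus the constructor.
func sumDirect(n, x int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += addTo(x, i)
	}
	return total
}

func main() {
	fmt.Println(sumDirect(5, 10)) // five iterations of (10 + i): 60
}
```

Building this with -gcflags='-m=2' prints can inline lines for both helpers and no moved to heap lines inside sumDirect, which is the cheapest possible confirmation.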

Pattern 3: -l=4 and the "deep inlining" question

-l=4 is the flag that makes the inliner more aggressive: it raises the cost budget so larger functions can inline. Two caveats up front: the -l levels above the default are an experimental knob whose exact behavior has shifted across Go releases (mid-stack inlining, for instance, has been on by default since Go 1.12), so confirm against your toolchain; and turning it on globally is rarely free, so it has to be the answer to a specific question.

go build -gcflags='-m=2 -l=4' ./...

You will see more inlining call to X lines, and some functions that previously printed cannot inline F: function too complex will now inline. On a CPU-bound microbenchmark with a small hot loop, this can show up as a measurable speedup. On a service binary, it shows up as larger compiled output, longer compile times, and sometimes a slowdown from instruction-cache pressure.

There are two situations where -l=4 earns its keep.

The first is when you have profiled a hot path, identified a wrapper function that the default budget refuses to inline, and you want to confirm that the wrapper is the bottleneck before you rewrite it. Build the package with -l=4, run the benchmark, and compare to the default build. If the win is significant, you have your answer about the wrapper. If the win is zero, the wrapper was not the problem and you have saved yourself a rewrite.

The second is when you ship a tight library (think serializer, hash, or parser) where the user's hot path threads through a chain of small helpers that each individually fit the budget but whose chain does not. -l=4 lets the chain collapse into a single inlined body. The Go standard library's encoding/binary and parts of runtime are written in exactly this small-helper shape; a library of your own can follow it when the hot path has that structure.
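The shape in question looks roughly like this (a hypothetical decoder; each helper fits the default budget on its own, and the open question is whether the whole readHeader chain flattens into the caller):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Each helper is individually cheap enough to inline; whether the
// chain readU16 -> readU32 -> readHeader collapses into the caller
// is what -m=2 (and -l=4) tell you.
func readU16(b []byte) uint16 { return binary.LittleEndian.Uint16(b) }
func readU32(b []byte) uint32 { return binary.LittleEndian.Uint32(b) }

type header struct {
	version uint16
	length  uint32
}

func readHeader(b []byte) header {
	return header{
		version: readU16(b[0:2]),
		length:  readU32(b[2:6]),
	}
}

func main() {
	buf := []byte{1, 0, 16, 0, 0, 0}
	h := readHeader(buf)
	fmt.Println(h.version, h.length) // 1 16
}
```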

A few traps. -l=4 can amplify the binary-size cost more than the speed cost — you can find yourself adding several percent to the binary for a fractional speedup, which is rarely the right trade. It can also push a function over the inliner's recursion-detection thresholds and produce less inlining than the default would. Always compare -m=2 output between the two builds, not just the benchmark numbers. The number is the headline; the -m=2 diff is the explanation.

How to read a real -m=2 session

The workflow that survives the most ad-hoc questions:

  1. Pick the smallest package that contains the hot path. Don't run -m=2 on the whole module — the output buries you.
  2. Build with go build -gcflags='-m=2' ./pkg/... and pipe to a file. The file is the artifact you grep, not the terminal.
  3. Grep for inlining call to and the names of the functions you expect to be hot. Confirmed inlines are a good sign. Missing inlines are a question.
  4. Grep for cannot inline to find the rejected candidates. The reason is on the same line: function too complex, recursive, marked go:noinline, or a rejection tied to a runtime/unsafe call. The first two are actionable. The last two usually are not.
  5. Grep for moved to heap, escapes to heap, parameter ... leaks on the function in question. These are escape-analysis decisions, separate from inlining, but they share the output stream because they share a pass.
  6. If PGO is on, grep for devirtualizing to confirm the profile actually changed call sites. No devirtualizing lines mean your profile did not concentrate enough traffic on one concrete type at any call site to clear the threshold.

The output has not changed shape much across recent Go versions, and the inliner-cost numbers (with cost N) are stable enough across releases to read year over year. Treat the format as documentation by example: the cmd/compile source is the source of truth when a line confuses you, and grepping the inliner package for the message string lands on the case in the compiler that emitted it.

What changes about how you write Go

Reading -m=2 once a quarter, on the package that pprof points at, changes a few habits. You stop reaching for interface{} parameters in hot paths. You write smaller helper functions so the inliner has more it can chain. You stop writing var foo = func() ... at package scope when a plain function would do.
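The last habit is easy to see in a toy comparison (hypothetical names): a call through a package-level func variable is indirect, because the variable could be reassigned at any time, so the inliner leaves it alone; the plain function is a direct call it can see through.

```go
package main

import "fmt"

// doubleVar is a package-level func variable: every call through it
// is indirect, so -m=2 shows no inlining at those call sites.
var doubleVar = func(x int) int { return x * 2 }

// doubleFn is a plain function: a direct call the inliner can
// consider, and at a cost this small it will take it.
func doubleFn(x int) int { return x * 2 }

func main() {
	fmt.Println(doubleVar(21), doubleFn(21)) // same result, different call shape
}
```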

The flag is also the only honest way to test a PGO profile. If a profile is supposed to inline a hot wrapper or devirtualize an interface call, the -m=2 output is where the proof lives. If the lines are not there, the profile is not doing what you thought.

The compiler is willing to explain itself. Most Go developers never ask.


If this was useful

The compiler's mental model covers three things: inlining budgets, escape analysis, and call-site devirtualization. It's one of the parts of Go that most production engineers learn from a stack of blog posts and never assemble into a coherent picture. The Complete Guide to Go Programming walks through the inliner, the escape analyzer, and the runtime that depends on both, in the order you actually need them when you are reading a profile.

The companion book, Hexagonal Architecture in Go, sits one layer up: how to structure a service so the hot paths you eventually optimize are isolated from the domain code that should never need to think about closures or inlining cost.

Thinking in Go — the 2-book series on Go programming and hexagonal architecture
