James Lee

Posted on May 19

Go Compiler & defer: Bootstrap, Three defer Implementations, panic/recover & Closures

#architecture #computerscience #go #programming

Go's compiler is written entirely in Go — a self-hosting compiler that handles everything from frontend parsing to backend code generation. In this article we'll trace the program lifecycle from the very first instruction, dig into how defer is implemented in three different ways, and cover the panic/recover model, build tags, and the infamous closure-in-loop trap.

1. The Go Compiler Pipeline

Source (.go)
    ↓
Lexer / Parser       → AST
    ↓
Type Checker         → typed AST
    ↓
IR (SSA)             → optimizations, escape analysis
    ↓
Code Generation      → machine code
    ↓
Linker               → binary

The entire pipeline — from frontend to backend — is implemented in Go itself. No C, no LLVM (by default).

Performance Tooling

Before optimizing, always measure:

Tool	What it shows
`go tool pprof`	CPU, memory, goroutine profiles — µs-level code hotspots
`go tool trace`	Runtime events: goroutine scheduling, GC, netpoller — ns-level

Optimization priority:

Storage layer (ms gains) > Business logic (µs gains) > Low-level code (ns gains)

Optimization workflow:

1. Load test with realistic traffic
2. pprof → identify CPU/memory hotspots
3. Fix: async, cache, algorithm change
4. benchmark → verify local improvement
5. Repeat load test → check p95 latency

2. Program Bootstrap: From `rt0_amd64` to `main.main`

Your Go program does not start at main.main. It doesn't even start at runtime.main. The real entry point is deep in the runtime assembly.

runtime.rt0_amd64          ← actual binary entry point
    ↓
runtime.rt0_go             ← determine CPU core count + physical page size
    ↓
runtime.schedinit()        ← initialize: scheduler, stack allocator,
    ↓                         memory allocator, GC
runtime.newproc(main)      ← create main goroutine, push to P's LRQ
    ↓
runtime.mstart()           ← start M0, enter scheduling loop (never returns)
    ↓
runtime.main()             ← main goroutine starts here

`runtime.main()` execution order (simplified from the runtime source):

func main() {
    // 1. Set max stack size: 1GB (64-bit) / 250MB (32-bit)
    if goarch.PtrSize == 8 {
        maxstacksize = 1000000000
    } else {
        maxstacksize = 250000000
    }

    // 2. Start sysmon background thread (GC, preemption, netpoll)
    systemstack(func() { newm(sysmon, nil) })

    // 3. Initialize runtime packages
    runtime_init()

    // 4. Enable GC background workers
    gcenable()

    // 5. Initialize package main and everything it imports:
    //    package-level variables first, then init() functions
    main_init()

    // 6. Run user's main.main()
    main_main()

    // 7. Exit
    exit(0)
}

Key difference: Non-main goroutines return to goexit when done. The main goroutine calls exit(0) — terminating the entire process immediately.
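
Steps 5 and 6 can be observed from user code. The sketch below records the order in which a package-level variable initializer, `init()`, and `main.main` run; the helper `record` and the variable names are made up for this illustration.

```go
package main

import "fmt"

// order collects the sequence of initialization events.
var order []string

// pkgVar is initialized before any init() function runs.
var pkgVar = record("package-level var")

// record appends an event name and returns it.
func record(s string) string {
	order = append(order, s)
	return s
}

func init() {
	record("init()")
}

func main() {
	record("main.main")
	fmt.Println(order) // [package-level var init() main.main]
}
```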

3. `defer`: Three Implementation Strategies

defer looks like a simple "run this at function exit" mechanism. The reality is more nuanced — Go uses three different implementations depending on the context, each with different performance characteristics.

Why not just insert a function call at return?

Because defer can appear inside conditionals and loops, the compiler cannot always statically determine how many defers exist or which ones will execute. This makes a purely compile-time solution insufficient.
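
A small sketch of the problem: in the hypothetical function `deferInLoop` below, the number of deferred calls depends on a run-time value, so the compiler cannot know at compile time how many calls to emit at the return site.

```go
package main

import "fmt"

// deferInLoop registers n deferred calls; n is only known at run time.
func deferInLoop(n int) (out []int) {
	for i := 0; i < n; i++ {
		// The argument i is evaluated now; the call itself runs at function exit.
		defer func(v int) { out = append(out, v) }(i)
	}
	return nil
}

func main() {
	fmt.Println(deferInLoop(3)) // [2 1 0]
}
```

The deferred calls run in last-in, first-out order after the `return` statement has set the named result.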

Strategy 1: Heap Allocation (General Case)

Each defer creates a _defer struct allocated on the heap, chained into the goroutine's defer linked list.

// Compiler transforms:
defer foo()
// Into:
deferproc(foo)   // allocate _defer on heap, push to G._defer list
...
deferreturn()    // at function exit: walk list, execute defers in LIFO order

type _defer struct {     // abridged; exact fields vary across Go releases
    siz       int32      // size of arguments + return values
    sp        uintptr    // stack pointer at defer site
    pc        uintptr    // caller's program counter
    fn        *funcval   // the deferred function
    link      *_defer    // next defer in chain (linked list)
}

type g struct {
    _defer *_defer       // head of this goroutine's defer list
}

Cost: Heap allocation per defer call. Slowest strategy.
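
To make the linked-list idea concrete, here is a toy model in ordinary Go. It is not the runtime's code: `deferRecord`, `toyG`, `pushDefer`, and `runDefers` are hypothetical stand-ins for `_defer`, `g`, `deferproc`, and `deferreturn`.

```go
package main

import "fmt"

// deferRecord is a simplified stand-in for the runtime's _defer.
type deferRecord struct {
	fn   func()
	link *deferRecord // next record in the chain
}

// toyG is a simplified stand-in for the runtime's g.
type toyG struct {
	deferHead *deferRecord
}

// pushDefer mimics deferproc: allocate a record and prepend it to the list.
func (g *toyG) pushDefer(fn func()) {
	g.deferHead = &deferRecord{fn: fn, link: g.deferHead}
}

// runDefers mimics deferreturn: pop from the head until the list is empty.
func (g *toyG) runDefers() {
	for g.deferHead != nil {
		d := g.deferHead
		g.deferHead = d.link
		d.fn()
	}
}

// demo returns the order in which three registered calls run.
func demo() []string {
	var g toyG
	var order []string
	g.pushDefer(func() { order = append(order, "first") })
	g.pushDefer(func() { order = append(order, "second") })
	g.pushDefer(func() { order = append(order, "third") })
	g.runDefers()
	return order
}

func main() {
	fmt.Println(demo()) // [third second first]
}
```

Prepending to the head is what gives LIFO order without any extra bookkeeping.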

Strategy 2: Stack Allocation (Go ≥ 1.13)

When the compiler can prove that a defer statement executes at most once per function call (for example, it is not inside a loop), it allocates the _defer struct on the stack instead of the heap.

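// Compiler reserves a _defer record in the function's own stack frame
var d _defer          // lives in the frame, so no heap allocation
// ... initialize fields ...
deferprocStack(&d)    // link the record into G._defer
...
deferreturn()         // at function exit: run the pending defers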

Cost: No heap allocation. Significantly faster than heap strategy.

Strategy 3: Open-Coded (Go ≥ 1.14, Most Common)

When all conditions are met, the compiler inlines the deferred calls directly at each return site, using a bitmask (deferBits) to track which defers should fire.

Conditions for open-coded defer:

Compiler optimizations not disabled (-gcflags "-N" not set)
≤ 8 defers in the function
num_defers × num_returns ≤ 15
No defer inside a loop

// Source:
defer f1(a1)
if cond {
    defer f2(a2)
}

// Compiler generates:
deferBits = 0b00000000
deferBits |= 1 << 0     // f1 is always deferred → bit 0 set
_f1, _a1 = f1, a1
if cond {
    deferBits |= 1 << 1 // f2 conditionally deferred → bit 1 set
    _f2, _a2 = f2, a2
}

// At every return site (reverse order):
if deferBits & (1<<1) != 0 {
    deferBits &^= (1 << 1)
    _f2(_a2)
}
if deferBits & (1<<0) != 0 {
    deferBits &^= (1 << 0)
    _f1(_a1)
}

Cost: Near-zero — just a few bit operations and direct calls. No allocation.

Not truly zero-cost: Arguments are evaluated and copied to the stack at the defer site. Conditional defers still need the deferBits check at runtime.
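
The generated code above can be approximated by hand in plain Go. In this sketch, `openCoded` simulates the bitmask bookkeeping manually and `withDefer` uses real defer statements; both function names are invented for the comparison, and the real compiler output differs in detail.

```go
package main

import "fmt"

// openCoded hand-simulates open-coded defers for:
//   defer f1(); if cond { defer f2() }
func openCoded(cond bool) (trace []string) {
	var deferBits uint8
	f1 := func() { trace = append(trace, "f1") }
	f2 := func() { trace = append(trace, "f2") }

	deferBits |= 1 << 0 // f1 is always deferred
	if cond {
		deferBits |= 1 << 1 // f2 is deferred only when cond holds
	}

	// Return site: test the bits in reverse registration order.
	if deferBits&(1<<1) != 0 {
		deferBits &^= 1 << 1
		f2()
	}
	if deferBits&(1<<0) != 0 {
		deferBits &^= 1 << 0
		f1()
	}
	return trace
}

// withDefer is the same logic written with real defer statements.
func withDefer(cond bool) (trace []string) {
	defer func() { trace = append(trace, "f1") }()
	if cond {
		defer func() { trace = append(trace, "f2") }()
	}
	return nil
}

func main() {
	fmt.Println(openCoded(true), withDefer(true))   // [f2 f1] [f2 f1]
	fmt.Println(openCoded(false), withDefer(false)) // [f1] [f1]
}
```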

defer Gotchas

Gotcha 1: Arguments are evaluated immediately

// Prints a near-zero duration (nanoseconds, not 1s): time.Since() is evaluated when defer is registered
func main() {
    startedAt := time.Now()
    defer fmt.Println(time.Since(startedAt))
    time.Sleep(time.Second)
}

// Prints roughly 1s (slightly more): time.Since() is evaluated when the closure runs
func main() {
    startedAt := time.Now()
    defer func() { fmt.Println(time.Since(startedAt)) }()
    time.Sleep(time.Second)
}
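
Because the timing examples print different values on every run, here is a deterministic sketch of the same rule. The function `snapshot` is hypothetical; it registers one defer that receives `x` as an argument and one closure that reads `x` later.

```go
package main

import "fmt"

// snapshot shows when a deferred call sees the value of x.
func snapshot() (direct, viaClosure int) {
	x := 1
	defer func(v int) { direct = v }(x) // x is evaluated here, so v == 1
	defer func() { viaClosure = x }()   // x is read when the closure runs
	x = 2
	return 0, 0
}

func main() {
	fmt.Println(snapshot()) // 1 2
}
```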

Gotcha 2: Not all builtins can be deferred directly

// ❌ Cannot defer directly:
defer append(sl, 1)
defer cap(sl)
defer len(sl)

// ✅ Wrap in a closure:
defer func() { _ = append(sl, 1) }()

// ✅ Can defer directly:
defer close(ch)
defer delete(m, key)
defer recover()   // compiles, but does not stop a panic;
                  // call recover() inside a deferred func instead
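
One caveat on the last line: `defer recover()` is accepted by the compiler, yet it does not stop a panic, because `recover` only has an effect when a deferred function calls it. The sketch below contrasts the two forms; `directRecover`, `wrappedRecover`, and `panicEscapes` are names made up for this example.

```go
package main

import "fmt"

// directRecover defers recover itself; the panic still propagates.
func directRecover() {
	defer recover()
	panic("boom")
}

// wrappedRecover calls recover from inside a deferred closure; the panic stops.
func wrappedRecover() {
	defer func() { recover() }()
	panic("boom")
}

// panicEscapes reports whether a panic propagated out of f.
func panicEscapes(f func()) (escaped bool) {
	defer func() {
		if recover() != nil {
			escaped = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(panicEscapes(directRecover))  // true
	fmt.Println(panicEscapes(wrappedRecover)) // false
}
```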

Gotcha 3: Use anonymous functions to scope locks precisely

func someFunc() {
    // ... lots of code ...
    func() {
        mu.Lock()
        defer mu.Unlock()
        // critical section — lock released at end of anonymous func,
        // not at end of someFunc
    }()
    // ... more code runs without holding the lock ...
}
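
A runnable variant of the same pattern, using a hypothetical `counter` type: the mutex is held only while the anonymous function runs, and any slow work after it would proceed without the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type guarded by a mutex.
type counter struct {
	mu sync.Mutex
	n  int
}

// add holds the lock only for the body of the anonymous function.
func (c *counter) add(delta int) int {
	var current int
	func() {
		c.mu.Lock()
		defer c.mu.Unlock() // released when the anonymous func returns
		c.n += delta
		current = c.n
	}()
	// The lock is no longer held here.
	return current
}

func main() {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.add(1)
		}()
	}
	wg.Wait()
	fmt.Println(c.add(0)) // 100
}
```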

4. `panic` / `recover` Internals

Mental Model

Go	Java equivalent
`panic`	`RuntimeException` + `Error`
`recover`	`catch` (but only inside `defer`)

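Key rule: recover only has an effect when it is called directly by a deferred function; anywhere else it returns nil. It stops panics raised through runtime.gopanic() (what the panic builtin compiles to), but not runtime.throw() or runtime.fatal(), which report unrecoverable runtime errors such as concurrent map writes.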

Data Structures

type _panic struct {
    arg       interface{}  // value passed to panic()
    link      *_panic      // previous panic in chain
    recovered bool         // has this panic been recovered?
    aborted   bool         // has this panic been aborted?
}

type g struct {
    _panic *_panic   // head of panic chain (innermost first)
    _defer *_defer   // head of defer chain (innermost first)
}

`gopanic` Execution Flow

panic(val) called
    ↓
gopanic():
    allocate _panic on stack
    prepend to g._panic list
        ↓
    loop: walk g._defer list
        ↓
        execute each defer function
            ↓
            defer contains recover()?
                ├── YES: p.recovered = true
                │         mcall(recovery) → unwind to the frame that registered
                │         the defer; that function then returns normally  ✅
                │         gopanic exits here
                └── NO:  continue to next defer
        ↓
    no more defers, p.recovered still false
        ↓
    preprintpanics() → make panic values printable (calls Error()/String())
    fatalpanic()     → print panic message + stack trace, terminate process  💥

// Minimal recover pattern:
func safeCall() {
    defer func() {
        if r := recover(); r != nil {
            fmt.Println("recovered:", r)
        }
    }()
    panic("something went wrong")
}

Panic is not for normal error handling. Use error returns for expected failures. Reserve panic for truly unrecoverable states (programmer errors, invariant violations).
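
One way to apply this advice is sketched below, under the assumption that you control the API boundary. `divide` reports an expected failure as an error value, while `guard` converts an unexpected panic into an error so a caller can log it and continue. Both names, and `errDivideByZero`, are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errDivideByZero is a sentinel error for an expected failure.
var errDivideByZero = errors.New("divide by zero")

// divide reports the expected failure as an error value.
func divide(a, b int) (int, error) {
	if b == 0 {
		return 0, errDivideByZero
	}
	return a / b, nil
}

// guard runs f and converts any panic into an error.
func guard(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	f()
	return nil
}

func main() {
	_, err := divide(1, 0)
	fmt.Println(err) // divide by zero

	err = guard(func() {
		var m map[string]int
		m["x"] = 1 // programmer error: writing to a nil map panics
	})
	fmt.Println(err != nil) // true
}
```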

5. Build Tags

Build tags control which files are included in compilation — at the file level, not the code block level.

// dev.go
//go:build dev

package main

func init() {
    configArr = append(configArr, "mysql dev")
}

// prod.go
//go:build prod

package main

func init() {
    configArr = append(configArr, "mysql prod")
}

go build -tags "dev"   # includes dev.go, excludes prod.go
go build -tags "prod"  # includes prod.go, excludes dev.go
go build               # no tags: excludes both dev.go and prod.go

Common use cases: environment-specific config, OS-specific implementations, feature flags, test fixtures.

6. Closures & the Goroutine Loop Trap

A closure captures variables by reference, not by value. This creates a classic bug when launching goroutines inside a loop.
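
A minimal sketch of capture by reference, using a hypothetical `newCounter` helper: both returned closures refer to the same variable `n`, so a change made through one is visible through the other.

```go
package main

import "fmt"

// newCounter returns two closures that share the same variable n.
func newCounter() (inc func(), get func() int) {
	n := 0
	inc = func() { n++ }
	get = func() int { return n }
	return inc, get
}

func main() {
	inc, get := newCounter()
	inc()
	inc()
	fmt.Println(get()) // 2
}
```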

The Trap

// ❌ Before Go 1.22: goroutines race on the shared i and typically all see the LAST index
for i := range nodes {
    go func() {
        node := nodes[i]  // i is shared — by the time goroutine runs,
        process(node)     // the loop has already advanced i
    }()
}

The Fix

// ✅ Capture a local copy of i at each iteration
for i := range nodes {
    index := i            // new variable per iteration
    go func() {
        node := nodes[index]  // each goroutine has its own index
        process(node)
    }()
}

// ✅ Or pass as argument (cleaner):
for i := range nodes {
    go func(idx int) {
        node := nodes[idx]
        process(node)
    }(i)
}

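Go 1.22 changed loop variable semantics: each iteration now gets its own variable, which makes the first pattern safe. The new behavior applies only when the module's go.mod declares go 1.22 or later, so code that must also build under older settings should still capture the variable explicitly.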
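
For completeness, here is a self-contained version of the argument-passing fix that behaves the same on every Go version. `processAll` is a hypothetical helper, and doubling each value stands in for the `process` call.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll doubles every node concurrently, passing the index as an argument.
func processAll(nodes []int) []int {
	results := make([]int, len(nodes))
	var wg sync.WaitGroup
	for i := range nodes {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			results[idx] = nodes[idx] * 2 // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait() // without this, main could exit before the goroutines run
	return results
}

func main() {
	fmt.Println(processAll([]int{1, 2, 3})) // [2 4 6]
}
```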

7. Summary

Topic	Key Takeaway
Bootstrap	Entry: `rt0_amd64` → `schedinit` → `newproc(main)` → `mstart`
defer (heap)	`_defer` on heap, linked list on G — slowest, most general
defer (stack)	`_defer` in stack frame — faster, no heap alloc
defer (open-coded)	Inlined at return sites with `deferBits` bitmask — near-zero cost
defer args	Evaluated at defer registration, not at execution
panic/recover	`gopanic` walks defer chain; `recover` sets `p.recovered`; `mcall(recovery)` resumes the deferring function so it returns normally
Build tags	File-level inclusion/exclusion at compile time
Closure trap	Before Go 1.22, loop variables are shared across iterations; capture a local copy for goroutines

This concludes the Go Runtime Internals series. You now have a complete picture of how Go manages memory, I/O, system calls, scheduling, and language-level features like defer and panic — all the way down to the assembly level.

Found this series useful? Share it with your team and follow for more Go deep dives.
