Go's compiler is written entirely in Go — a self-hosting compiler that handles everything from frontend parsing to backend code generation. In this article we'll trace the program lifecycle from the very first instruction, dig into how defer is implemented in three different ways, and cover the panic/recover model, build tags, and the infamous closure-in-loop trap.
1. The Go Compiler Pipeline
Source (.go)
↓
Lexer / Parser → AST
↓
Type Checker → typed AST
↓
IR (SSA) → optimizations, escape analysis
↓
Code Generation → machine code
↓
Linker → binary
The entire pipeline — from frontend to backend — is implemented in Go itself. No C, no LLVM (by default).
Performance Tooling
Before optimizing, always measure:
| Tool | What it shows |
|---|---|
| go tool pprof | CPU, memory, goroutine profiles; µs-level code hotspots |
| go tool trace | Runtime events: goroutine scheduling, GC, netpoller; ns-level |
Optimization priority:
Storage layer (ms gains) > Business logic (µs gains) > Low-level code (ns gains)
Optimization workflow:
- Load test with realistic traffic
- pprof → identify CPU/memory hotspots
- Fix: async, cache, algorithm change
- benchmark → verify local improvement
- Repeat load test → check p95 latency
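The "benchmark → verify local improvement" step can be sketched as follows. This is a minimal example, not a prescribed workflow: concatNaive and concatBuilder are hypothetical stand-ins for a hotspot that pprof identified and its candidate fix, and testing.Benchmark is used so the comparison runs as an ordinary program instead of through go test.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// concatNaive stands in for a hotspot found with pprof (hypothetical).
func concatNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p // reallocates and copies on every iteration
	}
	return s
}

// concatBuilder is the candidate fix: same result, fewer allocations.
func concatBuilder(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := make([]string, 1000)
	for i := range parts {
		parts[i] = "x"
	}

	// testing.Benchmark runs a benchmark function outside of "go test".
	before := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			concatNaive(parts)
		}
	})
	after := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			concatBuilder(parts)
		}
	})

	fmt.Println("naive:  ", before)
	fmt.Println("builder:", after)
}
```

The absolute numbers depend on the machine, so only the relative difference is meaningful, and a local win still has to be confirmed by repeating the load test.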
2. Program Bootstrap: From rt0_amd64 to main.main
Your Go program does not start at main.main. It doesn't even start at runtime.main. The real entry point is deep in the runtime assembly.
runtime.rt0_amd64 ← actual binary entry point
↓
runtime.rt0_go ← determine CPU core count + physical page size
↓
runtime.schedinit() ← initialize: scheduler, stack allocator, memory allocator, GC
↓
runtime.newproc(main) ← create main goroutine, push to P's LRQ
↓
runtime.mstart() ← start M0, enter scheduling loop (never returns)
↓
runtime.main() ← main goroutine starts here
runtime.main() execution order:
func main() {
	// 1. Set max stack size: 1GB (64-bit) / 250MB (32-bit)
	maxstacksize = 1000000000
	// 2. Start sysmon background thread (GC, preemption, netpoll)
	systemstack(func() { newm(sysmon, nil) })
	// 3. Initialize runtime packages
	runtime_init()
	// 4. Enable GC background workers
	gcenable()
	// 5. Run all init() functions in imported packages
	main_init()
	// 6. Run user's main.main()
	main_main()
	// 7. Exit
	exit(0)
}
Key difference: Non-main goroutines return to goexit when done. The main goroutine calls exit(0), which terminates the entire process immediately, even if other goroutines are still running.
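The ordering of steps 5 and 6 is observable from user code. The sketch below records each initialization step in a slice; the names events and record are made up for this illustration.

```go
package main

import "fmt"

// events records the order in which initialization steps run.
var events []string

func record(step string) bool {
	events = append(events, step)
	return true
}

// Package-level variables are initialized before any init() function runs.
var _ = record("package-level var")

func init() {
	record("init()")
}

func main() {
	record("main.main")
	fmt.Println(events) // [package-level var init() main.main]
}
```

Both the variable initializer and init() have finished by the time main.main starts, which is why init() is a common place for registration code.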
3. defer: Three Implementation Strategies
defer looks like a simple "run this at function exit" mechanism. The reality is more nuanced — Go uses three different implementations depending on the context, each with different performance characteristics.
Why not just insert a function call at return?
Because defer can appear inside conditionals and loops, the compiler cannot always statically determine how many defers exist or which ones will execute. This makes a purely compile-time solution insufficient.
Strategy 1: Heap Allocation (General Case)
Each defer creates a _defer struct allocated on the heap, chained into the goroutine's defer linked list.
// Compiler transforms:
defer foo()
// Into:
deferproc(foo) // allocate _defer on heap, push to G._defer list
...
deferreturn() // at function exit: walk list, execute defers in LIFO order
type _defer struct {
	siz  int32    // size of arguments + return values
	sp   uintptr  // stack pointer at defer site
	pc   uintptr  // caller's program counter
	fn   *funcval // the deferred function
	link *_defer  // next defer in chain (linked list)
}
type g struct {
	_defer *_defer // head of this goroutine's defer list
}
Cost: Heap allocation per defer call. Slowest strategy.
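The LIFO walk of the defer list is easy to observe from user code. In this sketch the defer sits inside a loop, which according to the rules described in this article rules out the cheaper strategies; deferOrder is a hypothetical helper name.

```go
package main

import "fmt"

// deferOrder registers n deferred closures in a loop and returns the order
// in which they ran. The named result lets the closures append to it after
// the return statement has executed.
func deferOrder(n int) (order []int) {
	for i := 0; i < n; i++ {
		i := i // per-iteration copy (required before Go 1.22)
		defer func() { order = append(order, i) }()
	}
	return nil
}

func main() {
	fmt.Println(deferOrder(3)) // [2 1 0]
}
```

The last registered defer runs first, matching a list where each new record is pushed at the head.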
Strategy 2: Stack Allocation (Go ≥ 1.13)
When the compiler can prove a defer is not in a loop and the number is bounded, it allocates the _defer struct on the stack instead of the heap.
// Compiler allocates _defer directly in the function's stack frame
t := deferstruct(stksize) // stack-allocated _defer
// ... initialize fields ...
deferreturn()
Cost: No heap allocation. Significantly faster than heap strategy.
Strategy 3: Open-Coded (Go ≥ 1.14, Most Common)
When all conditions are met, the compiler inlines the deferred calls directly at each return site, using a bitmask (deferBits) to track which defers should fire.
Conditions for open-coded defer:
- Compiler optimizations not disabled (-gcflags "-N" not set)
- ≤ 8 defers in the function
- num_defers × num_returns ≤ 15
- No defer inside a loop
// Source:
defer f1(a1)
if cond {
	defer f2(a2)
}
// Compiler generates:
deferBits = 0b00000000
deferBits |= 1 << 0 // f1 is always deferred → bit 0 set
_f1, _a1 = f1, a1
if cond {
	deferBits |= 1 << 1 // f2 conditionally deferred → bit 1 set
	_f2, _a2 = f2, a2
}
// At every return site (reverse order):
if deferBits & (1<<1) != 0 {
	deferBits &^= (1 << 1)
	_f2(_a2)
}
if deferBits & (1<<0) != 0 {
	deferBits &^= (1 << 0)
	_f1(_a1)
}
Cost: Near-zero — just a few bit operations and direct calls. No allocation.
Not truly zero-cost: Arguments are evaluated and copied to the stack at the defer site. Conditional defers still need the deferBits check at runtime.
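To make the bitmask idea concrete, here is a hand-written emulation in ordinary Go. It is only a sketch of the technique, not the compiler's actual output: withDefer uses real defer statements, lowered mimics the deferBits bookkeeping by hand, and both function names are invented for this example.

```go
package main

import "fmt"

// withDefer uses real defer statements, one of them conditional.
func withDefer(cond bool) (log []string) {
	defer func() { log = append(log, "f1") }()
	if cond {
		defer func() { log = append(log, "f2") }()
	}
	log = append(log, "body")
	return log
}

// lowered emulates the open-coded form: set a bit where each defer
// statement would execute, then test the bits in reverse order at the
// return site.
func lowered(cond bool) []string {
	var log []string
	var deferBits uint8

	deferBits |= 1 << 0 // f1 is always deferred
	if cond {
		deferBits |= 1 << 1 // f2 is deferred only when cond holds
	}
	log = append(log, "body")

	if deferBits&(1<<1) != 0 {
		deferBits &^= 1 << 1
		log = append(log, "f2")
	}
	if deferBits&(1<<0) != 0 {
		deferBits &^= 1 << 0
		log = append(log, "f1")
	}
	return log
}

func main() {
	fmt.Println(withDefer(true), lowered(true))   // [body f2 f1] [body f2 f1]
	fmt.Println(withDefer(false), lowered(false)) // [body f1] [body f1]
}
```

The emulation leaves out panic handling, which is one reason the real mechanism is more involved than plain inlining.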
defer Gotchas
Gotcha 1: Arguments are evaluated immediately
// Prints a near-zero duration, not 1s: time.Since() is evaluated when defer is registered
func main() {
	startedAt := time.Now()
	defer fmt.Println(time.Since(startedAt))
	time.Sleep(time.Second)
}
// Prints a little over 1s: time.Since() is evaluated when the closure runs
func main() {
	startedAt := time.Now()
	defer func() { fmt.Println(time.Since(startedAt)) }()
	time.Sleep(time.Second)
}
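The same effect can be shown without timing noise. In this minimal sketch (evalTiming is a hypothetical name) one defer receives x as an argument and the other reads x from inside a closure:

```go
package main

import "fmt"

// evalTiming returns the value of x seen by a deferred call that received
// x as an argument, and the value seen by a deferred closure that reads x.
func evalTiming() (atDefer, atExit int) {
	x := 1
	defer func(v int) { atDefer = v }(x) // x is evaluated now: v == 1
	defer func() { atExit = x }()        // x is read when the closure runs
	x = 2
	return
}

func main() {
	fmt.Println(evalTiming()) // 1 2
}
```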
Gotcha 2: Not all builtins can be deferred directly
// ❌ Cannot defer directly (compile error: the result would be discarded):
defer append(sl, 1)
defer cap(sl)
defer len(sl)
// ✅ Wrap in a closure:
defer func() { _ = append(sl, 1) }()
// ✅ Can defer directly:
defer close(ch)
defer delete(m, key)
defer recover() // compiles, but does NOT stop a panic; call recover inside a deferred func instead
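One subtlety about deferring recover directly: the statement compiles, but recover only stops a panic when it is called directly by a deferred function, not when it is itself the deferred call. The sketch below checks both forms; directRecover, wrappedRecover and escapes are names invented for this example.

```go
package main

import "fmt"

// directRecover defers the builtin itself. recover is then not called by a
// deferred function, so it returns nil and the panic keeps propagating.
func directRecover() {
	defer recover()
	panic("boom")
}

// wrappedRecover calls recover from inside a deferred closure.
func wrappedRecover() {
	defer func() { _ = recover() }()
	panic("boom")
}

// escapes reports whether a panic propagated out of f.
func escapes(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("defer recover() lets the panic escape:", escapes(directRecover))
	fmt.Println("wrapped recover lets the panic escape:", escapes(wrappedRecover))
}
```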
Gotcha 3: Use anonymous functions to scope locks precisely
func someFunc() {
	// ... lots of code ...
	func() {
		mu.Lock()
		defer mu.Unlock()
		// critical section: lock released at end of anonymous func,
		// not at end of someFunc
	}()
	// ... more code runs without holding the lock ...
}
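A runnable version of this pattern is sketched below. The counter type is hypothetical, and Mutex.TryLock (available since Go 1.18) is used only to observe that the lock is already free while the outer function is still running; it is not part of the pattern itself.

```go
package main

import (
	"fmt"
	"sync"
)

type counter struct {
	mu sync.Mutex
	n  int
}

// incr updates n inside a narrowly scoped critical section and reports
// whether the mutex was free again before incr itself returned.
func (c *counter) incr() (unlockedEarly bool) {
	func() {
		c.mu.Lock()
		defer c.mu.Unlock() // runs when the anonymous func returns
		c.n++
	}()

	// Still inside incr, but the lock is no longer held.
	if c.mu.TryLock() {
		c.mu.Unlock()
		return true
	}
	return false
}

func main() {
	c := &counter{}
	ok := c.incr()
	fmt.Println(ok, c.n) // true 1
}
```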
4. panic / recover Internals
Mental Model
| Go | Java equivalent |
|---|---|
| panic | RuntimeException + Error |
| recover | catch (but only inside defer) |
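Key rule: recover only has an effect when it is called directly by a deferred function. It catches panics raised through runtime.gopanic() (the runtime function behind panic()), but not failures raised through runtime.throw() or runtime.fatal(), which are unrecoverable runtime errors such as concurrent map writes.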
Data Structures
type _panic struct {
	arg       interface{} // value passed to panic()
	link      *_panic     // previous panic in chain
	recovered bool        // has this panic been recovered?
	aborted   bool        // has this panic been aborted?
}
type g struct {
	_panic *_panic // head of panic chain (innermost first)
	_defer *_defer // head of defer chain (innermost first)
}
gopanic Execution Flow
panic(val) called
↓
gopanic():
allocate _panic on stack
prepend to g._panic list
↓
loop: walk g._defer list
↓
execute each defer function
↓
defer contains recover()?
├── YES: p.recovered = true
│ mcall(recovery) → continue as though the deferring function returned normally ✅
│ gopanic exits here
└── NO: continue to next defer
↓
no more defers, p.recovered still false
↓
preprintpanics() → print stack trace
fatalpanic() → terminate process 💥
// Minimal recover pattern:
func safeCall() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	panic("something went wrong")
}
Panic is not for normal error handling. Use error returns for expected failures. Reserve panic for truly unrecoverable states (programmer errors, invariant violations).
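At an API boundary, a common way to combine both rules is to convert an unexpected panic into an ordinary error so callers only deal with error values. The sketch below assumes a hypothetical safeDivide function; the named result err is what lets the deferred closure change the return value.

```go
package main

import "fmt"

// safeDivide converts a runtime panic (integer division by zero) into an
// error return instead of letting it crash the caller.
func safeDivide(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	fmt.Println(safeDivide(6, 3)) // 2 <nil>
	fmt.Println(safeDivide(1, 0)) // 0 recovered: runtime error: integer divide by zero
}
```

For a case this simple, checking b == 0 up front and returning an error is the better design; the recover form is meant for guarding code whose failure modes you do not control.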
5. Build Tags
Build tags control which files are included in compilation — at the file level, not the code block level.
// dev.go
//go:build dev

package main

func init() {
	configArr = append(configArr, "mysql dev")
}

// prod.go
//go:build prod

package main

func init() {
	configArr = append(configArr, "mysql prod")
}
go build -tags "dev" # includes dev.go, excludes prod.go
go build -tags "prod" # includes prod.go, excludes dev.go
Common use cases: environment-specific config, OS-specific implementations, feature flags, test fixtures.
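With the dev/prod layout above, a plain go build with no -tags includes neither file. One possible way to cover that case is a third file whose constraint is the negation of both tags. The sketch below is an assumption about how such a file might look: the name default.go and the "mysql local" value are made up, and configArr and main are declared here only to keep the example self-contained (in the article's layout they would live in an untagged file).

```go
//go:build !dev && !prod

// default.go (hypothetical): compiled only when neither tag is set.
package main

import "fmt"

// configArr is declared here only so the sketch compiles on its own.
var configArr []string

func init() {
	configArr = append(configArr, "mysql local")
}

func main() {
	fmt.Println(configArr) // [mysql local]
}
```

Constraint expressions support &&, || and ! with parentheses, and the constraint line must be followed by a blank line before the package clause.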
6. Closures & the Goroutine Loop Trap
A closure captures variables by reference, not by value. This creates a classic bug when launching goroutines inside a loop.
The Trap
// ❌ Before Go 1.22: goroutines may all process the LAST node (and race on i)
for i := range nodes {
	go func() {
		node := nodes[i] // i is shared; by the time the goroutine runs,
		process(node)    // the loop has usually advanced i
	}()
}
The Fix
// ✅ Capture a local copy of i at each iteration
for i := range nodes {
	index := i // new variable per iteration
	go func() {
		node := nodes[index] // each goroutine has its own index
		process(node)
	}()
}
// ✅ Or pass as argument (cleaner):
for i := range nodes {
	go func(idx int) {
		node := nodes[idx]
		process(node)
	}(i)
}
Go 1.22 changed loop variable semantics: each iteration now creates a new variable, making the first pattern safe. The new behavior applies only to modules whose go.mod declares go 1.22 or later, so for code that must build under older versions, always capture explicitly.
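A complete, runnable form of the argument-passing fix is sketched below. It behaves the same under both loop-variable semantics. The processAll helper and the use of strings.ToUpper as a stand-in for process are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// processAll handles every node in its own goroutine. The loop index is
// passed as an argument, so each goroutine works on its own copy.
func processAll(nodes []string) []string {
	results := make([]string, len(nodes))
	var wg sync.WaitGroup
	for i := range nodes {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			// Each goroutine writes a distinct element, so no lock is needed.
			results[idx] = strings.ToUpper(nodes[idx])
		}(i)
	}
	wg.Wait() // without this, the function could return before the goroutines run
	return results
}

func main() {
	fmt.Println(processAll([]string{"a", "b", "c"})) // [A B C]
}
```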
7. Summary
| Topic | Key Takeaway |
|---|---|
| Bootstrap | Entry: rt0_amd64 → schedinit → newproc(main) → mstart |
| defer (heap) | _defer on heap, linked list on G; slowest, most general |
| defer (stack) | _defer in stack frame; faster, no heap alloc |
| defer (open-coded) | Inlined at return sites with deferBits bitmask; near-zero cost |
| defer args | Evaluated at defer registration, not at execution |
| panic/recover | gopanic walks defer chain; recover sets p.recovered; mcall(recovery) resumes as though the deferring function returned |
| Build tags | File-level inclusion/exclusion at compile time |
| Closure trap | Before Go 1.22, loop variables are shared across iterations; capture a local copy for goroutines |
This concludes the Go Runtime Internals series. You now have a complete picture of how Go manages memory, I/O, system calls, scheduling, and language-level features like defer and panic — all the way down to the assembly level.
Found this series useful? Share it with your team and follow for more Go deep dives.