The Go scheduler is one of the most sophisticated pieces of the runtime. It multiplexes thousands of goroutines onto a handful of OS threads, and its context switches stay in user space, so they cost far less than OS thread switches. In this article we'll trace the full lifecycle of a goroutine: creation, the scheduling loop, preemption, and cleanup.
1. System Call vs Function Call: The Fundamental Difference
Before diving into the scheduler, it's worth understanding why goroutine scheduling requires assembly — and why it's fundamentally different from a regular function call.
| | Function Call | System Call |
|---|---|---|
| Instruction | CALL | SYSCALL / INT 0x80 |
| Address space | User space only | Crosses into kernel space |
| How to invoke | By function name/address | By integer syscall number |
| Registers saved | PC + SP only | All CPU registers (full context) |
| Virtual memory | Not touched | Not touched (same process) |
| Process switch | No | No (same process, intra-process switch) |
On AMD64 Linux, the syscall convention is:
- rax → syscall number
- rdi, rsi, rdx, r10, r8, r9 → first 6 arguments
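To make the convention concrete, here is a minimal sketch, assuming Linux on AMD64, that invokes getpid by its integer syscall number instead of by function name. The helper name rawGetpid is hypothetical. The standard library's syscall.Syscall takes the number and up to three arguments and places them in the registers listed above.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rawGetpid is a hypothetical helper: it asks the kernel for the process ID
// by syscall number (SYS_GETPID) rather than by calling a named function.
func rawGetpid() (int, error) {
	// getpid takes no arguments, so the three argument slots are zero.
	pid, _, errno := syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)
	if errno != 0 {
		return 0, errno
	}
	return int(pid), nil
}

func main() {
	pid, err := rawGetpid()
	if err != nil {
		fmt.Println("syscall failed:", err)
		return
	}
	// os.Getpid ends up issuing the same syscall, so the values should match.
	fmt.Println("raw pid matches os.Getpid:", pid == os.Getpid())
}
```

Application code would normally call os.Getpid; the raw form is shown only to illustrate the "invoke by number" row of the table.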
Why assembly for goroutine switching? Switching execution flows means loading the PC (program counter) and SP (stack pointer) registers directly, which Go source code cannot express. The scheduler's gogo, mcall, and goexit routines are therefore written in assembly.
2. The GMP Model
Go's scheduler is built on three core abstractions:
┌────────────────────────────────────────────────────────────┐
│                         GMP Model                          │
│                                                            │
│  G (Goroutine)  - the unit of work                         │
│  M (Machine)    - OS thread, the actual executor           │
│  P (Processor)  - scheduling context, owns LRQ             │
│                                                            │
│     P0          P1          P2                             │
│    [LRQ]       [LRQ]       [LRQ]   ← local run queues      │
│      │           │           │                             │
│     M0          M1          M2     ← OS threads            │
│      │           │           │                             │
│      G           G           G     ← running goroutines    │
│                                                            │
│  [Global Run Queue]                ← shared, requires lock │
│  [netpoller]                       ← waiting on I/O        │
└────────────────────────────────────────────────────────────┘
Key design: mcache lives on P, not M. When M+G enter a blocking syscall and P detaches, the new M that picks up P inherits its mcache — ensuring lock-free allocation continues.
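The number of Ps is the one part of the GMP model that user code can observe directly. The following is a small sketch using only the public runtime package; the function name schedulerSnapshot is an illustrative choice, and the number of Ms is not exposed through this API.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerSnapshot reports the number of Ps (GOMAXPROCS), the number of
// logical CPUs, and the number of goroutines that currently exist.
func schedulerSnapshot() (ps, cpus, gs int) {
	// GOMAXPROCS(0) only queries the current setting; it does not change it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU(), runtime.NumGoroutine()
}

func main() {
	ps, cpus, gs := schedulerSnapshot()
	fmt.Printf("P=%d CPUs=%d G=%d\n", ps, cpus, gs)
}
```

By default the P count equals the CPU count, which is why at most that many goroutines execute Go code at the same instant.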
3. Initialization: From g0 to main
Before any user code runs, the runtime bootstraps itself:
Step 1: Initialize g0
─────────────────────────────────────────────
g0 is the "runtime goroutine" — provides a stack
for the runtime itself to execute on.
m0 ←→ g0 are bound together.
get_tls() on main thread → returns g0 → g0.m → m0
Step 2: Initialize P and allp
─────────────────────────────────────────────
schedinit() → mcommoninit() → procresize()
Creates GOMAXPROCS P structs, stored in allp[]
Step 3: Create main goroutine (newg)
─────────────────────────────────────────────
runtime.newproc() creates newg with 2KB stack
newg.stack.hi / newg.stack.lo mark stack bounds
Stack is primed: goexit's address pushed as fake return
newg.sched.pc = runtime.main
newg state → _Grunnable → placed in P's LRQ
Step 4: gogo — switch from g0 to newg
─────────────────────────────────────────────
Save g0's SP to g0.sched.sp
schedule() → finds newg
gogo() → switches stack from g0 to newg
→ JMP to newg.sched.pc (runtime.main)
runtime.main startup sequence:
runtime.main()
├── start sysmon thread
├── init runtime packages
├── init all imported packages
├── call main.main()
└── exit() syscall → process ends
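The ordering in this startup sequence is visible from ordinary code: package init functions run before main.main. A minimal sketch, with a hypothetical trace variable used only for illustration:

```go
package main

import "fmt"

// trace records the order in which the runtime invokes our code.
var trace []string

// init runs during package initialization, before main.main is called.
func init() {
	trace = append(trace, "init")
}

func main() {
	trace = append(trace, "main")
	fmt.Println(trace)
}
```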
Note: Non-main goroutines return to goexit after finishing. The main goroutine is different: when main.main returns, runtime.main calls exit and the entire process ends, even if other goroutines are still running.
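Because the process ends as soon as the main goroutine returns, main has to wait explicitly for any goroutine whose work matters. One common way is sync.WaitGroup; the sketch below assumes a hypothetical helper runAndWait and is not the only option (channels or errgroup would work too).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runAndWait starts n goroutines and blocks until all of them have returned.
// It reports how many goroutines ran to completion.
func runAndWait(n int) int64 {
	var wg sync.WaitGroup
	var finished int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&finished, 1)
		}()
	}
	wg.Wait() // without this, main could return before the goroutines run
	return atomic.LoadInt64(&finished)
}

func main() {
	fmt.Println("finished goroutines:", runAndWait(8))
}
```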
4. The Scheduling Loop
Every M runs an infinite scheduling loop. In pseudocode:
// Each OS thread runs this loop
func scheduleLoop() {
	for {
		g := findRunnableGoroutine() // find next G
		run(g)                       // execute G
		saveState(g)                 // save registers
	}
}
A simplified view of the runtime's schedule() function, which searches for a runnable G in priority order:
func schedule() {
	_g_ := getg()
	p := _g_.m.p.ptr()
	var gp *g
	var inheritTime bool

	// 1. Every 61 ticks: check global run queue (fairness guarantee).
	//    The global queue is shared, so it is read under sched.lock.
	if p.schedtick%61 == 0 && sched.runqsize > 0 {
		lock(&sched.lock)
		gp = globrunqget(p, 1)
		unlock(&sched.lock)
	}
	// 2. Check P's local run queue (LRQ)
	if gp == nil {
		gp, inheritTime = runqget(p)
	}
	// 3. findrunnable(): steal from other Ps, check netpoller, block if nothing
	if gp == nil {
		gp, inheritTime = findrunnable() // blocks until a G is found
	}
	// 4. Execute the G (does not return)
	execute(gp, inheritTime)
}
Fairness mechanism: Every 61 scheduling ticks, the scheduler checks the global queue first. Without this, goroutines in the global queue could starve for as long as a P's local queue never runs empty.
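The effect of the 61-tick rule can be reproduced with a toy model. The sketch below is not runtime code: toyP, pop, and ticksUntilGlobalRuns are hypothetical names, queues are plain slices, and the tick counter starts at 1 purely to make the wait visible. It keeps the local queue permanently busy and reports the tick on which the goroutine in the global queue finally runs.

```go
package main

import "fmt"

// toyP is a hypothetical, heavily simplified stand-in for a P.
type toyP struct {
	schedtick int
	local     []string
}

// pop removes and returns the first element of a non-empty queue.
func pop(q *[]string) string {
	g := (*q)[0]
	*q = (*q)[1:]
	return g
}

// next mimics the priority order: on every 61st tick the global queue is
// consulted first; otherwise the local queue wins.
func (p *toyP) next(global *[]string) string {
	tick := p.schedtick
	p.schedtick++
	if tick%61 == 0 && len(*global) > 0 {
		return pop(global)
	}
	if len(p.local) > 0 {
		return pop(&p.local)
	}
	if len(*global) > 0 {
		return pop(global)
	}
	return ""
}

// ticksUntilGlobalRuns refills the local queue before every pick, so only
// the fairness check can ever reach the global queue.
func ticksUntilGlobalRuns() int {
	p := &toyP{schedtick: 1}
	global := []string{"global-G"}
	for {
		p.local = append(p.local, "local-G") // local queue never drains
		tick := p.schedtick
		if p.next(&global) == "global-G" {
			return tick
		}
	}
}

func main() {
	fmt.Println("global G ran on tick", ticksUntilGlobalRuns())
}
```

If the tick%61 branch is removed from next, the loop in ticksUntilGlobalRuns never terminates, which is the starvation the fairness check prevents.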
execute() → gogo() → user goroutine
func execute(gp *g, inheritTime bool) {
	_g_ := getg() // the g0 of the current M

	casgstatus(gp, _Grunnable, _Grunning) // G state: Runnable → Running
	gp.preempt = false
	_g_.m.curg = gp // M → G (forward link)
	gp.m = _g_.m    // G → M (back link)
	gogo(&gp.sched) // assembly: switch stack + JMP to gp's PC
}
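The casgstatus call is at heart an atomic compare-and-swap on the goroutine's status word. The following sketch models only that idea with sync/atomic; the names toyG, gRunnable, gRunning, and casStatus are hypothetical. As far as I know, the real casgstatus retries until it succeeds and treats an invalid transition as a fatal error, whereas this sketch simply reports failure.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical status values that mirror the names used by the runtime.
const (
	gRunnable uint32 = iota + 1
	gRunning
)

// toyG models only the status word of a goroutine descriptor.
type toyG struct {
	status uint32
}

// casStatus moves g from one state to another only if it is currently in
// the expected state, so two Ms can never both claim the same G.
func (g *toyG) casStatus(from, to uint32) bool {
	return atomic.CompareAndSwapUint32(&g.status, from, to)
}

func main() {
	g := &toyG{status: gRunnable}
	fmt.Println("first claim:", g.casStatus(gRunnable, gRunning))
	fmt.Println("second claim:", g.casStatus(gRunnable, gRunning))
}
```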
The Full Scheduling Cycle
schedule()
→ execute()
→ gogo() ← assembly: switch to G's stack
→ user func() ← G runs
→ goexit() ← fake return (injected by newproc)
→ goexit1()
→ mcall(goexit0)
→ goexit0()
→ casgstatus(_Grunning → _Gdead)
→ dropg() ← unbind M and G
→ gfput(p, gp) ← recycle G into gfree list
→ schedule() ← start next cycle ♻️
This is the complete loop: schedule → execute → gogo → user code → goexit → schedule.
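The exit half of this cycle can be triggered explicitly from user code with runtime.Goexit, which runs the goroutine's deferred calls and then terminates it without returning to the caller. The sketch below uses a hypothetical helper exitEarly to record which steps actually execute.

```go
package main

import (
	"fmt"
	"runtime"
)

// exitEarly runs a goroutine that terminates itself with runtime.Goexit and
// returns the steps that were executed, in order.
func exitEarly() []string {
	steps := make(chan string, 3)
	done := make(chan struct{})
	go func() {
		defer close(done) // runs last (defers are LIFO)
		defer func() { steps <- "deferred" }()
		steps <- "before"
		runtime.Goexit()  // run deferred calls, then end this goroutine
		steps <- "after"  // never reached
	}()
	<-done
	close(steps)

	var got []string
	for s := range steps {
		got = append(got, s)
	}
	return got
}

func main() {
	fmt.Println(exitEarly())
}
```

A goroutine that simply returns from its function reaches the same cleanup through the goexit return address that newproc placed on its stack.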
5. Goroutine Preemption
Go uses a hybrid cooperative + preemptive scheduling model.
Cooperative Preemption (Go ≤ 1.13)
- sysmon detects a G running > 10ms → sets g.preempt = true and poisons g.stackguard0 so that the next stack check fails
- In the prologue of most functions, the compiler inserts a stack-growth check that calls runtime.morestack_noctxt when it fails
- If preempt == true, G detaches from M and is pushed to the global queue
The loophole:
package main

import "fmt"

func main() {
	go fmt.Println("hi")
	for {} // no function calls → preempt flag never checked → goroutine never yields
}
Signal-Based Async Preemption (Go ≥ 1.14)
Go 1.14 introduced true async preemption using OS signals:
Step 1: M registers sighandler for SIGURG at startup
Step 2: sysmon detects G running > 10ms
→ calls preemptone()
→ sends SIGURG to the M running that G
Step 3: M's sighandler fires
→ receives sigPreempt
→ calls doSigPreempt()
→ injects a CALL to runtime.asyncPreempt
into G's execution context (modifies PC)
Step 4: G executes asyncPreempt (without knowing it)
→ mcall → switches to g0 stack
→ gopreempt_m() → schedule()
Step 5: G is preempted and re-queued
When rescheduled, G resumes from original PC ✅
Why SIGURG? It's rarely used by application code, making it a safe signal to hijack for the runtime.
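This mechanism also helps the GC: to stop the world, the runtime can interrupt a running thread with SIGURG instead of waiting for its goroutine to reach the next function-call check. The handler only preempts when the interrupted instruction is a safe point, so the cooperative check remains as a fallback.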
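A quick way to see asynchronous preemption in action is to pin the program to a single P and start a goroutine that spins without making function calls. The sketch below assumes Go 1.14 or later with async preemption enabled (the default on Linux, macOS, and Windows); the helper name spinnerGetsPreempted is hypothetical. On a runtime without async preemption the function would never return, because the caller could not get the P back.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinnerGetsPreempted returns true once the calling goroutine has been
// scheduled again while a call-free loop was occupying the only P.
func spinnerGetsPreempted() bool {
	prev := runtime.GOMAXPROCS(1) // a single P: only one goroutine runs at a time
	defer runtime.GOMAXPROCS(prev)

	var stop int32
	done := make(chan struct{})
	go func() {
		for atomic.LoadInt32(&stop) == 0 {
			// spin: no function-call prologue, so no cooperative check
		}
		close(done)
	}()

	time.Sleep(50 * time.Millisecond) // hand the P to the spinner
	// Reaching this line means the spinner was preempted.
	atomic.StoreInt32(&stop, 1)
	<-done
	return true
}

func main() {
	fmt.Println("spinner was preempted:", spinnerGetsPreempted())
}
```

Running the same program with the environment variable GODEBUG=asyncpreemptoff=1 is a simple way to compare against the cooperative-only behavior.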
6. gopark: Voluntarily Suspending a Goroutine
When a goroutine needs to wait (channel, mutex, I/O), it calls gopark:
func gopark(unlockf func(*g, unsafe.Pointer) bool, ...) {
	mp := acquirem()
	gp := mp.curg
	mp.waitunlockf = unlockf
	gp.waitreason = reason
	releasem(mp)
	mcall(park_m) // switch to g0, call park_m
}

func park_m(gp *g) {
	_g_ := getg() // g0 of the current M

	casgstatus(gp, _Grunning, _Gwaiting) // G: Running → Waiting
	dropg()                              // unbind M and G
	if fn := _g_.m.waitunlockf; fn != nil {
		ok := fn(gp, _g_.m.waitlock)
		if !ok {
			// condition already met, re-run immediately
			casgstatus(gp, _Gwaiting, _Grunnable)
			execute(gp, true) // does not return
		}
	}
	schedule() // find next G to run
}
G state transitions:
_Grunnable ──execute()──► _Grunning
    ▲                         │
    │                     gopark()
    │                         ▼
goready()                 _Gwaiting
    ▲                         │
    └────── event fires ──────┘
A blocking system call is tracked separately: entersyscall moves the G to _Gsyscall rather than _Gwaiting.
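The waiting state is observable from outside the runtime: a stack dump labels each parked goroutine with its wait reason. The sketch below parks a goroutine on a channel receive and looks for its header in the output of runtime.Stack. The names parkedReceiver and observeParked are hypothetical, and the header format is what current Go releases print rather than a documented contract.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"time"
)

// parkedReceiver blocks on a receive, which suspends the goroutine.
func parkedReceiver(ch <-chan int) {
	<-ch
}

// observeParked returns the stack-dump header of the parked goroutine, or
// an empty string if it was not seen waiting within two seconds.
func observeParked() string {
	ch := make(chan int)
	go parkedReceiver(ch)
	defer func() { ch <- 1 }() // the send makes the receiver runnable again

	buf := make([]byte, 1<<20)
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		n := runtime.Stack(buf, true) // true: dump all goroutines
		for _, block := range strings.Split(string(buf[:n]), "\n\n") {
			if strings.Contains(block, "parkedReceiver") &&
				strings.Contains(block, "[chan receive") {
				return strings.SplitN(block, "\n", 2)[0]
			}
		}
		time.Sleep(time.Millisecond) // receiver not parked yet; retry
	}
	return ""
}

func main() {
	fmt.Println(observeParked())
}
```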
7. Goroutine Stacks
Unlike OS threads (which have fixed-size stacks, typically 1–8MB), goroutine stacks are dynamic and managed by the runtime.
type stack struct {
	lo uintptr // lowest address of the stack (growth limit)
	hi uintptr // highest address; the stack grows down from here
}
- Initial size: 2KB (allocated from stackpool or mcache)
- Growth: When a function call would overflow the stack, the compiler-inserted morestack check triggers a copy of the stack to a new allocation twice the size
- Small stacks (< threshold): Allocated from stackpool / mcache.stackcache
- Large stacks: Allocated from stackLarge or directly from mheap
Goroutine stacks vs OS thread stacks: An OS thread's stack is a region of the process address space set up when the thread is created, and the kernel tracks the thread itself in a task_struct. Goroutine stacks are heap memory allocated by the Go runtime, referenced by the g.stack field.
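Stack growth is easy to exercise: recursion that needs far more than the initial 2KB still succeeds, because the runtime moves the goroutine to a larger stack whenever the check in a function prologue fails. A minimal sketch, with a hypothetical function depth and an arbitrary 128-byte local to give each frame some size:

```go
package main

import "fmt"

// depth recurses n levels and returns n. Each frame keeps a small local
// array alive, so deep recursion needs megabytes of stack in total.
func depth(n int) int {
	var pad [128]byte
	pad[n%128] = 1
	if n == 0 {
		return 0
	}
	return depth(n-1) + int(pad[n%128])
}

func main() {
	// 100000 frames cannot fit in a 2KB stack; the runtime grows it on demand.
	fmt.Println(depth(100000))
}
```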
8. Goroutine Leaks
A goroutine leak occurs when a goroutine starts but never exits — typically because it's blocked forever on a channel, mutex, or I/O that never resolves.
Why leaks are serious:
- Each goroutine holds at least 2KB of stack memory
- Heap objects referenced by the goroutine cannot be GC'd
- Memory usage grows continuously → eventual OOM crash
Common leak patterns:
// Leak: channel send with no receiver
func leak() {
	ch := make(chan int)
	go func() {
		ch <- 1 // blocks forever if no one reads
	}()
	// ch is never read
}

// Leak: infinite loop with no exit condition
func leak2(done <-chan struct{}) {
	go func() {
		for {
			// forgot to check <-done
		}
	}()
}
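Detection: Capture a goroutine profile (for example from the /debug/pprof/goroutine endpoint of net/http/pprof) and inspect it with go tool pprof to see all live goroutines and the stacks they are blocked in.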
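A leak-free counterpart to the second pattern checks the done channel on every iteration. The sketch below is one possible shape under stated assumptions: worker and stopsCleanly are hypothetical names, and the one-millisecond ticker stands in for real work. It also reads the live goroutine count from the goroutine profile via runtime/pprof.

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"time"
)

// worker loops until done is closed, so it cannot outlive its owner.
// It closes exited on the way out so callers can observe the shutdown.
func worker(done <-chan struct{}, exited chan<- struct{}) {
	defer close(exited)
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			// periodic work would go here
		}
	}
}

// stopsCleanly starts a worker, signals it to stop, and reports whether it
// exited within one second.
func stopsCleanly() bool {
	done := make(chan struct{})
	exited := make(chan struct{})
	go worker(done, exited)
	close(done)
	select {
	case <-exited:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("worker exited:", stopsCleanly())
	// The goroutine profile counts every live goroutine, leaked ones included.
	fmt.Println("live goroutines:", pprof.Lookup("goroutine").Count())
}
```

Watching that count over time in a long-running service is a cheap first signal: a number that only ever grows usually points at one of the patterns above.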
9. Summary: The Complete Scheduling Picture
go func() { ... }
↓
runtime.newproc() → allocate G, push to P's LRQ
↓
M's schedule loop picks up G
↓
execute() → gogo() → G runs on M
↓
Preemption? ──► SIGURG → asyncPreempt → schedule()
Blocking? ──► gopark() → _Gwaiting → schedule()
Syscall? ──► entersyscall → P detaches → exitsyscall
Done? ──► goexit() → goexit0() → gfput() → schedule()
| Concept | Role |
|---|---|
| schedule() | Find next runnable G (LRQ → global queue → steal → netpoller) |
| execute() | Set G state to Running, bind M↔G, call gogo |
| gogo() | Assembly: switch stack + JMP to G's PC |
| goexit() | Cleanup after G finishes: reset, recycle, re-schedule |
| gopark() | Voluntarily suspend G: Running → Waiting → schedule() |
| sysmon | Background monitor: preempt long Gs, reclaim stuck Ps |
| Signal preemption | SIGURG → inject asyncPreempt → force schedule() |
Next in this series: Go Compiler & defer: Bootstrap Compiler, pprof+trace Toolchain & defer Internals (Part 6)