James Lee

Go System Calls & Blocking: syscall Wrapping, Async vs Sync & GMP Separation

Every program eventually needs to talk to the kernel. In Go, that conversation is carefully managed so that a single slow system call never stalls the entire scheduler. This article explains how Go wraps syscalls, how it handles the four types of blocking, and how the runtime recovers when an M gets stuck.


1. How Go Wraps System Calls

On UNIX-like systems, every program ultimately reaches the kernel through system calls. Most languages delegate this to glibc or another platform runtime library. On Linux, Go issues the calls itself through the syscall package, backed by hand-written assembly. (Where the OS requires it, as on macOS, Go calls the system library instead.)

Go code
    ↓
syscall package  (Go + assembly)
    ↓
kernel (via int/syscall instruction)

Why self-wrap? Full control. Go needs to intercept the moment a goroutine enters and exits a syscall so the scheduler can react — something you can't do if you're delegating to an opaque C library.
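
To make the layering concrete, here is a minimal sketch that reaches the kernel through the syscall package directly. It assumes Linux, where syscall.SYS_GETPID is defined; rawGetpid is a hypothetical helper name used only for illustration.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rawGetpid asks the kernel for the process ID through the syscall package
// rather than a C library wrapper. Assumption: Linux, where SYS_GETPID exists.
func rawGetpid() (int, error) {
	pid, _, errno := syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)
	if errno != 0 {
		return 0, errno
	}
	return int(pid), nil
}

func main() {
	pid, err := rawGetpid()
	if err != nil {
		fmt.Println("getpid failed:", err)
		return
	}
	// Both values should name the same process.
	fmt.Println("raw syscall pid:", pid, "os.Getpid:", os.Getpid())
}
```

syscall.Syscall passes through the runtime's entry and exit hooks, while syscall.RawSyscall skips them and is meant only for calls that cannot block.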


2. The Core Problem: M Escapes Runtime Control

The moment an M enters a system call, it leaves Go runtime control. If that syscall blocks:

  • The M is stuck
  • The M cannot be preempted
  • Any goroutines waiting on that P are starved

Go's solution is to detect this before it becomes a problem, using two distinct strategies depending on whether the syscall is async or sync.


3. The Four Blocking Scenarios

Go recognizes four distinct ways a goroutine can block, each handled differently:

# | Blocking cause | Mechanism | M impact
1 | Channel / mutex | Scheduler parks G, runs next G from the local run queue (LRQ) | M stays free
2 | Network I/O | netpoller (epoll/kqueue/IOCP) parks G | M stays free
3 | File / OS syscall | P detaches from M, finds another M | M is stuck
4 | time.Sleep / long-running G | Sleep parks G on a timer; sysmon preempts a G that runs too long | M stays free

Important distinction: Go's scheduler handles network I/O-heavy workloads more gracefully than disk I/O-heavy ones. Here's why:

  • Network sockets support readiness polling (the kernel's poll hook for the file): they can be set to non-blocking and monitored via epoll. When data isn't ready, the goroutine parks and the M is freed.
  • Regular files do NOT offer meaningful readiness polling: the OS always reports them as "readable/writable", so read()/write() on a file is synchronous. The M blocks until the disk responds.

Disk I/O = blocked M = reduced throughput. Under heavy disk I/O, the Go runtime compensates by starting more M threads, which can make the M count spike dramatically.
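
The spike can be observed with the threadcreate profile from runtime/pprof. The sketch below is an illustration, not a benchmark: it assumes a Unix-like system and uses blocking reads on empty pipes as a stand-in for slow disk I/O, and the helper names threadsCreated and blockInSyscalls are hypothetical.

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"syscall"
	"time"
)

// threadsCreated reports how many OS threads (Ms) the runtime has created.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

// blockInSyscalls parks n goroutines in a blocking read(2) on empty pipes and
// returns the thread count before and while they are stuck.
func blockInSyscalls(n int) (int, int, error) {
	before := threadsCreated()

	pipes := make([][2]int, n)
	for i := range pipes {
		if err := syscall.Pipe(pipes[i][:]); err != nil {
			for _, p := range pipes[:i] { // clean up the pipes opened so far
				syscall.Close(p[0])
				syscall.Close(p[1])
			}
			return 0, 0, err
		}
	}

	done := make(chan struct{}, n)
	for _, p := range pipes {
		go func(rfd int) {
			buf := make([]byte, 1)
			syscall.Read(rfd, buf) // blocking fd: this goroutine's M is stuck here
			done <- struct{}{}
		}(p[0])
	}

	// Wait (up to 5s) for the runtime to start enough Ms for every reader.
	deadline := time.Now().Add(5 * time.Second)
	for threadsCreated() < n && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
	during := threadsCreated()

	for _, p := range pipes {
		syscall.Write(p[1], []byte{1}) // unblock the reader
	}
	for range pipes {
		<-done
	}
	for _, p := range pipes {
		syscall.Close(p[0])
		syscall.Close(p[1])
	}
	return before, during, nil
}

func main() {
	before, during, err := blockInSyscalls(16)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("threads before:", before, "threads while blocked:", during)
}
```

Each reader pins one M for as long as its read is outstanding, so the thread count while blocked should be at least the number of readers.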


4. Async System Calls (Network I/O)

When a goroutine makes a network system call, Go uses the netpoller to handle it asynchronously. The G separates from M+P:

Step 1: Normal execution
─────────────────────────────────────────────
P ──► M ──► G1 (running)
LRQ: [G2, G3, G4]
netpoller: idle

Step 2: G1 makes a network syscall
─────────────────────────────────────────────
G1 ──► moved to netpoller (waiting for fd)
M ──► picks up G2 from LRQ
P ──► M ──► G2 (running)
LRQ: [G3, G4]
netpoller: monitoring G1's socket fd

Step 3: Network I/O completes
─────────────────────────────────────────────
netpoller: fd ready → G1 marked Runnable
G1 ──► moved back to P's LRQ
P ──► M ──► G2 (still running)
LRQ: [G3, G4, G1]

Step 4: G1 resumes
─────────────────────────────────────────────
G1 scheduled onto M, continues execution  ✅

Key outcome: No extra M is created, and the M is never blocked. The netpoller has no dedicated thread of its own: the scheduler and sysmon poll it (epoll_wait on Linux) when they look for runnable work, which keeps OS scheduling load minimal.

Summary: Async Path

G blocked on network I/O
    ↓
G detaches from M+P → moves to netpoller wait queue
    ↓
M continues running other Gs from LRQ
    ↓
epoll_wait fires → G marked Runnable → re-queued to P's LRQ
    ↓
G resumes on next available M  ✅
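
A rough way to see the async path in action is to park many goroutines on idle sockets and check that the thread count barely moves. This is a sketch under assumptions: loopback TCP is available, and threadsCreated and parkOnSockets are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"time"
)

// threadsCreated reports how many OS threads (Ms) the runtime has created.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

// parkOnSockets opens n loopback connections, parks one goroutine in Read on
// each, and returns the thread count before and while they wait.
func parkOnSockets(n int) (int, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()

	var conns []net.Conn
	defer func() {
		for _, c := range conns {
			c.Close()
		}
	}()

	before := threadsCreated()
	done := make(chan struct{}, n)
	clients := make([]net.Conn, 0, n)
	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, 0, err
		}
		conns = append(conns, client)
		server, err := ln.Accept()
		if err != nil {
			return 0, 0, err
		}
		conns = append(conns, server)
		clients = append(clients, client)
		go func() {
			buf := make([]byte, 1)
			server.Read(buf) // no data yet: the G parks in the netpoller
			done <- struct{}{}
		}()
	}

	time.Sleep(100 * time.Millisecond) // let every reader park
	during := threadsCreated()

	for _, c := range clients {
		c.Write([]byte{1}) // make each socket readable
	}
	for i := 0; i < n; i++ {
		<-done
	}
	return before, during, nil
}

func main() {
	before, during, err := parkOnSockets(50)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("threads before:", before, "threads with 50 parked readers:", during)
}
```

The exact numbers vary by machine, but the thread count should grow by far less than the number of parked readers.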

5. Sync System Calls (File / OS I/O)

When a goroutine makes a blocking system call (e.g. file read, syscall.Read), the G and M are stuck together. Go responds by detaching the P:

Step 1: G1 enters a blocking syscall
─────────────────────────────────────────────
P ──► M1 ──► G1 (entering syscall → M1 will block)
LRQ: [G2, G3, G4]

Step 2: Scheduler detaches P from M1
─────────────────────────────────────────────
M1+G1 ──► stuck in syscall (isolated)
P ──► M2 (new or idle M)
M2 picks up G2 from LRQ
LRQ: [G3, G4]

Step 3: Syscall completes
─────────────────────────────────────────────
M1+G1 return from syscall
G1 ──► tries to reclaim a P
    ├── P available? → bind and continue
    └── No P? → G1 placed in global run queue
                M1 enters idle/standby state

Key outcome: The P does not sit idle while M1 is stuck. Other goroutines keep running on M2.

Summary: Sync Path

G blocked on OS syscall
    ↓
P detaches from M → binds to another M (idle or new)
    ↓
M+G remain in syscall until kernel returns
    ↓
On return: G tries to reclaim P
    ├── success → resume on same M
    └── fail    → G to global queue, M to standby
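
The handoff is easiest to see with a single P. In the sketch below, one goroutine is stuck in a blocking read while a second goroutine still gets scheduled and finishes its work. It assumes a Unix-like system; workWhileBlocked is a hypothetical name, and the 50ms sleep is just a generous guess at how long the reader needs to enter the syscall.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// workWhileBlocked reports whether a second goroutine can complete its work
// while the first one is stuck in a blocking syscall, with only one P.
func workWhileBlocked() (bool, error) {
	prev := runtime.GOMAXPROCS(1) // a single P makes the handoff visible
	defer runtime.GOMAXPROCS(prev)

	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return false, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	readerDone := make(chan struct{})
	go func() {
		defer close(readerDone)
		buf := make([]byte, 1)
		syscall.Read(p[0], buf) // M1+G1 are stuck in the kernel here
	}()
	time.Sleep(50 * time.Millisecond) // let the reader enter the syscall

	result := make(chan int, 1)
	go func() {
		sum := 0
		for i := 1; i <= 1000; i++ {
			sum += i
		}
		result <- sum
	}()

	ok := false
	select {
	case sum := <-result:
		ok = sum == 500500 // the worker ran although the reader is still blocked
	case <-time.After(2 * time.Second):
	}

	syscall.Write(p[1], []byte{1}) // release the reader
	<-readerDone
	return ok, nil
}

func main() {
	ok, err := workWhileBlocked()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("worker finished while reader was blocked:", ok)
}
```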

6. entersyscall and exitsyscall

Go wraps potentially blocking system calls with two runtime hooks (the syscall package's RawSyscall variants skip them for calls that cannot block):

entersyscall

Called before entering the syscall:

  • Sets the P's state to _Psyscall
  • Unbinds P from M (but M retains a pointer to P)
  • Signals to the scheduler: "this M may be about to block"

exitsyscall

Called after the syscall returns:

  • M still holds a pointer to its old P → tries to reclaim it first
  • If the old P was taken by another M → finds any available P
  • If no P is available → G is placed on the global run queue; M enters standby

entersyscall()
    ↓
[ kernel syscall executes ]
    ↓
exitsyscall()
    ├── old P still available? → rebind, continue  ✅
    ├── another P available?   → bind new P, continue  ✅
    └── no P available?        → G → global queue, M → standby
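
The three branches of exitsyscall can be written down as a tiny decision function. This is a toy model of the flow described above, not runtime source: the type, constants, and exitSyscallModel are hypothetical names invented for the example.

```go
package main

import "fmt"

// outcome labels are illustrative; they are not runtime identifiers.
type outcome string

const (
	rebindOldP  outcome = "rebind old P, continue"
	bindIdleP   outcome = "bind another idle P, continue"
	globalQueue outcome = "G to global run queue, M to standby"
)

// exitSyscallModel mirrors the order of checks described in the text:
// old P first, then any idle P, then the global run queue.
func exitSyscallModel(oldPAvailable bool, idlePs int) outcome {
	switch {
	case oldPAvailable:
		return rebindOldP
	case idlePs > 0:
		return bindIdleP
	default:
		return globalQueue
	}
}

func main() {
	fmt.Println(exitSyscallModel(true, 0))
	fmt.Println(exitSyscallModel(false, 2))
	fmt.Println(exitSyscallModel(false, 0))
}
```

Trying the old P first is the cheap path: a short syscall usually returns before anyone has taken the P away, so nothing has to be rebound.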

Why is mcache on P, not M? Exactly because of this separation. When M+G enter a syscall and P detaches, the new M that picks up P also inherits its mcache — ensuring lock-free allocation continues uninterrupted.


7. sysmon: The Runtime Monitor Thread

Go starts a special background thread called sysmon at program startup. It runs as g0 on an M that requires no P — it always runs, regardless of scheduler state.

┌─────────────────────────────────────────────────────┐
│                  sysmon responsibilities            │
│                                                     │
│  Interval: 20µs → 10ms (adaptive)                  │
│                                                     │
│  ✦ Check for deadlocks (runtime.checkdead)          │
│  ✦ Fire due timers                                  │
│  ✦ Flush pending netpoll results to run queues      │
│  ✦ Preempt long-running Gs (retake)                 │
│  ✦ Force GC if no GC for > 2 minutes                │
│  ✦ Return idle spans to OS after > 5 minutes        │
│  ✦ Reclaim P from Ms stuck in syscall > 10ms        │
└─────────────────────────────────────────────────────┘
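
The adaptive interval can be pictured as a simple backoff. The sketch below assumes sysmon resets its sleep to 20µs after a productive cycle and doubles it after more than 50 idle cycles, capped at 10ms; the 50-cycle threshold and the name nextDelay are assumptions of this example, not something the text above states.

```go
package main

import "fmt"

// nextDelay models the backoff in microseconds: reset to 20µs after a
// productive cycle, double once more than 50 cycles were idle, cap at 10ms.
// The threshold of 50 is an assumption made for this sketch.
func nextDelay(delay, idleCycles int) int {
	if idleCycles == 0 {
		delay = 20
	} else if idleCycles > 50 {
		delay *= 2
	}
	if delay > 10_000 {
		delay = 10_000
	}
	return delay
}

func main() {
	delay := 20
	for idle := 45; idle <= 65; idle += 5 {
		delay = nextDelay(delay, idle)
		fmt.Println("idle cycles:", idle, "sleep (µs):", delay)
	}
}
```

The effect is that a busy program gets checked every few tens of microseconds, while an idle one costs almost nothing to monitor.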

sysmon and Syscall Recovery

The most critical sysmon behavior for syscall handling:

sysmon detects: M has been in syscall for > 10ms
    ↓
sysmon calls retake()
    ↓
P is forcibly stripped from M
    ↓
P handed to another M (idle or newly created)
    ↓
Goroutines on P's LRQ continue running  ✅
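
A simplified model of the retake decision is sketched below. The 10ms limit comes from the flow above; the extra conditions (hand off sooner when goroutines are queued on the P, or when no other P is idle or spinning) are assumptions of this sketch about how the runtime avoids needless handoffs, and shouldRetake is a hypothetical name.

```go
package main

import "fmt"

// shouldRetake is a hypothetical, simplified model of the check sysmon makes
// for a P whose M is in a syscall. Durations are in microseconds.
func shouldRetake(inSyscallMicros int, runQueueEmpty, otherPsAvailable bool) bool {
	const hardLimitMicros = 10_000 // the 10ms threshold from the text
	if inSyscallMicros >= hardLimitMicros {
		return true
	}
	// Assumption: below the limit, a handoff is only worthwhile when
	// goroutines are waiting on this P or nobody else is free to run work.
	return !runQueueEmpty || !otherPsAvailable
}

func main() {
	fmt.Println(shouldRetake(20_000, true, true)) // past the limit
	fmt.Println(shouldRetake(1_000, true, true))  // short syscall, nothing waiting
	fmt.Println(shouldRetake(1_000, false, true)) // short syscall, work queued
}
```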

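This is why you'll see the M count spike under heavy disk I/O or slow syscalls: each time sysmon hands off a P and no idle M is available, the runtime starts a new M to keep that P busy. The 10ms figure is an upper bound; when runnable goroutines are waiting on the P, the handoff can happen sooner.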


8. Complete Picture: All Four Blocking Scenarios

┌──────────────────────────────────────────────────────────────────┐
│              Go Blocking Handling — Decision Tree                │
│                                                                  │
│  G blocks on...                                                  │
│       │                                                          │
│       ├── channel / mutex                                        │
│       │       → G parked in wait queue                           │
│       │       → M runs next G from LRQ  (M never blocked)        │
│       │                                                          │
│       ├── network I/O  (socket)                                  │
│       │       → G moved to netpoller                             │
│       │       → M runs next G from LRQ  (M never blocked)        │
│       │       → epoll fires → G re-queued → resumes              │
│       │                                                          │
│       ├── OS syscall  (file, etc.)                               │
│       │       → entersyscall: P detaches from M                  │
│       │       → new M picks up P and continues                   │
│       │       → exitsyscall: G tries to reclaim P                │
│       │       → sysmon reclaims P if M stuck > 10ms              │
│       │                                                          │
│       └── sleep / long-running G                                 │
│               → sysmon sets preemption flag                      │
│               → G yields at next safe point; others run          │
└──────────────────────────────────────────────────────────────────┘

Summary

Mechanism | Purpose
syscall package | Self-wrapped OS calls via assembly; full runtime control
entersyscall | Unbind P from M before syscall; mark P as _Psyscall
exitsyscall | Reclaim P after syscall; fall back to global queue if needed
Async syscall path | G parks in netpoller; M stays free; no new M needed
Sync syscall path | P detaches; new M services P; M+G wait for kernel
sysmon | Background monitor; reclaims stuck Ps, preempts long Gs, forces GC

Go's syscall handling is a masterclass in cooperative/preemptive hybrid scheduling: the runtime does everything it can to keep Ms busy and Ps utilized, falling back to creating new threads only when truly necessary.


Next in this series: Goroutine Scheduling: User Space vs Kernel, syscall Numbers & GMP in Action (Part 5)


Follow the series for more deep dives into Go's runtime internals.
