Go's I/O model is deceptively simple from the outside: you write blocking-style code, and the runtime handles the async machinery underneath. In this article we'll peel back that abstraction — from the evolution of network programming models, to how netpoller bridges epoll and goroutines, to practical patterns for high-performance I/O.
1. The Core Philosophy: Hide Complexity, Expose Simplicity
The history of network programming is a story of trading simplicity for scale:
Era 1: One process per connection
→ Simple, but doesn't scale (process overhead)
↓
Era 2: One thread per connection
→ Better, but thread context-switch cost limits concurrency
↓
Era 3: Non-blocking I/O + multiplexing (epoll / kqueue / IOCP)
→ Scales to millions of connections
→ But: callback-based, control flow is fragmented, hard to reason about
↓
Go: goroutine-per-connection + runtime-managed netpoller
→ Scales like epoll
→ Reads like blocking code ✅
Go's designers recognized that callback-based async I/O (like Node.js or libevent) breaks the natural flow of logic. Their solution: move the complexity into the runtime, and let developers write straightforward blocking-style code.
Key insight: Go developers never touch epoll_create, epoll_ctl, or epoll_wait directly. The runtime handles all of it transparently.
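To make the goroutine-per-connection model concrete, here is a minimal sketch of an echo server that uses only the standard net and bufio packages. The function names (serveEcho, handle, echoOnce) are illustrative, not part of any library, and the loopback address is an assumption made for the demo.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// serveEcho accepts connections and starts one goroutine per connection.
func serveEcho(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go handle(conn)
	}
}

// handle reads lines and echoes them back. Every call looks blocking;
// the runtime parks the goroutine whenever the socket is not ready.
func handle(conn net.Conn) {
	defer conn.Close()
	r := bufio.NewReader(conn)
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return
		}
		if _, err := conn.Write([]byte(line)); err != nil {
			return
		}
	}
}

// echoOnce starts a server on a free loopback port, sends msg, and returns the reply.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serveEcho(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprint(conn, msg); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := echoOnce("ping\n")
	fmt.Printf("%q %v\n", reply, err)
}
```

Notice that nothing in handle mentions readiness, callbacks, or event loops. The control flow is a plain loop, which is the whole point of the model.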
2. The io Package: A Universal Interface
At the heart of Go's I/O system is a minimal, composable interface:
type Reader interface {
Read(p []byte) (n int, err error)
}
type Writer interface {
Write(p []byte) (n int, err error)
}
Many types across Go's standard library implement these interfaces:
| Type | Package | Reader | Writer |
|---|---|---|---|
| Byte slice | bytes | ✅ | ✅ |
| String | strings | ✅ | — |
| Network connection | net (net.Conn) | ✅ | ✅ |
| File handle | os (os.File) | ✅ | ✅ |
Because everything speaks Reader/Writer, utilities like io.Copy, bufio.NewReader, and io.TeeReader work universally across all I/O sources.
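A short sketch of that composability, using only standard library calls (io.Copy, io.TeeReader, io.Discard). The helper names countBytes and roundTrip are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// countBytes works for any io.Reader: a string, a byte buffer, a file, a socket.
func countBytes(r io.Reader) (int64, error) {
	return io.Copy(io.Discard, r)
}

// roundTrip pushes s through io.TeeReader and io.Copy and returns both copies.
func roundTrip(s string) (copied, seen string, err error) {
	var dst, tap bytes.Buffer
	src := io.TeeReader(strings.NewReader(s), &tap) // tap sees every byte that is read
	if _, err = io.Copy(&dst, src); err != nil {
		return "", "", err
	}
	return dst.String(), tap.String(), nil
}

func main() {
	n1, _ := countBytes(strings.NewReader("from a string"))
	n2, _ := countBytes(bytes.NewBufferString("from a buffer"))
	fmt.Println(n1, n2)

	copied, seen, err := roundTrip("hello io")
	fmt.Println(copied, seen, err)
}
```

Swapping strings.NewReader for a net.Conn or an os.File requires no change to countBytes, because the function depends only on the Reader interface.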
3. How netpoller Works: epoll Under the Hood
Go's netpoller is the component that converts OS-level non-blocking I/O into goroutine-friendly blocking I/O.
The epoll Primitives
epoll_create() → creates an epfd (event poll file descriptor)
epoll_ctl() → registers a fd (e.g. socket) with the epfd
epoll_wait() → blocks until one or more registered fds are ready
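For illustration only, the three primitives can be exercised from Go through the syscall package. This sketch is Linux-only, uses a pipe instead of a socket to stay self-contained, and is not how the runtime itself is written; waitReadable and pipeDemo are hypothetical helper names.

```go
package main

import (
	"fmt"
	"syscall"
)

// waitReadable registers fd with a fresh epoll instance and waits up to
// timeoutMs for it to become readable. It returns the number of ready fds.
func waitReadable(fd int, timeoutMs int) (int, error) {
	epfd, err := syscall.EpollCreate1(0) // epoll_create
	if err != nil {
		return 0, err
	}
	defer syscall.Close(epfd)

	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil { // epoll_ctl
		return 0, err
	}

	events := make([]syscall.EpollEvent, 8)
	for {
		n, err := syscall.EpollWait(epfd, events, timeoutMs) // epoll_wait
		if err == syscall.EINTR {
			continue // interrupted by a signal, try again
		}
		return n, err
	}
}

// pipeDemo reports readiness of a pipe's read end before and after a write.
func pipeDemo() (before, after int, err error) {
	var p [2]int
	if err = syscall.Pipe(p[:]); err != nil {
		return
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	if before, err = waitReadable(p[0], 10); err != nil {
		return
	}
	if _, err = syscall.Write(p[1], []byte("x")); err != nil {
		return
	}
	after, err = waitReadable(p[0], 1000)
	return
}

func main() {
	fmt.Println(pipeDemo())
}
```

The first wait times out with zero ready descriptors; after the write, the same descriptor is reported ready. Application code should not need any of this, since net.Conn does the equivalent registration for you.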
netpoller's Virtual Interface
To abstract across Linux (epoll), Windows (IOCP), and macOS/BSD (kqueue), netpoller defines a platform-agnostic interface:
func netpollinit() // initialize the poller
func netpollopen(fd uintptr, pd *pollDesc) int32 // register a fd
func netpoll(delta int64) gList // poll for ready events
func netpollBreak() // wake up the poller
func netpollIsPollDescriptor(fd uintptr) bool // check if fd is managed
Each platform provides its own implementation. On Linux, netpollopen calls epoll_ctl; netpoll calls epoll_wait.
Goroutine Lifecycle During I/O
When a goroutine performs a read/write on a non-ready file descriptor:
Goroutine calls Read() on net.Conn
↓
fd not ready → poll_runtime_pollWait()
↓
netpollblock() → gopark()
↓
Goroutine suspended (Waiting state)
M is released back to run other goroutines
↓
epoll_wait detects fd is ready
↓
netpoll() returns list of ready goroutines
↓
Goroutines moved back to P's LRQ (Runnable)
↓
Goroutine resumes execution on M ✅
4. GMP Interaction During Network I/O
This is what makes Go's model elegant — the scheduler and netpoller work together seamlessly.
Step 1: Normal execution
P ──► M ──► G1 (running)
LRQ: [G2, G3, G4]
netpoller: idle
Step 2: G1 makes a network syscall
G1 ──► moved to netpoller (waiting for I/O)
M ──► picks up G2 from LRQ
P ──► M ──► G2 (running)
LRQ: [G3, G4]
netpoller: watching G1's fd
G1 is parked. The M is not blocked — it continues running other goroutines.
Step 3: I/O completes
netpoller detects G1's fd is ready
G1 ──► moved back to P's LRQ
P ──► M ──► G2 (still running)
LRQ: [G3, G4, G1]
Step 4: G1 resumes
G1 scheduled onto M, resumes from where it left off ✅
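No extra M is needed per connection for network I/O. The netpoller does not own a dedicated thread with an event loop: the scheduler polls it when an M is looking for runnable work, and the sysmon thread polls it periodically as a backstop. This is why Go can handle hundreds of thousands of concurrent connections without spawning an OS thread for each one.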
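One rough way to observe this from user code is to park many goroutines in Read and check how many OS threads the process has created. The sketch below assumes loopback TCP is available; parkedReaders is a hypothetical helper, and pinning GOMAXPROCS to 4 is only there to make the number comparable across machines.

```go
package main

import (
	"fmt"
	"net"
	"runtime"
	"runtime/pprof"
	"time"
)

// parkedReaders opens n loopback connections, parks one goroutine per
// connection in Read, and reports how many OS threads have been created.
func parkedReaders(n int) (int, error) {
	// Set GOMAXPROCS to 4 now and restore the previous value on return.
	defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(4))

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	var conns []net.Conn
	closeAll := func() {
		for _, c := range conns {
			c.Close()
		}
	}
	done := make(chan struct{}, n)
	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			closeAll()
			return 0, err
		}
		conns = append(conns, client)
		server, err := ln.Accept()
		if err != nil {
			closeAll()
			return 0, err
		}
		conns = append(conns, server)
		go func() {
			buf := make([]byte, 1)
			client.Read(buf) // parks here: the peer never writes
			done <- struct{}{}
		}()
	}

	time.Sleep(100 * time.Millisecond) // give every reader time to park
	threads := pprof.Lookup("threadcreate").Count()

	closeAll() // closing the connections wakes every parked reader
	for i := 0; i < n; i++ {
		<-done
	}
	return threads, nil
}

func main() {
	threads, err := parkedReaders(100)
	fmt.Println("goroutines parked: 100, OS threads created:", threads, err)
}
```

With 100 readers blocked at once, the thread count should stay far below 100, because a parked goroutine occupies no M.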
5. Buffered Network I/O
Reading from net.Conn a few bytes at a time is expensive: each Read may trigger a syscall. The solution is buffering:
┌─────────────────────────────────────────────────────────┐
│ Buffered Network I/O Pattern │
│ │
│ netpoller goroutine │
│ reads from socket → fills RingBuffer │
│ ↓ │
│ business goroutine │
│ reads from RingBuffer → decodes → processes │
│ │
│ Producer: socket ──write──► [ RingBuffer ] │
│ Consumer: ◄──read── [ RingBuffer ] │
└─────────────────────────────────────────────────────────┘
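net.Conn itself does not buffer. With a buffered reader in front of it (for example bufio.Reader, or a framework's RingBuffer), each read from the socket attempts to fill the internal buffer, not just the caller's requested size. Subsequent reads are served from memory, which reduces syscall frequency significantly.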
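The effect is easy to measure with a stand-in for the socket. In this sketch, countingReader is a hypothetical wrapper that counts calls to the underlying Read (each of which could be a syscall on a real net.Conn), and bufio.Reader plays the role of the buffer.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingReader counts how often the underlying Read is invoked.
// It stands in for a net.Conn, where each Read can mean a syscall.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// readCalls consumes size bytes one byte at a time, first directly and then
// through a bufio.Reader, and returns the underlying Read count for each.
func readCalls(size int) (direct, buffered int) {
	data := make([]byte, size)

	d := &countingReader{r: bytes.NewReader(data)}
	one := make([]byte, 1)
	for {
		if _, err := d.Read(one); err != nil {
			break
		}
	}

	b := &countingReader{r: bytes.NewReader(data)}
	br := bufio.NewReader(b) // default buffer size is 4096 bytes
	for {
		if _, err := br.ReadByte(); err != nil {
			break
		}
	}
	return d.calls, b.calls
}

func main() {
	direct, buffered := readCalls(8192)
	fmt.Println("direct:", direct, "buffered:", buffered)
}
```

The caller does the same byte-at-a-time work in both loops, but the buffered version touches the underlying reader only a handful of times instead of thousands.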
Lock-Free RingBuffer Design
A naive RingBuffer has a problem: when it is full it must grow, and growing means allocating a larger array and copying the contents while the read and write pointers may still be in use by other goroutines, which invites data races.
The solution used in high-performance Go servers:
| Challenge | Solution |
|---|---|
| Resize without copy | Use a linked list of fixed-size buffers instead of a single array |
| Node allocation overhead | Reuse nodes via sync.Pool |
| Read/write pointer race | Maintain a length field updated with atomic operations |
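The three rows of the table can be sketched as follows. This is a simplified illustration, not the implementation of any particular server: LinkBuffer, node, and nodeSize are invented names, the node size is tiny on purpose, and the list itself is guarded by a mutex rather than being lock-free. Only the length counter is read without a lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const nodeSize = 8 // tiny on purpose so the example spans several nodes

// node is one fixed-size segment of the buffer.
type node struct {
	buf  [nodeSize]byte
	r, w int // read and write offsets within buf
	next *node
}

var nodePool = sync.Pool{New: func() interface{} { return new(node) }}

func newNode() *node {
	n := nodePool.Get().(*node)
	n.r, n.w, n.next = 0, 0, nil // reset state left by a previous user
	return n
}

// LinkBuffer grows by appending nodes, so existing data is never copied.
type LinkBuffer struct {
	length     int64 // kept first for 64-bit alignment; accessed atomically
	mu         sync.Mutex
	head, tail *node
}

func NewLinkBuffer() *LinkBuffer {
	n := newNode()
	return &LinkBuffer{head: n, tail: n}
}

// Len can be called from any goroutine without taking the mutex.
func (b *LinkBuffer) Len() int { return int(atomic.LoadInt64(&b.length)) }

// Write appends p, linking in a new node whenever the tail is full.
func (b *LinkBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	total := len(p)
	for len(p) > 0 {
		if b.tail.w == nodeSize {
			n := newNode()
			b.tail.next = n
			b.tail = n
		}
		c := copy(b.tail.buf[b.tail.w:], p)
		b.tail.w += c
		p = p[c:]
	}
	atomic.AddInt64(&b.length, int64(total))
	return total, nil
}

// Read copies up to len(p) bytes out and recycles fully drained nodes.
// It returns 0 when the buffer is empty instead of blocking.
func (b *LinkBuffer) Read(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	read := 0
	for read < len(p) {
		if b.head.r == b.head.w {
			if b.head.next == nil {
				break // nothing more to read
			}
			old := b.head
			b.head = old.next
			nodePool.Put(old) // drained node goes back to the pool
			continue
		}
		c := copy(p[read:], b.head.buf[b.head.r:b.head.w])
		b.head.r += c
		read += c
	}
	atomic.AddInt64(&b.length, -int64(read))
	return read, nil
}

func main() {
	b := NewLinkBuffer()
	b.Write([]byte("hello, netpoller world"))
	p := make([]byte, 5)
	n, _ := b.Read(p)
	fmt.Println(string(p[:n]), b.Len())
}
```

A production version would replace the mutex with a single-producer, single-consumer protocol on the head and tail pointers, but the growth-without-copy and node reuse ideas are the same.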
6. Object Reuse with sync.Pool
High-throughput servers allocate and discard the same types of objects millions of times per second. Each allocation puts pressure on the GC. The fix: pool and reuse.
Without pooling (allocates on every request):
func handleSubmit() {
s := &Submit{} // heap allocation every time
// ... process s
}
With sync.Pool:
var submitPool = sync.Pool{
	New: func() interface{} {
		return &Submit{}
	},
}
func handleSubmit() {
	s := submitPool.Get().(*Submit) // reuse from pool
	*s = Submit{}                   // reset state left by the previous user
	defer submitPool.Put(s)         // return to pool when done
	// ... process s
}
How to find what to pool: Use go tool pprof to identify the functions and lines with the highest heap allocation rates — those are your pooling candidates.
Caveat: Always reset pooled objects before reuse. sync.Pool objects may be dropped by the GC at any collection, so never rely on them for persistent state.
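A minimal sketch of the reset-before-reuse rule, using a pooled bytes.Buffer. The names bufPool and greet are hypothetical and chosen just for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet builds a string in a pooled buffer.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // clear whatever the previous user left behind
	defer bufPool.Put(buf) // hand the buffer back when done
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(greet("gopher"))
	fmt.Println(greet("netpoller"))
}
```

Without the Reset call, the second greeting could start with leftovers from the first whenever the same buffer is handed out again.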
7. Summary
| Concept | What It Does |
|---|---|
| goroutine-per-connection | Write blocking-style code; runtime handles async scheduling |
| netpoller | Wraps epoll/kqueue/IOCP; parks goroutines waiting for I/O without blocking M |
| GMP + netpoller | M is never blocked by network I/O; goroutines are re-queued when fd is ready |
| Buffered I/O | Reduces syscall frequency by reading ahead into memory buffers |
| Lock-free RingBuffer | Linked list + sync.Pool + atomic length for high-throughput I/O queues |
| sync.Pool | Reuses short-lived objects to reduce GC pressure in hot paths |
Developer writes: Runtime does:
───────────────── ──────────────────────────────────────
conn.Read(buf) → gopark() → epoll_wait → goready()
conn.Write(buf) → gopark() → epoll_wait → goready()
// looks blocking // actually async, zero extra threads
Go's I/O model is one of its greatest engineering achievements: the full power of epoll-based multiplexing, with the simplicity of sequential code.
Next in this series: Go System Calls & Blocking: syscall Wrapping, Async vs Sync & GMP Separation (Part 4)
Follow the series for more deep dives into Go's runtime internals.