James Lee

Posted on May 19

Go I/O Optimization: goroutine-per-connection, netpoller & the Reader/Writer Interface

#computerscience #go #networking #performance

Go's I/O model is deceptively simple from the outside: you write blocking-style code, and the runtime handles the async machinery underneath. In this article we'll peel back that abstraction — from the evolution of network programming models, to how netpoller bridges epoll and goroutines, to practical patterns for high-performance I/O.

1. The Core Philosophy: Hide Complexity, Expose Simplicity

The history of network programming is a story of trading simplicity for scale:

Era 1: One process per connection
    → Simple, but doesn't scale (process overhead)
        ↓
Era 2: One thread per connection
    → Better, but thread context-switch cost limits concurrency
        ↓
Era 3: Non-blocking I/O + multiplexing (epoll / kqueue / IOCP)
    → Scales to millions of connections
    → But: callback-based, control flow is fragmented, hard to reason about
        ↓
Go: goroutine-per-connection + runtime-managed netpoller
    → Scales like epoll
    → Reads like blocking code  ✅

Go's designers recognized that callback-based async I/O (like Node.js or libevent) breaks the natural flow of logic. Their solution: move the complexity into the runtime, and let developers write straightforward blocking-style code.

Key insight: Go developers never need to call epoll_create, epoll_ctl, or epoll_wait directly. The runtime handles all of it transparently.
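This is a minimal sketch of goroutine-per-connection: an echo server in which every accepted connection gets its own goroutine and plain, blocking-style Read/Write calls. The names `serve`, `handle`, and `echoOnce` are illustrative. The loopback address and the 4 KiB buffer are arbitrary choices, not values from the article.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// serve runs the accept loop: one new goroutine per accepted connection.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go handle(conn)
	}
}

// handle echoes bytes back to the client. Read and Write look blocking,
// but when the socket isn't ready only this goroutine is parked.
func handle(conn net.Conn) {
	defer conn.Close()
	buf := make([]byte, 4096)
	for {
		n, err := conn.Read(buf)
		if n > 0 {
			if _, werr := conn.Write(buf[:n]); werr != nil {
				return
			}
		}
		if err != nil {
			return // io.EOF once the client closes its side
		}
	}
}

// echoOnce starts a server on a random loopback port, sends one line,
// and returns the echoed line without its trailing newline.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serve(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()

	if _, err := fmt.Fprintf(conn, "%s\n", msg); err != nil {
		return "", err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	return strings.TrimSuffix(line, "\n"), err
}

func main() {
	reply, err := echoOnce("ping")
	fmt.Println(reply, err)
}
```

Nothing in `handle` mentions epoll or callbacks. The code reads top to bottom, and the runtime decides when the goroutine runs.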

2. The `io` Package: A Universal Interface

At the heart of Go's I/O system is a minimal, composable interface:

type Reader interface {
    Read(p []byte) (n int, err error)
}

type Writer interface {
    Write(p []byte) (n int, err error)
}

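Many core standard-library types implement one or both of these interfaces:

Type	Package	Reader	Writer
In-memory buffer (`bytes.Buffer`)	`bytes`	✅	✅
String reader (`strings.Reader`)	`strings`	✅	—
String builder (`strings.Builder`)	`strings`	—	✅
Network connection (`net.Conn`)	`net`	✅	✅
File handle (`*os.File`)	`os`	✅	✅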

Because everything speaks Reader/Writer, utilities like io.Copy, bufio.NewReader, and io.TeeReader work universally across all I/O sources.
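The sketch below shows that composability in a few lines. `io.TeeReader` mirrors every byte it reads into a second writer, and `io.Copy` moves the data without knowing any concrete types. The function name `copyWithAudit` is hypothetical. Replacing `strings.NewReader` with a `net.Conn` or `*os.File` would not change the rest of the code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// copyWithAudit copies src into a destination while TeeReader mirrors every
// byte into an audit buffer. io.Copy only sees an io.Reader and an io.Writer.
func copyWithAudit(src string) (dst, audit string, n int64, err error) {
	var out strings.Builder   // implements io.Writer
	var auditBuf bytes.Buffer // implements io.Writer (and io.Reader)
	r := io.TeeReader(strings.NewReader(src), &auditBuf)
	n, err = io.Copy(&out, r)
	return out.String(), auditBuf.String(), n, err
}

func main() {
	dst, audit, n, err := copyWithAudit("hello, io")
	fmt.Println(dst, audit, n, err) // hello, io hello, io 9 <nil>
}
```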

3. How netpoller Works: epoll Under the Hood

Go's netpoller is the component that converts OS-level non-blocking I/O into goroutine-friendly blocking I/O.

The epoll Primitives

epoll_create()  → creates an epoll instance and returns its file descriptor (epfd)
epoll_ctl()     → adds, modifies, or removes a watched fd (e.g. a socket) on the epfd
epoll_wait()    → blocks until one or more watched fds are ready (or a timeout expires)
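To make the three primitives concrete, here is a Linux-only sketch that calls them by hand through Go's `syscall` package. The package is frozen, and `golang.org/x/sys/unix` is the maintained alternative. The sketch watches the read end of a pipe rather than a socket, and it uses default level-triggered mode for simplicity. The Go runtime itself registers sockets edge-triggered (`EPOLLET`). `epollDemo` is a hypothetical name.

```go
//go:build linux

package main

import (
	"fmt"
	"syscall"
)

// epollDemo creates an epoll instance, registers the read end of a pipe,
// makes it readable, then waits for the readiness event.
func epollDemo() (ready int, sameFd bool, err error) {
	epfd, err := syscall.EpollCreate1(syscall.EPOLL_CLOEXEC) // epoll_create
	if err != nil {
		return 0, false, err
	}
	defer syscall.Close(epfd)

	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return 0, false, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	// epoll_ctl(ADD): ask to be notified when the read end becomes readable.
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(p[0])}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		return 0, false, err
	}

	// Make the fd ready, then epoll_wait (timeout in milliseconds).
	if _, err := syscall.Write(p[1], []byte("x")); err != nil {
		return 0, false, err
	}
	events := make([]syscall.EpollEvent, 8)
	n, err := syscall.EpollWait(epfd, events, 1000)
	if err != nil {
		return 0, false, err
	}
	return n, n > 0 && events[0].Fd == int32(p[0]), nil
}

func main() {
	n, ok, err := epollDemo()
	fmt.Println(n, ok, err) // 1 true <nil>
}
```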

netpoller's Virtual Interface

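To abstract over Linux (epoll), Windows (IOCP), and macOS/BSD (kqueue), the runtime defines a set of functions that every platform's netpoller must implement:

func netpollinit()                              // initialize the poller (epoll_create1 on Linux)
func netpollopen(fd uintptr, pd *pollDesc) int32 // register a fd for notifications
func netpollclose(fd uintptr) int32             // unregister a fd
func netpoll(delta int64) gList                 // poll for ready goroutines (newer Go versions also return an int32 delta)
func netpollBreak()                             // wake up a poller blocked in netpoll
func netpollIsPollDescriptor(fd uintptr) bool   // report whether fd is the poller's own descriptor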

Each platform provides its own implementation. On Linux, netpollopen calls epoll_ctl; netpoll calls epoll_wait.

Goroutine Lifecycle During I/O

When a goroutine performs a read/write on a non-ready file descriptor:

Goroutine calls Read() on net.Conn
      ↓
read() returns EAGAIN (fd not ready) → poll_runtime_pollWait()
      ↓
netpollblock() → gopark()
      ↓
Goroutine suspended (Waiting state)
M is free to run other goroutines
      ↓
epoll_wait detects fd is ready
      ↓
netpoll() returns list of ready goroutines
      ↓
Goroutines marked Runnable and queued (a P's LRQ or the global run queue)
      ↓
Goroutine resumes on an M and retries the read  ✅
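The public `syscall.RawConn` API exposes the same "try, park, retry" loop. Your callback performs the raw `read(2)`. If it returns `false` because the non-blocking fd reported `EAGAIN`, `RawConn.Read` waits until the connection is readable and calls the callback again. The sketch below is Linux-only. `rawRead` and `demo` are hypothetical helpers, and the 50 ms delay exists only to make it likely that the reader has to wait.

```go
//go:build linux

package main

import (
	"fmt"
	"net"
	"syscall"
	"time"
)

// rawRead issues read(2) itself and lets the runtime do the waiting:
// returning false tells RawConn.Read the fd wasn't ready, so it waits for
// readability and then invokes the callback again.
func rawRead(c *net.TCPConn, buf []byte) (int, error) {
	rc, err := c.SyscallConn()
	if err != nil {
		return 0, err
	}
	var n int
	var opErr error
	err = rc.Read(func(fd uintptr) bool {
		n, opErr = syscall.Read(int(fd), buf)
		return opErr != syscall.EAGAIN // EAGAIN: not ready, park and retry
	})
	if err != nil {
		return 0, err
	}
	if opErr != nil {
		return 0, opErr
	}
	return n, nil
}

// demo makes the reader wait: the client only writes after a short delay.
func demo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return
		}
		defer c.Close()
		time.Sleep(50 * time.Millisecond) // reader most likely hits EAGAIN first
		c.Write([]byte("hi"))
	}()

	conn, err := ln.Accept()
	if err != nil {
		return "", err
	}
	defer conn.Close()
	buf := make([]byte, 16)
	n, err := rawRead(conn.(*net.TCPConn), buf)
	return string(buf[:n]), err
}

func main() {
	s, err := demo()
	fmt.Println(s, err) // hi <nil>
}
```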

4. GMP Interaction During Network I/O

This is what makes Go's model elegant — the scheduler and netpoller work together seamlessly.

Step 1: Normal execution

P ──► M ──► G1 (running)
LRQ: [G2, G3, G4]
netpoller: idle

Step 2: G1 makes a network syscall

G1 ──► moved to netpoller (waiting for I/O)
M ──► picks up G2 from LRQ
P ──► M ──► G2 (running)
LRQ: [G3, G4]
netpoller: watching G1's fd

G1 is parked. The M is not blocked — it continues running other goroutines.

Step 3: I/O completes

netpoller detects G1's fd is ready
G1 ──► marked runnable, queued on P's LRQ (or the global run queue)
P ──► M ──► G2 (still running)
LRQ: [G3, G4, G1]

Step 4: G1 resumes

G1 scheduled onto M, resumes from where it left off  ✅

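No extra M is needed for network I/O, and there is no dedicated netpoller thread either. The scheduler calls netpoll() when an M runs out of work (in findrunnable), and sysmon polls it periodically as a fallback. This is why Go can handle hundreds of thousands of concurrent connections without one OS thread per connection.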
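You can observe this yourself with the sketch below. It opens a few hundred loopback connections, starts one goroutine per connection blocked in `Read`, and then asks the runtime two questions. How many OS threads has it created (`pprof.Lookup("threadcreate").Count()`)? Does the goroutine dump show readers in the `IO wait` state? The connection count of 300 and the 200 ms settle time are arbitrary. The thread count depends on `GOMAXPROCS` and the machine, so no specific number is claimed. `parkedReaders` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net"
	"runtime"
	"runtime/pprof"
	"strings"
	"time"
)

// parkedReaders opens n loopback connections with one goroutine per
// connection blocked in Read, then reports OS threads created so far and
// whether any goroutine is parked in the "IO wait" state.
func parkedReaders(n int) (threads int, ioWait bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, false, err
	}
	defer ln.Close()

	var conns []net.Conn
	defer func() {
		for _, c := range conns {
			c.Close() // closing unblocks the parked readers
		}
	}()

	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, false, err
		}
		server, err := ln.Accept()
		if err != nil {
			client.Close()
			return 0, false, err
		}
		conns = append(conns, client, server)
		go func(c net.Conn) {
			c.Read(make([]byte, 1)) // nothing is ever sent, so this parks
		}(server)
	}
	time.Sleep(200 * time.Millisecond) // give every reader time to park

	buf := make([]byte, 1<<20)
	dump := string(buf[:runtime.Stack(buf, true)])
	return pprof.Lookup("threadcreate").Count(), strings.Contains(dump, "[IO wait"), nil
}

func main() {
	threads, ioWait, err := parkedReaders(300)
	fmt.Printf("300 parked readers, %d OS threads created, IO wait seen: %v, err: %v\n",
		threads, ioWait, err)
}
```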

5. Buffered Network I/O

Reading from net.Conn a few bytes at a time is expensive, because each Read on the connection typically costs a read syscall. The solution is buffering. One common design is a producer/consumer buffer:

┌─────────────────────────────────────────────────────────┐
│              Buffered Network I/O Pattern               │
│                                                         │
│  netpoller goroutine                                    │
│      reads from socket → fills RingBuffer               │
│                              ↓                          │
│  business goroutine                                     │
│      reads from RingBuffer → decodes → processes        │
│                                                         │
│  Producer:  socket  ──write──► [ RingBuffer ]           │
│  Consumer:           ◄──read──  [ RingBuffer ]          │
└─────────────────────────────────────────────────────────┘

With a buffered reader such as bufio.Reader, each underlying Read on net.Conn asks for as much data as the internal buffer can hold, not just the caller's requested size. Subsequent small reads are served from memory, which significantly reduces the number of syscalls.

Lock-Free RingBuffer Design

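A naive RingBuffer backed by a single array has a problem. When it fills up, growing it means allocating a larger array and copying the data across, and the reader and writer must be coordinated during that copy (typically with a lock) or they race on the read and write positions.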

The solution used in high-performance Go servers:

Challenge	Solution
Resize without copy	Use a linked list of fixed-size buffers instead of a single array
Node allocation overhead	Reuse nodes via `sync.Pool`
Read/write pointer race	Maintain a `length` field updated with `atomic` operations
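Below is a sketch of the three ideas in the table, assuming exactly one producer goroutine and one consumer goroutine. Writes grow the buffer by linking fixed-size nodes instead of copying. Drained nodes are recycled through `sync.Pool`. A single atomic `length` publishes bytes from producer to consumer, so neither side takes a lock. All names (`LinkedBuffer`, `node`, `nodeSize`, `transfer`) are hypothetical and not taken from any specific framework. When the buffer is empty, the consumer simply yields with `runtime.Gosched`, whereas a real server would park the goroutine. The sketch uses the built-in `min` and `atomic.Int64`, so it needs Go 1.21 or newer.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

const nodeSize = 4 // tiny on purpose so the demo crosses node boundaries

// node is one fixed-size chunk; the buffer grows by linking more of them.
type node struct {
	buf  [nodeSize]byte
	r, w int   // r: consumer only, w: producer only
	next *node // set by the producer before the bytes in it are published
}

var nodePool = sync.Pool{New: func() any { return new(node) }}

func getNode() *node {
	n := nodePool.Get().(*node)
	n.r, n.w, n.next = 0, 0, nil // recycled nodes must be reset
	return n
}

// LinkedBuffer is a single-producer, single-consumer byte queue.
type LinkedBuffer struct {
	head   *node        // consumer side
	tail   *node        // producer side
	length atomic.Int64 // bytes written but not yet read
}

func NewLinkedBuffer() *LinkedBuffer {
	n := getNode()
	return &LinkedBuffer{head: n, tail: n}
}

// Write appends p (producer goroutine only). A full tail node gets a new
// node linked after it, so existing data is never copied.
func (b *LinkedBuffer) Write(p []byte) {
	total := len(p)
	for len(p) > 0 {
		if b.tail.w == nodeSize {
			n := getNode()
			b.tail.next = n
			b.tail = n
		}
		c := copy(b.tail.buf[b.tail.w:], p)
		b.tail.w += c
		p = p[c:]
	}
	b.length.Add(int64(total)) // publish the bytes (and any new next pointers)
}

// Read copies up to len(p) published bytes (consumer goroutine only).
func (b *LinkedBuffer) Read(p []byte) int {
	avail := int(b.length.Load())
	read := 0
	for read < len(p) && read < avail {
		if b.head.r == nodeSize { // drained node: advance, recycle via the pool
			old := b.head
			b.head = old.next
			nodePool.Put(old)
		}
		k := min(nodeSize-b.head.r, avail-read, len(p)-read)
		copy(p[read:read+k], b.head.buf[b.head.r:b.head.r+k])
		b.head.r += k
		read += k
	}
	b.length.Add(-int64(read))
	return read
}

// transfer streams msg through the buffer from a producer goroutine.
func transfer(msg string, chunk int) string {
	b, data := NewLinkedBuffer(), []byte(msg)
	go func() {
		for i := 0; i < len(data); i += chunk {
			b.Write(data[i:min(i+chunk, len(data))])
		}
	}()
	out, tmp := make([]byte, 0, len(data)), make([]byte, 3)
	for len(out) < len(data) {
		if n := b.Read(tmp); n > 0 {
			out = append(out, tmp[:n]...)
		} else {
			runtime.Gosched() // nothing published yet; a real server would park here
		}
	}
	return string(out)
}

func main() {
	fmt.Println(transfer("hello, linked ring buffer!", 5))
}
```

The consumer never reads the producer's `w` field. It relies only on `length` to know which bytes are safe to read. The atomic `Add` and `Load` pair also makes the producer's earlier writes (node bytes and `next` pointers) visible to the consumer.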

6. Object Reuse with `sync.Pool`

High-throughput servers allocate and discard the same types of objects millions of times per second. Each allocation puts pressure on the GC. The fix: pool and reuse.

Without pooling (allocates on every request):

func handleSubmit() {
    s := &Submit{}   // heap-allocated on every call if s escapes
    // ... process s
}

With `sync.Pool`:

var submitPool = sync.Pool{
    New: func() interface{} {
        return &Submit{}
    },
}

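func handleSubmit() {
    s := submitPool.Get().(*Submit)  // reuse from pool (or New if the pool is empty)
    *s = Submit{}                    // reset: clear state left by a previous user
    defer submitPool.Put(s)          // return to pool when done; don't use s afterwards
    // ... process s
}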

How to find what to pool: Use go tool pprof to identify the functions and lines with the highest heap allocation rates — those are your pooling candidates.

Caveat: Always reset pooled objects before reuse. sync.Pool may drop pooled objects at any time (in practice, during garbage collection), so never rely on it to hold persistent state.
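The reset rule matters most for buffers. The self-contained sketch below uses `*bytes.Buffer` as a stand-in for a type like `Submit`. Without the `Reset()` call, a recycled buffer would still contain the previous caller's output. `bufPool` and `greet` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// greet formats a reply in a pooled buffer.
func greet(name string) string {
	b := bufPool.Get().(*bytes.Buffer)
	b.Reset() // a recycled buffer may still hold a previous reply
	defer bufPool.Put(b)
	b.WriteString("hello, ")
	b.WriteString(name)
	return b.String() // String copies the bytes, so the result outlives Put
}

func main() {
	fmt.Println(greet("alice")) // hello, alice
	fmt.Println(greet("bob"))   // hello, bob
}
```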

7. Summary

Concept	What It Does
goroutine-per-connection	Write blocking-style code; runtime handles async scheduling
netpoller	Wraps epoll/kqueue/IOCP; parks goroutines waiting for I/O without blocking M
GMP + netpoller	M is never blocked by network I/O; goroutines are re-queued when fd is ready
Buffered I/O	Reduces syscall frequency by reading ahead into memory buffers
Lock-free RingBuffer	Linked list + sync.Pool + atomic length for high-throughput I/O queues
sync.Pool	Reuses short-lived objects to reduce GC pressure in hot paths

Developer writes:          Runtime does (when the fd isn't ready yet):
─────────────────          ──────────────────────────────────────
conn.Read(buf)     →       gopark() → epoll_wait → goready()
conn.Write(buf)    →       gopark() → epoll_wait → goready()
// looks blocking          // non-blocking I/O underneath, no thread per connection

Go's I/O model is one of its greatest engineering achievements: the full power of epoll-based multiplexing, with the simplicity of sequential code.

Next in this series: Go System Calls & Blocking: syscall Wrapping, Async vs Sync & GMP Separation (Part 4)

Follow the series for more deep dives into Go's runtime internals.

DEV Community
