
James Lee

Go I/O Optimization: goroutine-per-connection, netpoller & the Reader/Writer Interface

Go's I/O model is deceptively simple from the outside: you write blocking-style code, and the runtime handles the async machinery underneath. In this article we'll peel back that abstraction — from the evolution of network programming models, to how netpoller bridges epoll and goroutines, to practical patterns for high-performance I/O.


1. The Core Philosophy: Hide Complexity, Expose Simplicity

The history of network programming is a story of trading simplicity for scale:

Era 1: One process per connection
    → Simple, but doesn't scale (process overhead)
        ↓
Era 2: One thread per connection
    → Better, but thread context-switch cost limits concurrency
        ↓
Era 3: Non-blocking I/O + multiplexing (epoll / kqueue / IOCP)
    → Scales to millions of connections
    → But: callback-based, control flow is fragmented, hard to reason about
        ↓
Go: goroutine-per-connection + runtime-managed netpoller
    → Scales like epoll
    → Reads like blocking code  ✅

Go's designers recognized that callback-based async I/O (like Node.js or libevent) breaks the natural flow of logic. Their solution: move the complexity into the runtime, and let developers write straightforward blocking-style code.

Key insight: Go developers never touch epoll_create, epoll_ctl, or epoll_wait directly. The runtime handles all of it transparently.
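
As a minimal sketch of the goroutine-per-connection style, here is an echo server built only from blocking-looking calls. The names serveEcho and roundTrip are made up for this example, and the loopback listener exists only so the program can talk to itself:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
)

// serveEcho accepts connections until ln is closed and starts one goroutine
// per connection. No epoll call appears anywhere in user code.
func serveEcho(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go func() {
			defer conn.Close()
			io.Copy(conn, conn) // reads and writes look blocking
		}()
	}
}

// roundTrip starts a throwaway echo server on a loopback port, sends msg
// followed by a newline, and returns the echoed line.
func roundTrip(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serveEcho(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()

	if _, err := fmt.Fprintf(conn, "%s\n", msg); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := roundTrip("ping")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("echoed: %q\n", reply)
}
```

Every Accept, Read, and Write here can park its goroutine, yet the control flow stays linear.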


2. The io Package: A Universal Interface

At the heart of Go's I/O system is a minimal, composable interface:

type Reader interface {
    Read(p []byte) (n int, err error)
}

type Writer interface {
    Write(p []byte) (n int, err error)
}

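Many standard-library types implement one or both of these interfaces:

Type                Package                                    Reader  Writer
──────────────────  ─────────────────────────────────────────  ──────  ──────
Byte slice          bytes (bytes.Buffer)                       yes     yes
String              strings (strings.Reader, strings.Builder)  yes     yes
Network connection  net (net.Conn)                             yes     yes
File handle         os (os.File)                               yes     yes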

Because everything speaks Reader/Writer, utilities like io.Copy, bufio.NewReader, and io.TeeReader work universally across all I/O sources.
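
A small sketch of that composability: the helper below (copyAndHash and hashString are names invented for this example) copies any Reader into any Writer and hashes the bytes as they pass through, using only io.Copy and io.TeeReader.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"io"
	"strings"
)

// copyAndHash copies src to dst and returns the byte count plus the SHA-256
// of everything that flowed through. It never needs to know what src and dst are.
func copyAndHash(dst io.Writer, src io.Reader) (int64, string, error) {
	h := sha256.New() // hash.Hash is itself an io.Writer
	n, err := io.Copy(dst, io.TeeReader(src, h))
	return n, fmt.Sprintf("%x", h.Sum(nil)), err
}

// hashString runs copyAndHash with a strings.Reader as the source and a
// bytes.Buffer as the destination, returning the copied text and the digest.
func hashString(s string) (string, string, error) {
	var out bytes.Buffer
	_, sum, err := copyAndHash(&out, strings.NewReader(s))
	return out.String(), sum, err
}

func main() {
	copied, sum, err := hashString("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(copied, sum)
}
```

Swapping the strings.Reader for a net.Conn or an *os.File would require no change to copyAndHash.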


3. How netpoller Works: epoll Under the Hood

Go's netpoller is the component that converts OS-level non-blocking I/O into goroutine-friendly blocking I/O.

The epoll Primitives

epoll_create()   creates an epfd (event poll file descriptor)
epoll_ctl()      registers an fd (e.g. a socket) with the epfd
epoll_wait()     blocks until one or more registered fds are ready

netpoller's Virtual Interface

To abstract across Linux (epoll), Windows (IOCP), and macOS/BSD (kqueue), netpoller defines a platform-agnostic set of functions:

func netpollinit()                               // initialize the poller
func netpollopen(fd uintptr, pd *pollDesc) int32 // register an fd
func netpoll(delta int64) gList                  // poll for ready events
func netpollBreak()                              // wake up the poller
func netpollIsPollDescriptor(fd uintptr) bool    // check if fd is managed

Each platform provides its own implementation, and the exact signatures vary slightly between Go releases. On Linux, netpollopen calls epoll_ctl and netpoll calls epoll_wait.
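
To make the three epoll calls concrete, here is a Linux-only sketch that drives them by hand through the syscall package. This is not the runtime's code: epollReadable is a name invented for this example, and a pipe stands in for a socket.

```go
package main

import (
	"fmt"
	"syscall"
)

// epollReadable registers the read end of a pipe with epoll, writes one byte
// into the pipe, and returns how many fds epoll_wait reports as ready.
// Linux only: these syscall wrappers do not exist on other platforms.
func epollReadable() (int, error) {
	var p [2]int // p[0] is the read end, p[1] the write end
	if err := syscall.Pipe(p[:]); err != nil {
		return 0, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	epfd, err := syscall.EpollCreate1(0) // epoll_create: make the epfd
	if err != nil {
		return 0, err
	}
	defer syscall.Close(epfd)

	// epoll_ctl: ask to be told when p[0] becomes readable.
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(p[0])}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		return 0, err
	}

	// Make the fd ready.
	if _, err := syscall.Write(p[1], []byte("x")); err != nil {
		return 0, err
	}

	events := make([]syscall.EpollEvent, 8)
	for {
		n, err := syscall.EpollWait(epfd, events, 1000) // epoll_wait, 1s timeout
		if err == syscall.EINTR {
			continue // interrupted by a signal, retry
		}
		return n, err
	}
}

func main() {
	n, err := epollReadable()
	fmt.Println("ready fds:", n, "err:", err)
}
```

This is the bookkeeping the runtime takes over, mapping each ready fd back to the goroutine parked on it.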

Goroutine Lifecycle During I/O

When a goroutine performs a read/write on a non-ready file descriptor:

Goroutine calls Read() on net.Conn
      ↓
fd not ready → poll_runtime_pollWait()
      ↓
netpollblock() → gopark()
      ↓
Goroutine suspended (Waiting state)
M is released back to run other goroutines
      ↓
epoll_wait detects fd is ready
      ↓
netpoll() returns list of ready goroutines
      ↓
Goroutines moved back to P's LRQ (Runnable)
      ↓
Goroutine resumes execution on M  ✅
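
The parked state is observable from ordinary code. In this sketch (readWithDeadline is a hypothetical helper, and the listener deliberately never sends anything), Read parks the goroutine until the read deadline wakes it with a timeout error:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readWithDeadline connects to a loopback listener that never sends data and
// reports whether Read was ended by the deadline after roughly ms milliseconds.
func readWithDeadline(ms int) (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer conn.Close()

	deadline := time.Now().Add(time.Duration(ms) * time.Millisecond)
	if err := conn.SetReadDeadline(deadline); err != nil {
		return false, err
	}

	buf := make([]byte, 1)
	_, err = conn.Read(buf) // fd not ready: the goroutine parks here
	return errors.Is(err, os.ErrDeadlineExceeded), nil
}

func main() {
	timedOut, err := readWithDeadline(50)
	fmt.Println("timed out:", timedOut, "err:", err)
}
```

While the goroutine waits inside Read, the thread that was running it is free to run other goroutines.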

4. GMP Interaction During Network I/O

This is what makes Go's model elegant — the scheduler and netpoller work together seamlessly.

Step 1: Normal execution

P ──► M ──► G1 (running)
LRQ: [G2, G3, G4]
netpoller: idle

Step 2: G1 makes a network call that cannot complete yet

G1 ──► moved to netpoller (waiting for I/O)
M ──► picks up G2 from LRQ
P ──► M ──► G2 (running)
LRQ: [G3, G4]
netpoller: watching G1's fd

G1 is parked. The M is not blocked — it continues running other goroutines.

Step 3: I/O completes

netpoller detects G1's fd is ready
G1 ──► moved back to P's LRQ
P ──► M ──► G2 (still running)
LRQ: [G3, G4, G1]

Step 4: G1 resumes

G1 scheduled onto M, resumes from where it left off  ✅

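No extra M is needed for network I/O. The netpoller does not need a dedicated thread either: the scheduler calls netpoll() while looking for runnable goroutines, and the sysmon thread polls periodically as a fallback. This is why Go can handle hundreds of thousands of concurrent connections without one OS thread per connection.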


5. Buffered Network I/O

Reading from net.Conn a few bytes at a time is expensive, because each Read normally costs at least one read syscall. The solution is buffering:

┌─────────────────────────────────────────────────────────┐
│              Buffered Network I/O Pattern               │
│                                                         │
│  netpoller goroutine                                    │
│      reads from socket → fills RingBuffer               │
│                              ↓                          │
│  business goroutine                                     │
│      reads from RingBuffer → decodes → processes        │
│                                                         │
│  Producer:  socket  ──write──► [ RingBuffer ]           │
│  Consumer:           ◄──read──  [ RingBuffer ]          │
└─────────────────────────────────────────────────────────┘

With a buffer in front of net.Conn, each refill attempts to fill the whole internal buffer rather than just the caller's requested size. Subsequent reads are served from memory, so far fewer syscalls are needed.
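
The effect is easy to measure with the standard library's bufio.Reader, which is a simpler form of buffering than the ring buffer pictured above. In this sketch, countingReader is a made-up stand-in for net.Conn that counts how many Read calls reach the underlying source. On a real connection each of those calls would be a syscall.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countingReader counts how often Read is called on the wrapped reader.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// countReads consumes size bytes one byte at a time and returns the number of
// Read calls that reached the underlying source.
func countReads(size int, buffered bool) int {
	src := &countingReader{r: strings.NewReader(strings.Repeat("x", size))}
	var r io.Reader = src
	if buffered {
		r = bufio.NewReader(src) // default buffer size is 4096 bytes
	}
	one := make([]byte, 1)
	for {
		if _, err := r.Read(one); err != nil {
			break // io.EOF once the data is exhausted
		}
	}
	return src.calls
}

func main() {
	fmt.Println("unbuffered:", countReads(10000, false))
	fmt.Println("buffered:  ", countReads(10000, true))
}
```

The unbuffered loop hits the source once per byte, while the buffered one needs only a handful of refills.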

Lock-Free RingBuffer Design

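A naive RingBuffer has a problem: when it is full it has to grow, and growing means copying into a new array while the reader and writer may still hold offsets into the old one. Without locking, that is a data race between the read and write pointers.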

The solution used in high-performance Go servers:

Challenge                 Solution
────────────────────────  ─────────────────────────────────────────────────────────────────
Resize without copy       Use a linked list of fixed-size buffers instead of a single array
Node allocation overhead  Reuse nodes via sync.Pool
Read/write pointer race   Maintain a length field updated with atomic operations
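
The three ideas in the table can be sketched together as follows. This is an illustration, not code from any particular server: LinkBuffer, node, and nodeSize are hypothetical names, the node size is kept tiny so the example crosses node boundaries, and the design assumes exactly one writer goroutine and one reader goroutine.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const nodeSize = 4 // unrealistically small, for demonstration only

// node is one fixed-size segment; growing the buffer links a new node
// instead of copying existing data.
type node struct {
	buf  [nodeSize]byte
	next *node
}

var nodePool = sync.Pool{New: func() any { return new(node) }}

// LinkBuffer is a single-producer/single-consumer byte queue.
type LinkBuffer struct {
	head, tail *node        // head is touched only by the reader, tail only by the writer
	r, w       int          // read offset in head, write offset in tail
	length     atomic.Int64 // published, unread bytes
}

func NewLinkBuffer() *LinkBuffer {
	n := nodePool.Get().(*node)
	n.next = nil
	return &LinkBuffer{head: n, tail: n}
}

// Len returns the number of bytes currently available to the reader.
func (b *LinkBuffer) Len() int { return int(b.length.Load()) }

// Write appends p. Only the writer goroutine may call it.
func (b *LinkBuffer) Write(p []byte) {
	total := len(p)
	for len(p) > 0 {
		if b.w == nodeSize { // tail is full: link a recycled node
			n := nodePool.Get().(*node)
			n.next = nil
			b.tail.next = n
			b.tail, b.w = n, 0
		}
		c := copy(b.tail.buf[b.w:], p)
		b.w += c
		p = p[c:]
	}
	b.length.Add(int64(total)) // publish the new bytes to the reader
}

// Read copies up to len(p) published bytes into p. Only the reader goroutine may call it.
func (b *LinkBuffer) Read(p []byte) int {
	avail := b.Len()
	if avail > len(p) {
		avail = len(p)
	}
	read := 0
	for read < avail {
		if b.r == nodeSize { // head is drained: advance and recycle it
			old := b.head
			b.head, b.r = old.next, 0
			nodePool.Put(old)
		}
		end := b.r + (avail - read)
		if end > nodeSize {
			end = nodeSize
		}
		c := copy(p[read:], b.head.buf[b.r:end])
		b.r += c
		read += c
	}
	b.length.Add(int64(-read))
	return read
}

func main() {
	b := NewLinkBuffer()
	b.Write([]byte("hello world")) // spans three nodes
	out := make([]byte, 32)
	n := b.Read(out)
	fmt.Printf("%q, %d bytes left\n", out[:n], b.Len())
}
```

The writer publishes bytes only through the atomic length, so the reader never touches a node or a byte range that the writer has not finished with.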

6. Object Reuse with sync.Pool

High-throughput servers allocate and discard the same types of objects millions of times per second. Each allocation puts pressure on the GC. The fix: pool and reuse.

Without pooling (allocates on every request):

func handleSubmit() {
    s := &Submit{}   // heap allocation every time
    // ... process s
}

With sync.Pool:

var submitPool = sync.Pool{
    New: func() interface{} {
        return &Submit{}
    },
}

func handleSubmit() {
    s := submitPool.Get().(*Submit)  // reuse from pool
    defer submitPool.Put(s)          // return to pool when done
    *s = Submit{}                    // clear state left by the previous user
    // ... process s
}

How to find what to pool: Use go tool pprof to identify the functions and lines with the highest heap allocation rates — those are your pooling candidates.

Caveat: Always reset pooled objects before reuse, because they still carry the previous user's state. The GC may also drop pooled objects at any collection, so never rely on sync.Pool for persistent state.
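
pprof is the right tool for a real service. For a quick check of a single code path, testing.AllocsPerRun can compare the two variants directly. The sketch below is illustrative: the Submit fields, the Reset method, and the sink variable, which forces the fresh object onto the heap, are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Submit is a stand-in request object; the fields are made up.
type Submit struct {
	ID      int
	Payload [1024]byte
}

// Reset clears state left over from the previous user of the object.
func (s *Submit) Reset() { *s = Submit{} }

var submitPool = sync.Pool{New: func() any { return new(Submit) }}

var sink *Submit // keeps the fresh object reachable so it must be heap-allocated

func handleFresh(id int) {
	s := &Submit{ID: id} // new heap allocation on every call
	sink = s
}

func handlePooled(id int) {
	s := submitPool.Get().(*Submit)
	defer submitPool.Put(s)
	s.Reset() // always reset before reuse
	s.ID = id
}

// freshAllocs and pooledAllocs report average heap allocations per call.
func freshAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { handleFresh(1) })
}

func pooledAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { handlePooled(1) })
}

func main() {
	fmt.Println("allocs/op without pool:", freshAllocs())
	fmt.Println("allocs/op with pool:   ", pooledAllocs())
}
```

The pooled variant can still allocate occasionally, because a garbage collection may empty the pool, so the measured number is an average.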


7. Summary

Concept                   What It Does
────────────────────────  ─────────────────────────────────────────────────────────────────────────────
goroutine-per-connection  Write blocking-style code; runtime handles async scheduling
netpoller                 Wraps epoll/kqueue/IOCP; parks goroutines waiting for I/O without blocking M
GMP + netpoller           M is never blocked by network I/O; goroutines are re-queued when fd is ready
Buffered I/O              Reduces syscall frequency by reading ahead into memory buffers
Lock-free RingBuffer      Linked list + sync.Pool + atomic length for high-throughput I/O queues
sync.Pool                 Reuses short-lived objects to reduce GC pressure in hot paths

Developer writes:          Runtime does:
─────────────────          ──────────────────────────────────────
conn.Read(buf)     →       gopark() → epoll_wait → goready()
conn.Write(buf)    →       gopark() → epoll_wait → goready()
// looks blocking          // actually async, zero extra threads

Go's I/O model is one of its greatest engineering achievements: the full power of epoll-based multiplexing, with the simplicity of sequential code.


Next in this series: Go System Calls & Blocking: syscall Wrapping, Async vs Sync & GMP Separation (Part 4)


Follow the series for more deep dives into Go's runtime internals.
