DEV Community

Naz Quadri

Posted on • Originally published at nazquadri.dev

The Event Loop You're Already Using

select, poll, epoll, and the System Calls Behind Every Async Framework

Reading time: ~13 minutes


You wrote await fetch(url). Your Node.js server handled ten thousand simultaneous connections while it waited. Your CPU usage barely moved.

Here's what actually happened: your code called into a JavaScript engine, which called libuv, which called epoll_wait, which asked the kernel to wake it up when any of ten thousand file descriptors had data ready. The kernel said nothing for 40 milliseconds. Then it said: "three of them are ready." Your event loop woke up and processed exactly those three. The other 9,997 connections cost you nothing while they waited.

That's the whole trick. One syscall. The kernel does the waiting. You do the work.


Why This Matters Beyond "Async Is Fast"

You've heard that async I/O is efficient. You may have accepted that on faith, or from a benchmark someone posted. But without understanding the layer underneath, you're flying on instruments you don't know how to read.

Here's the bug you'll eventually hit: you write some async Python, everything looks right, and it's still blocking. Your entire server stalls for 300 milliseconds every time a particular function runs. You add more workers. It keeps happening. The problem is a single synchronous call buried in a dependency — a time.sleep, a blocking DNS lookup, an open() on a network filesystem. That call doesn't yield to the event loop. It holds the thread hostage until it returns.

Understanding the mechanism is the only way to understand why that's catastrophic — and why the fix isn't "just add async," it's "figure out where you're not actually doing non-blocking I/O."


The Problem That Needed Solving

Before we get to the solutions, let's understand what problem they solve.

It's 1983. You're writing a server. A client connects. You read() from the socket. If there's no data yet, your process blocks — it goes to sleep, the CPU runs something else, and you wake up when data arrives. This is called blocking I/O, and for one client, it's totally fine.

Scale it up. A thousand clients. Each read() call could block. Your single-threaded process blocks on the first client who has nothing to say yet, while the other 999 who have data are sitting there waiting. The obvious fix is threads — one thread per client. But a thousand threads means a thousand stacks (typically 8 MB of reserved address space each on Linux), a thousand kernel scheduling contexts, and constant context-switching overhead.

In 1983, you couldn't afford that. In 2024, the math still gets ugly fast. A modern web server at scale handles hundreds of thousands of connections. You cannot have hundreds of thousands of threads.

What you want is a way to say: "Here's a list of a hundred thousand file descriptors. Tell me when any of them have something interesting." One call. The kernel blocks until there's work. You wake up, process exactly what's ready, go back to sleep.

That's the problem. Here are the solutions, in chronological order of how well they work.


select: The 1983 Hammer

select was the first answer, and it arrived with 4.2BSD in 1983 — the Berkeley team's first attempt to solve the multiplexing problem in a standard way. The interface looks like this:

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

You give it three sets of file descriptors — ones you want to read, ones you want to write, ones where you care about exceptions — and a timeout. It blocks until something is ready or the timeout expires. When it returns, the sets have been modified in place to show you which ones are ready.

The fd_set is a bitmask. On most systems, it's 1024 bits. That's your limit: 1024 file descriptors, max. (You can recompile with a larger FD_SETSIZE, but you can't escape the O(n) scan.) If you need more, select literally cannot help you.

But it gets worse. Every time you call select, you have to rebuild the set of descriptors you care about, pass it to the kernel, and the kernel has to walk every bit of that mask to figure out which ones changed. For 1000 connections, that's 1000 checks on every call, even if only one descriptor became ready.

select is O(n) where n is the number of file descriptors you're watching, regardless of how many are actually active. At scale, this becomes expensive.
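To make the mechanics concrete, here's a minimal self-contained sketch. It watches a pipe rather than a socket, purely for illustration, and the wait_readable and select_demo helpers are hypothetical names of mine, not part of any API:

```c
#include <sys/select.h>
#include <unistd.h>

/* Watch one fd for readability, 100 ms timeout.
 * Returns 1 if readable, 0 on timeout. */
int wait_readable(int fd) {
    fd_set readfds;
    FD_ZERO(&readfds);        /* clear the whole bitmask           */
    FD_SET(fd, &readfds);     /* mark the single fd we care about  */

    struct timeval tv = { .tv_sec = 0, .tv_usec = 100000 };

    /* nfds is the highest fd plus one: select scans every slot
     * below it, which is exactly the O(n) cost described above. */
    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    return n > 0 && FD_ISSET(fd, &readfds);
}

int select_demo(void) {
    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;

    int before = wait_readable(pipefd[0]);  /* empty pipe: times out */
    (void)write(pipefd[1], "x", 1);
    int after = wait_readable(pipefd[0]);   /* now there's a byte    */

    close(pipefd[0]);
    close(pipefd[1]);
    return after - before;   /* 1 when select behaves as described */
}
```

Note that the fd_set has to be rebuilt from scratch before every call, because select modifies it in place on return.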

[Diagram: select() O(n) scan — checking all 1024 bits for 3 ready fds]


poll: Slightly Less Wrong

poll came a few years later — it originated in System V and was later standardized by POSIX — and it drops the 1024 limit. Instead of a bitmask, you pass an array of struct pollfd structures:

struct pollfd {
    int   fd;       // the file descriptor
    short events;   // what you're interested in
    short revents;  // what actually happened (filled by kernel)
};

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

No arbitrary limit on the number of fds. Better event granularity. Same fundamental problem: you still rebuild the entire array on every call, pass it to the kernel, and the kernel still has to walk every entry to check what's ready.

poll is O(n) in the same way select is. It fixed the 1024 limit and cleaned up the API, but it didn't fix the performance cliff at high connection counts.

Both select and poll have another problem: every call copies the entire list of descriptors from user space to kernel space. For 100,000 connections, that's 100,000 descriptor entries copied into the kernel on every call, in a tight loop. The data movement alone becomes your bottleneck — an O(n) copy cost layered on top of the O(n) scan.
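The same pipe experiment as before, through poll's interface (poll_demo is a made-up helper, not a real API):

```c
#include <poll.h>
#include <unistd.h>

int poll_demo(void) {
    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;
    (void)write(pipefd[1], "x", 1);   /* make the read end ready */

    /* One entry per watched fd. This whole array is handed to the
     * kernel and scanned on every single call -- the O(n) cost. */
    struct pollfd fds[1] = {
        { .fd = pipefd[0], .events = POLLIN, .revents = 0 }
    };

    int n = poll(fds, 1, 100 /* ms timeout */);
    int ready = (n == 1) && (fds[0].revents & POLLIN);

    close(pipefd[0]);
    close(pipefd[1]);
    return ready;   /* 1: the kernel filled revents with POLLIN */
}
```

Nicer than fd_set — events and revents are separate, so the kernel doesn't clobber your interest list — but the per-call copy and scan are still there.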


epoll: The Linux Answer (And Why Linux Won)

epoll landed in Linux 2.5.44 in 2002. It rethinks the whole interface.

Instead of passing the full list of descriptors on every call, you create an epoll instance — a kernel-managed data structure that persists between calls — and add file descriptors to it once. Then you just ask "what's ready?", and the kernel has the state it needs already.

Three syscalls:

// Create the epoll instance
int epfd = epoll_create1(0);

// Register a file descriptor with it (once, not every loop)
struct epoll_event ev = { .events = EPOLLIN, .data.fd = sockfd };
epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);

// Wait for events (this is the blocking call in your event loop)
int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);

The key insight: epoll_wait returns only the descriptors that are actually ready. If you're watching 100,000 connections and 3 have data, epoll_wait returns 3. You process 3. The kernel doesn't enumerate the other 99,997.

epoll_wait is O(1) with respect to the number of watched descriptors — its cost scales with the number of ready events, not the size of the interest set. Adding or removing a descriptor is an O(log n) operation (epoll stores its interest set in a red-black tree), paid once at registration rather than on every call. The difference between select and epoll at 100,000 connections is the difference between 100,000 operations per loop iteration and approximately 3.
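Putting the three syscalls together, here's a minimal Linux-only sketch using a pipe instead of a socket (epoll_demo is a hypothetical helper name):

```c
#include <sys/epoll.h>
#include <unistd.h>

int epoll_demo(void) {
    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;

    /* Register once. The kernel keeps this interest set between calls. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
    epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);

    (void)write(pipefd[1], "x", 1);   /* make the read end ready */

    /* epoll_wait hands back only the ready descriptors -- here, one. */
    struct epoll_event events[8];
    int n = epoll_wait(epfd, events, 8, 100 /* ms timeout */);

    int ok = (n == 1) && (events[0].data.fd == pipefd[0]);
    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return ok;
}
```

The shape to notice: registration happens once, outside the loop. In a real server only epoll_wait runs per iteration, and its output is already the answer.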

This is why Node.js became credible at high connection counts. Node runs on libuv, which uses epoll on Linux. One thread, one event loop, one epoll_wait call, and the kernel does the heavy lifting.

[Diagram: select/poll vs epoll — copy everything vs return only what's ready]


kqueue: BSD Did It Too

If you're on macOS or FreeBSD, the equivalent is kqueue, which appeared in FreeBSD 4.1 in 2000 — actually two years before epoll. Different API, same idea: persistent kernel state, O(1) wakeups, batch event delivery.

int kq = kqueue();

struct kevent change = {
    .ident  = sockfd,
    .filter = EVFILT_READ,
    .flags  = EV_ADD | EV_ENABLE,
};
kevent(kq, &change, 1, NULL, 0, NULL);

// Wait for events
struct kevent events[MAX_EVENTS];
int n = kevent(kq, NULL, 0, events, MAX_EVENTS, NULL);

kqueue is more elegant than epoll — a single syscall handles both registration and waiting — and it watches more than just file descriptors. You can use the same interface to watch for process exits, signals, timers, and file system changes. The event model is unified. But it's BSD-only, so it doesn't run on Linux.

This is the proliferation problem that every cross-platform runtime faces. You want epoll on Linux, kqueue on macOS/BSD, and IOCP on Windows (which is a completely different model). The next section is about how each major runtime solves this.


O_NONBLOCK: The Prerequisite Nobody Explains

Here's the part that trips people up. Having an efficient waiting mechanism isn't enough. The individual I/O operations themselves have to be non-blocking, or the whole thing falls apart.

By default, read() blocks. If you call read() on a socket with no data available, your thread sleeps until data arrives. epoll told you the socket was ready, so in normal operation this doesn't happen — but edge cases and bugs can still get you here. More importantly, some operations — connect(), write() on a full buffer — can block even when epoll says you're good to go.

O_NONBLOCK is a flag you set on a file descriptor to change this behavior:

int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);

With O_NONBLOCK set, I/O operations never block. If a read() would have blocked (no data available), it returns immediately with EAGAIN or EWOULDBLOCK. If a write() would have blocked (send buffer full), same thing.

Your code then handles EAGAIN by going back to the event loop — "okay, nothing ready, I'll wait for epoll to tell me when to try again."
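A tiny sketch of that contract, again on a pipe (eagain_demo is a hypothetical helper): reading from an empty non-blocking descriptor fails immediately with EAGAIN instead of sleeping.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int eagain_demo(void) {
    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;

    /* Flip the read end to non-blocking, exactly as above. */
    int flags = fcntl(pipefd[0], F_GETFL, 0);
    fcntl(pipefd[0], F_SETFL, flags | O_NONBLOCK);

    /* The pipe is empty. A blocking read() would sleep here
     * indefinitely; a non-blocking one fails fast instead. */
    char buf[1];
    ssize_t n = read(pipefd[0], buf, sizeof buf);
    int got_eagain = (n == -1) &&
                     (errno == EAGAIN || errno == EWOULDBLOCK);

    close(pipefd[0]);
    close(pipefd[1]);
    return got_eagain;   /* 1: "nothing ready yet", not an error */
}
```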

Without O_NONBLOCK, even a single blocking operation inside your event loop stalls everything. The whole async model breaks down. This is where that "buried synchronous call" bug comes from: some dependency opens a file descriptor, forgets to set O_NONBLOCK, calls read(), and your entire event loop freezes while it waits.

"Use async all the way down" isn't style advice. It's a correctness requirement. Call time.sleep(1) inside an event loop and you've blocked the only thread — no callbacks fire, no connections are served, the whole system goes dark for one second. Python's asyncio.sleep exists for exactly this reason: it yields back to the event loop instead of blocking the thread. Same reason you never call requests.get in async code — it's a blocking HTTP client that holds the thread hostage while it waits for a response. Use aiohttp or httpx instead.


How the Frameworks Plug In

Let's trace how the high-level abstractions land on these syscalls.

Node.js / libuv

libuv runs a loop: check timers, check I/O callbacks, call epoll_wait (Linux) or kqueue (macOS) with whatever timeout makes sense given pending timers. When it wakes up, it dispatches callbacks. Your await fetch(url) eventually becomes a socket, which gets registered with epoll, which wakes up libuv, which calls your callback, which resolves the Promise. The "event loop" you've heard about? It's this loop.
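That timeout calculation is worth seeing in miniature. A drastically simplified sketch of the loop shape — hypothetical helper names, none of real libuv's internals:

```c
#include <sys/epoll.h>

/* How long epoll_wait may sleep: until the next pending timer is
 * due, or forever (-1) if no timers are pending. */
int loop_timeout_ms(long now_ms, long next_timer_ms, int have_timer) {
    if (!have_timer) return -1;          /* nothing scheduled: block on I/O  */
    long delta = next_timer_ms - now_ms;
    return delta > 0 ? (int)delta : 0;   /* timer overdue: poll, don't sleep */
}

/* One turn of a libuv-shaped loop over an existing epoll instance. */
int loop_once(int epfd, struct epoll_event *events, int max_events,
              long now_ms, long next_timer_ms, int have_timer) {
    int timeout = loop_timeout_ms(now_ms, next_timer_ms, have_timer);
    int n = epoll_wait(epfd, events, max_events, timeout);
    /* ...dispatch each ready event's callback here, then run any
       timers that have come due, then loop again... */
    return n;
}
```

The key move is that a single blocking call serves both I/O and timers: the loop sleeps exactly until the earlier of "a descriptor becomes ready" and "the next timer fires."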

Python asyncio

Same pattern, different language. asyncio's default SelectorEventLoop uses Python's selectors module, which picks the best available backend for the platform — epoll on Linux, kqueue on macOS, select as a last resort. On Linux, you're already on epoll out of the box. uvloop wraps libuv for even better performance, but the default is not as slow as people assume. Windows gets ProactorEventLoop backed by IOCP. The await keyword doesn't make I/O async — the underlying selector does. await just lets the event loop know your coroutine is willing to pause.

Tokio (Rust)

Tokio uses mio, a thin safe wrapper around epoll/kqueue/IOCP. Its async runtime is more explicit about what it's doing than Node.js — you can see the reactor and the executor as separate components. The reactor watches file descriptors; the executor schedules tasks. An .await in Tokio suspends a task and hands control back to the executor, which runs other tasks until the reactor reports that the descriptor is ready. Then the task is rescheduled.

Go goroutines

Go's runtime is the most opaque of the four. You write blocking-looking code — conn.Read(buf) — and the runtime makes it non-blocking behind your back. When a goroutine would block on I/O, the runtime parks the goroutine, registers the descriptor with the network poller (which uses epoll on Linux), and continues running other goroutines on the same OS thread. When the data arrives, the poller wakes up the parked goroutine. From your perspective, it blocked. In reality, epoll_wait was called and the goroutine was context-switched away.

That's why Go can have a million goroutines — they're not OS threads, and they don't block OS threads when they wait on I/O.

[Diagram: four runtimes, same primitives — epoll_wait, kevent, IOCP]


What The Kernel Is Actually Doing

It's worth briefly understanding how epoll does its job efficiently.

When you register a file descriptor with epoll_ctl, the kernel attaches a callback to the descriptor's wait queue. This is a list of sleeping tasks that should be woken up when the descriptor becomes ready.

When a packet arrives, it follows a specific path: the NIC triggers a hardware interrupt, which causes the CPU to run the network driver's interrupt handler, which feeds the packet up through the kernel's network stack, which places data in the socket's receive buffer. At that point, the socket's wait queue callbacks fire — including the one epoll registered. The callback adds the descriptor to epoll's ready list.

When epoll_wait returns, it's just reading off that ready list. No scanning. No iteration over your 100,000 connections. The work was done when the packet arrived, not when you asked.

That's why EAGAIN isn't an error. When O_NONBLOCK is set and read() returns -1 with errno == EAGAIN, the socket is politely saying "nothing ready yet." Your event loop re-registers with epoll and waits. Every async I/O library handles this for you, which is why you've probably never seen it directly.


The Same Kernel, The Same Primitives

The abstractions you use every day — async/await, goroutines, green threads, futures — are different UX choices on top of the same three or four kernel primitives. The kernel API hasn't changed much since epoll landed in 2002. What's changed is how well we've wrapped it.

Node's innovation wasn't non-blocking I/O. The kernel had that. Node's innovation was making it the default — putting it in a single-threaded event loop and forcing the programming model to accommodate it. Python's asyncio brought the same model to a language that was threading-first. Rust's Tokio gave you the same power with compile-time correctness guarantees. Go hid the whole thing behind a familiar synchronous-looking syntax.

All roads lead to epoll_wait. The question is just how many layers of abstraction are between you and the call.


Further Reading

  • man 7 epoll — the Linux epoll interface, with edge-triggered vs. level-triggered details (which we glossed over — it's worth reading)
  • man 2 select, man 2 poll — the originals, for historical grounding
  • The C10K Problem — Dan Kegel's 2001 writeup that defined the problem and surveyed every solution available at the time. A historical artifact and still essential reading.
  • libuv design overview — how Node's I/O layer works, with diagrams
  • Tokio internals — the Tokio scheduler design post, remarkably readable

I'm writing a book about what makes developers irreplaceable in the age of AI. Join the early access list →


Naz Quadri once blocked the entire event loop with a single time.sleep(0.1) and spent two hours blaming the network. He blogs at nazquadri.dev. Rabbit holes all the way down 🐇🕳️.
