
Naz Quadri

Posted on • Originally published at nazquadri.dev

File Descriptors: The Numbers Behind Everything


The Integers That Run Your System

Reading time: ~13 minutes


You called open("config.toml") and got back the number 3.

Not a file handle. Not a stream object. Not a path. A small integer. The language runtime probably wrapped it in something friendlier — a File object, a BufferedReader, an io.TextIOWrapper — but that wrapping happens after the kernel gave you a number. The number is the real thing.

That number is a file descriptor, and it's the single abstraction that holds together files, sockets, pipes, terminals, timers, signals, and /dev/null. They are all just integers pointing into a kernel table. Understanding that table — and what the kernel does with it — will explain a dozen things that have probably confused you at some point.


The Three That Are Already There

Every Unix process starts with three file descriptors already open.

stdin is 0. stdout is 1. stderr is 2.

You didn't open them. They were inherited from your parent process, which inherited them from its parent, all the way back up the chain to whatever launched the first process when the system booted. They've been there the whole time.

When you write print("hello") in Python, the runtime writes to file descriptor 1. But what does that actually mean? Let's peel it back:

# The version you write
print("hello")

# What Python actually does (simplified)
import sys
sys.stdout.write("hello\n")

# What sys.stdout.write actually does
import os
os.write(1, b"hello\n")

# What os.write(1, ...) actually does: the write(2) syscall
# write(1, "hello\n", 6)  →  kernel writes 6 bytes to whatever fd 1 points at

That last line is where the abstraction ends and the kernel takes over. File descriptor 1. Six bytes. That's it. print() is four layers of wrapping around "write these bytes to integer 1."

You can prove this to yourself. Close stdout and reopen it by hand:

import os

os.close(1)                              # slam stdout shut
fd = os.open("/dev/tty", os.O_WRONLY)    # reopen the terminal — gets fd 1 (lowest available)
os.write(fd, b"I'm back\n")             # write directly to the fd
# prints: I'm back

That works because /dev/tty is your controlling terminal, and os.open() returns the lowest available file descriptor — which is 1, because you just closed it. You've manually reconstructed stdout.

When ls decides whether to use color, it calls isatty(1) — "is file descriptor 1 connected to a terminal?" When a program crashes and prints an error message, it writes to 2. When read() blocks waiting for keyboard input, it's blocking on 0.
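That isatty(1) check is something you can run yourself. A minimal sketch in Python — the color-or-plain decision here is illustrative, not `ls`'s actual code:

```python
import os

# isatty() asks the kernel whether the fd refers to a terminal device.
# Under a pipe or a redirect it returns False — the same signal `ls`
# uses to drop color codes when its output isn't a terminal.
if os.isatty(1):
    print("stdout is a terminal — safe to emit color codes")
else:
    print("stdout is redirected — plain output only")
```

Run it normally and then as `python3 check.py | cat` to watch the answer flip.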

The numbers 0, 1, and 2 aren't conventions from a standard library. They're baked into the kernel ABI. Every Unix program on the planet agrees on them.


The Table Behind the Numbers

When you call open(), the kernel doesn't give you a random integer. It gives you the lowest available slot in a per-process table.

File descriptor table — the integer is just a table index

That per-process table holds references to file description objects in the kernel — note the word: description, not descriptor. (This is a real POSIX term, not a typo.) The file descriptor is the index. The file description is the actual thing: an open file object with a current position, a set of flags, a reference count, and a pointer to whatever the underlying thing actually is.

This distinction matters. Two file descriptors can point to the same file description. That's what dup() does — it copies the table entry, so both integers refer to the same underlying object. Same position. Same flags. One write() through either fd advances the position for both.
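You can watch the shared position in a few lines. A quick sketch using a scratch file — write through two duplicated fds and check that the offset advanced for both:

```python
import os, tempfile

fd1, path = tempfile.mkstemp()   # open a scratch file
fd2 = os.dup(fd1)                # second fd, SAME file description

os.write(fd1, b"hello")          # advances the shared offset to 5
os.write(fd2, b" world")         # continues at 5, not at 0

# Both fds report the same position because they share one description.
pos1 = os.lseek(fd1, 0, os.SEEK_CUR)
pos2 = os.lseek(fd2, 0, os.SEEK_CUR)
print(pos1, pos2)                # 11 11 — one offset, two indexes

os.close(fd1); os.close(fd2); os.remove(path)
```

If dup() created an independent description, the second write would have clobbered the first at offset 0. It didn't — the file contains "hello world".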


It's Not Just Files

The name "file descriptor" is a historical lie. Or at least a historical simplification.

File descriptors aren't for files. They're for anything the kernel wants to expose as a readable/writable thing. The same integer that points to a regular file on disk might instead point to:

  • A socket — the kernel TCP/IP stack, waiting for bytes from the network
  • A pipe — a shared kernel buffer with one write end and one read end
  • A PTY — the pseudo-terminal (covered in an upcoming post)
  • /dev/null — a kernel sink that accepts all writes and returns EOF on reads
  • An epoll instance — an event-watching mechanism that is itself a file descriptor
  • A timerfd — a timer that becomes readable when it fires
  • A signalfd — signals delivered as readable bytes instead of asynchronous interrupts (covered in Signals)
  • A memfd — anonymous memory that lives in RAM and has no path on disk

The genius of this design, and the reason Dennis Ritchie and Ken Thompson (the same Thompson who later co-designed UTF-8 — the man keeps showing up) made it a core abstraction in Unix in the early 1970s (Version 1–4 Unix, 1971–1973), is uniformity. You don't need to learn ten different APIs for ten different kernel resources. You call read(), write(), poll(), close() — and those calls work on all of them.

Your web server's event loop calling epoll_wait() on a socket is doing the exact same thing as a script calling read() on a file. Different underlying objects, same interface.
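The uniformity is easy to demonstrate. The same os.read()/os.write() calls, byte for byte, against two completely different kernel objects — a pipe and a regular file:

```python
import os, tempfile

# Same interface, different kernel object #1: a pipe.
r, w = os.pipe()
os.write(w, b"via pipe")
print(os.read(r, 1024))           # b'via pipe'
os.close(r); os.close(w)

# Same interface, different kernel object #2: a regular file.
fd, path = tempfile.mkstemp()
os.write(fd, b"via file")
os.lseek(fd, 0, os.SEEK_SET)      # files have a seekable position; pipes don't
os.write(fd, b"")                 # no-op, just to emphasize: same call either way
print(os.read(fd, 1024))          # b'via file'
os.close(fd); os.remove(path)
```

The only behavioral difference that leaks through is the one lseek() call — a pipe has no position to seek. Everything else is identical.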


What dup2 Actually Does

You've done shell redirection. ./program > output.txt. ./program 2>&1. ./program < input.txt. You know what it is, but have you thought about how it works? And 2>&1 — for the longest time that was just an incantation I had to memorize, and it was 50/50 whether I'd write 2&>1 instead.

The shell does it with dup2. It's one of the most important system calls you've never had to call directly.

dup2(oldfd, newfd) says: "make file descriptor newfd point to whatever oldfd is pointing at." If newfd is already open, close it first. Atomically.

The shell implements ./program > output.txt with these three lines, in the child process after fork() but before exec():

int outfile = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
// outfile is probably 3 — lowest available slot

dup2(outfile, 1);   // make fd 1, stdout, point to output.txt
close(outfile);     // don't need fd 3 anymore

// now exec() the program
// it will inherit fd 1 pointing to output.txt
// it has no idea it's not a terminal
execvp("program", argv);

After dup2(outfile, 1), file descriptor 1 — stdout — points to the file. The program calls printf(), the runtime writes to fd 1, and the bytes go to disk. The program never knew.

That's why ./program > output.txt works for every program — not just ones that know about redirection. The program writes to fd 1. It doesn't know what fd 1 is. The parent set it up before exec().

And 2>&1? Now that you know file descriptors, read it literally: "make fd 2 point to where fd 1 points." That's dup2(1, 2) — "make stderr point to whatever stdout is pointing at." The & isn't some magic shell operator. It's saying "this is a file descriptor number, not a filename called 1." Without the &, 2>1 would redirect stderr to a file named 1. With the &, it redirects to fd 1. That's the whole mystery. Years of memorizing an incantation, and it was just "duplicate this fd."

And 2&>1? The version I kept accidentally writing? That's not a thing. Bash silently parses it as something else entirely — the 2 becomes an argument to the program, and &> redirects both stdout and stderr to a file literally named 1. No error. Wrong behavior. The kind of bug you debug for an hour before you spot the character order.

Redirection lives not in the shell, not in the program, but in a three-line manipulation of the file descriptor table between fork() and exec(). The child process inherits whatever table the parent set up.

dup2 in action — stdout now goes to a file
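The same three steps work from Python's os module, which maps almost one-to-one onto the C above. A sketch — "echo" stands in for whatever program you'd exec, and output.txt is an arbitrary path:

```python
import os

# `./program > output.txt` done by hand: fork, rewire fd 1, exec.
pid = os.fork()
if pid == 0:                                   # child
    out = os.open("output.txt",
                  os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.dup2(out, 1)                            # fd 1 now points at output.txt
    os.close(out)                              # the extra fd is no longer needed
    os.execvp("echo", ["echo", "redirected"])  # inherits the rewired fd 1
else:                                          # parent
    os.waitpid(pid, 0)
    print(open("output.txt").read(), end="")   # prints: redirected
```

echo never opened a file and never heard of redirection — it wrote to fd 1 like always, and the bytes landed on disk.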


The Inheritance Problem

Here is the bug you've hit, or you will hit.

You open a database connection. Internally that creates a socket, which is fd 7. Your server runs fine. Then you fork() to spawn a child process — maybe a CGI handler, maybe a subprocess to run some external tool.

The child inherits fd 7. The kernel's reference count on the socket goes up. The child runs, finishes, exits. The socket's reference count goes down. But your server still has it open, so that's fine.

Except now you have ten workers. Each one forked. Each one inherited fd 7. Each one's copy is still open. If a network hiccup causes the server to close the socket and reconnect — it opens a new socket on fd 7, but the old fd 7 is still referenced by the worker processes. The old connection doesn't fully close. The database server sees it as still alive.

This is the kind of bug that only appears under load, only happens with specific shutdown sequences, and takes a long time to find.
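The inheritance itself is trivial to reproduce. A sketch with a plain temp file standing in for the database socket — the child writes through an fd it never opened:

```python
import os, tempfile

# fork() copies the parent's fd table, so the open file (or socket)
# is referenced by both processes — close-on-exec doesn't help here,
# because nothing exec()s.
fd, path = tempfile.mkstemp()

pid = os.fork()
if pid == 0:
    # Child: the fd is valid here too, without ever calling open().
    os.write(fd, b"child wrote through inherited fd\n")
    os._exit(0)

os.waitpid(pid, 0)
os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 1024))   # the child's bytes, read through the parent's fd
os.close(fd); os.remove(path)
```

Note the detail that makes the production bug nasty: the two fds share one file description, so the child's write advanced the parent's offset too.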

The fix is a flag that's been in the kernel since the 1980s: close-on-exec.


Close-On-Exec: The Flag Everyone Forgets

When you mark a file descriptor with the FD_CLOEXEC flag, the kernel closes it automatically when the process calls exec(). Not when fork() happens — that still copies the table. But when the child replaces itself with a new program via exec(), all close-on-exec file descriptors are gone.

import socket, os, fcntl

s = socket.socket()
# set close-on-exec
flags = fcntl.fcntl(s.fileno(), fcntl.F_GETFD)
fcntl.fcntl(s.fileno(), fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

# or, the modern way: open with O_CLOEXEC from the start
fd = os.open("file", os.O_RDONLY | os.O_CLOEXEC)

Modern languages and runtimes set FD_CLOEXEC by default on most things they open — Python's socket.socket() does it (since Python 3.4, per PEP 446), Rust's std::fs::File does it, Go does it (though syscall.Open has caveats). But "most things" isn't "all things," and the cases where it doesn't happen tend to be the surprising ones: PTY file descriptors, custom socket creation, file descriptors inherited across a fork() you didn't expect.

That's why your child process doesn't inherit your database connection — when the parent ran with a clean environment, everything got close-on-exec, and exec() cleaned up automatically. When you hit the bug, something slipped through.
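In Python specifically, PEP 446 surfaces this flag as the "inheritable" attribute — with the sense inverted: inheritable=False means FD_CLOEXEC is set. A quick check using /dev/null as a harmless file to open:

```python
import os

fd = os.open(os.devnull, os.O_RDONLY)
print(os.get_inheritable(fd))     # False — Python sets close-on-exec by default

os.set_inheritable(fd, True)      # deliberately let this fd survive exec()
print(os.get_inheritable(fd))     # True
os.close(fd)
```

When a child process genuinely needs an fd — say, a pre-opened listening socket handed to a worker — flipping this flag (or using subprocess's pass_fds) is the explicit, auditable way to do it.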


/proc/self/fd — The Directory That Shows You Everything

Linux keeps a live view of your process's open file descriptors in the filesystem.

ls -la /proc/self/fd

Run that in a shell. You'll see something like:

lrwxrwxrwx  0 -> /dev/pts/2
lrwxrwxrwx  1 -> /dev/pts/2
lrwxrwxrwx  2 -> /dev/pts/2
lrwxrwxrwx  10 -> /dev/pts/2
lrwxrwxrwx  255 -> /dev/pts/2

Each entry is a symlink. The link target tells you exactly what the file descriptor points to: a path on disk, a socket described as socket:[12345], a pipe as pipe:[67890], a PTY device. The number after the colon in brackets is the kernel's inode number for the internal object.

This is the layer below "I opened a file." /proc/self/fd/3 is what 3 actually is.

Run it on a running server process with ls -la /proc/<pid>/fd and you'll see every connection it has open. This is also where you find file descriptor leaks — a process that should have 20 fds open but has 20,000 is leaking something, and /proc/<pid>/fd will show you exactly what.

I once debugged a production memory leak by noticing that /proc/<pid>/fd had several hundred entries all pointing to the same log file. A log rotation handler had been closing the old file path but not the fd, and each rotation was leaving a leaked descriptor behind. Six months of rotation, hundreds of leaked fds, all pointing into a deleted file that the kernel kept alive because the reference count was nonzero.

That's the other thing about the reference count: a file you delete from the filesystem stays alive until the last file descriptor pointing to it is closed. The directory entry is gone, ls can't find it, but the kernel still has the file, and anything holding an open fd can still read and write it.
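That behavior — exactly what kept my leaked log file alive — fits in a few lines:

```python
import os, tempfile

# Unlink a file while holding an open fd: the directory entry dies,
# the inode survives until the last reference is closed.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.remove(path)                       # directory entry gone

assert not os.path.exists(path)       # ls can't find it anymore
os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 1024))              # b'still here' — the fd still works
os.close(fd)                          # NOW the kernel frees the file
```

This is also the classic "disk is full but du can't find the space" scenario: df counts the deleted-but-open file, du can't see it, and /proc/<pid>/fd is where you find the holder.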

While you're in /proc — the whole process is in there

/proc/<pid>/fd is just one directory. The kernel exposes the entire process state through /proc/<pid>/. Drop this function in your .bashrc and you'll never debug a process blind again:

# Dump everything useful about a running process
proc-inspect() {
    local pid=${1:?usage: proc-inspect <pid>}
    [[ -d /proc/$pid ]] || { echo "No such process: $pid"; return 1; }

    echo "=== Binary ==="
    readlink /proc/$pid/exe

    echo -e "\n=== Command Line ==="
    cat /proc/$pid/cmdline | tr '\0' ' '; echo

    echo -e "\n=== Working Directory ==="
    readlink /proc/$pid/cwd

    echo -e "\n=== Owner ==="
    grep -E '^(Uid|Gid|Groups)' /proc/$pid/status

    echo -e "\n=== Open FDs ==="
    ls -la /proc/$pid/fd 2>/dev/null | tail -20

    echo -e "\n=== Environment ==="
    cat /proc/$pid/environ 2>/dev/null | tr '\0' '\n' | sort
}

proc-inspect $$ inspects your own shell. proc-inspect $(pgrep -f nginx) tells you exactly what config nginx loaded, what user it's running as, what directory it thinks it's in, and every fd it has open. No guessing — the kernel recorded everything.

The environ section is particularly useful when debugging "it works on my machine" issues — you can see exactly what PATH, LD_LIBRARY_PATH, DATABASE_URL, or any other variable looked like when the process started. And cmdline has caught me more than once — a process you thought was running with --config /etc/app/prod.conf turns out to be running with --config /tmp/test.conf from three deploys ago.


What lsof Is Actually Doing

The lsof command — "list open files" — is largely a tool for reading /proc/<pid>/fd and /proc/<pid>/fdinfo and making the output human-readable.

lsof -i :8080 finds the process listening on port 8080 by scanning every process's file descriptors, looking for sockets, and cross-referencing the socket's port with TCP/UDP tables in /proc/net/.

lsof +D /some/directory finds every process with an open file descriptor pointing inside that directory. It's how you figure out why umount is telling you "device is busy" — something has a file open in there.

There's nothing magical about lsof. It's a very thorough /proc reader. The information it shows you was always there, in those tables, you just didn't know where to look.

/proc/pid/fd — a window into the kernel's fd table


The Limit You'll Eventually Hit

Every process can only have so many file descriptors open at once.

The default on most Linux systems is 1024 per process. You can check it:

ulimit -n        # soft limit (enforced)
ulimit -Hn       # hard limit (ceiling for the soft limit)
cat /proc/sys/fs/file-max   # system-wide limit across all processes

If you write a program that opens a lot of connections — a proxy, a load balancer, anything doing lots of concurrent I/O — you'll hit this. The kernel returns EMFILE ("too many open files"), and if you're not checking errors correctly, the program starts doing strange things. Most languages throw an exception. Some, in older codebases, silently swallow the error and the program limps along in a broken state.

The fix is ulimit -n 65536 before starting the process, or setting LimitNOFILE in your systemd unit file. High-performance servers typically run at 65536 or higher.

This is why you occasionally see "too many open files" in production logs and nobody can reproduce it locally — your production server is handling ten times more connections, and it hit the limit that local testing never approaches.
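A process can also inspect — and, up to the hard ceiling, raise — its own limit at runtime, without touching ulimit or systemd. A sketch using Python's resource module; 4096 is an arbitrary example target:

```python
import resource

# RLIMIT_NOFILE is the same pair `ulimit -n` / `ulimit -Hn` report.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")

# The soft limit can be raised by the process itself, but never
# above the hard limit (that takes privileges).
target = min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))
```

Some servers do exactly this at startup and log a warning if the hard limit is too low, which beats discovering EMFILE under peak load.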


epoll — Watching Many at Once

The file descriptor model wouldn't be as powerful without the ability to watch many of them simultaneously.

The original call is select(): give it a list of file descriptors and it tells you when any of them become readable or writable. It works, but it copies the list from userspace to kernel space on every call, and the list is bounded by a hard-coded constant (FD_SETSIZE, usually 1024). poll() improved on this: no arbitrary limit, slightly cleaner API. But it still copies the entire list every time.

epoll is the Linux answer to doing this efficiently at scale. You create an epoll instance (which is itself a file descriptor — 🐢 🐢 🐢 ...), then epoll_ctl() to add or remove descriptors from it, and epoll_wait() to block until something is ready. The kernel maintains the watch list. It tells you exactly which file descriptors are ready, not "here's the full list, go check which ones fired."

Your Python asyncio event loop, your Node.js event loop, your Rust tokio runtime — they all sit on top of epoll (or kqueue on macOS). The reactor at the core of every modern async runtime is a call to epoll_wait() in a loop, dispatching callbacks to whatever registered interest in each file descriptor. Async I/O isn't magic — it's a tight loop asking the kernel "what's ready now?"
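The whole pattern — register, wait, dispatch — fits in a dozen lines. A minimal Linux-only sketch (select.epoll wraps the raw syscalls), watching one pipe:

```python
import os, select

r, w = os.pipe()
ep = select.epoll()               # the epoll instance is itself an fd
ep.register(r, select.EPOLLIN)    # "tell me when r becomes readable"

os.write(w, b"wake up")           # make the read end ready

# The reactor core of every async runtime, in miniature:
events = ep.poll(timeout=1)       # blocks until something is ready
for fd, mask in events:
    if fd == r and mask & select.EPOLLIN:
        print(os.read(fd, 1024))  # b'wake up'

ep.close(); os.close(r); os.close(w)
```

Replace the single pipe with ten thousand sockets and the print with callback dispatch, and you have the skeleton of an event loop.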


The One-Sentence Version

Sockets, pipes, files, terminals, timers, signals — the kernel turns all of them into file descriptors because then every tool ever built for working with file descriptors works with all of them.

cat, read(), write(), poll(), sendfile(), splice(). Anything that knows how to work with a file descriptor can work with a socket. Anything that can work with a socket can work with a pipe. That uniformity isn't accidental — it's the design.

Ritchie and Thompson bet that a handful of abstractions, composed uniformly, would outlast any number of specialised interfaces. That was fifty years ago. And damn were those gents right on the money.

In the next post, we trace what happens when you actually use one of those file descriptors to read a file — through the kernel VFS, the page cache, an NVMe controller that is itself a complete computer, and DMA hardware that moves your data without the CPU touching a single byte.


Quick Recap

  • A file descriptor is an index into a per-process kernel table. The table holds references to file descriptions — open objects with position, flags, and a pointer to the underlying resource.
  • The underlying resource can be anything: file, socket, pipe, PTY, epoll instance, timer.
  • dup2(oldfd, newfd) is how shell redirection works — repoint stdout before exec(), and the child never knows.
  • FD_CLOEXEC closes file descriptors automatically when the process execs, preventing accidental inheritance.
  • /proc/self/fd shows you every open file descriptor as a symlink to whatever it actually points at.
  • A deleted file stays alive until all file descriptors pointing to it are closed.
  • epoll watches many file descriptors at once without copying the list — it's the foundation of every async runtime.

Further Reading

  • man 2 open, man 2 dup2, man 2 fcntl — The system calls, with every flag documented.
  • man 7 epoll — How epoll works, with a complete example at the bottom.
  • man 5 proc — The /proc filesystem. Enormous but searchable. The fd and fdinfo sections are relevant here.
  • The Linux Programming Interface, chapters 5 and 63 — Michael Kerrisk's treatment of file I/O and I/O multiplexing. The definitive reference for this stuff.

I'm writing a book about what makes developers irreplaceable in the age of AI. Join the early access list →


Naz Quadri has mass-produced more leaked file descriptors than he'd like to admit, and once got lost wandering around /proc/<pid>/fd, luckily he found his way home. He blogs at nazquadri.dev. Rabbit holes all the way down 🐇🕳️.
