fork() and exec(): The Weird and Elegant Idea Behind Unix Process Creation

#linux #systems #operatingsys #c

At first glance, Unix’s fork() + exec() model feels… wrong.

Why would an operating system copy an entire process, only to immediately replace it with something else?

It seems wasteful, indirect, and unnecessarily complicated.

And yet, this design has survived for over 50 years, and still powers modern systems today.

In this post, we’ll explore what fork() and exec() actually do, why this design exists, and why it remains so hard to replace.

fork()

fork() is a system call in POSIX operating systems. It creates an almost exact replica of the process that calls it. The calling process is known as the parent, and the new one is the child.

fork() takes no arguments. On success, it returns the PID of the child to the parent and 0 to the child; on failure, it returns -1 to the parent and no child is created. This return value is how a process tells whether it is the parent or the child.
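A minimal sketch of branching on that return value. The helper name fork_demo and the exit code 7 are made up for illustration; they are not part of any API.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once, lets each side report who it is,
 * and returns the child's exit code to the caller (or -1 on error). */
int fork_demo(void) {
    fflush(stdout); /* avoid duplicating buffered output in the child */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork"); /* no child was created */
        return -1;
    }

    if (pid == 0) {
        /* Child: fork() returned 0. */
        printf("child:  my pid is %d\n", (int)getpid());
        fflush(stdout);
        _exit(7); /* arbitrary code so the parent has something to read */
    }

    /* Parent: fork() returned the child's PID. */
    printf("parent: my child is %d\n", (int)pid);

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_demo() from main() should print one line from each process; the order of the two lines is not guaranteed.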

In early implementations, fork() physically copied the parent's entire address space so that the child had its own memory with identical contents. This was expensive, especially for processes with large memory footprints.

To optimize this, modern implementations of fork() use copy-on-write (COW). The parent and child have separate virtual address spaces, but initially they point to the same physical memory. Instead of copying all memory eagerly, the kernel duplicates pages only when one of the processes modifies them. This significantly reduces memory overhead.
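COW itself is invisible from user space, but its visible contract is easy to check: a write in the child never shows up in the parent. A small sketch, with the helper name cow_demo invented for this post:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child overwrites a variable, and the parent
 * returns its own copy afterwards (or -1 on error). */
int cow_demo(void) {
    int value = 1;

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        /* On a COW kernel, this first write is what makes the kernel
         * give the child its own private copy of the page. */
        value = 42;
        _exit(value == 42 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }

    /* The parent's copy was never touched by the child's write. */
    return value;
}
```

The same result would hold with an eager copy; COW only changes when the copying happens, not what each process observes.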

exec()

The exec() family of functions replaces the current process image with a new one. It does not create a new process; it transforms the existing one.

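In essence, it wipes the current address space and loads a new program into it. The process keeps its PID and, by default, its open file descriptors. On success, exec() never returns; it returns (with -1) only if the new program could not be started.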
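A sketch of the "never returns on success" rule. To keep the calling program alive, it runs exec in a forked child; the helper name exec_demo is hypothetical, and the exit code 127 simply borrows the shell's convention for "command could not be run".

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `prog` with no arguments in a child process
 * and returns its exit code, or 127 if exec failed, or -1 on other errors. */
int exec_demo(const char *prog) {
    fflush(stdout);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        char *const argv[] = {(char *)prog, NULL};
        execvp(prog, argv);

        /* Reached only if execvp() failed: on success, this code
         * no longer exists in the process. */
        perror("execvp");
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Assuming a standard true utility is on the PATH, exec_demo("true") reports that program's exit code, while a path that does not exist falls through to the 127 branch.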

fork() + exec() pattern

So what is the fork() + exec() pattern?

The parent calls fork() to create a child process
The child calls exec() to replace itself with a new program

This is where the design initially feels counterintuitive. We just duplicated the address space (or at least created new page tables in modern implementations) only to discard it immediately. Why not just create a new process directly?

This is where the genius of the design becomes clear.

The key idea behind the fork() + exec() pattern is the separation of process creation from program execution. By separating these two concerns, we gain a powerful capability: a configurable “gap” between creation and execution.

Why this matters

This “gap” allows you to modify the execution environment before calling exec(). For example:

file descriptors (via dup2)
working directory
user/group IDs
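signal dispositions and the signal mask (custom handlers are reset to the default on exec, but ignored signals stay ignored)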

Here is what that looks like in C:

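#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        // We are inside the child process.
        // This is the "gap" where we can redirect I/O, change directories, etc.
        char *args[] = {"ls", NULL};
        execvp("ls", args);

        // If exec succeeds, the code below here NEVER runs!
        perror("execvp");
        _exit(127);
    } else if (pid > 0) {
        // We are the parent. We wait for the child to finish executing.
        wait(NULL);
    } else {
        perror("fork"); // fork() failed; no child was created.
        return 1;
    }
    return 0;
}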

If you've ever typed ls > output.txt in your terminal, you've used this gap. The shell calls fork(); the child opens output.txt, uses dup2() to make that file its standard output, and then calls exec() to run ls. The ls program has no idea its output is being redirected to a file; it just writes to standard output as usual. The shell sets the stage before the actor even arrives.
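Roughly what a shell might do for that redirection, as a sketch. The helper name run_redirected and the exit codes 126/127 are choices made for this example (loosely following shell conventions), not what any particular shell does internally.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with its stdout redirected to `path`,
 * like `cmd > path`. Returns the command's exit code, or -1 on error. */
int run_redirected(char *const argv[], const char *path) {
    fflush(stdout);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        /* The "gap": rearrange file descriptors before the program starts. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            _exit(126);
        }
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0) {
                _exit(126);
            }
            close(fd); /* stdout now refers to the file; the extra fd is not needed */
        }

        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing {"ls", NULL} and "output.txt" would mimic ls > output.txt. The program being run needs no changes, because the redirection is finished before it starts.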

Another important benefit is inheritance as a composition mechanism. Instead of passing large configuration objects (looking at you, Windows 👀), processes inherit resources, making composition natural.

A key example is pipes. A pipe is a unidirectional data channel used for inter-process communication (IPC). It is extensively used in shells (via the | operator) to pass the output of one process as input to another.

Calling pipe() returns two file descriptors: one for the read end of the pipe and one for the write end. Because file descriptors are inherited across fork(), each process simply closes the ends it doesn’t use and, voilà, IPC just works.
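A sketch of how a two-command pipeline such as left | right could be wired up with these pieces. The helper name run_pipeline is invented here, and the code assumes descriptors 0 and 1 are already open in the caller, as they normally are.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `left | right` and returns the exit code of
 * the right-hand command, or -1 on error. */
int run_pipeline(char *const left[], char *const right[]) {
    int fds[2]; /* fds[0] is the read end, fds[1] is the write end */
    if (pipe(fds) < 0) {
        return -1;
    }
    fflush(stdout);

    pid_t lpid = fork();
    if (lpid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (lpid == 0) {
        dup2(fds[1], STDOUT_FILENO); /* stdout now feeds the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid < 0) {
        close(fds[0]);
        close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        dup2(fds[0], STDIN_FILENO); /* stdin now drains the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends, or the reader never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    int status;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Neither command knows it is part of a pipeline. Each one just reads standard input and writes standard output, and the inherited descriptors do the rest.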

This design reflects the Unix philosophy:

small, composable primitives instead of one large, complex API

Instead of one complex interface, Unix provides two simple APIs:

one to copy a process
one to replace a process

This simplicity creates a flexible and easy-to-reason-about system.

Trade-offs

While the fork() + exec() pattern is an inspired design that still forms the backbone of modern systems, it does have trade-offs.

The most obvious one is the overhead of fork(). Even with COW, the kernel must still set up page tables for the child process, a cost that grows with the size of the parent's address space. This can be expensive, especially for large processes like servers or language runtimes. Afterwards, the first write to each shared page triggers a page fault and a copy.

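A more subtle but significant drawback is its interaction with multi-threaded programs. When fork() is called, only the calling thread exists in the child process, but the memory is copied as-is, including any locks that other threads were holding at that moment. Those locks stay locked in the child with no thread left to release them, which can easily lead to deadlocks (for example, inside malloc()).

For this reason, in multi-threaded programs, fork() is almost always followed immediately by exec(), with only async-signal-safe calls such as dup2() or close() in between. Anything more complex risks running on inconsistent state.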

Additionally, this model can be inefficient for workloads that create processes at high frequency. Such workloads often avoid process creation altogether by using threads, async/event loops, or worker pools. When a new process is still needed, alternatives like posix_spawn() combine creation and execution in a single call.
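For comparison, a minimal sketch of launching a program with posix_spawnp(), the PATH-searching variant of posix_spawn(). The helper name spawn_and_wait is hypothetical, and both optional arguments (file actions and attributes) are left as NULL here.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ; /* the current environment, passed on to the child */

/* Hypothetical helper: spawns argv[0] and returns its exit code,
 * or -1 if the spawn or the wait failed. */
int spawn_and_wait(char *const argv[]) {
    pid_t pid;

    /* posix_spawnp() returns 0 on success and an error number on
     * failure; it does not use the -1/errno convention. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off is visible in the signature: there is no "gap" to put arbitrary code in. Anything the child should change before running, such as redirected descriptors, has to be described up front through the file actions and attributes arguments.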

Final thought

Even after 50+ years, the fork + exec model remains dominant. Not because it’s perfect, but because it offers a level of flexibility that is still hard to match.

If you enjoyed this, I’m currently diving deep into systems programming—building a Unix shell from scratch, implementing a custom malloc allocator, and more.

I’ll be sharing what I learn along the way, so feel free to follow along.

You can also check out the code for the shell project here:

👉 https://github.com/IsbatBInHossain/ishell