Ernesto Enriquez

Posted on Mar 18

Lifecycle of a process

#linux #programming #assembly #c

With Linux version 7 just around the corner, I thought it would be interesting to trace process creation in as much painstaking detail as one weekend and 6 cups of coffee would allow.

Namely, I want to answer the question of what exactly happens when we open a new browser tab or start our favorite video game. I’ll be focusing on processes spawned by other processes¹ through libc, since the C standard library is about as close as we're going to get to the bedrock of the very universe. If you use CPython, this mechanism applies as well.

We'll be taking the following journey together:

I like to think of this as expanding on Thorsten Ball’s “Where did fork go?”. In his post, Thorsten gives insight into what happens when you fork a process. It’s a great read, you should check it out.

The mechanism I’ll be describing also covers processes started by a programmer. Indeed, this is what most people see when they use a computer to create some application. However, under the hood there is always a process required to start another process, such as a shell (e.g., ./<process> on bash) or an init system (e.g., creating a service with systemd).

Simple.c

Consider a C program whose only purpose in life is to spawn a child process (id est, it passes butter).
Why so simple? I figured this program would impose the least cognitive load. I do this for your understanding. I don’t think you’re stupid. I think you’re smart, and beautiful, and precious, and worth letting merge into my lane during rush hour.

// Simple.c
#include <unistd.h>

int main(void) {
  fork();    // from here on there are two processes: parent and child
  return 0;  // both of them return from main
}

Before continuing, I recommend reading the fork man page, at least up to the errors section. It’s quite a short read, and if I recall correctly, there was a Harvard health study that claimed that programmers who read Linux man pages are 78% less likely to develop carpal tunnel [//TODO: citation needed]

The tldr is that fork() is a function in libc² that spawns a child process from the parent process that called it. This child is almost identical to the parent.
Like the parent, the child will capture the return value from the fork() and execute any code after fork().

Unlike the parent, however, the child will not execute fork() or any code before it. Moreover, fork() returns 0 in the child and the child’s process ID in the parent, the child will have a different process ID than the parent, and a couple other things you’d know if you read the manual.

Fork is a system call (not to be confused with fork(), the libc function). System calls are services the Linux kernel provides to processes in user space. You can also think of system calls as an API provided by the Kernel. The API contract in this sense includes things like what values are expected in what registers and what assembly instruction (e.g., int 0x80, syscall, etc.) to execute. The Standard C Library (libc) also provides an API that simplifies working with system calls, in the form of functions that wrap the Kernel’s system call API.

You could in theory provision these system call services directly from the Kernel, bypassing libc completely. If you’re particularly keen on managing thread safety, thinking about register values, and programming in assembly then please, by all means, knock yourself out.

Now here’s the kicker: fork(), the libc function, does not call Fork, the system call. Rather, fork() eventually calls Clone, yet another system call.

Why is this?

Let’s dive into fork() and see what we find.

A journey of a thousand miles

// unistd.h
/* Clone the calling process, creating an exact copy.
   Return -1 for errors, 0 to the new process,
   and the process ID of the new process to the old process.  */
extern __pid_t fork (void) __THROWNL;

This is the prototype for the fork() function. If you’re on Linux (congrats!) this file is typically found at

/usr/include/unistd.h

The implementation of this peculiar little function takes up just over 140 lines of code at the time of this writing. You won't find said implementation in your file system. By the time you use it, it’s already a shared object file, ready to be linked to your lovely programs at runtime. This object file in question is libc.so, found in /usr/lib/.

Check out the fork.c file in the posix directory of glibc if you want to take a look at the implementation.

The first thing you might notice is that fork.c does not implement:

__pid_t fork (void)

Towards the bottom of the file you’ll find:

weak_alias (__libc_fork, fork)

Okay, so calling the fork() function in your code actually executes:

pid_t __libc_fork (void)

You’ll also notice inside the __libc_fork function that line 75 makes a call to _Fork().

pid_t pid = _Fork ();

So fork() in your code actually runs __libc_fork(), whose job is (among other things) to handle threading and eventually call _Fork().

Funnily enough, _Fork() actually exists in a few places within the standard library. At build time, the _Fork() you end up executing is decided by your machine’s OS and CPU³.

We’ll be focusing on the _Fork.c file within sysdeps/nptl. Here, we see a call is made to:

pid_t pid = arch_fork (&THREAD_SELF->tid);

arch_fork() is defined in its own header file, and here is where we start to see how the sausage is made.

/* Call the clone syscall with fork semantic.  The CTID address is used
   to store the child thread ID at its location, to erase it in child memory
   when the child exits, and do a wakeup on the futex at that address.

   The architecture with non-default kernel abi semantic should correctly
   override it with one of the supported calling convention (check generic
   kernel-features.h for the clone abi variants).  */
static inline pid_t
arch_fork (void *ctid)
{
  const int flags = CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;
  long int ret;
#ifdef __ASSUME_CLONE_BACKWARDS
# ifdef INLINE_CLONE_SYSCALL
  ret = INLINE_CLONE_SYSCALL (flags, 0, NULL, 0, ctid);
# else
  ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, 0, ctid);
# endif
#elif defined(__ASSUME_CLONE_BACKWARDS2)
  ret = INLINE_SYSCALL_CALL (clone, 0, flags, NULL, ctid, 0);
#elif defined(__ASSUME_CLONE_BACKWARDS3)
  ret = INLINE_SYSCALL_CALL (clone, flags, 0, 0, NULL, ctid, 0);
#elif defined(__ASSUME_CLONE_DEFAULT)
  ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, ctid, 0);
#else
# error "Undefined clone variant"
#endif
  return ret;
}

Remember what I was saying about fork() calling Clone? Well, behold!

Clone is used since it’s a more versatile system call. You can read more about their differences here.

At this point it seems what lies before us is a maze of macros. I encourage you to try making your way through, starting at the arch_fork function. See if you can find your way to the assembly. If we keep following this thread, starting at INLINE_SYSCALL_CALL, you’ll eventually arrive at the internal_syscall0, internal_syscall1, …, all the way up to internal_syscall6.

The clone system call takes 5 arguments, so we’re dealing with this guy:

#undef internal_syscall5
#define internal_syscall5(number, arg1, arg2, arg3, arg4, arg5) \
({                                  \
    unsigned long int resultvar;                    \
    TYPEFY (arg5, __arg5) = ARGIFY (arg5);              \
    TYPEFY (arg4, __arg4) = ARGIFY (arg4);              \
    TYPEFY (arg3, __arg3) = ARGIFY (arg3);              \
    TYPEFY (arg2, __arg2) = ARGIFY (arg2);              \
    TYPEFY (arg1, __arg1) = ARGIFY (arg1);              \
    register TYPEFY (arg5, _a5) asm ("r8") = __arg5;            \
    register TYPEFY (arg4, _a4) asm ("r10") = __arg4;           \
    register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;           \
    register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;           \
    register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;           \
    asm volatile (                          \
    "syscall\n\t"                           \
    : "=a" (resultvar)                          \
    : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4),     \
      "r" (_a5)                             \
    : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);            \
    (long int) resultvar;                       \
})

There you have it. We’re loading up our registers and executing the syscall instruction directly in assembly.

What a ride, huh?

I think that’s enough digging through our boy Stallman et al’s magnum opus. It’s a wonderful repository, to say the least, and I implore you to spend some time exploring it, time permitting. I recommend starting with the table of contents in the manual, picking what seems interesting, and then dig dig dig!

At this point, you might be asking: “Okay dude, but what does the assembly actually look like?”

Were you actually thinking that? What a nerd.

But fair enough, looking at this from a lower level of abstraction should help solidify what in Davy Jones’ locker is actually happening. Let’s look at some assembly.

A lower level of abstraction

gcc -S simple.c -o simple.S
cat simple.S

Outputs:

    .file    "simple.c"
    .text
    .globl    main
    .type    main, @function
main:
.LFB0:
    .cfi_startproc
    pushq    %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    call    fork@PLT        # look at me, look at me, im mr meeseeks look at me!!
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size    main, .-main
    .ident    "GCC: (GNU) 15.2.1 20260209"
    .section    .note.GNU-stack,"",@progbits

At runtime, when the call assembly instruction is executed, our CPU will start executing fork in libc. Where do we find fork? We can use the ldd command.

gcc simple.c -o simple
ldd simple

output:

    linux-vdso.so.1 (0x00007f7edfa0f000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f7edf7f2000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f7edfa11000)

Awesome! Next, let’s disassemble the libc.so.6 object file and find exactly where the system call is made. Remember the _Fork() function? That should give us a clue of where to look. I’ll spare you the trouble of finding the precise location of the system call. This is what worked for me, feel free to tweak it to your heart’s content:

objdump -d /usr/lib/libc.so.6 | grep -m 1 "<_Fork@@GLIBC_2.34>" -A 56 | nl

output:

     1    00000000000e4840 <_Fork@@GLIBC_2.34>:
     2       e4840:    f3 0f 1e fa              endbr64
     3       e4844:    55                       push   %rbp
     …
    20       e4881:    b8 38 00 00 00           mov    $0x38,%eax
    21       e4886:    0f 05                    syscall
     …

See line 21? Lovely, isn't it?

How do we know that's the system call we're looking for? You can tell what system call a program makes by looking at the value in the EAX register right before the syscall assembly instruction (or int 0x80 on 32-bit x86).

In our case, we're moving the value 0x38 into our EAX register. That is 56 in decimal, which is the number of the clone system call on x86-64.

When the syscall assembly instruction is executed, a couple of things happen very, very quickly. The important bits are:

Load the CS register from a model-specific register (IA32_STAR) to switch privilege level from user mode to kernel mode.
Save the address of the next instruction (the current value of RIP) in the RCX register so we can return from our syscall.
Load the RIP register from another model-specific register (IA32_LSTAR), which holds the kernel virtual address where we’ll resume execution.

These are just the highlights. If you want to learn more, I recommend taking a look at your CPU manufacturer’s Software Developer Manual. I also recommend a dark roast triple shot espresso. You’ll need it. For my ISA, the manual is provided by Intel.

That pretty much wraps things up! At this point you’ll find yourself executing code within the kernel itself. What code exactly depends on your architecture, but for 64-bit x86 systems you’ll want to take a look at the entry_64.S assembly file.

Here is that diagram again, should make a lot more sense now!

Kernel POV

For the curious, this is what things look like from a very, very high level on the kernel side:

From the entry point, you’re routed to the system call handler.
From the system call handler, you’re routed to the function for cloning your parent process: kernel_clone.
From kernel_clone, you’ll first clone your process and then ship it off to the scheduler.
The scheduler will run the child process eventually.

Thanks for reading, and keep digging!

¹ Another mechanism worth mentioning, but beyond our scope, is process creation by the OS at boot time (i.e., the idle process with PID 0 and the init system with PID 1). The mechanisms behind these are quite interesting, and if you’d like to learn more I recommend this excellent chapter from the Linux Insides online textbook.
² To be more precise, fork() is specified by POSIX rather than by the C standard, and glibc implements it alongside the standard C functions. Glibc is also known as GNU libc (or as I've recently taken to calling it, GNU plus libc).
³ Glibc has an algorithm for figuring out what code to compile for your system beforehand. It’s not too complicated.
