OS Fundamentals 101: Process and Syscalls

#linux #process #os #syscall

Hola! Let's dive into the world of operating systems. Today, we will discuss processes and syscalls.

Process

First, let's quote the bookish answer.

A process is a program in execution.

When we want to execute a program, the OS creates an entity called a process. A process is a kind of container that holds everything the computer needs to run that program. Each process gets its own address space, a range of virtual memory addresses from 0 up to some maximum (set by the architecture and the OS, not by how much RAM is installed), which the process can read from and write to. The address space contains the program's executable code (machine code), the program's data, and its stack. A process also has other state and resources associated with it, such as its saved register values (program counter, stack pointer, etc.), open file descriptors, a list of related processes, and so on. The OS stores all of this information in the process table.
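To make this a bit more concrete, here is a minimal sketch that asks the OS for a few of the things it tracks about the current process. The function name `describe_current_process` is made up for this example; the `os` calls are from Python's standard library.

```python
import os


def describe_current_process():
    """Return a few attributes the OS tracks for the calling process."""
    return {
        "pid": os.getpid(),          # this process's identifier
        "parent_pid": os.getppid(),  # the process that created it
        "working_dir": os.getcwd(),  # per-process current directory
    }


if __name__ == "__main__":
    for key, value in describe_current_process().items():
        print(f"{key}: {value}")
```

The values will be different on every run, since the OS hands out a fresh PID each time the script starts.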

Address Space

The address space is a mapping from virtual addresses (0 up to some maximum) to actual physical locations in RAM. When the OS creates a process, it sets up this address space. In the simplified layout used here, the lowest part holds the entry points of the OS procedures the program needs to talk to the Kernel (read-only). Next comes the executable code (read-only), followed by the data segment, which holds things like globals and is both readable and writable by the process. The remaining space is shared between the heap and the stack. The heap starts right after the data segment, whereas the stack starts at the top end of the address space. So the heap grows upwards and the stack grows downwards, with free space in between. Real systems differ in the details, and the exact layout depends on the OS and the architecture.

This address space is also referred to as the core image.
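On Linux you can peek at a real address space through `/proc/self/maps`, which lists the mapped regions of the current process. The sketch below is Linux-only and assumes that file is available; `list_regions` is a hypothetical helper name, not a standard API.

```python
def list_regions(maps_path="/proc/self/maps"):
    """Parse the Linux maps file into (start, end, perms, name) tuples."""
    regions = []
    with open(maps_path) as maps:
        for line in maps:
            # Line format: address perms offset dev inode [pathname]
            fields = line.split(maxsplit=5)
            start, end = (int(part, 16) for part in fields[0].split("-"))
            name = fields[5].strip() if len(fields) > 5 else ""
            regions.append((start, end, fields[1], name))
    return regions


if __name__ == "__main__":
    for start, end, perms, name in list_regions():
        print(f"{start:#018x}-{end:#018x} {perms} {name}")
```

In the output, look for the permission column: code regions are typically readable and executable but not writable, while regions such as `[heap]` and `[stack]` are readable and writable.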

System Calls

System calls, or syscalls, are the interface the OS provides so that user-space code can talk to the Kernel and ask it to perform specific tasks.

Let's take the example of reading a file. Say a process running in user mode wants to read a file from the hard disk. Disk access is a system service that takes place in kernel mode, so the process issues a read system call. How? The process first sets up the parameters for the call and then calls the read library procedure. That procedure puts the code (syscall number) for read in a register and executes a trap instruction, which switches the CPU to kernel mode and transfers control to the Kernel. The Kernel reads the code, figures out which syscall was requested, and dispatches it to the matching syscall handler. Once the handler finishes, control returns to the user-space process.

I'm still learning to draw these kinds of diagrams, so please bear with this one 😜.
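The read example can be tried from Python, whose `os.open`, `os.read`, and `os.close` functions are thin wrappers around the corresponding syscalls. This is only a sketch: `read_file_raw` is a name invented for this post, and the trap into the Kernel happens inside the library, out of sight.

```python
import os
import tempfile


def read_file_raw(path, chunk_size=4096):
    """Read a whole file using the low-level open/read/close calls."""
    fd = os.open(path, os.O_RDONLY)  # the Kernel returns a file descriptor
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)  # one read syscall per loop
            if not chunk:  # empty bytes means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)  # always give the descriptor back


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from user space\n")
    print(read_file_raw(tmp.name))
    os.unlink(tmp.name)
```

Running the script under `strace` on Linux should show the underlying syscalls as they happen.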

Parent/Child Process

Let's take an example where we open a shell and type "htop" and press enter.

Initially, the only process we care about in the process table is sh (the shell).

After we type "htop" and press enter, the shell process uses the fork() syscall to create a copy of itself. Fork also copies the registers, fds (file descriptors), address space, etc. This creates a new shell process in the process table, which is a child of the shell that called fork() (the parent).

Fork creates a new process that starts out as a copy of the parent's resources. Note that it is the contents of the address space that are copied, not the underlying memory locations, i.e., changing a value in the child process won't affect the parent process. Most modern systems don't copy everything up front and instead implement COW (Copy-On-Write), where a memory page is only copied when one of the processes writes to it, but that's a different topic altogether.
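Here is a small sketch of that isolation, for Unix-like systems only. The child changes a variable and reports its value back over a pipe, while the parent's copy stays untouched. `fork_isolation_demo` is a hypothetical name used just for this example.

```python
import os


def fork_isolation_demo():
    """Return (parent_value, child_value) after the child modifies its copy."""
    counter = 10
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of `counter`.
        try:
            os.close(read_end)
            counter += 5
            os.write(write_end, str(counter).encode())
        finally:
            os._exit(0)  # leave without running the parent's cleanup code
    # Parent: its `counter` is unaffected by the child's change.
    os.close(write_end)
    child_value = int(os.read(read_end, 32).decode())
    os.close(read_end)
    os.waitpid(pid, 0)  # reap the child
    return counter, child_value


if __name__ == "__main__":
    print(fork_isolation_demo())  # prints (10, 15)
```

The pipe is needed precisely because the two processes no longer share memory: the child has to send its value through the Kernel for the parent to see it.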

Now, this forked shell process searches the PATH for the command and, if found, calls the exec() syscall with the path of the executable and the user's arguments. Exec loads the executable code of "htop" into memory, replacing the shell's code, data, heap, and stack. In a nutshell, it replaces the shell program with htop inside the existing process: the contents of the address space are rebuilt for htop, but the process itself stays the same, so it keeps its PID (Process Identifier), its parent, and its open file descriptors (unless they are marked close-on-exec).

Now, htop does its thing. While it runs, the parent shell sleeps in a wait call such as waitpid(). After htop exits, its resources are released, the parent shell wakes up and collects the exit status, and the entry is removed from the process table.
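The fork, exec, and wait sequence described above can be sketched as a tiny "shell" step in Python (Unix-like systems only). `run_command` is a made-up helper, and the 127 exit code for "command not found" follows the usual shell convention, which is an assumption of this sketch.

```python
import os
import sys


def run_command(argv):
    """Run argv the way a shell would and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)  # searches PATH; never returns on success
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: sleep until the child exits, then collect its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


if __name__ == "__main__":
    code = run_command([sys.executable, "-c", "print('hello from the child')"])
    print("exit code:", code)
```

Swap the argument list for `["htop"]` to mirror the example in the text, assuming htop is installed.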

Zombie/Orphan Process

A zombie process is a process that has completed its execution but is still present in the process table. This happens when the parent doesn't "reap" it, that is, the parent hasn't called wait() or waitpid() to read the child's exit status. When the parent does make that call, the OS hands over the exit status and removes the child's entry from the process table.

When a child process dies, the parent receives a SIGCHLD signal.

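The kill command doesn't work on a zombie process, because the process is already dead and there is nothing left to act on the signal. The only way to get rid of a zombie is for its parent to reap it, or for the parent to exit so that init adopts and reaps it.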
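A zombie can be produced on purpose to watch this behaviour. The sketch below assumes Linux, because it relies on `os.waitid` with `os.WNOWAIT` to wait for the child to exit without reaping it (that call is not available on every platform). `zombie_demo` is a hypothetical name.

```python
import os
import signal


def zombie_demo():
    """Return (listed_after_kill, exit_code, gone_after_reap)."""
    pid = os.fork()
    if pid == 0:
        os._exit(42)  # child exits right away
    # Wait until the child has exited, but leave it unreaped: a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    os.kill(pid, signal.SIGKILL)  # accepted by the OS, but has no effect
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists
        listed_after_kill = True
    except ProcessLookupError:
        listed_after_kill = False
    # Reaping reads the exit status and removes the process table entry.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status)
    try:
        os.kill(pid, 0)
        gone_after_reap = False
    except ProcessLookupError:
        gone_after_reap = True
    return listed_after_kill, exit_code, gone_after_reap


if __name__ == "__main__":
    print(zombie_demo())
```

Note that the exit code is still the one the child chose, even after SIGKILL was sent: the zombie had already finished, so the signal changed nothing.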

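An orphan process is a process that is still running even though its parent process has terminated. This can happen if the parent exits (or crashes) without wait()ing for the child it fork()ed. An orphan is adopted by the init process (PID 1), or by a designated subreaper on Linux, which reaps it when it eventually exits.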

A zombie process has completed execution but its parent is still running, whereas in the case of an orphan process, the parent has already exited but the child is still running.

A zombie process doesn't hold any resources except for its entry in the process table, whereas an orphan process is still running and holds on to its resources.
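To see an orphan being adopted, the sketch below (Unix-like systems only) forks a middle process that forks a grandchild and then exits immediately. The grandchild waits until its parent PID changes and reports the new one over a pipe. The names `orphan_demo` and `_report_new_parent` are invented for this example, and which process ends up adopting the orphan depends on the system.

```python
import os
import time


def _report_new_parent(old_parent_pid, write_end, timeout):
    """Grandchild: wait to be orphaned, report the new parent PID, exit."""
    try:
        deadline = time.monotonic() + timeout
        while os.getppid() == old_parent_pid and time.monotonic() < deadline:
            time.sleep(0.01)
        os.write(write_end, str(os.getppid()).encode())
    finally:
        os._exit(0)


def orphan_demo(timeout=5.0):
    """Return (middle_pid, parent_pid_seen_by_orphan)."""
    read_end, write_end = os.pipe()
    middle_pid = os.fork()
    if middle_pid == 0:
        # Middle process: create the grandchild, then exit without waiting.
        try:
            my_pid = os.getpid()
            if os.fork() == 0:
                _report_new_parent(my_pid, write_end, timeout)
        finally:
            os._exit(0)
    os.close(write_end)
    os.waitpid(middle_pid, 0)  # reap the middle process
    new_parent = int(os.read(read_end, 32).decode())
    os.close(read_end)
    return middle_pid, new_parent


if __name__ == "__main__":
    middle, adopter = orphan_demo()
    print(f"orphan's parent changed from {middle} to {adopter}")
```

On many systems the adopter is PID 1, but desktop sessions and containers often set up a different reaper, so the exact number varies.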

Let's end here for today. In the next article let's discuss storage and filesystems.
