DEV Community

Satoru Takeuchi
Satoru Takeuchi

Posted on • Updated on

How Linux Works: Chapter 3 Process Scheduler (Part 1)

In Chapter 2, we mentioned that most processes in the system are in the sleep state. So, how does the kernel make each process run on the CPU when there are multiple executable processes in the system? In this chapter, we will discuss the Linux kernel's process scheduler (hereinafter referred to as the scheduler), which is responsible for allocating CPU resources to processes.

In books about computer science, the scheduler is explained as follows:

  • Only one process can run at a time on a single logical CPU
  • It allows multiple executable processes to use the CPU in turn, in units called "time slices"

For example, if there are three processes, p0, p1, and p2, it would look like the following figure.

Image description

Let's confirm whether this is actually the case in Linux by conducting an experiment.

Elapsed Time and CPU Time

To understand the content of this chapter, it is essential to understand the concepts of elapsed time and CPU time related to processes. This section explains these times. Their definitions are as follows:

  • Elapsed Time: The time that passes from the start to the end of a process. It is like the time measured with a stopwatch from the start to the end of a process.
  • CPU Time: The time that a process actually uses a logical CPU.

These terms probably won't make much sense just by looking at the explanation, so let's understand them through an experiment. If you run a process using the time command, you can get the elapsed time and CPU time from the start to the end of the target process. For example, let's run the following "load.py" which terminates after consuming some CPU resources.

#!/usr/bin/python3

# This parameter is to change the amount of CPU time used by this program. It is convenient to change this parameter to make the CPU time consumption several seconds.
NLOOP=100000000

for _ in range(NLOOP):
    pass
Enter fullscreen mode Exit fullscreen mode

The result is as follows.

$ time ./load

real    0m2.357s
user    0m2.357s
sys     0m0.000s
Enter fullscreen mode Exit fullscreen mode

The output includes three lines starting with real, user, and sys. Among these, real indicates the elapsed time, and user and sys indicate the CPU time. user refers to the time when the process was running. In contrast, sys refers to the time when the kernel was operating as a result of system calls issued by the process. The load program continues to use the CPU from the start to the end of its execution and does not issue any system calls during that time, so real and user are almost the same, and sys is almost zero. The reason why it is "almost" is because the Python interpreter calls a few system calls at the beginning and end of the process.

Let's also run an experiment with the sleep command, which sleeps most of the time.

$ time sleep 3

real    0m3.009s
user    0m0.002s
sys     0m0.000s
Enter fullscreen mode Exit fullscreen mode

Since it terminates after sleeping for 3 seconds, "real" is nearly 3 seconds. On the other hand, this command gives up the small mount of CPU time and goes to sleep right after starting, and when it begins to use only a little CPU time again 3 seconds later before terminating. So "user" and "sys" are almost 0. The differences of two values are shown in the following figure.

Image description

When Using Only One Logical CPU

To simplify the discussion, let's first consider the case of a single logical CPU. The "multiload.py" program will be used for the experiment. This program performs the following actions:

Usage: ./multiload [-m] <number of processes>
  Executes a specified number of load processing processes for a set period of time and waits for all to finish.
  The time it took to execute each process is outputted.
  By default, all processes run only on a single logical CPU.

  Meaning of the option:
  -m: Allows each process to run on multiple CPUs.
Enter fullscreen mode Exit fullscreen mode

Here is its source code.

#!/bin/bash

MULTICPU=0
PROGNAME=$0
SCRIPT_DIR=$(cd $(dirname $0) && pwd)

usage() {
    exec >&2
    echo Usage: ./multiload [-m] <number of processes>
    Executes a specified number of load processing processes for a set period of time and waits for all to finish.
    The time it took to execute each process is outputted.
    By default, all processes run only on a single logical CPU.

      Meaning of the option:
      -m: Allows each process to run on multiple CPUs."
    exit 1
}

while getopts "m" OPT ; do
    case $OPT in
        m)
            MULTICPU=1
            ;;
        \?)
            usage
            ;;
    esac
done

shift $((OPTIND - 1))

if [ $# -lt 1 ] ; then
    usage
fi

CONCURRENCY=$1

if [ $MULTICPU -eq 0 ] ; then
    # Pin the "load.py" program to CPU0
    taskset -p -c 0 $$ >/dev/null
fi

for ((i=0;i<CONCURRENCY;i++)) do
    time "${SCRIPT_DIR}/load.py" &
done

for ((i=0;i<CONCURRENCY;i++)) do
    wait
done
Enter fullscreen mode Exit fullscreen mode

Let's first set "" to 1 and run it. This is almost the same as running the load program alone.

$ ./multiload 1

real    0m2.359s
user    0m2.358s
sys     0m0.000s
Enter fullscreen mode Exit fullscreen mode

In my environment, the elapsed time was 2.359 seconds. What about when the parallelism is 2 and 3?

$ ./multiload 2

real    0m4.730s
user    0m2.360s
sys     0m0.004s

real    0m4.739s
user    0m2.374s
sys     0m0.000s
$ ./multiload 3

real    0m7.095s
user    0m2.360s
sys     0m0.004s

real    0m7.374s
user    0m2.499s
sys     0m0.000s

real    0m7.541s
user    0m2.676s
sys     0m0.000s
Enter fullscreen mode Exit fullscreen mode

As the level of parallelism increased by 2 times, 3 times, the CPU time did not change much, but the elapsed time increased by about 2 times, 3 times. This is because, as mentioned at the beginning, only one process can run at the same time on one logical CPU, and the scheduler gives CPU resources to each process in turn.

When Using Multiple Logical CPUs

Next, let's also take a look at the case of multiple logical CPUs. When the multiload program is run with the "-m" option, the scheduler tries to distribute the multiple load processes evenly across all logical CPUs. As a result, for instance, if there are two logical CPUs and two load processes, as shown in the following figure, the two load processes can each monopolize the resources of a logical CPU.

Image description

The logic of load balancing is very complex, so this book avoids detailed explanation.

Let's actually verify this. The results of running the multiload program with the -m option and parallelism from 1 to 3 are shown below.

$ ./multiload -m 1

real    0m2.361s
user    0m2.361s
sys     0m0.000s
$ ./multiload -m 2

real    0m2.482s
user    0m2.482s
sys     0m0.000s

real    0m2.870s
user    0m2.870s
sys     0m0.000s
$ ./multiload -m 3

real    0m2.694s
user    0m2.693s
sys     0m0.000s

real    0m2.857s
user    0m2.853s
sys     0m0.004s

real    0m2.936s
user    0m2.935s
sys     0m0.000s
Enter fullscreen mode Exit fullscreen mode

For all processes, the values of real and user+sys were almost the same. In other words, we can see that each process was able to monopolize the resources of a logical CPU.

For all processes, the values of real and user+sys were almost the same. In other words, we can see that each process was able to monopolize the resources of a logical CPU.

Cases where "user + sys" is larger than "real"

Intuitively, you might think that "real >= user + sys" will always hold, but in reality, there can be cases where the value of "user + sys" is slightly larger than that of "real". This is due to the different methods used to measure each time and the fact that the precision of the measurement is not that high. There's no need to worry too much about this; it's enough to be aware that such things can happen.

Furthermore, there are cases where "user + sys" can be significantly larger than "real". For example, this occurs when you run the "multiload.py" program with the "-m" option and set the number of processes to 2 or more. Now, let's try running "./multiload -m 2" through the time command.

$ time ./multiload -m 2

real    0m2.510s
user    0m2.502s
sys     0m0.008s

real    0m2.725s
user    0m2.716s
sys     0m0.008s

real    0m2.728s
user    0m5.222s
sys     0m0.016s
Enter fullscreen mode Exit fullscreen mode

The first and second entries are data about the load processing processes of the "multiload.py" program. The third entry is data about the "multiload.py" program itself. As you can see, the value of user is about twice that of "real". In fact, the "user" and "sys" values obtained by the "time" command are the sums of the values for the target process and its terminated child processes. Therefore, if a process generates child processes and they each run on a different logical CPU, the value of "user+sys" could be greater than real. The "multiload.py" program fits exactly this condition.

previous part
next part

NOTE

This article is based on my book written in Japanese. Please contact me via satoru.takeuchi@gmail.com if you're interested in publishing this book's English version.

Top comments (0)