I'm starting a new journey. In this series, I'll talk about Linux programming!
In this post, I want to give you an overview of the key topics and terms in Linux. These are:
The Kernel
The Shell
Users and Groups
Directories, Links, and Files
File I/O Model
Programs
Processes
Memory Mappings
Static and Shared Libraries
IPC and Synchronization
Signals
Threads
Process Groups and Shell Job Control
Sessions and Controlling Terminals
Pseudoterminals
Date and Time
Client-Server Architecture
Realtime
The /proc File System
The Kernel
The term operating system is commonly used with two different meanings:
To denote the entire package consisting of the central software that manages a computer's resources together with the accompanying standard software tools (command interpreters, utilities, and so on).
More narrowly, to refer to the central software that manages and allocates computer resources (i.e., the CPU, RAM, and devices).
The term kernel is often used as a synonym for the second meaning. Among other things, the kernel performs the following tasks:
Process scheduling: Linux is a preemptive multitasking kernel. Here, multitasking means that multiple processes can reside in memory and each may receive use of the CPU(s); preemptive means that the rules governing which processes receive use of the CPU, and for how long, are determined by the kernel's process scheduler.
Memory management: Linux employs virtual memory management, a technique that confers two main advantages. First, processes are isolated from one another and from the kernel, so that one process can't read or modify the memory of another process or of the kernel. Second, only part of a process needs to be kept in memory, thereby lowering the memory requirements of each process and allowing more processes to be held in RAM simultaneously.
Provision of a file system: The kernel provides a file system on disk, allowing files to be created, retrieved, updated, deleted and so on.
Creation and termination of processes: The kernel can load a new program into memory and provide it with the resources it needs to run; when the process finishes execution, the kernel removes it and frees its resources for reuse.
Access to devices: The kernel provides programs with an interface that standardizes and simplifies access to devices.
Networking: The kernel transmits and receives network messages (packets) on behalf of user processes.
Provision of a system call application programming interface (API): Processes can request the kernel to perform various tasks using kernel entry points known as system calls.
Modern processors allow the CPU to operate in at least two different modes: user mode and kernel mode. Hardware instructions allow switching from one mode to the other. Certain operations can be performed only while the processor is operating in kernel mode. Examples include executing the halt instruction to stop the system, accessing the memory-management hardware, and initiating device I/O operations.
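To make the system call interface concrete, here is a minimal sketch (assuming Linux with glibc) that asks the kernel for the calling process's ID twice: once through the ordinary C library wrapper and once through the generic syscall() entry point. Both end up switching the CPU into kernel mode for the duration of the call.
```c
/* A minimal sketch: invoking a system call through its glibc wrapper
 * and directly via syscall(). */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    /* The usual way: the C library wrapper traps into kernel mode for us. */
    pid_t pid = getpid();

    /* The same request made explicitly through the generic syscall() entry point. */
    long raw = syscall(SYS_getpid);

    printf("getpid() = %ld, syscall(SYS_getpid) = %ld\n", (long)pid, raw);
    return 0;
}
```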
The Shell
A shell is a special-purpose program designed to read commands typed by a user and execute appropriate programs in response to those commands. Such a program is sometimes known as a command interpreter. Some popular shells:
Bourne shell (sh)
C shell (csh)
Korn shell (ksh)
Bourne again shell (bash)
Shells are also designed to interpret shell scripts, which are text files containing shell commands.
Users and Groups
Every user in a Linux environment has a unique login name and a corresponding numeric user ID. This information is defined in the password file (/etc/passwd).
For administrative purposes, such as controlling access to files and other system resources, it is useful to organize users into groups. Each group is defined by an entry in the group file (/etc/group).
One user, the superuser, has special privileges in the system. The superuser account has user ID 0 and normally has the login name root. The superuser can bypass all permission checks in the system. Typically, the system administrator uses this account.
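As a small illustration, the sketch below uses the standard getpwuid() and getgrgid() lookups to print the password-file and group-file records for the calling user.
```c
/* A small sketch: looking up the current user's /etc/passwd entry
 * and the name of its primary group. */
#include <stdio.h>
#include <unistd.h>
#include <pwd.h>
#include <grp.h>

int main(void)
{
    uid_t uid = getuid();                 /* numeric user ID of the caller */
    struct passwd *pw = getpwuid(uid);    /* corresponding /etc/passwd record */

    if (pw != NULL) {
        printf("login: %s  uid: %ld  gid: %ld  home: %s\n",
               pw->pw_name, (long)pw->pw_uid, (long)pw->pw_gid, pw->pw_dir);

        struct group *gr = getgrgid(pw->pw_gid);   /* /etc/group record */
        if (gr != NULL)
            printf("primary group: %s\n", gr->gr_name);
    }
    return 0;
}
```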
Directories, Links, and Files
The kernel maintains a single hierarchical directory structure to organize all files in the system. At the base of this hierarchy is the root directory, named / (slash).
Within the file system, each file is marked with a type, indicating what kind of file it is.
A directory is a special file whose contents take the form of a table of filenames coupled with references to the corresponding files. This filename-plus-reference association is called a link, and files may have multiple links, and thus multiple names. Each directory contains at least two entries: . (dot), referencing the directory itself, and .. (dot-dot), referencing the parent directory. This type of link is called a hard link.
Besides hard links, there are symbolic links: a symbolic link is a specially marked file containing the name of another file.
Each process has a current working directory. This is the process's "current location" within the single directory hierarchy. A process inherits its current working directory from its parent process.
Each file also has an associated user ID and group ID that define the owner of the file and the group to which it belongs. The ownership of a file is used to determine the access rights available to users of the file.
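The sketch below uses stat() to show some of this per-file information: the owning user ID and group ID, the hard link count, and the file type. The filename demo.txt is just a placeholder.
```c
/* A sketch: using stat() to inspect a file's type, ownership, and link count. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat sb;

    if (stat("demo.txt", &sb) == -1) {    /* illustrative filename */
        perror("stat");
        return 1;
    }

    printf("owner uid: %ld  group gid: %ld\n",
           (long)sb.st_uid, (long)sb.st_gid);
    printf("hard link count: %ld\n", (long)sb.st_nlink);

    if (S_ISREG(sb.st_mode))
        printf("demo.txt is a regular file\n");
    else if (S_ISDIR(sb.st_mode))
        printf("demo.txt is a directory\n");

    return 0;
}
```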
File I/O Model
UNIX systems provide a concept called universality of I/O. This means that the same I/O system calls (open(), read(), write(), close(), and so on) can be applied to all types of files.
The kernel essentially provides one file type: a sequential stream of bytes.
The I/O system calls refer to open files using a file descriptor, a small non-negative integer. A file descriptor is typically obtained by a call to open(). Normally, a process inherits three open file descriptors when it is started by the shell: standard input (0), standard output (1), and standard error (2).
To perform file I/O, C programs typically employ I/O functions from the standard C library's stdio package, such as fopen(), fclose(), and so on. These are layered on top of the I/O system calls.
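Here is a minimal sketch of the descriptor-based model: open() returns a file descriptor, read() and write() move bytes through it, and close() releases it. The filename notes.txt is only an example.
```c
/* A sketch of the descriptor-based I/O model: open a file and copy
 * its bytes to standard output (descriptor 1). */
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fd = open("notes.txt", O_RDONLY);   /* obtain a file descriptor */
    if (fd == -1) {
        perror("open");
        return 1;
    }

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)   /* read a chunk */
        write(STDOUT_FILENO, buf, n);              /* write it to stdout */

    close(fd);
    return 0;
}
```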
Programs
Programs normally exist in two forms. The first form is source code: human-readable text written in a programming language such as C or C++. To be executed, source code must be converted into the second form: binary machine-language instructions that the computer can understand.
Processes
Put most simply, a process is an instance of an executing program. When a program is executed, the kernel loads its code into virtual memory, allocates space for program variables, and sets up kernel bookkeeping data structures to record various information such as the process ID, open file descriptors, and so on.
A process is logically divided into the following parts, known as segments:
Text: the instructions of the program
Data: the static variables used by the program
Heap: an area from which programs can dynamically allocate extra memory
Stack: a piece of memory that grows and shrinks as functions are called and return, and that is used to allocate storage for local variables and function call linkage information
A process can create a new process using the fork() system call. The process that calls fork() is referred to as the parent process, and the new process is referred to as the child process.
The child process goes on either to execute a different set of functions in the same code as the parent, or, frequently, to use the execve() system call to load and execute an entirely new program.
Each process has a unique integer process identifier (PID) and a parent process identifier (PPID); these attributes are commonly used by other system calls such as kill().
A process can terminate in one of two ways: by requesting its own termination using the exit() system call, or by being killed by the delivery of a signal. In either case, the process yields a termination status, a small non-negative integer value that is available for inspection by the parent process using the wait() system call.
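The following sketch ties these pieces together: the parent calls fork(), the child replaces itself with a new program via exec (here /bin/date, chosen just as an example), and the parent collects the child's termination status with waitpid().
```c
/* A sketch of the fork()/exec()/wait() lifecycle. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                 /* create a child process */

    if (pid == -1) {
        perror("fork");
        exit(1);
    }

    if (pid == 0) {                     /* child: replace itself with a new program */
        execl("/bin/date", "date", (char *)NULL);
        perror("execl");                /* only reached if exec fails */
        _exit(127);
    }

    int status;
    waitpid(pid, &status, 0);           /* parent: collect the termination status */
    if (WIFEXITED(status))
        printf("child %ld exited with status %d\n",
               (long)pid, WEXITSTATUS(status));
    return 0;
}
```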
Linux divides the privileges traditionally accorded to the superuser into a set of distinct units called capabilities. Each privileged operation is associated with a particular capability, and a process can perform an operation only if it has the corresponding capability. A traditional superuser process corresponds to a process with all capabilities enabled.
When booting the system, the kernel creates a special process called init, the "parent of all processes", which is derived from the program file /sbin/init. All processes on the system are created either by init or by one of its descendants.
A daemon is a special-purpose process that is created and handled by the system in the same way as other processes, but which is distinguished by the following characteristics:
It is long-lived. A daemon process is often started at system boot and remains in existence until the system is shut down.
It runs in the background, and has no controlling terminal from which it can read input or to which it can write output.
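A very reduced sketch of the usual steps a process takes to become a daemon is shown below: fork so the parent can exit, start a new session with setsid() to drop the controlling terminal, and redirect the standard descriptors to /dev/null. Real daemons typically do more (a second fork, umask and working-directory changes, logging), so treat this as an outline only.
```c
/* A reduced sketch of daemonizing a process. */
#include <unistd.h>
#include <fcntl.h>

static int become_daemon(void)
{
    switch (fork()) {
    case -1: return -1;
    case 0:  break;          /* child continues */
    default: _exit(0);       /* parent exits; child is re-parented to init */
    }

    if (setsid() == -1)      /* new session, no controlling terminal */
        return -1;

    int fd = open("/dev/null", O_RDWR);
    if (fd != -1) {          /* detach stdin/stdout/stderr from the terminal */
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO)
            close(fd);
    }
    return 0;
}

int main(void)
{
    if (become_daemon() == -1)
        return 1;
    sleep(30);               /* pretend to do background work */
    return 0;
}
```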
Memory Mappings
The mmap() system call creates a new memory mapping in the calling process's virtual address space. Mappings fall into two categories:
A file mapping maps a region of a file into the calling process's virtual memory.
By contrast, an anonymous mapping doesn't have a corresponding file. Instead, the pages of the mapping are initialized to 0.
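The sketch below shows the first category: a read-only file mapping, after which the file's contents can be accessed as ordinary memory. The filename is illustrative.
```c
/* A sketch of a read-only file mapping with mmap(). */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("notes.txt", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0)
        return 1;

    /* Map the whole file into the process's virtual address space. */
    char *addr = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* The file's bytes can now be read as ordinary memory. */
    write(STDOUT_FILENO, addr, sb.st_size);

    munmap(addr, sb.st_size);
    close(fd);
    return 0;
}
```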
Static and Shared Libraries
A static library (archive) is essentially a structured bundle of compiled object modules. To use functions from a static library, the linker extracts copies of the required object modules from the library and copies these into the resulting executable file. We say that such a program is statically linked. The major problem with static linking is that duplication of object code across different executable files wastes disk space.
Shared libraries were designed to solve this problem. If a program is linked against a shared library, then, instead of copying object modules from the library into the executable, the linker just writes a record into the executable to indicate that at run time the executable needs to use that shared library.
IPC and Synchronization
Processes need methods of communicating with one another and synchronizing their actions. Therefore, Linux provides a rich set of mechanisms for interprocess communication (IPC) including the following:
signals
pipes
sockets
file locking
message queues
semaphores
shared memory
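As a small example of one of these mechanisms, the sketch below creates a pipe and uses it to pass a message from a parent process to its child.
```c
/* A sketch of one IPC mechanism: a pipe between a parent and its child. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int pfd[2];
    if (pipe(pfd) == -1) {              /* pfd[0]: read end, pfd[1]: write end */
        perror("pipe");
        return 1;
    }

    if (fork() == 0) {                  /* child: read the message */
        close(pfd[1]);
        char buf[64];
        ssize_t n = read(pfd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child received: %s\n", buf);
        }
        _exit(0);
    }

    close(pfd[0]);                      /* parent: write the message */
    const char *msg = "hello from the parent";
    write(pfd[1], msg, strlen(msg));
    close(pfd[1]);
    wait(NULL);
    return 0;
}
```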
Signals
Although signals can be used for IPC, they are mainly employed in a wide range of other contexts. Signals are often described as "software interrupts". The arrival of a signal informs a process that some event or exceptional condition has occurred.
Signals are sent to a process by the kernel, by another process, or by the process itself.
For most signal types, instead of accepting the default signal action, a program can choose to ignore the signal, or to establish a signal handler. A signal handler is a programmer-defined function that is automatically invoked when the signal is delivered to the process.
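The sketch below establishes a handler for SIGINT (the signal typically generated by typing Control-C) using sigaction(), so the process records the event instead of being terminated.
```c
/* A sketch of establishing a signal handler with sigaction(). */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint = 0;

static void handler(int sig)
{
    (void)sig;
    got_sigint = 1;          /* just record the event; keep handlers minimal */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGINT, &sa, NULL) == -1) {
        perror("sigaction");
        return 1;
    }

    printf("press Control-C within 10 seconds...\n");
    sleep(10);               /* sleep() returns early if a signal is caught */

    if (got_sigint)
        printf("caught SIGINT\n");
    return 0;
}
```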
Threads
Each process can have multiple threads of execution. All threads in a process execute the same program code and share the same data area and heap.
Threads can communicate with each other via the global variables that they share. The threading API provides condition variables and mutexes, which are primitives that enable the threads of a process to communicate and synchronize their actions, in particular, their use of shared variables.
The primary advantage of using threads is that they make it easy to share data between cooperating threads.
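The sketch below shows two threads incrementing a shared global counter, with a mutex serializing access to it. Compile with the -pthread option.
```c
/* A sketch of two threads sharing a global counter, protected by a mutex. */
#include <pthread.h>
#include <stdio.h>

static int counter = 0;                        /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);             /* serialize access */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("counter = %d\n", counter);         /* 200000 with the mutex in place */
    return 0;
}
```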
Process Groups and Shell Job Control
All major shells, except the Bourne shell, provide an interactive feature called job control, which allows the user to simultaneously execute and manipulate multiple commands or pipelines.
Sessions and Controlling Terminals
A session is a collection of process groups (jobs). Sessions are used mainly by job-control shells. Sessions usually have an associated controlling terminal. The controlling terminal is established when the session leader process first opens a terminal device.
Pseudoterminals
A pseudoterminal is a pair of connected virtual devices, known as the master and slave. This device pair provides an IPC channel allowing data to be transferred in both directions between the two devices.
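The sketch below opens the master side of a pseudoterminal with the standard posix_openpt()/grantpt()/unlockpt() sequence and prints the pathname of the corresponding slave device.
```c
/* A sketch of opening a pseudoterminal master and finding its slave device. */
#define _XOPEN_SOURCE 600
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

int main(void)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);  /* open an unused master */
    if (master == -1) {
        perror("posix_openpt");
        return 1;
    }

    if (grantpt(master) == -1 || unlockpt(master) == -1)
        return 1;

    /* ptsname() yields the pathname of the corresponding slave device. */
    printf("slave device: %s\n", ptsname(master));
    return 0;
}
```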
Date and Time
Two types of time are of interest to a process:
Real time is measured either from some standard point (calendar time) or from some fixed point, typically the start of a process (elapsed or wall-clock time). On UNIX systems, calendar time is measured in seconds since midnight on the morning of January 1, 1970, Coordinated Universal Time (UTC).
Process time, also called CPU time, is the total amount of CPU time that a process has used since starting. CPU time is further divided into system CPU time (time spent in kernel mode) and user CPU time (time spent in user mode).
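The sketch below contrasts the two: time() reports calendar time (seconds since the Epoch), while clock() reports the CPU time the process itself has consumed.
```c
/* A sketch contrasting calendar (real) time and process (CPU) time. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t now = time(NULL);                   /* seconds since the Epoch (1 Jan 1970, UTC) */
    printf("calendar time: %s", ctime(&now));  /* human-readable form */

    /* Burn a little CPU so the process time is visibly non-zero. */
    volatile long sum = 0;
    for (long i = 0; i < 50000000; i++)
        sum += i;

    printf("CPU time used: %.3f seconds\n",
           (double)clock() / CLOCKS_PER_SEC);
    return 0;
}
```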
Client-Server Architecture
A client-server application is one that is broken into two component processes:
a client, which asks the server to carry out some service by sending it a request message; and
a server, which examines the client's request, performs appropriate actions, and then sends a response message back to the client.
Typically, the client application interacts with a user, while the server application provides access to some shared resource. Commonly, there are multiple instances of client processes communicating with one or a few instances of the server process.
Realtime
Realtime applications are those that need to respond in a timely fashion to input. Frequently, such input comes from an external sensor or a specialized input device, and output takes the form of controlling some external hardware.
The provision of realtime responsiveness, especially where short response times are demanded, requires support from the underlying operating system. Most operating systems don’t natively provide such support because the requirements of realtime responsiveness can conflict with the requirements of multiuser time-sharing operating systems.
The /proc File System
The /proc file system is a virtual file system that provides an interface to kernel data structures in a form that looks like files and directories on a file system. This provides an easy mechanism for viewing and changing various system attributes.
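For example, the sketch below reads the system's hostname through /proc/sys/kernel/hostname using ordinary file I/O.
```c
/* A sketch of reading a kernel attribute through the /proc file system. */
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("/proc/sys/kernel/hostname", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    char line[256];
    if (fgets(line, sizeof(line), fp) != NULL)
        printf("hostname: %s", line);          /* line already ends with '\n' */

    fclose(fp);
    return 0;
}
```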