DEV Community

Shivakumar


Linux Kernel Basics: User Space vs. Kernel Space, System Calls, strace (debugging processes).

Here is the definitive, "Zero to Hero" master guide. It starts with the absolute fundamentals and progressively drills down into the advanced internal mechanisms used by Senior Site Reliability Engineers (SREs) and Kernel developers.


The Definitive Guide to Linux Internals: From Kernel Architecture to Advanced Debugging

Target Audience: DevOps Engineers, SREs, Developers, and anyone tired of treating Linux like a black box.
Prerequisites: None.


1. The Foundation: What Actually Is Linux?

Before we debug it, we must define it.

Most people use the term "Linux" loosely.

  • Strictly speaking: Linux is a Kernel. It is a piece of low-level software that acts as a hardware resource manager.
  • Practically speaking: Linux is an Operating System (OS). This includes the Kernel plus the "Userland"—the GNU tools, shells (bash), libraries (glibc), and applications that make the computer usable.

The Role of the Kernel

Think of the Kernel as the dictator of the computer.

  1. Memory Management: Who gets RAM? (If Chrome asks for 100GB on a 16GB laptop, the Kernel can refuse the request.)
  2. Process Scheduling: Who gets the CPU? (The Kernel pauses your MP3 player 1000 times a second to let your mouse move).
  3. Hardware Abstraction: Developers write code to "save file"; the Kernel translates that into electrical signals for a specific NVMe SSD model.

2. The Architecture: User Space vs. Kernel Space

This is the most critical concept in OS design. Modern CPUs (like x86_64) provide hardware-level security features called Protection Rings.

Ring 0: Kernel Space (Privileged Mode)

  • Who lives here: The Linux Kernel, Device Drivers, and kernel modules.
  • Powers: Unlimited. Code here can execute any CPU instruction (including halting the processor) and access any memory address.
  • The Stakes: A serious fault here can be fatal. It can trigger a Kernel Panic (the Linux equivalent of the "Blue Screen of Death"), which halts the entire machine or, if configured to, reboots it.

Ring 3: User Space (Restricted Mode)

  • Who lives here: Your web browser, Python scripts, Docker containers, and even the root shell.
  • Powers: Restricted. Each process runs inside its own "Virtual Memory" sandbox, set up by the Kernel.
      • It cannot access hardware directly.
      • It cannot access memory belonging to other processes.

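  • The Safety Net: If a program here crashes (e.g., it touches Kernel memory or an unmapped address, or divides an integer by zero), the CPU traps into the Kernel. The Kernel then sends the process a signal (SIGSEGV for an invalid memory access, SIGFPE for a division by zero on x86) and, by default, kills just that one process. The server stays up.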

Advanced Note: Why not Rings 1 and 2?
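The x86 architecture offers 4 rings. However, Linux (and Windows) only use Ring 0 (Kernel) and Ring 3 (User). One reason is that x86 page-table protection only distinguishes "supervisor" (Rings 0-2) from "user" (Ring 3). Another is portability: other CPU architectures (like ARM or RISC-V) have privilege models that do not offer four equivalent rings.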


3. The Bridge: System Calls (Syscalls)

If User Space (Ring 3) cannot touch the hardware, how do we write a file or send a network packet?
Answer: We must ask the Kernel to do it for us. This request is a System Call.

The Anatomy of a System Call

The System Call interface is the API of the Linux Kernel.

  1. The Wrapper (glibc): You write printf("hello") in C or print("hello") in Python. You are not calling the Kernel yet; you are calling a library function.
  2. The Register Setup: The library puts the specific Syscall ID (a number representing the function, e.g., 1 for write) into a CPU register (usually RAX).
  3. The Mode Switch (The Transition): The library executes a special CPU instruction.
      • Legacy (32-bit x86): the int 0x80 software interrupt.
      • Modern (Fast, x86_64): the syscall instruction.
      • Action: This instruction forces the CPU to switch from Ring 3 to Ring 0 and jump to a fixed entry point in the Kernel code. This is a mode switch within the same process, not a context switch to another process.
  4. Execution: The Kernel checks permissions. (Does User ID 1000 have permission to write to /etc/hosts?) If yes, it performs the requested work.
  5. The Return: The Kernel writes the result (or a negative error code) into RAX and issues sysret, dropping the CPU back to Ring 3. The glibc wrapper turns a negative code into a return value of -1 and sets errno.

Advanced Concept: vDSO (Virtual Dynamic Shared Object)

Problem: Switching into Kernel mode is "expensive" (it costs CPU time on every call). Some calls, like "What time is it?" (gettimeofday or clock_gettime), can happen thousands of times a second. Entering the Kernel for each one would slow the application down.
Solution: The Kernel maps a small shared library, the vDSO, into every process's address space, together with a read-only data page that it keeps updated. glibc routes calls like gettimeofday and clock_gettime to this code, which reads the current time from that page without making a real system call or entering Kernel mode.


4. The Microscope: strace (Debugging Processes)

strace (System Trace) is the ultimate debugging tool for DevOps. It attaches to a process and prints every System Call it makes. It allows you to debug "Black Box" binaries where you don't have the source code.

Basic Usage

# Run a command and trace it
strace ls /tmp

# Attach to a running process (e.g., a frozen web server)
strace -p 1234


Understanding the Output

openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3

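  1. openat: The system call name.
  2. "/etc/passwd": The path argument (which file?). AT_FDCWD means "relative to the current directory", and O_RDONLY means "read-only".
  3. = 3: The return value. For openat, a non-negative number is a File Descriptor (a handle to the open file).
  4. Note: If you see = -1, the call failed. strace also prints the error code, e.g., -1 ENOENT (No such file or directory).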

Advanced strace Techniques

1. Performance Profiling (-c)

Is your app slow? Don't guess. Check the stats. (When attached with -p, press Ctrl-C to detach; strace then prints the summary.)

strace -c -p 1234


Sample output:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.00    0.005000         500        10           futex
  2.00    0.000100          10        10         1 open

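  • Insight: If futex (Fast Userspace Mutex) dominates, your threads are likely spending much of their time waiting on locks. That is a concurrency issue, not a disk issue. By default, -c reports time spent inside the Kernel. Add -w to summarize wall-clock time instead, which better reflects how long threads sat blocked.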

2. Following Threads (-f)

Modern apps (Nginx, Chrome, Java) use multiple processes or threads. Without -f, strace traces only the single process (or thread) you started or attached to.
Always use `-f` to follow child processes and threads.

strace -f -p 1234


3. Data Inspection (-s)

By default, strace prints only the first 32 characters of each string and marks the cut with "..." (e.g., write(3, "SELECT id, name FROM users WHERE"..., 120)).
Use -s to increase the string size limit to see full payloads (like SQL queries or JSON data).

strace -s 2000 -p 1234


4. Fault Injection (Chaos Engineering)

This is an expert feature. You can force a System Call to fail to test how your app handles errors.
Scenario: Test whether your app handles a full disk gracefully instead of crashing.

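# Force 'openat' (what glibc's open() calls on modern systems) to fail with ENOSPC (No space left on device)
# when=10+ spares the first 9 calls so the dynamic loader can still open shared libraries; adjust as needed
strace -e inject=openat:error=ENOSPC:when=10+ ./my_application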


5. Summary: The Big Picture for DevOps

Why does an SRE need to know this?

  1. Containers are NOT VMs: Docker containers share the host's Kernel. If one container triggers a Kernel Panic (Ring 0 crash), the host and all other containers go down with it. Isolation comes from Kernel features (Namespaces and cgroups), not from separate hardware or a separate Kernel.
  2. "Permission Denied" is a Kernel Logic Check: When you see this error (EACCES), the Kernel has compared the file's inode permissions against your process's user and group IDs during the open (openat) system call.
  3. Latency: Every system call has a cost. High-performance code tries to minimize system calls (e.g., by buffering data before writing).

Final Cheat Sheet

Component    | Responsibility                          | Privilege           | Crash Consequence
-------------|-----------------------------------------|---------------------|---------------------------------------------------
User Space   | Applications, Shells, Docker Containers | Ring 3 (Restricted) | Single process death (e.g., SIGSEGV)
System Call  | Interface between User & Kernel         | Ring 3 -> Ring 0    | n/a (it's a transition)
Kernel Space | Drivers, Memory Mgmt, Scheduling        | Ring 0 (God Mode)   | Total system crash (Kernel Panic)
strace       | Debugging tool                          | User Space          | n/a (reveals the "truth" of what an app is doing)

If you still need a more detailed introduction to Linux basics, please check this website.
