Threads & Processes:
Core Concept: Each thread needs its own stack, but threads within a process share code, data, and heap. The OS Kernel orchestrates everything.
What is a Process?
A process is a running program isolated from other processes. When you open Chrome or Spotify, the OS creates a process. It starts with one thread (the main thread) and has its own memory space. The kernel creates a Process Control Block (PCB) to track everything about the process.
What is a Thread?
A thread is an execution path within a process. Multiple threads can exist in one process and share code and data, but each thread must have its own stack. The kernel creates a Thread Control Block (TCB) for each thread to track its state.
Process Memory Layout
| Segment | Contents | Shared? | Notes |
|---|---|---|---|
| Code | Compiled program instructions | ✅ All threads | Read-only, fixed size |
| Data | Global and static variables | ✅ All threads | Fixed size, initialized at load |
| Heap | Dynamic memory (malloc/new) | ✅ All threads | Grows upward, managed by the programmer |
| Stack | Local variables, function calls | ❌ Per thread | Grows downward, one per thread |
Why Separate Stacks?
If two threads shared one stack, their function calls would collide and corrupt each other's data. When Thread A calls a function, it creates a stack frame. If Thread B also calls a function and adds its frame to the same stack, the frames overlap and overwrite each other. When Thread B returns and pops its frame, Thread A's data becomes corrupted.
Solution: give each thread its own stack for its function calls and local variables. Thread A and Thread B can then execute different functions simultaneously, each pushing and popping frames independently, without interfering with each other.
Stack Frames and LIFO
Stack frames work with LIFO (Last In, First Out) order. When main() calls functionA(), which calls functionB(), a new frame is created for each function on the stack. When functionB() completes and returns, its frame is removed from the stack, revealing functionA()'s frame. Then functionA() returns and its frame is removed. This ensures proper return flow and variable access.
Each thread has its own stack, so multiple threads can call the same functions simultaneously without interference. Thread A's stack frames remain separate from Thread B's stack frames.
Thread Creation: Main vs. Sub-threads
When a process starts, the OS automatically creates one thread—the main thread. This main thread is the entry point for the program and begins execution at the main() function.
The main thread can then create additional threads by calling pthread_create(). These new threads are called sub-threads. However, once created, all threads are treated as peers by the kernel; the main thread has no special scheduling authority. (One practical difference remains: if the main thread returns from main(), the whole process exits, taking the sub-threads with it, unless it calls pthread_exit() instead.)
Thread Hierarchy
Process Created by OS
↓
Main Thread (created automatically by kernel)
├─→ Sub-thread 1 (created by main thread)
├─→ Sub-thread 2 (created by any thread)
└─→ Sub-thread 3 (created by any thread)
Each sub-thread receives its own stack (allocated by the kernel), its own TCB (Thread Control Block), and shares the process's code, data, and heap. Sub-threads can also create more sub-threads if needed.
Music Player Example
A music player demonstrates multi-threading well. The main thread handles the UI and user interactions. But instead of blocking on audio playback or file I/O, it creates sub-threads to handle these tasks concurrently. One sub-thread decodes and plays audio continuously. Another sub-thread handles file system operations like loading songs. A third sub-thread updates the timer display.
All threads share the music player's code and data (the song list, player settings, etc.). But each thread has its own stack for local variables and function execution. The kernel rapidly switches between threads, giving each one a time slice to execute. This makes it appear that all tasks happen simultaneously.
The OS Kernel: Central Orchestrator
The kernel is the central manager of the OS. It handles all resource allocation and coordination for processes and threads.
Process Management
The kernel creates a Process Control Block (PCB) for every process. This PCB stores the process's ID, state (running, waiting, ready, etc.), memory layout information, file descriptors, and signal handlers. The kernel uses this information to manage the process, isolate it from other processes, and enforce security.
Thread Management
For each thread, the kernel creates a Thread Control Block (TCB). The TCB stores the thread's ID, state, CPU registers (Program Counter, Stack Pointer), stack address and size, thread-local storage, and scheduling information. The kernel tracks all this information so it can schedule and manage threads properly.
Memory Management
The kernel allocates memory for thread stacks using system calls like mmap(). It reserves virtual address space for each thread's stack. The kernel also creates guard pages at the stack boundaries to detect stack overflow conditions. When a thread actually uses its stack, the kernel allocates physical memory on demand.
CPU Scheduling
The kernel decides which thread runs on which CPU core and for how long. It uses scheduling algorithms to fairly distribute CPU time among all threads. When a thread's time slice expires or it needs to wait for I/O, the kernel performs a context switch.
Context Switching
During a context switch, the kernel saves the current thread's CPU state (Program Counter, Stack Pointer, and all CPU registers) into the thread's TCB. It then loads the next thread's saved state from its TCB into the CPU registers. The CPU then resumes execution from the restored Program Counter, effectively transferring control to the next thread. The thread resumes as if it was never interrupted.
Default Stack Size by OS
Different operating systems allocate different amounts of stack space per thread. This is virtual address space, not physical RAM.
Linux: 8 MB per thread. For 10 threads, that's 80 MB of reserved virtual space, but actual physical RAM used is only what the threads actually consume (typically a few kilobytes per thread).
Windows: 1 MB per thread. For 10 threads, that's 10 MB of reserved space. Windows balances stack size with resource efficiency.
macOS: 512 KB per secondary thread (the main thread gets a larger 8 MB stack). For 10 sub-threads, that's 5 MB of reserved space. macOS is more conservative with per-thread memory.
Java: 1 MB per thread on most systems. The JVM handles stack allocation for Java threads.
Go: Approximately 2 KB per goroutine to start. Goroutines are much lighter because the Go runtime, not the operating system, manages them, growing and shrinking their stacks on demand. A single Go program can run thousands or even millions of goroutines efficiently.
Why Different Sizes?
Linux's larger 8 MB stack supports deep recursion and large local variables in complex applications. Windows's 1 MB stack balances stability with resource usage. macOS's smaller default reflects a more conservative approach to per-thread reservations. Go's tiny goroutine stacks work because the runtime starts them small, grows them on demand, and switches between goroutines in user space far more cheaply than the kernel switches OS threads.
Virtual vs. Physical Memory
When the kernel allocates 8 MB of stack space for a thread, it reserves 8 MB of virtual address space. However, it doesn't immediately use 8 MB of physical RAM. Instead, it uses demand paging. Only when the thread actually writes to stack memory does the kernel allocate physical pages. If a thread only uses 100 KB of its 8 MB stack, then only 100 KB of actual RAM is used. The remaining 7.9 MB of virtual space uses no physical memory at all.
This allows the kernel to efficiently allocate stacks without wasting memory, even for systems with hundreds or thousands of threads.
Kernel Data Structures
Process Control Block (PCB)
The PCB contains the process ID, process state, memory layout (where code, data, heap, and stacks are located), file descriptor table, signal handlers, and a list of all threads in the process. The kernel uses this information to manage the process's lifecycle and resources.
Thread Control Block (TCB)
The TCB contains the thread ID, thread state, CPU registers (Program Counter, Stack Pointer, etc.), the address and size of the thread's user-mode stack, thread-local storage information, scheduling priority, and a pointer to the kernel stack. The kernel uses the TCB during context switches to save and restore thread state.
Kernel Stack vs. User Stack
Every thread actually has two stacks: a user-mode stack and a kernel-mode stack.
The user-mode stack is what we've been discussing. It's used when the thread executes user code. It holds local variables, function parameters, and return addresses. It's located in the process's virtual address space and has a typical size like 8 MB on Linux.
The kernel-mode stack is used when the kernel executes on behalf of the thread. This happens during system calls (like reading a file), handling interrupts, or other kernel operations. The kernel-mode stack is separate and in kernel memory, which is protected from user code. This prevents user code from corrupting kernel data structures.
When a thread calls a system call like read() or write(), the CPU switches to kernel mode, loads the kernel stack pointer for that thread, and the kernel code executes on the kernel stack. After the system call completes, the CPU switches back to user mode and restores the user stack pointer.
Key Takeaways
Processes are isolated: Each process has its own memory space and is protected from other processes. The kernel enforces this isolation.
Threads share resources: All threads within a process share the code, data, and heap. They cooperate to perform the process's tasks.
Each thread has its own stack: The stack is the only major per-thread memory region within a process. This ensures independent execution.
Main thread is special only initially: It's the entry point (at main()), but all threads become peers once created.
Sub-threads created via pthread_create(): The main thread or any other thread can create new threads.
Kernel allocates all stacks: Stack size defaults are 8 MB (Linux), 1 MB (Windows), 512 KB (macOS).
Kernel manages scheduling: The kernel controls which thread runs, for how long, and on which CPU core.
Context switching enables multitasking: Rapid context switches between threads create the illusion of parallel execution on a single core.
Stack frames implement LIFO: Function calls push frames, returns pop frames. This is fundamental to how program control flow works.
Demand paging saves RAM: Virtual stack space doesn't consume physical memory until actually used.
TCBs track all thread state: The kernel maintains TCBs to manage threads throughout their lifetime.
Understanding This Matters
Understanding how threads work at the operating system level is fundamental to computer science. It reveals how your code actually executes at the hardware level. You're not just using threading APIs—you're understanding the real mechanisms that make concurrent programming possible. This foundation is essential for building efficient, safe multithreaded applications and for understanding more advanced concurrency concepts like goroutines in Go.

