Sangyog Puri

Posted on Jun 27

CSAPP Chapter 9: Virtual Memory - Deep Reference

#architecture #computerscience #learning #tutorial

1. The Core Problem - Why Virtual Memory Exists

Without virtual memory, every program would directly address physical RAM. This creates three fundamental problems:

No isolation: process A could read or overwrite process B's memory. One buggy program could corrupt another or the OS itself.
No abstraction: programs would need to know exactly where in physical RAM they're loaded. The same binary couldn't run twice at once without being relocated to a different physical address.
Limited size: programs would be capped by how much physical RAM is installed. You couldn't run a program larger than your RAM.

Virtual memory solves all three by giving every process the illusion of a large, private, contiguous address space - completely independent of physical RAM layout. The hardware + OS transparently handles the mapping from virtual to physical addresses.

CORE IDEA	Virtual memory is an abstraction over physical RAM. Every address your program uses is a virtual address. The hardware (MMU) translates it to a physical address on every memory access - transparently, below the level any program can observe.

2. Physical vs Virtual Addressing

2.1 Physical Addressing - The Old Way

In early computers (and still in microcontrollers today), the CPU generates addresses that go directly onto the memory bus and access physical DRAM. What the program computes as an address is literally where in RAM the data lives.

CPU → [address bus] → DRAM

address 0x1000 → literally byte 4096 of physical RAM

2.2 Virtual Addressing - How Modern CPUs Work

The CPU generates a virtual address. Before it reaches RAM, it passes through the MMU (Memory Management Unit), a hardware unit built into the CPU that translates it to a physical address using the page table.

CPU → [virtual address] → MMU → [physical address] → DRAM

virtual 0x7fff1000 → MMU → physical 0x3a2000 → RAM

Key consequence: two different processes can use the exact same virtual address (e.g. both have a stack at 0x7fffffffe000) and they map to completely different physical RAM. The MMU handles the translation per-process.

WHY THIS MATTERS	This is the exact mechanism that gives each process its own private address space - the isolation we discussed in Ch 8. Process A's virtual 0x1000 and process B's virtual 0x1000 are different physical locations. There is no way for A to address B's memory because A's page table has no entries pointing to B's physical pages.

3. VM as a Caching Tool - Pages and Page Tables

This is the most important section in Ch 9. Everything else builds on these concepts.

3.1 Pages - The Unit of Transfer

Virtual memory is divided into fixed-size chunks called pages. Physical memory is divided into matching chunks called frames (or physical pages). The page size is set by the hardware - typically 4KB on x86-64, though 2MB and 1GB 'huge pages' also exist.

Virtual address space: Physical RAM:

┌──────────────┐ ┌──────────────┐

│ VP 0 (4KB) │ │ PP 0 (4KB) │

├──────────────┤ ├──────────────┤

│ VP 1 (4KB) │ │ PP 1 (4KB) │

├──────────────┤ ├──────────────┤

│ VP 2 (4KB) │ │ PP 2 (4KB) │

├──────────────┤ ├──────────────┤

│ ... │ │ ... │

└──────────────┘ └──────────────┘

VP = virtual page PP = physical page (frame)

At any moment, a virtual page can be in one of three states:

Unallocated: the page doesn't exist yet. No memory is wasted on it. This is why a process on x86-64 Linux can have a 128TB user address space on a machine with 16GB of RAM - most of those pages are simply unallocated.
Cached: the page is allocated AND currently resident in physical RAM. Accessing it is fast - just an MMU translation.
Uncached: the page is allocated (it exists, e.g. on disk or in a file) but NOT currently in physical RAM. Accessing it triggers a page fault.

3.2 The Page Table - The Translation Map

The page table is a per-process data structure the kernel maintains in memory. It maps virtual page numbers to physical page numbers. The MMU uses the page table on every memory access to perform the translation.

Each entry in the page table is called a PTE (Page Table Entry). Each PTE contains:

Valid bit: is this virtual page currently in physical RAM? 1 = cached; 0 = not in RAM (either unallocated or on disk)
Physical page number: which physical frame does this virtual page map to (only meaningful if valid bit = 1)
Permission bits: read / write / execute permissions for this page
Dirty bit: has this page been written to since it was loaded from disk? (used to decide if it needs to be written back on eviction)
Reference bit: has this page been accessed recently? (used by replacement policies)

Page Table (per process):

┌─────┬───────┬────────────────────────┬─────────────┐

│ VPN │ Valid │ Physical Page Number │ Permissions │

├─────┼───────┼────────────────────────┼─────────────┤

│ 0 │ 1 │ PP3 │ r-x │ ← in RAM, read + execute (code)

│ 1 │ 1 │ PP7 │ rw- │ ← in RAM, read-write (data)

│ 2 │ 0 │ (disk) │ rw- │ ← on disk, not in RAM

│ 3 │ 0 │ (null) │ - │ ← unallocated, doesn't exist

│ 4 │ 1 │ PP1 │ rw- │ ← in RAM (stack)

└─────┴───────┴────────────────────────┴─────────────┘

VPN = Virtual Page Number

3.3 Page Hits vs Page Faults

Page Hit: the CPU accesses a virtual address → MMU looks up the PTE → valid bit = 1 → MMU translates to physical address → reads from RAM. Fast, transparent, and by far the common case: it happens on nearly every memory access.

Page Fault: the CPU accesses a virtual address → MMU looks up the PTE → valid bit = 0 → MMU triggers a fault exception → OS page fault handler runs.

What the page fault handler does:

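If no physical frame is free, selects a victim page to evict from RAM (using a replacement policy such as LRU, or an approximation of it)
If the victim page's dirty bit = 1: writes it back to disk (swap); either way, marks the victim's PTE invalid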
Loads the requested page from disk into the now-free physical frame
Updates the page table: sets valid bit = 1, sets physical page number
Re-executes the faulting instruction - the fault handler returns, the CPU retries, and this time the PTE is valid. From the program's perspective, nothing happened - the instruction just took longer.

KEY INSIGHT	Page faults are fault-type exceptions (from Ch 8) - the handler fixes the problem and re-executes the same instruction. This is the entire mechanism. Your program never sees a page fault happen; it only pays for it in time. The OS is silently moving pages between disk and RAM, keeping up the illusion of an address space much larger than physical RAM.

3.4 Locality Makes This Practical - The Working Set

If programs accessed memory randomly, page faults would be constant and performance would collapse. What makes virtual memory practical is locality (from Ch 6):

Temporal locality: recently accessed pages will likely be accessed again soon
Spatial locality: if page N is accessed, pages N-1 and N+1 will likely be accessed soon

The set of pages a program actively uses at any moment is called the working set. As long as the working set fits in physical RAM, page fault rates stay low and performance is good. When the working set exceeds available RAM, the system starts thrashing - constantly evicting pages that are immediately needed again - and performance collapses dramatically.

4. Address Translation - How the MMU Does It

Every virtual address gets split into two parts by the MMU. The split point is determined by the page size.

Virtual Address (64 bits on x86-64):

┌────────────────────────────┬──────────────────────┐

│ Virtual Page Number │ Page Offset │

│ (VPN) │ (PO) │

└────────────────────────────┴──────────────────────┘

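bits 63..12 (52 bits) bits 11..0 (12 bits)

With 4KB pages: offset = 12 bits (2^12 = 4096 bytes)

In practice, current x86-64 CPUs with 4-level paging translate only the low 48 bits: bits 63..48 must be copies of bit 47 ("canonical" addresses), so the VPN that actually gets translated is bits 47..12 (36 bits).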

The translation process:

1. CPU generates virtual address VA

2. MMU extracts VPN = VA[63:12] (upper bits)

3. MMU extracts PO = VA[11:0] (lower 12 bits - the offset within the page)

4. MMU looks up VPN in the page table → gets PPN (Physical Page Number)

5. Physical address = PPN concatenated with PO

PA = PPN:PO

6. MMU sends PA to RAM, gets the data

KEY INSIGHT	The page offset (PO) is copied unchanged from virtual to physical address. Only the page number gets translated. This is why page size must be a power of 2 - it makes the split a simple shift and mask rather than a division and remainder.

4.1 Multi-Level Page Tables - Why We Need Them

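A naive single-level page table for a 64-bit address space would be enormous. With 4KB pages and 8-byte PTEs, a full single-level page table would be 2^52 × 8 bytes = 2^55 bytes = 32 petabytes - per process. Even limited to the 48 bits x86-64 actually translates, it would be 2^36 × 8 bytes = 512GB per process. Clearly impossible.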

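The solution: multi-level page tables. x86-64 uses 4 levels (PML4, PDPT, PD and PT in Intel's manuals; PGD, PUD, PMD and PTE in Linux). Newer CPUs also offer a 5-level mode for 57-bit virtual addresses.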

Virtual Address split across 4 levels:

┌───────┬───────┬───────┬───────┬──────────────────┐

│ L1 │ L2 │ L3 │ L4 │ Page Offset │

│ 9 bits│ 9 bits│ 9 bits│ 9 bits│ 12 bits │

└───────┴───────┴───────┴───────┴──────────────────┘

Each level table has 2^9 = 512 entries × 8 bytes = 4KB (one page!)

Only allocate lower-level tables when needed → huge memory savings

The key insight of multi-level page tables: if a large region of the virtual address space is unallocated, the entire subtree below that L1 entry simply doesn't exist - no memory wasted. A sparse process (most virtual addresses unused) only has a tiny set of page table pages actually allocated.

4.2 The TLB - Making Translation Fast

With multi-level page tables, a translation requires 4 additional memory accesses (one per page table level) before reaching the actual data. In the worst case, where none of those entries are already in the data caches, that makes every memory access about 5x slower. The solution: the TLB (Translation Lookaside Buffer).

The TLB is a small, fast hardware cache built into the CPU that stores recent VPN→PPN mappings. It is small: typically a few dozen entries in the first level and on the order of a thousand or two in a second-level TLB. On a TLB hit, the translation finishes in about a cycle, with no memory access needed. On a TLB miss, the CPU must do the full page table walk (4 memory accesses), then caches the result in the TLB.

Memory access with TLB:

CPU generates VA

↓

Check TLB for VPN

├── HIT → get PPN directly → access RAM (1 cycle extra) ← 99%+ of accesses

└── MISS → walk page table (4 RAM accesses) → cache in TLB → access RAM

TLB hit rate in practice: 99%+ for programs with good locality

REAL WORLD

TLB shootdowns are a real performance concern in multi-core systems. When a page table entry is changed or removed (e.g. during munmap or mprotect, or when fork() write-protects pages for copy-on-write), every other core that may have the old mapping cached in its TLB must be told to invalidate it. On a 32-core machine running threads of the same process on every core, that can mean up to 31 inter-processor interrupts - a measurable cost. Huge pages are a related optimization: one 2MB TLB entry covers 512 times as much memory as a 4KB entry, so the same working set needs far fewer TLB entries and causes fewer misses.

5. VM as a Tool for Memory Management

Virtual memory doesn't just cache RAM - it provides key abstractions that simplify the entire system.

5.1 Simplifying Linking

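In CSAPP's model, every Linux process uses the same virtual address layout: the code (text) segment of a non-PIE x86-64 binary starts at 0x400000, and the user stack sits just below the top of user space at 0x7fffffffffff. The linker can therefore produce binaries with fixed virtual addresses, without knowing where in physical RAM the program will load. At runtime, the OS's page tables handle the actual physical placement. (Modern distributions build position-independent executables by default and randomize these addresses with ASLR, but the overall shape of the layout is the same.)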

Every x86-64 Linux process virtual address space:

0xFFFFFFFFFFFFFFFF ┐

│ Kernel (not accessible to user code)

0xFFFF800000000000 ┘

┐

0x7FFFFFFFFFFF │ Stack (grows downward)

│ (shared libraries loaded here too)

│ Heap (grows upward via brk/mmap)

0x400000 │ Text (code) + Data + BSS

0x0 ┘ (unmapped - null pointer guard)

5.2 Simplifying Loading

When the OS loads a program, it doesn't actually copy the binary into RAM. It sets up page table entries pointing to the binary on disk, with valid bits = 0. As the program starts executing and accesses code/data, page faults fire, and the OS loads only the needed pages on demand. This is called demand paging - and it's why large programs start quickly even if they use much more memory than is initially loaded.

5.3 Simplifying Sharing

When multiple processes run the same program (e.g. 50 bash shells), the OS doesn't load 50 copies of the bash binary into RAM. Instead, all 50 processes have page table entries pointing to the SAME physical pages for the code segment. One copy in RAM, shared by all.

This works because code pages are read-only (no process can modify them). Data/stack pages are private per-process.

REAL WORLD	Shared libraries (.so files on Linux, .dylib on macOS, .dll on Windows) work exactly this way. libc is loaded once into physical RAM and shared by every process that uses it - potentially hundreds of processes sharing one physical copy of the same library code.

6. VM as a Tool for Memory Protection

Page table entries contain permission bits that the MMU checks on every memory access:

Permission Bit	Meaning	Example Use
r (read)	Page can be read	All pages - code, data, stack
w (write)	Page can be written	Data, stack, heap - NOT code
x (execute)	Instructions can be fetched from this page	Code segment only (W^X policy)
u (user)	Accessible in user mode	User process pages
s (supervisor)	Accessible only in kernel mode	Kernel memory pages

If a process tries to access a page with insufficient permissions, the MMU raises a protection fault → kernel handler → SIGSEGV sent to process → segfault.

Examples of what this prevents:

Code injection: data pages (stack, heap) are marked non-executable (NX bit / DEP). Even if an attacker injects malicious bytes into a stack buffer, the CPU will fault rather than execute them.
Process isolation: each process's page table only covers its own memory - no entries for other processes' physical pages exist. There is no virtual address in process A that maps to process B's memory.
Kernel protection: kernel pages are marked supervisor-only. User-mode code (your program) cannot read or write kernel memory - any attempt faults immediately.

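W^X Policy	Under W^X (Write XOR Execute), a page is either writable OR executable, never both at once. This blocks the most common code injection attacks - you can write data but can't execute it, and you can execute code but can't modify it at runtime. Modern toolchains (GCC, Clang, rustc) mark the stack and data non-executable by default. Some OSes (e.g. OpenBSD) enforce W^X strictly; Linux by default still lets a program request a writable and executable mapping, so there it is partly a discipline that programs such as JIT compilers follow.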

7. The Full Address Translation Picture - Intel Core i7 / Linux

This is the most important diagram in Ch 9 - how all the pieces work together on a real system. Trace through this carefully.

7.1 The Complete Translation Flow

CPU executes instruction that accesses virtual address VA

│

▼

TLB lookup (cache of VPN→PPN)

├── HIT → PPN from TLB

└── MISS → walk 4-level page table

│ (both paths continue here)

▼

Check valid bit

├── valid=0 → Page Fault handler → load page from disk → update page table → retry instruction

└── valid=1 → Check permissions

    ├── fail → SIGSEGV

    └── ok → PA = PPN:PO → L1 cache

        ├── hit → data

        └── miss → L2 → L3 → RAM

7.2 Linux Virtual Memory Areas (VMAs)

Linux doesn't track memory at the page level in its high-level data structures. Instead it uses Virtual Memory Areas (VMAs) - contiguous regions of the virtual address space with the same permissions and backing store.

Examples of VMAs in a typical process:

Text VMA: 0x400000-0x401000, r-x, backed by the binary on disk
Data VMA: 0x600000-0x601000, rw-, backed by the binary on disk
Heap VMA: 0x... grows upward via brk() or mmap()
Stack VMA: 0x7fff...-0x7fffffffffff, rw-, anonymous (not backed by a file)
Shared library VMAs: one per shared library, mapped into the process's address space

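When a page fault fires, the kernel finds which VMA the faulting address belongs to. If no VMA covers that address, or the access violates the VMA's permissions: SIGSEGV (invalid access). Otherwise: load the page from the VMA's backing store (the file or swap), or supply a fresh zeroed page for anonymous memory that has never been touched.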

8. Memory Mapping - mmap

mmap is the most powerful and important VM-related syscall. It maps a file (or anonymous memory) directly into the process's virtual address space.

8.1 What mmap Does

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

addr = hint for where to place the mapping (usually NULL - let OS choose)

length = how many bytes to map

prot = PROT_READ | PROT_WRITE | PROT_EXEC (permission bits)

flags = MAP_SHARED or MAP_PRIVATE (see below)

fd = file descriptor to map (or -1 for anonymous)

offset = byte offset within the file to start mapping (must be a multiple of the page size)

mmap does NOT read the file into RAM when called. It just creates a VMA entry in the process's address space. Pages are loaded on demand as the process accesses them - via page faults. This is called lazy loading.

8.2 MAP_SHARED vs MAP_PRIVATE

Flag	Writes visible to other processes?	Writes go to disk?	Use case
MAP_SHARED	Yes - all processes mapping the same file see each other's writes	Yes - writes land in the page cache and are written back to the file (msync() forces it)	IPC via shared memory, writing to files efficiently
MAP_PRIVATE	No - each process gets its own copy of modified pages (copy-on-write)	No - writes stay private	Loading shared libraries, read-only file processing

8.3 Anonymous Mappings - How malloc Works

mmap with fd = -1 and MAP_ANONYMOUS creates a mapping not backed by any file - just blank zeroed pages. This is how malloc gets large chunks of memory from the OS:

For small allocations: malloc manages a heap using brk() syscall

For large allocations (>128KB typically): malloc calls mmap(MAP_ANONYMOUS)

When you call free() on a small (brk-heap) block: it goes back on malloc's free list

its pages are NOT immediately returned to the OS

When you free() a large mmap'ed block: malloc calls munmap(), the OS removes the VMA, and the pages are returned to the OS

8.4 Key Use Cases for mmap in Systems Work

File I/O without read()/write(): map the file into the address space and access it like an array. This avoids a copy: read() copies data from the kernel's page cache into a user buffer, whereas a mapping makes the process's page table point straight at the page-cache pages. Used in databases, log systems.
Shared memory IPC: two processes mmap the same file with MAP_SHARED. They can communicate by reading/writing the mapped region. Used by some message queues, caches, game engines.
Shared libraries: the dynamic linker mmaps .so files into every process that uses them. All processes share the same physical pages for the code.
Large allocations: malloc falls back to mmap for large requests, since mmap can return pages to the OS (unlike brk-based heap, which can't shrink if there are allocations above the freed region).

REAL WORLD	Storage engines such as LMDB (and RocksDB, as a configurable option) use mmap for reading their data files. The OS page cache acts as an implicit buffer pool - recently accessed pages stay in RAM automatically, no separate caching layer needed. The tradeoff: you give up control of which pages are in RAM to the OS.

9. Copy-on-Write (COW) - How fork() Is Actually Fast

We touched on this in Ch 8, but now we can explain it precisely.

fork() is called:

1. Kernel creates a new page table for the child

2. Copies the parent's page table entries into the child's page table

3. Marks all private writable pages in BOTH parent and child as read-only (and records that they are copy-on-write)

4. Returns - child and parent now share all physical pages

Later, either process writes to a shared page:

1. Write attempt → protection fault (page is marked read-only)

2. Kernel fault handler sees it's a COW page (not a real protection violation)

3. Kernel allocates a NEW physical page

4. Copies the content of the shared page into the new page

5. Updates the writing process's page table to point to the new page

6. Marks the new page as read-write

7. Re-executes the write instruction - succeeds this time

8. Other process still points to the original page - unaffected

Why this makes fork() fast: no data pages are copied at fork() time, only page tables. For a process with 1GB of heap in 4KB pages, that is 262,144 PTEs × 8 bytes ≈ 2MB of page-table entries instead of 1GB of data, so fork() finishes far faster than a full copy would. Physical pages are only duplicated one by one, on demand, as writes occur.

REAL WORLD	This is why Redis (which uses fork() for background saves / RDB snapshots) can snapshot a multi-GB dataset without copying it up front; fork() itself still has to copy the page tables, so its latency grows with the dataset size. The parent keeps serving requests while the child writes the snapshot. Pages the parent modifies after the fork are duplicated by copy-on-write, but unmodified pages stay shared. Memory usage only grows in proportion to what has been modified since the fork.

10. Dynamic Memory Allocation - How malloc/free Work

The heap is the region of virtual memory used for dynamic allocation (malloc/free in C, Box::new() in Rust, new in Go/Java). The heap grows upward from a base address.

10.1 The Allocator's Job

The allocator manages a chunk of virtual memory (the heap) and satisfies allocation requests by finding free blocks. It must:

Track free blocks: know which parts of the heap are free and which are in use
Find a suitable block: when malloc(n) is called, find a free block of at least n bytes
Handle fragmentation: the heap can become fragmented, so a request may fail even when the total free bytes would be sufficient

10.2 Fragmentation - The Core Problem

Type	What it is	Example	Solution
Internal fragmentation	Allocated block is larger than requested - wasted space inside the block	malloc(5) returns an 8-byte block. 3 bytes wasted inside.	Minimize padding, use size classes
External fragmentation	Total free memory is sufficient but no single free block is large enough	Two free 50-byte blocks but malloc(80) fails	Coalescing adjacent free blocks

10.3 Free Lists - How the Allocator Tracks Free Blocks

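Allocators maintain a data structure tracking free blocks. The simplest is an implicit free list: every block, free or allocated, starts with a header storing its size and status (free/allocated). There are no next pointers; the allocator walks from one block to the next by adding each block's size, so the list of free blocks is implicit in the heap layout.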

Heap layout with implicit free list:

┌────────────┬──────────────┬────────────┬──────────────┐

│ Header(8B) │ Payload(32B) │ Header(8B) │ Payload(16B) │ ...

│ size=40 │ (in use) │ size=24 │ (free) │

│ alloc=1 │ │ alloc=0 │ │

└────────────┴──────────────┴────────────┴──────────────┘

malloc() scans the list for a free block of sufficient size

free() marks the block's header alloc=0, coalesces with neighbors

10.4 Placement Policies

Policy	How it finds a free block	Tradeoff
First fit	Scan from start, return first block that fits	Fast, but fragments the start of the heap
Next fit	Scan from where last search ended	Often faster than first fit, but tends to have worse memory utilization
Best fit	Scan entire list, return smallest block that fits	Usually the best memory utilization, but slow (full scan of a simple free list)

10.5 Coalescing - Merging Adjacent Free Blocks

When a block is freed, the allocator checks if adjacent blocks are also free. If so, it merges them into a single larger free block. Without coalescing, you'd accumulate many small free blocks (false fragmentation) that can't satisfy larger requests even though the total free space is sufficient.

Before free(middle block):

[allocated|8B] [allocated|16B] [free|32B]

After free, before coalescing:

[allocated|8B] [free|16B] [free|32B]

After coalescing:

[allocated|8B] [free|48B] ← merged into one big free block

REAL WORLD

Memory allocator performance matters enormously in high-throughput systems. jemalloc (used by Firefox, Meta's servers) and tcmalloc (used by Google) use size-class segregated free lists and per-thread caches to avoid contention. In Rust, programs use the system allocator by default (jemalloc was the default until Rust 1.32), and you can swap in another one, such as jemalloc, through the #[global_allocator] attribute. Understanding how allocators work explains why allocation patterns (many small allocs vs few large ones, allocation lifetime) affect both performance and memory usage.

11. How Ch 9 Connects to Everything Else

Virtual memory is the foundation that makes everything else in the book possible. Here's how each subsequent chapter builds on it:

Ch 9 Concept	Where it appears later
Page faults (fault exception)	Foundation of lazy loading, mmap, COW. Directly from Ch 8's fault exception type.
mmap	Ch 10 (System I/O) - the page cache and file-backed mappings. Basis for zero-copy I/O.
Address space layout	Ch 10 (I/O) - file descriptors map to kernel objects in a separate address space. Ch 11 (networking) - socket buffers in kernel space.
Process isolation via page tables	Ch 12 (Concurrency) - threads SHARE the same address space (same page table), unlike processes. This is why data races are possible between threads but not between processes that share no memory.
Copy-on-write	Ch 12 - COW is used in some concurrent data structures. Also why fork() in a multi-threaded process is dangerous (the child inherits the parent's memory but only one thread - a classic deadlock trap).
Shared memory / mmap MAP_SHARED	Ch 12 - one form of inter-process communication for concurrent systems. Also used in distributed systems for shared memory message passing.
malloc/free internals	Ch 12 - why malloc has to be thread-safe (it guards its data structures with internal locks) and why lock contention on the global allocator is a real scalability bottleneck in multi-threaded servers.

12. Relevance to Distributed Systems & Backend Work

Ch 9 Concept	Real-world distributed systems relevance
Page faults & working set	Why RAM matters for your service. If your working set (active data) exceeds RAM, you start swapping to disk. A 1ms DB query can become 10ms+ because pages fault in from disk. Understanding this lets you size caches correctly.
mmap for I/O	Databases such as LMDB use mmap to read data files (RocksDB and SQLite offer it as an option). Zero-copy - the OS page cache IS the buffer pool. Tradeoff: OS controls eviction policy, not you.
Copy-on-write fork()	Redis RDB snapshots, some background processing patterns. Fork a process, let it write a snapshot while parent keeps serving. COW means memory isn't doubled - only modified pages are copied.
TLB and huge pages	High-throughput servers with large working sets benefit from 2MB huge pages. Fewer TLB entries needed for same memory → fewer TLB misses → lower latency. Linux transparent huge pages (THP) does this automatically but can cause latency spikes.
Shared libraries	Every service process on your server shares one physical copy of libc, OpenSSL, your framework. Understanding this helps reason about memory usage: 100 worker processes share one copy of the same library code instead of holding 100 copies.
malloc internals	Allocation pressure in hot paths. High allocation rates → allocator lock contention in multi-threaded servers → scalability cliff. Solution: arena allocators, slab allocators, avoid allocation in hot paths entirely.
Address space layout (ASLR)	Security feature: kernel randomizes where code, heap, stack, libraries are placed in the address space. Makes exploits harder because addresses aren't predictable. Enabled by default on Linux/macOS/Windows.

13. Quick Reference - Things to Remember Cold

The fundamental virtual memory facts

Page size: 4KB (4096 bytes) on x86-64. 12-bit page offset.
Virtual address split: VPN (upper bits) + page offset (lower 12 bits)
Translation: PA = PPN (from page table) concatenated with PO (copied unchanged from VA)
Page table: per-process, maps VPN→PPN. Each entry (PTE) has: valid bit, PPN, permission bits, dirty bit
TLB: hardware cache of recent VPN→PPN translations. Makes translation ~free on hits (99%+ of accesses)
x86-64 page table levels: 4 levels. Each table fits in one 4KB page (512 entries × 8 bytes)

Page fault behavior

Valid = 0, address in a VMA: load page from disk/file, update PTE, re-execute instruction
Valid = 0, address NOT in any VMA: SIGSEGV → segfault
Permission violation: SIGSEGV → segfault
COW write: allocate new page, copy, update PTE, re-execute write

mmap flags

MAP_SHARED: writes visible to all, go to file/disk
MAP_PRIVATE: writes private (COW), don't go to disk
MAP_ANONYMOUS: not backed by a file (used by malloc for large allocations)
PROT_READ | PROT_WRITE | PROT_EXEC: permission bits on the mapping

malloc key concepts

Internal fragmentation: waste inside allocated blocks (alignment padding)
External fragmentation: free space exists but not contiguous enough
Coalescing: merge adjacent free blocks on free() to fight external fragmentation
Placement: first fit (simple, fast), best fit (best utilization, slow), next fit (faster scans, often worse utilization)

One-liner summaries

Virtual memory: abstraction giving each process a private address space, backed by physical RAM via MMU translation
Page fault: valid=0 in PTE → OS loads the page, re-executes the instruction. The program never notices, apart from the delay.
COW: fork() copies page table only, marks all pages read-only. First write to a shared page causes a fault → OS copies just that page
mmap: maps a file (or anonymous memory) into the virtual address space. Pages loaded lazily on fault.
TLB: hardware cache of VPN→PPN translations. Makes address translation practically free.
Thrashing: working set > physical RAM → constant page faults → performance collapse

CSAPP Ch 9 Reference • Virtual Memory