How Does a 100 GB Game Run on 16 GB of RAM?
You just downloaded a massive open-world RPG — 120 GB on disk. Your PC has 16 GB of RAM. You hit Play, and somehow it works. No crash. No "not enough memory" error. Just the game running.
This is not a coincidence. This is one of the most beautifully engineered illusions in computer science — and today we are going to pull back the curtain on every layer of it.
The Chef's Counter
Before we go technical, let's build the right picture in your head.
Imagine your computer is a professional kitchen. Your 120 GB game is a giant grocery warehouse out back. Your 16 GB of RAM is the chef's counter — the small active workspace where actual cooking happens.
The chef never drags the entire warehouse onto the counter. She brings only what she needs for the dish she is making right now. The moment she is done with the swamp-biome ingredients, she sends them back and fetches the snowy-mountain ingredients.
You never notice the swap — unless the kitchen gets overwhelmed and your dish goes cold while she scrambles for ingredients. That cold dish? That is the stutter you feel in RAM-starved games.
RAM is your counter. Storage is your warehouse. The OS is the chef.
Everything we cover below is just the engineering that makes this chef extraordinarily fast and smart.
1. Virtual Address Space — Every Process Lives in Its Own Dream
When your game runs, it genuinely believes it owns all the memory in the machine. It does not.
What it has is a virtual address space — a private illusion created by the operating system and the CPU together. Each process gets its own clean, flat map of memory addresses. On a 64-bit system this map is enormous — hundreds of gigabytes of addressable space — even if your physical RAM is just 16 GB.
Why does this illusion matter?
- Isolation — Your game cannot accidentally read or corrupt another process's memory. Each lives in its own bubble.
- Overcommit — A process can claim far more address space than physically exists, because not all of it needs to live in RAM at once.
- Simplicity — The game's code can reference assets at predictable addresses without caring where they physically sit in RAM, or whether they are even loaded yet.
Think of virtual addresses like seat numbers in a stadium. Your ticket says "Section C, Row 4, Seat 12" — but behind the scenes, the stadium manager can rearrange the actual chairs however they want, as long as you end up sitting down when you arrive.
2. Pages and Frames — The Unit of Currency
The OS does not manage memory one byte at a time — that would be impossibly slow. Instead, it cuts everything into fixed-size chunks.
- A page is a 4 KB chunk of virtual address space.
- A frame is a 4 KB chunk of physical RAM.
The OS maps pages to frames. That mapping — which virtual page lives in which physical frame right now — is the core bookkeeping job of the memory system.
4 KB is a deliberate sweet spot: small enough to be granular (you only load what you need), large enough that the bookkeeping overhead stays manageable.
A page is a seat number. A frame is an actual chair. The OS decides which number gets which chair.
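The page/offset split falls straight out of the 4 KB size. A minimal Python sketch (the address `0x40001234` is just an illustrative value):

```python
PAGE_SIZE = 4096  # 4 KB pages, so the offset occupies the low 12 bits

def split_address(vaddr: int) -> tuple[int, int]:
    """Return (virtual page number, offset within the page)."""
    return vaddr // PAGE_SIZE, vaddr % PAGE_SIZE

vpn, offset = split_address(0x40001234)
print(hex(vpn), hex(offset))  # 0x40001 0x234
```

Because 4096 is a power of two, the division and modulo are just bit shifts and masks in hardware — the address itself already contains the page number.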
3. The Page Table — The Address Book
For every running process, the OS maintains a data structure called a page table. It is the address book that translates every virtual page number to a physical frame number.
But here is the problem: if you have a 512 GB virtual address space and every page is 4 KB, you would need over 134 million entries in that address book. Storing that flat table for every process would consume gigabytes of RAM just for bookkeeping — before any actual program data.
The solution is a hierarchical page table — a tree structure where branches that are unused simply do not exist.
4. The Three-Level Page Table — A Map of Maps of Maps
The page table is organized as a three-level tree. Each level is a table of 512 entries. You use parts of the virtual address to index into each level, drilling down until you reach the physical frame.
Here is how a virtual address is split:
Bits: [38 ──── 30] [29 ──── 21] [20 ──── 12] [11 ──── 0]
──────────── ──────────── ──────────── ────────────
L2 index L1 index L0 index page offset
9 bits 9 bits 9 bits 12 bits
Each 9-bit index can address 2⁹ = 512 entries. Three levels of 512 entries give 512 × 512 × 512 ≈ 134 million pages — enough to cover the full address space, without ever allocating the parts you do not use.
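The bit split above is pure shifting and masking. A small Python sketch of the same layout:

```python
def split_sv39(vaddr: int) -> tuple[int, int, int, int]:
    """Split a virtual address into (L2 index, L1 index, L0 index, offset)
    following the bit layout shown above."""
    return (
        (vaddr >> 30) & 0x1FF,  # bits [38:30] — L2 index
        (vaddr >> 21) & 0x1FF,  # bits [29:21] — L1 index
        (vaddr >> 12) & 0x1FF,  # bits [20:12] — L0 index
        vaddr & 0xFFF,          # bits [11:0]  — page offset
    )

print(split_sv39(0x40001000))  # (1, 0, 1, 0)
```

Each 9-bit field is exactly one table index, and the 12-bit offset never gets translated at all — it passes straight through to the physical address.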
5. The Diagram — How the Walk Actually Happens
This is the heart of it. Every time your game reads or writes a memory address, the CPU does this walk:
Virtual Address (39 bits used)
┌─────────────┬───────────┬───────────┬───────────┬──────────────┐
│ unused │ L2 idx │ L1 idx │ L0 idx │ page offset │
│ (25 bits) │ (9 bits) │ (9 bits) │ (9 bits) │ (12 bits) │
└─────────────┴─────┬─────┴─────┬─────┴─────┬─────┴──────────────┘
│ │ │
│ │ │
satp ─────────► L2 Table │ │
┌──────────┐ │ │
│ ... │ │ │
│ entry ──┼────► L1 Table │
│ ... │ ┌──────────┐ │
└──────────┘ │ ... │ │
│ entry ──┼─► L0 Table
│ ... │ ┌──────────┐
└──────────┘ │ ... │
│ entry ──┼──► Physical Frame
│ ... │ + page offset
└──────────┘ │
▼
Physical Address
Let's walk through each step:
Step 1 — Start at the root.
The CPU reads a special register (called satp on RISC-V, CR3 on x86) that holds the physical address of the top-level table (L2) for the currently running process. Every process has its own L2 root — this is how each process gets its private address space.
Step 2 — Index into L2 using bits [38:30] of the virtual address.
This selects one of 512 entries in the L2 table. That entry contains the physical address of the L1 table beneath it.
Step 3 — Index into L1 using bits [29:21].
Same process — select an entry, get the physical address of the L0 table.
Step 4 — Index into L0 using bits [20:12].
This gives you the final entry — the actual Page Table Entry (PTE) — which contains the Physical Page Number (PPN) of the real data frame sitting in RAM.
Step 5 — Form the physical address.
Combine the PPN from the PTE with the page offset (bits [11:0] of the original virtual address, copied unchanged):
Physical Address = (PPN << 12) | page offset (VA[11:0])
If any entry along the way is marked not valid — meaning that page is not currently in RAM — the CPU raises a page fault and hands control to the OS to fix it. We will get to that soon.
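The five steps above can be sketched as a toy walk in Python, with nested dicts standing in for the three 512-entry tables (the frame number `0x1234` is an arbitrary example):

```python
class PageFault(Exception):
    """Raised when an entry along the walk is missing (V = 0)."""

def walk(root: dict, vaddr: int) -> int:
    """Walk a toy three-level page table and return the physical address."""
    l2 = (vaddr >> 30) & 0x1FF
    l1 = (vaddr >> 21) & 0x1FF
    l0 = (vaddr >> 12) & 0x1FF
    l1_table = root.get(l2)                            # Step 2: index L2
    l0_table = l1_table.get(l1) if l1_table else None  # Step 3: index L1
    ppn = l0_table.get(l0) if l0_table else None       # Step 4: index L0
    if ppn is None:
        raise PageFault(hex(vaddr))     # hand control to the OS handler
    return (ppn << 12) | (vaddr & 0xFFF)  # Step 5: PPN + page offset

# One mapping: L2 index 1 → L1 index 0 → L0 index 1 → frame 0x1234
root = {1: {0: {1: 0x1234}}}
print(hex(walk(root, 0x40001005)))  # 0x1234005
```

Note how sparse the structure is: only the tables along the one mapped path exist, everything else is simply absent — exactly the property the real tree exploits.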
6. What's Inside a Page Table Entry?
Every leaf entry (L0) is 64 bits wide and carries two things: where the page lives, and what you are allowed to do with it.
63 54 53 10 9 0
┌──────────┬──────────────────────────┬─────────────┐
│ reserved │ PPN (44 bits) │ flags │
│ 10 bits │ Physical Page Number │ 10 bits │
└──────────┴──────────────────────────┴─────────────┘
The flags are what make memory protection and lazy loading possible:
| Flag | Name | Meaning |
|---|---|---|
| V | Valid | Is this page in RAM right now? If 0, any access triggers a page fault. |
| R | Read | The process may read this page. |
| W | Write | The process may write to this page. If 0 on a write → fault (used by Copy-on-Write). |
| X | Execute | The CPU may execute instructions from this page. |
| U | User | User-mode code may access this page. |
| A | Accessed | CPU sets this automatically on any read or write. Used by page replacement to find cold pages. |
| D | Dirty | CPU sets this on any write. A dirty page must be saved before eviction. |
The V flag is the master switch. When the OS wants to evict a page from RAM, it clears V to 0. The next access triggers a fault. The OS loads the page back, sets V = 1, and the program resumes — never knowing anything happened.
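Packing and testing these bits is plain integer arithmetic. A sketch assuming the RISC-V-style bit positions shown above (flags in bits [9:0], with A and D at bits 6 and 7; PPN in bits [53:10]):

```python
# Flag bit positions, mirroring the PTE layout above
V, R, W, X, U = 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4
A, D = 1 << 6, 1 << 7

def make_pte(ppn: int, flags: int) -> int:
    """Pack a Physical Page Number and flag bits into one 64-bit PTE."""
    return (ppn << 10) | flags

def pte_ppn(pte: int) -> int:
    """Extract the 44-bit Physical Page Number."""
    return (pte >> 10) & ((1 << 44) - 1)

pte = make_pte(0x1234, V | R | W)
assert pte & V            # resident in RAM
assert not (pte & D)      # nothing written yet
assert pte_ppn(pte) == 0x1234
evicted = pte & ~V        # the OS clears V on eviction...
assert not (evicted & V)  # ...so the next access will fault
```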
7. Why Three Levels? The Sparse Tree Advantage
A flat single-level table for a 512 GB address space would need 134 million entries × 8 bytes = ~1 GB per process, just for the address book. Completely impractical.
With three levels, a process that only uses a small part of its address space needs almost nothing:
1 × L2 table (always required — one page, 4 KB)
1 × L1 table (only for the one L2 entry that is valid)
1 × L0 table (only for the one L1 entry that is valid)
─────────────────────────────────────────────────────────
3 pages = 12 KB total (instead of 1 GB)
All other L2 entries have V = 0 — no subtable exists beneath them, and the OS never allocates one. The tree is sparse by design.
The snowy mountain region of your game has no L1 subtable until you walk toward it. The moment you approach, the OS builds it on the fly.
8. Demand Paging — The Lazy Loader
This is the core trick that lets a 120 GB game run in 16 GB of RAM.
When your game launches, the OS does not load anything into RAM. It creates the virtual address space, builds a skeleton page table with every V flag set to 0, and returns control to the game immediately. Zero bytes of actual game content are in RAM yet.
The first time the game tries to access a texture or a piece of code, the CPU walks the page table, finds V = 0, and raises a page fault. The OS handler reads that one 4 KB page from disk into a free physical frame, updates the PTE with V = 1 and the correct PPN, then resumes the game from the exact instruction that faulted.
Your game never knew any of this happened. To it, the memory was just... there.
Game accesses address 0x40001000
│
▼
CPU walks page table
│
├── V = 1 ? ──► translation completes ──► done ✓
│
└── V = 0 ? ──► PAGE FAULT
│
▼
OS page fault handler
│
1. Find a free physical frame
2. Read the page from disk / swap
3. Update PTE: set PPN + V = 1
4. Invalidate TLB entry
│
▼
Resume game at the same instruction
(now succeeds) ✓
This is called demand paging — pages are loaded on demand, only when first touched.
9. The TLB — The CPU's Cheat Sheet
Walking three table levels on every single memory access would require three extra RAM reads before touching the actual data you wanted. For a CPU doing billions of operations per second, this would be catastrophic.
The CPU solves this with a tiny, blazingly fast hardware cache called the Translation Lookaside Buffer (TLB). It stores recent virtual-to-physical translations so the full three-level walk only happens on a miss.
CPU wants to access virtual address VA
│
▼
Check TLB
│
┌─────┴──────┐
HIT MISS
│ │
▼ ▼
PA ready Walk all 3 table levels
(~1 cycle) Load result into TLB
(~10–100 cycles)
│
▼ (or page fault if V = 0)
PA ready
In a healthy running game, TLB hit rates are above 99%. The full walk is the exception, not the rule.
When the OS evicts a page and clears V = 0 in the PTE, it also invalidates the TLB entry — otherwise the CPU would keep using the stale cached translation. On a multicore machine this means sending interrupts to every CPU core to flush their individual TLBs simultaneously. This is called a TLB shootdown and it is one of the most expensive operations a kernel ever performs.
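The TLB's behaviour can be modelled as a small cache. A toy sketch — real TLBs are set-associative hardware, but the hit/miss/invalidate logic is the same idea:

```python
from collections import OrderedDict

class TLB:
    """Toy fully-associative TLB with LRU eviction."""
    def __init__(self, capacity: int = 64):
        self.entries: OrderedDict[int, int] = OrderedDict()  # VPN → PPN
        self.capacity = capacity
        self.hits = self.misses = 0

    def translate(self, vpn: int, walk_fn) -> int:
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # refresh LRU position
            return self.entries[vpn]
        self.misses += 1
        ppn = walk_fn(vpn)                     # full three-level walk
        self.entries[vpn] = ppn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the coldest entry
        return ppn

    def invalidate(self, vpn: int) -> None:
        """One core's share of a TLB shootdown."""
        self.entries.pop(vpn, None)

tlb = TLB(capacity=2)
page_table = {0x40001: 0x111, 0x40002: 0x222}
tlb.translate(0x40001, page_table.__getitem__)  # miss → walk
tlb.translate(0x40001, page_table.__getitem__)  # hit → cheap
assert (tlb.hits, tlb.misses) == (1, 1)
```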
10. Page Replacement — Who Gets Evicted?
When RAM is full and a new page must be loaded, the OS must evict an existing page to make room. Choosing which page is the page replacement problem.
The goal is to evict the page you are least likely to need soon. The most common real-world approach is an approximation of LRU (Least Recently Used) — evict whichever page has not been touched in the longest time.
This is where the A (Accessed) flag in the PTE becomes critical. The CPU sets it automatically on every read or write. The OS periodically scans PTEs, and pages whose A flag has stayed 0 are candidates for eviction.
RAM is full. New page needed.
│
▼
Scan PTEs for cold pages (A = 0 for a long time)
│
▼
Select a victim page
│
┌────┴────┐
D = 0 D = 1
(clean) (dirty)
│ │
▼ ▼
Discard Write to swap / disk first,
it freely then discard
│ │
└──────┬────────┘
▼
Load the new page into the freed frame
Update PTE → V = 1
Clean pages (D = 0) — pages that were only read, never written — can be discarded for free. They can always be reloaded from the original file. Dirty pages (D = 1) must be written to swap space first, which takes time and is why running out of RAM causes slowdowns rather than instant crashes.
Thrashing is what happens when RAM is so overwhelmed that the OS spends more time swapping pages in and out than actually running your game. The disk light stays permanently on, the game freezes, and the CPU is maxed out doing nothing useful.
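The A-flag scan described above is essentially the classic clock (second-chance) algorithm, a common LRU approximation. A minimal sketch, with dicts standing in for PTEs:

```python
def second_chance(pages: list[dict]) -> dict:
    """One sweep of the clock algorithm. Each page dict carries the
    hardware-set 'A' bit; a recently-used page gets A cleared and
    survives one pass, and the first page found with A == 0 is evicted."""
    i = 0
    while True:
        page = pages[i % len(pages)]
        if page["A"]:
            page["A"] = 0   # second chance: cold unless touched again
        else:
            return page     # not accessed since the last sweep → victim
        i += 1

pages = [{"id": 0, "A": 1}, {"id": 1, "A": 0}, {"id": 2, "A": 1}]
victim = second_chance(pages)
assert victim["id"] == 1      # the one cold page is chosen
assert pages[0]["A"] == 0     # page 0 spent its second chance
```

A real kernel would additionally check the victim's D flag and write a dirty page out to swap before reusing its frame, exactly as the diagram above shows.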
11. Copy-on-Write — Sharing Without Spoiling
Here is a beautiful optimization. When the OS creates a copy of a process — like when your game engine spawns a child process to write a save file — naively copying every page in RAM would be incredibly slow.
Instead, the OS uses Copy-on-Write (CoW):
Before any write — both processes share the same physical frames:
Process A's PTE ────────────────────► Physical Frame X
Process B's PTE ────────────────────► Physical Frame X
(shared, both W = 0)
Process B tries to write to this page:
│
▼
PAGE FAULT (W = 0)
│
▼
OS allocates a new Frame Y
Copies the content of Frame X into Frame Y
Updates Process B's PTE → Frame Y, W = 1
After the write:
Process A's PTE ────────────────────► Physical Frame X (untouched)
Process B's PTE ────────────────────► Physical Frame Y (private copy)
As long as both processes only read, they share the same frames — no copying needed. The copy happens lazily, one page at a time, only for pages that are actually modified.
This means forking a process is nearly instant, regardless of how much memory it uses. It is also how shared libraries work — every process that loads libc shares the same physical frames for its code pages, saving potentially hundreds of MB when many processes run side by side.
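The fault-then-copy dance can be simulated with refcounted frames. A toy sketch (frame names like `"X"` and the byte contents are illustrative):

```python
frames = {"X": bytearray(b"save data")}  # physical frames by name
refcount = {"X": 2}                      # two processes map frame X
pte_a = {"frame": "X", "W": 0}           # both mappings write-protected
pte_b = {"frame": "X", "W": 0}

def write(pte: dict, offset: int, value: int) -> None:
    """Write through a PTE; a write to a W = 0 shared frame 'faults'
    and performs the lazy copy."""
    frame = pte["frame"]
    if pte["W"] == 0:                    # page fault: Copy-on-Write
        if refcount[frame] > 1:
            new = frame + "'"            # allocate a private frame
            frames[new] = bytearray(frames[frame])
            refcount[frame] -= 1
            refcount[new] = 1
            pte["frame"] = new
        pte["W"] = 1                     # now privately writable
    frames[pte["frame"]][offset] = value

write(pte_b, 0, ord("S"))
assert frames["X"] == bytearray(b"save data")   # A's view untouched
assert frames["X'"] == bytearray(b"Save data")  # B's private copy
```

If the refcount is already 1, no copy is needed at all — the faulting process just gets the write bit back on the frame it already owns.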
12. Page Migration — Moving Pages Transparently
Sometimes the OS needs to physically move a page from one location in RAM to another — without the running process noticing at all. This is called page migration.
Why would it ever need to do this?
- Memory compaction — Free frames are scattered across RAM (fragmentation). To allocate a large contiguous block for device DMA or huge pages, the OS migrates pages to pack free space together.
- NUMA optimization — On multi-socket servers, RAM physically closer to a CPU core is faster. The OS migrates pages to the memory bank nearest the thread that uses them most.
- Memory hotplug — In cloud VMs, RAM can be added or removed while the machine runs. Pages must be migrated off a DIMM before it is unplugged.
The migration process:
1. Isolate the source page
(unlink it from LRU lists, pin it in place)
│
▼
2. Allocate a destination frame
│
▼
3. Copy page content
source frame ──────────────────► destination frame
│
▼
4. Atomically update all PTEs
that pointed to source → now point to destination
│
▼
5. TLB shootdown
(invalidate stale translations on all CPU cores)
│
▼
6. Free the source frame
The process that owns this page never noticed a thing.
The key guarantee is atomicity at the PTE level — the moment of switching old frame to new frame is instantaneous. Any access either sees the old location or the new one, never a half-migrated state.
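Steps 2–6 of the migration can be sketched in miniature (frame names `"X"`/`"Y"` and the `shootdown` callback are illustrative stand-ins):

```python
def migrate(frames: dict, ptes: list, src: str, dst: str, shootdown) -> None:
    """Copy a page's content to a new frame, repoint every PTE that
    referenced the old one, invalidate stale translations, and free
    the source — the process never sees an intermediate state."""
    frames[dst] = frames[src]          # copy into the destination frame
    for pte in ptes:
        if pte["frame"] == src:
            pte["frame"] = dst         # the atomic repoint, per PTE
            shootdown(pte)             # TLB shootdown for that mapping
    del frames[src]                    # the source frame is free again

frames = {"X": b"page data"}
ptes = [{"frame": "X"}, {"frame": "X"}]
migrate(frames, ptes, "X", "Y", lambda pte: None)
assert "X" not in frames and frames["Y"] == b"page data"
assert all(p["frame"] == "Y") if False else all(p["frame"] == "Y" for p in ptes)
```

The real kernel also has to handle writes that race with the copy (step 1's isolation and pinning exist precisely for that), which this sketch glosses over.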
13. Putting It All Together
Let's trace the full journey one more time, from you pressing Play to your game running smoothly.
You launch the game
│
▼
OS creates the virtual address space
Builds a skeleton page table — all V = 0
No RAM actually allocated yet
│
▼
CPU executes first instruction → PAGE FAULT
OS loads the first pages of code from disk
Game begins
│
▼
You run through the game world
Each new area triggers page faults
OS loads pages on demand (demand paging)
TLB caches translations for hot pages
│
▼
RAM fills up
OS scans A / D flags, finds cold pages
Evicts them — dirty ones go to swap first
Makes room for new pages
│
▼
You revisit an old area
Those pages were evicted → page fault again
OS reloads from disk / swap → game continues
│
▼
You save the game
OS forks a child process (Copy-on-Write)
Child writes the save file
Only the pages the child actually modifies get copied
Parent process continues untouched
│
▼
The game runs smoothly in 16 GB RAM
Even though the world is 120 GB on disk
The entire illusion is maintained by five interlocking systems — virtual address spaces, page tables, demand paging, the TLB, and the page fault handler — working in concert at hardware speed, completely invisible to your game.
Quick Reference
| Concept | What it does |
|---|---|
| Virtual address space | Gives every process its own private, flat map of memory |
| Page / Frame | 4 KB chunk of virtual memory / physical RAM |
| Page table (3-level tree) | Maps virtual page numbers to physical frame numbers |
| PTE | One leaf entry — holds the Physical Page Number + permission flags |
| V flag | Master switch — 0 means "not in RAM", triggers a page fault on access |
| Demand paging | Load pages from disk only when first accessed, not at launch |
| Page fault | CPU's signal that a page is missing or a permission was violated |
| TLB | Hardware cache of recent VA → PA translations — avoids re-walking the tree |
| TLB shootdown | Invalidating TLB entries across all CPU cores after a PTE change |
| Page replacement | Choosing which page to evict when RAM is full (LRU approximation) |
| Dirty page (D = 1) | A written page — must be saved to swap before eviction |
| Copy-on-Write | Share physical frames between processes until one writes — then copy just that page |
| Page migration | Physically move a page to a different frame, transparently, while the process runs |
The next time your game pauses for half a second loading a new area, you know exactly what is happening: the chef is running to the warehouse.