
Athreya aka Maneshwar

Cache Organization: How SQLite Actually Holds Pages in Memory

Hello, I'm Maneshwar. I'm currently building FreeDevTools: "one place for all dev tools, cheat codes, and TLDRs", a free, open-source hub where developers can quickly find and use tools without the hassle of searching all over the internet.

So far, we’ve talked about the pager as a transaction manager and a gatekeeper.
Today we zoom in further, into the physical organization of the page cache itself: the place where database pages actually live while SQLite is working.

This is the level where “a page” stops being an abstract idea and becomes a real chunk of memory with structure, metadata, and rules.

One Pager, One Cache, One Manager

Every pager owns exactly one page cache, and that cache is managed through a handler object called PCache.

The pager doesn’t directly manage cache internals. Instead, it holds a reference to a PCache object and talks to it through a well-defined interface. This indirection exists for a reason: SQLite supports pluggable cache implementations.

If a user supplies their own cache module, SQLite will use it.
If not, SQLite falls back to its default implementation (pcache1.c), which is what we’re effectively studying here.

Inside the PCache object, there’s a pointer (pCache) that refers to the actual cache engine being used. From the pager’s point of view, the cache is a black box: fast, predictable, and replaceable.
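This indirection is easiest to see in miniature. Below is a toy Python model of the idea, not SQLite’s actual C code (the real pluggable interface is the `sqlite3_pcache_methods2` struct of function pointers, and the class and method names here are invented for illustration):

```python
# Toy model of the pager-to-cache indirection. Names like DefaultCache,
# fetch, and get_page are hypothetical; only the shape of the design
# (pager -> pCache pointer -> interchangeable engine) mirrors SQLite.

class DefaultCache:
    """Stands in for the built-in implementation (pcache1.c)."""
    def __init__(self):
        self._pages = {}

    def fetch(self, pgno):
        # Return the cached page, creating a zeroed one on first request.
        return self._pages.setdefault(pgno, bytearray(4096))

class Pager:
    """The pager never touches cache internals; it only calls the interface."""
    def __init__(self, cache=None):
        # pCache: a user-supplied cache module if one is given,
        # otherwise fall back to the default implementation.
        self.pCache = cache if cache is not None else DefaultCache()

    def get_page(self, pgno):
        return self.pCache.fetch(pgno)

pager = Pager()          # no custom module supplied: default is used
page = pager.get_page(1) # the pager neither knows nor cares which engine ran
```

Swapping in a different cache engine only requires another object with the same `fetch` method; the pager code is untouched.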

How Pages Are Organized: Fully Associative, Hash-Based

Searching a cache efficiently is critical, so SQLite doesn’t store pages in a flat list.

Instead, cached pages are stored in slots, indexed through a hash table.

The hash table starts empty. As pages are requested, slots are created and inserted into the table. The total number of slots is bounded by PCache.nMax.

By default:

  • 2000 pages for the main and attached databases
  • 500 pages for the temp database
  • Unlimited for in-memory databases (bounded only by available address space)

This design keeps cache lookups fast while preventing unbounded memory growth.
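A minimal sketch of that bounded, hash-indexed design (illustrative Python, not SQLite’s C; the bucket count is arbitrary, and `nMax` here mirrors the PCache.nMax cap described above):

```python
# Bounded, hash-indexed page cache sketch. N_HASH is an arbitrary choice;
# N_MAX mirrors the default 2000-slot cap quoted in the text.

N_HASH = 256
N_MAX = 2000

class BoundedCache:
    def __init__(self, n_max=N_MAX):
        # apHash: each entry is a bucket holding (pgno, slot) pairs.
        self.apHash = [[] for _ in range(N_HASH)]
        self.n_max = n_max
        self.n_slots = 0

    def fetch(self, pgno):
        bucket = self.apHash[pgno % N_HASH]
        for no, slot in bucket:          # scan one short bucket, not the cache
            if no == pgno:
                return slot
        if self.n_slots >= self.n_max:
            return None                  # at capacity: caller must recycle first
        slot = bytearray(4096)           # new slot, created on demand
        bucket.append((pgno, slot))
        self.n_slots += 1
        return slot
```

A lookup touches only one bucket, so the cost stays near-constant even with thousands of cached pages, and `n_max` keeps the slot count from growing without bound.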


PgHdr: The Pager’s View of a Page

Each cached page is represented by a PgHdr object.

This object is only visible to the pager, not to the tree module or higher layers. The tree module sees only raw page memory.

A PgHdr carries everything the pager needs to enforce correctness:

  • pgno → which database page this is
  • dirty → whether it has been modified
  • needSync → whether the journal must be flushed before writing it back
  • nRef → reference count (pinning)
  • pDirtyNext / pDirtyPrev → links in the dirty-page list

If nRef > 0, the page is pinned — actively in use and untouchable.
If nRef == 0, the page is unpinned and eligible for reuse.
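The fields above and the pinning rule can be condensed into a small sketch (a toy Python class, not the real C struct; the method names are invented):

```python
# Toy PgHdr: just the bookkeeping fields named above, plus the pinning rule.

class PgHdr:
    def __init__(self, pgno):
        self.pgno = pgno         # which database page this is
        self.dirty = False       # has it been modified?
        self.needSync = False    # must the journal be flushed before writeback?
        self.nRef = 0            # reference count (pinning)
        self.pDirtyNext = None   # links in the dirty-page list
        self.pDirtyPrev = None

    def pin(self):
        self.nRef += 1

    def unpin(self):
        assert self.nRef > 0
        self.nRef -= 1

    def is_recyclable(self):
        # Only a page with no active references may be evicted or reused.
        return self.nRef == 0
```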


Inside SQLite’s Default Cache (PCache1)

SQLite’s built-in cache (PCache1) adds its own layer.

Each hash-table slot is represented by a PgHdr1 object. The slot layout in memory looks roughly like this:

```
[ PgHdr | page image | private space | (optional recovery pointers) ]
```
  • The page image holds the raw database page
  • The private space is used by the tree module for per-page in-memory state
  • For in-memory databases, recovery metadata is also stored here

This entire block is zero-initialized when a page enters the cache, ensuring no stale state leaks across page reuse.

All slots are reachable through the PCache1.apHash array, where each entry points to a bucket implemented as an unordered singly linked list.

Simple. Fast. No clever tricks.

Cache Groups: Sharing Memory Under Pressure

SQLite also supports an optional cache group mode.

When enabled:

  • Multiple PCache1 instances are placed into a group
  • Unpinned pages from one cache can be recycled by another
  • Memory pressure is handled globally instead of per-cache

This is particularly useful when many connections exist in the same process and memory is tight.

Pinned pages are still sacred: only unpinned slots are shared.
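Group-wide recycling can be sketched like this (a deliberately simplified Python model with invented names; here the cap is treated as one shared budget for the whole group, which is a simplification of the real accounting):

```python
# Sketch of cache-group recycling: a full cache may steal an *unpinned*
# slot from any sibling in its group. Pinned slots are never touched.

class Group:
    def __init__(self):
        self.caches = []

class GroupedCache:
    def __init__(self, group, n_max):
        self.group = group
        self.n_max = n_max       # treated as a shared group budget here
        self.slots = {}          # pgno -> pinned? flag
        group.caches.append(self)

    def steal_unpinned(self):
        """Drop one unpinned slot from any cache in the group."""
        for cache in self.group.caches:
            for pgno, pinned in cache.slots.items():
                if not pinned:
                    del cache.slots[pgno]
                    return True
        return False             # every slot in the group is pinned

    def fetch(self, pgno, pinned=True):
        total = sum(len(c.slots) for c in self.group.caches)
        if pgno not in self.slots and total >= self.n_max:
            if not self.steal_unpinned():
                return None      # memory pressure and nothing recyclable
        self.slots[pgno] = pinned
        return pgno
```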

Cache Reads: Content-Addressed, Not Location-Based

A cache is not an array you index into.

Clients never know:

  • Where a page lives
  • Which slot it occupies
  • Whether it was recently evicted

The cache is content-addressed.

Pages are requested by page number, not by memory address.

When the tree module wants a page, it calls sqlite3PagerGet(P) — and the pager does the rest.

What Happens When a Page Is Read

SQLite follows a strict fetch-on-demand policy.
No speculative reads. No read-ahead. No clever guessing.

Pin, Use, Unpin: The Core Contract

Once the pager returns a page:

  • The pager does not track how long it’s used
  • The tree module owns it temporarily

The contract is simple:

  1. Pager pins the page
  2. Client uses the page
  3. Client calls sqlite3PagerUnref
  4. Page becomes recyclable again

Pinned pages cannot be evicted.
To avoid deadlock-by-design, SQLite enforces a minimum cache size (10 pages as of SQLite 3.7.8) so there’s always at least some room to maneuver.
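The four-step contract can be sketched as follows (hedged Python stand-ins: `pager_get` and `pager_unref` are hypothetical helpers modeling SQLite’s C entry points `sqlite3PagerGet` and `sqlite3PagerUnref`):

```python
# Pin / use / unpin, modeled with hypothetical pager_get / pager_unref
# helpers. Only the reference-counting contract mirrors SQLite.

class Page:
    def __init__(self, pgno):
        self.pgno, self.nRef, self.data = pgno, 0, bytearray(4096)

_cache = {}

def pager_get(pgno):
    page = _cache.setdefault(pgno, Page(pgno))
    page.nRef += 1            # 1. the pager pins the page
    return page

def pager_unref(page):
    page.nRef -= 1            # 3. the client releases its reference

page = pager_get(7)
page.data[0] = 0x0D           # 2. the client uses the page while pinned
pager_unref(page)
assert page.nRef == 0         # 4. the page is recyclable again
```

While `nRef > 0` the replacement machinery must skip the page; the pager itself never tracks how long the client holds it.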

What’s Next

Now that pages can be fetched safely, the next logical question is:

What happens when they change?

In the next post, we’ll cover:

  • Cache update rules
  • How dirty pages are tracked
  • Replacement policies
  • And how SQLite balances performance with safety

That’s where caching stops being passive and starts shaping execution.

My experiments and hands-on executions related to SQLite will live here: lovestaco/sqlite

References:

Sibsankar Haldar, SQLite Database System: Design and Implementation.

👉 Check out: FreeDevTools

Any feedback or contributors are welcome!

It’s online, open-source, and ready for anyone to use.

⭐ Star it on GitHub: freedevtools
