
Athreya aka Maneshwar

Cache Organization: How SQLite Actually Holds Pages in Memory

Hello, I'm Maneshwar. I'm currently building FreeDevTools: "one place for all dev tools, cheat codes, and TLDRs", a free, open-source hub where developers can quickly find and use tools without the hassle of searching all over the internet.

So far, we’ve talked about the pager as a transaction manager and a gatekeeper.
Today we zoom in further, into the physical organization of the page cache itself: the place where database pages actually live while SQLite is working.

This is the level where “a page” stops being an abstract idea and becomes a real chunk of memory with structure, metadata, and rules.

One Pager, One Cache, One Manager

Every pager owns exactly one page cache, and that cache is managed through a handler object called PCache.

The pager doesn’t directly manage cache internals. Instead, it holds a reference to a PCache object and talks to it through a well-defined interface. This indirection exists for a reason: SQLite supports pluggable cache implementations.

If a user supplies their own cache module, SQLite will use it.
If not, SQLite falls back to its default implementation (pcache1.c), which is what we’re effectively studying here.

Inside the PCache object, there’s a pointer (pCache) that refers to the actual cache engine being used. From the pager’s point of view, the cache is a black box: fast, predictable, and replaceable.
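This indirection is easiest to see in miniature. Below is a toy Python model of the idea, not SQLite’s actual C code (the real pluggable interface is the `sqlite3_pcache_methods2` struct of function pointers, and the class and method names here are invented for illustration):

```python
# Toy model of the pager-to-cache indirection. Names like DefaultCache,
# fetch, and get_page are hypothetical; only the shape of the design
# (pager -> pCache pointer -> interchangeable engine) mirrors SQLite.

class DefaultCache:
    """Stands in for the built-in implementation (pcache1.c)."""
    def __init__(self):
        self._pages = {}

    def fetch(self, pgno):
        # Return the cached page, creating a zeroed one on first request.
        return self._pages.setdefault(pgno, bytearray(4096))

class Pager:
    """The pager never touches cache internals; it only calls the interface."""
    def __init__(self, cache=None):
        # pCache: a user-supplied cache module if one is given,
        # otherwise fall back to the default implementation.
        self.pCache = cache if cache is not None else DefaultCache()

    def get_page(self, pgno):
        return self.pCache.fetch(pgno)

pager = Pager()          # no custom module supplied: default is used
page = pager.get_page(1) # the pager neither knows nor cares which engine ran
```

Swapping in a different cache engine only requires another object with the same `fetch` method; the pager code is untouched.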

How Pages Are Organized: Fully Associative, Hash-Based

Searching a cache efficiently is critical, so SQLite doesn’t store pages in a flat list.

Instead, cached pages are stored in slots, indexed through a hash table.

The hash table starts empty. As pages are requested, slots are created and inserted into the table. The total number of slots is bounded by PCache.nMax.

By default:

  • 2000 pages for the main and attached databases
  • 500 pages for the temp database
  • Unlimited for in-memory databases (bounded only by available address space)

This design keeps cache lookups fast while preventing unbounded memory growth.
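A minimal sketch of that bounded, hash-indexed design (illustrative Python, not SQLite’s C; the bucket count is arbitrary, and `nMax` here mirrors the PCache.nMax cap described above):

```python
# Bounded, hash-indexed page cache sketch. N_HASH is an arbitrary choice;
# N_MAX mirrors the default 2000-slot cap quoted in the text.

N_HASH = 256
N_MAX = 2000

class BoundedCache:
    def __init__(self, n_max=N_MAX):
        # apHash: each entry is a bucket holding (pgno, slot) pairs.
        self.apHash = [[] for _ in range(N_HASH)]
        self.n_max = n_max
        self.n_slots = 0

    def fetch(self, pgno):
        bucket = self.apHash[pgno % N_HASH]
        for no, slot in bucket:          # scan one short bucket, not the cache
            if no == pgno:
                return slot
        if self.n_slots >= self.n_max:
            return None                  # at capacity: caller must recycle first
        slot = bytearray(4096)           # new slot, created on demand
        bucket.append((pgno, slot))
        self.n_slots += 1
        return slot
```

A lookup touches only one bucket, so the cost stays near-constant even with thousands of cached pages, and `n_max` keeps the slot count from growing without bound.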


PgHdr: The Pager’s View of a Page

Each cached page is represented by a PgHdr object.

This object is only visible to the pager, not to the tree module or higher layers. The tree module sees only raw page memory.

A PgHdr carries everything the pager needs to enforce correctness:

  • pgno → which database page this is
  • dirty → whether it has been modified
  • needSync → whether the journal must be flushed before writing it back
  • nRef → reference count (pinning)
  • pDirtyNext / pDirtyPrev → links in the dirty-page list

If nRef > 0, the page is pinned — actively in use and untouchable.
If nRef == 0, the page is unpinned and eligible for reuse.
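The fields above and the pinning rule can be condensed into a small sketch (a toy Python class, not the real C struct; the method names are invented):

```python
# Toy PgHdr: just the bookkeeping fields named above, plus the pinning rule.

class PgHdr:
    def __init__(self, pgno):
        self.pgno = pgno         # which database page this is
        self.dirty = False       # has it been modified?
        self.needSync = False    # must the journal be flushed before writeback?
        self.nRef = 0            # reference count (pinning)
        self.pDirtyNext = None   # links in the dirty-page list
        self.pDirtyPrev = None

    def pin(self):
        self.nRef += 1

    def unpin(self):
        assert self.nRef > 0
        self.nRef -= 1

    def is_recyclable(self):
        # Only a page with no active references may be evicted or reused.
        return self.nRef == 0
```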


Inside SQLite’s Default Cache (PCache1)

SQLite’s built-in cache (PCache1) adds its own layer.

Each hash-table slot is represented by a PgHdr1 object. The slot layout in memory looks roughly like this:

```
[ PgHdr | page image | private space | (optional recovery pointers) ]
```
  • The page image holds the raw database page
  • The private space is used by the tree module for per-page in-memory state
  • For in-memory databases, recovery metadata is also stored here

This entire block is zero-initialized when a page enters the cache, ensuring no stale state leaks across page reuse.

All slots are reachable through the PCache1.apHash array, where each entry points to a bucket implemented as an unordered singly linked list.

Simple. Fast. No clever tricks.

Cache Groups: Sharing Memory Under Pressure

SQLite also supports an optional cache group mode.

When enabled:

  • Multiple PCache1 instances are placed into a group
  • Unpinned pages from one cache can be recycled by another
  • Memory pressure is handled globally instead of per-cache

This is particularly useful when many connections exist in the same process and memory is tight.

Pinned pages are still sacred: only unpinned slots are shared.
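Group-wide recycling can be sketched like this (a deliberately simplified Python model with invented names; here the cap is treated as one shared budget for the whole group, which is a simplification of the real accounting):

```python
# Sketch of cache-group recycling: a full cache may steal an *unpinned*
# slot from any sibling in its group. Pinned slots are never touched.

class Group:
    def __init__(self):
        self.caches = []

class GroupedCache:
    def __init__(self, group, n_max):
        self.group = group
        self.n_max = n_max       # treated as a shared group budget here
        self.slots = {}          # pgno -> pinned? flag
        group.caches.append(self)

    def steal_unpinned(self):
        """Drop one unpinned slot from any cache in the group."""
        for cache in self.group.caches:
            for pgno, pinned in cache.slots.items():
                if not pinned:
                    del cache.slots[pgno]
                    return True
        return False             # every slot in the group is pinned

    def fetch(self, pgno, pinned=True):
        total = sum(len(c.slots) for c in self.group.caches)
        if pgno not in self.slots and total >= self.n_max:
            if not self.steal_unpinned():
                return None      # memory pressure and nothing recyclable
        self.slots[pgno] = pinned
        return pgno
```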

Cache Reads: Content-Addressed, Not Location-Based

A cache is not an array you index into.

Clients never know:

  • Where a page lives
  • Which slot it occupies
  • Whether it was recently evicted

The cache is content-addressed.

Pages are requested by page number, not by memory address.

When the tree module wants a page, it calls sqlite3PagerGet(P) — and the pager does the rest.

What Happens When a Page Is Read

SQLite follows a strict fetch-on-demand policy.
No speculative reads. No read-ahead. No clever guessing.

Pin, Use, Unpin: The Core Contract

Once the pager returns a page:

  • The pager does not track how long it’s used
  • The tree module owns it temporarily

The contract is simple:

  1. Pager pins the page
  2. Client uses the page
  3. Client calls sqlite3PagerUnref
  4. Page becomes recyclable again

Pinned pages cannot be evicted.
To avoid deadlock-by-design, SQLite enforces a minimum cache size (10 pages as of SQLite 3.7.8) so there’s always at least some room to maneuver.
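The four-step contract can be sketched as follows (hedged Python stand-ins: `pager_get` and `pager_unref` are hypothetical helpers modeling SQLite’s C entry points `sqlite3PagerGet` and `sqlite3PagerUnref`):

```python
# Pin / use / unpin, modeled with hypothetical pager_get / pager_unref
# helpers. Only the reference-counting contract mirrors SQLite.

class Page:
    def __init__(self, pgno):
        self.pgno, self.nRef, self.data = pgno, 0, bytearray(4096)

_cache = {}

def pager_get(pgno):
    page = _cache.setdefault(pgno, Page(pgno))
    page.nRef += 1            # 1. the pager pins the page
    return page

def pager_unref(page):
    page.nRef -= 1            # 3. the client releases its reference

page = pager_get(7)
page.data[0] = 0x0D           # 2. the client uses the page while pinned
pager_unref(page)
assert page.nRef == 0         # 4. the page is recyclable again
```

While `nRef > 0` the replacement machinery must skip the page; the pager itself never tracks how long the client holds it.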

What’s Next

Now that pages can be fetched safely, the next logical question is:

What happens when they change?

In the next post, we’ll cover:

  • Cache update rules
  • How dirty pages are tracked
  • Replacement policies
  • And how SQLite balances performance with safety

That’s where caching stops being passive and starts shaping execution.

My experiments and hands-on executions related to SQLite will live here: lovestaco/sqlite

References:

Sibsankar Haldar, SQLite Database System: Design and Implementation.

👉 Check out: FreeDevTools

Any feedback or contributors are welcome!

It’s online, open-source, and ready for anyone to use.

⭐ Star it on GitHub: freedevtools
