Athreya aka Maneshwar

Posted on Feb 5

The Tree Module: Where Pages Turn Into Rows in SQLite

#webdev #database #programming #architecture

Hello, I'm Maneshwar. I'm working on FreeDevTools online currently building "one place for all dev tools, cheat codes, and TLDRs" — a free, open-source hub where developers can quickly find and use tools without any hassle of searching all over the internet.

In the previous chapter, we stopped at the pager. The pager gave SQLite something extremely important but also very limited: a page-oriented view of a byte stream.

You could ask for page 17, modify it, pin it, unpin it—but the pager had no idea what a row, column, or record was.

That gap is exactly where the tree module steps in.

The tree module takes this flat, page-oriented world and turns it into something the VM can reason about: a tuple-oriented, key-sequenced file.

From this point onward, the VM no longer thinks in terms of pages. It thinks in terms of tuples—ordered, searchable, insertable units of data.

What Exactly Is a Tuple in SQLite?

Before going any further, it’s worth slowing down and clearing up a very common confusion.

A tuple in SQLite is not a datatype.

It’s also not a C struct, not a row object, and not something the tree module tries to interpret semantically.

A tuple is simply:

A logical row of a relation as seen by the VM
Represented internally as a variable-length byte sequence
Treated as opaque data by the tree module

In other words, the tree module does not care whether a tuple represents:

A table row with columns
An index entry with key + rowid
Metadata from sqlite_master

To the tree module, all of these are just byte strings with an associated key.

Meaning is imposed above the tree module, primarily by the VM and the record format logic.

Why Tuples Need Organization at All

A relation is conceptually a set of tuples, but sets alone are not enough for efficient access. The VM needs guarantees:

It must be able to find tuples by key
It must be able to scan tuples in order
It must be able to insert and delete tuples efficiently
It must keep tuples of different relations separate, even though they live in the same file

There are many classic ways to organize tuples:

Entry sequence (append-only)
Relative positioning
Hash-based organization
Key-sequenced organization

Each of these choices affects how inserts, lookups, and deletes behave.

SQLite makes a very opinionated choice here.

SQLite’s One Big Design Decision: Everything Is a Tree

SQLite uses exactly one organization technique for persistent data:

B+-trees for tables
B-trees for indexes

No hash tables.
No heap files.
No clustered secondary structures.

Every relation—whether it’s a table or an index—gets its own tree.

A few important consequences fall out of this design:

Each tree corresponds to exactly one relation or index
All trees live inside one database file
Pages from different trees can be interspersed on disk
No single page ever contains tuples from multiple trees

That last point is crucial. Even though pages are shared at the file level, ownership is strict at the tree level.

Enforcing this is one of the primary responsibilities of the tree module.

From Page-Oriented to Tree-Oriented

The pager exposes a file as a collection of numbered pages.

The tree module builds on top of that and exposes something much richer:

A tree-oriented file
Supporting ordered access
Supporting tuple-level insert, delete, and lookup

Internally, the tree module:

Chooses which pages belong to which tree
Maintains parent/child relationships
Ensures balance properties of B+-trees
Decides when pages split or merge

To the pager, pages are just pages.

To the tree module, pages are nodes.

The Tree Module Is Deliberately Passive

One subtle but important design choice: the tree module is structurally active but semantically passive.

What it does care about are Keys, Ordering, Page layout, Tree balance and Navigation

What it does not care about are Column types, SQL schemas, Constraints, Meanings of fields inside a tuple

It simply stores and retrieves variable-length byte strings and keeps them in order.

This separation of concerns is what allows SQLite’s VM to evolve independently of its storage engine.

Where Tree Metadata Lives

SQLite still needs a way to answer questions like:

Which tree belongs to which table?
Where is the root page of this tree?
Is this tree a table or an index?

That mapping information is stored in a special, well-known relation:

sqlite_master (or sqlite_temp_master for temp objects)

This catalog itself is stored in a predetermined B+-tree, just like everything else.

SQLite does not cheat by storing metadata “outside” the system—it eats its own dog food.

Where We’re Heading Next

So far, we’ve established the big picture:

Pager → pages
Tree module → tuples
VM → SQL semantics

In the next few days, we’ll zoom in further and get concrete:

The Tree Interface Functions — how the VM actually talks to trees
B+-Tree Structure — interior nodes, leaf nodes, separators
Page Structure — how tuples are packed, indexed, and navigated inside a single page

That’s where diagrams start making sense and the real elegance of SQLite’s design shows up.

For now, the key idea to carry forward is this: