Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for Checking AI generated code.
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.
Today we finally reach the most granular unit of the B+-tree:
The cell — the actual byte-level representation of keys and rows.
Everything we’ve studied so far page headers, pointer arrays, overflow chains, balancing — exists to manage these variable-length byte strings safely and efficiently.
Structure of a Cell
A cell is a variable-length byte string stored inside the cell content area of a tree page.
Each cell represents (a part of) a single payload. That payload is composed of: a key and an optional data portion
The exact layout of a cell depends on:
- Whether the page is internal or leaf
- Whether the tree is a table tree or an index tree
The generic structure is shown here:
One of the most important things to notice is that several fields are marked as var. These are variable-length integers, called variants.
Cells are not fixed-size structures. Their layout adapts depending on what kind of tree they belong to.
Four Variations of a Cell
Let’s connect the abstract structure to SQLite’s four tree page types.
1. Table Tree — Internal Page
A cell contains:
- 4-byte left child page number
- Variant storing the rowid value
There is no payload stored here. Internal table nodes only route searches based on rowid.
2 Table Tree — Leaf Page
A cell contains:
- Variant for total length of the row record
- Variant for rowid
- Part of the row record
- 4-byte overflow page number (if needed)
This is where actual table rows live.
Index Tree — Internal Page
A cell contains:
- 4-byte left child page number
- Variant for total key length
- Part of the key
- 4-byte overflow page number (if needed)
Internal index pages route based on key values rather than rowid.
4. Index Tree — Leaf Page
A cell contains:
- Variant for total key length
- Part of the key
- 4-byte overflow page number (if needed)
Leaf index pages store only keys. No separate data portion exists.
Notice that keys or data may be partially stored in the cell, with the remainder spilling to overflow pages.
Variant Integers: SQLite’s Compact Encoding
SQLite uses a variable-length integer encoding to represent:
- Key sizes
- Data sizes
- Rowid values
A variant integer uses between 1 and 9 bytes.
Each byte contributes:
- 7 bits for the integer value
- 1 bit (the highest bit) indicating continuation
The integer is formed by reading consecutive bytes whose high bit is set, stopping at the first byte whose high bit is clear.
The most significant byte appears first.
As a special case, if 9 bytes are used, all 8 bits of the 9th byte are part of the value
This encoding allows:
- Small integers (very common) to be stored in 1 or 2 bytes
- Large 64-bit integers to be stored in at most 9 bytes
It is essentially a Huffman-style encoding, optimized for common small values while still supporting full 64-bit range.
This design dramatically reduces space overhead for rowids and payload sizes.
Payload Restrictions and Embedded Fractions
You might assume that if a page has enough free space, a payload would always be stored fully inside it.
SQLite does not do that.
Instead, it enforces maximum embedded payload fractions, which are stored in the file header:
- Offset 21 → max embedded payload fraction
- Offset 22 → min embedded payload fraction
- Offset 23 → min leaf payload fraction
These values control how much of a payload is allowed to reside directly on the page.
If a payload exceeds the allowed embedded fraction:
- SQLite stores part of it in the cell
- The remainder spills into overflow pages
Even if there is technically enough space, SQLite may still spill data to overflow to maintain page balance and future insertion flexibility.
Overflow Pages
Large payloads are stored across overflow pages, which form a singly linked list.
Each overflow page:
- Uses its first 4 bytes to store the next overflow page number
- Uses the rest of the page for payload data
- Except the last page, which may contain unused trailing space
Overflow pages never mix content from multiple payloads.
The chain continues until the entire payload is stored.
This ensures:
- Large rows do not monopolize tree pages
- Tree pages remain balanced and efficient
- Insert/delete operations remain predictable
The Overflow Calculation Rules
SQLite uses specific formulas to decide how much payload to keep on the page and how much to spill.
For example, on a leaf table-tree page:
- If payload size
p < (u - 36), store it fully on page - Otherwise compute a minimum local payload
M - Store a calculated portion locally
- Spill the rest to overflow pages
The formulas are carefully chosen to:
- Maintain minimum occupancy
- Avoid pathological fragmentation
- Keep cells within allowed bounds
Internal table-tree pages never store payload or overflow.
Index trees follow slightly different thresholds, tuned for key storage.
The Big Picture
At this point, we have reached the lowest structural level of SQLite’s B+-tree system:
- Pages
- Headers
- Pointer arrays
- Free blocks
- Cells
Every SQL operation — insert, update, delete — eventually becomes:
- Modify a cell
- Update pointer arrays
- Possibly allocate overflow pages
- Mark pages dirty in the pager
- Journal changes for recovery
The abstraction layers now fully connect:
We’ve now finished dissecting the physical structure of B+-tree pages.
In the next post, we step back up one level and examine:
The Tree Module Functionalities
- Control data structures
- The
Btreestructure- The
BtSharedstructure
That’s where the runtime control layer meets the structural layout we’ve just explored.
👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:
HexmosTech
/
git-lrc
Free, Unlimited AI Code Reviews That Run on Commit
git-lrc
Free, Unlimited AI Code Reviews That Run on Commit
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.
See It In Action
See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements
git-lrc-intro-60s.mp4
Why
- 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
- 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
- 🔁 Build a habit, ship better code. Regular review → fewer bugs → more robust code → better results in your team.
- 🔗 Why git? Git is universal. Every editor, every IDE, every AI…



Top comments (0)