DEV Community

Cover image for Structure of a Cell in SQLite
Athreya aka Maneshwar
Athreya aka Maneshwar

Posted on

Structure of a Cell in SQLite

Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for Checking AI generated code.
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Today we finally reach the most granular unit of the B+-tree:

The cell — the actual byte-level representation of keys and rows.

Everything we’ve studied so far page headers, pointer arrays, overflow chains, balancing — exists to manage these variable-length byte strings safely and efficiently.

Structure of a Cell

A cell is a variable-length byte string stored inside the cell content area of a tree page.

Each cell represents (a part of) a single payload. That payload is composed of: a key and an optional data portion

The exact layout of a cell depends on:

  • Whether the page is internal or leaf
  • Whether the tree is a table tree or an index tree

The generic structure is shown here:

fig 6.8 structure of a cell

One of the most important things to notice is that several fields are marked as var. These are variable-length integers, called variants.

Cells are not fixed-size structures. Their layout adapts depending on what kind of tree they belong to.

Four Variations of a Cell

Let’s connect the abstract structure to SQLite’s four tree page types.

1. Table Tree — Internal Page

A cell contains:

  • 4-byte left child page number
  • Variant storing the rowid value

There is no payload stored here. Internal table nodes only route searches based on rowid.

2 Table Tree — Leaf Page

A cell contains:

  • Variant for total length of the row record
  • Variant for rowid
  • Part of the row record
  • 4-byte overflow page number (if needed)

This is where actual table rows live.

Index Tree — Internal Page

A cell contains:

  • 4-byte left child page number
  • Variant for total key length
  • Part of the key
  • 4-byte overflow page number (if needed)

Internal index pages route based on key values rather than rowid.

4. Index Tree — Leaf Page

A cell contains:

  • Variant for total key length
  • Part of the key
  • 4-byte overflow page number (if needed)

Leaf index pages store only keys. No separate data portion exists.

fig 6.9 cell organization

Notice that keys or data may be partially stored in the cell, with the remainder spilling to overflow pages.

Variant Integers: SQLite’s Compact Encoding

SQLite uses a variable-length integer encoding to represent:

  • Key sizes
  • Data sizes
  • Rowid values

A variant integer uses between 1 and 9 bytes.

Each byte contributes:

  • 7 bits for the integer value
  • 1 bit (the highest bit) indicating continuation

The integer is formed by reading consecutive bytes whose high bit is set, stopping at the first byte whose high bit is clear.

The most significant byte appears first.

As a special case, if 9 bytes are used, all 8 bits of the 9th byte are part of the value

This encoding allows:

  • Small integers (very common) to be stored in 1 or 2 bytes
  • Large 64-bit integers to be stored in at most 9 bytes

It is essentially a Huffman-style encoding, optimized for common small values while still supporting full 64-bit range.

This design dramatically reduces space overhead for rowids and payload sizes.

Payload Restrictions and Embedded Fractions

You might assume that if a page has enough free space, a payload would always be stored fully inside it.

SQLite does not do that.

Instead, it enforces maximum embedded payload fractions, which are stored in the file header:

  • Offset 21 → max embedded payload fraction
  • Offset 22 → min embedded payload fraction
  • Offset 23 → min leaf payload fraction

These values control how much of a payload is allowed to reside directly on the page.

If a payload exceeds the allowed embedded fraction:

  • SQLite stores part of it in the cell
  • The remainder spills into overflow pages

Even if there is technically enough space, SQLite may still spill data to overflow to maintain page balance and future insertion flexibility.

Overflow Pages

Large payloads are stored across overflow pages, which form a singly linked list.

Each overflow page:

  • Uses its first 4 bytes to store the next overflow page number
  • Uses the rest of the page for payload data
  • Except the last page, which may contain unused trailing space

Overflow pages never mix content from multiple payloads.

The chain continues until the entire payload is stored.

This ensures:

  • Large rows do not monopolize tree pages
  • Tree pages remain balanced and efficient
  • Insert/delete operations remain predictable

The Overflow Calculation Rules

SQLite uses specific formulas to decide how much payload to keep on the page and how much to spill.

For example, on a leaf table-tree page:

  • If payload size p < (u - 36), store it fully on page
  • Otherwise compute a minimum local payload M
  • Store a calculated portion locally
  • Spill the rest to overflow pages

The formulas are carefully chosen to:

  • Maintain minimum occupancy
  • Avoid pathological fragmentation
  • Keep cells within allowed bounds

Internal table-tree pages never store payload or overflow.

Index trees follow slightly different thresholds, tuned for key storage.

The Big Picture

At this point, we have reached the lowest structural level of SQLite’s B+-tree system:

  • Pages
  • Headers
  • Pointer arrays
  • Free blocks
  • Cells

Every SQL operation — insert, update, delete — eventually becomes:

  • Modify a cell
  • Update pointer arrays
  • Possibly allocate overflow pages
  • Mark pages dirty in the pager
  • Journal changes for recovery

The abstraction layers now fully connect:

We’ve now finished dissecting the physical structure of B+-tree pages.

In the next post, we step back up one level and examine:

The Tree Module Functionalities

  • Control data structures
  • The Btree structure
  • The BtShared structure

That’s where the runtime control layer meets the structural layout we’ve just explored.

git-lrc

👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

GitHub logo HexmosTech / git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements

git-lrc-intro-60s.mp4

Why

  • 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
  • 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
  • 🔁 Build a habit, ship better code. Regular review → fewer bugs → more robust code → better results in your team.
  • 🔗 Why git? Git is universal. Every editor, every IDE, every AI…




Top comments (0)