Create account

DEV Community

Athreya aka Maneshwar

Posted on Mar 2

Table Record and Key Format in SQLite

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for Checking AI generated code.

Yesterday we discussed internal datatypes and how the VM composes values into records. Today we look at what those records actually look like on disk.

When SQLite stores a row inside a table B+-tree, it does not write columns one by one in some naive layout. It constructs a carefully structured byte sequence.

Table Record Format

A table record (row data) consists of two logical parts:

Header
Record Image (Data Section)

The header begins with a size field. This value tells us how many bytes exist before the first data item (Data1). Importantly, this size includes the bytes used to encode the size field itself.

The size is stored as a variable-length 64-bit integer using SQLite’s varint encoding (a Huffman-style compact representation). In effect, the size field acts as a pointer to where the actual data begins.

After the size field, the header contains one type field per column, in the same order as declared in the CREATE TABLE statement. SQLite never rearranges column order internally.

Each Type i field is also stored as a variable-length unsigned integer. This value encodes the storage class and size of the corresponding Data i.

So the header is essentially:

[header size][type1][type2][type3]...

Then comes the record image:

[data1][data2][data3]...

The VM reads the header first, learns how many bytes to skip to reach the data section, and understands how to interpret each column based on its type code.

Zero-Length Data

Certain type codes represent values that occupy zero bytes in the data section.

For example, type values:

do not store any payload in the record image.

This is possible because the type code itself fully determines the value. For example:

NULL requires no data bytes.
Some integer constants have special encodings.

This design saves space and keeps records compact.

Why Column Order Matters

Since SQLite does not reorder columns internally, the physical order of columns in the record exactly matches the schema definition.

Because records may overflow onto additional pages, placing:

Smaller
Frequently accessed

columns earlier in the schema can reduce the need to chase overflow chains.

This is a practical storage optimization — one that many developers overlook.

Table Key Format

Every B+-tree in SQLite must have a unique key.

Relational theory says tables do not contain duplicate rows. But in practice, users may insert identical rows unless a UNIQUE constraint exists.

SQLite must still differentiate them internally.

To do this, every table has a unique primary key:

Either explicitly defined by the schema
Or automatically created by SQLite

If not specified, SQLite assigns a hidden 64-bit signed integer key called the rowid.

The rowid guarantees uniqueness for every row in the table B+-tree.

So even if two rows contain identical column values, their rowids will differ — and therefore their keys in the B+-tree will differ.

This rowid is not just a logical identifier. It is physically used as the key in the table’s B+-tree structure.

What Comes Next

We’ve now seen:

How table records are structured
How headers encode type and size
Why variable-length encoding keeps files compact
Why every table must have a unique key

In the next post, we will dive into the rowid itself:

The Rowid column
How rowid values are generated
How rowid is represented on disk

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

HexmosTech / git-lrc

Free, Micro AI Code Reviews That Run on Commit

git-lrc

Free, Micro AI Code Reviews That Run on Commit

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements

git-lrc-intro-60s.mp4

Why

🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
…

View on GitHub