You fire up a MySQL client, connect to port 3306, send off your SQL, and the server parses, optimizes, hits an index, fetches rows, and packs the result back to you. You can picture that entire pipeline.
SQLite has none of that. No server process, no port, no wire protocol. Just a single file: my.db.
So the real question is — what exactly is stuffed inside that file that makes SELECT * FROM apples WHERE color='Yellow' return the right answer?
TinySqlite takes this apart across 740 lines of C++23. It doesn't link against the official SQLite library. It opens a .db file's raw binary and pries the data directly out of the disk bytes. We'll follow its code path, peeling back SQLite's file format layer by layer.
This article covers: file header → B-tree pages → varint encoding → the schema table → full table scans → index scans.
What SQLite Actually Is
Let's get the definition straight first.
SQLite is an embedded relational database engine — in plain English: it's a C library you compile into your program, and once you open a file, you can run SQL against it. No server, no install, no root password.
If you're familiar with MySQL, here's the mental model. MySQL is a restaurant — a dedicated kitchen (server process), waitstaff (connection handling), a complex ordering system (query optimizer). You sit down, say "SELECT," and the back of house scrambles to bring you the dish.
SQLite is your fridge. Open it, grab what you need, nobody serves you. The entire database is a single data.db file. Copy it, carry it, done.
Traditional client-server database: SQLite's embedded model:
┌─────────┐ TCP/network ┌───────────┐ ┌──────────────────────────────┐
│ Your app │ ←───────────→ │ DB server │ │ Your app │
└─────────┘ └───────────┘ │ ├── libsqlite.so (engine) │
│ ├── data.db (the only file) │
│ └── all ops are local calls │
└──────────────────────────────┘
Why should you care? You might not encounter MySQL every day, but you're almost certainly already using SQLite. Your phone's contacts, WeChat messages, Chrome bookmarks and browsing history — all stored in SQLite. Every iPhone, every Android device, every browser runs a SQLite instance. It's probably the most deployed database engine on the planet, bar none.
Using it is trivial. Create a database, make a table, insert data, query:
$ sqlite3 test.db
sqlite> CREATE TABLE fruits (name TEXT, price INT);
sqlite> INSERT INTO fruits VALUES ('apple', 5);
sqlite> SELECT * FROM fruits WHERE price < 10;
apple|5
If you're writing C/C++, include sqlite3.h and a handful of lines embed a full database in your program.
Great. You're using it comfortably. But do you actually know — what do the bytes inside test.db look like?
Now flip roles. Stop being the user, become the reverse engineer. TinySqlite is a set of reverse-engineering notes that dissects the .db file's binary structure piece by piece. Let's begin.
Opening the File — How a .db File Is Organized
The Entire File Is a Chain of Pages
At the macro level, a .db file is astoundingly simple: it's a sequence of fixed-size pages laid end to end. Every page is the same size (typically 4096 bytes), numbered starting from page 1.
Picture a bookshelf where every shelf slot is the same width. To find the 3rd book, you start from the shelf edge and count to position 3 × slot_width. SQLite pages work the same way — the data for page N starts at file offset (N-1) × page_size.
my.db file:
┌─────── page 1 ───────┐┌─────── page 2 ───────┐┌─────── page 3 ───────┐┌── ...
│ file header (1st 100B)││ page header ││ page header │
│ page_size = 4096 ││ type = 0x0D (leaf) ││ type = 0x05 (interior)
│ num_tables = 3 ││ cells = [row1,row2] ││ child page ptrs │
│ ... ││ ... ││ ... │
└───────────────────────┘└──────────────────────┘└──────────────────────┘
Page 1 is special — its first 100 bytes form the file header, storing global metadata. Every page after that has only a page header followed by actual data.
What the File Header Carries
The first 100 bytes of page 1 in every SQLite file follow a fixed format. The first 16 bytes are the magic string "SQLite format 3\000" — the file's "ID card." It tells any program that tries to read the file: hey, I'm a SQLite 3 format database.
Bytes 16–17 store the page size. Note that this is stored in big-endian — high byte first. If these two bytes read 0x10 0x00, that's 4096. If the page size is 512, they'd be 0x02 0x00.
Here's how TinySqlite reads the page size:
static constexpr size_t kPageSizeOffset = 16;
auto read_u16_be(size_t offset) const noexcept -> uint16_t {
return static_cast<uint16_t>(data_[offset]) << 8
| static_cast<uint16_t>(data_[offset + 1]);
}
// In the constructor:
page_size_(read_u16_be(kPageSizeOffset))
Two bytes assembled into a uint16_t. No magic.
Another critical number hides at byte offset 103 (corresponding to SQLite's byte 56): the table count. TinySqlite reads this to know how many tables the database holds — both system and user tables.
static constexpr size_t kSchemaCountOffset = 103;
num_tables_(read_u16_be(kSchemaCountOffset))
What a B-tree Page Looks Like
Now that you know a file is a chain of pages, the next question is — what's inside a page? How is data actually organized?
SQLite uses a B-tree to organize data. Each table is a B-tree, and each node in that tree is a page. Pages have two roles:
-
Interior pages (type
0x05): These don't store actual data rows. They store "signposts" — child page numbers and key ranges. Their job is navigation: which subtree contains the data you're looking for. -
Leaf pages (type
0x0D): These hold the real row data. EveryINSERTultimately lands in a cell on some leaf page.
Visually:
Page 2 (interior, 0x05)
/ \
Page ? (leaf, 0x0D) Page ? (leaf, 0x0D)
[Granny Smith] [Fuji]
[Golden Delicious] [Honeycrisp]
A page's internal structure (the page header) starts at page offset 0:
| Offset | Size | Meaning |
|---|---|---|
| 0 | 1 byte | page type (0x05=interior table, 0x0D=leaf table, 0x02=interior index, 0x0A=leaf index) |
| 1 | 2 bytes | offset of first freeblock |
| 3 | 2 bytes | number of cells |
| 5 | 2 bytes | start of cell content area |
| 8 | 4 bytes | number of fragmented free bytes |
For interior pages, after the page header comes a rightmost child pointer (4 bytes), pointing to the rightmost child page. Then the cell pointer array — 2 bytes per cell, each pointing to that cell's actual location within the page.
This "rightmost child + cell pointer array" structure is what enables the B-tree to hop between pages. We'll expand on this when we cover full table scans.
At this point, you know three key facts:
- A .db file is a sequence of fixed-size pages
- The file header tells you the page size and how many tables exist
- A B-tree organizes the data — interior pages navigate, leaf pages store rows
The next natural question — how does SQLite know which tables exist and where each table's B-tree root lives? The answer is tucked inside a special table.
sqlite_master — The Database's "Table of Contents"
The Table of Tables
SQLite has a hidden system table called sqlite_master. Think of it as the table of contents at the front of a book — it doesn't store your business data, it describes the structure of the entire database.
sqlite_master (system table, exists in every .db file)
┌──────────┬───────────┬──────────┬─────────────────────────────┐
│ type │ name │ rootpage │ sql │
├──────────┼───────────┼──────────┼─────────────────────────────┤
│ "table" │ "apples" │ 2 │ "CREATE TABLE apples(...)" │
│ "table" │ "oranges" │ 4 │ "CREATE TABLE oranges(...)" │
│ "index" │ "idx_..." │ 6 │ "CREATE INDEX..." │
└──────────┴───────────┴──────────┴─────────────────────────────┘
Each row represents a table, index, view, or trigger. The two most important columns are name (the table name) and rootpage (the root page number of that table's B-tree). When you later run SELECT * FROM apples, that rootpage = 2 is how the engine finds the entry point to the apples table's data.
So how do you find sqlite_master itself? Its data lives at a fixed location — page 1. The file header's kSchemaCountOffset tells you how many rows there are, and right after it, starting at kCellPtrArrayStart (offset 108), is the cell pointer array — each 2-byte pointer references a cell within page 1 that belongs to sqlite_master.
But before we can actually parse those cells, we need two encoding tools — varint and serial type. They're how SQLite "writes numbers" and "describes types" on disk.
Varint: Waste Not, Want Not
SQLite's on-disk format leans heavily on a variable-length integer encoding called varint. The core idea is simple: small numbers take less space, big numbers take more.
The rule: each byte contributes its lower 7 bits as data, and the highest bit (bit 7) is a "continue" flag. If bit 7 is 1, there are more bytes coming. If it's 0, this is the last byte. Up to 9 bytes, with the 9th byte using all 8 bits.
| Value | Encoding (hex) | Bytes |
|---|---|---|
| 5 | 05 |
1 |
| 300 | 82 2C |
2 |
| 1000000 | 3D 09 40 |
3 |
300 = 0b_0000_0010_0010_1100. Split into 7-bit groups: 0000010 and 0101100. Add the continue bit — high group gets 1 (10000010 = 0x82), low group gets 0 (00101100 = 0x2C). read_varint does the reverse, pulling the value back out of the byte stream:
auto read_varint(size_t offset) const noexcept -> VarintResult {
uint64_t value = 0;
for (int i = 0; i < 9; ++i) {
auto byte = static_cast<uint8_t>(data_[offset + i]);
if (i == 8) {
value = (value << 8) | byte;
return {value, 9};
}
value = (value << 7) | static_cast<uint64_t>(byte & 0x7F);
if ((byte & 0x80) == 0)
return {value, static_cast<size_t>(i) + 1};
}
std::unreachable();
}
Each iteration grabs 7 bits and shifts them into the result. The loop stops when it hits a byte whose high bit is 0. If it reaches the 9th byte without stopping, it uses the full 8 bits — that's varint's maximum width.
Serial Type: What Exactly Is in This Column
Every column of a record carries a serial type code on disk. It tells the parser: is this column NULL, an integer, text, and how many bytes does it occupy?
| Type code | Meaning | Size in bytes |
|---|---|---|
| 0 | NULL | 0 |
| 1 ~ 4 | 1/2/3/4-byte integer | equals the type code |
| 5 | 6-byte integer | 6 |
| 6 | 8-byte integer | 8 |
| 7 | IEEE float | 8 |
| 8 | literal 0 | 0 |
| 9 | literal 1 | 0 |
| ≥12 and even | BLOB | (N - 12) / 2 |
| ≥13 and odd | text string | (N - 13) / 2 |
Notice the two special values 8 and 9 — the integer values 0 and 1 take up zero bytes on disk; the serial type alone encodes the value. SQLite's disk format is this miserly.
The corresponding code in TinySqlite:
static constexpr auto serial_type_size(uint64_t serial_type) noexcept -> size_t {
if (serial_type <= 4)
return std::array{0, 1, 2, 3, 4}[serial_type];
if (serial_type == 5) return 6;
if (serial_type == 6 || serial_type == 7) return 8;
if (serial_type >= 12 && serial_type % 2 == 0)
return static_cast<size_t>((serial_type - 12) / 2);
if (serial_type >= 13 && serial_type % 2 == 1)
return static_cast<size_t>((serial_type - 13) / 2);
return 0;
}
Decoding a Schema Row in Real Time
With varint and serial type in hand, we can now decode a single row from sqlite_master. Say this row describes the apples table: type="table", name="apples", rootpage=2, sql="CREATE TABLE apples(...)".
A schema table cell roughly follows this layout: [payload size (varint)] [rowid (varint)] [header size (varint)] [5 serial types (varint)] [body: actual data for 5 columns]
TinySqlite's parse_schema_entry does exactly this: skip payload size and rowid → read 5 serial types → compute per-column byte sizes from each serial type → sequentially read type, name, tbl_name, rootpage, and sql from the body region.
You don't need to memorize every step — just know that this is how you get a table name and its rootpage out of raw binary. Once you have the rootpage, you start traversing that table's B-tree from the corresponding page.
SELECT + WHERE — How Data Actually Gets Found
The first three sections laid all the groundwork: file format, B-tree, schema table, encoding primitives. Now let's answer the question that's been hanging since the beginning —
You type SELECT name FROM apples WHERE color='Yellow'. What does SQLite actually do?
SQL parsing? TinySqlite handles it with string search — find FROM, split column names, find WHERE, extract conditions. Let's skip past that in one sentence.
The interesting part is what comes next: finding the data.
Full Table Scan: Recursive Descent, Zero Missed Rows
Without an index, SQLite's only option is a full table scan — read every row of the target table top to bottom, then filter with the WHERE clause. Sounds simple enough, but when a table spans multiple pages, how do you guarantee nothing is skipped?
The answer is in the B-tree traversal algorithm. The entry point is the rootpage — the page number recorded in sqlite_master. TinySqlite starts here:
auto rp = db.rootpage(sel.table);
From the rootpage, read_columns_values recursively walks the entire B-tree. The core logic:
-
If it's an interior page (
0x05): each cell has a left child pointer pointing to a subtree. Iterate all cells, recursively process each subtree. There's also a rightmost child pointer pointing to the rightmost subtree — can't forget that one. -
If it's a leaf page (
0x0D): each cell is a data row. Read them one by one and match against the WHERE condition.
The rightmost child is the most overlooked piece of design — that 4-byte pointer sitting before the cell pointer array on interior pages. It points to the subpage covering the range "to the right" of all cells. Without it, the rightmost chunk of data simply gets dropped.
// Interior page processing logic
auto num_cells = read_u16_be(page_offset + 3);
auto right_child = read_u32_be(page_offset + 8); // ← don't forget this one
for (uint32_t i = 0; i < num_cells; ++i) {
auto cell_ptr = page_offset + read_u16_be(page_offset + 12 + i * 2);
auto child = read_u32_be(cell_ptr); // ← left child
auto child_rows = read_columns_values(child, column_indices, filter);
// ...collect rows from the child page...
}
auto right_rows = read_columns_values(right_child, column_indices, filter);
// ...collect rows from the rightmost child...
The recursion keeps diving, stops at leaf pages, reads data, and bubbles it back up. The entire B-tree gets fully traversed — not a single row is missed.
Column value extraction for each row relies on the serial type machinery from the previous section: read varints for serial types and byte widths, then pull data out of the payload region. If a column happens to be the one in the WHERE clause, compare on the spot.
Index Scan: Taking the B-tree Shortcut
The problem with full table scans is obvious: you're only looking for color='Yellow', yet you're reading every apple of every color. When a table has hundreds of thousands of rows, this hurts.
SQLite's solution is an index. An index is itself a B-tree, but with a few twists:
- Page types are
0x02(interior index page) and0x0A(leaf index page) - Cells don't carry full rows. They carry index column values + rowid
- The B-tree is sorted by the indexed column
Back to WHERE color='Yellow'. If the apples table has an index on the color column, TinySqlite's path becomes:
Step 1: Search the index B-tree, collect matching rowids.
The index_search function traverses the index B-tree to find every entry where color='Yellow'. Because the index pages are sorted by the color column, the search is binary — compare the index value on the current page, go left if the target is lower, collect the rowid on match, and stop (or continue searching the left subtree) if the target is higher. Vastly more efficient than a full table scan.
Step 2: Use rowids to locate individual rows in the table B-tree.
With a list of rowids in hand, read_row_by_rowid does point lookups on the table B-tree. Each point lookup follows a path similar to the recursive scan — compare rowids on interior pages to decide which child page to descend into — but it hunts for a single row rather than traversing every cell.
if (sel.where) {
auto idx_rp = db.index_rootpage(sel.table, sel.where->column);
if (idx_rp) {
std::vector<uint64_t> rowids;
db.index_search(*idx_rp, sel.where->value, rowids);
for (auto rid : rowids) {
auto row = db.read_row_by_rowid(*rp, rid, col_indices);
// ...collect results...
}
return result;
}
}
// No index, fall back to full table scan
return QueryResult{.rows = db.read_columns_values(*rp, col_indices, filter)};
The code above is the core decision logic in TinySqlite's execute_query: if there's an index, use it; otherwise, do a full scan. SQL parsing, schema rootpage lookup, B-tree traversal, WHERE filtering, index search — everything covered so far converges into these dozen lines.
Of course, production-grade SQLite is far more complex. It has a query optimizer to pick among indexes, WAL journaling for crash recovery, multi-version concurrency control, B-tree page splits and rebalancing. But the core skeleton — file header → page → B-tree → schema → full scan / index scan — is exactly this.
Databases Aren't That Mysterious
Starting from the client-server architecture contrast with MySQL, we've peeled all the way down to the .db file's bedrock: fixed-size pages strung together into a tree, a handful of header bytes telling you page size and table count, a schema table whose varint-encoded rows describe where every table lives, interior B-tree pages serving as signposts and leaf pages holding the actual data, and indexes as separate B-trees already sorted for you.
740 lines of C++23, zero external dependencies, spanning the full path from a binary file header to a SELECT query result. It won't run TPC-C, and it's not going to replace libsqlite3 in your project. But if you want to see what every byte in a .db file is doing, it's exactly enough.
Code at https://github.com/Tenaryo/TinySqlite — this is a teaching-grade implementation, not an industrial one. Issues and feedback are still welcome.
Top comments (0)