Athreya aka Maneshwar

Posted on Jan 11 • Edited on Mar 7

Inside SQLite Backend: Virtual Machine, Storage, and the Build Process

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on Github. Star Us to help devs discover the project. Do give it a try and share your feedback for improving the product.

In the previous post, I explored how SQLite’s frontend transforms SQL text into executable bytecode through the tokenizer, parser, and code generator.

That process ends with a prepared statement, but the real work begins only after that.

Today’s learning moves firmly into the backend of SQLite, where bytecode programs are executed, data is stored and retrieved, transactions are enforced, and the entire system is stitched together into a compact, embeddable library.

The Virtual Machine (VDBE)

Once the frontend finishes compilation, it hands over a bytecode program to the Virtual Database Engine, commonly called the VM or VDBE.

A bytecode program is:

A linear sequence of instructions
Each instruction has an opcode and up to five operands
Executed sequentially, one instruction at a time

The VM behaves like a custom CPU designed specifically for database operations: scanning tables, comparing values, managing cursors, and enforcing transactional semantics.
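To see such a program for yourself, SQLite's EXPLAIN prefix returns the bytecode instead of running the statement. The sketch below uses Python's built-in sqlite3 module; the `users` table and the `explain` helper are made up for illustration, and the exact opcodes you get depend on your SQLite version.

```python
import sqlite3


def explain(sql: str) -> list:
    """Return the VDBE program SQLite compiles for `sql`, one tuple per instruction."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table, only here so the query has something to compile against.
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        # EXPLAIN yields the bytecode rows instead of executing the statement.
        # Columns: addr, opcode, p1, p2, p3, p4, p5, comment
        return conn.execute("EXPLAIN " + sql).fetchall()
    finally:
        conn.close()


program = explain("SELECT name FROM users WHERE id = 1")
for addr, opcode, p1, p2, p3, p4, p5, _comment in program:
    print(f"{addr:>3}  {opcode:<14} {p1:>4} {p2:>4} {p3:>4}  {p4!s:<12} {p5}")
```

Each printed row is one instruction: an address, an opcode, and the five operands P1 through P5.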

The Tree Module

While the VM defines what to do, the tree module defines how data is organized.

SQLite stores:

Tables as B+ trees
Indexes as B-trees

Each table and index has its own independent tree structure.

Implementation details:

btree.c → tree logic
btree.h → public interface

The tree module supports searching, insertion, deletion, updates, and structural changes such as creating or dropping tables and indexes.
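One way to observe the "one tree per table and index" idea from the outside is to look at the rootpage column of sqlite_master, which records where each tree starts in the file. This is a minimal sketch using Python's sqlite3 module; the `books` table and `idx_books_title` index are names I made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table and one index on it.
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_books_title ON books(title)")

# sqlite_master stores the root page number of the tree behind each object.
roots = conn.execute(
    "SELECT type, name, rootpage FROM sqlite_master ORDER BY rootpage"
).fetchall()
conn.close()

for kind, name, rootpage in roots:
    print(f"{kind:<6} {name:<16} root page {rootpage}")
```

The table and its index report different root pages, which is what you would expect if each one is a separate tree.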

The Pager: SQLite’s Core Infrastructure

The pager is one of the most critical components in SQLite.

The tree module never accesses the database file directly. Instead, it works with fixed-size pages requested from the pager.
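The fixed-size page idea is easy to check from the outside: the database file should be exactly page_size × page_count bytes. The sketch below assumes a default rollback-journal setup and uses a throwaway file; the `page_stats` helper and the table `t` are hypothetical names for this example.

```python
import os
import sqlite3
import tempfile


def page_stats(path: str) -> tuple:
    """Fill a small database and return (page_size, page_count, file_size_in_bytes)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany(
            "INSERT INTO t (payload) VALUES (?)",
            [("x" * 200,) for _ in range(500)],
        )
        conn.commit()
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    finally:
        conn.close()
    # Measure the file only after the connection is closed and everything is flushed.
    return page_size, page_count, os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    page_size, page_count, file_size = page_stats(os.path.join(tmp, "demo.db"))

print(f"{page_count} pages x {page_size} bytes = {page_size * page_count}, file is {file_size} bytes")
```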

Responsibilities of the Pager

The pager:

Reads and writes database pages
Maintains an in-memory page cache
Handles file locking
Manages rollback journals
Enforces transaction boundaries

In effect, the pager acts as:

Data manager
Lock manager
Log manager
Transaction manager

All of this logic lives in:

pager.c
pager.h

The pager is the backbone that allows SQLite to deliver ACID guarantees using a single database file.
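Here is a small sketch of the transaction side of the pager, seen from Python's sqlite3 module. It explicitly selects the DELETE journal mode, changes a row inside a transaction, checks whether a `-journal` file sits next to the database, and then rolls back. The `accounts` table and the `rollback_demo` helper are made up for illustration, and whether the journal file is visible mid-transaction can depend on the platform and build.

```python
import os
import sqlite3
import tempfile


def rollback_demo(path: str) -> dict:
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT for us.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode=DELETE").fetchall()
        conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
        conn.execute("INSERT INTO accounts VALUES (1, 100)")

        conn.execute("BEGIN")
        conn.execute("UPDATE accounts SET balance = 0 WHERE id = 1")
        # With a rollback journal, original page contents are saved here before being changed.
        journal_during = os.path.exists(path + "-journal")
        conn.execute("ROLLBACK")

        journal_after = os.path.exists(path + "-journal")
        balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
    finally:
        conn.close()
    return {
        "journal_during": journal_during,
        "journal_after": journal_after,
        "balance": balance,
    }


with tempfile.TemporaryDirectory() as tmp:
    result = rollback_demo(os.path.join(tmp, "bank.db"))

print(result)
```

After the rollback, the balance is back to its original value and the journal file is gone, which is the atomicity guarantee in miniature.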

lovestaco@i3nux-mint:~/pers/sqlite$ ll /home/lovestaco/pers/sqlite/bld/sqlite3
-rwxrwxr-x 1 lovestaco lovestaco 6.9M Jan 11 17:14 /home/lovestaco/pers/sqlite/bld/sqlite3

SQLite Build Process

SQLite’s build process reflects its philosophy of self-containment and reproducibility.

The build consists of six major steps:

Generate sqlite3.h
Build the SQL parser
Generate VM opcodes
Generate opcode names
Generate SQL keyword tables
Compile the library

Generated Files and Tools

During the build:

lemon (built from lemon.c) generates parse.c and parse.h from the grammar file parse.y
mkkeywordhash.c generates keywordhash.h
awk and sed generate:
- sqlite3.h
- opcodes.h
- opcodes.c

The opcodes.h file assigns numeric values to VM instructions, while opcodes.c maps opcodes to human-readable names, which are useful for debugging and diagnostics.
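To make the relationship between those two generated files concrete, here is a toy generator written in Python. It is not SQLite's actual build script: the `generate_opcode_files` function is hypothetical, the numeric values are simply list positions, and the output layout is only an approximation of what a header/name-table pair might look like.

```python
def generate_opcode_files(opcode_names):
    """Toy generator: return (header_text, source_text) for a list of opcode names."""
    # Header: one #define per opcode, numbered by position (illustrative numbering only).
    header_lines = [
        f"#define OP_{name:<12} {number}"
        for number, name in enumerate(opcode_names)
    ]
    # Source: a name table indexed by the same numbers, for debug output.
    name_entries = [
        f'  /* {number:>3} */ "{name}",'
        for number, name in enumerate(opcode_names)
    ]
    source_lines = ["static const char *const azName[] = {", *name_entries, "};"]
    return "\n".join(header_lines) + "\n", "\n".join(source_lines) + "\n"


header, source = generate_opcode_files(
    ["Goto", "Halt", "OpenRead", "Rewind", "Column", "ResultRow", "Next"]
)
print(header)
print(source)
```

The point of generating both from one list is that the numbers and the names can never drift apart.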

Amalgamation Build

Modern SQLite releases provide a single amalgamation file, sqlite3.c, along with sqlite3.h.

Advantages of using the amalgamation:

5–10% better performance
More aggressive compiler optimizations
Simplified build process
Easier embedding into applications

The SQLite team strongly recommends this approach. The command-line utility additionally requires shell.c.

Chapter-Level Summary

At this point, the SQLite landscape comes into full focus.

SQL is compiled into bytecode and executed by a purpose-built VM
SQLite ensures serializable execution using database-level locking
Journaling guarantees atomicity and recovery
Each database lives in a single native file anchored by sqlite_master
The architecture is modular and cleanly layered
The entire system is open source and in the public domain
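The database-level locking point can be observed with two connections to the same file. In this sketch (Python's sqlite3 module, a throwaway file, and a hypothetical `second_writer_is_blocked` helper), the first connection starts a write transaction and the second one, with its busy timeout set to zero, tries to do the same.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(path: str) -> tuple:
    """Return (blocked, row_count): whether a second writer was refused, and the final row count."""
    writer = sqlite3.connect(path, isolation_level=None)
    # timeout=0: fail immediately instead of waiting for the lock to clear.
    other = sqlite3.connect(path, isolation_level=None, timeout=0)
    try:
        writer.execute("CREATE TABLE t (x)")
        writer.execute("BEGIN IMMEDIATE")  # take the write lock on the whole database
        writer.execute("INSERT INTO t VALUES (1)")
        try:
            other.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError as exc:
            blocked = "locked" in str(exc)
        else:
            other.execute("ROLLBACK")
            blocked = False
        writer.execute("COMMIT")
        # Once the first writer commits, the second connection can write.
        other.execute("INSERT INTO t VALUES (2)")
        row_count = other.execute("SELECT count(*) FROM t").fetchone()[0]
    finally:
        other.close()
        writer.close()
    return blocked, row_count


with tempfile.TemporaryDirectory() as tmp:
    blocked, row_count = second_writer_is_blocked(os.path.join(tmp, "locks.db"))

print(f"second writer blocked: {blocked}, rows after both writes: {row_count}")
```

Only one writer gets in at a time, so the two inserts end up applied one after the other rather than interleaved.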

This chapter has been a guided tour of SQLite’s internals, from API usage to execution, storage, recovery, and build mechanics.

In the next chapter, the focus shifts even deeper into database and journal file storage structures, where SQLite’s on-disk layout reveals how these abstractions are made real.

My experiments and hands-on executions related to SQLite will live here: [lovestaco/sqlite](https://github.com/lovestaco/sqlite/tree/master/sqlite)

References:
SQLite Database System: Design and Implementation. N.p.: Sibsankar Haldar, (n.d.).

*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.
