DEV Community

Athreya aka Maneshwar

Posted on Feb 20

Bytecode: SQLite’s Internal Programming Language

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for checking AI-generated code.
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

When you prepare a statement:

sqlite3_prepare_v2(...)

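SQLite does not execute the SQL text directly. (sqlite3_prepare_v2 does keep a copy of the text, but only so the statement can be recompiled if the schema changes.)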

It compiles SQL into a low-level instruction sequence, a bytecode program.

That program is executed by the VM.
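You can look at the compiled program yourself by prefixing a statement with EXPLAIN. Here is a minimal sketch using Python's built-in sqlite3 module; the `users` table and the query are made up for illustration, and the exact opcodes you see will vary with your SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, only here so the query has something to compile against.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# EXPLAIN returns one row per bytecode instruction:
# (addr, opcode, p1, p2, p3, p4, p5, comment)
program = conn.execute("EXPLAIN SELECT name FROM users WHERE id = 7").fetchall()

for addr, opcode, p1, p2, p3, p4, p5, comment in program:
    print(f"{addr:>3}  {opcode:<12} {p1:>3} {p2:>3} {p3:>3}  {p4 or ''}")

opcodes = [row[1] for row in program]
```

Each printed line is one instruction: an address, an opcode name, and its operands.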

In the public API, a prepared statement is an object of type:

sqlite3_stmt

Internally, this structure is called Vdbe (Virtual Database Engine).

That object contains:

The bytecode program
Metadata for result columns
Bound parameter values
A program counter
Registers (memory cells)
Runtime state (open Btrees, sorters, temp structures)

So when you hold a sqlite3_stmt*, you're literally holding a tiny program.

The VM Is Not a Query Optimizer

One subtle but critical point:

The VM does zero query optimization.

It does not rethink execution plans.
It does not change strategies.

It simply executes the bytecode it is given.

If the planner made a bad decision, the VM faithfully carries it out.
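One way to see this split is to compile the same SQL text twice, before and after giving the planner a new option. The sketch below assumes a made-up `orders` table and index name. The VM is the same both times; only the program handed to it changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

QUERY = "SELECT total FROM orders WHERE customer = 'alice'"

def compile_info(sql):
    """Return (plan text, opcode list) for a statement as compiled right now."""
    # The fourth column of EXPLAIN QUERY PLAN holds the human-readable detail.
    plan = " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))
    opcodes = [row[1] for row in conn.execute("EXPLAIN " + sql)]
    return plan, opcodes

plan_before, ops_before = compile_info(QUERY)

# Give the planner an index to work with; the SQL text stays the same.
conn.execute("CREATE INDEX orders_by_customer ON orders(customer)")
plan_after, ops_after = compile_info(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
print("bytecode changed:", ops_before != ops_after)
```

The decision between a full scan and an index search is made at compile time and baked into the bytecode.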

That separation of concerns is intentional. It keeps:

Planning
Execution
Storage

cleanly decoupled.

The Register Machine Model

The VM is a register-based machine.

It does not use a stack model like many other interpreters. Instead, it operates on numbered memory cells called registers.

Each register can hold:

A value (NULL, int, real, text, blob)
Flags describing its current type
Metadata such as the text encoding

A simplified mental model:

R1 = 5
R2 = 10
Add R1 R2 → R3

Registers are heavily reused. The compiler assigns register numbers strategically to minimize memory usage.

This register-based design is one reason SQLite’s VM is compact and efficient.
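The register numbers show up directly in real bytecode. In this small sketch, the two values are passed as bound parameters so the addition has to appear as an instruction. SQLite's opcode documentation describes Add as r[P3] = r[P1] + r[P2]; the specific register numbers you get depend on your SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bound parameters keep the operands unknown at compile time.
rows = conn.execute("EXPLAIN SELECT ? + ?", (5, 10)).fetchall()

for addr, opcode, p1, p2, p3, *_ in rows:
    print(addr, opcode, p1, p2, p3)

# Pick out the Add instruction; p1 and p2 are the input registers, p3 the output.
add = next(row for row in rows if row[1] == "Add")
print("Add reads registers", add[2], "and", add[3], "and writes register", add[4])

# Running the statement for real gives the sum.
result = conn.execute("SELECT ? + ?", (5, 10)).fetchone()[0]
```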

How Execution Proceeds

Execution is driven by a program counter (PC).

Each step:

Fetch instruction at PC
Execute it
Advance PC (or jump)

Some instructions:

Open a table cursor
Read a row
Compare values
Insert into B-tree
Jump if condition met
Halt

The VM keeps running until:

It reaches an instruction that produces a result row (the ResultRow opcode), where it pauses
Or reaches the HALT instruction

That’s why:

sqlite3_step() advances execution
It may return a row (SQLITE_ROW)
Or signal completion (SQLITE_DONE)
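The fetch, execute, advance loop is easy to mimic. What follows is a toy register machine, not SQLite's implementation: the operand layout is simplified and `JumpIfLe` is an invented opcode name. A Python generator stands in for the "pause when a row is ready" behavior.

```python
def run(program, num_registers=4):
    """Toy register VM: yields a tuple each time a ResultRow instruction runs."""
    reg = [None] * num_registers
    pc = 0
    while True:
        op, p1, p2, p3 = program[pc]   # fetch
        pc += 1                        # default: fall through to the next instruction
        if op == "Integer":            # reg[p2] = literal p1
            reg[p2] = p1
        elif op == "Add":              # reg[p3] = reg[p1] + reg[p2]
            reg[p3] = reg[p1] + reg[p2]
        elif op == "ResultRow":        # hand back reg[p1 : p1+p2], then pause
            yield tuple(reg[p1:p1 + p2])
        elif op == "JumpIfLe":         # invented opcode: goto p2 if reg[p1] <= reg[p3]
            if reg[p1] <= reg[p3]:
                pc = p2
        elif op == "Halt":
            return
        else:
            raise ValueError(f"unknown opcode {op!r}")

# Emit the numbers 1..3, one row per step.
PROGRAM = [
    ("Integer", 1, 0, 0),    # 0: r0 = 1  (counter)
    ("Integer", 1, 1, 0),    # 1: r1 = 1  (increment)
    ("Integer", 3, 2, 0),    # 2: r2 = 3  (limit)
    ("ResultRow", 0, 1, 0),  # 3: yield (r0,)
    ("Add", 0, 1, 0),        # 4: r0 = r0 + r1
    ("JumpIfLe", 0, 3, 2),   # 5: if r0 <= r2 goto 3
    ("Halt", 0, 0, 0),       # 6: done
]

vm = run(PROGRAM)
first_row = next(vm)         # like one sqlite3_step() call that returns a row
remaining = list(vm)         # keep stepping until Halt
print(first_row, remaining)
```

Each `next(vm)` plays the role of one sqlite3_step() call: the machine runs until it has a row, then stops with its program counter and registers intact.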

The sqlite3_stmt Lifecycle

Here’s how the public API maps to VM internals:

`sqlite3_bind_*`

Assigns values to the statement's parameter slots. The bytecode copies them into registers when it runs.

`sqlite3_step`

Runs the bytecode until:

A row is produced, or
Execution halts

`sqlite3_column_*`

Reads result values from output registers.

`sqlite3_reset`

Rewinds the program counter.
Keeps the bytecode intact.
Keeps bound values (call sqlite3_clear_bindings to reset them to NULL).

`sqlite3_finalize`

Destroys the Vdbe object.
Frees registers, cursors, and runtime state.
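To make the mapping concrete, here is a toy statement object with the same lifecycle. Everything in it is hypothetical: `ToyStmt`, its fields, and the three-field instruction format are stand-ins for illustration, not SQLite's real Vdbe layout. Finalize is left out because Python frees the object for us.

```python
class ToyStmt:
    """Hypothetical stand-in for a prepared statement."""
    ROW, DONE = "ROW", "DONE"

    def __init__(self, program, num_params, num_registers=4):
        self.program = program             # bytecode, fixed once "prepared"
        self.params = [None] * num_params  # bound parameter values
        self.reg = [None] * num_registers  # registers
        self.pc = 0                        # program counter
        self.row = None                    # current output row, if any

    def bind(self, index, value):          # 1-based index, like sqlite3_bind_*
        self.params[index - 1] = value

    def step(self):
        while True:
            op, p1, p2 = self.program[self.pc]
            self.pc += 1
            if op == "Variable":           # reg[p2] = bound parameter p1
                self.reg[p2] = self.params[p1 - 1]
            elif op == "ResultRow":        # expose reg[p1 : p1+p2] and pause
                self.row = self.reg[p1:p1 + p2]
                return self.ROW
            elif op == "Halt":
                self.pc -= 1               # toy choice: stay on Halt until reset
                self.row = None
                return self.DONE

    def column(self, i):                   # read from the output registers
        return self.row[i]

    def reset(self):                       # rewind; program and bindings survive
        self.pc = 0
        self.row = None

    def clear_bindings(self):              # separate call, as in the real API
        self.params = [None] * len(self.params)


stmt = ToyStmt(
    [("Variable", 1, 0), ("Variable", 2, 1), ("ResultRow", 0, 2), ("Halt", 0, 0)],
    num_params=2,
)
stmt.bind(1, "alice")
stmt.bind(2, 42)

first = (stmt.step(), stmt.column(0), stmt.column(1))
done = stmt.step()

stmt.reset()                               # bindings are still in place
again = (stmt.step(), stmt.column(0), stmt.column(1))

stmt.clear_bindings()
stmt.reset()
cleared = (stmt.step(), stmt.column(0), stmt.column(1))
print(first, done, again, cleared)
```

Resetting and clearing bindings are separate operations here, which mirrors the real API: sqlite3_reset leaves bindings alone, and sqlite3_clear_bindings is the call that wipes them.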

Under the hood, this is just managing a tiny custom-built virtual computer.

Why the VM Was a Brilliant Design Choice

The SQLite team strongly believes the VM architecture made development easier.

Why?

Because instead of debugging tangled C control flow, developers can:

Print bytecode
Trace instruction execution
Observe register values change
See exactly what SQL compiled into

This dramatically improves debuggability.

Bytecode programs are far easier to inspect than complex internal structures.

What the VM Actually Controls

The VM:

Formats table records
Formats index records
Converts between storage types
Evaluates expressions
Manages cursors
Drives tree operations
Orchestrates inserts and deletes

The tree module only reacts to VM commands.

The pager only reacts to tree commands.

The VM is the conductor.

Big Picture

Let’s connect everything we’ve covered so far:

Layer	Responsibility
SQL Parser	Converts SQL text to parse tree
Query Planner	Chooses strategy
VM (VDBE)	Executes bytecode, manages registers, performs type conversions
Tree Module	Maintains B-/B+-trees
Pager	Manages pages, journaling, WAL
OS	Reads/writes disk

Everything ultimately flows from the VM downward.

Coming Next

Now we understand:

What the VM is
How it executes bytecode
How it manages registers
How it interfaces with trees

Next, we'll look at the bytecode programming language itself in the coming days.

👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

HexmosTech / git-lrc