Athreya aka Maneshwar

Posted on Feb 23

Inside SQLite’s Bytecode Language

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for Checking AI generated code.

Yesterday we positioned the Virtual Machine as the execution heart of SQLite the layer that sits between the planner and the storage engine.

Today, we move one level deeper into the VM itself.

If the VM is a machine, then it needs a language.

That language is SQLite’s bytecode programming language.

It isn’t SQL. It isn’t C.
It’s a tiny, purpose-built instruction set designed specifically for manipulating tuples inside B- and B+-trees.

And once you understand this layer, SQLite stops feeling like a black box.

From SQL to a Linear Program

When you run:

SELECT * FROM t1;

SQLite does not execute that string directly. The frontend parses it, the planner decides how to execute it, and then the compiler emits a linear sequence of bytecode instructions.

A bytecode program is simply a list of instructions, executed in order unless a jump changes control flow.

Each instruction has the form:

(opcode, P1, P2, P3, P4, P5)

Where:

opcode → identifies the operation
P1, P2, P3 → 32-bit signed integers
P2 → often used as a jump destination
P4 → can be a pointer, integer, float, string, collation function, SQL function, etc.
P5 → an unsigned byte, usually flags

Some instructions use all operands.
Some ignore most of them.

SQLite keeps the instruction format uniform even if specific opcodes need only a subset of operands.

Internally, each instruction is represented as a VdbeOp structure. A complete prepared statement is nothing more than a linear array of VdbeOp objects.

That’s it.

No trees. No nested AST. Just a straight-line program with conditional jumps.

A Concrete Example: `SELECT * FROM t1`

In the example shown in Figure, a simple SELECT * FROM t1 compiles into about 14 bytecode instructions.

Execution begins at address 0 and proceeds sequentially until:

A HALT instruction is reached, or
Control jumps beyond the final instruction

Even for such a simple query, the program must:

Open a cursor on table t1
Rewind the cursor to the first row
Read column values
Emit result rows
Advance the cursor
Loop until exhausted
Halt

There is no “magic SELECT”. There is only a tiny, explicit program doing mechanical steps.

That explicitness is what makes the VM so powerful — and so debuggable.

Categories of Bytecode Instructions

SQLite currently defines roughly 140+ opcodes. They fall broadly into five families.

1. Arithmetic and Logic

These include operations like addition, subtraction, multiplication, division, remainder, bitwise operations, shifts, and string concatenation.

They operate on registers and write results back into registers.

2. Data Movement

These instructions move values between registers, copy them, or initialize them.

Since the VM is register-based, data movement is fundamental.

3. Control Flow

This includes:

Goto
Gosub
Return
Conditional branches
Halt

These instructions change the program counter and allow loops and conditional execution.

4. B- and B+-Tree Related

This is where the VM interfaces with storage. These opcodes:

Create or destroy trees
Open and close cursors
Move cursors forward or backward
Seek to specific keys
Insert or delete records
Begin, commit, or rollback transactions

Notice something subtle here:

The VM does not manipulate pages directly.
It tells the tree module what to do.

5. Specialized Instructions

These include operations such as:

Allocating a new rowid
Constructing a record from register values
Extracting the i-th column from a row
Managing ephemeral structures

This category contains the glue that makes higher-level SQL semantics possible.

The Structure of a VdbeOp

Each bytecode instruction is stored in a VdbeOp object containing:

opcode
p1
p2
p3
p4
p5
p4type (indicating what kind of data p4 holds)

There are over a dozen possible p4 types from integer literals to pointers to collation functions.

This design gives SQLite enormous flexibility without bloating the instruction format.

A prepared statement is literally:

VdbeOp program[N];

A linear array. Nothing more complex than that.

The brilliance of this design is separation of concerns.

The parser builds structure.
The planner decides strategy.
The compiler emits instructions.
The VM executes blindly.
The tree module performs structural modifications.
The pager guarantees durability.

Each layer does one job.

Because bytecode programs are linear and readable, SQLite developers can:

Print the program
Trace instruction execution
Inspect register contents
Watch cursor movement

Debugging becomes a matter of following a tiny instruction stream instead of reverse-engineering deep C call stacks.

Execution Is Just Interpretation

At runtime, the VM repeatedly:

Fetches the instruction at the program counter
Executes it
Updates the program counter

If an instruction modifies P2 as a jump target, control transfers accordingly.

There is no speculative execution.
No optimization at this stage.
No re-planning.

The VM simply interprets.

It is small. Deterministic. Mechanical.

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

HexmosTech / git-lrc

Free, Micro AI Code Reviews That Run on Commit

git-lrc

Free, Micro AI Code Reviews That Run on Commit

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements

git-lrc-intro-60s.mp4

Why

🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
…

View on GitHub

DEV Community

Inside SQLite’s Bytecode Language

From SQL to a Linear Program

A Concrete Example: `SELECT * FROM t1`

Categories of Bytecode Instructions

1. Arithmetic and Logic

2. Data Movement

3. Control Flow

4. B- and B+-Tree Related

5. Specialized Instructions

The Structure of a VdbeOp

Execution Is Just Interpretation

HexmosTech / git-lrc

Free, Micro AI Code Reviews That Run on Commit

git-lrc

Free, Micro AI Code Reviews That Run on Commit

See It In Action

Why

Top comments (0)

From SQL to a Linear Program

A Concrete Example: SELECT * FROM t1

Categories of Bytecode Instructions

1. Arithmetic and Logic

2. Data Movement

3. Control Flow

4. B- and B+-Tree Related

5. Specialized Instructions

The Structure of a VdbeOp

Execution Is Just Interpretation

HexmosTech / git-lrc

Free, Micro AI Code Reviews That Run on Commit

git-lrc

Free, Micro AI Code Reviews That Run on Commit

See It In Action

Why

A Concrete Example: `SELECT * FROM t1`