DEV Community

Athreya aka Maneshwar


Bytecode: SQLite’s Internal Programming Language

Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for checking AI-generated code.
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

When you prepare a statement:

sqlite3_prepare_v2(...)

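SQLite does not execute the SQL text directly. (With sqlite3_prepare_v2() it does keep a copy of the text, which sqlite3_sql() returns, but that copy is not what runs.)

Instead, it compiles the SQL into a low-level instruction sequence: a bytecode program.

That program is executed by SQLite's virtual machine, the VDBE (Virtual Database Engine).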
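To make "compiling to bytecode" concrete, here is a toy sketch in C. It is not SQLite's code generator: the `Instr` struct, the `compile_sum()` function, and the simplified operand meanings are hypothetical. Only the opcode names (Integer, Add, ResultRow, Halt) are borrowed from SQLite's real instruction set. The function turns a sum of integer literals, roughly what `SELECT 5 + 10` asks for, into a flat array of instructions.

```c
#include <assert.h>

/* Toy opcodes. The names match real SQLite opcodes, but the operand
 * meanings here are simplified for illustration. */
typedef enum { OP_Integer, OP_Add, OP_ResultRow, OP_Halt } Opcode;

/* One instruction: an opcode plus up to three integer operands (like P1..P3). */
typedef struct {
    Opcode op;
    int p1, p2, p3;
} Instr;

/* Hypothetical mini compiler for t[0] + t[1] + ... + t[n-1].
 * Writes 2*n + 1 instructions into out (the caller provides the space)
 * and returns the number of instructions written. */
int compile_sum(const int *t, int n, Instr *out) {
    assert(n >= 1);
    int pc = 0;
    /* Register 1 holds the running total. */
    out[pc++] = (Instr){OP_Integer, t[0], 1, 0};      /* r1 = t[0] */
    for (int i = 1; i < n; i++) {
        out[pc++] = (Instr){OP_Integer, t[i], 2, 0};  /* r2 = t[i]; r2 is reused */
        out[pc++] = (Instr){OP_Add, 1, 2, 1};         /* r1 = r1 + r2 */
    }
    out[pc++] = (Instr){OP_ResultRow, 1, 1, 0};       /* one-column row from r1 */
    out[pc++] = (Instr){OP_Halt, 0, 0, 0};            /* end of program */
    return pc;
}
```

Calling `compile_sum((int[]){5, 10}, 2, prog)` fills `prog` with five instructions: two loads, one add, one result row, and a halt. The output is data (a program) rather than SQL text, and nothing has been executed yet.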

In the public C API, a prepared statement is an object of type:

sqlite3_stmt

Internally, the object behind that opaque handle is a struct named Vdbe.

That object contains:

  • The bytecode program
  • Metadata for result columns
  • Bound parameter values
  • A program counter
  • Registers (memory cells)
  • Runtime state (open Btrees, sorters, temp structures)

So when you hold a sqlite3_stmt*, you're literally holding a tiny program.
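The list of contents above maps naturally onto a C struct. What follows is a hypothetical, heavily simplified stand-in for SQLite's real `Vdbe`, not its actual definition: a few member names such as `aOp`, `aMem` and `pc` loosely echo SQLite's source, but the `ToyVdbe`, `ToyOp` and `ToyMem` types and the `toy_vdbe_new()`/`toy_vdbe_free()` helpers are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified pieces; SQLite's real types carry much more information. */
typedef struct { int op, p1, p2, p3; } ToyOp;       /* one instruction */
typedef struct { int flags; long long i; } ToyMem;   /* one register */

/* Hypothetical stand-in for a prepared statement; one group per bullet. */
typedef struct {
    ToyOp   *aOp;  int nOp;              /* the bytecode program */
    char   **azColName; int nResColumn;  /* result-column metadata */
    ToyMem  *aVar; int nVar;             /* bound parameter values */
    int      pc;                         /* program counter */
    ToyMem  *aMem; int nMem;             /* registers (memory cells) */
    void   **apCsr; int nCursor;         /* runtime state: cursors, sorters, ... */
} ToyVdbe;

/* Allocates an empty statement with room for nOp instructions, nMem
 * registers and nVar parameters. Returns NULL on allocation failure. */
ToyVdbe *toy_vdbe_new(int nOp, int nMem, int nVar) {
    assert(nOp >= 0 && nMem >= 0 && nVar >= 0);
    ToyVdbe *v = calloc(1, sizeof *v);
    if (!v) return NULL;
    /* Allocate at least one element so calloc(0, ...) cannot look like a failure. */
    v->aOp  = calloc(nOp  ? nOp  : 1, sizeof *v->aOp);
    v->aMem = calloc(nMem ? nMem : 1, sizeof *v->aMem);
    v->aVar = calloc(nVar ? nVar : 1, sizeof *v->aVar);
    if (!v->aOp || !v->aMem || !v->aVar) {
        free(v->aOp); free(v->aMem); free(v->aVar); free(v);
        return NULL;
    }
    v->nOp = nOp; v->nMem = nMem; v->nVar = nVar;
    return v;   /* pc and every register start at zero thanks to calloc */
}

/* Releases the program, registers, parameters and runtime state. */
void toy_vdbe_free(ToyVdbe *v) {
    if (!v) return;
    free(v->aOp); free(v->aMem); free(v->aVar);
    free(v->azColName); free(v->apCsr);
    free(v);
}
```

Each group of members corresponds to one bullet in the list, and `toy_vdbe_free()` plays the part that sqlite3_finalize() plays for the real object.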

The VM Is Not a Query Optimizer

One subtle but critical point:

The VM does zero query optimization.

It does not rethink execution plans.
It does not change strategies.

It simply executes the bytecode it is given.

If the planner made a bad decision, the VM faithfully carries it out.

That separation of concerns is intentional. It keeps:

  • Planning
  • Execution
  • Storage

cleanly decoupled.

The Register Machine Model

The VM is a register-based machine.

Unlike many other interpreters, it does not use a stack model. Instead, it operates on numbered memory cells called registers.

Each register can hold:

  • A value (NULL, int, real, text, blob)
  • Flags describing its current type
  • Metadata like encoding or collation info

A simplified mental model:

R1 = 5
R2 = 10
Add R1 R2 → R3

Registers are heavily reused. As it compiles, the code generator hands out register numbers and recycles temporary registers once they are no longer needed, which keeps the register array small.

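This register-based design is one reason SQLite's VM is compact and efficient: each instruction names its operand registers directly, so values do not have to be pushed and popped the way a stack machine requires.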
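Here is the R1/R2/R3 model as a small C sketch. It assumes a simplified register layout, a type-flag field plus a union, loosely modeled on how SQLite pairs a value with type flags, but it omits text, blobs, encoding and collation. The `Reg` type, the `TF_*` flags and the `reg_set_*()`/`op_add()` functions are hypothetical names. `op_add()` follows the operand order of SQLite's Add opcode (P1 + P2 stored in P3) and SQL's rule that arithmetic involving NULL yields NULL.

```c
#include <assert.h>

/* Type flags describing which union member is valid. */
enum { TF_NULL = 1, TF_INT = 2, TF_REAL = 4 };

typedef struct {
    int flags;                          /* TF_NULL, TF_INT or TF_REAL */
    union { long long i; double r; } u; /* the value itself */
} Reg;

static void reg_set_int(Reg *r, long long v) { r->flags = TF_INT;  r->u.i = v; }
static void reg_set_real(Reg *r, double v)   { r->flags = TF_REAL; r->u.r = v; }
static void reg_set_null(Reg *r)             { r->flags = TF_NULL; }

/* Reads a numeric register as a double (valid only for TF_INT or TF_REAL). */
static double reg_as_real(const Reg *r) {
    return (r->flags == TF_INT) ? (double)r->u.i : r->u.r;
}

/* Toy Add: reg[p3] = reg[p1] + reg[p2].
 * NULL in -> NULL out; int + int -> int; anything else -> real.
 * Every register used must have been set first; overflow is not handled. */
static void op_add(Reg *reg, int nReg, int p1, int p2, int p3) {
    assert(p1 >= 0 && p1 < nReg && p2 >= 0 && p2 < nReg && p3 >= 0 && p3 < nReg);
    const Reg *a = &reg[p1], *b = &reg[p2];
    Reg *out = &reg[p3];   /* may alias a or b; inputs are read before writing */
    if ((a->flags | b->flags) & TF_NULL) {
        reg_set_null(out);
    } else if (a->flags == TF_INT && b->flags == TF_INT) {
        reg_set_int(out, a->u.i + b->u.i);
    } else {
        reg_set_real(out, reg_as_real(a) + reg_as_real(b));
    }
}
```

Adding 5 and 10 gives the integer 15; replacing r2 with the real 0.5 gives the real 5.5; making r1 NULL makes r3 NULL. The same output register is simply overwritten each time.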

How Execution Proceeds

Execution is driven by a program counter (PC).

Each step:

  1. Fetch instruction at PC
  2. Execute it
  3. Advance PC (or jump)
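The fetch/execute/advance cycle fits in a `for` loop around a `switch`. The sketch below is a hypothetical toy interpreter, not SQLite's `sqlite3VdbeExec()`: it uses plain integer registers and four opcodes whose names come from SQLite. Its only conditional jump, `OP_IfPos`, mimics SQLite's IfPos: if register P1 is at least 1, subtract P3 from it and jump to P2.

```c
#include <assert.h>

typedef enum { OP_Integer, OP_Add, OP_IfPos, OP_Halt } Opcode;
typedef struct { Opcode op; int p1, p2, p3; } Instr;

/* Runs prog from pc = 0 until Halt. reg must be large enough for every
 * register the program names. Returns the number of instructions executed. */
int run(const Instr *prog, int nOp, long long *reg) {
    int pc = 0, executed = 0;
    for (;;) {
        assert(pc >= 0 && pc < nOp);
        const Instr *in = &prog[pc];          /* 1. fetch */
        executed++;
        switch (in->op) {                     /* 2. execute */
        case OP_Integer:                      /* r[p2] = p1 */
            reg[in->p2] = in->p1;
            break;
        case OP_Add:                          /* r[p3] = r[p1] + r[p2] */
            reg[in->p3] = reg[in->p1] + reg[in->p2];
            break;
        case OP_IfPos:                        /* conditional jump */
            if (reg[in->p1] >= 1) {
                reg[in->p1] -= in->p3;
                pc = in->p2;                  /* 3b. jump */
                continue;
            }
            break;
        case OP_Halt:
            return executed;
        }
        pc++;                                 /* 3a. advance */
    }
}
```

For example, a program that loads 3 into r1 and 0 into r2, adds r1 into r2, then uses `IfPos` (P2 = 2, P3 = 1) to jump back to the add before halting, ends with 6 in r2 after executing 11 instructions.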

Individual instructions can:

  • Open a table cursor
  • Read a row
  • Compare values
  • Insert into B-tree
  • Jump if condition met
  • Halt

The VM keeps running until:

  • It reaches a ResultRow instruction, which pauses execution and hands one row back to the caller
  • Or it reaches the Halt instruction

That's why:

  • sqlite3_step() resumes execution from wherever it last paused
  • It returns SQLITE_ROW when a row is ready
  • Or SQLITE_DONE once the program halts
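The "pause on a row, resume on the next call" behavior can be sketched by keeping the program counter inside a statement object, so each call picks up where the previous one stopped. This is a hypothetical toy: `ToyStmt`, `toy_step()` and the `STEP_ROW`/`STEP_DONE` codes only stand in for `sqlite3_stmt`, `sqlite3_step()`, `SQLITE_ROW` and `SQLITE_DONE`. `OP_ResultRow` follows SQLite's convention that P1 is the first output register and P2 the number of columns.

```c
#include <assert.h>

typedef enum { OP_Integer, OP_ResultRow, OP_Halt } Opcode;
typedef struct { Opcode op; int p1, p2; } Instr;
typedef enum { STEP_ROW, STEP_DONE } StepResult;

#define NREG 8

typedef struct {
    const Instr *prog;
    int nOp;
    int pc;                    /* survives between toy_step() calls */
    long long reg[NREG];
    int rowStart, rowCount;    /* registers holding the current row */
} ToyStmt;

/* Runs until the next ResultRow (returns STEP_ROW) or Halt (STEP_DONE). */
StepResult toy_step(ToyStmt *s) {
    for (;;) {
        assert(s->pc >= 0 && s->pc < s->nOp);
        const Instr *in = &s->prog[s->pc];
        switch (in->op) {
        case OP_Integer:                   /* r[p2] = p1 */
            assert(in->p2 >= 0 && in->p2 < NREG);
            s->reg[in->p2] = in->p1;
            break;
        case OP_ResultRow:                 /* row = r[p1] .. r[p1+p2-1] */
            s->rowStart = in->p1;
            s->rowCount = in->p2;
            s->pc++;                       /* resume after this instruction */
            return STEP_ROW;
        case OP_Halt:
            return STEP_DONE;              /* pc stays on Halt */
        }
        s->pc++;
    }
}
```

Given a program that loads 7 into r1, emits a row, loads 8 into r1, emits another row and halts, the first two calls return `STEP_ROW` with 7 and then 8 in register 1, and every later call returns `STEP_DONE`.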

The sqlite3_stmt Lifecycle

Here’s how the public API maps to VM internals:

sqlite3_bind_*

Stores values in the statement's parameter slots; the bytecode copies them into registers when it needs them.

sqlite3_step

Runs the bytecode until:

  • A row is produced, or
  • Execution halts

sqlite3_column_*

Reads result values from output registers.

sqlite3_reset

Rewinds the program counter.
Keeps the bytecode intact.
Keeps bound values (call sqlite3_clear_bindings() to clear them).


sqlite3_finalize

Destroys the Vdbe object.
Frees registers, cursors, and runtime state.

Under the hood, this is just managing a tiny custom-built virtual computer.
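The whole lifecycle can be mimicked with a toy statement object. Everything below is a hypothetical sketch: the `toy_*` functions only imitate the roles of `sqlite3_prepare_v2()`, `sqlite3_bind_int64()`, `sqlite3_step()`, `sqlite3_column_int64()`, `sqlite3_reset()` and `sqlite3_finalize()`, and the hard-coded program stands in for what `SELECT ?1 + 1` might compile to. The opcode names Variable and AddImm exist in SQLite, but their operands are simplified here. As with the real API, parameters are numbered from 1 and result columns from 0.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { OP_Variable, OP_AddImm, OP_ResultRow, OP_Halt } Opcode;
typedef struct { Opcode op; int p1, p2; } Instr;

enum { TOY_ROW = 100, TOY_DONE = 101 };  /* stand-ins for SQLITE_ROW / SQLITE_DONE */
#define NREG 4
#define NVAR 2

typedef struct {
    const Instr *prog; int nOp;  /* bytecode, never modified */
    int pc;                      /* program counter */
    long long var[NVAR + 1];     /* bound parameters ?1..?NVAR (slot 0 unused) */
    long long reg[NREG];         /* registers */
    int rowStart, rowCount;      /* output registers of the current row */
} ToyStmt;

/* What "SELECT ?1 + 1" might compile to in this toy. */
static const Instr PROG[] = {
    {OP_Variable, 1, 1},   /* r1 = ?1 */
    {OP_AddImm, 1, 1},     /* r1 += 1 */
    {OP_ResultRow, 1, 1},  /* row = r1 */
    {OP_Halt, 0, 0},
};

ToyStmt *toy_prepare(void) {              /* "compile": attach the program */
    ToyStmt *s = calloc(1, sizeof *s);
    if (s) { s->prog = PROG; s->nOp = (int)(sizeof PROG / sizeof PROG[0]); }
    return s;
}

void toy_bind_int64(ToyStmt *s, int i, long long v) {
    assert(i >= 1 && i <= NVAR);
    s->var[i] = v;                          /* stored only; nothing runs yet */
}

int toy_step(ToyStmt *s) {
    for (;;) {
        assert(s->pc >= 0 && s->pc < s->nOp);
        const Instr *in = &s->prog[s->pc];
        switch (in->op) {
        case OP_Variable:  s->reg[in->p2] = s->var[in->p1]; break;
        case OP_AddImm:    s->reg[in->p1] += in->p2; break;
        case OP_ResultRow: s->rowStart = in->p1; s->rowCount = in->p2;
                           s->pc++;         /* resume after the row */
                           return TOY_ROW;
        case OP_Halt:      return TOY_DONE; /* pc stays on Halt */
        }
        s->pc++;
    }
}

long long toy_column_int64(const ToyStmt *s, int col) {
    assert(col >= 0 && col < s->rowCount);
    return s->reg[s->rowStart + col];       /* column i = register rowStart + i */
}

void toy_reset(ToyStmt *s) {
    s->pc = 0;                              /* rewind; program and bindings kept */
    s->rowCount = 0;
}

void toy_finalize(ToyStmt *s) { free(s); }  /* release everything */
```

After binding 41 to `?1`, stepping yields one row whose column 0 is 42, then `TOY_DONE`. Calling `toy_reset()` and stepping again yields 42 once more, because the binding survived the reset.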


Why the VM Was a Brilliant Design Choice

The SQLite team strongly believes the VM architecture made development easier.

Why?

Because instead of debugging tangled C control flow, developers can:

  • Print bytecode
  • Trace instruction execution
  • Observe register values change
  • See exactly what SQL compiled into

This dramatically improves debuggability.

Bytecode programs are far easier to inspect than complex internal structures.
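Real SQLite exposes the compiled program directly: prefixing a statement with `EXPLAIN` in the sqlite3 shell lists its instructions (address, opcode, operands P1 to P5, and a comment) instead of running it. The sketch below imitates both that kind of listing and an execution trace on a toy program; `format_instr()`, `explain()` and `run_traced()` are hypothetical helpers, and the output layout is not SQLite's exact format.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { OP_Integer, OP_Add, OP_ResultRow, OP_Halt } Opcode;
typedef struct { Opcode op; int p1, p2, p3; } Instr;

static const char *const OPNAME[] = { "Integer", "Add", "ResultRow", "Halt" };

/* Formats one instruction into buf as "addr opcode p1 p2 p3" columns. */
int format_instr(char *buf, size_t n, int addr, const Instr *in) {
    assert((size_t)in->op < sizeof OPNAME / sizeof OPNAME[0]);
    return snprintf(buf, n, "%-4d %-10s %4d %4d %4d",
                    addr, OPNAME[in->op], in->p1, in->p2, in->p3);
}

/* Prints the program without running it, EXPLAIN-style. */
void explain(FILE *out, const Instr *prog, int nOp) {
    char line[64];
    fprintf(out, "%-4s %-10s %4s %4s %4s\n", "addr", "opcode", "p1", "p2", "p3");
    for (int i = 0; i < nOp; i++) {
        format_instr(line, sizeof line, i, &prog[i]);
        fprintf(out, "%s\n", line);
    }
}

/* Runs a straight-line program (no jumps), printing each instruction
 * followed by the register value it produced. */
void run_traced(FILE *out, const Instr *prog, int nOp, long long *reg) {
    char line[64];
    for (int pc = 0; pc < nOp; pc++) {
        const Instr *in = &prog[pc];
        format_instr(line, sizeof line, pc, in);
        switch (in->op) {
        case OP_Integer:
            reg[in->p2] = in->p1;
            fprintf(out, "%s  ; r%d = %lld\n", line, in->p2, reg[in->p2]);
            break;
        case OP_Add:
            reg[in->p3] = reg[in->p1] + reg[in->p2];
            fprintf(out, "%s  ; r%d = %lld\n", line, in->p3, reg[in->p3]);
            break;
        case OP_ResultRow:
            fprintf(out, "%s  ; row from r%d..r%d\n", line, in->p1, in->p1 + in->p2 - 1);
            break;
        case OP_Halt:
            fprintf(out, "%s  ; halt\n", line);
            return;
        }
    }
}
```

On a program that loads 5 and 10 and adds them into r3, the trace lines end with `r1 = 5`, `r2 = 10` and `r3 = 15`, so each register change is visible without stepping through nested C control flow in a debugger.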

What the VM Actually Controls

The VM:

  • Formats table records
  • Formats index records
  • Converts between storage types
  • Evaluates expressions
  • Manages cursors
  • Drives tree operations
  • Orchestrates inserts and deletes

The tree module only reacts to VM commands.

The pager only reacts to tree commands.

The VM is the conductor.

Big Picture

Let’s connect everything we’ve covered so far:

| Layer | Responsibility |
|---|---|
| SQL Parser | Converts SQL text to a parse tree |
| Query Planner | Chooses a strategy, which the code generator emits as bytecode |
| VM (VDBE) | Executes bytecode, manages registers, performs type conversions |
| Tree Module | Maintains B-/B+-trees |
| Pager | Manages pages, journaling, WAL |
| OS | Reads/writes disk |

Everything ultimately flows from the VM downward.

Coming Next

Now that we understand:

  • What the VM is
  • How it executes bytecode
  • How it manages registers
  • How it interfaces with trees

Next, we'll dig into the bytecode programming language itself.

git-lrc

👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

HexmosTech / git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements.

Why

  • 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
  • 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
  • 🔁 Build a habit, ship better code. Regular review → fewer bugs → more robust code → better results in your team.
  • 🔗 Why git? Git is universal. Every editor, every IDE, every AI…



