DEV Community

Cover image for How INSERT and JOIN Actually Execute in SQLite
Athreya aka Maneshwar
Athreya aka Maneshwar

Posted on

How INSERT and JOIN Actually Execute in SQLite

Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for Checking AI generated code.

Yesterday we dissected individual opcodes like OpenRead, MakeRecord, Next, and Halt.

Today we zoom out again and observe how those instructions assemble into full execution patterns.

Instead of looking at single instructions, we now look at execution logic how the VM composes those tiny primitives to implement real SQL statements.

Let’s begin with something simple.

Insert Logic

Assume we have a table:

T1(c1 TEXT, c2 INTEGER)
Enter fullscreen mode Exit fullscreen mode

No indexes. No triggers. Just a plain table backed by a B+-tree.

Now execute:

INSERT INTO T1 VALUES('Hello, World!', 2000);
Enter fullscreen mode Exit fullscreen mode

On the surface, it feels like a single action. Internally, the VM executes a structured sequence of operations.

First, a write transaction must begin. The VM issues a Transaction opcode with write intent. No data changes happen before this step.

Second, the VM executes VerifyCookie. This ensures the schema version stored in the database matches what the prepared statement expects.

If the schema changed after compilation, execution aborts. SQLite never assumes the world stayed still.

Next, the VM opens a write cursor on the B+-tree representing table T1. This is done with an opcode like OpenWrite (similar in spirit to OpenRead, but for modifications). The cursor now points to the tree structure where rows are stored.

Then comes row construction.

The VM generates a new rowid. If the table uses an implicit integer primary key, SQLite determines a rowid that is not currently in use. This may involve consulting the table’s maximum existing rowid or using internal bookkeeping.

After that, MakeRecord is executed. The registers containing:

  • 'Hello, World!'
  • 2000

are serialized into a properly formatted table record. This includes:

  • Record header
  • Serial type encoding
  • Value payloads

Now the record exists as a contiguous byte sequence, ready for insertion.

The VM then calls the insertion opcode (Insert) to place the new record into the B+-tree through the open cursor. At this point, the tree module takes over:

  • It locates the correct leaf page.
  • Splits pages if necessary.
  • Maintains tree balance.
  • Marks pages dirty for the pager.

When insertion completes, the cursor is closed.

Finally, execution halts, returning a status code (SQLITE_OK if successful).

That’s it.

No magic. No abstraction leaks. Just a clean sequence:

  • Begin transaction
  • Verify schema
  • Open cursor
  • Create rowid
  • MakeRecord
  • Insert
  • Close
  • Halt

If indexes exist on T1, the story becomes slightly richer. For each index:

  • The VM opens another write cursor.
  • Constructs an index key record.
  • Inserts the index entry.

Every index update is just another record creation plus B-tree insertion. The pattern stays the same — only the number of cursors increases.

Join Logic

Now let’s consider something conceptually heavier: a join.

Suppose we run:

SELECT * FROM t1, t2 WHERE some_condition;
Enter fullscreen mode Exit fullscreen mode

Assume neither table has indexes.

How does SQLite execute this?

SQLite implements joins using nested loops only. It does not perform merge joins. It does not perform hash joins. Every join is a loop join.

The left-most table in the FROM clause becomes the outer loop.
The right-most table becomes the inner loop.

Conceptually, execution resembles:

for each row in t1:
    for each row in t2:
        if condition holds:
            output combined row
Enter fullscreen mode Exit fullscreen mode

Let’s walk through the VM-level steps.

First, a read transaction begins. A shared lock is acquired on the main database.

Second, VerifyCookie ensures the schema has not changed since compilation.

Then, two read cursors are opened:

  • Cursor A → B+-tree of t1
  • Cursor B → B+-tree of t2

The VM rewinds cursor A to the first row of t1.

For each row in t1, the VM:

  • Rewinds cursor B to the first row of t2.
  • Iterates through every row in t2.

Inside the inner loop, the VM executes bytecode instructions that evaluate some_condition. This typically involves:

  • Extracting columns via Column
  • Performing comparisons
  • Branching conditionally

If the condition evaluates to TRUE:

  • The VM assembles the result row (often directly using register values).
  • Executes ResultRow.

After exhausting t2, the VM advances cursor A with Next and repeats.

Finally, both cursors are closed, and execution halts.

The Important Insight

Even though SQL describes joins declaratively, the VM executes them procedurally.

There is no high-level “join operator” at runtime. There are only:

  • Cursor operations
  • Comparisons
  • Conditional jumps
  • Nested loops

The elegance lies in the fact that the same small instruction set powers both simple selects and complex joins.

When indexes are present, the planner changes the bytecode generation strategy. Instead of scanning the entire inner table, it may:

  • Seek directly into the index
  • Restrict loop bounds
  • Short-circuit iteration

But at execution time, the VM still just runs instructions in order.

Where We’re Heading Next

Now that we’ve seen:

  • How an INSERT becomes a sequence of record construction and tree insertion.
  • How a JOIN becomes nested cursor loops with conditional branches.

In the next post, we will move from algorithmic description to actual program execution flow.

git-lrc

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

GitHub logo HexmosTech / git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

git-lrc logo

git-lrc

Free, Unlimited AI Code Reviews That Run on Commit


git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements

git-lrc-intro-60s.mp4

Why

  • 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
  • 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
  • 🔁 Build a habit, ship better code. Regular review → fewer bugs → more robust code → better results in your team.
  • 🔗 Why git? Git is universal. Every editor, every IDE, every AI…




Top comments (0)