Athreya aka Maneshwar

Posted on Feb 25

How INSERT and JOIN Actually Execute in SQLite

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for Checking AI generated code.

Yesterday we dissected individual opcodes like OpenRead, MakeRecord, Next, and Halt.

Today we zoom out again and observe how those instructions assemble into full execution patterns.

Instead of looking at single instructions, we now look at execution logic how the VM composes those tiny primitives to implement real SQL statements.

Let’s begin with something simple.

Insert Logic

Assume we have a table:

T1(c1 TEXT, c2 INTEGER)

No indexes. No triggers. Just a plain table backed by a B+-tree.

Now execute:

INSERT INTO T1 VALUES('Hello, World!', 2000);

On the surface, it feels like a single action. Internally, the VM executes a structured sequence of operations.

First, a write transaction must begin. The VM issues a Transaction opcode with write intent. No data changes happen before this step.

Second, the VM executes VerifyCookie. This ensures the schema version stored in the database matches what the prepared statement expects.

If the schema changed after compilation, execution aborts. SQLite never assumes the world stayed still.

Next, the VM opens a write cursor on the B+-tree representing table T1. This is done with an opcode like OpenWrite (similar in spirit to OpenRead, but for modifications). The cursor now points to the tree structure where rows are stored.

Then comes row construction.

The VM generates a new rowid. If the table uses an implicit integer primary key, SQLite determines a rowid that is not currently in use. This may involve consulting the table’s maximum existing rowid or using internal bookkeeping.

After that, MakeRecord is executed. The registers containing:

'Hello, World!'
2000

are serialized into a properly formatted table record. This includes:

Record header
Serial type encoding
Value payloads

Now the record exists as a contiguous byte sequence, ready for insertion.

The VM then calls the insertion opcode (Insert) to place the new record into the B+-tree through the open cursor. At this point, the tree module takes over:

It locates the correct leaf page.
Splits pages if necessary.
Maintains tree balance.
Marks pages dirty for the pager.

When insertion completes, the cursor is closed.

Finally, execution halts, returning a status code (SQLITE_OK if successful).

That’s it.

No magic. No abstraction leaks. Just a clean sequence:

Begin transaction
Verify schema
Open cursor
Create rowid
MakeRecord
Insert
Close
Halt

If indexes exist on T1, the story becomes slightly richer. For each index:

The VM opens another write cursor.
Constructs an index key record.
Inserts the index entry.

Every index update is just another record creation plus B-tree insertion. The pattern stays the same — only the number of cursors increases.

Join Logic

Now let’s consider something conceptually heavier: a join.

Suppose we run:

SELECT * FROM t1, t2 WHERE some_condition;

Assume neither table has indexes.

How does SQLite execute this?

SQLite implements joins using nested loops only. It does not perform merge joins. It does not perform hash joins. Every join is a loop join.

The left-most table in the FROM clause becomes the outer loop.
The right-most table becomes the inner loop.

Conceptually, execution resembles:

for each row in t1:
    for each row in t2:
        if condition holds:
            output combined row

Let’s walk through the VM-level steps.

First, a read transaction begins. A shared lock is acquired on the main database.

Second, VerifyCookie ensures the schema has not changed since compilation.

Then, two read cursors are opened:

Cursor A → B+-tree of t1
Cursor B → B+-tree of t2

The VM rewinds cursor A to the first row of t1.

For each row in t1, the VM:

Rewinds cursor B to the first row of t2.
Iterates through every row in t2.

Inside the inner loop, the VM executes bytecode instructions that evaluate some_condition. This typically involves:

Extracting columns via Column
Performing comparisons
Branching conditionally

If the condition evaluates to TRUE:

The VM assembles the result row (often directly using register values).
Executes ResultRow.

After exhausting t2, the VM advances cursor A with Next and repeats.

Finally, both cursors are closed, and execution halts.

The Important Insight

Even though SQL describes joins declaratively, the VM executes them procedurally.

There is no high-level “join operator” at runtime. There are only:

Cursor operations
Comparisons
Conditional jumps
Nested loops

The elegance lies in the fact that the same small instruction set powers both simple selects and complex joins.

When indexes are present, the planner changes the bytecode generation strategy. Instead of scanning the entire inner table, it may:

Seek directly into the index
Restrict loop bounds
Short-circuit iteration

But at execution time, the VM still just runs instructions in order.

Where We’re Heading Next

Now that we’ve seen:

How an INSERT becomes a sequence of record construction and tree insertion.
How a JOIN becomes nested cursor loops with conditional branches.

In the next post, we will move from algorithmic description to actual program execution flow.

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

HexmosTech / git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements

git-lrc-intro-60s.mp4

Why

🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
🔁 Build a habit, ship better code. Regular review → fewer bugs → more robust code → better results in your team.
🔗 Why git? Git is universal. Every editor, every IDE, every AI…

View on GitHub

Top comments (4)

Guilherme Zaia • Feb 27

Nested loops only is bold but honest. Most ORMs hide this and devs panic when N+1 hits production. The real question: does your git-lrc catch AI generating SELECT * joins with zero index awareness? Because that's the #1 vaporware I see in code reviews.

Matthew Hou • Feb 26

Nice walkthrough of the VM internals. The VerifyCookie step is the kind of detail that matters more than people think — especially now that AI coding tools generate SQL without understanding the execution model underneath. I've seen AI-generated queries that look correct syntactically but perform terribly because they don't account for how the B-tree actually traverses. Your git-lrc project for checking AI-generated code is interesting in this context. Are you finding that AI-generated SQL has different failure patterns than human-written SQL? My instinct says AI tends to write correct-but-naive queries (missing indexes, unnecessary subqueries) while humans write incorrect-but-performant ones.

Teguh Coding • Feb 26

Nice share bro

Some comments may only be visible to logged-in visitors. Sign in to view all comments.