Hello, I'm Maneshwar. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback to help improve the product.
In the previous part, you saw how SQLite converts a parse tree into bytecode. At that point, SQLite knows exactly what to do and how to execute it.
But there is still one critical question left.
Is this the fastest way to do it?
That is where the query optimizer comes in.
It sits between parsing and code generation and decides how your query should actually be executed for the best performance.
Why Optimization Exists at All
Given a single SQL query, there are often multiple ways to execute it.
Take a simple example:
SELECT * FROM users WHERE age = 25;
This query can be executed in different ways:
- Scan the entire table and check each row
- Use an index on age to directly find matching rows
Both approaches produce the same result, but the performance difference can be massive.
The job of the optimizer is to pick the approach that produces the most efficient bytecode program.
Different parse trees can represent equivalent relational operations, and each one leads to a different execution strategy.
The optimizer’s role is to select the one that minimizes execution time and resource usage.
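To see the two strategies side by side, here is a minimal sketch using Python's built-in sqlite3 module and EXPLAIN QUERY PLAN. The table layout, the sample rows, and the index name idx_users_age are assumptions made up for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, only for this illustration.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [(f"user{i}", 20 + i % 10) for i in range(100)],
)

QUERY = "SELECT * FROM users WHERE age = 25"


def plan(sql):
    # The human-readable description of each plan step is the last column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index, the only option is to walk the whole table.
plan_without_index = plan(QUERY)
rows_without_index = sorted(conn.execute(QUERY).fetchall())

# With an index on age, the optimizer can search instead of scan.
conn.execute("CREATE INDEX idx_users_age ON users(age)")
plan_with_index = plan(QUERY)
rows_with_index = sorted(conn.execute(QUERY).fetchall())

print(plan_without_index)  # a SCAN step over users
print(plan_with_index)     # a SEARCH step that names idx_users_age
print(rows_without_index == rows_with_index)
```

The rows returned are identical in both cases. Only the plan changes, which is exactly the decision the optimizer is responsible for.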
Plans, Not Just Queries
Internally, every SQL query is converted into a query plan.
A plan is essentially a strategy that answers:
- Which tables to access first
- Which indexes to use
- How to filter rows
- How to handle intermediate results
Each parse tree corresponds to a specific plan. The optimizer evaluates possible alternatives and chooses a plan that is efficient enough.
Finding the absolute best plan is computationally expensive, so SQLite does not try to be perfect.
Instead, it focuses on avoiding bad plans and finding a good enough plan quickly.
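You can ask SQLite to show the plan it settled on. The sketch below, again using Python's sqlite3 module, prints the plan for a two-table join. The users and orders tables and the index idx_orders_user are hypothetical, and which table ends up in the outer loop is the optimizer's call, so treat the printed order as something to observe rather than a guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, only for this illustration.
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    CREATE INDEX idx_orders_user ON orders(user_id);
""")

sql = """
    SELECT users.name, orders.total
    FROM users JOIN orders ON orders.user_id = users.id
    WHERE users.age = 25
"""

# Each row of EXPLAIN QUERY PLAN is one step; the last column describes it.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
for step in plan_steps:
    print(step)
```

Each printed step names a table and how it is accessed (a scan, or a search through an index), which maps directly onto the questions a plan has to answer.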
SQLite’s Philosophy: Frontend Does All the Work
One important design choice in SQLite is that the Virtual Machine does not optimize anything.
It simply executes bytecode instructions exactly as given.
This means all optimization must happen in the frontend, before bytecode is generated.
If the optimizer makes a poor decision, the VM will blindly execute inefficient instructions.
That is why query optimization is one of the most critical responsibilities in SQLite’s architecture.
The Real Cost: Accessing Tables
The biggest cost in query execution is not computation. It is accessing data from disk.
Every time SQLite reads rows from a table, it performs I/O operations, which are expensive.
So the optimizer’s main goal is simple:
Reduce the number of rows read from base tables.
The fewer rows accessed, the faster the query runs.
Choosing Between Full Scan and Index Scan
For every table involved in a query, the optimizer must decide how to access it.
There are two main options.
Full Table Scan
SQLite reads every row in the table in rowid order.
This happens when:
- No index exists on the column being filtered
- The optimizer decides an index is not beneficial
Example:
SELECT * FROM users;
This requires scanning the entire table.
Index Scan
If an index exists, SQLite can use it to narrow down the rows.
Example:
SELECT * FROM users WHERE age = 25;
If there is an index on age, SQLite can jump directly to matching entries instead of scanning everything.
For very specific queries like:
SELECT * FROM users WHERE rowid = 2;
SQLite can directly access a single row using the table’s primary B+ tree, making the query extremely fast.
If no index exists for a condition like:
SELECT * FROM users WHERE age = 25;
SQLite has no choice but to scan the entire table and check each row individually.
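The three cases in this section can be checked against a table that has no secondary index at all. This is a small sketch with Python's sqlite3 module. The schema is an assumption, and the plan text differs slightly across SQLite versions (older releases print "SCAN TABLE users" rather than "SCAN users").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema with no secondary indexes.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


scan_all = plan("SELECT * FROM users")                  # no WHERE clause
by_rowid = plan("SELECT * FROM users WHERE rowid = 2")  # lookup in the table's own tree
by_age = plan("SELECT * FROM users WHERE age = 25")     # filter with no usable index

print(scan_all)
print(by_rowid)
print(by_age)
```

The rowid query is reported as a search on the integer primary key, while the other two are reported as scans.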
How Indexes Actually Work in SQLite
Each table in SQLite is stored as a B+ tree, where the key is the rowid. This is called the primary index.
In addition to that, SQLite can have secondary indexes, which are also B-trees built on other columns.
When using a secondary index, SQLite typically performs three steps:
- Search the index to find matching entries
- Extract the rowid from the index
- Use the rowid to fetch the actual row from the table
This means an indexed lookup often involves two tree searches.
However, there is an important optimization.
If all required columns are already present in the index, SQLite does not need to access the base table at all.
This avoids the second lookup and can significantly improve performance, sometimes making queries nearly twice as fast.
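SQLite reports this case as a covering index in its query plan. Here is a minimal sketch with Python's sqlite3 module, assuming a hypothetical users table and an index named idx_users_age. The first query needs the name column, which lives only in the table. The second needs only age, which the index already holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and index name, only for this illustration.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_users_age ON users(age)")


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# name is not stored in the index, so the base table must be visited too.
needs_table = plan("SELECT name FROM users WHERE age = 25")

# age is stored in the index itself, so the index alone can answer the query.
index_only = plan("SELECT age FROM users WHERE age = 25")

print(needs_table)  # mentions INDEX idx_users_age
print(index_only)   # mentions COVERING INDEX idx_users_age
```

If a hot query almost fits an existing index, adding the missing column to that index is one way to get the covering behavior, at the cost of a larger index.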
Two Core Challenges in Optimization
For any query, the optimizer has to solve two main problems:
1. Which Plans Should Be Considered
There are many possible ways to execute a query.
The optimizer cannot explore all of them, so it uses heuristics to narrow down the options.
2. How to Estimate Cost
For each plan, SQLite estimates how expensive it will be.
SQLite does not maintain detailed statistics about tables automatically. It only collects coarse summaries when you run ANALYZE, so its cost estimation is relatively simple compared to larger database systems.
Despite this, it performs surprisingly well in practice.
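The statistics SQLite can work with are coarse and gathered only on request. As a rough sketch of what that looks like, the snippet below runs the ANALYZE command through Python's sqlite3 module and reads back the sqlite_stat1 table it populates. The schema, data, and index name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, only for this illustration.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_users_age ON users(age)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [(f"user{i}", 20 + i % 10) for i in range(100)],
)

# ANALYZE collects summary statistics and stores them in sqlite_stat1.
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()

for tbl, idx, stat in stats:
    # stat is a short string of integers; the first one is the row count.
    print(tbl, idx, stat)
```

Each index gets a single short line of numbers, which is a long way from the histograms that larger systems keep, and that fits the simple cost model described above.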
Optimization Is Different for Different Queries
Not all queries benefit equally from optimization.
For example:
- INSERT statements have limited optimization opportunities
- Queries without a WHERE clause usually result in full table scans
Most optimization effort is focused on queries that filter data, especially SELECT statements.
Special Handling for DELETE and UPDATE
DELETE and UPDATE statements follow a slightly different execution model.
They are processed in two phases:
- SQLite identifies the rows that match the condition and stores their rowids in a temporary structure (RowSet)
- It then performs the actual deletion or update using those rowids
There is also a special optimization.
If you run:
DELETE FROM users;
SQLite uses a special opcode (OP_Clear) to wipe the entire table efficiently.
If you want to prevent this optimization, you can force a condition:
DELETE FROM users WHERE 1;
This forces SQLite to go through the normal row-by-row process.
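One way to observe the difference is to compare the bytecode of the two statements with EXPLAIN. This is a small sketch using Python's sqlite3 module on a hypothetical users table with no triggers. The exact instruction listing depends on the SQLite version and build options, so only the presence or absence of the Clear opcode is of interest here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema with no triggers or foreign keys.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")


def opcodes(sql):
    # EXPLAIN returns one row per bytecode instruction; column 1 is the opcode name.
    return [row[1] for row in conn.execute("EXPLAIN " + sql)]


truncate_ops = opcodes("DELETE FROM users")
row_by_row_ops = opcodes("DELETE FROM users WHERE 1")

print("Clear" in truncate_ops)    # whole-table wipe
print("Clear" in row_by_row_ops)  # loop that deletes one row at a time
```

EXPLAIN only compiles the statement, so nothing is actually deleted while you inspect the bytecode.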
How SQLite Organizes Optimization Work
SQLite breaks queries into query blocks and optimizes each block independently.
Most of the optimization logic lives in the where.c file, which handles decisions like:
- Which indexes to use
- How to structure loops
- How to filter rows efficiently
This is the same component that works closely with the code generator to produce efficient loops for WHERE clauses.
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*
Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:
HexmosTech/git-lrc