Hello, I'm Maneshwar. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback for improving the product.
In the previous part, you saw how SQLite selects indexes and balances filtering with sorting.
Now we move into another important category of queries.
Aggregation and subqueries.
These introduce new challenges because SQLite is no longer just filtering rows.
It is grouping, transforming, and sometimes restructuring queries entirely.
How SQLite Executes GROUP BY
When you use a GROUP BY clause, SQLite introduces a special internal structure called an aggregator.
This is essentially a temporary table that stores:
- A key → formed by GROUP BY columns
- A value → aggregate data like COUNT, SUM, etc.
For example:
SELECT department, COUNT(*) FROM employees GROUP BY department;
SQLite processes this in two phases.
First, it scans rows and builds groups.
Then, it produces the final results.
The execution pattern looks like this:
where-begin
    compute group-by key
    focus on the group-by key
    update aggregate terms
where-end
foreach group-by
    compute result-set
    send result to caller
end-foreach
During the first phase, SQLite keeps updating aggregate values for each group.
During the second phase, it outputs the final computed results for each group.
This approach ensures that rows are grouped efficiently without repeatedly scanning the same data.
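The two phases can be mimicked in a few lines of Python. This is only a sketch of the idea, not SQLite's actual implementation: the dictionary stands in for the aggregator, and the function name `group_count_sum` and the sample `employees` rows are made up for illustration.

```python
def group_count_sum(rows, key_col, value_col):
    """Two-phase GROUP BY sketch: build groups first, then emit results."""
    aggregator = {}  # group-by key -> [count, total]

    # Phase 1: scan the rows once, updating the aggregate terms per group.
    for row in rows:
        key = row[key_col]                          # compute group-by key
        terms = aggregator.setdefault(key, [0, 0])  # focus on that group
        terms[0] += 1                               # COUNT(*)
        terms[1] += row[value_col]                  # SUM(value_col)

    # Phase 2: walk the groups and produce one result row per group.
    return [(key, count, total) for key, (count, total) in aggregator.items()]


# Hypothetical sample data standing in for the employees table.
employees = [
    {"department": "eng", "salary": 100},
    {"department": "ops", "salary": 70},
    {"department": "eng", "salary": 120},
]

result = group_count_sum(employees, "department", "salary")
print(result)
```

Each input row is touched exactly once in phase 1, which is the property the paragraph above describes. The order in which a real database returns groups is not guaranteed unless the query has an ORDER BY.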
The Problem with Subqueries in FROM
Now consider queries that use subqueries inside the FROM clause.
Example:
SELECT a FROM (
    SELECT x + y AS a FROM t1 WHERE z < 100
) WHERE a > 5;
The default way to execute this is:
- Run the inner query
- Store results in a temporary table
- Run the outer query on that table
- Delete the temporary table
This approach has a major drawback.
The temporary table has no indexes, so any filtering or joining done by the outer query becomes inefficient.
It also means the data is handled twice: once to fill the temporary table, and once more when the outer query reads it back.
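To make the four steps concrete, here is a sketch that performs them by hand with Python's built-in sqlite3 module. SQLite does this internally rather than through SQL statements, so treat it as an imitation of the strategy. The table contents and the temporary table name `temp_sub` are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INTEGER, y INTEGER, z INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 2, 10), (4, 5, 20), (7, 8, 200), (3, 3, 50)],
)

# Steps 1 and 2: run the inner query and store its rows in a temporary table.
conn.execute(
    "CREATE TEMP TABLE temp_sub AS SELECT x + y AS a FROM t1 WHERE z < 100"
)

# Step 3: run the outer query on the temporary table.
# temp_sub has no indexes, so this filter has to look at every stored row.
unflattened_rows = conn.execute(
    "SELECT a FROM temp_sub WHERE a > 5 ORDER BY a"
).fetchall()

# Step 4: delete the temporary table.
conn.execute("DROP TABLE temp_sub")

print(unflattened_rows)
```

The ORDER BY is added here only to make the printed result deterministic.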
Subquery Flattening: A Smarter Approach
To avoid this overhead, SQLite uses an optimization called subquery flattening.
Instead of executing the subquery separately, SQLite merges it into the outer query.
The previous example becomes:
SELECT x + y AS a FROM t1 WHERE z < 100 AND a > 5;
Now the query can be executed in a single pass over the table.
This has two major benefits:
- Eliminates temporary tables
- Allows indexes on the base table to be used
This significantly improves performance.
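A quick way to check that the two forms are equivalent is to run both and compare the rows. The sketch below uses Python's sqlite3 module with made-up sample data and an assumed index named `idx_t1_z`. The flattened query spells out `x + y > 5` instead of reusing the alias `a`, which keeps it valid in databases that do not accept aliases in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INTEGER, y INTEGER, z INTEGER)")
conn.execute("CREATE INDEX idx_t1_z ON t1 (z)")  # hypothetical index on z
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 2, 10), (4, 5, 20), (7, 8, 200), (3, 3, 50)],
)

nested = "SELECT a FROM (SELECT x + y AS a FROM t1 WHERE z < 100) WHERE a > 5"
flattened = "SELECT x + y AS a FROM t1 WHERE z < 100 AND x + y > 5"

nested_rows = sorted(conn.execute(nested).fetchall())
flat_rows = sorted(conn.execute(flattened).fetchall())

# Print the plans for inspection. The wording of EXPLAIN QUERY PLAN output
# differs between SQLite versions, so it is shown rather than checked.
for label, sql in (("nested", nested), ("flattened", flattened)):
    for plan_row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(label, plan_row)
```

If the planner flattens the nested query, both plans should describe a single access to t1 with no separate subquery step.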
When Flattening Is Allowed
Flattening is not always possible.
SQLite applies this optimization only when every one of the following conditions holds. The exact list has changed across SQLite releases, so treat it as a snapshot.
- The subquery and the outer query do not both use aggregates.
- The subquery is not an aggregate or the outer query is not a join.
- The subquery is not the right operand of a left outer join.
- The subquery is not DISTINCT or the outer query is not a join.
- The subquery is not DISTINCT or the outer query does not use aggregates.
- The subquery does not use aggregates or the outer query is not DISTINCT.
- The subquery has a FROM clause.
- The subquery does not use LIMIT or the outer query is not a join.
- The subquery does not use LIMIT or the outer query does not use aggregates.
- The subquery does not use aggregates or the outer query does not use LIMIT.
- The subquery and the outer query do not both have ORDER BY clauses.
- The subquery and outer query do not both use LIMIT.
- The subquery does not use OFFSET.
- The outer query is not part of a compound select or the subquery does not have both an ORDER BY and a LIMIT clause.
- The outer query is not an aggregate or the subquery does not contain ORDER BY.
- The subquery is not a compound select, or it is a UNION ALL compound made up entirely of non-aggregate queries, and the parent query:
  - is not itself part of a compound select,
  - is not an aggregate or DISTINCT query, and
  - has no other tables or subqueries in the FROM clause.
  The parent and subquery may contain WHERE clauses. Subject to the rules above on ORDER BY, LIMIT, and OFFSET, they may also contain ORDER BY, LIMIT, and OFFSET clauses.
- If the subquery is a compound select, then all terms of the ORDER BY clause of the parent must be simple references to columns of the subquery.
- The subquery does not use LIMIT or the outer query does not have a WHERE clause.
- If the subquery is a compound select, then it must not use an ORDER BY clause.
Query flattening is an important optimization when views are used because each use of a view is translated into a subquery.
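The view case can be tried out directly. In this sketch the `employees` columns, the view name `well_paid`, and the index name `idx_emp_dept` are all assumptions for the example. It checks that querying through the view returns the same rows as the hand-merged query against the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
)
conn.execute("CREATE INDEX idx_emp_dept ON employees (department)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("ann", "eng", 100),
        ("bob", "ops", 70),
        ("cy", "eng", 120),
        ("dee", "eng", 80),
        ("eve", "ops", 95),
    ],
)

# A view is stored as a SELECT; using it behaves like a subquery in FROM.
conn.execute(
    "CREATE VIEW well_paid AS "
    "SELECT name, department FROM employees WHERE salary > 90"
)

view_sql = "SELECT name FROM well_paid WHERE department = 'eng'"
direct_sql = (
    "SELECT name FROM employees WHERE salary > 90 AND department = 'eng'"
)

via_view = sorted(conn.execute(view_sql).fetchall())
direct = sorted(conn.execute(direct_sql).fetchall())

# Shown for inspection only; the exact plan text depends on the SQLite version.
view_plan = conn.execute("EXPLAIN QUERY PLAN " + view_sql).fetchall()
for plan_row in view_plan:
    print(plan_row)
```

When the view is flattened, the printed plan would be expected to refer to the employees table itself, which is what lets an index on the base table help a query written against the view.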
Fast MIN and MAX Queries
Aggregation is not always expensive.
SQLite has a very efficient optimization for queries like:
SELECT MIN(age) FROM users;
or
SELECT MAX(age) FROM users;
If there is no index on the column, SQLite must scan the entire table.
But if an index exists, SQLite can do something much faster.
It directly navigates to:
- The first entry in the index for MIN
- The last entry in the index for MAX
Since indexes are stored as B-trees, this operation takes logarithmic time, not linear time.
If the column is an INTEGER PRIMARY KEY, SQLite can even use the table’s primary B+ tree directly.
This makes MIN and MAX queries extremely efficient when proper indexing is in place.
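Here is a small sketch using Python's sqlite3 module that sets up both situations: an indexed column and an INTEGER PRIMARY KEY. The table layout, sample ages, and the index name `idx_users_age` are assumptions. The query plans are printed so you can inspect how your SQLite version chooses to answer each query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO users (age) VALUES (?)", [(34,), (19,), (52,), (27,)]
)
conn.execute("CREATE INDEX idx_users_age ON users (age)")

# With the index in place, these can be answered from one end of the index.
min_age = conn.execute("SELECT MIN(age) FROM users").fetchone()[0]
max_age = conn.execute("SELECT MAX(age) FROM users").fetchone()[0]

# id is an INTEGER PRIMARY KEY, so the table's own B-tree is ordered by it.
max_id = conn.execute("SELECT MAX(id) FROM users").fetchone()[0]

# Plan text varies between SQLite versions, so it is printed, not asserted.
for sql in ("SELECT MIN(age) FROM users", "SELECT MAX(id) FROM users"):
    for plan_row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(sql, "->", plan_row)

print(min_age, max_age, max_id)
```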
Bringing It All Together
At this point, you have seen how SQLite handles:
- Filtering using WHERE
- Choosing indexes
- Ordering joins
- Grouping results
- Flattening subqueries
- Optimizing aggregations
All of this happens before execution, inside the frontend.
By the time the Virtual Machine runs, everything has already been carefully planned and optimized.
Final Thoughts on the Frontend
The SQLite frontend is a complete pipeline:
- The tokenizer breaks SQL into tokens
- The parser builds structured representations
- The optimizer reshapes queries for efficiency
- The code generator produces executable bytecode
All of this work is triggered by a single function call:
sqlite3_prepare()
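You can peek at the result of this pipeline without writing any C. Python's sqlite3 module compiles statements for you and does not expose the prepare call directly, but prefixing a statement with EXPLAIN makes SQLite return the generated bytecode as rows. This is a sketch with a made-up `users` table; the exact instructions you see will differ between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)")

# EXPLAIN returns one row per virtual machine instruction.
# The first columns are the instruction address, the opcode name, and operands.
program = conn.execute(
    "EXPLAIN SELECT age FROM users WHERE age > 30"
).fetchall()

opcodes = [row[1] for row in program]

for row in program:
    print(row[0], row[1], row[2], row[3], row[4])
```

Nothing is executed against the table here. The listing is the compiled program that the Virtual Machine would run.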
Query optimization remains one of the most complex and delicate parts of any database system.
SQLite keeps things relatively simple by using heuristics instead of heavy statistical models, but still manages to achieve strong performance in most real-world scenarios.
What’s Next
We have now covered the entire frontend pipeline of SQLite, from raw SQL to optimized bytecode.
In the next series, we will move beyond compilation and explore how SQLite interacts with the outside world.
We will start with the SQLite Interface Handler, where queries enter the system and results are returned.
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*
Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: HexmosTech/git-lrc