Arnav Sharma

Posted on Aug 23

How SQL Queries Really Work: From Parsing to Execution

#programming #database #sql #learning

Ever wondered what happens after you hit run on a SQL query? Here’s a clear journey from parsing to execution.

Introduction

If you’ve ever written a SQL query and hit run, you might assume the database just reads it top-to-bottom and executes it line by line. But under the hood, something much more sophisticated happens.

Every SQL query goes through a three-step journey before you see results:

Parse → Optimize → Execute

Understanding this process not only deepens your knowledge of databases but also helps you write faster queries, debug performance issues, and make the most of indexes and query planners.

1. Query Parsing

What Happens During Parsing?

When you send a SQL query to the database, the first thing it does is parse it. Parsing ensures your query is valid and can be understood by the database engine.

Syntax Check:

The database checks if your SQL follows proper syntax. If you miss a semicolon or misuse a keyword, parsing fails immediately.
Validation Against Schema:

The query is checked against the database schema. For example, does the table users exist? Does the column age belong to it?

Parse Tree Generation

If everything checks out, the query is transformed into a parse tree — a structured representation of the query. Instead of being just text, it becomes a hierarchical tree that the database can work with.

Think of this as turning your SQL into a machine-readable structure.

2. Query Optimization

Once parsed, the query moves to the optimizer — the brain of the database.

Logical Optimization

The optimizer rewrites the query to simplify it without changing its meaning. Examples:

Removing unnecessary filters
Reordering operations for efficiency
Simplifying expressions (WHERE age > 18 AND age > 10 → WHERE age > 18)

Cost-Based Optimization

Here’s where the database really shines. The optimizer estimates the cost of different ways to execute your query — measuring factors like disk I/O, CPU usage, and memory.

It then chooses the most efficient strategy, considering:

Indexes: Should it scan sequentially or use an index?
Join Orders: Should users be joined with orders first, or the other way around?
Access Paths: Which path to data retrieval is fastest?

Execution Plan Creation

The optimizer produces an execution plan — a detailed blueprint of how the query will run.

Developers can peek into this with commands like:

EXPLAIN SELECT * FROM users WHERE age > 30;

The plan shows whether the database will do an index scan, sequential scan, or some other strategy.

3. Query Execution

Now the plan is handed off to the execution engine, which actually retrieves the data.

Execution Engine Role

The engine takes the execution plan and performs step-by-step operations.

Key Steps in Execution

Table Scans:

-   _Sequential Scan_: Reads the entire table row by row.

-   _Index Scan_: Uses an index to quickly jump to matching rows.

Applying Filters and Projections:

Apply the WHERE condition (age > 30) and select only the requested columns.
Joins and Aggregations:

If your query has joins or aggregates (SUM, COUNT, GROUP BY), these are computed at this stage.
Returning Results to the Client:

Results are streamed back row by row, not all at once, which allows large result sets to be handled efficiently.

4. Real-World Example

Let’s walk through a simple query:

SELECT * FROM users WHERE age > 30;

Parsing:

The query is checked for valid syntax, confirms that the users table exists, and that the age column is valid.
Optimization:

The optimizer considers:
- Sequential scan (read all rows, filter age > 30)
- Index scan (if an index exists on age, jump directly to rows > 30) It picks the cheaper one.
Execution:

The chosen plan runs. Rows with age > 30 are fetched, filtered, and returned to the client.

Perfect — let’s expand Section 5 with the actual execution order of SQL queries, which is very different from the way we write them. Here’s the revised section with a clear step-by-step execution flow:

5. Common Misconceptions About SQL Execution

“SQL just runs top to bottom” → Wrong.

SQL is declarative: you describe what you want, not how to get it. The optimizer and execution engine decide the how.

Logical Order vs Execution Order

When you write a query, it looks like this:

SELECT column_list
FROM table
WHERE condition
GROUP BY grouping_column
HAVING group_condition
ORDER BY column_list
LIMIT n;

But internally, the database executes it in a different order:

Actual Execution Order

FROM – Identify the source tables.
JOIN / ON – Combine tables based on join conditions.
WHERE – Filter rows before grouping.
GROUP BY – Group rows into buckets.
HAVING – Filter groups.
SELECT – Choose which columns/expressions to return.
DISTINCT – Remove duplicates if requested.
ORDER BY – Sort the results.
LIMIT / OFFSET – Return only a subset of rows.

👉 Example:

SELECT name, COUNT(*)
FROM users
WHERE age > 30
GROUP BY name
HAVING COUNT(*) > 2
ORDER BY name ASC
LIMIT 5;

Rows are pulled from users (FROM).
Filtered to only those with age > 30 (WHERE).
Grouped by name (GROUP BY).
Groups with fewer than 2 rows are removed (HAVING).
Final columns are selected (SELECT).
Results sorted by name (ORDER BY).
Top 5 returned (LIMIT).

This is why sometimes putting a condition in WHERE vs HAVING makes a big difference — because row filters happen before grouping, while group filters happen after grouping.

6. Why This Knowledge Helps Developers

Write Efficient Queries:

Knowing how queries are parsed and optimized helps you design queries the optimizer can handle well.
Debug Performance Issues:

Slow query? Run EXPLAIN to see if the optimizer picked a bad plan.
Understand Indexes and Execution Plans:

Indexes don’t always help. Understanding execution flow helps you decide when to add (or avoid) them.

Conclusion

SQL queries don’t just run. They go through a journey: