Abhishek Singh

Posted on Feb 13, 2022

Order of Execution in a SQL Query.

#database #sql #mysql #learning

Many students, new developers, and sometimes even hard-core developers may feel perplexed when writing complex SQL queries to retrieve data.

But understanding how your SQL query engine actually executes a query clause by clause, can help create a SELECT query which is syntactically and semantically sound.

A SELECT query is made up of several clauses, and there is a specific order in which each clause is read and executed.

tl;dr

A query engine understands and execute your query in the following order of clauses:

FROM
JOIN / LEFT JOIN / RIGHT JOIN / FULL OUTER JOIN
ON
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT / OFFSET

Detailed Explanation with Example

Let's try to piece together a query for a real life use case.

Consider there are 2 tables in a company's application:

department and employee tables with a Many-to-One relationship

Date to retrieve: List top 3 departments having at-least 1000 female employees and highest average age of female employee who haven't left the company. The list should be descending order of the average age of the female employees.

Okay, so lets build our query step by step

1: FROM: Your source of the data required. The query identify which table it needs to use for obtaining the data. All the rows in that table are in contention for being retrieved.

FROM department AS dept

2: JOIN(s): It maybe possible a part of the data you require is in other table. So if you have mentioned any JOIN clause, the query engine will take that other table , and crosses the records of both the tables.

FROM department AS dept
JOIN employee as emp

3: ON
The condition of the JOIN is evaluated next. Only those crossed records will be kept for which the condition holds true.

FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId

4: WHERE
After successfully cross matching the records through JOIN, the data may still be required for some filtering based on the requirements. So the query engine will use the WHERE clause to test and filter out some of the records.(a.k.a vertical selection)

FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND dateOfLeaving IS NULL

IMPORTANT:
It is advised to use the ON clause for matching/joining records (using Primary Keys/Foreign Keys)purpose only, while the WHERE clause for filtering data. This helps the query engine to optimise the JOIN execution. Also your query can be readable.

5: GROUP BY: After filtering the records, you may be required to combine/group the records based on some common column values. The GROUP BY is executed next, so the records are now converted into groups where each group will have a unique single or combination of values of the columns mentioned in the clause.

FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id

6: HAVING: Like WHERE is used to filter out unwanted records, HAVING helps to filter out groups. The condition is used to test on each group formed (not on the records!), and only those group are accepted in the result set which satisfy the condition. This is also a good place for using aggregate functions as well.

FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
HAVING COUNT(DISTINCT(emp.id)) >= 1000

7: SELECT: After the result set is formed, the query engined will next use the SELECT clause to include only the columns mentioned (a.k.a Horizontal Selection). Make sure that the column names you mention will actually be present in the dataset after joining/filtering/grouping.

SELECT
    dept.id AS `depatmentId`, 
    dept.name AS `departmentName`, 
    COUNT(DISTINCT(emp.id)) AS `totalFemaleEmployees`,
    AVG(emp.age) AS `averageAge`
FROM departments AS dept
JOIN employees as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
HAVING COUNT(DISTINCT(emp.id)) >= 1000

IMPORTANT
Most of the time when using GROUP BY clause, the SELECT clause will be throwing an error. It should be noted that after grouping, only those columns can be used in SELECT clause which are unique in all the groups (after filtering). If any column in any group have multiple different values, then that column can't be used in SELECT, unless used with aggregate functions.

IMPORTANT
DISTINCT in the SELECT clause is used after the SELECT clause is executed, which eliminates any record/group which have a duplicate column value.

8: ORDER BY: This will let you sort the final result set in an ascending/descending order based on the column(s) mentioned.

SELECT
    dept.id AS `depatmentId`, 
    dept.name AS `departmentName`, 
    COUNT(DISTINCT(emp.id)) AS `totalFemaleEmployees`,
    AVG(emp.age) AS `averageAge`
FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
HAVING COUNT(DISTINCT(emp.id)) >= 1000
ORDER BY AVG(emp.age) DESC

9: LIMIT/OFFSET: The engine uses this last clause to return back a subset of records/group. Using the OFFSET to specify how many records/groups to omit from starting and LIMIT to specify the size/no. of records/groups to be returned.

SELECT
    dept.id AS `depatmentId`, 
    dept.name AS `departmentName`, 
    COUNT(DISTINCT(emp.id)) AS `totalFemaleEmployees`,
    AVG(emp.age) AS `averageAge`
FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
HAVING COUNT(DISTINCT(emp.id)) >= 1000
ORDER BY AVG(emp.age) DESC
LIMIT 3;

And that's it! Not only is our query easy to read, it executes perfectly and returns back the required info.

Using this approach anyone can design and write complex queries for most user case. But not all use cases simpler compared to this, and some require to just improvise a bit with this approach.

As a practice, try creating a query for this case: List top 3 departments having at-least 1000 employees of any gender and highest average age of female employee who haven't left the company. The list should be descending order of the average age of the female employees.

DEV Community

Order of Execution in a SQL Query.

tl;dr

Detailed Explanation with Example

Top comments (0)