DEV Community

Cover image for Order of Execution in a SQL Query.
Abhishek Singh
Abhishek Singh

Posted on

Order of Execution in a SQL Query.

Many students, new developers, and sometimes even hard-core developers may feel perplexed when writing complex SQL queries to retrieve data.

But understanding how your SQL query engine actually executes a query clause by clause, can help create a SELECT query which is syntactically and semantically sound.

A SELECT query is made up of several clauses, and there is a specific order in which each clause is read and executed.

tl;dr

A query engine understands and execute your query in the following order of clauses:

  1. FROM
  2. JOIN / LEFT JOIN / RIGHT JOIN / FULL OUTER JOIN
  3. ON
  4. WHERE
  5. GROUP BY
  6. HAVING
  7. SELECT
  8. ORDER BY
  9. LIMIT / OFFSET

Detailed Explanation with Example

Let's try to piece together a query for a real life use case.

Consider there are 2 tables in a company's application:

Sample DB Structure

department and employee tables with a Many-to-One relationship

Date to retrieve: List top 3 departments having at-least 1000 female employees and highest average age of female employee who haven't left the company. The list should be descending order of the average age of the female employees.

Okay, so lets build our query step by step

1: FROM: Your source of the data required. The query identify which table it needs to use for obtaining the data. All the rows in that table are in contention for being retrieved.

FROM department AS dept
Enter fullscreen mode Exit fullscreen mode

2: JOIN(s): It maybe possible a part of the data you require is in other table. So if you have mentioned any JOIN clause, the query engine will take that other table , and crosses the records of both the tables.

FROM department AS dept
JOIN employee as emp
Enter fullscreen mode Exit fullscreen mode

3: ON
The condition of the JOIN is evaluated next. Only those crossed records will be kept for which the condition holds true.

FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
Enter fullscreen mode Exit fullscreen mode

4: WHERE
After successfully cross matching the records through JOIN, the data may still be required for some filtering based on the requirements. So the query engine will use the WHERE clause to test and filter out some of the records.(a.k.a vertical selection)

FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND dateOfLeaving IS NULL
Enter fullscreen mode Exit fullscreen mode

IMPORTANT:
It is advised to use the ON clause for matching/joining records (using Primary Keys/Foreign Keys)purpose only, while the WHERE clause for filtering data. This helps the query engine to optimise the JOIN execution. Also your query can be readable.

5: GROUP BY: After filtering the records, you may be required to combine/group the records based on some common column values. The GROUP BY is executed next, so the records are now converted into groups where each group will have a unique single or combination of values of the columns mentioned in the clause.

FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
Enter fullscreen mode Exit fullscreen mode

6: HAVING: Like WHERE is used to filter out unwanted records, HAVING helps to filter out groups. The condition is used to test on each group formed (not on the records!), and only those group are accepted in the result set which satisfy the condition. This is also a good place for using aggregate functions as well.

FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
HAVING COUNT(DISTINCT(emp.id)) >= 1000
Enter fullscreen mode Exit fullscreen mode

7: SELECT: After the result set is formed, the query engined will next use the SELECT clause to include only the columns mentioned (a.k.a Horizontal Selection). Make sure that the column names you mention will actually be present in the dataset after joining/filtering/grouping.

SELECT
    dept.id AS `depatmentId`, 
    dept.name AS `departmentName`, 
    COUNT(DISTINCT(emp.id)) AS `totalFemaleEmployees`,
    AVG(emp.age) AS `averageAge`
FROM departments AS dept
JOIN employees as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
HAVING COUNT(DISTINCT(emp.id)) >= 1000
Enter fullscreen mode Exit fullscreen mode

IMPORTANT
Most of the time when using GROUP BY clause, the SELECT clause will be throwing an error. It should be noted that after grouping, only those columns can be used in SELECT clause which are unique in all the groups (after filtering). If any column in any group have multiple different values, then that column can't be used in SELECT, unless used with aggregate functions.

IMPORTANT
DISTINCT in the SELECT clause is used after the SELECT clause is executed, which eliminates any record/group which have a duplicate column value.

8: ORDER BY: This will let you sort the final result set in an ascending/descending order based on the column(s) mentioned.

SELECT
    dept.id AS `depatmentId`, 
    dept.name AS `departmentName`, 
    COUNT(DISTINCT(emp.id)) AS `totalFemaleEmployees`,
    AVG(emp.age) AS `averageAge`
FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
HAVING COUNT(DISTINCT(emp.id)) >= 1000
ORDER BY AVG(emp.age) DESC
Enter fullscreen mode Exit fullscreen mode

9: LIMIT/OFFSET: The engine uses this last clause to return back a subset of records/group. Using the OFFSET to specify how many records/groups to omit from starting and LIMIT to specify the size/no. of records/groups to be returned.

SELECT
    dept.id AS `depatmentId`, 
    dept.name AS `departmentName`, 
    COUNT(DISTINCT(emp.id)) AS `totalFemaleEmployees`,
    AVG(emp.age) AS `averageAge`
FROM department AS dept
JOIN employee as emp
ON dept.id = emp.departmentId
WHERE emp.gender="female" AND emp.dateOfLeaving IS NULL
GROUP BY dept.id
HAVING COUNT(DISTINCT(emp.id)) >= 1000
ORDER BY AVG(emp.age) DESC
LIMIT 3;
Enter fullscreen mode Exit fullscreen mode

And that's it! Not only is our query easy to read, it executes perfectly and returns back the required info.

Using this approach anyone can design and write complex queries for most user case. But not all use cases simpler compared to this, and some require to just improvise a bit with this approach.

As a practice, try creating a query for this case: List top 3 departments having at-least 1000 employees of any gender and highest average age of female employee who haven't left the company. The list should be descending order of the average age of the female employees.

Top comments (0)