Recursive queries in SQL, enabled by Common Table Expressions (CTEs), allow us to work with hierarchical or recursive data structures such as employee-manager relationships, family trees, or file directories. This article introduces the WITH clause and demonstrates how recursive CTEs can simplify these operations.
What is a Common Table Expression (CTE)?
A CTE is a temporary result set defined within a WITH clause that can be referenced within the subsequent SELECT, INSERT, UPDATE, or DELETE statement. It improves query readability and is particularly useful for recursive operations.
Most Common Situations to Use CTEs
1. Breaking Down Complex Queries
CTEs allow you to break down a complex query into smaller, manageable parts. Each part of the query can be defined as a separate CTE, making the overall logic easier to follow.
Example: Multi-Step Aggregations
Imagine calculating:
- Total sales by region.
- Regions with total sales above $2000.
- Combining this data with a list of regions.
With CTE:
WITH RegionalSales AS (
SELECT Region, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Region
),
HighPerformingRegions AS (
SELECT Region
FROM RegionalSales
WHERE TotalSales > 2000
)
SELECT r.Region, rs.TotalSales
FROM Regions r
LEFT JOIN RegionalSales rs ON r.Region = rs.Region
WHERE r.Region IN (SELECT Region FROM HighPerformingRegions);
Why CTEs? Each step is isolated and easier to modify or debug, unlike deeply nested subqueries.
2. Reusability Within the Same Query
If a subquery is used multiple times within a query, a CTE can be defined once and referenced multiple times, improving readability and performance.
Example: Using the Same Subquery
Without CTE:
SELECT AVG(Salary) AS AvgSalary
FROM (
SELECT DepartmentID, AVG(Salary) AS AvgDepartmentSalary
FROM Employees
GROUP BY DepartmentID
) AS SubQuery1;
SELECT DepartmentID, AVG(Salary)
FROM Employees
GROUP BY DepartmentID
HAVING AVG(Salary) > (
SELECT AVG(Salary)
FROM (
SELECT DepartmentID, AVG(Salary) AS AvgDepartmentSalary
FROM Employees
GROUP BY DepartmentID
) AS SubQuery2
);
With CTE:
WITH DepartmentAverages AS (
SELECT DepartmentID, AVG(Salary) AS AvgDepartmentSalary
FROM Employees
GROUP BY DepartmentID
)
SELECT AVG(AvgDepartmentSalary) AS OverallAverage
FROM DepartmentAverages;
WITH DepartmentAverages AS (
SELECT DepartmentID, AVG(Salary) AS AvgDepartmentSalary
FROM Employees
GROUP BY DepartmentID
)
SELECT DepartmentID, AvgDepartmentSalary
FROM DepartmentAverages
WHERE AvgDepartmentSalary > (
SELECT AVG(AvgDepartmentSalary) FROM DepartmentAverages
);
Why CTEs? Reusability reduces redundancy and ensures consistency in logic.
3. Temporary Results Without Creating Tables
CTEs act as a temporary, inline table that exists only for the duration of the query. Unlike temporary tables, you don’t need additional DDL (e.g., CREATE TABLE) or cleanup (e.g., DROP TABLE).
4. Improved Query Readability
When working with complex queries, especially those involving multiple subqueries or joins, CTEs make the query easier to read and maintain.
Conclusion
CTEs are a powerful feature that simplifies SQL query writing, especially for recursive operations, hierarchical data, and breaking down complex logic.
Explore more
Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.
Top comments (4)
However, the name 'recursive' did get me to read the article as I thought it might have new information. When I think of recursive, I think about a single request to the database that builds a resultset, such as a BOM, from parent child relationships within the data, such that a part record that has a bill of materials table that refers to other part records. The solution here is a temporary table with a "recursion level" column and a loop that repeatedly adds child data until there are no new parts added to the table.
SELECT AVG(AvgDepartmentSalary) AS OverallAverage
FROM DepartmentAverages;
It is indeed a powerful approach in many scenarios. Explained in a simple way to udnerstand. Thanks.
It gives performance improvement, particularly, when the data set needs to be read multiple times to complete a query, with CTE, you read once.
Second scenario, even if it is one time read, if there are multiple table joins, it is better to get the narrowest data range at beginning with CTE and then join any master or other tables. With multiple tables and indexes, query optimization may not happen to the best possible way.
As pointed out by Dave B. You mention recursion in the title, but don't give an example in the article. Be careful though because recursion can be computationally expensive and it is possible to get into a loop. By default there is a limit of 100 recursion loops in MSSQL.
Another other advantage of CTEs is that they can contain window functions such as Lead/Lag/Rank/ SUM OVER etc and the results be used in the WHERE clause of the following query.
I like them because as you point out they keep logic in a separate place which makes them easier to test and also because it prevents deeply nested co-related sub-queries and inline sub-queries in the FROM clause. Moving code left is always a good practice.
It also means that if you need to shove the data into a #TempTable it is relatively easy to modify the code.
Thank for sharing