Derek Xiao for Arctype

Posted on Dec 31, 2020 • Originally published at blog.arctype.com on Dec 30, 2020

Becoming a SQL Guru with Recursive CTEs

#postgres #database #sql

Arctype writer: Daniel Lifflander

Intro to CTEs
Organizing Complex Queries
Unlocking the Power of CTEs with Recursion
Recursive CTEs in Action

Intro to CTEs

It's easy to get lost in a maze of subqueries, derived tables, and temporary tables when working with complex SQL queries. Postgres' common table expressions, or CTEs, are a lesser known feature, but they can be a useful structuring tool to craft a SQL query in a more readable and friendly way.

CTEs begin as a with statement and allow you to execute a query whose results will be available later to subsequent queries. If you have a query that relies on the results of another query, a reference of sorts, the CTE will allow Postgres to materialize the results of the reference query, ensuring that the query is only run once and its results are readily available to other queries farther down the line. (Note that this can be a double-edged sword, as it prevents the Postgres query planner from running optimization outside of the reference query’s SQL.)

The sequential list of "code blocks" created by CTEs adds readability to SQL scripts because every block is dependent on the ones above it. In contrast, it can be difficult to trace dependencies between derived tables and parse through nested subqueries (especially as their depth increases).

Organizing Complex Queries

Derived Table Example

Temporary Table Example

This will require two separate queries.

CTE Example

Note that the query above depends on a, which is listed first with the CTE, but as a derived table it is listed inside the query. CTEs allow you to logically order queries that are dependent on others. As queries get more complex, organization becomes key to reducing errors. CTEs allow you to organize code into distinct blocks with a clear dependency hierarchy.

Unlocking the Power of CTEs with Recursion

You can also make CTEs more than syntactic sugar by adding the recursive keyword! Once you add the recursive keyword, Postgres will now allow you to reference your CTE from within itself, allowing you to “generate” rows based on recursion.

Here’s an example using Postgres CTEs to implement the Fibonacci sequence:

Running this, we’d get the first 10:

It's probably not too often we find ourselves in a situation where we need Fibonacci numbers in SQL. Let’s continue on and take a look at a real problem I solved using this Postgres feature.

Recursive CTEs in Action

In the manufacturing industry, a bill of materials (or BoM) is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an end product … BOMs are of hierarchical nature, with the top level representing the finished product which may be a sub-assembly or a completed item. BOMs that describe the sub-assemblies are referred to as modular BOMs

More simply, a BoM is a list of parts that a product is assembled from. Each part may consist of other parts, creating hierarchy. A car is a final product that is composed of many parts. A door is an example of a part of a car, its subparts being among a window, handle, various switches, and so forth. Recursive CTEs are great for hierarchical data! Let’s take a look at how we could model this hierarchy and use CTEs to generate a modular BoM sheet.

part table stores our parts.

part_hierarchy table stores a simple parent to child reference between parts, as well as a quantity if the part happens to need more than 1 of the child part for assembly. The check constraints prevent a part from referencing itself circularly.

Let's insert some parts

And some relations, approximately resembling the car example mentioned above

In our database we now have 5 parts and some relationships between them. If we wanted to see how many direct subassemblies a door has, we could query something like this, though it would only show us direct descendants:

Finally, let's now construct a query to list all of the parts and their subassemblies of our car.

The union all is the key part to this CTE. The first part of this UNION selects the base part, the second half selects any child parts by joining to the part_hierarchy table and calling itself (inducing recursion), then combines those results with the parent via the union. Without recursion, you'd have to add a join for every level of the hierarchy you needed to support.

The level value is introduced here to track how deep the recursion has run. It is hard coded at 0 to represent the parent. As the query "unwraps" and it calls itself, this increases by 1 to indicate how many generations far from the parent part the child part is.

The very last part that selects from the CTE pads the name with spaces based on the level, which is a quick and dirty way to visualize the hierarchy.

From here, you could use aggregate functions and rollup to calculate the costs of specific assemblies.

This example is definitely simplified from what I ended up using to actually solve this problem. Supporting part versioning and branching and other business requirements quickly complicates what this solution would look like in the real world. Recursive CTEs in SQL are not the only way to store and query hierarchical data, but for certain data types, sizes, and speed requirements, they are a quick and convenient feature to have in your wheelhouse.

Arctype

Now you're one step closer to becoming a SQL guru, but don't let an outdated SQL editor hold you back. Check out Arctype today and experience the modern SQL editor.

DEV Community