SQL Tip: Self-Referential Tables

#sql #programming

Note: The examples in this post are written for MSSQL. Syntax may differ for other SQL variants.

I was recently tasked with designing a SQL database for a new app. One of the requirements was to be able to store a category tree of unknown complexity and depth. It wasn't sufficient to have categories and subcategories; subcategories could also have subcategories, which could have subcategories, and so on. Each branch of the tree could have a different structure and continue to a different depth. An example tree would look something like this:

Category 1
- Category 1.1
- Category 1.2
- Category 1.2.1
- Category 1.2.2
- Category 1.3
- Category 1.3.1
  - Category 1.3.1.1
- Category 1.3.2
Category 2
- Category 2.1
- Category 2.1.1
  - Category 2.1.1.1
  - Category 2.1.1.2
  - Category 2.1.1.2.1
- Category 2.2
- Category 2.2.1
- Category 2.2.2
- Category 2.2.3

It was clearly not going to be possible to satisfy these requirements with simple category and subcategory tables. I needed to be able to describe all the category relationships in one self-referential table.

Designing the self-referential table

Relationships in a self-referential table are defined by two fields:

id (primary key, uniqueidentifier, not nullable)
parentId (foreign key, uniqueidentifier, nullable)

Note that parentId is a foreign key that references the same table. If your table is called category, category.parentId has a foreign key constraint to category.id.

(You can also declare any other fields you might need to store the category name and other information. Those aren't relevant to this tip.)

When you insert a row into the table, you can set parentId to null to put the row at the top of the tree, or set parentId to the id of any other category to make the row a subcategory.

Selecting branches of the tree

Creating the tree is simple enough, but how do you select only the rows that are descendants of a given category id? For that, we can use a recursive common-table expression (recursive CTE).

The recursive CTE creates a virtual table ([tree]) that is the union of the rows that have a given parent, the rows that have that first set of rows as parents, the rows that have that next set of rows as parents, and so on until there are no more descendent rows to fetch. We then select all rows from the virtual table:

DECLARE @id UNIQUEIDENTIFIER;
SET @id = '<id of the category whose descendants you want to select>';

WITH [tree] AS (
    SELECT parent.* FROM [category] parent WHERE parent.id = @id
    UNION ALL
    SELECT child.* FROM [tree]
    INNER JOIN [category] child ON child.parentId = [tree].id
)

SELECT * from [tree]

We can modify the SELECT statement to fetch only the bottom-most descendants of a given category, omitting the upper levels of the tree:

DECLARE @id UNIQUEIDENTIFIER;
SET @id = '<id of the category whose descendants you want to select>';

WITH [tree] AS (
    SELECT parent.* FROM [category] parent WHERE parent.id = @id
    UNION ALL
    SELECT child.* FROM [tree]
    INNER JOIN [category] child ON child.parentId = [tree].id
)

SELECT * FROM [tree] WHERE NOT EXISTS (
    SELECT parentId FROM [category] WHERE [category].parentId = [tree].id
)

We can also invert the CTE to fetch categories that are ancestors of a given subcategory:

DECLARE @id UNIQUEIDENTIFIER;
SET @id = '<id of the subcategory whose ancestors you want to select>';

WITH [tree] AS (
    SELECT child.* FROM [category] child WHERE child.id = @id
    UNION ALL
    SELECT parent.* FROM [tree]
    INNER JOIN [category] parent ON parent.id = [tree].parentId
)

SELECT * from [tree]

By modifying the SELECT statement, we can find the top-most ancestor of any subcategory:

DECLARE @id UNIQUEIDENTIFIER;
SET @id = '<id of the subcategory whose ancestors you want to select>';

WITH [tree] AS (
    SELECT child.* FROM [category] child WHERE child.id = @id
    UNION ALL
    SELECT parent.* FROM [tree]
    INNER JOIN [category] parent ON parent.id = [tree].parentId
)

SELECT * from [tree] WHERE parentId IS NULL

This combination of a self-referential table and recursive common-table expressions provided a great solution to my requirements. The model is straightforward, and easy to maintain, and the selection of entire branches can be performed in a single query.

DEV Community

SQL Tip: Self-Referential Tables

Designing the self-referential table

Selecting branches of the tree

Top comments (0)