Note: The examples in this post are written for MSSQL. Syntax may differ for other SQL variants.
I was recently tasked with designing a SQL database for a new app. One of the requirements was to be able to store a category tree of unknown complexity and depth. It wasn't sufficient to have categories and subcategories; subcategories could also have subcategories, which could have subcategories, and so on. Each branch of the tree could have a different structure and continue to a different depth. An example tree would look something like this:
- Category 1
- Category 1.1
- Category 1.2
- Category 1.2.1
- Category 1.2.2
- Category 1.3
- Category 1.3.1
- Category 1.3.1.1
- Category 1.3.2
- Category 2
- Category 2.1
- Category 2.1.1
- Category 2.1.1.1
- Category 2.1.1.2
- Category 2.1.1.2.1
- Category 2.2
- Category 2.2.1
- Category 2.2.2
- Category 2.2.3
It was clearly not going to be possible to satisfy these requirements with simple category
and subcategory
tables. I needed to be able to describe all the category relationships in one self-referential table.
Designing the self-referential table
Relationships in a self-referential table are defined by two fields:
-
id
(primary key, uniqueidentifier, not nullable) -
parentId
(foreign key, uniqueidentifier, nullable)
Note that parentId
is a foreign key that references the same table. If your table is called category
, category.parentId
has a foreign key constraint to category.id
.
(You can also declare any other fields you might need to store the category name and other information. Those aren't relevant to this tip.)
When you insert a row into the table, you can set parentId
to null
to put the row at the top of the tree, or set parentId
to the id
of any other category to make the row a subcategory.
Selecting branches of the tree
Creating the tree is simple enough, but how do you select only the rows that are descendants of a given category id? For that, we can use a recursive common-table expression (recursive CTE).
The recursive CTE creates a virtual table ([tree]
) that is the union of the rows that have a given parent, the rows that have that first set of rows as parents, the rows that have that next set of rows as parents, and so on until there are no more descendent rows to fetch. We then select all rows from the virtual table:
DECLARE @id UNIQUEIDENTIFIER;
SET @id = '<id of the category whose descendants you want to select>';
WITH [tree] AS (
SELECT parent.* FROM [category] parent WHERE parent.id = @id
UNION ALL
SELECT child.* FROM [tree]
INNER JOIN [category] child ON child.parentId = [tree].id
)
SELECT * from [tree]
We can modify the SELECT
statement to fetch only the bottom-most descendants of a given category, omitting the upper levels of the tree:
DECLARE @id UNIQUEIDENTIFIER;
SET @id = '<id of the category whose descendants you want to select>';
WITH [tree] AS (
SELECT parent.* FROM [category] parent WHERE parent.id = @id
UNION ALL
SELECT child.* FROM [tree]
INNER JOIN [category] child ON child.parentId = [tree].id
)
SELECT * FROM [tree] WHERE NOT EXISTS (
SELECT parentId FROM [category] WHERE [category].parentId = [tree].id
)
We can also invert the CTE to fetch categories that are ancestors of a given subcategory:
DECLARE @id UNIQUEIDENTIFIER;
SET @id = '<id of the subcategory whose ancestors you want to select>';
WITH [tree] AS (
SELECT child.* FROM [category] child WHERE child.id = @id
UNION ALL
SELECT parent.* FROM [tree]
INNER JOIN [category] parent ON parent.id = [tree].parentId
)
SELECT * from [tree]
By modifying the SELECT
statement, we can find the top-most ancestor of any subcategory:
DECLARE @id UNIQUEIDENTIFIER;
SET @id = '<id of the subcategory whose ancestors you want to select>';
WITH [tree] AS (
SELECT child.* FROM [category] child WHERE child.id = @id
UNION ALL
SELECT parent.* FROM [tree]
INNER JOIN [category] parent ON parent.id = [tree].parentId
)
SELECT * from [tree] WHERE parentId IS NULL
This combination of a self-referential table and recursive common-table expressions provided a great solution to my requirements. The model is straightforward, and easy to maintain, and the selection of entire branches can be performed in a single query.
Top comments (0)