Modelling a tree data structure on postgres and using GraphQL to add and get data from the tree.
When are tree data structures used ?
Recursive or Tree Data Structures are commonly used to model your schema to store data that has a hierarchy.
Common use-cases of tree data structures include:
- Storing a directory structure for an application like Google Drive where you would have a root directory. Within this root directory you may have other directories or files. Every directory at any level of nesting can have any number of files or directories inside it.
- Comment threads for blogs or forums where each post or topic can have comments and each comment can in turn have comments (or replies).
- Finding common connections or friends for a networking application like LinkedIn or Facebook.
In this blog post we are going to take a look at how you can model your database to build a comment thread feature for your application. Moreover, we will also be using GraphQL to add and delete comments.
Let’s take an example of a very simple blog and work on enabling comments for each post. To elaborate on what we need
- Every post can have comments
- Every comment can have replies (child comments)
- We want to list these comments in an ordered fashion sorted by when they were created.
We are going to go with Hasura GraphQL engine as it provides a Postgres database with instant GraphQL queries.
Getting started
We will use the Hasura GraphQL engine for instantly getting GraphQL APIs over Postgres. Click on the button below to deploy the GraphQL engine to Heroku’s free tier.
This will deploy the graphql-engine
to Heroku. You might need to create a Heroku account if you don’t have one. The graphql-engine
will be running at https://your-app.herokuapp.com
(replace your-app
with your heroku app name).
API Console
Every Hasura cluster comes with an API Console
that you can use to build the backend for your application.
The API Console is available at https://your-app.herokuapp.com
/console
Creating a table
Head to the Data
tab and click on Create Table
to create a new table
Let’s call our table post_comments
to store the comments for each post. The table will have the following columns
-
id
Integer(auto increment) Primary Key -
parent_id
Integer Nullable -
comment
Text -
created_at
Timestamp default now() -
post_id
Integer -
user_id
Integer
Hit the create
button to create the table.
This is the table where we will store all the comments for each of the blog posts.
Creating a self reference
Next, let’s define a foreign key constraint on the parent_id
column to the id
column.
To do this, head to the Modify
tab and click on the Edit
button next to parent_id
. Check the Foreign Key
checkbox, select post_comments
as the reference table and id
as the reference column.
Adding a relationship to fetch the child comments
Next, click on the Relationship
tab and click on the Add
button under the Suggested Array Relationship
column. Name the relationship children_comments
Hit Save
to add this relationship.
Fetching and adding comments
The mutation to insert a comment will be
mutation add_comment {
insert_post_comments(objects: $objects) {
returning{
id
parent_id
comment
created_at
user_id
post_id
}
}
}
The $objects
variable will be
{
"objects": [
{
"user_id": 1,
"post_id": 1,
"comment": "First comment on post 1",
"parent_id": null
}
]
}
Similarly, the $objects
variable to add a reply to a comment
{
"objects": [
{
"user_id": 1,
"post_id": 1,
"comment": "First comment on post 1",
"parent_id": 1 //Or id of the comment to which this comment is a reply to
}
]
}
Fetching comments
If we are aware of the level of nesting in our comments, then the GraphQL query to fetch comments and all children comments for a post would be
query get_comments_for_post {
post_comments(
where: {
post_id: 1
parent_id: null
}
order_by: ["+created_at"]
) {
id
parent_id
comment
created_at
user_id
post_id
children_comments (
order_by: ["+created_at"]
){
id
parent_id
comment
created_at
user_id
post_id
}
}
}
Here, we are fetching all the comments for a post with an id
value of 1
and whose parent_id
is null
. We are then fetching all the replies (children_comments
) as a relationship. The +created_at
in the order_by
field denotes that the comments should be fetched in the ascending order (based on the value of created_at
). Alternatively, -
would denote descending. If no symbol is specified in the query, then +
is assumed by default.
Similarly, in case you had another level of nesting, the query would be
query get_comments_for_post {
post_comments(
where: {
post_id: 2
parent_id: null
}
order_by: ["+created_at"]
) {
id
parent_id
comment
created_at
user_id
post_id
children_comments (
order_by: ["+created_at"]
){
id
parent_id
comment
created_at
user_id
post_id
children_comments(
order_by: ["+created_at"]
) {
id
parent_id
comment
created_at
user_id
post_id
}
}
}
}
Working with unknown levels of nesting
Sites like https://news.ycombinator.com/ allow any level of nesting. Which means that every comment can have one more child comments. In this case, fetching our comment like we did above does not work since we do not know how many levels we need to fetch.
One of the ways of handling this is to fetch the complete list of comments for the particular topic (in this case a blog post) and then arrange it in memory on your client.
query get_comments_for_post {
post_comments(
where: {
post_id: 1
}
order_by: ["created_at"]
) {
id
parent_id
comment
created_at
user_id
post_id
}
}
You could also have another table that keeps a track of ancestry, something like post_comment_ancestry
comment_id
ancestor_id
Here, for each comment you will store a list of all of its ancestors. For eg: if comment A
has two child comments B
and C
and comment C has a child comment D
,
A
| - B
| - C
| - D
the entry in the post_comment_ancestry
table would be
+------------------------+
|comment_id | ancestor_id|
+------------------------+
| B | A |
| C | A |
| D | C |
| D | A |
+------------------------+
As you can see, comment D
has two entries for A
and C
respectively. Using this table you can fetch a list of all child comments to any arbitrary amount of nesting for a particular comment.
Conclusion
In this blog post we took a look at one of the ways in which we can work with tree data structures on Postgres.
If you would like to see any other use-cases or suggest improvements to the ideas mentioned above, let me know in the comments.
Hasura gives you instant realtime GraphQL APIs over any Postgres database without having to write any backend code.
For those of you who are new to the Hasura GraphQL engine, this is a good place to get started.
Top comments (0)