DEV Community

Cover image for Recursive Query with PostgreSQL
Taufek Johar
Taufek Johar

Posted on • Originally published at blog.taufek.dev on

Recursive Query with PostgreSQL

If in your database design, you have a resource with self reference
association, you will end up having a tree like structure when you joined them
up. In school or workplace, you might encountered tree-like chart which
is known as Organization Chart or Org Chart for short.

If I were to design the database table for this resource most probably it will
look like below:

Table Name: Employees

column data type nullable
id bigint/primary key false
full_name varchar/string false
role varchar/string false
manager_id bigint/foreign_key true

In Rails, my database migration will be as following:

class CreateEmployee < ActiveRecord::Migration[6.0]
  def change
    create_table :employees do |t|
      t.string :full_name
      t.string :role
      t.references :manager, foreign_key: { to_table: :employees }
    end
  end
end
Enter fullscreen mode Exit fullscreen mode

And my Employee model looks like below:

class Employee < ApplicationRecord
  belongs_to :manager, optional: true, class_name: 'Employee'
  has_many :subordinates, foreign_key: 'manager_id', class_name: 'Employee'
end
Enter fullscreen mode Exit fullscreen mode

Let's say I want to build a page that will be able to render a tree structure with a given person id. I have a Tree UI component which requires following data structure:

id full_name role manager_id
1 Jack Dorsey CEO null
2 Michael Montano Engineering Lead 1
3 Kayvon Beykpour Product Lead 1
4 Joy Su VP Engineering 2
5 Nick Turnow Platform Lead 2
6 Keith Coleman VP Product 3
7 Lakshmi Shankar Sr Director, Strategy & Operations 3

Source: https://theorg.com/org/twitter

Disclaimer: Above is just a partial of Twitter Org Chart which I pulled from above source. It's not complete and most probably not up-to-date.

To build above data, I will use below builder class

class OrganizationTreeBuilder
  def initialize(employee)
    @employee = employee
  end

  def build
    [@employee.attributes] + fetch_subordinates(@employee)
  end

  private

  def fetch_subordinates(employee)
    employee.subordinates.map do |subordinate|
      [subordinate.attributes] + fetch_subordinates(subordinate)
    end.flatten
  end
end
Enter fullscreen mode Exit fullscreen mode

Above builder class accepts an Employee instance via the constructor and when build method is called, it will recursively fetch the subordinates of the
employee until it reaches to the level where an employee does not have any subordinates. Each time it calls employee.subordinates, it will execute a sql
query to fetch list of subordinates. Since we execute the query recursively in app level, I'm calling this as "app recursive query".

Obviously this works but not really efficient if your have very large organization. Before we start to improve this, we will write unit tests for this builder class. Below is the spec file for above builder:

RSpec.describe OrganizationTreeBuilder do
  subject { OrganizationTreeBuilder.new(ceo).build }

  context 'simple organization tree' do
    let!(:ceo) { create(:employee) }
    let!(:dev_manager1) { create(:employee, manager: cto) }
    let!(:dev_manager2) { create(:employee, manager: cto) }
    let!(:developer1_1) { create(:employee, manager: dev_manager1) }
    let!(:developer1_2) { create(:employee, manager: dev_manager1) }
    let!(:developer2_1) { create(:employee, manager: dev_manager2) }
    let!(:developer2_2) { create(:employee, manager: dev_manager2) }
    let!(:not_relevant) { create(:employee) }

    it 'builds organization tree', :aggregate_failure do
      expect(subject).to match_array(
        [
          ceo.attributes,
          dev_manager1.attributes,
          dev_manager2.attributes,
          developer1_1.attributes,
          developer1_2.attributes,
          developer2_1.attributes,
          developer2_2.attributes
        ]
      )
      expect(subject.count).to eq 7
    end
  end
end
Enter fullscreen mode Exit fullscreen mode

Above unit test will pass when run.

To gauge the performance I'm adding below test cases to above spec.

  ...
  context 'horizontal organization tree' do
    let!(:ceo) { create(:employee) }

    before { create_employee(ceo, 10, 2) }

    it 'builds organization tree' do
      time = Benchmark.measure { expect(subject.count).to eq 111 }
      puts time.real
    end
  end

  context 'vertical organization tree' do
    let!(:ceo) { create(:employee) }

    before { create_employee(ceo, 2, 10) }

    it 'builds organization tree' do
      time = Benchmark.measure { expect(subject.count).to eq 2047 }
      puts time.real
    end
  end

  context 'large organization tree' do
    let!(:ceo) { create(:employee) }

    before { create_employee(ceo, 5, 5) }

    it 'builds organization tree' do
      time = Benchmark.measure { expect(subject.count).to eq 3906 }
      puts time.real
    end
  end

  private

  def create_employee(manager, num, level)
    return if (level -= 1) < 0

    create_list(:employee, num, manager: manager).each do |employee|
      create_employee(employee, num, level)
    end
  end
  ...
Enter fullscreen mode Exit fullscreen mode

There are 3 different performance-related test cases:

  1. Horizontal Organization Tree. Organization tree is broad and shallow. Each employee has 10 subordinates with 2 levels.
  2. Vertical Organization Tree. Organization tree is narrow and deep. Each employee has 2 subordinates with 10 levels.
  3. Large Organization Tree. Organization tree is broad and deep. Each employee has 5 subordinates with 5 levels.

Below is how long it took to build the data for each scenario:

  1. Horizontal Organization Tree. 0.3sec
  2. Vertical Organization Tree. 6.2sec
  3. Large Organization Tree. 11.3sec

Time for Optimization

PostgreSQL has build-in recursive query which is made for this. Below is updated OrganizationTreeBuilder class
with PostgreSQL recursive query.

class OrganizationTreeBuilder
  def initialize(employee)
    @employee = employee
  end

  def build
    ActiveRecord::Base.connection.execute(<<~SQL)
      WITH RECURSIVE subordinates AS (
        SELECT
         employees.id,
         employees.full_name,
         employees.role,
         employees.manager_id
        FROM
         employees
        WHERE
         employees.id = #{@employee.id}

        UNION

        SELECT
         employees.id,
         employees.full_name,
         employees.role,
         employees.manager_id
        FROM
         employees
         JOIN subordinates ON subordinates.manager_id = employees.managers.id
      ) SELECT * FROM subordinates ORDER BY id ASC;
    SQL
  end
end
Enter fullscreen mode Exit fullscreen mode

The same unit tests above will still pass since the returned data is still the same.

Below are the comparison stats between previous (app recursive query) and current (PostgreSQL recursive query).

App Recursive Query PostgreSQL Recursive Query
Horizontal Organization Tree 0.3sec 0.000005sec
Vertical Organization Tree 6.2sec 0.000005sec
Large Organization Tree 11.3sec 0.000005sec

Obviously, PostgreSQL query is the winner here. Although my test dataset is small, but it does prove the point that PostgreSQL recursive query is more performant than app recursive query.

But what does the query actually do? It does not make sense to me when I read it for the first time. Turns out there are 2 parts within the CTE (Common Table Expression) to make this recursive query to work. There is a non-recursive query being UNION with recursive query.

Below is the non-recursive query which will be use as the base:

  SELECT
   employees.id,
   employees.full_name,
   employees.role,
   employees.manager_id
  FROM
   employees managers
  WHERE
   employees.id = #{@employee.id}
Enter fullscreen mode Exit fullscreen mode

From this query you will get following output:

id full_name role manager_id
1 Jack Dorsey CEO null

Then comes the recursive query below:

  SELECT
   employees.id,
   employees.full_name,
   employees.role,
   employees.manager_id
  FROM
   employees
   JOIN subordinates ON subordinates.manager_id = employees.managers.id
Enter fullscreen mode Exit fullscreen mode

The recursive query joins with the CTE named subordinates. It will start off with base data from the non-recursive query and it will get following output:

id full_name role manager_id
2 Michael Montano Engineering Lead 1
3 Kayvon Beykpour Product Lead 1

Above are all the subordinates for employee id 1. Then it will do the same query for each of these subordinates and it will keep going until the recursive query didn't get any output. At the end it will UNION all the records you will see the end results as below:

id full_name role manager_id
1 Jack Dorsey CEO null
2 Michael Montano Engineering Lead 1
3 Kayvon Beykpour Product Lead 1
4 Joy Su VP Engineering 2
5 Nick Turnow Platform Lead 2
6 Keith Coleman VP Product 3
7 Lakshmi Shankar Sr Director, Strategy & Operations 3

This is a simplified usage of recursive query in PostgreSQL. There are few more things you could do which you could find out from the official documentation.

Bonus

I'm using output from above builder class to build below tree structure using d3.js library.

outside_polygon

Source: https://jsfiddle.net/taufek/1anLkfyg/32

Reference: https://bl.ocks.org/d3noob/8329404

Top comments (0)