DEV Community

Mike
Mike

Posted on

Recursion is difficult: PostgreSQL Edition

This was originally posted on my blog, where you'll find me discussing everything from lunch to engineering, system administration, and running my own ISP!

Arguably one of the most difficult things to get a grasp on, no matter the experience level the developer has, is recursion. When implemented properly, recursion is beautiful, simple, and can be entirely stateless.

Now imagine implementing recursive functions successfully in PostgreSQL that are capable of maintaining themselves via various triggers. This premise is beautiful, a one-deploy function and trigger that can detect updates in tables, and trigger a function. This function accepts a parameter of the OLD state, and NEW state – seems simple enough, or is it?

Mocking up recursive functions

I've always found mocking the intended recursive function on paper is the best way to get the full picture, in most cases you can't reliably see recursion in your head. I usually default to my most-used language PHP when writing the function on paper, for example if I wanted to take the OLD state, and NEW state, and check if the id_link number changed, it would look like this:

function onUpdate( \psqltable $previous, \psqltable $next ) {
    if( $previous->get('id_link') !== $next->get('id_link') ) {
         // this is now linked to a new parent element, let's rebuild
         rebuild( $previous, $next );
    }
}

Our end goal is, if we're linked to a new "id_link" tag, we want to rebuild an entire tree structure, deleting and re-inserting new nodes as things change, all automatically via triggers. Let's dive into how this may look...

function rebuild( &$previous, &$next ) {
    $children_prev = $previous->children();
    if( count( $children_prev ) > 0 ) rebuild( $children_prev, $next ); // recursion one
    $previous->getCurrent()->reinsert()->change('id_link', $next->get('id_link'))->change('last_updated', time())->apply();
}

If we begin to pull apart this function, we can see we get all our children for our current node, if we have any, we call rebuild again for n children, until we've reached the bottom of the stack. We then get our current index in the object, change our id_link, last_updated fields to the parents id_link and time respectively.

On the surface this seems pretty simple, so let's give it a kick at the can in PSQL this time!

PSQL Mockup

Disclosure: By no means a PSQL expert!

Starting simple with our most basic triggers:

  DROP TRIGGER IF EXISTS trigger_for_changes ON reference_detail;
  CREATE TRIGGER trigger_for_changes BEFORE INSERT OR UPDATE OR DELETE ON reference_detail FOR EACH ROW EXECUTE PROCEDURE reference_detail_changed();

  CREATE OR REPLACE FUNCTION reference_detail_changed() RETURNS trigger language plpgsql AS $$
    BEGIN
        IF TG_OP = 'UPDATE' OR TG_OP = 'INSERT' THEN
            PERFORM adapt_change( OLD.id_link, NEW.id_link );
            RETURN NEW;
        ELSE
            PERFORM adapt_change( OLD.id_link, OLD.id_link );
            RETURN OLD;
        END IF;
    RETURN NEW;
    END;
  $$;

I'd decided in implementation to change it around, instead just pass our ID_LINK to search in the header table, which contains metadata of all current objects. Let's give our function an attempt:

 CREATE OR REPLACE FUNCTION adapt_change( old_id_link VARCHAR, new_id_link VARCHAR ) returns integer
  language plpgsql
  as $$
    DECLARE
      meta_row RECORD;
      ref_detail RECORD;
    BEGIN
      IF old_id_link == new_id_link THEN
        DELETE FROM reference_detail WHERE id_link = old_id_link;
        FOR meta_row IN SELECT id_curr_link FROM reference_meta WHERE id_meta_link = old_id_link LOOP
          DELETE FROM reference_meta WHERE id_meta_link = old_id_link; -- clear out meta record
          PERFORM adapt_change( ref_detail.rm.id_link, ref_detail.rm.id_link ); -- parent was destroyed, time to delete ourselves!
        END LOOP;
      ELSE
        UPDATE reference_detail SET id_link = new_id_link WHERE id_link = old_id_link;
        UPDATE reference_meta SET id_curr_link = new_id_link, id_meta_link = new_id_link WHERE id_link = old_id_link;
        FOR meta_row IN SELECT rowid, id_curr_link FROM reference_meta WHERE id_meta_link = old_id_link LOOP
          UPDATE reference_detail SET id_link = new_id_link WHERE id_link = id_curr_link;
          UPDATE reference_meta SET id_curr_link = new_id_link, id_meta_link = new_id_link WHERE id_curr_link = id_curr_link AND rowid = meta_row.rowid;
        END LOOP;
      END IF;
      RETURN 1;
    END;
  $$;

Wowsers!

It's a mess, but it's fully recursive, let me break it down some, starting with our first IF condition, we check if our old id matches our new id, if that's the case, we're a DELETE operation, so we'll delete ourselves, and our child tree as we no longer need them.

Our ELSE condition gets much more interesting, let's take an isolated look at it here:

UPDATE reference_detail SET id_link = new_id_link WHERE id_link = old_id_link;
UPDATE reference_meta SET id_curr_link = new_id_link, id_meta_link = new_id_link WHERE id_link = old_id_link;
FOR meta_row IN SELECT rowid, id_curr_link FROM reference_meta WHERE id_meta_link = old_id_link LOOP
  UPDATE reference_detail SET id_link = new_id_link WHERE id_link = id_curr_link;
  UPDATE reference_meta SET id_curr_link = new_id_link, id_meta_link = new_id_link WHERE id_curr_link = id_curr_link AND rowid = meta_row.rowid;
END LOOP;

Note: For those who are unaware, PERFORM is used as we don't care for the returned value in this case.

First, we update our record in the detail and meta table to match our new parent. Then we begin iterating on our meta_row object, getting the current (id_curr_link) from the meta table, when our id_meta_link matches the ID we just were a child node – then we begin recursion again, and begin to alter that record and all it's children.

It's by no means a perfect function, but for the use case I have currently, which is wanting to delete/update trees of data, it seems to just fit the bill. Performance wise, it's pretty snappy, on average takes less than one second from start to finish.

This is a basic implementation of a self-managing database table, ran by INSERT/UPDATE/DELETE triggers, and recursive functions. Pretty neat, if you ask me – it's nice to have the database maintain itself to a degree, it takes the headway off of using another language (eg, PHP) to maintain it. PostgreSQL is incredibly fast if done right – it's just getting it right which takes work, but it's worth it!

Top comments (0)