DEV Community

Sergey Solovev
Sergey Solovev

Posted on

Create and debug PostgreSQL extension using VS Code

In this tutorial we will create PostgreSQL extension ban_sus_query. It will check that DML queries contain predicates, otherwise will just throw an error.

Next, in order not to mislead up, I will use term contrib for PostgreSQL extension, and for extension for PostgreSQL Hacker Helper VS Code extension.

This tutorial is created not only for newbies in PostgreSQL development, but also as a tutorial for VS Code extension PostgreSQL Hacker Helper. Documentation for it you can find here.

Preparation

First things first, you must setup development environment. This involves 2 things.

PostgreSQL setup

To create extension you must build PostgreSQL. Even if you will not run any query and just trust me - you will not be able to build an extension, because compilation depends on several files which are created only during source code compilation.

Do not worry about it. On documentation site there are 2 related pages:

Pages contain not only examples, but also tips.

VS Code setup

VS Code - is the IDE we are using, so we should setup it properly for convenient development.

VS Code setup page contains lists of necessary extensions and configuration files.

Also, it has a very handy tasks.json file which is bundled with lots of predefined tasks for various use cases of PostgreSQL development. They use scripts from previous section and thus you can use them to turn VS Code in a PostgreSQL IDE. i.e. you can have PostgreSQL up and running from the ground using 2 tasks: Bootstrap and Run DB!

Creating initial files

PostgreSQL has infrastructure for contrib building and installation. In short, contribs have a template architecture - most parts are common for all.

So, for faster contrib creation we will use command: PGHH: Bootstrap extension.

Bootstrap extension command

It will prompt us to bootstrap some files - choose only C sources.

After that we will have our contrib files created:

README.md with directory contents

Initial code

Query execution pipeline has 3 stages:

  1. Parse/Semantic analysis - query string parsing and resolving tables
  2. Plan - query optimization and creating execution plan
  3. Execution - actual query execution

Our logic will be added to the 2 stage, because we must check real execution plan, not Query.
This is because after multiple transformations query can be changed in multiple ways - predicates can be deleted or added, therefore we may get a completely different query than in the original query string.

To implement that we will create hook on planner - planner_hook. Inside we will invoke actual planner and check it's output for the existence of predicates.

Starter code is the following:

#include "postgres.h"

#include "fmgr.h"
#include "optimizer/planner.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

static planner_hook_type prev_planner_hook;

void _PG_init(void);
void _PG_fini(void);

static bool
is_sus_query(Plan *plan)
{
    /* ... */
    return false;
}

static PlannedStmt *
ban_sus_query_planner_hook(Query *parse,
                           const char *query_string,
                           int cursorOptions,
                           ParamListInfo boundParams)
{
    PlannedStmt *stmt;

    if (prev_planner_hook)
        stmt = prev_planner_hook(parse, query_string, cursorOptions, boundParams);
    else
        stmt = standard_planner(parse, query_string, cursorOptions, boundParams);

    if (is_sus_query(stmt->planTree))
        ereport(ERROR,
                (errmsg("DML query does not contain predicates")));

    return stmt;
}
void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = ban_sus_query_planner_hook;
}

void
_PG_fini(void)
{
    planner_hook = prev_planner_hook;
}
Enter fullscreen mode Exit fullscreen mode

Now we are ready to add "business-logic", but before let's understand how such suspicious queries look like.

Examine queries

Suspicious query - is a DELETE/UPDATE query that does not contain predicates.

One of the benefits that we are checking already planned statements is that all predicates are already optimized in a sense that boolean rules are applied.

Query plan - is a tree of Plan nodes. Each Plan contains lefttree/righttree - left and right children and qual - list of predicates to apply at this node. But we must check only UPDATE/DELETE nodes, not each node, - nodes for them is ModifyTable.

Thus our goal is:

traverse query tree, find ModifyTable and check that it's qual is not empty

But, before run sample queries to look what their queries looks like (inside) and which predicates they have.

For tests we will use this setup:

-- Schema
CREATE TABLE tbl(x int);

-- Test queries
DELETE FROM tbl;
DELETE FROM tbl WHERE x = 0;

UPDATE tbl SET x = 1;
UPDATE tbl SET x = 1 WHERE x = 0;
Enter fullscreen mode Exit fullscreen mode

To do this we will use our contrib - install it using make install, add to shared_preload_libraries='ban_sus_query' and put a breakpoint to return in ban_sus_query_planner_hook function.

PostgreSQL has multiprocess architecture, so you must specify PID of process (backend) to which you want to attach. You can do this in 2 ways:

  1. Request PID of backend directly using SELECT pg_backend_pid(); query. It will return PID of backend.
  2. Search for required backend by typing postgres in quick input (when using ${command:pickProcess}).

But in any way, you must setup launch.json file. In documentation for the extension I have included sample configuration file (link), that you can use in most projects.

After debugger have attached and you run first DELETE query without predicate, we will see the following:

  1. PlannedStmt contains top-level ModifyTable with empty qual list

    qual is empty in ModifyTable node

  2. Inner SeqScan also contains empty qual list

    qual is empty in SeqScan node

Now run DELETE query with predicate:

  1. PlannedStmt still contains empty qual list

    qual is still empty in ModifyTable node

  2. Inner SeqScan now contains qual with single element - equality predicate

    SeqScan has single equality predicate

This is no surprise, because our ModifyTable does not apply any filtering - it just takes tuples from children (note, that by convention single-child nodes store them in lefttree), so it's qual is empty, but filtering is applied to SeqScan - we must check this.

As you can mention, extension shows all Node variables with actual types, without showing generic Plan entry.
Also extension is able to show you elements of container types (List * in this example).
More than that, it renders Expr nodes (expressions) as it was in a query, so you do not have to manually check each field, trying to figure out what expression it is.

In vanilla PostgreSQL you would have to evaluate 2 expressions: (first) get NodeTag and (second) cast variable to obtained NodeTag.
In this example, to show stmt->planTree all you need to do is expand the tree node in variables explorer, but manually (without extension), you need to evaluate (i.e. in watch) 2 expressions/steps:

  1. ((Node *)planTree)->type get T_ModifyTable - tag of ModifyTable node, and then
  2. (ModifyTable *)planTree - show variable with real type.

Evaluate variable in watch

Such manipulations take roughly 5 second, but, as this time accumulates, totally it can take up to 1 hour in a day - just to show variable's contents!

But there is not such support for Expr variables - you will not see their representation.
For this you have to dump variable to log using pprint function, which is not very convenient when you developing in IDE.

Now we are ready to write some code.

is_sus_query implementation

I repeat, our goal is to traverse query tree, find ModifyTable and check that it's qual is not empty, but now we can refine it:

Search for ModifyTable in Plan tree and check that it's children have non-empty qual list

As tree traversal is a recursive function, we will use 2 recursive functions:

  • is_sus_query - main function that traverses plan tree to find ModifyTable node, and when it finds one invokes...
  • contains_predicates - function that checks that this Plan node contains any predicate in a query

Let's start with is_sus_query. All we have to do here is to check that Plan is a ModifyTable and if so, then check that it's children contain predicates.

Node type checking is a frequent operation, so extension ships with some snippets - one of them is a isaif, which expands to if(IsA()) check:

 raw `isaif` endraw  expansion

When we have determined, that it is a DML operation check that it is DELETE or UPDATE, because ModifyTable is used for other operations, i.e. INSERT. This is not hard - just check operation member.

static bool
is_sus_query(Plan *plan)
{
    /* ... */
    ModifyTable *modify = (ModifyTable *)plan;
    switch (modify->operation)
    {
        case CMD_UPDATE:
        case CMD_DELETE:
            /* Check predicates */
            break;
        default:
            break;
    }
    /* ... */
}

Enter fullscreen mode Exit fullscreen mode

And now check these operations contain predicates using contains_predicates function (will be defined further).

Also, do not forget to handle recursion: call is_sus_query for children and handle end case (NULL).

The result function looks like this:

static bool
is_sus_query(Plan *plan)
{
    /* Recursion end */
    if (plan == NULL)
        return false;

    if (IsA(plan, ModifyTable))
    {
        ModifyTable *modify = (ModifyTable *)plan;
        switch (modify->operation)
        {
            case CMD_UPDATE:
            case CMD_DELETE:
                return !contains_predicates(modify->plan.lefttree);
            default:
                break;
        }
    }

    /* Handle recursion */
    return is_sus_query(plan->lefttree) || is_sus_query(plan->righttree);
}
Enter fullscreen mode Exit fullscreen mode

contains_predicates implementation

Now perform actual checking of the predicates existence using contains_predicates. Inside this function we must check that given Plan contains predicates.

But situation is complicated by the fact that only base Plan is given and we do not know actual query. For example this query:

DELETE FROM t1 using t2 where t1.x = t2.x;
Enter fullscreen mode Exit fullscreen mode

Will contain JOIN in lefttree of ModifyTable:

Merge Join as child of ModifyTable

Thus we have to clarify what does contains_predicates must check. In order not to complicate things a lot, we will just find first node with any predicate.

static bool
contains_predicates(Plan *plan)
{
    if (plan == NULL)
        return false;

    if (plan->qual != NIL)
        return true;

    return contains_predicates(plan->lefttree) || contains_predicates(plan->righttree);
}
Enter fullscreen mode Exit fullscreen mode

Testing

First things first - test on example queries we defined above:

postgres=# delete from tbl;
ERROR:  DML query does not contain predicates
postgres=# delete from t1 where x = 0;
DELETE 0

postgres=# update tbl set x = 0;
ERROR:  DML query does not contain predicates
postgres=# update tbl set x = 0 where x = 0;
UPDATE 0
Enter fullscreen mode Exit fullscreen mode

It's working as expected.

Also, as we injected our contrib as the last step, we can handle more complicated cases, like:

postgres=# delete from t1 where true;
ERROR:  DML query does not contain predicates
Enter fullscreen mode Exit fullscreen mode

Further improvements

This is just the beginning of the contrib, because there are lot's of corner cases that are not handled.

For example, if we change true to false in last query, then we still will get an ERROR.
That is because the database has realized that subquery will not return anything, so replaced with "dummy" Plan - Result node with FALSE one-time check, so nothing will be returned:

dummy Result Node

Conclusion

So far we have seen how you can quickly create new contrib using single command that will create all necessary files.

To write some templated code, we used isaif snippet to quickly add check for Node type.

Also, we have traversed query plan tree and saw it's nodes, without requirement to obtain NodeTag and cast to given type, which incredibly boosts productivity.

And like the icing on the cake we saw expression representations of predicates. For our purposes this is not a very big deal, because query contained only 1 predicate, but in large queries with dozens of different predicates it's just a lifesaver.

But actually, this is the small part of the extension's features. As stated before, it knows about PostgreSQL variables semantics and actively uses it. For example, display elements of hash-tables, render some builtin scalar types in more convenient way (i.e. XLogRecPtr shown in FILE/OFFSET form, not integer), bitmask support, etc...

Tip & tricks

As you may have noticed, the extension greatly facilitates development and debugging process. But extension is just an extension, an automation tool - there are lots of other aspects that can be optimized. You can refer to documentation site where you can find useful information with tips and tricks for development:

And, of course, main links:

Top comments (0)