Arctype Team for Arctype

Posted on Mar 11, 2022 • Originally published at arctype.com

Hooks: The secret feature powering the Postgres ecosystem

#programming #postgres #tutorial #productivity

What do developers mean when they say Postgres is "extensible"? They're referring to low-level APIs which can change the core functionality of the database. In this post, we will explore a secret - meaning undocumented - feature called hooks which allow developers to not only add features to Postgres, but modify the way that queries are executed and data is parsed.

Hooks are used by some of the most popular projects in the Postgres ecosystem, including Timescale, pg_stat_statements, and Supabase. To give you ideas of what can be built, we will see how each of these projects uses specific hooks. At the end of this post, we provide a Makefile, some C code, and compilation instructions to get started customizing Postgres on your own.

What are hooks?

Hooks are event-based functions present in Postgres. We can create a hook based on an event/trigger, and the database will call it. Hooks are used extensively by many libraries and projects but they all share the same basic structure.

Structure

A Postgres hook begins by including the Postgres headers and providing some standard wrappers that the database engine knows about.

include "postgres.h"

// Hooks Code
static void custom_hook()

// All hooks are called from this function
void _PG_init(void);

// Hooks logic end in this function
void _PG_fini(void);

Initialization

Hooks need to be placed in the contrib Postgres source code path folder. You can check the source path present here for extensions already shipped with Postgres.

Once the hook code is in place, you can build it from the source or ship the attachment separately. Making the hook as an extension is beyond the scope of this blog article, but it is how all 3 projects covered here use them. Once the plugin is in place, you can then load the plugin using LOAD 'plugin_name' to test it.

Types of hooks

There are roughly 30 hooks across 6 categories that can be called in the C programming language. The psql-hooks project on Github is the unofficial documentation of all of the functionality.

General Hooks - Hooks that work with general Postgres functionality.
Security Hooks - Hooks with specific security functions such as password and user creation.
Function Hooks - Work during function execution.
Planner Hooks - Interception during the planner phase. For example, Timescale uses the planner_hook to change the query execution plan.
Executor Hooks - Interception during the execution phase. For example, pg_stat_statements uses the ExecutorRun_hook to inspect running queries and Supabase uses the ProcessUtility_hook to implement role based security.
PL/PG SQL Hooks - Handy when working with PL/PGSql functions.

Most documentation is better with examples, so let's review how Timescale, pg_stat_statements, and Supabase use hooks. For extra fun, read the linked C files below for sage advice from their developers on how to use these low-level APIs.

Hooks in Timescale

TimescaleDB extends Postgres "by adding hooks deep into PostgreSQL's query planner, data and storage model, and execution engine". This gives us a hint as to what parts of the database architecture they are plugging into. In particular, Timescale deals with time-series data in chunks, bits of memory where a continuous set of dates, times, events, or other time-series data is stored. A second core feature is continuous aggregates, where chunks are updated as new data comes in. Finally, Timescale has some very interesting scheduling features, again directly implemented in Postgres using hooks. Now that we know what types of hooks exist, we can examine Timescale's open-source codebase for how they are used to create the features above.

Query planning and the execution engine

The most important hook for Timescale is probably the planner_hook which deals with the query plan that Postgres produces when a user or application sends it a SQL statement. In src/planner.c we see how they use the function timescaledb_planner to modify the typical query plan to include chunks and ensure that HyperTables are queried correctly. Once the planner has completed Timescale does further transformations on the query using the post_parse_analyze_hook that they initialized at the very start of Postgres operation in src/loader/loader.c.

Likewise, in src/process_utility.c, the ProcessUtility hook is used to modify the execution engine of Postgres once the query plan has been generated. This file, over 4,000 lines of C, ensures that statements like ALTER TABLE get executed correctly and Timescale's unique compression features are utilized for maximum effect.

Housekeeping

There are two other interesting hooks outside of planning and execution that help Timescale ensure smooth updates and test their extension. shmem_startup_hook fires at initialization and is used to ensure that workers have the right version of the extension during updates, that scheduling happens correctly, and other tasks. And all of this functionality couldn't go without testing, so the emit_log_hook is used in src/bgw/log.c to inject logs in the automated test harness.

Hooks in pg_stat_statements

Citus Data called pg_stat_statements the most useful postgres extension. It is one of the best tools for Postgres performance monitoring and can be turned on easily. pg_stat_statements makes heavy use of the ExecutorStart_hook, the ExecutorRun_hook, the ExecutorFinish_hook, and the ExecutorEnd_hook because it is interested in information about the runtime of the query. Here is an example from pg_stat_statements.c where we track the total elapsed time of a query.

/*
 * ExecutorStart hook: start up tracking if needed
 */
static void
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    if (prev_ExecutorStart)
        prev_ExecutorStart(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);

    /*
     * If query has queryId zero, don't track it.  This prevents double
     * counting of optimizable statements that are directly contained in
     * utility statements.
     */
    if (pgss_enabled(exec_nested_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
    {
        /*
         * Set up to track total elapsed time in ExecutorRun.  Make sure the
         * space is allocated in the per-query context so it will go away at
         * ExecutorEnd.
         */
        if (queryDesc->totaltime == NULL)
        {
            MemoryContext oldcxt;

            oldcxt = MemoryContextSwitchTo(queryDesc->estate->es_query_cxt);
            queryDesc->totaltime = InstrAlloc(1, INSTRUMENT_ALL, false);
            MemoryContextSwitchTo(oldcxt);
        }
    }
}

Hooks in Supabase

We could cover in detail how Supabase - the open source Firebase implemented in Postgres - uses hooks but they have done so themselves in an excellent blog post on protecting reserved roles in Postgres. Suffice to say they use only one hook, the ProcessUtility_hook to hijack (in a good way) a global pointer that hits right before every SQL statement is run.

Supabase uses ProcessUtility to jump into supautils.c , a library which checks if the role that is executing the statement is allowed to do so. This helps them match the web-based authentication on supabase.com to the database role inside Postgres and provide a very smooth authentication experience. Beyond reading the pg_stat_statements C code, their blog post and supautils.c is the best reference for implementing a moderately complex hook.

A fully-working example

Let's look at a fully-working example found in the Github repository here. Note: The license for the code below is available here.

//Initialise hook variable
static ClientAuthentication_hook_type original_client_auth_hook = NULL;

//Hook method
static void auth_delay_checks(Port *port, int status)
{

    if (original_client_auth_hook)
        original_client_auth_hook(port, status);


    if (status != STATUS_OK)
    {
        pg_usleep(1000000L);
    }
}

//The custom hook method is called in the global init
void _PG_init(void)
{
    original_client_auth_hook = ClientAuthentication_hook;
    ClientAuthentication_hook = auth_delay_checks;
}

//Unlink once the work is done
void _PG_fini(void)
{
    ClientAuthentication_hook = original_client_auth_hook;
}

The above piece of code is an elementary demonstration of hooks. When authentication is wrong, it introduces a sleep of 1 second before throwing an error. This code can be used as a deterrent for brute-forcing the password (this is just an example and not a security functionality/strategy that is recommended).

Conclusion

Hooks are a pretty advanced concept. A typical database user will never use it in their everyday work, but it's fun to understand how they work because you will never know when they might come in handy. In addition, some libraries already use them since they need to hook in for various metrics.

A rough strategy that you can use to see if hooks are suitable for your project involves considering the following points:

Is it an event-based use case?
It cannot be done on the application side and only on the database side. Is that ok?
Can we not use CDC (Change Data Capture) tools to solve this?
Do you have context about C programming language and the required knowledge about Postgres internals?
Do you have the ability to package Postgres as a custom installation? Cloud installations and services such as RDS/CloudSQL will not have this ability.

If these questions have been answered, hooks are an excellent use case.

DEV Community

Hooks: The secret feature powering the Postgres ecosystem

What are hooks?

Structure

Initialization

Types of hooks

Hooks in Timescale

Query planning and the execution engine

Housekeeping

Hooks in pg_stat_statements

Hooks in Supabase

A fully-working example

Other event-based architectures for Postgres

Conclusion

Top comments (0)

Read next

My React Journey: Day 15

Conditional Statements and Loops in JavaScript

2779. Maximum Beauty of an Array After Applying Operation

Python 🐍 and variable types