Here are some postgres features I use all the time.
1 - Comment tables and columns
Everyone loves databases that are well commented. When you can easily see the details of the data being stored, working with it is a breeze. Most database tools and downstream consumers will show these comments.
COMMENT ON TABLE public.country_codes IS 'Mapping table for country names and ISO 3166-1 codes.';
COMMENT ON COLUMN public.country_codes.country IS 'Human-readable name of the country.';
COMMENT ON COLUMN public.country_codes.region IS 'Geographic region the country belongs to.';
COMMENT ON COLUMN public.country_codes.iso2 IS 'The ISO 3166-1 alpha-2 code for the country.';
COMMENT ON COLUMN public.country_codes.iana IS 'The IANA assigned top-level domain for the country.';
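The comments live in the system catalogs, so you can read them back in SQL too. A quick sketch using the built-in obj_description and col_description functions (psql's \d+ also displays them):

-- Fetch the table comment...
SELECT obj_description('public.country_codes'::regclass, 'pg_class');

-- ...and the comment on each column.
SELECT a.attname, col_description(a.attrelid, a.attnum)
FROM pg_attribute a
WHERE a.attrelid = 'public.country_codes'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped;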
2 - Automatic timestamps
Have your applications ignore created_at and updated_at fields and hand these over to the database.
ALTER TABLE public.foo ADD COLUMN "created_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW();
ALTER TABLE public.foo ADD COLUMN "updated_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW();
CREATE OR REPLACE FUNCTION trigger_set_updated_at_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_updated_at_timestamp BEFORE UPDATE ON public.foo FOR EACH ROW EXECUTE FUNCTION trigger_set_updated_at_timestamp();
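With the trigger in place, updates pick up a fresh updated_at automatically and the application never has to set it. For example (the bar column and the id value here are hypothetical):

-- The BEFORE UPDATE trigger overwrites updated_at with NOW().
UPDATE public.foo SET bar = 'baz' WHERE id = 1;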
3 - Audit tables
Want to track changes to a table with minimal effort? Simply create an audit table and have postgres store the changes:
CREATE TABLE X_audit(
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    operation CHAR(1) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    who TEXT DEFAULT current_user,
    val JSONB NOT NULL
);
CREATE OR REPLACE FUNCTION process_X_audit() RETURNS TRIGGER AS $$
BEGIN
    --
    -- Create a row in X_audit to reflect the operation performed on X,
    -- making use of the special variable TG_OP to work out the operation.
    --
    IF (TG_OP = 'DELETE') THEN
        INSERT INTO X_audit (operation, val) SELECT 'D', to_jsonb(OLD);
    ELSIF (TG_OP = 'UPDATE') THEN
        INSERT INTO X_audit (operation, val) SELECT 'U', to_jsonb(NEW);
    ELSIF (TG_OP = 'INSERT') THEN
        INSERT INTO X_audit (operation, val) SELECT 'I', to_jsonb(NEW);
    END IF;
    RETURN NULL; -- result is ignored since this is an AFTER trigger
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER X_audit_insert
    AFTER INSERT OR DELETE ON X
    FOR EACH ROW
    EXECUTE FUNCTION process_X_audit();

-- Create a separate trigger for updates that checks for changes.
CREATE TRIGGER X_audit_update
    AFTER UPDATE ON X
    FOR EACH ROW
    WHEN (OLD.* IS DISTINCT FROM NEW.*)
    EXECUTE FUNCTION process_X_audit();
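Once the triggers are wired up, the audit trail is just another table to query. For example (the name key is hypothetical, standing in for whatever columns X actually has):

-- Who did what, and when; each val holds the full row as JSONB.
SELECT operation, who, created_at, val->>'name' AS name
FROM X_audit
ORDER BY created_at;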
4 - Indexable timestamps
Sometimes you need an index on a timestamp that's stored as text. In general this isn't possible, because parsing text into a timestamp depends on locale and timezone settings, so the conversion isn't immutable and can't be used in an index. But if we parse in a locale-independent way, we can mark the wrapping function as immutable so it is available for indexing.
-- This conversion *is* immutable (the format string pins down the parsing),
-- however to_timestamp isn't marked immutable by default. Re-package it into
-- a function that's declared immutable so we can use it in indexes.
CREATE OR REPLACE FUNCTION to_timestamp_json(text)
RETURNS timestamptz AS
$$SELECT to_timestamp($1, 'YYYY-MM-DD"T"HH24:MI:SSTZH:TZM')$$
LANGUAGE sql IMMUTABLE;
-- Add an index to speed up timestamp queries.
CREATE INDEX idx_payload_date_time ON public.foo (to_timestamp_json(payload->>'date_time'));
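For the index to be usable, queries need to repeat the same expression. A quick sketch (the 7-day window is just for illustration):

-- The predicate matches the indexed expression, so the planner can use the index.
SELECT *
FROM public.foo
WHERE to_timestamp_json(payload->>'date_time') >= now() - INTERVAL '7 days';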
5 - Upserts
Postgres natively supports upserts, i.e. insert if new, update if exists.
INSERT INTO books
    ("isbn", "title")
VALUES
    ('9781420953503', 'War and Peace')
ON CONFLICT (isbn) DO UPDATE SET
    title = EXCLUDED.title;
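If you'd rather leave an existing row untouched, DO NOTHING turns the insert into a no-op on conflict:

-- Insert the row only if the ISBN isn't already present.
INSERT INTO books
    ("isbn", "title")
VALUES
    ('9781420953503', 'War and Peace')
ON CONFLICT (isbn) DO NOTHING;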
6 - Queues
Believe it or not, you can build a fully functional multi-producer, multi-consumer (MPMC) queue system using only postgres features.
CREATE TYPE job_status AS ENUM ('pending', 'in_progress', 'done', 'failed');

CREATE TABLE jobs (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status job_status NOT NULL DEFAULT 'pending',
    payload JSONB,
    visible_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
    retry_count INT DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
Producers simply insert jobs into the queue:
INSERT INTO jobs (payload) VALUES ('{"task_type": "email", "recipient": "user@example.com"}');
Consumers fetch and lock jobs for processing (running the following steps inside a transaction):
UPDATE jobs
SET status = 'in_progress',
    visible_at = now() + INTERVAL '5 minutes', -- Set a visibility timeout
    updated_at = now()
WHERE id IN (
    SELECT id
    FROM jobs
    WHERE status = 'pending' AND visible_at <= now()
    ORDER BY created_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
Process the job based on the payload. After processing, update the job status.
UPDATE jobs
SET status = 'done',
    updated_at = now()
WHERE id = <job_id>;
The FOR UPDATE SKIP LOCKED is the magic phrase: rows locked by one consumer are silently skipped by the others, so each job is visible to only one consumer at a time.
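One loose end: if a consumer crashes mid-job, the row stays in_progress after its visibility timeout expires. A periodic sweep can put such jobs back on the queue; here's a sketch (the 3-attempt retry cap is an assumption, not part of the schema above):

-- Re-queue timed-out jobs, or mark them failed once the retry cap is hit.
UPDATE jobs
SET status = CASE WHEN retry_count < 3 -- assumed retry cap
                  THEN 'pending'::job_status
                  ELSE 'failed'::job_status END,
    retry_count = retry_count + 1,
    updated_at = now()
WHERE status = 'in_progress' AND visible_at <= now();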
7 - CHECK constraints
Sometimes database tables are edited directly by humans, for a variety of reasons. Perhaps some bad data got in, or nobody got around to building that UI. Either way, we can make these edits safer by rejecting known-bad values. In the example below isbn is a required text field, but we also check that the user doesn't accidentally leave it blank (which would otherwise be allowed).
CREATE TABLE books (
    "isbn" VARCHAR(13) PRIMARY KEY,
    CONSTRAINT isbn_not_empty_ck CHECK (isbn <> ''),
    -- ...
);
Of course, this isn't a replacement for actual validation, especially as check constraints are harder to test and produce worse error messages (but who knows, maybe these can be improved in the future).
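For a sense of what that looks like, here's the constraint above rejecting a blank value (error output approximate):

INSERT INTO books (isbn) VALUES ('');
-- ERROR:  new row for relation "books" violates check constraint "isbn_not_empty_ck"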
8 - Execution plans
Postgres provides great tools to analyse query execution.
book_store=# EXPLAIN (ANALYZE,BUFFERS) select * from books b inner join authors a on b.author_id = a.id;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.18..20.88 rows=8 width=220) (actual time=0.670..0.679 rows=8 loops=1)
   Hash Cond: (a.id = b.author_id)
   Buffers: shared hit=1 read=1
   ->  Seq Scan on authors a  (cost=0.00..17.00 rows=700 width=88) (actual time=0.281..0.283 rows=7 loops=1)
         Buffers: shared read=1
   ->  Hash  (cost=1.08..1.08 rows=8 width=132) (actual time=0.362..0.362 rows=8 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on books b  (cost=0.00..1.08 rows=8 width=132) (actual time=0.014..0.015 rows=8 loops=1)
               Buffers: shared hit=1
 Planning:
   Buffers: shared hit=105 read=19
 Planning Time: 9.751 ms
 Execution Time: 0.788 ms
(14 rows)
You can paste the output into https://www.pgexplain.dev to get a useful graphical visualisation.
9 - JSONB
Sometimes you just need to dump a bunch of json into a table. We don't always have control over what other people send us, and sometimes it's the right tool for the job. That said, postgres really does have great support for json:
CREATE TABLE events (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    payload JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
Querying is dead simple:
SELECT payload->>'date' AS event_date,
       payload->'details'->>'user_id' AS user_id
FROM events
WHERE payload->>'event_type' = 'user_signup';
You can even index json fields to make them fast.
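For instance, an expression index covers the kind of equality filter used above, and a GIN index accelerates containment queries (the index names here are illustrative):

-- Expression index for equality filters on a single key.
CREATE INDEX idx_events_event_type ON events ((payload->>'event_type'));

-- GIN index for containment queries such as:
--   SELECT * FROM events WHERE payload @> '{"event_type": "user_signup"}';
CREATE INDEX idx_events_payload ON events USING GIN (payload);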
10 - Lateral joins
Suppose you have two tables:
authors

| id | name        |
|----|-------------|
| 1  | Jane Austen |
| 2  | Mark Twain  |

books

| id | title             | author_id |
|----|-------------------|-----------|
| 1  | Pride & Prejudice | 1         |
| 2  | Emma              | 1         |
| 3  | Tom Sawyer        | 2         |
You want to get each author’s name and their latest book (by id).
SELECT
    a.name,
    b.title AS latest_book_title
FROM
    authors a
    LEFT OUTER JOIN LATERAL (
        SELECT title
        FROM books
        WHERE books.author_id = a.id
        ORDER BY books.id DESC
        LIMIT 1
    ) b ON TRUE;
The LEFT OUTER JOIN LATERAL (...) b ON TRUE runs the subquery once per author to find that author's latest book. The LATERAL keyword lets the subquery reference columns (a.id) from the outer query, which is useful when you want to fetch one or more related rows per row of the main table.
result

| name        | latest_book_title |
|-------------|-------------------|
| Jane Austen | Emma              |
| Mark Twain  | Tom Sawyer        |