Here are some postgres features I use all the time.
1 - Comment tables and columns
Everyone loves databases that are well commented. When you can easily see the details of the data being stored, working with it is a breeze. Most database tools and downstream consumers will show these comments.
COMMENT ON TABLE public.country_codes IS 'Mapping table for country names and ISO 3166-1 codes.';
COMMENT ON COLUMN public.country_codes.country IS 'Human-readable name of the country.';
COMMENT ON COLUMN public.country_codes.region IS 'Geographic region the country belongs to.';
COMMENT ON COLUMN public.country_codes.iso2 IS 'The ISO 3166-1 alpha-2 code for the country.';
COMMENT ON COLUMN public.country_codes.iana IS 'The IANA assigned top-level domain for the country.';
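The comments live in the system catalogs, so you can read them back in SQL too. A quick sketch using the built-in obj_description and col_description functions (psql's \d+ also displays them):

-- Fetch the table comment...
SELECT obj_description('public.country_codes'::regclass, 'pg_class');

-- ...and the comment on each column.
SELECT a.attname, col_description(a.attrelid, a.attnum)
FROM pg_attribute a
WHERE a.attrelid = 'public.country_codes'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped;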
2 - Automatic timestamps
Have your applications ignore created_at and updated_at fields and hand these over to the database.
ALTER TABLE public.foo ADD COLUMN "created_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW();
ALTER TABLE public.foo ADD COLUMN "updated_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW();
CREATE OR REPLACE FUNCTION trigger_set_updated_at_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_updated_at_timestamp BEFORE UPDATE ON public.foo FOR EACH ROW EXECUTE FUNCTION trigger_set_updated_at_timestamp();
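With the trigger in place, updates pick up a fresh updated_at automatically and the application never has to set it. For example (the bar column and the id value here are hypothetical):

-- The BEFORE UPDATE trigger overwrites updated_at with NOW().
UPDATE public.foo SET bar = 'baz' WHERE id = 1;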
3 - Audit tables
Want to track changes to a table with minimal effort? Simply create an audit table and have postgres store the changes:
CREATE TABLE X_audit(
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    operation CHAR(1) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    who TEXT DEFAULT current_user,
    val JSONB NOT NULL
);
CREATE OR REPLACE FUNCTION process_X_audit() RETURNS TRIGGER AS $$
BEGIN
    --
    -- Create a row in X_audit to reflect the operation performed on X,
    -- making use of the special variable TG_OP to work out the operation.
    --
    IF (TG_OP = 'DELETE') THEN
        INSERT INTO X_audit (operation, val) SELECT 'D', to_jsonb(OLD);
    ELSIF (TG_OP = 'UPDATE') THEN
        INSERT INTO X_audit (operation, val) SELECT 'U', to_jsonb(NEW);
    ELSIF (TG_OP = 'INSERT') THEN
        INSERT INTO X_audit (operation, val) SELECT 'I', to_jsonb(NEW);
    END IF;
    RETURN NULL; -- result is ignored since this is an AFTER trigger
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER X_audit_insert
    AFTER INSERT OR DELETE ON X
    FOR EACH ROW
    EXECUTE FUNCTION process_X_audit();

-- Create a separate trigger for updates that checks for changes.
CREATE TRIGGER X_audit_update
    AFTER UPDATE ON X
    FOR EACH ROW
    WHEN (OLD.* IS DISTINCT FROM NEW.*)
    EXECUTE FUNCTION process_X_audit();
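Once the triggers are wired up, the audit trail is just another table to query. For example (the name key is hypothetical, standing in for whatever columns X actually has):

-- Who did what, and when; each val holds the full row as JSONB.
SELECT operation, who, created_at, val->>'name' AS name
FROM X_audit
ORDER BY created_at;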
4 - Indexable timestamps
Sometimes you need an index on a timestamp that's stored as text. In general this isn't possible, because parsing text into a timestamp depends on locale and timezone settings, so the conversion isn't immutable and can't be used in an index. But if we parse in a locale-independent way, we can mark the wrapping function as immutable so it is available for indexing.
-- This conversion *is* immutable (the format string pins down the parsing),
-- however to_timestamp isn't marked immutable by default. Re-package it into
-- a function that's declared immutable so we can use it in indexes.
CREATE OR REPLACE FUNCTION to_timestamp_json(text)
RETURNS timestamptz AS
$$SELECT to_timestamp($1, 'YYYY-MM-DD"T"HH24:MI:SSTZH:TZM')$$
LANGUAGE sql IMMUTABLE;
-- Add an index to speed up timestamp queries.
CREATE INDEX idx_payload_date_time ON public.foo (to_timestamp_json(payload->>'date_time'));
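For the index to be usable, queries need to repeat the same expression. A quick sketch (the 7-day window is just for illustration):

-- The predicate matches the indexed expression, so the planner can use the index.
SELECT *
FROM public.foo
WHERE to_timestamp_json(payload->>'date_time') >= now() - INTERVAL '7 days';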
5 - Upserts
Postgres natively supports upserts, i.e. insert if new, update if exists.
INSERT INTO books
    ("isbn", "title")
VALUES
    ('9781420953503', 'War and Peace')
ON CONFLICT (isbn) DO UPDATE SET
    title = EXCLUDED.title;
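If you'd rather leave an existing row untouched, DO NOTHING turns the insert into a no-op on conflict:

-- Insert the row only if the ISBN isn't already present.
INSERT INTO books
    ("isbn", "title")
VALUES
    ('9781420953503', 'War and Peace')
ON CONFLICT (isbn) DO NOTHING;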
6 - Queues
Believe it or not, you can build a fully functional multi-producer, multi-consumer (MPMC) queue system using only postgres features.
CREATE TYPE job_status AS ENUM ('pending', 'in_progress', 'done', 'failed');

CREATE TABLE jobs (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status job_status NOT NULL DEFAULT 'pending',
    payload JSONB,
    visible_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
    retry_count INT DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
Producers simply insert jobs into the queue:
INSERT INTO jobs (payload) VALUES ('{"task_type": "email", "recipient": "user@example.com"}');
Consumers fetch and lock jobs for processing (running the following steps inside a transaction):
UPDATE jobs
SET status = 'in_progress',
    visible_at = now() + INTERVAL '5 minutes', -- Set a visibility timeout
    updated_at = now()
WHERE id IN (
    SELECT id
    FROM jobs
    WHERE status = 'pending' AND visible_at <= now()
    ORDER BY created_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
Process the job based on the payload. After processing, update the job status.
UPDATE jobs
SET status = 'done',
    updated_at = now()
WHERE id = <job_id>;
The FOR UPDATE SKIP LOCKED is the magic phrase: rows locked by one consumer are silently skipped by the others, so each job is visible to only one consumer at a time.
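One loose end: if a consumer crashes mid-job, the row stays in_progress after its visibility timeout expires. A periodic sweep can put such jobs back on the queue; here's a sketch (the 3-attempt retry cap is an assumption, not part of the schema above):

-- Re-queue timed-out jobs, or mark them failed once the retry cap is hit.
UPDATE jobs
SET status = CASE WHEN retry_count < 3 -- assumed retry cap
                  THEN 'pending'::job_status
                  ELSE 'failed'::job_status END,
    retry_count = retry_count + 1,
    updated_at = now()
WHERE status = 'in_progress' AND visible_at <= now();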
7 - CHECK constraints
Sometimes database tables are edited directly by humans, for a variety of reasons. Perhaps some bad data got in, or nobody got around to building that UI. Either way, we can make these edits safer by rejecting known-bad values. In the example below isbn is a required text field, but we also check that the user doesn't accidentally leave it blank (which would otherwise be allowed).
CREATE TABLE books (
    "isbn" VARCHAR(13) PRIMARY KEY,
    CONSTRAINT isbn_not_empty_ck CHECK (isbn <> ''),
    -- ...
);
Of course, this isn't a replacement for actual validation, especially as check constraints are harder to test and produce worse error messages (but who knows, maybe these can be improved in the future).
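For a sense of what that looks like, here's the constraint above rejecting a blank value (error output approximate):

INSERT INTO books (isbn) VALUES ('');
-- ERROR:  new row for relation "books" violates check constraint "isbn_not_empty_ck"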
8 - Execution plans
Postgres provides great tools to analyse query execution.
book_store=# EXPLAIN (ANALYZE,BUFFERS) select * from books b inner join authors a on b.author_id = a.id;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.18..20.88 rows=8 width=220) (actual time=0.670..0.679 rows=8 loops=1)
   Hash Cond: (a.id = b.author_id)
   Buffers: shared hit=1 read=1
   ->  Seq Scan on authors a  (cost=0.00..17.00 rows=700 width=88) (actual time=0.281..0.283 rows=7 loops=1)
         Buffers: shared read=1
   ->  Hash  (cost=1.08..1.08 rows=8 width=132) (actual time=0.362..0.362 rows=8 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on books b  (cost=0.00..1.08 rows=8 width=132) (actual time=0.014..0.015 rows=8 loops=1)
               Buffers: shared hit=1
 Planning:
   Buffers: shared hit=105 read=19
 Planning Time: 9.751 ms
 Execution Time: 0.788 ms
(14 rows)
You can paste the output into https://www.pgexplain.dev to get a useful graphical visualisation.
9 - JSONB
Sometimes you just need to dump a bunch of json into a table. We don't always have control over what other people send us, and sometimes it's the right tool for the job. That said, postgres really does have great support for json:
CREATE TABLE events (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    payload JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
Querying is dead simple:
SELECT payload->>'date' AS event_date,
       payload->'details'->>'user_id' AS user_id
FROM events
WHERE payload->>'event_type' = 'user_signup';
You can even index json fields to make them fast.
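For instance, an expression index covers the kind of equality filter used above, and a GIN index accelerates containment queries (the index names here are illustrative):

-- Expression index for equality filters on a single key.
CREATE INDEX idx_events_event_type ON events ((payload->>'event_type'));

-- GIN index for containment queries such as:
--   SELECT * FROM events WHERE payload @> '{"event_type": "user_signup"}';
CREATE INDEX idx_events_payload ON events USING GIN (payload);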
10 - Lateral joins
Suppose you have two tables:
authors

| id | name        |
|----|-------------|
| 1  | Jane Austen |
| 2  | Mark Twain  |

books

| id | title             | author_id |
|----|-------------------|-----------|
| 1  | Pride & Prejudice | 1         |
| 2  | Emma              | 1         |
| 3  | Tom Sawyer        | 2         |
You want to get each author’s name and their latest book (by id).
SELECT
    a.name,
    b.title AS latest_book_title
FROM
    authors a
    LEFT OUTER JOIN LATERAL (
        SELECT title
        FROM books
        WHERE books.author_id = a.id
        ORDER BY books.id DESC
        LIMIT 1
    ) b ON TRUE;
The LEFT OUTER JOIN LATERAL (...) b ON TRUE runs the subquery once per author to find that author's latest book. The LATERAL keyword lets the subquery reference columns (a.id) from the outer query, which is useful when you want to fetch one or more related rows per row of the main table.
result

| name        | latest_book_title |
|-------------|-------------------|
| Jane Austen | Emma              |
| Mark Twain  | Tom Sawyer        |