Peter Radilov

PostgreSQL: Transactions, Locks, and Why Serializable Fails

If you've ever had queries that suddenly hang, transactions that behave unexpectedly, or an UPDATE from two sessions that produces strange results — this article will help you understand what's happening inside PostgreSQL. We'll cover isolation levels, locks, VACUUM, and demonstrate everything with live SQL examples.

A few years ago I gave internal talks on PostgreSQL for my team. Then I spent a long time working with other technologies, and when I recently came back to PostgreSQL, the first thing I did was revisit my own notes to refresh the fundamentals. I figured if it was useful to me, it might be useful to others too. The core concepts of transactions and locks in PostgreSQL haven't changed, but a few things have gotten better over the past 5 years — more on that at the end.

now() vs clock_timestamp()

PostgreSQL has two functions for getting the current time: now() and clock_timestamp().

SELECT now(), now(), clock_timestamp(), clock_timestamp();

Both calls to now() will return the same time. But clock_timestamp() may differ — even within a single query.

To see the difference, let's open a transaction:

BEGIN;
SELECT now();
-- wait a moment...
SELECT now();
SELECT clock_timestamp();
COMMIT;

now() always returns the time the transaction started. This is useful when you need to set identical created_at and updated_at timestamps within a single operation — now() guarantees the same value throughout the entire transaction.

clock_timestamp() returns the actual wall-clock time, independent of the transaction.
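As a quick sketch (the orders table here is made up for illustration), this is why now() works well as a DEFAULT for audit columns:

```sql
-- Hypothetical table, for illustration only
CREATE TABLE orders (
    id         serial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

BEGIN;
INSERT INTO orders DEFAULT VALUES;
-- even if this second insert runs seconds later, its created_at matches
-- the first row's, because now() is frozen at transaction start
INSERT INTO orders DEFAULT VALUES;
COMMIT;
```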

Isolation Levels

The first thing that sets PostgreSQL apart from many other databases: although the SQL standard names four isolation levels, PostgreSQL implements only three distinct behaviors.

  • Read Uncommitted / Read Committed — in PostgreSQL these are the same thing. Dirty reads simply don't exist.
  • Repeatable Read
  • Serializable

Another nice feature: PostgreSQL supports DDL inside transactions. You can create a table, insert data, and roll it all back within a single transaction — not every database can do this.

Locks: Two UPDATEs on the Same Row

Let's create a table:

BEGIN;
CREATE TABLE t_test1 (id int);
INSERT INTO t_test1 VALUES (1);
INSERT INTO t_test1 VALUES (2);
COMMIT;

Now let's update the same data from two sessions simultaneously:

Session 1:

BEGIN ISOLATION LEVEL READ COMMITTED;
UPDATE t_test1 SET id = id + 1 RETURNING *;
-- don't commit yet

Session 2:

BEGIN ISOLATION LEVEL READ COMMITTED;
UPDATE t_test1 SET id = id + 1 RETURNING *;
-- hangs!

The second UPDATE waits for the first transaction to complete. MVCC means readers never block writers and writers never block readers, but two writers targeting the same row do block each other: row-level write locks exist.
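While the second UPDATE is hanging, a third session can see who is blocking whom. A minimal diagnostic query (pg_blocking_pids() is available since PostgreSQL 9.6):

```sql
-- List sessions that are currently blocked, and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       left(query, 60)       AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```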

Lock Types

#  Lock                     When Acquired                                Conflicts With
1  ACCESS SHARE             SELECT                                       ACCESS EXCLUSIVE
2  ROW SHARE                SELECT FOR UPDATE/SHARE                      EXCLUSIVE, ACCESS EXCLUSIVE
3  ROW EXCLUSIVE            INSERT, UPDATE, DELETE                       SHARE and above
4  SHARE UPDATE EXCLUSIVE   CREATE INDEX CONCURRENTLY, ANALYZE, VACUUM   SHARE UPDATE EXCLUSIVE and above
5  SHARE                    CREATE INDEX                                 ROW EXCLUSIVE and above (but not SHARE itself)
6  SHARE ROW EXCLUSIVE      CREATE TRIGGER, some ALTER TABLE             ROW EXCLUSIVE and above
7  EXCLUSIVE                REFRESH MATERIALIZED VIEW CONCURRENTLY       ROW SHARE and above (allows only plain reads)
8  ACCESS EXCLUSIVE         DROP TABLE, ALTER TABLE, VACUUM FULL         All
The higher the number, the stricter the lock. Two operations with conflicting locks cannot work on the same object simultaneously: one will wait.

Worth noting separately: the difference between CREATE INDEX (blocks inserts/updates/deletes) and CREATE INDEX CONCURRENTLY (builds the index slower but doesn't block DML operations).
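A sketch of the two variants on the table from earlier (the index names are made up):

```sql
-- Takes a SHARE lock: concurrent SELECTs are fine,
-- but INSERT/UPDATE/DELETE wait until the build finishes.
CREATE INDEX idx_t_test1_id ON t_test1 (id);

-- Takes SHARE UPDATE EXCLUSIVE: DML keeps running. The trade-offs:
-- slower (two table scans), cannot run inside a transaction block,
-- and leaves an INVALID index behind if the build fails.
CREATE INDEX CONCURRENTLY idx_t_test1_id2 ON t_test1 (id);
```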

Read Committed: Phantom and Non-Repeatable Reads

The default level is Read Committed — each statement within a transaction gets a fresh data snapshot.

Let's create a table for our examples:

BEGIN;
CREATE TABLE t_test2 (class int, value int);
INSERT INTO t_test2 VALUES (1, 10);
INSERT INTO t_test2 VALUES (1, 20);
INSERT INTO t_test2 VALUES (2, 100);
INSERT INTO t_test2 VALUES (2, 200);
COMMIT;

Phantom Reads

Session 1:

BEGIN ISOLATION LEVEL READ COMMITTED;
SELECT sum(value) FROM t_test2;
-- 330

Session 2 inserts data and commits:

INSERT INTO t_test2 VALUES (1, 30);
COMMIT;

Session 1:

SELECT sum(value) FROM t_test2;
-- 360! A new row appeared.
COMMIT;

Non-Repeatable Reads

Session 1:

BEGIN ISOLATION LEVEL READ COMMITTED;
SELECT value FROM t_test2 WHERE class = 1 AND value = 10;
-- Returns the row with value = 10

Session 2 updates data and commits:

UPDATE t_test2 SET value = 15 WHERE class = 1 AND value = 10;
COMMIT;

Session 1:

SELECT value FROM t_test2 WHERE class = 1 AND value = 10;
-- Empty! The row we just saw has changed.
COMMIT;

Same issue as with phantom reads, but here no new rows appear — existing ones change.

Repeatable Read

To avoid both anomalies, use Repeatable Read:

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT sum(value) FROM t_test2;
-- 365 (after the inserts and updates from the previous examples)

Meanwhile, another session inserts data and commits. We come back:

SELECT sum(value) FROM t_test2;
-- Still 365.
COMMIT;

The data snapshot is fixed at the time of the first query in the transaction and doesn't change.
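You can watch the snapshot being frozen directly. pg_current_snapshot() (available since PostgreSQL 13) shows the xmin:xmax:xip_list the transaction is working with:

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_current_snapshot();
-- ... other sessions commit changes here ...
SELECT pg_current_snapshot();
-- same value: the snapshot was fixed by the first query.
-- At READ COMMITTED it would advance with every statement.
COMMIT;
```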

Serializable: Why It Fails

The strictest level. It simulates sequential transaction execution.

Cross-Inserts

Session 1:

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(value) FROM t_test2 WHERE class = 1;
INSERT INTO t_test2 VALUES (2, 30);
COMMIT; -- OK

Session 2:

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(value) FROM t_test2 WHERE class = 2;
INSERT INTO t_test2 VALUES (1, 30);
COMMIT; -- ERROR: could not serialize access

Whoever commits first wins. The second one gets a serialization error.

False Conflicts and Predicate Locks

Here's an interesting case. Two transactions work with different classes — there's no logical conflict:

Session 1: reads class=1, inserts into class=1
Session 2: reads class=2, inserts into class=2

You'd think everything should be fine. But on a small table — the second transaction fails.

The reason is predicate locks (SSI, Serializable Snapshot Isolation). PostgreSQL takes them at tuple, page, or whole-relation granularity depending on how the data is scanned. On a tiny table, a sequential scan locks broadly, and rows of both classes land on the same page anyway, so a false conflict occurs. On a table with a million rows, where the two classes live on different pages and are reached via index scans, everything works fine.

This isn't a bug — it's an implementation detail. The PostgreSQL documentation explicitly states: transactions at the Repeatable Read and Serializable levels must be prepared to retry.

Puzzle: Alice and Bob On Call

An on-call schedule table:

BEGIN;
CREATE TABLE d_test (name text, on_call bool);
INSERT INTO d_test VALUES ('Alice', true);
INSERT INTO d_test VALUES ('Bob', true);
INSERT INTO d_test VALUES ('Carol', false);
COMMIT;

The rule: at least one person must remain on call. Alice and Bob both want to leave at the same time.

Both sessions at Repeatable Read:

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM d_test WHERE on_call = true;
-- 2 (safe to leave)
UPDATE d_test SET on_call = false WHERE name = 'Alice'; -- or 'Bob'
COMMIT;

Both transactions see two on-call people, both commit successfully. Result: nobody on call. A classic race condition.

Solutions

1. Serializable — the second transaction will fail with a serialization error and can be retried.

2. SELECT ... FOR UPDATE — lock the selected rows:

SELECT count(*) FROM d_test WHERE on_call = true FOR UPDATE;

The second session will wait for the first to finish and will see the up-to-date data.

3. Atomic operation — combine the check and the action into a single query:

UPDATE d_test
SET on_call = false
WHERE name = 'Alice'
  AND (SELECT count(*) FROM d_test WHERE on_call = true) > 1;

If only one person is on call when the statement runs, the UPDATE simply affects no rows. One caveat: under Read Committed, two such UPDATEs touching different rows (Alice and Bob) each evaluate the subquery against their own snapshot and don't block each other, so the check is only airtight when combined with row locks or a stricter isolation level.

Advisory Locks

PostgreSQL lets you lock on an arbitrary number rather than a row or table:

BEGIN;
SELECT pg_advisory_lock(15);
-- do some work
COMMIT;
-- the lock is NOT released!
SELECT pg_advisory_unlock(15);

An important detail: pg_advisory_lock operates at the session level, not the transaction level. The lock persists until an explicit pg_advisory_unlock or until the connection is closed.

If you need a transaction-level lock, use pg_advisory_xact_lock — it's released automatically on COMMIT/ROLLBACK.
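For example, to make sure only one worker runs a given job at a time (the key 42 is arbitrary and application-defined):

```sql
BEGIN;
-- Blocks until the lock is free, then holds it until COMMIT/ROLLBACK
SELECT pg_advisory_xact_lock(42);
-- ... do the job that must not run concurrently ...
COMMIT;  -- the lock is released automatically here

-- Non-blocking variant: returns false immediately if someone holds the lock
SELECT pg_try_advisory_xact_lock(42);
```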

VACUUM: The Garbage Collector

With MVCC, PostgreSQL creates new row versions on every UPDATE. Old versions ("dead tuples") need to be cleaned up — that's what VACUUM does.

Experiment

CREATE TABLE t_test (id int) WITH (autovacuum_enabled = off);
INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
SELECT pg_size_pretty(pg_relation_size('t_test'));
-- ~3.5 MB

Let's update all the data:

UPDATE t_test SET id = id + 1;
SELECT pg_size_pretty(pg_relation_size('t_test'));
-- ~7 MB (doubled!)

PostgreSQL marked the old rows as deleted and inserted new ones. The size doubled.

VACUUM t_test;
SELECT pg_size_pretty(pg_relation_size('t_test'));
-- Still ~7 MB!

VACUUM doesn't shrink the table — it marks space as available for reuse. On subsequent INSERT/UPDATE operations, this space will be occupied without growing the file.
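You can watch the dead tuples directly in pg_stat_user_tables: after the UPDATE above, n_dead_tup should be on the order of 100,000, and VACUUM brings it back to zero even though the file size stays put:

```sql
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 't_test';
```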

When the Size Actually Shrinks

DELETE FROM t_test WHERE id > 50000;
VACUUM t_test;
SELECT pg_size_pretty(pg_relation_size('t_test'));
-- ~3.5 MB — halved!

We deleted the upper half of the data, and the file size actually decreased. The thing is, VACUUM returns disk space only if the freed pages are at the end of the file. We deleted rows with large ids that physically reside at the end — so VACUUM was able to "truncate the tail." If we had deleted rows from the middle of the table, the file size would remain the same, and the space would simply be marked as available for reuse.
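If you do need the space back regardless of where the holes are, VACUUM FULL rewrites the table into a new, compact file, at the price of the ACCESS EXCLUSIVE lock from the table earlier:

```sql
-- Rewrites the whole table; blocks ALL access, including SELECT,
-- and needs enough free disk space for the new copy
VACUUM FULL t_test;
SELECT pg_size_pretty(pg_relation_size('t_test'));
```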

What Changed in PostgreSQL Over 5 Years

The core concepts of transactions and locks haven't changed — everything described above still works the same way. But several things have improved:

  • VACUUM got significantly smarter. PG 13 introduced parallel index processing during VACUUM. PG 16 added skipping pages that haven't changed since the last pass, which dramatically sped up operations on large tables. Disabling autovacuum "to run it manually at night" is more of an anti-pattern in 2026.
  • REINDEX CONCURRENTLY (PG 12) — index rebuilding without locking the table. Previously, you had to create a new index and drop the old one.
  • Lock monitoring became more convenient. pg_stat_activity and pg_locks gained new fields, and the pg_wait_sampling extension allows collecting wait statistics.
  • Advisory locks still only work within a single database. For distributed systems with multiple PostgreSQL instances, you need external solutions (Redis, etcd).

In Part 2, we'll cover indexes: how PostgreSQL chooses an execution plan, why the cost in EXPLAIN is measured in abstract units rather than milliseconds, and when an index actually hurts performance.
