If you've ever had queries that suddenly hang, transactions that behave unexpectedly, or an UPDATE from two sessions that produces strange results — this article will help you understand what's happening inside PostgreSQL. We'll cover isolation levels, locks, VACUUM, and demonstrate everything with live SQL examples.
A few years ago I gave internal talks on PostgreSQL for my team. Then I spent a long time working with other technologies, and when I recently came back to PostgreSQL, the first thing I did was revisit my own notes to refresh the fundamentals. I figured if it was useful to me, it might be useful to others too. The core concepts of transactions and locks in PostgreSQL haven't changed, but a few things have gotten better over the past 5 years — more on that at the end.
now() vs clock_timestamp()
PostgreSQL has two functions for getting the current time: now() and clock_timestamp().
SELECT now(), now(), clock_timestamp(), clock_timestamp();
Both calls to now() will return the same time. But clock_timestamp() may differ — even within a single query.
To see the difference, let's open a transaction:
BEGIN;
SELECT now();
-- wait a moment...
SELECT now();
SELECT clock_timestamp();
COMMIT;
now() always returns the time the transaction started. This is useful when you need to set identical created_at and updated_at timestamps within a single operation — now() guarantees the same value throughout the entire transaction.
clock_timestamp() returns the actual wall-clock time, independent of the transaction.
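This is why column defaults like the following get identical timestamps for every row inserted in one transaction (the table and column names here are just an illustration, not from the article's examples):

```sql
-- both defaults call now(), so every row inserted within a single
-- transaction gets the same created_at and updated_at value
CREATE TABLE orders_demo (
    id         serial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
```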
Isolation Levels
The first thing that sets PostgreSQL apart from many other databases: only three isolation levels.
- Read Uncommitted / Read Committed — in PostgreSQL these are the same thing. Dirty reads simply don't exist.
- Repeatable Read
- Serializable
Another nice feature: PostgreSQL supports DDL inside transactions. You can create a table, insert data, and roll it all back within a single transaction — not every database can do this.
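A quick way to see this in practice (the table name is arbitrary):

```sql
BEGIN;
CREATE TABLE t_ddl_demo (id int);   -- DDL inside a transaction
INSERT INTO t_ddl_demo VALUES (1);
ROLLBACK;                           -- both the data and the table vanish

SELECT * FROM t_ddl_demo;
-- ERROR: relation "t_ddl_demo" does not exist
```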
Locks: Two UPDATEs on the Same Row
Let's create a table:
BEGIN;
CREATE TABLE t_test1 (id int);
INSERT INTO t_test1 VALUES (1);
INSERT INTO t_test1 VALUES (2);
COMMIT;
Now let's update the same data from two sessions simultaneously:
Session 1:
BEGIN ISOLATION LEVEL READ COMMITTED;
UPDATE t_test1 SET id = id + 1 RETURNING *;
-- don't commit yet
Session 2:
BEGIN ISOLATION LEVEL READ COMMITTED;
UPDATE t_test1 SET id = id + 1 RETURNING *;
-- hangs!
The second UPDATE waits for the first transaction to complete. Even though PostgreSQL uses MVCC and is often described as a non-blocking database, write locks do exist: a row being updated stays locked until the updating transaction commits or rolls back. Once Session 1 commits, Session 2's UPDATE resumes against the newly committed row versions, so each row is incremented twice and ends up as 3 and 4.
Lock Types
| # | Lock | When Acquired | Conflicts With |
|---|------|---------------|----------------|
| 1 | ACCESS SHARE | SELECT | ACCESS EXCLUSIVE |
| 2 | ROW SHARE | SELECT FOR UPDATE/SHARE | EXCLUSIVE, ACCESS EXCLUSIVE |
| 3 | ROW EXCLUSIVE | INSERT, UPDATE, DELETE | SHARE and above |
| 4 | SHARE UPDATE EXCLUSIVE | CREATE INDEX CONCURRENTLY, ANALYZE, VACUUM | SHARE UPDATE EXCLUSIVE and above |
| 5 | SHARE | CREATE INDEX | ROW EXCLUSIVE and above (but not SHARE itself) |
| 6 | SHARE ROW EXCLUSIVE | CREATE TRIGGER, some ALTER TABLE | ROW EXCLUSIVE and above |
| 7 | EXCLUSIVE | REFRESH MATERIALIZED VIEW CONCURRENTLY (allows only reads) | ROW SHARE and above |
| 8 | ACCESS EXCLUSIVE | DROP TABLE, ALTER TABLE, VACUUM FULL | All |
The higher the number, the stricter the lock. Two operations with conflicting locks cannot work on the same object simultaneously: one will wait.
Worth noting separately: the difference between CREATE INDEX (blocks inserts/updates/deletes) and CREATE INDEX CONCURRENTLY (builds the index slower but doesn't block DML operations).
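A sketch of the two variants on the table from the earlier example (index names are arbitrary):

```sql
-- takes a SHARE lock: concurrent SELECTs work, but INSERT/UPDATE/DELETE wait
CREATE INDEX idx_t_test1_id ON t_test1 (id);

-- takes SHARE UPDATE EXCLUSIVE: DML keeps working, but the build is slower
-- and, notably, cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_t_test1_id_2 ON t_test1 (id);
```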
Read Committed: Phantom and Non-Repeatable Reads
The default level is Read Committed — each statement within a transaction gets a fresh data snapshot.
Let's create a table for our examples:
BEGIN;
CREATE TABLE t_test2 (class int, value int);
INSERT INTO t_test2 VALUES (1, 10);
INSERT INTO t_test2 VALUES (1, 20);
INSERT INTO t_test2 VALUES (2, 100);
INSERT INTO t_test2 VALUES (2, 200);
COMMIT;
Phantom Reads
Session 1:
BEGIN ISOLATION LEVEL READ COMMITTED;
SELECT sum(value) FROM t_test2;
-- 330
Session 2 inserts data and commits:
INSERT INTO t_test2 VALUES (1, 30);
COMMIT;
Session 1:
SELECT sum(value) FROM t_test2;
-- 360! New rows appeared.
COMMIT;
Non-Repeatable Reads
Session 1:
BEGIN ISOLATION LEVEL READ COMMITTED;
SELECT value FROM t_test2 WHERE class = 1 AND value = 10;
-- Returns the row with value = 10
Session 2 updates data and commits:
UPDATE t_test2 SET value = 15 WHERE class = 1 AND value = 10;
COMMIT;
Session 1:
SELECT value FROM t_test2 WHERE class = 1 AND value = 10;
-- Empty! The row we just saw has changed.
COMMIT;
Same issue as with phantom reads, but here no new rows appear — existing ones change.
Repeatable Read
To avoid both anomalies, use Repeatable Read:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT sum(value) FROM t_test2;
-- 365
Meanwhile, another session inserts data and commits. We come back:
SELECT sum(value) FROM t_test2;
-- Still 365.
COMMIT;
The data snapshot is fixed at the time of the first query in the transaction and doesn't change.
Serializable: Why It Fails
The strictest level. It simulates sequential transaction execution.
Cross-Inserts
Session 1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(value) FROM t_test2 WHERE class = 1;
INSERT INTO t_test2 VALUES (2, 30);
COMMIT; -- OK
Session 2:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(value) FROM t_test2 WHERE class = 2;
INSERT INTO t_test2 VALUES (1, 30);
COMMIT; -- ERROR: could not serialize access
Whoever commits first wins. The second one gets a serialization error.
False Conflicts and Predicate Locks
Here's an interesting case. Two transactions work with different classes — there's no logical conflict:
Session 1: reads class=1, inserts into class=1
Session 2: reads class=2, inserts into class=2
You'd think everything should be fine. But on a small table — the second transaction fails.
The reason is predicate locks (SSI, Serializable Snapshot Isolation). PostgreSQL stores them at the page level, not at the individual row level. When the data is small and both classes land on the same page, a false conflict occurs. On a table with a million rows, where the data lives on different pages, everything works fine.
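You can watch these predicate locks from a third session while both serializable transactions are still open; they show up in pg_locks with mode SIReadLock:

```sql
SELECT locktype, relation::regclass, page, tuple
FROM pg_locks
WHERE mode = 'SIReadLock';
-- a locktype of 'page' with both classes on the same page explains the
-- false conflict; on a large table the rows would sit on different pages
```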
This isn't a bug — it's an implementation detail. The PostgreSQL documentation explicitly states: transactions at the Repeatable Read and Serializable levels must be prepared to retry.
Puzzle: Alice and Bob On Call
An on-call schedule table:
BEGIN;
CREATE TABLE d_test (name text, on_call bool);
INSERT INTO d_test VALUES ('Alice', true);
INSERT INTO d_test VALUES ('Bob', true);
INSERT INTO d_test VALUES ('Carol', false);
COMMIT;
The rule: at least one person must remain on call. Alice and Bob both want to leave at the same time.
Both sessions at Repeatable Read:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM d_test WHERE on_call = true;
-- 2 (safe to leave)
UPDATE d_test SET on_call = false WHERE name = 'Alice'; -- or 'Bob'
COMMIT;
Both transactions see two on-call people, both commit successfully. Result: nobody on call. A classic race condition.
Solutions
1. Serializable — the second transaction will fail with a serialization error and can be retried.
2. SELECT ... FOR UPDATE: lock the rows being checked. PostgreSQL rejects FOR UPDATE directly on an aggregate, so lock the rows in a subquery:
SELECT count(*) FROM (
    SELECT * FROM d_test WHERE on_call = true FOR UPDATE
) AS locked;
At Read Committed, the second session waits for the first to finish and then sees the up-to-date data. At Repeatable Read it instead fails with a serialization error once the first session commits, and must retry.
3. Atomic operation: combine the check and the action into a single query, locking the on-call rows so a concurrent session cannot slip past the check:
UPDATE d_test
SET on_call = false
WHERE name = 'Alice'
  AND (SELECT count(*)
       FROM (SELECT * FROM d_test WHERE on_call = true FOR UPDATE) AS s) > 1;
If only one person is on call at execution time, the UPDATE simply won't affect any rows. Note that without the FOR UPDATE this single statement would still be racy at Read Committed: each session's snapshot wouldn't see the other's uncommitted change, and both UPDATEs could succeed.
Advisory Locks
PostgreSQL lets you lock on an arbitrary number rather than a row or table:
BEGIN;
SELECT pg_advisory_lock(15);
-- do some work
COMMIT;
-- the lock is NOT released!
SELECT pg_advisory_unlock(15);
An important detail: pg_advisory_lock operates at the session level, not the transaction level. The lock persists until an explicit pg_advisory_unlock or until the connection is closed.
If you need a transaction-level lock, use pg_advisory_xact_lock — it's released automatically on COMMIT/ROLLBACK.
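For comparison, the transaction-scoped variant cleans up after itself:

```sql
BEGIN;
SELECT pg_advisory_xact_lock(15);
-- do some work
COMMIT;  -- the lock is released automatically, no unlock call needed
```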
VACUUM: The Garbage Collector
With MVCC, PostgreSQL creates new row versions on every UPDATE. Old versions ("dead tuples") need to be cleaned up — that's what VACUUM does.
Experiment
CREATE TABLE t_test (id int) WITH (autovacuum_enabled = off);
INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
SELECT pg_size_pretty(pg_relation_size('t_test'));
-- ~3.5 MB
Let's update all the data:
UPDATE t_test SET id = id + 1;
SELECT pg_size_pretty(pg_relation_size('t_test'));
-- ~7 MB (doubled!)
PostgreSQL marked the old rows as deleted and inserted new ones. The size doubled.
VACUUM t_test;
SELECT pg_size_pretty(pg_relation_size('t_test'));
-- Still ~7 MB!
VACUUM doesn't shrink the table — it marks space as available for reuse. On subsequent INSERT/UPDATE operations, this space will be occupied without growing the file.
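You can watch the dead tuples appear and get reclaimed through the statistics views (run this between the steps above; counts are approximate because the stats are sampled):

```sql
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 't_test';
-- after the UPDATE: roughly 100000 dead tuples; after VACUUM: back to 0
```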
When the Size Actually Shrinks
DELETE FROM t_test WHERE id > 50000;
VACUUM t_test;
SELECT pg_size_pretty(pg_relation_size('t_test'));
-- ~3.5 MB — halved!
We deleted the upper half of the data, and the file size actually decreased. The thing is, VACUUM returns disk space only if the freed pages are at the end of the file. We deleted rows with large ids that physically reside at the end — so VACUUM was able to "truncate the tail." If we had deleted rows from the middle of the table, the file size would remain the same, and the space would simply be marked as available for reuse.
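If you do need to return space from the middle of the table to the operating system, there is VACUUM FULL, but it rewrites the entire table under the strictest lock:

```sql
VACUUM FULL t_test;  -- compacts the table into a new file,
                     -- but holds ACCESS EXCLUSIVE for the whole duration
```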
What Changed in PostgreSQL Over 5 Years
The core concepts of transactions and locks haven't changed — everything described above still works the same way. But several things have improved:
- VACUUM got significantly smarter. PG 13 introduced parallel index processing during VACUUM. PG 16 added skipping pages that haven't changed since the last pass, which dramatically sped up operations on large tables. Disabling autovacuum "to run it manually at night" is more of an anti-pattern in 2026.
- REINDEX CONCURRENTLY (PG 12): index rebuilding without locking the table. Previously, you had to create a new index and drop the old one.
- Lock monitoring became more convenient: pg_stat_activity and pg_locks gained new fields, and the pg_wait_sampling extension allows collecting wait statistics.
- Advisory locks still only work within a single database. For distributed systems with multiple PostgreSQL instances, you need external solutions (Redis, etcd).
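Two monitoring queries I find handy; both use stock catalog views (pg_blocking_pids has existed since PG 9.6):

```sql
-- who is waiting on a lock right now
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

-- and which sessions are blocking them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```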
In Part 2, we'll cover indexes: how PostgreSQL chooses an execution plan, why the cost in EXPLAIN is just "parrots," and when an index actually hurts performance.