What Is Table Bloat?
Table bloat in PostgreSQL refers to the phenomenon where "dead tuples" generated by UPDATE or DELETE operations remain uncollected by VACUUM, causing data files to grow unnecessarily large.
For VACUUM to reclaim dead tuples, it must be guaranteed that those tuples "cannot possibly be referenced by any currently running transaction." If an old transaction lingers for any reason, VACUUM's garbage collection cannot proceed past that point.
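Before looking at individual causes, a quick way to see whether dead tuples are accumulating at all is the statistics view pg_stat_user_tables (a simple check; n_dead_tup is an estimate):

-- Tables with the most dead tuples, plus when they were last vacuumed
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;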
For each of the following four causes of table bloat, this article explains how it manifests, how to identify the root cause, and how to resolve it.
- Long-running transactions
- Uncommitted prepared transactions
- Queries on standby servers with hot_standby_feedback enabled
- Logical replication lag
Test Environment
- PostgreSQL 19dev (commit 34740b90bc123d645a3a71231b765b778bdcf049)
Long-Running Transactions
This is probably the most familiar cause. Whether active or idle, a long-running transaction prevents VACUUM from reclaiming dead tuples generated by UPDATE or DELETE operations that occurred after the transaction started. This is because the long-running transaction might need to read the pre-update versions of those tuples.
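In practice you rarely know the offending transaction ID up front. One way to look for candidates is to list the oldest open transactions in pg_stat_activity (a simple sketch; adjust the limit and columns as needed):

-- Oldest open transactions first; idle-in-transaction sessions show up here too
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       backend_xid, backend_xmin,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;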
Setup
In Terminal 1, start a transaction and obtain a transaction ID.
Terminal 1:
=# BEGIN;
BEGIN
=*# SELECT txid_current();
txid_current
--------------
782
(1 row)
In a separate Terminal 2, delete data and run VACUUM.
Terminal 2:
=# DELETE FROM t;
DELETE 100
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 100 remain, 100 are dead but not yet removable
removable cutoff: 782, which was 2 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 15 hits, 0 reads, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes, 0 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 0 removed, 100 remain, 100 are dead but not yet removable indicates that dead tuples were not reclaimed.
Identifying the cause
The message removable cutoff: 782, which was 2 XIDs old when operation ended suggests that transaction ID 782 is the culprit. Check the pg_stat_activity view.
Terminal 2:
=# SELECT * FROM pg_stat_activity WHERE backend_xid = 782;
-[ RECORD 1 ]----+------------------------------
datid | 5
datname | postgres
pid | 94076
leader_pid | [NULL]
usesysid | 10
usename | shinya
application_name | psql
client_addr | [NULL]
client_hostname | [NULL]
client_port | -1
backend_start | 2026-01-19 13:41:30.049678+09
xact_start | 2026-01-19 13:54:38.856466+09
query_start | 2026-01-19 13:54:43.501664+09
state_change | 2026-01-19 13:54:43.50271+09
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid | 782
backend_xmin | [NULL]
query_id | [NULL]
query | SELECT txid_current();
backend_type | client backend
The backend process with PID 94076 is in an "idle in transaction" state, which prevented VACUUM from reclaiming the dead tuples.
Resolution
Terminate the backend process with PID 94076.
Terminal 2:
=# SELECT pg_terminate_backend(94076);
pg_terminate_backend
----------------------
t
(1 row)
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: table "t": truncated 1 to 0 pages
LOCATION: lazy_truncate_heap, vacuumlazy.c:3401
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 1 removed, 0 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 100 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 792, which was 1 XIDs old when operation ended
new relfrozenxid: 792, which is 3 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 1 pages set all-visible, 1 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 7.796 MB/s
buffer usage: 20 hits, 0 reads, 6 dirtied
WAL usage: 8 records, 6 full page images, 34103 bytes, 33628 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 100 removed, 0 remain, 0 are dead but not yet removable confirms that all dead tuples have been reclaimed.
Note: Always verify that it is safe to terminate the backend process before doing so.
Uncommitted Prepared Transactions
This can occur when applications or distributed database middleware use two-phase commit (2PC). Uncommitted prepared transactions persist in the database even after the session is disconnected, preventing VACUUM from reclaiming dead tuples.
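Because prepared transactions survive session disconnects (and even server restarts), it is worth checking for old ones periodically; for example:

-- Prepared transactions, oldest first
SELECT gid, prepared, owner, database,
       now() - prepared AS age
FROM pg_prepared_xacts
ORDER BY prepared;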
Setup
To use prepared transactions, set max_prepared_transactions to 1 or higher and restart PostgreSQL.
postgresql.conf:
max_prepared_transactions = 10
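After the restart, you can confirm that the setting is in effect:

SHOW max_prepared_transactions;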
Start a transaction and prepare it for 2PC.
=# BEGIN;
BEGIN
=*# PREPARE TRANSACTION 'foobar';
PREPARE TRANSACTION
Delete data and run VACUUM.
=# DELETE FROM t;
DELETE 100
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 100 remain, 100 are dead but not yet removable
removable cutoff: 796, which was 2 XIDs old when operation ended
new relfrozenxid: 793, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 14.974 MB/s, avg write rate: 14.974 MB/s
buffer usage: 13 hits, 4 reads, 4 dirtied
WAL usage: 4 records, 4 full page images, 25678 bytes, 25412 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 0 removed, 100 remain, 100 are dead but not yet removable indicates that dead tuples were not reclaimed.
Identifying the cause
The message removable cutoff: 796, which was 2 XIDs old when operation ended suggests that transaction ID 796 is the culprit. Check the pg_prepared_xacts view.
=# SELECT * FROM pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+--------+-------------------------------+--------+----------
796 | foobar | 2026-01-19 14:49:30.089288+09 | shinya | postgres
(1 row)
The prepared transaction with global transaction ID "foobar" prevented VACUUM from reclaiming the dead tuples.
Resolution
Either commit or roll back the prepared transaction with global transaction ID "foobar" using COMMIT PREPARED or ROLLBACK PREPARED.
=# COMMIT PREPARED 'foobar';
COMMIT PREPARED
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: table "t": truncated 1 to 0 pages
LOCATION: lazy_truncate_heap, vacuumlazy.c:3401
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 1 removed, 0 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 100 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 798, which was 1 XIDs old when operation ended
new relfrozenxid: 798, which is 5 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 1 pages set all-visible, 1 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 9.778 MB/s
buffer usage: 20 hits, 0 reads, 6 dirtied
WAL usage: 8 records, 6 full page images, 34103 bytes, 33628 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 100 removed, 0 remain, 0 are dead but not yet removable confirms that all dead tuples have been reclaimed.
Queries on Standby with hot_standby_feedback Enabled
In a streaming replication setup with hot_standby_feedback enabled, queries on the standby server can prevent VACUUM from reclaiming dead tuples on the primary. This is by design: hot_standby_feedback exists precisely to keep VACUUM on the primary from removing tuples that standby queries may still need, thereby avoiding query conflicts on the standby.
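On the primary, you can check whether any standby is currently holding back the xmin horizon through feedback; for example:

-- Standbys reporting an xmin via hot_standby_feedback
SELECT application_name, client_addr, state,
       backend_xmin, age(backend_xmin) AS xmin_age
FROM pg_stat_replication
WHERE backend_xmin IS NOT NULL;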
Setup
Set up a streaming replication environment, enable hot_standby_feedback on the standby, and reduce wal_receiver_status_interval to increase the frequency of feedback from the standby to the primary.
postgresql.conf:
hot_standby_feedback = on
wal_receiver_status_interval = 1s
On the standby, start a transaction and execute a query.
Standby:
=# BEGIN;
BEGIN
=*# SELECT pg_sleep(300);
pg_sleep
----------
(1 row)
Note: We use pg_sleep() to prevent the snapshot from being released as soon as the query completes. You could also set the transaction isolation level to REPEATABLE READ to keep the snapshot for the duration of the transaction.
On the primary, delete data and run VACUUM.
Primary:
=# DELETE FROM t;
DELETE 100
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 100 remain, 100 are dead but not yet removable
removable cutoff: 831, which was 1 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 12 hits, 0 reads, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes, 0 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 0 removed, 100 remain, 100 are dead but not yet removable indicates that dead tuples were not reclaimed.
Identifying the cause
The message removable cutoff: 831, which was 1 XIDs old when operation ended suggests that transaction ID 831 is the culprit. Check the pg_stat_replication view.
Primary:
=# SELECT * FROM pg_stat_replication WHERE backend_xmin = 831;
-[ RECORD 1 ]----+------------------------------
pid | 171632
usesysid | 10
usename | shinya
application_name | walreceiver
client_addr | [NULL]
client_hostname | [NULL]
client_port | -1
backend_start | 2026-01-19 16:32:24.33927+09
backend_xmin | 831
state | streaming
sent_lsn | 0/03081178
write_lsn | 0/03081178
flush_lsn | 0/03081178
replay_lsn | 0/03081178
write_lag | [NULL]
flush_lag | [NULL]
replay_lag | [NULL]
sync_priority | 0
sync_state | async
reply_time | 2026-01-19 17:29:08.063083+09
Transaction ID 831 is being reported via hot_standby_feedback.
Next, check the pg_stat_activity view on the standby.
Standby:
=# SELECT * FROM pg_stat_activity WHERE backend_xmin = 831;
-[ RECORD 1 ]----+------------------------------
datid | 5
datname | postgres
pid | 197146
leader_pid | [NULL]
usesysid | 10
usename | shinya
application_name | psql
client_addr | [NULL]
client_hostname | [NULL]
client_port | -1
backend_start | 2026-01-19 17:25:19.931946+09
xact_start | 2026-01-19 17:28:33.640921+09
query_start | 2026-01-19 17:28:36.617061+09
state_change | 2026-01-19 17:28:36.617067+09
wait_event_type | Timeout
wait_event | PgSleep
state | active
backend_xid | [NULL]
backend_xmin | 831
query_id | [NULL]
query | SELECT pg_sleep(300);
backend_type | client backend
The query SELECT pg_sleep(300) running on the standby backend process with PID 197146 is the cause.
Resolution
Cancel the query running on the standby backend process (PID 197146).
Standby:
=# SELECT pg_cancel_backend(197146);
pg_cancel_backend
-------------------
t
(1 row)
Primary:
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: table "t": truncated 1 to 0 pages
LOCATION: lazy_truncate_heap, vacuumlazy.c:3401
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 1 removed, 0 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 100 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 792, which was 1 XIDs old when operation ended
new relfrozenxid: 792, which is 3 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 1 pages set all-visible, 1 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 7.796 MB/s
buffer usage: 20 hits, 0 reads, 6 dirtied
WAL usage: 8 records, 6 full page images, 34103 bytes, 33628 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 100 removed, 0 remain, 0 are dead but not yet removable confirms that all dead tuples have been reclaimed.
Logical Replication Lag
Logical replication reads WAL records, decodes them, and sends them to subscribers. A key point is that decoding requires the system catalog state as it existed at the time the WAL record was generated.
For example, when decoding WAL for a table that had a column dropped at some point, the decoder must reference the table definition from before the column was dropped, not the current definition, in order to decode correctly. The system catalog rows describing that earlier state, including dead tuples, must therefore be retained until the logical replication slot has finished processing the corresponding WAL.
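A slot's catalog_xmin is what holds back VACUUM on the system catalogs, so a useful routine check is how far behind each slot is (a sketch; the LSN subtraction shown here requires a reasonably recent PostgreSQL version):

-- Per-slot horizon and WAL retention; run on the primary
SELECT slot_name, slot_type, active, catalog_xmin,
       age(catalog_xmin) AS catalog_xmin_age,
       pg_current_wal_lsn() - restart_lsn AS restart_lag_bytes
FROM pg_replication_slots;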
Setup
Create a logical replication slot.
=# SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(test_slot,0/017BBC00)
(1 row)
Drop an existing table and run VACUUM on the pg_class catalog.
=# DROP TABLE t;
DROP TABLE
=# VACUUM (VERBOSE) pg_class;
INFO: 00000: vacuuming "postgres.pg_catalog.pg_class"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.pg_catalog.pg_class": index scans: 0
pages: 0 removed, 14 remain, 14 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 418 remain, 1 are dead but not yet removable
removable cutoff: 780, which was 1 XIDs old when operation ended
new relfrozenxid: 779, which is 19 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 16.421 MB/s
buffer usage: 45 hits, 0 reads, 4 dirtied
WAL usage: 4 records, 4 full page images, 32658 bytes, 32392 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 0 removed, 418 remain, 1 are dead but not yet removable indicates that a dead tuple was not reclaimed.
Identifying the cause
The message removable cutoff: 780, which was 1 XIDs old when operation ended suggests that transaction ID 780 is the culprit. Check the pg_replication_slots view.
=# SELECT * FROM pg_replication_slots WHERE catalog_xmin = 780;
-[ RECORD 1 ]--------+------------------------------
slot_name | test_slot
plugin | test_decoding
slot_type | logical
datoid | 5
database | postgres
temporary | f
active | f
active_pid | [NULL]
xmin | [NULL]
catalog_xmin | 780
restart_lsn | 0/017BBBC8
confirmed_flush_lsn | 0/017BBC00
wal_status | reserved
safe_wal_size | [NULL]
two_phase | f
two_phase_at | [NULL]
inactive_since | 2026-01-19 20:09:07.152103+09
conflicting | f
invalidation_reason | [NULL]
failover | f
synced | f
slotsync_skip_reason | [NULL]
This logical replication slot requires transaction ID 780 as its oldest transaction affecting the system catalogs.
Resolution
Since this logical replication slot is not in use, drop it.
=# SELECT pg_drop_replication_slot('test_slot');
pg_drop_replication_slot
--------------------------
(1 row)
=# VACUUM (VERBOSE) pg_class;
INFO: 00000: vacuuming "postgres.pg_catalog.pg_class"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.pg_catalog.pg_class": index scans: 1
pages: 0 removed, 14 remain, 14 scanned (100.00% of total), 0 eagerly scanned
tuples: 1 removed, 417 remain, 0 are dead but not yet removable
removable cutoff: 781, which was 0 XIDs old when operation ended
new relfrozenxid: 781, which is 2 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 1 pages set all-visible, 1 pages set all-frozen (0 were all-visible)
index scan needed: 1 pages from table (7.14% of total) had 1 dead item identifiers removed
index "pg_class_oid_index": pages: 4 in total, 0 newly deleted, 0 currently deleted, 0 reusable
index "pg_class_relname_nsp_index": pages: 6 in total, 0 newly deleted, 0 currently deleted, 0 reusable
index "pg_class_tblspc_relfilenode_index": pages: 2 in total, 0 newly deleted, 0 currently deleted, 0 reusable
avg read rate: 7.032 MB/s, avg write rate: 63.288 MB/s
buffer usage: 62 hits, 1 reads, 9 dirtied
WAL usage: 9 records, 9 full page images, 55417 bytes, 54864 full page image bytes, 0 buffers full
memory usage: dead item storage 0.03 MB accumulated across 1 reset (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 1 removed, 417 remain, 0 are dead but not yet removable confirms that all dead tuples have been reclaimed.
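If the slot were still in use, dropping it would not be an option: the consumer would have to catch up instead, or, for a slot you control and whose pending changes you can afford to skip, you could move it forward explicitly. A hypothetical example using pg_replication_slot_advance (this discards the changes the slot has not yet streamed, so use it only when that is acceptable):

-- Advance an idle logical slot to the current WAL position
SELECT pg_replication_slot_advance('test_slot', pg_current_wal_lsn());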
Ongoing Community Discussions
This article has explained the causes and solutions for table bloat.
Since we intentionally created these situations to demonstrate the solutions, they may not have seemed particularly difficult to handle. However, resolving these issues in practice requires significant PostgreSQL operational knowledge and involves checking multiple system views, which can be quite challenging.
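One way to reduce the guesswork is to combine the four views covered above into a single query; a rough sketch (adapt the columns and filters to your environment):

-- What is currently holding back the xmin horizon, and how old is it?
SELECT 'running/idle transaction' AS cause, pid::text AS detail,
       age(coalesce(backend_xmin, backend_xid)) AS xid_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL OR backend_xid IS NOT NULL
UNION ALL
SELECT 'prepared transaction', gid, age(p.transaction)
FROM pg_prepared_xacts p
UNION ALL
SELECT 'standby feedback', pid::text, age(backend_xmin)
FROM pg_stat_replication
WHERE backend_xmin IS NOT NULL
UNION ALL
SELECT 'replication slot', slot_name, age(coalesce(xmin, catalog_xmin))
FROM pg_replication_slots
WHERE xmin IS NOT NULL OR catalog_xmin IS NOT NULL
ORDER BY xid_age DESC;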
Furthermore, when table bloat is accompanied by an impending transaction ID wraparound crisis, executing these procedures calmly and without operational errors is stressful; ideally, we would avoid such situations altogether.
Currently, I am proposing a feature to the PostgreSQL community that would output the reason why dead tuples could not be reclaimed in the VACUUM log. It does not seem close to being committed yet, but once it is, I plan to update this blog post.
Report oldest xmin source when autovacuum cannot remove tuples
There have also been past discussions about displaying the reason why dead tuples could not be reclaimed in a system view, but this has not been implemented yet either. I would like to work on this as well.
Proposal: Expose oldest xmin as SQL function for monitoring
Summary
This article explained the causes of table bloat in PostgreSQL and how to address them. I hope you found it helpful.
Top comments (2)
Great blog and very useful insights! It would be even more helpful to include one more point about physical replication lag as another potential cause.
Thank you for your comment! Actually, physical replication lag itself doesn't cause table bloat on the primary. What matters is hot_standby_feedback combined with long-running queries on the standby, which is already covered in the article.