What Is Table Bloat?
Table bloat in PostgreSQL refers to the phenomenon where "dead tuples" generated by UPDATE or DELETE operations remain uncollected by VACUUM, causing data files to grow unnecessarily large.
For VACUUM to reclaim dead tuples, it must be guaranteed that those tuples "cannot possibly be referenced by any currently running transaction." If an old transaction lingers for any reason, VACUUM's garbage collection cannot proceed past that point.
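Before looking at individual causes, a quick way to see whether dead tuples are accumulating at all is the statistics view pg_stat_user_tables (a simple check; n_dead_tup is an estimate):

-- Tables with the most dead tuples, plus when they were last vacuumed
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;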
For each of the following four causes of table bloat, this article explains how it manifests, how to identify the root cause, and how to resolve it.
- Long-running transactions
- Uncommitted prepared transactions
- Queries on standby servers with hot_standby_feedback enabled
- Logical replication lag
Test Environment
- PostgreSQL 19dev (commit 34740b90bc123d645a3a71231b765b778bdcf049)
Long-Running Transactions
This is probably the most familiar cause. Whether active or idle, a long-running transaction prevents VACUUM from reclaiming dead tuples generated by UPDATE or DELETE operations that occurred after the transaction started. This is because the long-running transaction might need to read the pre-update versions of those tuples.
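In practice you rarely know the offending transaction ID up front. One way to look for candidates is to list the oldest open transactions in pg_stat_activity (a simple sketch; adjust the limit and columns as needed):

-- Oldest open transactions first; idle-in-transaction sessions show up here too
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       backend_xid, backend_xmin,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;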
Setup
In Terminal 1, start a transaction and obtain a transaction ID.
Terminal 1:
=# BEGIN;
BEGIN
=*# SELECT txid_current();
txid_current
--------------
782
(1 row)
In a separate Terminal 2, delete data and run VACUUM.
Terminal 2:
=# DELETE FROM t;
DELETE 100
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 100 remain, 100 are dead but not yet removable
removable cutoff: 782, which was 2 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 15 hits, 0 reads, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes, 0 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 0 removed, 100 remain, 100 are dead but not yet removable indicates that dead tuples were not reclaimed.
Identifying the cause
The message removable cutoff: 782, which was 2 XIDs old when operation ended suggests that transaction ID 782 is the culprit. Check the pg_stat_activity view.
Terminal 2:
=# SELECT * FROM pg_stat_activity WHERE backend_xid = 782;
-[ RECORD 1 ]----+------------------------------
datid | 5
datname | postgres
pid | 94076
leader_pid | [NULL]
usesysid | 10
usename | shinya
application_name | psql
client_addr | [NULL]
client_hostname | [NULL]
client_port | -1
backend_start | 2026-01-19 13:41:30.049678+09
xact_start | 2026-01-19 13:54:38.856466+09
query_start | 2026-01-19 13:54:43.501664+09
state_change | 2026-01-19 13:54:43.50271+09
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid | 782
backend_xmin | [NULL]
query_id | [NULL]
query | SELECT txid_current();
backend_type | client backend
The backend process with PID 94076 is in an "idle in transaction" state, which prevented VACUUM from reclaiming the dead tuples.
Resolution
Terminate the backend process with PID 94076.
Terminal 2:
=# SELECT pg_terminate_backend(94076);
pg_terminate_backend
----------------------
t
(1 row)
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: table "t": truncated 1 to 0 pages
LOCATION: lazy_truncate_heap, vacuumlazy.c:3401
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 1 removed, 0 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 100 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 792, which was 1 XIDs old when operation ended
new relfrozenxid: 792, which is 3 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 1 pages set all-visible, 1 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 7.796 MB/s
buffer usage: 20 hits, 0 reads, 6 dirtied
WAL usage: 8 records, 6 full page images, 34103 bytes, 33628 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 100 removed, 0 remain, 0 are dead but not yet removable confirms that all dead tuples have been reclaimed.
Note: Always verify that it is safe to terminate the backend process before doing so.
Uncommitted Prepared Transactions
This can occur when applications or distributed database middleware use two-phase commit (2PC). Uncommitted prepared transactions persist in the database even after the session is disconnected, preventing VACUUM from reclaiming dead tuples.
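Because prepared transactions survive session disconnects (and even server restarts), it is worth checking for old ones periodically; for example:

-- Prepared transactions, oldest first
SELECT gid, prepared, owner, database,
       now() - prepared AS age
FROM pg_prepared_xacts
ORDER BY prepared;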
Setup
To use prepared transactions, set max_prepared_transactions to 1 or higher and restart PostgreSQL.
postgresql.conf:
max_prepared_transactions = 10
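After the restart, you can confirm that the setting is in effect:

SHOW max_prepared_transactions;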
Start a transaction and prepare it for 2PC.
=# BEGIN;
BEGIN
=*# PREPARE TRANSACTION 'foobar';
PREPARE TRANSACTION
Delete data and run VACUUM.
=# DELETE FROM t;
DELETE 100
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 100 remain, 100 are dead but not yet removable
removable cutoff: 796, which was 2 XIDs old when operation ended
new relfrozenxid: 793, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 14.974 MB/s, avg write rate: 14.974 MB/s
buffer usage: 13 hits, 4 reads, 4 dirtied
WAL usage: 4 records, 4 full page images, 25678 bytes, 25412 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 0 removed, 100 remain, 100 are dead but not yet removable indicates that dead tuples were not reclaimed.
Identifying the cause
The message removable cutoff: 796, which was 2 XIDs old when operation ended suggests that transaction ID 796 is the culprit. Check the pg_prepared_xacts view.
=# SELECT * FROM pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+--------+-------------------------------+--------+----------
796 | foobar | 2026-01-19 14:49:30.089288+09 | shinya | postgres
(1 row)
The prepared transaction with global transaction ID "foobar" prevented VACUUM from reclaiming the dead tuples.
Resolution
Either commit or roll back the prepared transaction with global transaction ID "foobar" using COMMIT PREPARED or ROLLBACK PREPARED.
=# COMMIT PREPARED 'foobar';
COMMIT PREPARED
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: table "t": truncated 1 to 0 pages
LOCATION: lazy_truncate_heap, vacuumlazy.c:3401
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 1 removed, 0 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 100 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 798, which was 1 XIDs old when operation ended
new relfrozenxid: 798, which is 5 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 1 pages set all-visible, 1 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 9.778 MB/s
buffer usage: 20 hits, 0 reads, 6 dirtied
WAL usage: 8 records, 6 full page images, 34103 bytes, 33628 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 100 removed, 0 remain, 0 are dead but not yet removable confirms that all dead tuples have been reclaimed.
Queries on Standby with hot_standby_feedback Enabled
In a streaming replication setup with hot_standby_feedback enabled, queries on the standby server can prevent VACUUM from reclaiming dead tuples on the primary. This is by design: hot_standby_feedback exists precisely to keep VACUUM on the primary from removing tuples that standby queries may still need, thereby avoiding query conflicts on the standby.
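On the primary, you can check whether any standby is currently holding back the xmin horizon through feedback; for example:

-- Standbys reporting an xmin via hot_standby_feedback
SELECT application_name, client_addr, state,
       backend_xmin, age(backend_xmin) AS xmin_age
FROM pg_stat_replication
WHERE backend_xmin IS NOT NULL;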
Setup
Set up a streaming replication environment, enable hot_standby_feedback on the standby, and reduce wal_receiver_status_interval to increase the frequency of feedback from the standby to the primary.
postgresql.conf:
hot_standby_feedback = on
wal_receiver_status_interval = 1s
On the standby, start a transaction and execute a query.
Standby:
=# BEGIN;
BEGIN
=*# SELECT pg_sleep(300);
pg_sleep
----------
(1 row)
Note: We use pg_sleep() to prevent the snapshot from being released as soon as the query completes. You could also set the transaction isolation level to REPEATABLE READ to keep the snapshot for the duration of the transaction.
On the primary, delete data and run VACUUM.
Primary:
=# DELETE FROM t;
DELETE 100
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 100 remain, 100 are dead but not yet removable
removable cutoff: 831, which was 1 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 12 hits, 0 reads, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes, 0 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 0 removed, 100 remain, 100 are dead but not yet removable indicates that dead tuples were not reclaimed.
Identifying the cause
The message removable cutoff: 831, which was 1 XIDs old when operation ended suggests that transaction ID 831 is the culprit. Check the pg_stat_replication view.
Primary:
=# SELECT * FROM pg_stat_replication WHERE backend_xmin = 831;
-[ RECORD 1 ]----+------------------------------
pid | 171632
usesysid | 10
usename | shinya
application_name | walreceiver
client_addr | [NULL]
client_hostname | [NULL]
client_port | -1
backend_start | 2026-01-19 16:32:24.33927+09
backend_xmin | 831
state | streaming
sent_lsn | 0/03081178
write_lsn | 0/03081178
flush_lsn | 0/03081178
replay_lsn | 0/03081178
write_lag | [NULL]
flush_lag | [NULL]
replay_lag | [NULL]
sync_priority | 0
sync_state | async
reply_time | 2026-01-19 17:29:08.063083+09
Transaction ID 831 is being reported via hot_standby_feedback.
Next, check the pg_stat_activity view on the standby.
Standby:
=# SELECT * FROM pg_stat_activity WHERE backend_xmin = 831;
-[ RECORD 1 ]----+------------------------------
datid | 5
datname | postgres
pid | 197146
leader_pid | [NULL]
usesysid | 10
usename | shinya
application_name | psql
client_addr | [NULL]
client_hostname | [NULL]
client_port | -1
backend_start | 2026-01-19 17:25:19.931946+09
xact_start | 2026-01-19 17:28:33.640921+09
query_start | 2026-01-19 17:28:36.617061+09
state_change | 2026-01-19 17:28:36.617067+09
wait_event_type | Timeout
wait_event | PgSleep
state | active
backend_xid | [NULL]
backend_xmin | 831
query_id | [NULL]
query | SELECT pg_sleep(300);
backend_type | client backend
The query SELECT pg_sleep(300) running on the standby backend process with PID 197146 is the cause.
Resolution
Cancel the query running on the standby backend process (PID 197146).
Standby:
=# SELECT pg_cancel_backend(197146);
pg_cancel_backend
-------------------
t
(1 row)
Primary:
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: table "t": truncated 1 to 0 pages
LOCATION: lazy_truncate_heap, vacuumlazy.c:3401
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 1 removed, 0 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 100 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 792, which was 1 XIDs old when operation ended
new relfrozenxid: 792, which is 3 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 1 pages set all-visible, 1 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 7.796 MB/s
buffer usage: 20 hits, 0 reads, 6 dirtied
WAL usage: 8 records, 6 full page images, 34103 bytes, 33628 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 100 removed, 0 remain, 0 are dead but not yet removable confirms that all dead tuples have been reclaimed.
Logical Replication Lag
Logical replication reads WAL records, decodes them, and sends them to subscribers. A key point is that decoding requires the system catalog state as it existed at the time the WAL record was generated.
For example, when decoding WAL for a table that had a column dropped at some point, the decoder must reference the table definition from before the column was dropped, not the current definition, in order to decode correctly. The system catalog rows describing that earlier state, including dead tuples, must therefore be retained until the logical replication slot has finished processing the corresponding WAL.
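A slot's catalog_xmin is what holds back VACUUM on the system catalogs, so a useful routine check is how far behind each slot is (a sketch; the LSN subtraction shown here requires a reasonably recent PostgreSQL version):

-- Per-slot horizon and WAL retention; run on the primary
SELECT slot_name, slot_type, active, catalog_xmin,
       age(catalog_xmin) AS catalog_xmin_age,
       pg_current_wal_lsn() - restart_lsn AS restart_lag_bytes
FROM pg_replication_slots;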
Setup
Create a logical replication slot.
=# SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(test_slot,0/017BBC00)
(1 row)
Drop an existing table and run VACUUM on the pg_class catalog.
=# DROP TABLE t;
DROP TABLE
=# VACUUM (VERBOSE) pg_class;
INFO: 00000: vacuuming "postgres.pg_catalog.pg_class"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.pg_catalog.pg_class": index scans: 0
pages: 0 removed, 14 remain, 14 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 418 remain, 1 are dead but not yet removable
removable cutoff: 780, which was 1 XIDs old when operation ended
new relfrozenxid: 779, which is 19 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 16.421 MB/s
buffer usage: 45 hits, 0 reads, 4 dirtied
WAL usage: 4 records, 4 full page images, 32658 bytes, 32392 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 0 removed, 418 remain, 1 are dead but not yet removable indicates that a dead tuple was not reclaimed.
Identifying the cause
The message removable cutoff: 780, which was 1 XIDs old when operation ended suggests that transaction ID 780 is the culprit. Check the pg_replication_slots view.
=# SELECT * FROM pg_replication_slots WHERE catalog_xmin = 780;
-[ RECORD 1 ]--------+------------------------------
slot_name | test_slot
plugin | test_decoding
slot_type | logical
datoid | 5
database | postgres
temporary | f
active | f
active_pid | [NULL]
xmin | [NULL]
catalog_xmin | 780
restart_lsn | 0/017BBBC8
confirmed_flush_lsn | 0/017BBC00
wal_status | reserved
safe_wal_size | [NULL]
two_phase | f
two_phase_at | [NULL]
inactive_since | 2026-01-19 20:09:07.152103+09
conflicting | f
invalidation_reason | [NULL]
failover | f
synced | f
slotsync_skip_reason | [NULL]
This logical replication slot requires transaction ID 780 as its oldest transaction affecting the system catalogs.
Resolution
Since this logical replication slot is not in use, drop it.
=# SELECT pg_drop_replication_slot('test_slot');
pg_drop_replication_slot
--------------------------
(1 row)
=# VACUUM (VERBOSE) pg_class;
INFO: 00000: vacuuming "postgres.pg_catalog.pg_class"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.pg_catalog.pg_class": index scans: 1
pages: 0 removed, 14 remain, 14 scanned (100.00% of total), 0 eagerly scanned
tuples: 1 removed, 417 remain, 0 are dead but not yet removable
removable cutoff: 781, which was 0 XIDs old when operation ended
new relfrozenxid: 781, which is 2 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 1 pages set all-visible, 1 pages set all-frozen (0 were all-visible)
index scan needed: 1 pages from table (7.14% of total) had 1 dead item identifiers removed
index "pg_class_oid_index": pages: 4 in total, 0 newly deleted, 0 currently deleted, 0 reusable
index "pg_class_relname_nsp_index": pages: 6 in total, 0 newly deleted, 0 currently deleted, 0 reusable
index "pg_class_tblspc_relfilenode_index": pages: 2 in total, 0 newly deleted, 0 currently deleted, 0 reusable
avg read rate: 7.032 MB/s, avg write rate: 63.288 MB/s
buffer usage: 62 hits, 1 reads, 9 dirtied
WAL usage: 9 records, 9 full page images, 55417 bytes, 54864 full page image bytes, 0 buffers full
memory usage: dead item storage 0.03 MB accumulated across 1 reset (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: heap_vacuum_rel, vacuumlazy.c:1199
VACUUM
The message tuples: 1 removed, 417 remain, 0 are dead but not yet removable confirms that all dead tuples have been reclaimed.
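If the slot were still in use, dropping it would not be an option: the consumer would have to catch up instead, or, for a slot you control and whose pending changes you can afford to skip, you could move it forward explicitly. A hypothetical example using pg_replication_slot_advance (this discards the changes the slot has not yet streamed, so use it only when that is acceptable):

-- Advance an idle logical slot to the current WAL position
SELECT pg_replication_slot_advance('test_slot', pg_current_wal_lsn());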
Ongoing Community Discussions
This article has explained the causes and solutions for table bloat.
Since we intentionally created these situations to demonstrate the solutions, they may not have seemed particularly difficult to handle. However, resolving these issues in practice requires significant PostgreSQL operational knowledge and involves checking multiple system views, which can be quite challenging.
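One way to reduce the guesswork is to combine the four views covered above into a single query; a rough sketch (adapt the columns and filters to your environment):

-- What is currently holding back the xmin horizon, and how old is it?
SELECT 'running/idle transaction' AS cause, pid::text AS detail,
       age(coalesce(backend_xmin, backend_xid)) AS xid_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL OR backend_xid IS NOT NULL
UNION ALL
SELECT 'prepared transaction', gid, age(p.transaction)
FROM pg_prepared_xacts p
UNION ALL
SELECT 'standby feedback', pid::text, age(backend_xmin)
FROM pg_stat_replication
WHERE backend_xmin IS NOT NULL
UNION ALL
SELECT 'replication slot', slot_name, age(coalesce(xmin, catalog_xmin))
FROM pg_replication_slots
WHERE xmin IS NOT NULL OR catalog_xmin IS NOT NULL
ORDER BY xid_age DESC;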
Furthermore, when table bloat is accompanied by an impending transaction ID wraparound crisis, executing these procedures calmly and without operational errors is stressful; ideally, we would avoid such situations altogether.
Currently, I am proposing a feature to the PostgreSQL community that would output the reason why dead tuples could not be reclaimed in the VACUUM log. It does not seem close to being committed yet, but once it is, I plan to update this blog post.
Report oldest xmin source when autovacuum cannot remove tuples
There have also been past discussions about displaying the reason why dead tuples could not be reclaimed in a system view, but this has not been implemented yet either. I would like to work on this as well.
Proposal: Expose oldest xmin as SQL function for monitoring
Summary
This article explained the causes of table bloat in PostgreSQL and how to address them. I hope you found it helpful.
Top comments (2)
Great blog and very useful insights! It would be even more helpful to include one more point about physical replication lag as another potential cause.
Thank you for your comment! Actually, physical replication lag itself doesn't cause table bloat on the primary. What matters is hot_standby_feedback combined with long-running queries on the standby, which is already covered in the article.