PostgreSQL 🐘 is a robust database, but its MVCC (Multi-Version Concurrency Control) design can sometimes lead to a subtle issue: table and index bloat. If left unchecked, bloat wastes storage, increases I/O, and can slow queries down.
In this post, I’ll walk through:
- What bloat is and why it happens
- Different ways to measure it
- How to keep an eye on it without killing your database
- A few habits to reduce or fix it
What is bloat? 🧹
- Table bloat : Every update creates a new row version, and the old one becomes dead. Those dead tuples sit there until vacuum cleans them.
- Index bloat : Even when dead tuples are gone from the table, their index entries may stick around until an index vacuum or reindex.
Why care?
- Bigger indexes mean slower lookups. Bigger indexes = slower index scans (more pages to read).
- Larger tables mean more I/O. Bigger tables = more disk I/O and less cache efficiency.
- Wasted pages reduce cache efficiency. More bloat = more work for autovacuum.
A little is fine. A lot can hurt.
Ways to measure bloat 📐
1. ioguix scripts (detailed but heavy)
The ioguixpgsql-bloat-estimation
repo has two well-known queries:
They’re pretty accurate, but on a big database they can be very slow and eating too much resource.
2. Maxim Boguk’s formula (fast heuristic)
Described in pg_index_pilot. It’s not exact, but it’s fast enough to use in monitoring.
Formula:
bloat_indicator = index_size / reltuples
Here’s SQL you can run right away (no extensions needed):
SELECT
i.schemaname,
c.relname AS table_name,
i.indexrelname AS index_name,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
c.reltuples::bigint AS row_estimate,
CASE
WHEN c.reltuples > 0
THEN pg_relation_size(i.indexrelid) / c.reltuples
ELSE NULL
END AS bytes_per_row
FROM pg_stat_all_indexes i
JOIN pg_class c ON c.oid = i.relid
ORDER BY bytes_per_row DESC NULLS LAST
LIMIT 20;
-
bytes_per_row
acts as the Boguk bloat indicator. - The higher it is compared to the average row size, the more bloated the index might be.
- This is not exact, but it’s fast and lightweight.
Boguk’s bloat indicator is not measured in bytes or per cents. It is to be used in relative scenario: first, we measure the “ideal” value – the value of freshly built index. And then, we observe how the value changes over time – if it significantly bigger than the “ideal” one, it is time to reindex.
3. pgstattuple (precise but slower)
For deeper inspection, the pgstattuple
extension can scan indexes or tables directly:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('my_index');
It’s more precise, but scans every page, requires superuser and scans the whole object (can be slow).
Monitoring bloat over time 📊
A couple of lightweight checks go a long way:
- Watch
pg_stat_all_tables.n_dead_tup
for dead tuples. - Track index sizes with
pg_relation_size()
. - Use the Boguk formula as a baseline trend.
- Occasionally validate with
pgstattuple
.
There are tools too:
Preventing and fixing bloat 🛠️
From experience, a mix of these helps:
- Autovacuum tuning : Lower thresholds on busy tables so cleanup kicks in earlier.
- Fillfactor : For frequently updated tables, setting a lower fillfactor (say 80–90%) leaves room for HOT updates.
-
Reindexing : Use
REINDEX CONCURRENTLY
(Postgres ≥ 12) to shrink indexes without downtime. - Partitioning : Breaking a giant write-heavy table into partitions can keep bloat local.
-
Drop unused indexes : If
pg_stat_user_indexes.idx_scan = 0
for months, it’s a candidate for removal. Can easily win 40%.
How much does it matter? (🚦 rule of thumb)
- <20–30% bloat → probably fine, not urgent.
- > 50% bloat on large tables/indexes → worth attention, because performance will likely degrade.
The key is to watch trends, not just snapshots. A sudden growth spike is often more telling than a single measurement.
Final thoughts
- Use Boguk’s formula for fast monitoring.
- Use pgstattuple or ioguix scripts for precise checks.
- Prevent bloat with autovacuum tuning , reindexing , and dropping unused indexes.
Top comments (0)