PostgreSQL XID Wraparound: What It Is and How to Prevent It
Every PostgreSQL transaction gets a 32-bit transaction ID. That's about 4.2 billion IDs before the counter wraps around. If your database gets close to that limit without properly "freezing" old rows, PostgreSQL will do something drastic: it stops accepting new transactions entirely. Your database goes read-only. Not slow, not degraded -- completely unable to write.
I want to walk through what's actually happening under the hood, how to detect it, and how to make sure it never happens to you.
What Are XIDs and Why Do They Wrap Around?
PostgreSQL uses transaction IDs to determine row visibility. When you run a query, PostgreSQL compares XIDs to figure out whether a particular row was committed before or after your transaction started. It uses modular arithmetic with a window of about 2 billion transactions in each direction.
The problem: 32 bits gives you ~4.2 billion IDs total, but the visibility window is only ~2 billion. When a database consumes around 2 billion XIDs without freezing old tuple headers, PostgreSQL cannot safely assign new ones. Rather than risk data corruption, it shuts down writes and tells you to vacuum manually.
This is a deliberate safety mechanism. PostgreSQL would rather stop your application than silently corrupt your data. The issue is that XID age increases invisibly -- there's no performance degradation and no errors, and PostgreSQL only starts logging warnings once you are already close to the hard limit.
How Freezing Prevents Wraparound
PostgreSQL's defense against wraparound is "freezing." When vacuum freezes a row, it marks the tuple header as visible to all future transactions regardless of XID. This effectively removes that row from the XID age calculation.
Autovacuum handles this automatically. When a table's XID age exceeds autovacuum_freeze_max_age (default: 200 million), autovacuum triggers an aggressive freeze vacuum on that table. Under normal conditions, this keeps XID age well below dangerous levels.
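Before trusting the defaults, it's worth checking what your server is actually running with. A minimal sketch, assuming you can read pg_settings; the list of parameter names is my own selection of the freeze-related ones:

```sql
-- Show the freeze-related settings currently in effect
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN (
    'autovacuum',
    'autovacuum_freeze_max_age',
    'autovacuum_max_workers',
    'vacuum_freeze_min_age',
    'vacuum_freeze_table_age'
)
ORDER BY name;
```

The context column tells you how a setting can be changed: a value of postmaster means a server restart is required, while sighup means a configuration reload is enough.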
But several things can break this safety net:
- Long-running transactions hold back the XID horizon. Vacuum cannot freeze any rows newer than the oldest active transaction's snapshot.
- Idle-in-transaction sessions are the most common culprit -- a connection that started a transaction and never committed or rolled back.
- Disabled autovacuum on specific tables (someone thought vacuum was using too much I/O).
- Worker starvation -- only 3 autovacuum workers by default, and large tables can monopolize them.
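The "disabled autovacuum" case from the list above is easy to check for. This is a rough sketch that lists every table (and TOAST table) carrying an explicit autovacuum_enabled storage option, so you can inspect the value yourself; it assumes the option appears in pg_class.reloptions in the form autovacuum_enabled=...:

```sql
-- Tables and TOAST tables with an explicit autovacuum_enabled option
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       c.reloptions
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 't')
  AND EXISTS (
      SELECT 1
      FROM unnest(c.reloptions) AS opt
      WHERE opt LIKE 'autovacuum_enabled=%'
  );
```

Note that autovacuum still runs anti-wraparound vacuums on tables where it has been disabled, but by then the table has already aged to autovacuum_freeze_max_age without any routine cleanup.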
Detecting the Problem
Database-Level XID Age
-- Check transaction ID age for all databases
SELECT
datname AS database_name,
age(datfrozenxid) AS xid_age,
round(100.0 * age(datfrozenxid) / 2147483647, 2) AS percent_to_wraparound,
datfrozenxid
FROM pg_database
WHERE datname NOT IN ('template0', 'template1')
ORDER BY age(datfrozenxid) DESC;
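Rules of thumb:
- Below 5% -- healthy
- 5-25% -- investigate (with default settings autovacuum does not force a freeze until 200 million, about 9.3%, so readings up to roughly 10% can be normal)
- 25-50% -- autovacuum is falling behind; find out why and plan a manual freeze
- Above 50% -- emergency, act immediately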
Per-Table XID Age
The database-level age is determined by whichever table has the oldest unfrozen XIDs. Find the bottleneck:
SELECT
    s.schemaname,
    c.relname AS table_name,
    age(c.relfrozenxid) AS xid_age,
    pg_size_pretty(pg_relation_size(c.oid)) AS table_size
FROM pg_class AS c
-- Join on the table OID: relname alone can repeat across schemas
JOIN pg_stat_user_tables AS s ON s.relid = c.oid
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;
If one table dominates, that's where your vacuum effort needs to go.
Long-Running Transactions Blocking Freeze
SELECT pid, age(backend_xmin) AS xmin_age, state, query,
now() - xact_start AS transaction_duration
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC LIMIT 5;
A single idle-in-transaction session with a large xmin_age can hold back freeze progress for the entire database. This is the most common reason XID age grows unexpectedly.
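To single out the idle-in-transaction sessions specifically, you can filter pg_stat_activity on the state column. A small sketch; which columns you display is a matter of taste:

```sql
-- Sessions sitting inside an open transaction without running anything
SELECT pid,
       usename,
       application_name,
       state,
       now() - state_change AS idle_for,
       now() - xact_start   AS transaction_duration
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;
```

The oldest transaction sorts first, which is usually the one holding the horizon back.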
Monitoring Over Time
Point-in-time queries show the current state, but they miss the critical information: the rate of change. An XID age of 100 million is fine if you consume 1 million XIDs per day (100 days before you reach the default autovacuum_freeze_max_age of 200 million). It's alarming if you consume 50 million per day (2 days of headroom). You need trend data to distinguish between these scenarios.
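The headroom arithmetic is simple enough to run as a query. This sketch uses the two scenarios from the paragraph above as hard-coded inputs and measures the distance both to the default autovacuum_freeze_max_age (200 million) and to the 2^31 wraparound limit; the labels and the scenarios CTE are just illustrative names:

```sql
WITH scenarios (label, xid_age, xids_per_day) AS (
    VALUES
        ('slow burn', 100000000::bigint, 1000000::bigint),
        ('fast burn', 100000000::bigint, 50000000::bigint)
)
SELECT label,
       -- days until autovacuum forces an aggressive freeze (default threshold)
       (200000000 - xid_age) / xids_per_day  AS days_to_freeze_max_age,
       -- days until the hard wraparound limit if nothing ever freezes
       (2147483647 - xid_age) / xids_per_day AS days_to_wraparound
FROM scenarios;
-- slow burn: 100 days to the freeze threshold, 2047 days to wraparound
-- fast burn:   2 days to the freeze threshold,   40 days to wraparound
```

In practice you would replace the hard-coded values with your measured age(datfrozenxid) and an observed daily consumption rate.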
Fixing It
VACUUM FREEZE
Force freeze on the highest-priority tables:
-- Freeze a specific table
VACUUM FREEZE sim_counters;
-- Freeze all tables in the database
VACUUM FREEZE;
-- Verify XID age decreased
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
WHERE datname = current_database();
On large tables, VACUUM FREEZE has to read every page that is not already marked all-frozen and rewrite each page that still holds unfrozen rows. It can take hours and generates significant I/O. For an emergency where you need maximum speed:
-- Remove all throttling for emergency freeze
SET vacuum_cost_delay = 0;
VACUUM FREEZE sim_events;
RESET vacuum_cost_delay;
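Setting vacuum_cost_delay = 0 disables cost-based throttling, so vacuum consumes as much I/O as it can get. That is already the default for a manual VACUUM, so the SET only matters if your configuration has raised it. Either way, an unthrottled freeze competes heavily with application queries -- outside an emergency, run it during a quiet period.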
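Since a freeze on a big table can run for hours, it helps to watch progress from a second session. A minimal sketch using the pg_stat_progress_vacuum view (available since PostgreSQL 9.6); the percentage is only a rough indicator based on heap blocks scanned:

```sql
-- Progress of vacuums currently running in this database
SELECT p.pid,
       p.relid::regclass AS table_name,
       p.phase,
       p.heap_blks_scanned,
       p.heap_blks_total,
       round(100.0 * p.heap_blks_scanned
             / nullif(p.heap_blks_total, 0), 1) AS pct_scanned
FROM pg_stat_progress_vacuum AS p
WHERE p.datname = current_database();
```

The filter on current_database() is there because the regclass cast can only resolve table names in the database you are connected to.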
Kill Blocking Transactions
If long-running transactions are preventing freeze:
-- Terminate the blocking session (get PID from the detection query above)
SELECT pg_terminate_backend(12345);
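If several sessions are stuck, you can terminate them in one pass instead of copying PIDs by hand. This is a sketch only: the one-hour cutoff is an arbitrary assumption, so adjust it to whatever your application can tolerate, and run the SELECT without pg_terminate_backend first to see what would be hit.

```sql
-- Terminate sessions idle in a transaction that started over an hour ago
SELECT pid,
       usename,
       pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND xact_start < now() - interval '1 hour'
  AND pid <> pg_backend_pid();  -- never terminate our own session
```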
Preventing It
Prevention is far cheaper than emergency remediation. A few settings go a long way:
Kill Idle-in-Transaction Sessions Automatically
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
This is the single most impactful prevention measure. Forgotten connections that hold open transactions are the #1 cause of XID age creep.
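If a global timeout is too blunt, the same parameter can be set per role, so interactive or batch users get a different limit than the application. A sketch with hypothetical role names (app_user and reporting_user are placeholders, not anything from a real setup):

```sql
-- app_user and reporting_user are hypothetical role names
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '5min';
ALTER ROLE reporting_user SET idle_in_transaction_session_timeout = '30min';

-- Check the value in effect for the current session
SHOW idle_in_transaction_session_timeout;
```

Per-role settings are picked up by new sessions, so existing connections keep their old value until they reconnect.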
Tune Freeze Thresholds for High-Throughput Systems
The default autovacuum_freeze_max_age of 200 million works for most workloads. If your system burns through millions of XIDs per day, lower it to trigger freeze cycles earlier:
-- Lower freeze threshold (in postgresql.conf or ALTER SYSTEM)
ALTER SYSTEM SET autovacuum_freeze_max_age = 100000000; -- 100 million
-- This parameter can only be set at server start:
-- restart PostgreSQL to apply it, pg_reload_conf() is not enough
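Because the server-wide autovacuum_freeze_max_age only takes effect after a restart, two things are handy here: checking whether a changed value is still waiting for that restart, and lowering the threshold for a single hot table through a storage parameter, which does not need one. A sketch, reusing sim_events from the earlier examples as the table name:

```sql
-- Is a changed value still waiting for a restart?
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name = 'autovacuum_freeze_max_age';

-- Per-table override, applied without a restart.
-- Values above the server-wide setting are ignored by autovacuum.
ALTER TABLE sim_events SET (autovacuum_freeze_max_age = 100000000);
```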
Monitor the Trend, Not Just the Value
The current XID age tells you where you are. The rate of change tells you when you'll have a problem. Track age(datfrozenxid) over time for each database and alert on sustained increases. If XID age has been climbing for days, autovacuum is falling behind and you need to investigate why.
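If you don't have a monitoring tool that records this for you, a small history table is enough to get started. This is a rough sketch, not a full monitoring setup: xid_age_history is a made-up table name, and the INSERT is assumed to run on a schedule (cron, pg_cron, or whatever agent you already use).

```sql
-- Hypothetical history table for XID age samples
CREATE TABLE IF NOT EXISTS xid_age_history (
    sampled_at timestamptz NOT NULL DEFAULT now(),
    datname    name        NOT NULL,
    xid_age    bigint      NOT NULL
);

-- Take one sample per database; run this on a schedule
INSERT INTO xid_age_history (datname, xid_age)
SELECT datname, age(datfrozenxid)
FROM pg_database
WHERE datallowconn;

-- Change in XID age between consecutive samples, per database
SELECT datname,
       sampled_at,
       xid_age,
       xid_age - lag(xid_age) OVER (
           PARTITION BY datname ORDER BY sampled_at
       ) AS delta_since_previous
FROM xid_age_history
ORDER BY datname, sampled_at;
```

A delta that stays positive sample after sample, with no drop after autovacuum runs, is the sustained increase worth alerting on.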
The most dangerous XID wraparound scenarios are the ones where everything looks fine until it suddenly doesn't. By the time the current value is alarming, you may already be in emergency territory. Trend monitoring catches the problem days or weeks earlier, when intervention is straightforward rather than urgent.
Originally published at mydba.dev/blog/xid-wraparound-prevention

