Last year, on a client project, I noticed a sudden spike in disk usage on a PostgreSQL database server. The system was running a critical production ERP, and disk fill rates exceeding 90% were a serious alarm signal. One of the first places I checked, of course, was the pg_wal directory. As I suspected, this directory was overflowing with gigabytes, even terabytes, of WAL (Write-Ahead Log) files.
This situation is actually a classic scenario that many system administrators or developers using PostgreSQL encounter. WAL bloat is an insidious problem that silently consumes disk space but can lead to major issues if the right steps aren't taken. In this post, I'll explain how I detected this problem and the 4 fundamental, step-by-step strategies I applied to reclaim disk space.
What is WAL Bloat and Why Does It Occur?
One of the core mechanisms that ensures data integrity and durability in PostgreSQL is the WAL, or Write-Ahead Log system. Any data modification (INSERT, UPDATE, DELETE) is first written to WAL files, then applied to the main data files. This guarantees that in the event of a crash, the database can be restored to its last consistent state. Normally, WAL files are recycled after a certain period or after checkpoints.
However, some situations can disrupt this recycling process and cause WAL files to accumulate on disk. This is what we call WAL bloat. I generally encounter it due to the following main reasons:
- Long-Running Transactions: An open transaction can prevent WAL segments from being recycled because there might be a need to revert to the point where that transaction started. In a manufacturing company's ERP, a nightly reporting process sometimes stayed open for 8-10 hours, leading to WAL bloat.
- Replication Slots: In systems using physical replication, replication slots hold the WAL segments that a replica server needs to catch up on. If a replica server remains offline for an extended period or its slot is not managed properly, these slots can continue to hold a large number of WAL files. In an internal banking platform, a replica server's slot in the test environment was forgotten for months, leading to over 2 TB of WAL accumulation on the primary server.
-
archive_modeSettings: Whenarchive_modeis active, WAL files are archived to a specified location. This is critical for disaster recovery scenarios. However, if the archiving process fails or the archiving target becomes full, WAL files continue to accumulate in thepg_waldirectory. I experienced a similar situation during a period when a cloud backup service had access problems. -
wal_keep_size(orwal_keep_segments) Value: This parameter determines how many WAL files PostgreSQL will keep in thepg_waldirectory. If a large value is set or if there's a need beyond the default, an unnecessarily large number of WAL files might be retained.
Detecting and intervening in these situations early is vital to prevent disk fill rates from reaching 100% and database service from stopping.
Detecting WAL Bloat: Disk Full Alarm and Beyond
WAL bloat usually manifests with disk full alarms, but being proactive is always better. I use several different methods and metrics to detect such problems in advance.
The first and simplest check is to inspect the size of the pg_wal directory. I connect to the server via SSH and run the following command:
sudo du -sh /var/lib/postgresql/16/main/pg_wal
Note: The directory path may vary depending on your PostgreSQL version and installation. On my system, it was /var/lib/postgresql/16/main.
This command shows the total size of the pg_wal directory. If this size is much larger than I expect (e.g., 50 GB on a system that normally should have 1-2 GB), I immediately start a deeper investigation.
From within the database, we can check the active WAL LSN (Log Sequence Number) position and archiving status:
SELECT pg_current_wal_lsn(), pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') AS wal_bytes_since_start;
This query gives the current WAL position and the amount of WAL written since the database started. However, the crucial part is understanding why WAL files in the pg_wal directory cannot be recycled. For this, it's necessary to check replication slots and long-running transactions.
💡 Proactive Monitoring
In my own side project's infrastructure, I monitor the size of the
pg_waldirectory at regular intervals (with acronjob or a Prometheus exporter). When it exceeds a certain threshold (e.g., 10 GB), it immediately sends me an alert. This gives me a chance to intervene long before the disk fills up.
Additionally, scanning PostgreSQL logs (usually accessible via journalctl -u postgresql) for keywords like disk full or WAL segment can be helpful in understanding the root cause of the problem. Sometimes, you might see messages about the archiving command failing or warnings about a specific replication slot not progressing. For example, a log message I saw on April 28, 2026, was: LOG: could not archive WAL file "000000010000001E000000D0": No space left on device. This clearly indicated that the archiving target was full.
Step 1: Checking and Cleaning Replication Slots
Replication slots are a critical mechanism for PostgreSQL's physical replication. A slot holds the WAL LSN that a specific replica server needs to catch up on. If this slot is active, PostgreSQL will not delete or recycle WAL files after this LSN. This is necessary so that even if the replica server goes offline, it can resume from where it left off when it comes back online. However, unused or forgotten slots can cause the pg_wal directory to swell.
To check replication slots, I run the following query on the primary database:
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
The output of this query shows all replication slots, both active (active = true) and inactive (active = false). It's particularly important to pay attention to slots where the active column is f (false). These slots might indicate that a replica server is not connected or no longer exists. The restart_lsn column shows the oldest WAL LSN held by this slot, which gives an idea of how far back WAL files are being retained.
⚠️ Be Careful!
Deleting a replication slot will cause the replica server using that slot to lose synchronization with the primary server. In this case, the replica server may need to be rebuilt from scratch (with
pg_basebackup). Therefore, before deleting a slot, ensure which replica uses that slot and what its status is.
If I identify an inactive and no longer needed slot, I use the following command to delete it:
SELECT pg_drop_replication_slot('slot_name_here');
For example, if an old replication slot from a test server named test_replica_slot remains and that server no longer exists, I would run SELECT pg_drop_replication_slot('test_replica_slot'); to clean up that slot. After this operation, PostgreSQL automatically starts recycling unused WAL segments, and disk space gradually begins to free up.
I always consider this step the first one I check and usually the one that yields the fastest results. Once, on a client project, the pg_wal directory on the primary server reached 300 GB due to a replication slot from an old development environment. After deleting the slot, disk usage returned to normal within about 15 minutes.
Step 2: Optimizing archive_mode and wal_keep_size Settings
The second important step in WAL bloat management is to correctly configure the parameters that control PostgreSQL's WAL archiving and retention policies. These parameters are found in the postgresql.conf file and determine how long or how much WAL data the system will retain.
archive_mode
The archive_mode parameter controls whether WAL files are archived. Its values can be off, on, or always.
-
off: WAL archiving is disabled. -
on: WAL files are archived usingarchive_command. -
always: Works likeonand also archives during crash recovery.
If archive_mode is set to on or always and archive_command is not working correctly, or if the archiving target (e.g., an NFS share) is full, WAL files will start to accumulate in the pg_wal directory. To check this situation, I examine the postgresql.conf file and ensure that archive_command is correctly configured and operational.
Sometimes, instead of a simple cp command within archive_command, there might be a more complex script that uploads to a cloud storage service. Checking for errors in this script can be key to resolving the problem. In one project, I noticed that a script uploading to S3 was throwing authentication errors, which led to 100 GB of WAL accumulation in 2 days.
wal_keep_size (or wal_keep_segments)
This parameter determines the minimum amount of WAL (in MB) that the primary server will retain in the pg_wal directory for replication. In PostgreSQL 9.x and earlier, it was specified as the number of WAL segments under the name wal_keep_segments.
-
wal_keep_size = 0(default): No WAL is specifically retained for replica servers. If replication slots are used, the slots manage WAL retention. -
wal_keep_size = NMB: The specified amount of WAL files is retained.
Setting this value too high can lead to unnecessary disk space consumption. However, setting it too low can cause replica servers to fall behind and lead to timeline synchronization issues. Generally, if I'm using replication slots, I keep this value at 0, as slots provide much more dynamic and efficient management. If I'm not using replication slots (which I don't recommend), then I determine a value based on the longest offline period of replica servers and the WAL production rate.
After editing the postgresql.conf file to change these parameters, it may be necessary to restart the PostgreSQL service for the changes to take effect. However, we can also change them dynamically with the ALTER SYSTEM command:
-- To set wal_keep_size to 5 GB (e.g., if not using replication slots)
ALTER SYSTEM SET wal_keep_size = '5GB';
-- Or to set it to 0 (if using replication slots)
ALTER SYSTEM SET wal_keep_keep_size = '0';
-- Restart PostgreSQL for changes to apply
sudo systemctl restart postgresql@16-main.service
ℹ️ Checkpoints and WAL Bloat
Parameters like
checkpoint_timeoutandmax_wal_sizealso indirectly affect WAL generation and recycling. More frequent checkpoints can help WAL files be recycled faster, but they can also increase disk I/O. A balanced setting of these parameters affects overall database performance and WAL management.
Optimizing these settings is critical, especially for long-term disk space management. In my experience, often an unnecessarily high wal_keep_size parameter or a silently failing archive_command is one of the main causes of such bloat issues.
Step 3: Managing Long-Running Transactions and Unrecoverable States
Another common cause of WAL bloat is long-running transactions in the database. When a transaction starts with BEGIN; and does not end with COMMIT; or ROLLBACK;, all WAL records from the point where this transaction started are retained in case the transaction needs to be rolled back. This is a requirement of the MVCC (Multi-Version Concurrency Control) architecture and ensures transaction consistency. However, this situation also leads to the accumulation of WAL files.
To detect such transactions, I use the pg_stat_activity view:
SELECT pid, datname, usename, client_addr, application_name, backend_start, state, query_start,
query, state_change, pg_wal_lsn_diff(pg_current_wal_lsn(), backend_xmin::text::pg_lsn) AS xmin_wal_diff_bytes
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY query_start ASC;
This query lists all active transactions, and the query_start column shows how long the transaction has been running. The xmin_wal_diff_bytes part indicates how much WAL is being retained based on the oldest active transaction ID of this transaction. If query_start is very old and xmin_wal_diff_bytes is high, this transaction is a potential source of WAL bloat.
In a client project, a transaction started for testing purposes by a developer but then forgotten remained open for 48 hours, causing 200 GB of WAL accumulation. For such situations, after identifying the responsible pid, it may be necessary to cancel or terminate the transaction.
-
pg_cancel_backend(pid): Attempts to gracefully cancel the transaction. This is often sufficient. -
pg_terminate_backend(pid): Forcefully terminates the transaction. This can cause the application performing the transaction to error out, so it should be used as a last resort.
-- For example, to cancel the transaction with PID 12345
SELECT pg_cancel_backend(12345);
-- Or to forcefully terminate it
SELECT pg_terminate_backend(12345);
⚠️ Communicate First!
Before terminating a transaction in a production environment, it is crucial to identify which application or user owns this transaction and to communicate with the relevant teams. Otherwise, you could cause unexpected service disruptions. What I usually do is try with
pg_cancel_backendfirst, and if it doesn't work and the situation is critical, then usepg_terminate_backend.
Long-running transactions not only cause WAL bloat but also prevent VACUUM and AUTOVACUUM operations from cleaning up old row versions. This, in turn, leads to table bloat and index bloat. Therefore, ensuring that transactions in application code are as short and efficient as possible is also a critical step for WAL management. In a production ERP, I significantly mitigated this problem by making autovacuum settings more aggressive (e.g., autovacuum_vacuum_scale_factor = 0.05) and setting up monitoring alarms that detect long-running transactions early.
Step 4: pg_basebackup and Temporary File Cleanup
PostgreSQL's pg_basebackup tool is used to take physical backups and initialize replica servers. During this process, or if it's interrupted, some temporary files or incomplete WAL segments might be left behind in the pg_wal directory or the main data directory. These situations can also contribute to WAL bloat.
During a pg_basebackup operation, a replication slot is usually created on the primary server (if specified), and WAL files are transferred via this slot. If pg_basebackup is suddenly interrupted or cannot complete, this slot might remain active and cause WAL files to accumulate. We can detect this situation by checking the pg_replication_slots view, as mentioned in Step 1.
Additionally, temporary files or incomplete WAL segments can sometimes remain in the pg_wal directory. These files typically have names like 00000001XXXXXXXXXXX.partial or a similar format. Such files are normally cleaned up automatically, but rarely, they might persist after an error or system crash.
To detect such situations, it may be necessary to directly inspect the pg_wal directory:
sudo ls -lh /var/lib/postgresql/16/main/pg_wal/ | grep -E '\.partial$|temp'
If I find such temporary files and am absolutely sure they are not being used by any process, I might consider deleting them manually. However, this operation must be performed with extreme caution, as deleting the wrong WAL files can lead to database inconsistency or data loss. Generally, such files are cleaned up when the database is restarted or a checkpoint is run.
🔥 Manually Deleting WAL Files
Manually deleting WAL files in the
pg_waldirectory is strongly discouraged and can lead to database corruption. WAL files should only be managed through PostgreSQL's own mechanisms (replication slots,wal_keep_size,archive_mode) or official tools (e.g.,pg_archivecleanup). Thepartialfile check above is merely a diagnostic tool, not a manual deletion command.
Once, in a test environment, the disk filled up during a pg_basebackup operation, and the process was left incomplete. Afterward, several partial WAL files remained in the pg_wal directory. When I restarted the database, these files were automatically cleaned up. This demonstrates PostgreSQL's self-recovery capability.
The primary goal of this step is to ensure that backup and replication processes are managed correctly. Regular and successful pg_basebackup operations ensure that backups are valid while also preventing unnecessary WAL accumulation. Reviewing my backup strategy and ensuring I use pg_basebackup commands correctly with target or slot parameters helps me minimize such problems.
Conclusion: WAL Bloat Management is an Ongoing Battle
PostgreSQL WAL bloat management is an area that requires continuous attention and monitoring; a "set it and forget it" approach simply doesn't work. I have applied the 4 steps mentioned above repeatedly in various projects and in the infrastructure of my own side products, successfully reclaiming disk space each time. However, each time, a detailed investigation and sometimes a different approach were needed to understand the root cause of the problem.
It's important to remember that such problems usually arise not from a single cause, but from a combination of multiple factors. A long-running transaction, an old replication slot, and a faulty archiving command can all be active simultaneously, creating a complex WAL bloat scenario. Therefore, regularly monitoring the size of the pg_wal directory and periodically checking the pg_stat_activity and pg_replication_slots views are important parts of my routine system administration tasks.
Encountering these issues actually helps us better understand our systems and build more resilient infrastructures. In my next post, I'll discuss the performance impacts of autovacuum settings in PostgreSQL and how I combat table bloat.
Top comments (0)