Piter Adyson
PostgreSQL WAL archiving explained — Understanding Write-Ahead Logs for backup and recovery

Write-Ahead Logs are PostgreSQL's insurance policy against data loss. Every transaction gets written to WAL before it touches your actual database files. This might sound like extra overhead, but it's what keeps your data safe during crashes, enables point-in-time recovery and makes replication possible.

WAL archiving takes this further by saving these log files to external storage. This lets you restore your database to any moment in time, not just to your last backup. But it comes with operational complexity that many projects don't actually need.

Let's look at how WAL works, when you need archiving and what alternatives exist for simpler backup strategies.

What is Write-Ahead Logging?

WAL is PostgreSQL's crash recovery mechanism. Before any data changes hit your database files, PostgreSQL writes those changes to sequential log files. If the server crashes mid-transaction, PostgreSQL replays the WAL during startup to restore consistency.

The concept is simple: write changes to a log first, then apply them to actual data files. The log is sequential and fast to write. Data files get updated in the background at PostgreSQL's convenience. If something goes wrong, you have the log to reconstruct what happened.

WAL files sit in the pg_wal directory (called pg_xlog before PostgreSQL 10). PostgreSQL creates 16MB segments by default. Once a segment fills up, PostgreSQL moves to the next one. Old segments get recycled or removed once they're no longer needed for crash recovery.

How WAL archiving works

WAL archiving means copying completed WAL segments to external storage before PostgreSQL recycles them. This gives you a complete history of all database changes over time. Combined with a base backup, you can restore to any point between the backup and the present.

Here's the basic flow. PostgreSQL finishes writing to a WAL segment. Before recycling it, PostgreSQL runs your archive command (a shell script or program you define). This command copies the segment somewhere safe like S3, NAS or another server. Only after successful archiving does PostgreSQL mark the segment as recyclable.

You configure archiving in postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'

The %p is the path to the WAL file on disk. The %f is just the filename. Your archive command should return exit code 0 on success or non-zero on failure. If it fails, PostgreSQL retries until it succeeds, which prevents data loss but can fill up pg_wal if your archive destination is down.

A simple archive command might copy files to network storage. Production setups often use more robust solutions with error handling, compression and verification.
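As a sketch of what that robustness might look like, here is a minimal bash function in the spirit of such a script. The function name, destination directory and temp-file convention are all illustrative assumptions, not PostgreSQL requirements:

```shell
#!/usr/bin/env bash
# Hypothetical archive helper: copies one WAL segment into the archive,
# refusing to overwrite existing files and avoiding half-written ones.
# Arguments: $1 = full path to the segment (%p), $2 = its file name (%f).
archive_wal() {
  local wal_path="$1" wal_file="$2"
  local dest="${ARCHIVE_DIR:-/mnt/wal_archive}"   # assumed destination

  # A duplicate name usually signals a misconfiguration -- never overwrite.
  if [ -f "$dest/$wal_file" ]; then
    echo "ERROR: $wal_file already archived" >&2
    return 1
  fi

  # Copy under a temporary name, then rename, so a reader of the archive
  # directory never sees a partially copied segment.
  cp "$wal_path" "$dest/$wal_file.tmp" && mv "$dest/$wal_file.tmp" "$dest/$wal_file"
}
```

The non-zero return on failure is what tells PostgreSQL to keep the segment in pg_wal and retry later.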

Setting up WAL archiving for continuous backup

First you need a base backup. This is a full copy of your database cluster taken with pg_basebackup or a physical backup tool. WAL archives alone are useless without a base backup to apply them to.

Take a base backup:

pg_basebackup -D /backup/base -Ft -z -P

This creates a compressed tar archive of your database cluster. The -P shows progress. Store this somewhere safe.

Enable archiving in postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
archive_timeout = 300

The archive_timeout forces a WAL segment switch every 5 minutes even if it's not full. This ensures your archive stays relatively current. Without it, a low-traffic database might not archive WAL for hours.

Restart PostgreSQL to apply the changes. New WAL segments will now get copied to /mnt/wal_archive/. Monitor this directory to make sure archiving is working and you're not running out of space.
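One low-tech way to monitor the backlog: PostgreSQL marks segments awaiting archiving with .ready marker files in pg_wal/archive_status. Counting those files shows how far behind the archiver is. A minimal sketch, where the function name and default data directory path are assumptions:

```shell
# Count WAL segments still waiting to be archived by counting the
# .ready marker files PostgreSQL keeps in pg_wal/archive_status.
pending_wal() {
  local pg_wal="${1:-/var/lib/postgresql/data/pg_wal}"   # assumed data directory
  find "$pg_wal/archive_status" -name '*.ready' 2>/dev/null | wc -l
}
```

A steadily growing count means the archive command is failing or the destination is unreachable. The same information is also available server-side in the pg_stat_archiver view.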

| Configuration Parameter | Purpose | Recommended Value |
| --- | --- | --- |
| wal_level | Amount of information written to WAL | replica or logical |
| archive_mode | Enable/disable archiving | on |
| archive_command | Shell command to copy WAL files | Custom script with error handling |
| archive_timeout | Force WAL switch after this many seconds | 300 (5 minutes) |
| max_wal_size | Maximum WAL size before checkpoint | 1GB or higher |

Point-in-time recovery with WAL archives

PITR lets you restore to any moment between your base backup and your last archived WAL segment. This is useful when someone drops a table at 2:37 PM and you want to restore to 2:36 PM.

Start with your base backup. Extract it to your data directory. Then tell PostgreSQL where to find WAL archives and when to stop recovery: in PostgreSQL 11 and earlier these settings go in a recovery.conf file; in PostgreSQL 12 and later they go in postgresql.conf alongside an empty recovery.signal file.

For PostgreSQL 12 and later:

restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2026-01-14 14:36:00'

Put these settings in postgresql.conf and create an empty file called recovery.signal in the data directory. Start PostgreSQL. It will restore from the base backup, then replay WAL archives up to your target time.

PostgreSQL stops at the specified time and creates a new timeline. This prevents confusion if you later need to restore again. You now have your database as it was at 14:36.

The three main recovery targets:

  • Time-based: restore to a specific timestamp
  • Transaction-based: restore to a specific transaction ID
  • Immediate: restore just the base backup without replaying any WAL
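As a sketch, the three targets map to three postgresql.conf settings (the timestamp and transaction ID below are placeholders):

```
# Time-based: stop replay at a specific moment
recovery_target_time = '2026-01-14 14:36:00'

# Transaction-based: stop after a specific transaction ID
recovery_target_xid = '1234567'

# Immediate: stop as soon as a consistent state is reached
recovery_target = 'immediate'
```

Only one recovery target should be set at a time; PostgreSQL rejects configurations that specify several.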

WAL archiving tools and solutions

Several tools handle WAL archiving better than custom shell scripts. They add compression, encryption, incremental backups and cloud storage support.

pgBackRest is a comprehensive backup tool that handles base backups, WAL archiving and restoration. It supports parallel processing, compression and multiple backup repositories. Configuration can be complex but it's solid for production use.

WAL-G focuses on continuous archiving with efficient compression and cloud storage. It uses delta compression for base backups and handles parallel upload. Good for teams comfortable with Go-based tools and cloud infrastructure.

Barman provides centralized backup management for multiple PostgreSQL servers. It handles both WAL archiving and traditional backups. The catalog system makes it easy to see what backups exist and manage retention policies.

For most users who just need reliable backups without WAL complexity, PostgreSQL backup tools like Databasus provide scheduled logical backups with cloud storage and notifications. This covers 95% of real-world backup needs with significantly less operational overhead.

| Tool | Best For | Key Features |
| --- | --- | --- |
| pgBackRest | Large production systems | Parallel processing, multiple repositories, compression |
| WAL-G | Cloud-native deployments | Efficient compression, S3/GCS/Azure support |
| Barman | Managing multiple servers | Centralized management, retention policies |

When WAL archiving matters (and when it doesn't)

WAL archiving makes sense in specific situations. If you need second-by-second recovery granularity, it's essential. If you're running high-transaction systems where even 5 minutes of data loss is unacceptable, you need continuous archiving. Some compliance requirements mandate PITR capability.

But most projects don't need it. Cloud database services like AWS RDS, Google Cloud SQL and Azure Database for PostgreSQL already provide native PITR. Setting up your own WAL archiving for cloud databases is pointless since you can't restore those archives to managed instances anyway.

For self-hosted databases, hourly or daily logical backups cover most scenarios. Losing 1-2 hours of data is usually acceptable when weighed against the operational complexity of maintaining WAL archiving. The tooling is more complex, monitoring is harder and restoration takes longer.

Consider WAL archiving if you:

  • Need point-in-time recovery to the exact second
  • Run critical financial or transactional systems
  • Have compliance requirements for continuous backup
  • Manage high-volume databases where logical backups are too slow

Skip it if you:

  • Use managed cloud databases with built-in PITR
  • Can tolerate hourly or daily backup granularity
  • Want simple backup operations without complex tooling
  • Have limited DevOps resources for maintaining archiving infrastructure

Simple alternative with Databasus

If WAL archiving feels too complex for your needs, Databasus provides a simpler approach to PostgreSQL backups. It uses logical backups with scheduling, multiple storage options and notifications. No archive commands, no WAL management, no timeline confusion.

Databasus works through a web interface. You configure your database, pick a schedule and choose where to store backups. It handles compression, uploads to cloud storage and sends notifications when something goes wrong.

Installation with Docker

The simplest way to get started:

docker run -d \
  --name databasus \
  -p 4005:4005 \
  -v ./databasus-data:/databasus-data \
  --restart unless-stopped \
  databasus/databasus:latest

Installation with Docker Compose

Create a docker-compose.yml file:

services:
  databasus:
    container_name: databasus
    image: databasus/databasus:latest
    ports:
      - "4005:4005"
    volumes:
      - ./databasus-data:/databasus-data
    restart: unless-stopped

Run it:

docker compose up -d

Setting up your first backup

Once installed, access the dashboard at http://localhost:4005.

Add your database. Click "New Database" and enter your PostgreSQL connection details (host, port, database name, username and password). Databasus validates the connection before saving.

Select storage. Choose where to store backups: local disk, S3, Google Drive, Cloudflare R2, SFTP or other supported storage. For S3, you'll need your bucket name, region, access key and secret key.

Select schedule. Pick how often to run backups: hourly, daily, weekly, monthly or custom cron expression. For most production systems, hourly or daily backups provide adequate protection.

Create backup. Click "Create" and Databasus will run the first backup immediately. You'll see progress in real-time and get notifications when it completes.

The interface shows all your backups with timestamps, sizes and status. Click any backup to download, restore or delete it. No command line needed unless you prefer it.

Common WAL archiving problems

The archive command fails but PostgreSQL keeps trying, filling up pg_wal until your server runs out of disk space. This is the most common issue. Monitor pg_wal size and set up alerts. Make sure your archive destination is always available or use tools that queue failed archives.

Network issues during archiving can cause WAL to pile up quickly on busy servers. A 16MB segment might archive in seconds normally, but network problems can create a backlog of hundreds of segments in minutes. Keep local disk space generous and monitor archive lag.

Archives exist but restoration fails because files are corrupted or incomplete. Always test your restoration process. Having archives doesn't mean they work. Schedule periodic restoration tests to a separate server to verify your backup chain is intact.
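A full restoration test needs a scratch server, but a cheap first-line check is possible: completed WAL segments have a fixed size (16MB by default), so any archived segment with a different size is suspect. A minimal sketch, assuming GNU stat and the default segment size; the function name is illustrative, and .backup and .history files are skipped because they are legitimately small:

```shell
# Flag archived WAL segments whose size is not the default 16MB.
verify_wal_sizes() {
  local dir="$1" bad=0
  local f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case "$f" in *.backup|*.history|*.tmp) continue ;; esac
    if [ "$(stat -c%s "$f")" -ne 16777216 ]; then
      echo "suspect size: $f" >&2
      bad=1
    fi
  done
  return $bad
}
```

This catches truncated copies from interrupted network transfers, but not bit-level corruption, so it complements rather than replaces periodic restore tests.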

Timeline confusion after PITR makes it hard to track which archives belong to which recovery attempt. PostgreSQL creates new timelines after each recovery to prevent mixing WAL from different recovery paths. Keep clear documentation of timeline switches and what each represents.

| Problem | Symptom | Solution |
| --- | --- | --- |
| Failed archive command | pg_wal fills up disk | Fix destination, add monitoring, increase disk space |
| Network issues | Archive lag increases | Use local queue, monitor lag, fix connectivity |
| Corrupted archives | Restoration fails | Verify archives after creation, test restores regularly |
| Missing base backup | Can't restore from WAL | Keep base backups with matching timeline markers |

WAL configuration parameters

The wal_level parameter controls how much information gets written to WAL. The minimal setting writes just enough for crash recovery. The replica setting adds information needed for streaming replication. The logical setting adds even more for logical replication. Use replica unless you need logical replication.

checkpoint_timeout and max_wal_size control when PostgreSQL writes dirty buffers to disk. More frequent checkpoints mean faster crash recovery but more I/O overhead. Larger max_wal_size reduces checkpoint frequency but uses more disk space in pg_wal.

wal_keep_size (wal_keep_segments before PostgreSQL 13) prevents PostgreSQL from recycling WAL segments that streaming replicas might still need. This matters for replication but not for archive-based recovery. If you only use WAL archiving without streaming replication, you can leave this at default.

# Recommended settings for WAL archiving
wal_level = replica
archive_mode = on
archive_command = '/usr/local/bin/archive_wal.sh %p %f'
archive_timeout = 300
max_wal_size = 2GB
min_wal_size = 1GB
checkpoint_timeout = 15min

These settings balance write performance, disk usage and archive frequency for most production systems. Adjust based on your transaction rate and disk capacity.

Conclusion

WAL archiving provides the finest-grained backup and recovery option for PostgreSQL. It captures every database change and enables point-in-time recovery to the exact second. This level of precision matters for high-stakes systems where data loss is measured in transactions, not hours.

But precision comes with operational cost. You need reliable archive storage, monitoring for failed archives, tested restoration procedures and staff who understand WAL mechanics. For teams running managed cloud databases, external WAL archiving doesn't even work with the provider's restoration tools.

Most projects are better served by scheduled logical backups: tools that automate database dumps, handle compression, upload to multiple storage backends and send notifications when something fails. Hourly granularity handles the vast majority of real-world recovery scenarios without the complexity overhead.

Use WAL archiving when you truly need it. Use simpler solutions when you don't. The best backup strategy is one that actually gets maintained and tested, not the one with the most features you'll never use.
