PostgreSQL Point-in-Time Recovery with pgBackRest
pg_dump gives you a snapshot at the moment you ran it. If your last dump was 6 hours ago and someone accidentally deletes a production table, those 6 hours are gone. Even with hourly dumps, you lose everything between the last dump and the incident. For a database processing thousands of transactions per minute, that gap is devastating. Point-in-time recovery (PITR) eliminates that gap -- restoring your database to any specific second by replaying the write-ahead log on top of a base backup.
How PITR Works
Two mechanisms combine:
- Base backups -- periodic snapshots of all database files
- WAL archiving -- continuous streaming of every WAL segment to a backup repository
The WAL records every change made to the database. By replaying WAL segments from a base backup forward to a target timestamp, you reconstruct the exact state at that moment. If the last archived WAL is 30 seconds old, your maximum data loss is 30 seconds -- not 6 hours.
pgBackRest is the standard tool for this. It handles base backups (full, incremental, differential), WAL archiving, retention, verification, and recovery -- with parallel compression, encryption, and remote repository support.
Detecting Whether You're Protected
Check if WAL archiving is even enabled:
SELECT name, setting
FROM pg_settings
WHERE name IN (
'archive_mode',
'archive_command',
'archive_timeout',
'wal_level'
);
You need archive_mode = on and wal_level = replica (or logical). If archive_mode is off, PITR is impossible.
Check for archiving failures:
SELECT
archived_count,
failed_count,
last_archived_wal,
last_archived_time,
last_failed_wal,
last_failed_time,
now() - last_archived_time AS archive_lag
FROM pg_stat_archiver;
A non-zero failed_count or archive_lag greater than a few minutes means the pipeline is broken. WAL segments are accumulating on the primary and will eventually fill the disk.
Verify backup freshness:
pgbackrest info --stanza=main
pgbackrest verify --stanza=main
Setting Up pgBackRest
Install and Configure
# Debian/Ubuntu
sudo apt-get install pgbackrest
# RHEL/Rocky
sudo dnf install pgbackrest
Create /etc/pgbackrest/pgbackrest.conf:
[main]
pg1-path=/var/lib/postgresql/18/main
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-retention-diff=7
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=your-secure-encryption-passphrase
process-max=4
compress-type=zst
compress-level=6
log-level-console=info
log-level-file=detail
Configure PostgreSQL
Add to postgresql.conf:
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'
archive_timeout = 60
The archive_timeout = 60 forces a WAL switch every 60 seconds even if the segment isn't full. This caps maximum data loss at 60 seconds.
Restart PostgreSQL, then initialize the stanza:
pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main check
Schedule Backups
# Full backup (weekly)
pgbackrest --stanza=main --type=full backup
# Differential (daily -- changes since last full)
pgbackrest --stanza=main --type=diff backup
# Incremental (every 6 hours -- changes since last any backup)
pgbackrest --stanza=main --type=incr backup
Cron schedule:
0 2 * * 0 pgbackrest --stanza=main --type=full backup
0 2 * * 1-6 pgbackrest --stanza=main --type=diff backup
0 */6 * * * pgbackrest --stanza=main --type=incr backup
Performing Recovery
When disaster strikes, restore to a specific timestamp:
sudo systemctl stop postgresql
pgbackrest --stanza=main --type=time \
--target="2026-02-28 14:30:00+00" \
--target-action=promote \
restore
sudo systemctl start postgresql
Set --target to just before the incident. --target-action=promote opens the database for read-write after recovery.
You can also restore to a named restore point:
-- Create before a risky operation
SELECT pg_create_restore_point('before_schema_migration');
pgbackrest --stanza=main --type=name \
--target="before_schema_migration" \
--target-action=promote \
restore
Test Recovery Regularly
This is the most critical step. Schedule monthly recovery tests to a standby server:
pgbackrest --stanza=main --type=time \
--target="2026-02-28 12:00:00+00" \
--target-action=promote \
--pg1-path=/var/lib/postgresql/18/test_recovery \
restore
Verify the restored database contains expected data at the target timestamp. If recovery fails, fix the configuration before you need it in an emergency. Record the actual recovery time -- that's your real RTO.
Prevention
Build PITR into infrastructure from day one. Every production PostgreSQL database should have WAL archiving before its first production write.
Monitor three metrics continuously:
- Archive lag -- alert if > 5 minutes
- Failed archive count -- any non-zero value requires investigation
- Backup age -- alert if exceeding your backup interval + buffer
An untested backup is not a backup. Test quarterly at minimum. Document the procedure, the expected recovery time, and who executes it. Run end-to-end: restore, replay WAL, verify data, record duration.
Store backups off-host. A backup on the same disk is destroyed by the same failure. Use S3, Azure Blob, GCS, or a separate server. Enable encryption.
Originally published at mydba.dev/blog/postgres-point-in-time-recovery
Top comments (0)