
Yasser B.

Posted on • Originally published at rivestack.io

PostgreSQL Backup and Recovery: A Complete Guide

Your database has never failed. That's not because you're careful. It's because you haven't been running long enough.

Hardware dies. Disk controllers corrupt data silently for weeks before anyone notices. An engineer runs a DELETE without a WHERE clause on production. A cloud provider has a multi-zone outage. The only thing standing between you and catastrophic data loss is a backup you can actually restore from.

This guide covers how PostgreSQL backup actually works: the difference between logical and physical backups, how point-in-time recovery lets you undo bad queries, and what a production-grade backup strategy looks like. We will go through the tools, the tradeoffs, and the one mistake most teams make that turns a "we have backups" story into a "we lost two days of data" incident.

Logical vs Physical Backups

Before touching any tools, you need to understand the two categories of PostgreSQL backup:

Logical backups capture the data as SQL. The output is a file of CREATE TABLE, INSERT, and COPY statements that recreate your schema and data from scratch.

Physical backups copy the raw files that PostgreSQL uses to store data on disk, including the base directory and Write-Ahead Log (WAL) files.

Each has a different purpose:

| | Logical (pg_dump) | Physical (pg_basebackup) |
| --- | --- | --- |
| Portability | High: restore across Postgres versions | Low: must match the major version |
| Granularity | Table, schema, or database | Entire cluster |
| Restore time | Slow for large databases | Fast: copy the files back |
| Point-in-time recovery | No | Yes (with WAL archiving) |
| Consistency | Always consistent | Consistent (pg_basebackup handles the coordination) |

For small databases or specific table exports, logical backups with pg_dump are perfectly reasonable. For production databases over a few gigabytes, or whenever you need point-in-time recovery, physical backups are the right foundation.

pg_dump: Logical Backups for Specific Needs

pg_dump is a utility that comes bundled with PostgreSQL. It connects to a running database and produces a self-consistent snapshot you can restore elsewhere.

Basic usage

# Dump a database to a plain SQL file
pg_dump -h localhost -U postgres mydb > mydb.sql

# Dump in custom format (recommended: faster restore, selective restore)
pg_dump -h localhost -U postgres -Fc mydb > mydb.dump

# Dump to a directory with parallel workers
pg_dump -h localhost -U postgres -Fd -j 4 -f mydb_dir/ mydb

The custom format (-Fc) is almost always better than plain SQL. It compresses the output, supports parallel restore, and lets you restore individual tables or schemas selectively.

Restoring a pg_dump backup

# Restore a plain SQL dump
psql -h localhost -U postgres newdb < mydb.sql

# Restore a custom format dump
pg_restore -h localhost -U postgres -d newdb mydb.dump

# Restore with parallel workers (much faster for large databases)
pg_restore -h localhost -U postgres -j 4 -d newdb mydb.dump

Limitations of pg_dump

pg_dump reads everything through a single snapshot taken when it starts, which is what guarantees consistency, but it also means the dump can take hours on a large database while that snapshot stays open. Restores are slower still: reloading 500 GB with pg_restore means several hours of downtime.

More importantly, pg_dump gives you a snapshot at a point in time. If you run it at midnight and your application corrupts data at 10 PM the following day, you lose 22 hours of data. That might be acceptable for some workloads. For most production databases, it is not.
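Even so, a scheduled pg_dump is a reasonable floor for small systems. A minimal crontab entry for a nightly dump (the paths, database name, and schedule here are assumptions, not from the original setup):

```
# Nightly custom-format dump at 00:30, dated for easy pruning.
# The % must be escaped as \% inside a crontab.
30 0 * * * pg_dump -Fc -h localhost -U postgres mydb > /backups/mydb-$(date +\%F).dump
```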

pg_basebackup: Physical Backups of the Entire Cluster

pg_basebackup creates a physical copy of the PostgreSQL data directory. Combined with WAL archiving, it forms the foundation of production backup strategies.

What pg_basebackup does

PostgreSQL writes every change to the Write-Ahead Log before applying it to the actual data files. This means a physical backup taken at any moment is internally consistent if you also keep the WAL generated during the backup. pg_basebackup handles the coordination automatically.

pg_basebackup \
  -h localhost \
  -U replicator \
  -D /backups/base \
  -Ft \
  -z \
  -P \
  --wal-method=stream

The replicator user needs REPLICATION privilege:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'yourpassword';
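pg_basebackup connects over the replication protocol, so pg_hba.conf must also allow replication connections for that role (the address range and auth method below are assumptions; adjust them to your network):

```
# pg_hba.conf: permit replication connections from the backup host
host  replication  replicator  10.0.0.0/24  scram-sha-256
```

Reload the configuration afterwards, for example with `SELECT pg_reload_conf();`.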

Restoring from pg_basebackup

sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/16/main/*
sudo tar -xzf /backups/base/base.tar.gz -C /var/lib/postgresql/16/main/
# -Ft writes WAL into a separate tarball; extract it into pg_wal
sudo tar -xzf /backups/base/pg_wal.tar.gz -C /var/lib/postgresql/16/main/pg_wal/
sudo chown -R postgres:postgres /var/lib/postgresql/16/main
sudo systemctl start postgresql

This gives you the database as it was at the time of the backup. To recover to a point after the backup, you need WAL archiving.

WAL Archiving and Point-In-Time Recovery

Point-in-time recovery (PITR) lets you restore a database to any moment between your last base backup and now. It works by replaying WAL files on top of a base backup.

Enabling WAL archiving

In postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/wal-archive/%f && cp %p /mnt/wal-archive/%f'

In production, ship to object storage instead:

archive_command = 'aws s3 cp %p s3://my-wal-bucket/wal/%f'
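Whatever the destination, archive_command must refuse to overwrite an existing segment and must return nonzero on failure; otherwise PostgreSQL recycles WAL it never actually archived. The semantics are easy to demonstrate with plain files (the paths and segment name below are stand-ins, not real WAL):

```shell
# Stand-in WAL segment and archive directory (hypothetical paths)
rm -rf /tmp/wal-archive && mkdir -p /tmp/wal-archive
printf 'wal data' > /tmp/000000010000000000000001

# Safe archive_command shape: the 'test ! -f' guard makes a second
# attempt on the same segment fail instead of silently overwriting
archive() { test ! -f "/tmp/wal-archive/$2" && cp "$1" "/tmp/wal-archive/$2"; }

archive /tmp/000000010000000000000001 000000010000000000000001 && echo archived
archive /tmp/000000010000000000000001 000000010000000000000001 || echo refused
```

The nonzero exit from the second call is exactly what PostgreSQL needs to keep the segment around and retry rather than assume it was archived.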

Performing a point-in-time recovery

# PostgreSQL 12+: create a recovery signal file
touch /var/lib/postgresql/16/main/recovery.signal

# Add to postgresql.conf
restore_command = 'cp /mnt/wal-archive/%f %p'
recovery_target_time = '2026-04-18 14:30:00'
recovery_target_action = 'promote'

# Start the server; it replays WAL up to the target time, then promotes

You can also recover to a named restore point you create before migrations:

SELECT pg_create_restore_point('before_migration');
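During recovery, you then target that marker by name instead of guessing a timestamp:

```
restore_command = 'cp /mnt/wal-archive/%f %p'
recovery_target_name = 'before_migration'
recovery_target_action = 'promote'
```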

pgBackRest: Production-Grade Backup Management

pgBackRest handles everything from incremental backups to parallel compression and verification.

Minimal pgBackRest configuration

/etc/pgbackrest/pgbackrest.conf:

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-retention-diff=7
process-max=4

[main]
pg1-path=/var/lib/postgresql/16/main

In postgresql.conf:

archive_command = 'pgbackrest --stanza=main archive-push %p'
archive_mode = on
wal_level = replica

Running and restoring backups

# One-time stanza initialization before the first backup
sudo -u postgres pgbackrest --stanza=main stanza-create

# Full backup
sudo -u postgres pgbackrest --stanza=main --type=full backup

# Restore to a specific point in time
sudo -u postgres pgbackrest --stanza=main restore \
  --type=time \
  --target="2026-04-18 14:30:00"

Backup Verification: The Step Most Teams Skip

An untested backup is not a backup. It is an assumption.

WAL segments can be silently corrupted. Archive commands can silently succeed (exit 0) while producing empty files. pg_dump output can be truncated. You will not know until you try to restore.
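One cheap guard against the empty-file failure mode is an archive wrapper that compares the copy against the source before reporting success, so a truncated copy fails the archive_command instead of passing silently. A sketch with stand-in paths (the wrapper name and directories are hypothetical):

```shell
# Archive wrapper: copy the segment, then fail unless the copy is
# byte-identical to the source, so truncated or empty copies are caught
archive_wal() {
  src=$1
  dst=/tmp/wal-verify/$2
  cp "$src" "$dst" || return 1
  cmp -s "$src" "$dst"    # nonzero exit if the copy differs from the source
}

rm -rf /tmp/wal-verify && mkdir -p /tmp/wal-verify
printf 'segment bytes' > /tmp/seg1
archive_wal /tmp/seg1 seg1 && echo verified
```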

Monitor archive health continuously:

SELECT
  archived_count,
  last_archived_wal,
  last_archived_time,
  failed_count,
  last_failed_wal,
  last_failed_time
FROM pg_stat_archiver;

If failed_count is climbing and last_archived_time is stale, your WAL archive is silently broken.

For mission-critical databases, run automated restore drills: take a backup, restore to an isolated instance, compare row counts against production. This sounds like overhead, and it is. It is also the thing that saves teams from finding out their backups were corrupted during an actual disaster.

Backup Retention and Recovery Time

A common production policy:

  • Daily full backups kept for 7 days
  • Continuous WAL archiving for PITR within the retention window
  • Weekly backups kept for 4 weeks
  • Monthly snapshots kept for 12 months
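pgBackRest enforces its own retention through the repo1-retention-* settings, but for bare pg_dump files a find-based prune is often enough. A sketch (directory and file naming are assumptions):

```shell
# Hypothetical dump directory with one stale and one current file
rm -rf /tmp/backups && mkdir -p /tmp/backups
touch -d "10 days ago" /tmp/backups/mydb-2026-04-08.dump
touch /tmp/backups/mydb-2026-04-18.dump

# Prune anything older than the 7-day window
find /tmp/backups -name '*.dump' -mtime +7 -delete
ls /tmp/backups
```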

For recovery time: as a rough rule of thumb on typical hardware, pg_restore with a single worker moves about 1 GB per minute, so a 100 GB database takes around 100 minutes. With parallel restore (-j 4), that drops to 30 to 40 minutes. A physical restore from pgBackRest over a 1 Gbps link can pull the same 100 GB base backup in under 15 minutes, plus WAL replay time.

Know your restore time before the incident, not during it.

A Practical Backup Checklist

  • [ ] Base backups running on a schedule (at least daily)
  • [ ] WAL archiving enabled and destination confirmed
  • [ ] Backup verification job running at least weekly
  • [ ] Restore procedure documented and tested
  • [ ] Retention policy defined and enforced
  • [ ] Monitoring for archive failures via pg_stat_archiver
  • [ ] Offsite storage (not on the same machine or availability zone as the primary)
  • [ ] Recovery time measured on actual restore drill

Summary

PostgreSQL gives you the primitives for a solid backup strategy. The gap between "we have backups" and "we can actually recover" is almost always process: untested restores, silent archive failures, and retention policies set once and never revisited.

If you want backups that work without the operational overhead, try Rivestack. Continuous backup, point-in-time recovery, and restore in a few clicks.

For more production PostgreSQL reading, see our guides on connection pooling with PgBouncer and PostgreSQL indexing strategies.


