PostgreSQL backup and restore — Complete guide to backing up and restoring databases

Database backups are a critical component of any production system, protecting against data loss from hardware failures, human errors and security incidents. PostgreSQL offers multiple backup approaches, each suited to different scenarios and requirements. This guide explores the complete backup and restore ecosystem for PostgreSQL, from simple logical backups to advanced point-in-time recovery strategies. Whether you're managing a small application database or a large enterprise system, understanding these backup methods will help you build a reliable data protection strategy.

Understanding PostgreSQL backup types

PostgreSQL provides two fundamental approaches to backing up your data: logical and physical backups. Logical backups create SQL statements that can recreate your database structure and data, while physical backups copy the actual data files used by PostgreSQL. Each approach offers distinct advantages and trade-offs that affect recovery time, flexibility and resource usage.

Logical backups are more portable and allow selective restoration of specific databases or tables, making them ideal for development environments and migrations. Physical backups capture the entire database cluster at the filesystem level, enabling faster restoration and supporting advanced features like point-in-time recovery. The choice between these methods depends on your database size, recovery time objectives and operational requirements.

| Backup Type | Best For | Recovery Speed | Flexibility | Complexity |
| --- | --- | --- | --- | --- |
| Logical (pg_dump) | Small to medium databases, selective restore | Slow to moderate | High (table-level restore) | Low |
| Physical (pg_basebackup) | Large databases, full cluster restore | Fast | Low (cluster-level only) | Moderate |
| Continuous archiving (WAL) | Point-in-time recovery, minimal data loss | Very fast | Very high | High |

The table above compares different backup strategies to help you choose the right approach. Most production systems combine multiple methods to balance recovery speed with operational flexibility.

Using pg_dump for logical backups

The pg_dump utility is PostgreSQL's standard tool for creating logical backups of individual databases. It generates a script containing SQL commands to reconstruct your database, including schema definitions, data inserts and object dependencies. This approach works across PostgreSQL versions and can be used to migrate data between different systems.

To create a basic backup with pg_dump, use the following command:

pg_dump -U postgres -d mydbname -F c -f backup.dump

The -F c flag specifies custom format, which provides compression and allows selective restoration of specific tables. For plain SQL output that can be inspected or edited, use -F p instead. The custom format is generally recommended for production backups due to its flexibility and space efficiency.
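
For comparison, a plain-format dump of the same placeholder database would look like this (the output is an editable SQL script rather than a compressed archive):

pg_dump -U postgres -d mydbname -F p -f backup.sql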

For backing up all databases in your PostgreSQL instance, use pg_dumpall instead:

pg_dumpall -U postgres -f all_databases.sql

This command captures all databases, roles and tablespaces in a single file. However, it generates plain SQL output only, which can result in larger file sizes compared to pg_dump's custom format.
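
If you prefer per-database pg_dump backups, you can still use pg_dumpall to capture just the cluster-wide objects (roles and tablespaces) in a separate file:

pg_dumpall -U postgres --globals-only -f globals.sql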

Restoring from pg_dump backups

Restoring a logical backup depends on the format used during backup creation. For custom format backups, use the pg_restore utility:

pg_restore -U postgres -d mydbname -F c backup.dump

The pg_restore command offers powerful options for selective restoration. You can restore specific tables using -t tablename or schemas using -n schemaname. This flexibility makes logical backups particularly useful when you need to recover individual objects rather than entire databases.
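
For example, restoring only a single table from a custom-format backup might look like this (the table name orders is just a placeholder):

pg_restore -U postgres -d mydbname -t orders backup.dump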

For plain SQL backups created with pg_dumpall or -F p format, use psql instead:

psql -U postgres -f all_databases.sql

When restoring large databases, consider using parallel restoration to speed up the process:

pg_restore -U postgres -d mydbname -F c -j 4 backup.dump

The -j 4 flag uses four parallel workers, significantly reducing restoration time for databases with many tables. Parallel restore requires the custom or directory format and works best on systems with multiple CPU cores.
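
The dump side can be parallelized as well, though that requires the directory format rather than the custom format; a sketch with four workers might look like:

pg_dump -U postgres -d mydbname -F d -j 4 -f backup_dir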

Physical backups with pg_basebackup

Physical backups copy PostgreSQL's data directory at the filesystem level, capturing the exact state of your database cluster. The pg_basebackup utility performs this task while the database remains online, creating a consistent snapshot without requiring downtime. This method is faster than pg_dump for large databases and serves as the foundation for advanced backup strategies like continuous archiving.

To create a base backup, run the following command:

pg_basebackup -U postgres -D /backup/pgdata -F tar -z -P

The -D flag specifies the backup destination directory, -F tar writes the backup as tar archives, -z compresses them with gzip and -P shows progress during the backup process. The backup user needs the REPLICATION privilege, which can be granted with:

ALTER USER postgres REPLICATION;

Physical backups capture the entire database cluster, including all databases, configuration files and WAL segments. This makes them ideal for disaster recovery scenarios where you need to restore everything quickly. However, you cannot selectively restore individual databases or tables from a physical backup.

Restoring from physical backups

Restoring a physical backup requires stopping PostgreSQL and replacing the data directory with your backup. First, stop the PostgreSQL service:

systemctl stop postgresql

Then remove the existing data directory, recreate it and extract the backup into it (tar-format backups taken with WAL streaming also produce a separate pg_wal archive):

rm -rf /var/lib/postgresql/data
mkdir -p /var/lib/postgresql/data
tar -xzf /backup/pgdata/base.tar.gz -C /var/lib/postgresql/data
tar -xzf /backup/pgdata/pg_wal.tar.gz -C /var/lib/postgresql/data/pg_wal

After extracting the backup, make sure the data directory is owned by the postgres system user and has mode 700, then start PostgreSQL again:

systemctl start postgresql

This process replaces your entire database cluster, so ensure you have verified backups before performing restoration. Physical backups restore much faster than logical backups, making them preferable for large databases where downtime must be minimized.

Continuous archiving and point-in-time recovery

Continuous archiving captures Write-Ahead Log (WAL) files as PostgreSQL generates them, enabling recovery to any specific moment in time. This approach provides the lowest possible data loss in case of failure, often measured in seconds rather than hours. Combined with a base backup, WAL archiving allows you to replay database transactions up to any point before a failure occurred.

To enable WAL archiving, modify postgresql.conf with these settings:

wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/wal/%f'

The archive_command specifies how to copy WAL files to your archive location. For production systems, use more robust solutions like cloud storage or dedicated backup tools. After changing these settings, restart PostgreSQL for them to take effect.
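
A slightly more defensive variant, mirroring the example in the PostgreSQL documentation, refuses to overwrite a WAL file that has already been archived:

archive_command = 'test ! -f /archive/wal/%f && cp %p /archive/wal/%f'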

Create a base backup to establish your recovery starting point:

pg_basebackup -U postgres -D /backup/base -F tar -z -X fetch

The -X fetch flag includes the WAL files needed to make the base backup consistent. PostgreSQL will now continuously archive WAL files as it generates them, allowing point-in-time recovery.
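
To confirm that archiving is keeping up, you can query the built-in pg_stat_archiver view, for example:

SELECT archived_count, last_archived_wal, last_archived_time FROM pg_stat_archiver;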

Performing point-in-time recovery

To recover to a specific point in time, first restore your base backup as described in the physical backup section. On PostgreSQL 12 and later, add the recovery settings to postgresql.conf and create an empty recovery.signal file in the data directory; on older versions, place them in a recovery.conf file in the data directory instead:

restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2026-01-02 14:30:00'

The restore_command tells PostgreSQL where to find archived WAL files. The recovery_target_time specifies when to stop replaying transactions. You can also use recovery_target_xid to recover to a specific transaction ID or recovery_target_name to recover to a named restore point.
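
Named restore points are created on the running server ahead of a risky change; the label below is just an example:

SELECT pg_create_restore_point('before_schema_migration');

You would then set recovery_target_name = 'before_schema_migration' instead of recovery_target_time.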

Start PostgreSQL, and it will enter recovery mode, replaying WAL files until reaching the target time:

systemctl start postgresql

When the target is reached, PostgreSQL pauses recovery by default (recovery_target_action = 'pause') so you can verify the result; run pg_wal_replay_resume() to finish recovery, or set recovery_target_action = 'promote' to resume normal operations automatically. This process can take considerable time depending on how many WAL files need replaying.
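
From a separate session you can check whether the server is still replaying WAL, for example:

SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();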

Automated backup solutions

Managing backups manually becomes challenging as your database infrastructure grows. Automated backup solutions handle scheduling, storage management and notifications, ensuring consistent backups without manual intervention. Modern tools offer features like encryption, compression and integration with cloud storage providers.

PostgreSQL backup tools like Databasus provide comprehensive automation for database backups. Databasus is a free, open source and self-hosted backup solution suitable for both individuals and enterprises. It supports flexible scheduling (hourly, daily, weekly, monthly or cron), multiple storage destinations (S3, Google Drive, local storage) and notifications through various channels (Email, Telegram, Slack, Discord).

To get started with Databasus, install it using Docker:

docker run -d \
  --name databasus \
  -p 4005:4005 \
  -v ./databasus-data:/databasus-data \
  --restart unless-stopped \
  databasus/databasus:latest

After installation, access the dashboard at http://localhost:4005 and follow these steps: add your database with connection credentials, select your preferred storage destination, configure the backup schedule and click "Create backup". Databasus uses AES-256-GCM encryption for secure backups and employs read-only database users to prevent accidental data modifications.

Backup best practices

A reliable backup strategy extends beyond simply creating backups. Testing your backups regularly ensures they're actually recoverable when needed. Many organizations discover their backups are corrupted or incomplete only during an actual disaster. Schedule regular restore tests to verify backup integrity and document your restoration procedures.

Storage location is equally critical to backup success. Storing backups on the same server as your database defeats the purpose, as hardware failure or security incidents could destroy both your data and backups simultaneously. Use multiple storage locations, including off-site or cloud storage, to protect against localized disasters. Consider following the 3-2-1 rule: maintain three copies of your data, on two different storage media, with one copy off-site.
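
For example, shipping a dump to object storage can be a single command (assuming the AWS CLI is configured and my-backup-bucket is a placeholder bucket name):

aws s3 cp backup.dump s3://my-backup-bucket/postgres/backup.dump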

Security considerations include:

  • Encrypt backups at rest and in transit to protect sensitive data (see the example after this list)
  • Restrict access to backup files using appropriate file permissions
  • Store encryption keys separately from encrypted backups
  • Implement backup retention policies to manage storage costs
  • Monitor backup jobs and set up alerts for failures
  • Document your backup and restore procedures
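
As a minimal sketch of the first point, a dump can be encrypted on its way to disk (assuming GnuPG is installed and a symmetric passphrase fits your threat model):

pg_dump -U postgres -d mydbname -F c | gpg --symmetric --cipher-algo AES256 -o backup.dump.gpg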

Regular maintenance ensures your backup system remains effective. Review and update backup schedules as your data volume grows. Monitor backup sizes and duration to identify performance issues before they become critical. Automate backup verification to catch problems early.
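
A simple automated restore test might look like the sketch below, where restore_test is a throwaway database and the sanity query against a hypothetical orders table should be replaced with checks that make sense for your schema:

createdb -U postgres restore_test
pg_restore -U postgres -d restore_test backup.dump
psql -U postgres -d restore_test -c "SELECT count(*) FROM orders;"
dropdb -U postgres restore_test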

Comparing backup tools and approaches

Choosing the right backup tool depends on your specific requirements, infrastructure and team capabilities. Native PostgreSQL tools like pg_dump and pg_basebackup provide reliable functionality without additional dependencies, making them suitable for simple scenarios. However, they lack automation features and require custom scripting for scheduling and monitoring.
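
For example, a nightly pg_dump driven by cron (the schedule and paths are placeholders) could be as simple as:

0 2 * * * pg_dump -U postgres -d mydbname -F c -f /backup/mydbname_$(date +\%F).dump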

Third-party backup solutions offer enhanced features but vary significantly in complexity and cost. Cloud provider services like AWS RDS automated backups integrate seamlessly with managed databases but may limit your control over backup formats and retention policies. Self-hosted tools like Databasus provide a middle ground, offering automation and advanced features while maintaining full control over your backup infrastructure.

| Tool | Scheduling | Multiple Storages | Encryption | UI | Cost |
| --- | --- | --- | --- | --- | --- |
| pg_dump | Manual/cron | No | No | No | Free |
| pg_basebackup | Manual/cron | No | No | No | Free |
| Databasus | Built-in | Yes | Yes | Yes | Free (open source) |
| Cloud provider tools | Built-in | Limited | Yes | Yes | Varies by provider |

When evaluating backup solutions, consider your recovery time objective (RTO) and recovery point objective (RPO). RTO defines how quickly you need to restore service after a failure, while RPO specifies the maximum acceptable data loss. PITR with WAL archiving provides the lowest RPO, often seconds, but requires more complex infrastructure. Daily logical backups might suffice for development environments where some data loss is acceptable.

Conclusion

Effective PostgreSQL backup strategies combine multiple approaches to balance recovery speed, flexibility and operational complexity. Logical backups with pg_dump provide portability and selective restoration, while physical backups enable faster recovery of large databases. Continuous archiving with WAL files offers the lowest data loss for critical systems requiring point-in-time recovery.

Automating your backup process with dedicated tools reduces operational burden and ensures consistency. Whether using native PostgreSQL utilities with custom scripts or comprehensive solutions like Databasus, the key is testing your backups regularly and documenting restoration procedures. A backup is only valuable if you can successfully restore from it when disaster strikes. Invest time in building a robust backup strategy now to avoid data loss and downtime later.
