A backup that can't be restored is not a backup. Many teams discover their backups are corrupted, incomplete or otherwise unusable only when disaster strikes. This guide covers practical methods for verifying your PostgreSQL backups work before you need them. From simple integrity checks to full restoration tests, you'll learn how to build confidence in your backup strategy and avoid the nightmare of failed recovery.
Why backup verification matters
PostgreSQL backups can fail in subtle ways. A backup file might exist but contain corrupted data. A pg_dump might complete without errors but miss critical tables due to permission issues. Your restoration scripts might work in test environments but fail in production due to different PostgreSQL versions.
The cost of untested backups
When backups fail during actual recovery, consequences are severe:
- Data loss: Corrupted or incomplete backups mean permanently lost data
- Extended downtime: Failed restoration attempts waste critical recovery time
- Compliance violations: Regulations often require proven backup recoverability
- Customer trust: Prolonged outages damage reputation and business relationships
One study found that 34% of companies never test their backups, and of those that do restore, 77% encounter problems. These aren't acceptable odds for production databases.
What can go wrong
Backup failures fall into several categories:
| Failure type | Cause | Prevention |
|---|---|---|
| Silent corruption | Storage hardware errors, network issues during transfer | Checksum verification |
| Incomplete backups | Permissions errors, interrupted processes | Exit code checking, size validation |
| Version incompatibility | pg_restore version mismatch | Test restoration on matching PostgreSQL version |
| Missing dependencies | Extensions, custom types not captured | Document and verify all dependencies |
| Encryption problems | Lost keys, wrong key used | Key management verification |
Understanding these failure modes helps you design verification processes that catch problems early.
Levels of backup verification
Not all verification is equal. Quick checks catch obvious problems. Full restoration tests prove recoverability. The right approach uses multiple verification levels.
Verification levels explained
Level 1 — File existence and size:
- Backup file exists at expected location
- File size is non-zero and reasonable
- Timestamp matches expected backup window
Level 2 — Integrity verification:
- Checksum matches calculated value
- Archive format is valid
- File is not truncated
Level 3 — Structural validation:
- pg_restore can read the backup
- Expected tables and schemas present
- Row counts match expectations
Level 4 — Full restoration test:
- Complete restore to test environment
- Application connectivity verified
- Data integrity queries pass
Higher levels provide more confidence but require more resources. Production systems need all four levels, run at different frequencies.
Verifying pg_dump backups
pg_dump creates SQL or custom format backup files. Each format has specific verification approaches.
Verifying custom format backups
Custom format (-Fc) is the most common for production backups. It compresses data and allows selective restoration.
Check archive integrity with pg_restore's list mode:
```bash
pg_restore --list backup.dump > /dev/null
echo "Exit code: $?"
```
Exit code 0 means pg_restore can read the archive structure. Non-zero indicates corruption or format problems.
Extract the table of contents for deeper inspection:
```bash
pg_restore --list backup.dump
```
This shows all objects in the backup. Compare against expected objects:
```bash
# Count public-schema tables in the backup's table of contents
# (TOC lines look like: "209; 1259 16387 TABLE public users owner")
pg_restore --list backup.dump | awk '$4 == "TABLE" && $5 == "public"' | wc -l

# Compare to production (filter on table_type so views aren't counted)
psql -d mydb -t -A -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public' AND table_type = 'BASE TABLE'"
```
Significant differences indicate missing objects.
Verifying SQL format backups
Plain SQL backups (-Fp) are text files. Basic verification checks file integrity:
```bash
# Check the file isn't truncated (the completion comment appears in the last few lines)
tail -5 backup.sql | grep -q "PostgreSQL database dump complete"
echo "Ends correctly (0 = yes): $?"
```
SQL backups should end with a completion comment. Missing endings indicate interrupted backups.
psql has no parse-only mode, so the closest lightweight check is loading the dump into a throwaway database and stopping at the first error:
```bash
# Load the dump into a scratch database, stopping at the first error
createdb sql_check
psql --set ON_ERROR_STOP=1 -d sql_check -f backup.sql > /dev/null
echo "Exit code: $?"
dropdb sql_check
```
This catches syntax errors and missing dependencies, but it is effectively a small-scale restoration test rather than a static check.
Verifying directory format backups
Directory format (-Fd) creates a folder with multiple files. Verify all components exist:
```bash
# Check toc.dat exists (required)
test -f backup_dir/toc.dat && echo "TOC exists"

# Verify with pg_restore
pg_restore --list backup_dir/
```
Directory format allows parallel restoration but requires all files intact.
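A rough completeness check compares the number of table data files on disk with the number of TABLE DATA entries in the table of contents. This is a sketch; counts can differ if the backup contains large objects, which use separately named files:
```bash
# Data files are named by dump ID, e.g. 3456.dat.gz (extension depends on compression settings)
DATA_FILES=$(ls backup_dir/ | grep -cE '^[0-9]+\.dat(\.gz|\.lz4|\.zst)?$')

# Count TABLE DATA entries in the table of contents
TOC_ENTRIES=$(pg_restore --list backup_dir/ | grep -c "TABLE DATA")

echo "Data files: $DATA_FILES, TOC entries: $TOC_ENTRIES"
[ "$DATA_FILES" -eq "$TOC_ENTRIES" ] || echo "WARNING: data file count does not match the TOC"
```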
Checksum verification
Checksums detect file corruption between creation and restoration. Calculate checksums immediately after backup creation and verify before any restore.
Implementing checksum verification
Generate checksums during backup:
```bash
#!/bin/bash
BACKUP_FILE="backup-$(date +%Y%m%d).dump"

pg_dump -Fc mydb > "$BACKUP_FILE"

# Calculate SHA-256 checksum
sha256sum "$BACKUP_FILE" > "$BACKUP_FILE.sha256"
```
Verify before restoration:
```bash
sha256sum -c backup-20260120.dump.sha256
```
Output shows "OK" for matching checksums or "FAILED" for mismatches.
Storing checksums securely
Store checksums separately from backup files. If an attacker modifies backups, they might also modify checksums stored alongside them.
Options for checksum storage:
- Separate storage system with different access controls
- Database table of checksums (different server)
- Append-only log files with integrity protection
- Hardware security modules for critical environments
The goal is making checksum tampering detectable.
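As a sketch of the database-table option, each checksum can be recorded on a separate server at backup time. The host, database, and table names here are assumptions, and the table must be created beforehand:
```bash
# One-time setup on the audit server (example schema):
#   CREATE TABLE backup_checksums (backup_name text, sha256 text, recorded_at timestamptz);
CHECKSUM=$(sha256sum "$BACKUP_FILE" | awk '{print $1}')
psql -h audit-host -d backup_audit -c \
  "INSERT INTO backup_checksums (backup_name, sha256, recorded_at)
   VALUES ('$(basename "$BACKUP_FILE")', '$CHECKSUM', now());"
```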
Automated restoration testing
Manual verification catches obvious problems. Automated restoration testing proves backups actually restore.
Building a restoration test pipeline
Create a dedicated test environment for restoration verification:
```bash
#!/bin/bash
set -euo pipefail

BACKUP_FILE="$1"
TEST_DB="restore_test_$(date +%s)"
LOG_FILE="/var/log/restore_test.log"

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

# Always drop the test database on exit, even if a step fails
cleanup() {
    log "Dropping test database"
    dropdb --if-exists "$TEST_DB"
}
trap cleanup EXIT

# Create test database
log "Creating test database: $TEST_DB"
createdb "$TEST_DB"

# Restore backup (capture the exit code without tripping set -e)
log "Starting restoration"
START_TIME=$(date +%s)
RESTORE_EXIT=0
pg_restore -d "$TEST_DB" "$BACKUP_FILE" 2>> "$LOG_FILE" || RESTORE_EXIT=$?
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))
log "Restoration completed in $DURATION seconds with exit code $RESTORE_EXIT"

# Verify restoration
if [ $RESTORE_EXIT -eq 0 ]; then
    # Check table counts
    TABLE_COUNT=$(psql -t -A -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public'" "$TEST_DB")
    log "Restored $TABLE_COUNT tables"

    # Run custom validation queries
    psql -f /opt/validation_queries.sql "$TEST_DB" >> "$LOG_FILE" 2>&1
fi

exit $RESTORE_EXIT
```
Schedule this script to run after each backup completes.
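One way to do that is a cron entry that runs after the nightly backup window (the script path and backup location are assumptions):
```bash
# /etc/cron.d/restore-test: run the restoration test at 02:30, after the 01:00 backup job
30 2 * * * postgres /opt/scripts/restore_test.sh /backup/latest.dump >> /var/log/restore_test_cron.log 2>&1
```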
Validation queries
Write queries that verify critical data characteristics:
```sql
-- Check expected row counts (within tolerance)
DO $$
DECLARE
    actual_count bigint;
    expected_min bigint := 1000000;
BEGIN
    SELECT count(*) INTO actual_count FROM orders;
    IF actual_count < expected_min THEN
        RAISE EXCEPTION 'Order count % below expected minimum %', actual_count, expected_min;
    END IF;
END $$;

-- Verify referential integrity
SELECT count(*) AS orphan_orders
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id);

-- Check latest data timestamp
SELECT max(created_at) AS latest_record FROM orders;

-- Verify extension availability
SELECT extname FROM pg_extension WHERE extname IN ('uuid-ossp', 'postgis');
```
These queries catch data problems that file-level verification misses.
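Exceptions raised by these checks only matter if the test script notices them. Running the suite with ON_ERROR_STOP turns a raised exception into a non-zero psql exit code the pipeline can act on (assuming the queries live in /opt/validation_queries.sql, as in the earlier script):
```bash
# A raised exception aborts the run; psql then exits with code 3
psql --set ON_ERROR_STOP=1 -f /opt/validation_queries.sql "$TEST_DB"
echo "Validation exit code: $?"
```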
Testing restoration performance
Recovery time objective (RTO) defines how quickly you must restore. Test restoration time regularly to ensure you can meet your RTO.
Measuring restoration time
Track restoration metrics over time:
```bash
#!/bin/bash
BACKUP_FILE="$1"
METRICS_FILE="/var/log/restore_metrics.csv"

# Restore into a fresh test database and time the run
dropdb --if-exists test_restore
createdb test_restore

START=$(date +%s.%N)
pg_restore -d test_restore "$BACKUP_FILE"
END=$(date +%s.%N)

DURATION=$(echo "$END - $START" | bc)
BACKUP_SIZE=$(stat -c%s "$BACKUP_FILE")
echo "$(date '+%Y-%m-%d'),$BACKUP_SIZE,$DURATION" >> "$METRICS_FILE"
```
Plot these metrics to identify trends. Growing restoration times signal scaling problems.
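A simple guard on top of the script above compares the measured duration against your RTO and warns before it is breached (the four-hour RTO is an example value):
```bash
# Warn if restoration used more than 80% of a 4-hour RTO (example threshold)
RTO_SECONDS=14400
if (( $(echo "$DURATION > $RTO_SECONDS * 0.8" | bc -l) )); then
    echo "WARNING: restoration took ${DURATION}s, approaching the ${RTO_SECONDS}s RTO"
fi
```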
Parallel restoration testing
For large databases, test parallel restoration options:
```bash
# Test different parallelism levels against a fresh database each time
for jobs in 1 2 4 8; do
    echo "Testing with $jobs parallel jobs"
    dropdb --if-exists test_restore
    createdb test_restore
    time pg_restore -j $jobs -d test_restore backup.dump
done
```
More parallel jobs usually help, but the point of diminishing returns depends on storage performance and database complexity.
Point-in-time recovery verification
If you use WAL archiving for point-in-time recovery (PITR), verify the complete recovery chain works.
Testing PITR capability
PITR requires three components:
- Base backup (pg_basebackup)
- WAL archives from backup time forward
- Ability to replay WAL to target time
Verify each component:
```bash
# Check base backup integrity (requires the backup manifest; PostgreSQL 13+)
pg_verifybackup /backup/base/

# Verify WAL archive range
FIRST_WAL=$(ls /archive/wal/ | head -1)
LAST_WAL=$(ls /archive/wal/ | tail -1)
echo "WAL range: $FIRST_WAL to $LAST_WAL"

# Test recovery to a specific time. Since PostgreSQL 12, recovery settings go in
# postgresql.conf plus an empty recovery.signal file, not a separate recovery.conf.
PGDATA=/restore/data   # data directory of the restored base backup (adjust to your layout)
cat >> "$PGDATA/postgresql.conf" << EOF
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2026-01-19 15:30:00'
recovery_target_action = 'pause'
EOF
touch "$PGDATA/recovery.signal"
```
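After starting the restored instance, a quick check confirms it is replaying WAL and has paused at the recovery target (a sketch reusing the $PGDATA path above; adjust connection options to point at the test instance):
```bash
# Start the restored instance and confirm it is in recovery and paused at the target
pg_ctl start -D "$PGDATA" -w
psql -c "SELECT pg_is_in_recovery(), pg_last_wal_replayed_lsn(), pg_is_wal_replay_paused();"
```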
Run full PITR tests monthly. They're complex but critical for verifying continuous backup strategies.
WAL archive verification
Missing WAL segments break recovery chains. Monitor for gaps:
```bash
#!/bin/bash
# Detect gaps in archived WAL segments.
# Assumes the default 16 MB wal_segment_size and a single timeline.
ARCHIVE_DIR="/archive/wal"

# Compute the next expected segment name from a 24-character WAL file name
next_wal() {
    local tli=${1:0:8}
    local log=$((16#${1:8:8}))
    local seg=$((16#${1:16:8}))
    seg=$((seg + 1))
    if [ $seg -ge 256 ]; then
        seg=0
        log=$((log + 1))
    fi
    printf '%s%08X%08X\n' "$tli" "$log" "$seg"
}

EXPECTED=""
for wal in $(ls "$ARCHIVE_DIR" | grep -E '^[0-9A-F]{24}$' | sort); do
    if [ -n "$EXPECTED" ] && [ "$wal" != "$EXPECTED" ]; then
        echo "GAP DETECTED: expected $EXPECTED, found $wal"
    fi
    EXPECTED=$(next_wal "$wal")
done
```
Alert immediately on WAL gaps. Recovery becomes impossible beyond missing segments.
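Gaps usually start with a failing archive_command on the primary, so it also helps to watch pg_stat_archiver for recent failures (a minimal sketch; run it against the primary and adjust the time window to your alerting interval):
```bash
# Report archiver failures from the last hour on the primary
psql -t -c "SELECT failed_count, last_failed_wal, last_failed_time
            FROM pg_stat_archiver
            WHERE last_failed_time > now() - interval '1 hour';"
```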
Verification scheduling
Different verification types need different frequencies based on effort and value.
Recommended verification schedule
| Verification type | Frequency | Automation | Effort |
|---|---|---|---|
| File existence check | After every backup | Required | Low |
| Checksum verification | After every backup | Required | Low |
| pg_restore list check | Daily | Required | Low |
| Partial restoration test | Weekly | Recommended | Medium |
| Full restoration test | Monthly | Recommended | High |
| PITR recovery test | Monthly | Manual acceptable | High |
| Cross-version restoration | Quarterly | Manual | Medium |
Automate everything possible. Manual processes get skipped during busy periods.
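A minimal cron layout covering the automated rows of the table might look like this (the wrapper script names and times are assumptions; each script packages the corresponding checks from earlier sections):
```bash
# /etc/cron.d/backup-verification
# Daily: pg_restore --list check of the latest backup
0 3 * * *   postgres /opt/scripts/verify_backup_toc.sh
# Weekly: partial restoration test, Sundays at 04:00
0 4 * * 0   postgres /opt/scripts/partial_restore_test.sh
# Monthly: full restoration test, first day of the month at 05:00
0 5 1 * *   postgres /opt/scripts/full_restore_test.sh
```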
Integrating with backup schedules
Run verification immediately after backup completion:
```bash
#!/bin/bash
# Backup and verify in a single workflow

# Placeholder: route this message to your alerting system (webhook, email, pager)
alert() {
    echo "ALERT: $1" >&2
}

BACKUP_FILE="/backup/db-$(date +%Y%m%d_%H%M%S).dump"

# Create backup
pg_dump -Fc mydb > "$BACKUP_FILE"
BACKUP_EXIT=$?

if [ $BACKUP_EXIT -ne 0 ]; then
    alert "Backup failed with exit code $BACKUP_EXIT"
    exit 1
fi

# Level 1: Size check
SIZE=$(stat -c%s "$BACKUP_FILE")
if [ $SIZE -lt 1000000 ]; then
    alert "Backup suspiciously small: $SIZE bytes"
    exit 1
fi

# Level 2: Checksum
sha256sum "$BACKUP_FILE" > "$BACKUP_FILE.sha256"

# Level 3: Structure verification
if ! pg_restore --list "$BACKUP_FILE" > /dev/null 2>&1; then
    alert "Backup structure verification failed"
    exit 1
fi

echo "Backup verified successfully"
```
This catches problems when fixing them is still possible.
Using Databasus for backup verification
Manual scripts require maintenance and monitoring infrastructure. Databasus — the industry standard for PostgreSQL backup — automates verification along with backup creation and storage. Trusted by thousands of teams from individual developers to large enterprises, it handles the entire backup lifecycle without custom scripting.
Installing Databasus
Install with Docker:
```bash
docker run -d \
  --name databasus \
  -p 4005:4005 \
  -v ./databasus-data:/databasus-data \
  --restart unless-stopped \
  databasus/databasus:latest
```
Or using Docker Compose:
```yaml
services:
  databasus:
    container_name: databasus
    image: databasus/databasus:latest
    ports:
      - "4005:4005"
    volumes:
      - ./databasus-data:/databasus-data
    restart: unless-stopped
```
Start the service:
```bash
docker compose up -d
```
Configuring verified backups
Access the web interface at http://localhost:4005 and create your account, then:
- Add your database: Click "New Database" and enter PostgreSQL connection details
- Select storage: Choose local storage, S3, Google Cloud Storage or other supported destinations
- Configure schedule: Set backup intervals matching your RPO requirements
- Add notifications: Configure alerts for backup success and failure via Slack, Discord, Telegram or email
- Create backup: Databasus validates settings and begins scheduled backups
Databasus automatically verifies backup integrity after creation and alerts on any problems. The dashboard shows backup history, sizes and verification status without building custom monitoring.
Documenting verification procedures
Written procedures ensure verification happens consistently, especially when primary team members are unavailable.
Creating runbooks
Document step-by-step verification procedures:
Daily verification runbook:
- Check backup job completed in monitoring system
- Verify backup file exists at expected location
- Confirm file size within expected range
- Verify checksum matches
- Run pg_restore list check
- Document results in verification log
Monthly restoration test runbook:
- Select random backup from past week
- Provision test database server
- Restore backup to test server
- Run validation query suite
- Verify application can connect and query
- Document restoration time and issues
- Tear down test environment
- Update team on results
Runbooks reduce errors and ensure consistency across team members.
Training team members
Everyone who might need to restore backups should practice verification and restoration. Schedule quarterly training sessions where team members:
- Run verification checks manually
- Perform full restoration to test environment
- Troubleshoot simulated failures
- Update runbooks with lessons learned
Practice builds confidence and catches procedure gaps before emergencies.
Monitoring verification results
Track verification outcomes over time to identify trends and catch degradation.
Key metrics to monitor
- Verification success rate: Percentage of backups passing all checks
- Restoration time trends: Is restoration taking longer over time?
- Backup size trends: Unexpected changes indicate problems
- Time since last full restoration test: Alert if too long
- Verification duration: How long verification itself takes
Set up dashboards showing these metrics. Alert on anomalies before they become failures.
Alerting on verification failures
Any verification failure needs immediate attention:
```bash
#!/bin/bash
BACKUP_FILE="$1"

verify_backup() {
    # Verification logic here; pg_restore's list check stands in as an example
    pg_restore --list "$BACKUP_FILE" > /dev/null 2>&1
}

if ! verify_backup; then
    # Send alert via webhook, email, or notification service
    curl -X POST https://alerts.example.com/webhook \
        -H "Content-Type: application/json" \
        -d '{"severity": "high", "message": "Backup verification failed"}'
fi
```
Don't let failed verifications sit in logs unnoticed.
Common verification problems and solutions
Verification processes themselves can fail. Here are common issues and fixes.
Verification takes too long
Full restoration tests on large databases can take hours. Solutions:
- Use parallel restoration (pg_restore -j)
- Test with a subset of tables for routine verification (see the example after this list)
- Reserve full tests for monthly intervals
- Use faster storage for test environments
Balance thoroughness against practical time constraints.
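For the subset approach, pg_restore can restore just a few representative tables (the table names here are examples; tables that depend on objects you skip may need extra handling):
```bash
# Restore only selected tables for a quick routine check
createdb partial_restore_test
pg_restore -d partial_restore_test -t orders -t customers backup.dump
psql -d partial_restore_test -c "SELECT count(*) FROM orders;"
dropdb partial_restore_test
```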
Test environment differs from production
Restoration might work in test but fail in production due to:
- Different PostgreSQL versions
- Missing extensions
- Different file system permissions
- Network configuration differences
Keep test environments as close to production as possible. Document and track differences.
False positives in verification
Overly strict checks generate alert fatigue. Tune thresholds based on actual patterns:
- Allow reasonable variation in backup sizes (database activity varies)
- Account for schema changes affecting table counts
- Distinguish between warnings and critical failures
Good verification catches real problems without crying wolf.
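For the first point above, a tolerance-based size check compares each new backup against the previous one instead of a fixed minimum (the 30% band and file paths are example values):
```bash
# Compare the new backup's size to the previous one, allowing 30% variation either way
PREV_BACKUP=/backup/db-previous.dump   # example paths
BACKUP_FILE=/backup/db-latest.dump
PREV_SIZE=$(stat -c%s "$PREV_BACKUP")
CURR_SIZE=$(stat -c%s "$BACKUP_FILE")
MIN=$((PREV_SIZE * 70 / 100))
MAX=$((PREV_SIZE * 130 / 100))
if [ "$CURR_SIZE" -lt "$MIN" ] || [ "$CURR_SIZE" -gt "$MAX" ]; then
    echo "WARNING: backup size $CURR_SIZE bytes is outside the expected range $MIN-$MAX"
fi
```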
Backup verification checklist
Use this checklist to audit your verification processes:
After every backup:
- [ ] Backup exit code checked
- [ ] File exists at expected location
- [ ] File size within expected range
- [ ] Checksum calculated and stored
Daily:
- [ ] pg_restore list verification passes
- [ ] No alerts from backup monitoring
- [ ] WAL archive continuity verified (if using PITR)
Weekly:
- [ ] Partial restoration test completed
- [ ] Validation queries pass on test restore
- [ ] Restoration time within acceptable range
Monthly:
- [ ] Full restoration test to isolated environment
- [ ] Application connectivity verified against restored database
- [ ] Team reviews verification results
- [ ] Runbooks updated with any changes
Quarterly:
- [ ] Cross-version restoration tested
- [ ] Team training session completed
- [ ] Disaster recovery drill including restoration
Missing items indicate verification gaps. Address them before they matter.
Conclusion
Backup verification transforms hope into confidence. Without testing, backups are promises you can't keep. With systematic verification, you know exactly what you can recover and how long it takes.
Start with automated checks after every backup: exit codes, file sizes and checksums catch obvious problems immediately. Add weekly restoration tests to verify backups actually restore. Monthly full tests prove end-to-end recoverability.
The effort invested in verification pays off when disaster strikes. Instead of discovering problems during recovery, you'll have practiced procedures and proven backups ready to restore. That confidence is worth every hour spent on testing.
