A backup that can't be restored is not a backup. Many teams discover their backups are corrupted, incomplete or otherwise unusable only when disaster strikes. This guide covers practical methods for verifying your PostgreSQL backups work before you need them. From simple integrity checks to full restoration tests, you'll learn how to build confidence in your backup strategy and avoid the nightmare of failed recovery.
Why backup verification matters
PostgreSQL backups can fail in subtle ways. A backup file might exist but contain corrupted data. A pg_dump might complete without errors but miss critical tables due to permission issues. Your restoration scripts might work in test environments but fail in production due to different PostgreSQL versions.
The cost of untested backups
When backups fail during actual recovery, consequences are severe:
- Data loss: Corrupted or incomplete backups mean permanently lost data
- Extended downtime: Failed restoration attempts waste critical recovery time
- Compliance violations: Regulations often require proven backup recoverability
- Customer trust: Prolonged outages damage reputation and business relationships
One study found that 34% of companies never test their backups, and of those that do restore, 77% encounter problems. These aren't acceptable odds for production databases.
What can go wrong
Backup failures fall into several categories:
| Failure type | Cause | Prevention |
|---|---|---|
| Silent corruption | Storage hardware errors, network issues during transfer | Checksum verification |
| Incomplete backups | Permissions errors, interrupted processes | Exit code checking, size validation |
| Version incompatibility | pg_restore version mismatch | Test restoration on matching PostgreSQL version |
| Missing dependencies | Extensions, custom types not captured | Document and verify all dependencies |
| Encryption problems | Lost keys, wrong key used | Key management verification |
Understanding these failure modes helps you design verification processes that catch problems early.
Levels of backup verification
Not all verification is equal. Quick checks catch obvious problems. Full restoration tests prove recoverability. The right approach uses multiple verification levels.
Verification levels explained
Level 1 — File existence and size:
- Backup file exists at expected location
- File size is non-zero and reasonable
- Timestamp matches expected backup window
Level 2 — Integrity verification:
- Checksum matches calculated value
- Archive format is valid
- File is not truncated
Level 3 — Structural validation:
- pg_restore can read the backup
- Expected tables and schemas present
- Row counts match expectations
Level 4 — Full restoration test:
- Complete restore to test environment
- Application connectivity verified
- Data integrity queries pass
Higher levels provide more confidence but require more resources. Production systems need all four levels, run at different frequencies.
Verifying pg_dump backups
pg_dump creates SQL or custom format backup files. Each format has specific verification approaches.
Verifying custom format backups
Custom format (-Fc) is the most common for production backups. It compresses data and allows selective restoration.
Check archive integrity with pg_restore's list mode:
```bash
pg_restore --list backup.dump > /dev/null
echo "Exit code: $?"
```
Exit code 0 means pg_restore can read the archive structure. Non-zero indicates corruption or format problems.
Extract the table of contents for deeper inspection:
```bash
pg_restore --list backup.dump
```
This shows all objects in the backup. Compare against expected objects:
```bash
# Count public-schema tables in the backup's table of contents
# (TOC lines look like: "209; 1259 16387 TABLE public users owner")
pg_restore --list backup.dump | awk '$4 == "TABLE" && $5 == "public"' | wc -l

# Compare to production (filter on table_type so views aren't counted)
psql -d mydb -t -A -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public' AND table_type = 'BASE TABLE'"
```
Significant differences indicate missing objects.
Verifying SQL format backups
Plain SQL backups (-Fp) are text files. Basic verification checks file integrity:
```bash
# Check the file isn't truncated (the completion comment appears in the last few lines)
tail -5 backup.sql | grep -q "PostgreSQL database dump complete"
echo "Ends correctly (0 = yes): $?"
```
SQL backups should end with a completion comment. Missing endings indicate interrupted backups.
psql has no parse-only mode, so the closest lightweight check is loading the dump into a throwaway database and stopping at the first error:
```bash
# Load the dump into a scratch database, stopping at the first error
createdb sql_check
psql --set ON_ERROR_STOP=1 -d sql_check -f backup.sql > /dev/null
echo "Exit code: $?"
dropdb sql_check
```
This catches syntax errors and missing dependencies, but it is effectively a small-scale restoration test rather than a static check.
Verifying directory format backups
Directory format (-Fd) creates a folder with multiple files. Verify all components exist:
```bash
# Check toc.dat exists (required)
test -f backup_dir/toc.dat && echo "TOC exists"

# Verify with pg_restore
pg_restore --list backup_dir/
```
Directory format allows parallel restoration but requires all files intact.
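A rough completeness check compares the number of table data files on disk with the number of TABLE DATA entries in the table of contents. This is a sketch; counts can differ if the backup contains large objects, which use separately named files:
```bash
# Data files are named by dump ID, e.g. 3456.dat.gz (extension depends on compression settings)
DATA_FILES=$(ls backup_dir/ | grep -cE '^[0-9]+\.dat(\.gz|\.lz4|\.zst)?$')

# Count TABLE DATA entries in the table of contents
TOC_ENTRIES=$(pg_restore --list backup_dir/ | grep -c "TABLE DATA")

echo "Data files: $DATA_FILES, TOC entries: $TOC_ENTRIES"
[ "$DATA_FILES" -eq "$TOC_ENTRIES" ] || echo "WARNING: data file count does not match the TOC"
```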
Checksum verification
Checksums detect file corruption between creation and restoration. Calculate checksums immediately after backup creation and verify before any restore.
Implementing checksum verification
Generate checksums during backup:
```bash
#!/bin/bash
BACKUP_FILE="backup-$(date +%Y%m%d).dump"

pg_dump -Fc mydb > "$BACKUP_FILE"

# Calculate SHA-256 checksum
sha256sum "$BACKUP_FILE" > "$BACKUP_FILE.sha256"
```
Verify before restoration:
```bash
sha256sum -c backup-20260120.dump.sha256
```
Output shows "OK" for matching checksums or "FAILED" for mismatches.
Storing checksums securely
Store checksums separately from backup files. If an attacker modifies backups, they might also modify checksums stored alongside them.
Options for checksum storage:
- Separate storage system with different access controls
- Database table of checksums (different server)
- Append-only log files with integrity protection
- Hardware security modules for critical environments
The goal is making checksum tampering detectable.
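As a sketch of the database-table option, each checksum can be recorded on a separate server at backup time. The host, database, and table names here are assumptions, and the table must be created beforehand:
```bash
# One-time setup on the audit server (example schema):
#   CREATE TABLE backup_checksums (backup_name text, sha256 text, recorded_at timestamptz);
CHECKSUM=$(sha256sum "$BACKUP_FILE" | awk '{print $1}')
psql -h audit-host -d backup_audit -c \
  "INSERT INTO backup_checksums (backup_name, sha256, recorded_at)
   VALUES ('$(basename "$BACKUP_FILE")', '$CHECKSUM', now());"
```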
Automated restoration testing
Manual verification catches obvious problems. Automated restoration testing proves backups actually restore.
Building a restoration test pipeline
Create a dedicated test environment for restoration verification:
```bash
#!/bin/bash
set -euo pipefail

BACKUP_FILE="$1"
TEST_DB="restore_test_$(date +%s)"
LOG_FILE="/var/log/restore_test.log"

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

# Always drop the test database on exit, even if a step fails
cleanup() {
    log "Dropping test database"
    dropdb --if-exists "$TEST_DB"
}
trap cleanup EXIT

# Create test database
log "Creating test database: $TEST_DB"
createdb "$TEST_DB"

# Restore backup (capture the exit code without tripping set -e)
log "Starting restoration"
START_TIME=$(date +%s)
RESTORE_EXIT=0
pg_restore -d "$TEST_DB" "$BACKUP_FILE" 2>> "$LOG_FILE" || RESTORE_EXIT=$?
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))
log "Restoration completed in $DURATION seconds with exit code $RESTORE_EXIT"

# Verify restoration
if [ $RESTORE_EXIT -eq 0 ]; then
    # Check table counts
    TABLE_COUNT=$(psql -t -A -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public'" "$TEST_DB")
    log "Restored $TABLE_COUNT tables"

    # Run custom validation queries
    psql -f /opt/validation_queries.sql "$TEST_DB" >> "$LOG_FILE" 2>&1
fi

exit $RESTORE_EXIT
```
Schedule this script to run after each backup completes.
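One way to do that is a cron entry that runs after the nightly backup window (the script path and backup location are assumptions):
```bash
# /etc/cron.d/restore-test: run the restoration test at 02:30, after the 01:00 backup job
30 2 * * * postgres /opt/scripts/restore_test.sh /backup/latest.dump >> /var/log/restore_test_cron.log 2>&1
```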
Validation queries
Write queries that verify critical data characteristics:
```sql
-- Check expected row counts (within tolerance)
DO $$
DECLARE
    actual_count bigint;
    expected_min bigint := 1000000;
BEGIN
    SELECT count(*) INTO actual_count FROM orders;
    IF actual_count < expected_min THEN
        RAISE EXCEPTION 'Order count % below expected minimum %', actual_count, expected_min;
    END IF;
END $$;

-- Verify referential integrity
SELECT count(*) AS orphan_orders
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id);

-- Check latest data timestamp
SELECT max(created_at) AS latest_record FROM orders;

-- Verify extension availability
SELECT extname FROM pg_extension WHERE extname IN ('uuid-ossp', 'postgis');
```
These queries catch data problems that file-level verification misses.
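Exceptions raised by these checks only matter if the test script notices them. Running the suite with ON_ERROR_STOP turns a raised exception into a non-zero psql exit code the pipeline can act on (assuming the queries live in /opt/validation_queries.sql, as in the earlier script):
```bash
# A raised exception aborts the run; psql then exits with code 3
psql --set ON_ERROR_STOP=1 -f /opt/validation_queries.sql "$TEST_DB"
echo "Validation exit code: $?"
```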
Testing restoration performance
Recovery time objective (RTO) defines how quickly you must restore. Test restoration time regularly to ensure you can meet your RTO.
Measuring restoration time
Track restoration metrics over time:
```bash
#!/bin/bash
BACKUP_FILE="$1"
METRICS_FILE="/var/log/restore_metrics.csv"

# Restore into a fresh test database and time the run
dropdb --if-exists test_restore
createdb test_restore

START=$(date +%s.%N)
pg_restore -d test_restore "$BACKUP_FILE"
END=$(date +%s.%N)

DURATION=$(echo "$END - $START" | bc)
BACKUP_SIZE=$(stat -c%s "$BACKUP_FILE")
echo "$(date '+%Y-%m-%d'),$BACKUP_SIZE,$DURATION" >> "$METRICS_FILE"
```
Plot these metrics to identify trends. Growing restoration times signal scaling problems.
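A simple guard on top of the script above compares the measured duration against your RTO and warns before it is breached (the four-hour RTO is an example value):
```bash
# Warn if restoration used more than 80% of a 4-hour RTO (example threshold)
RTO_SECONDS=14400
if (( $(echo "$DURATION > $RTO_SECONDS * 0.8" | bc -l) )); then
    echo "WARNING: restoration took ${DURATION}s, approaching the ${RTO_SECONDS}s RTO"
fi
```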
Parallel restoration testing
For large databases, test parallel restoration options:
```bash
# Test different parallelism levels against a fresh database each time
for jobs in 1 2 4 8; do
    echo "Testing with $jobs parallel jobs"
    dropdb --if-exists test_restore
    createdb test_restore
    time pg_restore -j $jobs -d test_restore backup.dump
done
```
More parallel jobs usually help, but the point of diminishing returns depends on storage performance and database complexity.
Point-in-time recovery verification
If you use WAL archiving for point-in-time recovery (PITR), verify the complete recovery chain works.
Testing PITR capability
PITR requires three components:
- Base backup (pg_basebackup)
- WAL archives from backup time forward
- Ability to replay WAL to target time
Verify each component:
```bash
# Check base backup integrity (requires the backup manifest; PostgreSQL 13+)
pg_verifybackup /backup/base/

# Verify WAL archive range
FIRST_WAL=$(ls /archive/wal/ | head -1)
LAST_WAL=$(ls /archive/wal/ | tail -1)
echo "WAL range: $FIRST_WAL to $LAST_WAL"

# Test recovery to a specific time. Since PostgreSQL 12, recovery settings go in
# postgresql.conf plus an empty recovery.signal file, not a separate recovery.conf.
PGDATA=/restore/data   # data directory of the restored base backup (adjust to your layout)
cat >> "$PGDATA/postgresql.conf" << EOF
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2026-01-19 15:30:00'
recovery_target_action = 'pause'
EOF
touch "$PGDATA/recovery.signal"
```
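After starting the restored instance, a quick check confirms it is replaying WAL and has paused at the recovery target (a sketch reusing the $PGDATA path above; adjust connection options to point at the test instance):
```bash
# Start the restored instance and confirm it is in recovery and paused at the target
pg_ctl start -D "$PGDATA" -w
psql -c "SELECT pg_is_in_recovery(), pg_last_wal_replayed_lsn(), pg_is_wal_replay_paused();"
```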
Run full PITR tests monthly. They're complex but critical for verifying continuous backup strategies.
WAL archive verification
Missing WAL segments break recovery chains. Monitor for gaps:
```bash
#!/bin/bash
# Detect gaps in archived WAL segments.
# Assumes the default 16 MB wal_segment_size and a single timeline.
ARCHIVE_DIR="/archive/wal"

# Compute the next expected segment name from a 24-character WAL file name
next_wal() {
    local tli=${1:0:8}
    local log=$((16#${1:8:8}))
    local seg=$((16#${1:16:8}))
    seg=$((seg + 1))
    if [ $seg -ge 256 ]; then
        seg=0
        log=$((log + 1))
    fi
    printf '%s%08X%08X\n' "$tli" "$log" "$seg"
}

EXPECTED=""
for wal in $(ls "$ARCHIVE_DIR" | grep -E '^[0-9A-F]{24}$' | sort); do
    if [ -n "$EXPECTED" ] && [ "$wal" != "$EXPECTED" ]; then
        echo "GAP DETECTED: expected $EXPECTED, found $wal"
    fi
    EXPECTED=$(next_wal "$wal")
done
```
Alert immediately on WAL gaps. Recovery becomes impossible beyond missing segments.
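Gaps usually start with a failing archive_command on the primary, so it also helps to watch pg_stat_archiver for recent failures (a minimal sketch; run it against the primary and adjust the time window to your alerting interval):
```bash
# Report archiver failures from the last hour on the primary
psql -t -c "SELECT failed_count, last_failed_wal, last_failed_time
            FROM pg_stat_archiver
            WHERE last_failed_time > now() - interval '1 hour';"
```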
Verification scheduling
Different verification types need different frequencies based on effort and value.
Recommended verification schedule
| Verification type | Frequency | Automation | Effort |
|---|---|---|---|
| File existence check | After every backup | Required | Low |
| Checksum verification | After every backup | Required | Low |
| pg_restore list check | Daily | Required | Low |
| Partial restoration test | Weekly | Recommended | Medium |
| Full restoration test | Monthly | Recommended | High |
| PITR recovery test | Monthly | Manual acceptable | High |
| Cross-version restoration | Quarterly | Manual | Medium |
Automate everything possible. Manual processes get skipped during busy periods.
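A minimal cron layout covering the automated rows of the table might look like this (the wrapper script names and times are assumptions; each script packages the corresponding checks from earlier sections):
```bash
# /etc/cron.d/backup-verification
# Daily: pg_restore --list check of the latest backup
0 3 * * *   postgres /opt/scripts/verify_backup_toc.sh
# Weekly: partial restoration test, Sundays at 04:00
0 4 * * 0   postgres /opt/scripts/partial_restore_test.sh
# Monthly: full restoration test, first day of the month at 05:00
0 5 1 * *   postgres /opt/scripts/full_restore_test.sh
```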
Integrating with backup schedules
Run verification immediately after backup completion:
```bash
#!/bin/bash
# Backup and verify in a single workflow

# Placeholder: route this message to your alerting system (webhook, email, pager)
alert() {
    echo "ALERT: $1" >&2
}

BACKUP_FILE="/backup/db-$(date +%Y%m%d_%H%M%S).dump"

# Create backup
pg_dump -Fc mydb > "$BACKUP_FILE"
BACKUP_EXIT=$?

if [ $BACKUP_EXIT -ne 0 ]; then
    alert "Backup failed with exit code $BACKUP_EXIT"
    exit 1
fi

# Level 1: Size check
SIZE=$(stat -c%s "$BACKUP_FILE")
if [ $SIZE -lt 1000000 ]; then
    alert "Backup suspiciously small: $SIZE bytes"
    exit 1
fi

# Level 2: Checksum
sha256sum "$BACKUP_FILE" > "$BACKUP_FILE.sha256"

# Level 3: Structure verification
if ! pg_restore --list "$BACKUP_FILE" > /dev/null 2>&1; then
    alert "Backup structure verification failed"
    exit 1
fi

echo "Backup verified successfully"
```
This catches problems when fixing them is still possible.
Using Databasus for backup verification
Manual scripts require maintenance and monitoring infrastructure. Databasus — the industry standard for PostgreSQL backup — automates verification along with backup creation and storage. Trusted by thousands of teams from individual developers to large enterprises, it handles the entire backup lifecycle without custom scripting.
Installing Databasus
Install with Docker:
```bash
docker run -d \
  --name databasus \
  -p 4005:4005 \
  -v ./databasus-data:/databasus-data \
  --restart unless-stopped \
  databasus/databasus:latest
```
Or using Docker Compose:
```yaml
services:
  databasus:
    container_name: databasus
    image: databasus/databasus:latest
    ports:
      - "4005:4005"
    volumes:
      - ./databasus-data:/databasus-data
    restart: unless-stopped
```
Start the service:
```bash
docker compose up -d
```
Configuring verified backups
Access the web interface at http://localhost:4005 and create your account, then:
- Add your database: Click "New Database" and enter PostgreSQL connection details
- Select storage: Choose local storage, S3, Google Cloud Storage or other supported destinations
- Configure schedule: Set backup intervals matching your RPO requirements
- Add notifications: Configure alerts for backup success and failure via Slack, Discord, Telegram or email
- Create backup: Databasus validates settings and begins scheduled backups
Databasus automatically verifies backup integrity after creation and alerts on any problems. The dashboard shows backup history, sizes and verification status without building custom monitoring.
Documenting verification procedures
Written procedures ensure verification happens consistently, especially when primary team members are unavailable.
Creating runbooks
Document step-by-step verification procedures:
Daily verification runbook:
- Check backup job completed in monitoring system
- Verify backup file exists at expected location
- Confirm file size within expected range
- Verify checksum matches
- Run pg_restore list check
- Document results in verification log
Monthly restoration test runbook:
- Select random backup from past week
- Provision test database server
- Restore backup to test server
- Run validation query suite
- Verify application can connect and query
- Document restoration time and issues
- Tear down test environment
- Update team on results
Runbooks reduce errors and ensure consistency across team members.
Training team members
Everyone who might need to restore backups should practice verification and restoration. Schedule quarterly training sessions where team members:
- Run verification checks manually
- Perform full restoration to test environment
- Troubleshoot simulated failures
- Update runbooks with lessons learned
Practice builds confidence and catches procedure gaps before emergencies.
Monitoring verification results
Track verification outcomes over time to identify trends and catch degradation.
Key metrics to monitor
- Verification success rate: Percentage of backups passing all checks
- Restoration time trends: Is restoration taking longer over time?
- Backup size trends: Unexpected changes indicate problems
- Time since last full restoration test: Alert if too long
- Verification duration: How long verification itself takes
Set up dashboards showing these metrics. Alert on anomalies before they become failures.
Alerting on verification failures
Any verification failure needs immediate attention:
```bash
#!/bin/bash
BACKUP_FILE="$1"

verify_backup() {
    # Verification logic here; pg_restore's list check stands in as an example
    pg_restore --list "$BACKUP_FILE" > /dev/null 2>&1
}

if ! verify_backup; then
    # Send alert via webhook, email, or notification service
    curl -X POST https://alerts.example.com/webhook \
        -H "Content-Type: application/json" \
        -d '{"severity": "high", "message": "Backup verification failed"}'
fi
```
Don't let failed verifications sit in logs unnoticed.
Common verification problems and solutions
Verification processes themselves can fail. Here are common issues and fixes.
Verification takes too long
Full restoration tests on large databases can take hours. Solutions:
- Use parallel restoration (pg_restore -j)
- Test with a subset of tables for routine verification (see the example after this list)
- Reserve full tests for monthly intervals
- Use faster storage for test environments
Balance thoroughness against practical time constraints.
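For the subset approach, pg_restore can restore just a few representative tables (the table names here are examples; tables that depend on objects you skip may need extra handling):
```bash
# Restore only selected tables for a quick routine check
createdb partial_restore_test
pg_restore -d partial_restore_test -t orders -t customers backup.dump
psql -d partial_restore_test -c "SELECT count(*) FROM orders;"
dropdb partial_restore_test
```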
Test environment differs from production
Restoration might work in test but fail in production due to:
- Different PostgreSQL versions
- Missing extensions
- Different file system permissions
- Network configuration differences
Keep test environments as close to production as possible. Document and track differences.
False positives in verification
Overly strict checks generate alert fatigue. Tune thresholds based on actual patterns:
- Allow reasonable variation in backup sizes (database activity varies)
- Account for schema changes affecting table counts
- Distinguish between warnings and critical failures
Good verification catches real problems without crying wolf.
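For the first point above, a tolerance-based size check compares each new backup against the previous one instead of a fixed minimum (the 30% band and file paths are example values):
```bash
# Compare the new backup's size to the previous one, allowing 30% variation either way
PREV_BACKUP=/backup/db-previous.dump   # example paths
BACKUP_FILE=/backup/db-latest.dump
PREV_SIZE=$(stat -c%s "$PREV_BACKUP")
CURR_SIZE=$(stat -c%s "$BACKUP_FILE")
MIN=$((PREV_SIZE * 70 / 100))
MAX=$((PREV_SIZE * 130 / 100))
if [ "$CURR_SIZE" -lt "$MIN" ] || [ "$CURR_SIZE" -gt "$MAX" ]; then
    echo "WARNING: backup size $CURR_SIZE bytes is outside the expected range $MIN-$MAX"
fi
```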
Backup verification checklist
Use this checklist to audit your verification processes:
After every backup:
- [ ] Backup exit code checked
- [ ] File exists at expected location
- [ ] File size within expected range
- [ ] Checksum calculated and stored
Daily:
- [ ] pg_restore list verification passes
- [ ] No alerts from backup monitoring
- [ ] WAL archive continuity verified (if using PITR)
Weekly:
- [ ] Partial restoration test completed
- [ ] Validation queries pass on test restore
- [ ] Restoration time within acceptable range
Monthly:
- [ ] Full restoration test to isolated environment
- [ ] Application connectivity verified against restored database
- [ ] Team reviews verification results
- [ ] Runbooks updated with any changes
Quarterly:
- [ ] Cross-version restoration tested
- [ ] Team training session completed
- [ ] Disaster recovery drill including restoration
Missing items indicate verification gaps. Address them before they matter.
Conclusion
Backup verification transforms hope into confidence. Without testing, backups are promises you can't keep. With systematic verification, you know exactly what you can recover and how long it takes.
Start with automated checks after every backup: exit codes, file sizes and checksums catch obvious problems immediately. Add weekly restoration tests to verify backups actually restore. Monthly full tests prove end-to-end recoverability.
The effort invested in verification pays off when disaster strikes. Instead of discovering problems during recovery, you'll have practiced procedures and proven backups ready to restore. That confidence is worth every hour spent on testing.
