Piter Adyson

PostgreSQL backup compression — Techniques for reducing PostgreSQL backup size

Database backups grow quickly. A PostgreSQL database that starts at a few gigabytes can balloon to hundreds of gigabytes over time, and backups multiply that storage requirement. Compression reduces backup sizes by 70-90% in most cases, cutting storage costs and speeding up transfers. This guide covers the practical compression options available for PostgreSQL backups, from built-in pg_dump compression to external tools and automated solutions.

Why compress PostgreSQL backups

Uncompressed backups waste storage and bandwidth. A 100GB database produces a 100GB backup file, which then needs to be stored, transferred and potentially replicated across locations. Compression typically reduces this to 10-30GB depending on your data.

Storage cost reduction

Cloud storage pricing adds up fast when you're keeping multiple backup versions:

| Backup size | Daily backups retained | Total storage | Monthly cost (S3 Standard) |
| --- | --- | --- | --- |
| 100GB uncompressed | 30 copies | 3TB | ~$69/month |
| 20GB compressed | 30 copies | 600GB | ~$14/month |

The math gets more dramatic with larger databases or longer retention periods. A database backup that costs $14/month compressed might cost $69/month uncompressed — nearly 5x the expense for the same data.
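
To sanity-check those numbers against your own setup, a quick shell calculation is enough. This is a rough sketch that assumes S3 Standard pricing of roughly $0.023 per GB-month; the variable names are placeholders, so plug in your own backup size, retention count and storage rate:

# Rough monthly cost estimate: backup size x retained copies x price per GB-month
# Assumes ~$0.023/GB-month (S3 Standard); adjust for your storage class and region
BACKUP_GB=20
COPIES=30
PRICE_PER_GB_MONTH=0.023

awk -v size="$BACKUP_GB" -v copies="$COPIES" -v price="$PRICE_PER_GB_MONTH" \
    'BEGIN { printf "Stored: %dGB, estimated cost: $%.2f/month\n", size * copies, size * copies * price }'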

Faster backup transfers

Transferring backups to remote storage takes time. A 100GB file over a 100Mbps connection takes about 2.5 hours. The same data compressed to 20GB transfers in 30 minutes. This matters for backup windows, especially when you need to complete backups during off-peak hours.

Reduced backup windows

Compression adds CPU overhead, but network transfer is often the bottleneck. When sending backups to cloud storage or a remote server, compressed backups complete faster despite the compression time. The savings from smaller transfer sizes usually outweigh the compression cost.

pg_dump compression options

pg_dump has built-in compression support. The approach depends on which output format you choose.

Custom format with compression

The custom format (-Fc) includes zlib compression by default:

# Default compression (level 6)
pg_dump -Fc -f backup.dump mydb

# Maximum compression (level 9)
pg_dump -Fc -Z 9 -f backup.dump mydb

# No compression
pg_dump -Fc -Z 0 -f backup.dump mydb

Compression levels range from 0 (no compression) to 9 (maximum compression). Higher levels produce smaller files but take longer. Level 6 is the default and offers a good balance.

Directory format compression

The directory format (-Fd) compresses each table into a separate file:

# Compressed directory backup
pg_dump -Fd -Z 5 -f backup_dir mydb

# Parallel compression with multiple jobs
pg_dump -Fd -Z 5 -j 4 -f backup_dir mydb

Directory format enables parallel backup and restore. With -j 4, pg_dump uses 4 parallel processes, significantly speeding up backups of large databases.
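
The same -j flag works on the restore side, which is where directory format really pays off. A minimal sketch, assuming the backup_dir created above and an existing target database named mydb:

# Restore a directory-format backup with 4 parallel workers
pg_restore -j 4 -d mydb backup_dir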

Plain SQL format

Plain SQL format (-Fp) doesn't support built-in compression. Use external tools:

# Compress with gzip
pg_dump -Fp mydb | gzip > backup.sql.gz

# Compress with pigz (parallel gzip)
pg_dump -Fp mydb | pigz > backup.sql.gz

# Decompress and restore
gunzip -c backup.sql.gz | psql mydb

Plain format is human-readable and useful for migrations, but custom format is better for routine backups.
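
Because plain format is just SQL text, it also pipes cleanly across the network during migrations. A sketch, assuming SSH access to a hypothetical host new-server with the target database already created there:

# Compress locally, stream over SSH, decompress and restore on the target
pg_dump -Fp mydb | gzip | ssh new-server 'gunzip | psql mydb'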

Compression algorithms compared

Different compression algorithms offer different tradeoffs between compression ratio, speed and CPU usage.

gzip

The default choice for most PostgreSQL backup tools. Widely supported and well-balanced:

pg_dump mydb | gzip -6 > backup.sql.gz

Levels 1-9 control the compression ratio. Level 6 is a reasonable default.

pigz

Parallel gzip uses multiple CPU cores:

# Use 4 threads
pg_dump mydb | pigz -p 4 > backup.sql.gz

# Decompress
pigz -d -p 4 backup.sql.gz

On multi-core systems, pigz compresses significantly faster than standard gzip with identical output.

zstd

Zstandard offers better compression ratios and faster speeds than gzip:

# Default compression
pg_dump mydb | zstd > backup.sql.zst

# Higher compression
pg_dump mydb | zstd -19 > backup.sql.zst

# Fast compression
pg_dump mydb | zstd -1 > backup.sql.zst

# Decompress and restore
zstd -dc backup.sql.zst | psql mydb

zstd shines at both ends of the spectrum — level 1 is faster than gzip level 1 with similar ratios, and level 19 achieves better compression than gzip level 9.
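
Like pigz, zstd can also spread the work across cores with its -T option, which helps most at the higher levels:

# Compress with 4 threads (-T0 auto-detects the core count)
pg_dump mydb | zstd -T4 > backup.sql.zst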

lz4

Optimized for speed over compression ratio:

pg_dump mydb | lz4 > backup.sql.lz4

lz4 compresses and decompresses extremely fast but produces larger files than gzip or zstd. It's useful when compression speed is the priority and storage is cheap.

Algorithm comparison

Approximate results for a typical 10GB PostgreSQL database:

| Algorithm | Level | Compressed size | Compression time | Decompression time |
| --- | --- | --- | --- | --- |
| gzip | 6 | 1.8GB | 4m 30s | 1m 15s |
| pigz (4 threads) | 6 | 1.8GB | 1m 20s | 35s |
| zstd | 3 | 1.7GB | 1m 45s | 25s |
| zstd | 19 | 1.4GB | 12m 00s | 25s |
| lz4 | default | 2.5GB | 30s | 10s |

Actual results vary based on data compressibility, CPU, and disk speed. Test with your own data to find the best fit.
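
One way to run that test is to take a single plain-format dump and time each compressor against the same input. A rough sketch, assuming gzip, pigz, zstd and lz4 are installed and that an uncompressed sample dump fits on local disk:

# Create one sample dump, then compare compressors on identical input
pg_dump -Fp mydb > sample.sql

for cmd in "gzip -6" "pigz -p 4" "zstd -3" "zstd -19" "lz4"; do
    echo "== $cmd =="
    # Time the compression and report the resulting file size
    time $cmd < sample.sql > sample.out
    ls -lh sample.out
done
rm -f sample.sql sample.out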

Streaming compression to cloud storage

Piping compressed output directly to cloud storage avoids writing large uncompressed files to local disk:

# Stream compressed backup to S3
pg_dump -Fc mydb | aws s3 cp - s3://bucket/backup.dump

# Stream with zstd compression to S3
pg_dump mydb | zstd | aws s3 cp - s3://bucket/backup.sql.zst

# Stream to Google Cloud Storage
pg_dump -Fc mydb | gsutil cp - gs://bucket/backup.dump

This approach requires only enough local disk space for buffering, not the full backup size.
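
Restores can stream the same way, so the compressed archive never needs to touch local disk either. A sketch that assumes the objects uploaded above and an existing target database:

# Stream a custom-format backup from S3 straight into pg_restore
aws s3 cp s3://bucket/backup.dump - | pg_restore -d mydb

# Stream and decompress a zstd-compressed SQL backup
aws s3 cp s3://bucket/backup.sql.zst - | zstd -dc | psql mydb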

pg_basebackup compression

For physical backups, pg_basebackup supports gzip compression of tar-format output, and PostgreSQL 15 adds lz4 and zstd:

# gzip compression
pg_basebackup -D backup -Ft -z

# zstd compression (PostgreSQL 15+)
pg_basebackup -D backup -Ft --compress=zstd:5

# lz4 compression (PostgreSQL 15+)
pg_basebackup -D backup -Ft --compress=lz4

Older PostgreSQL versions require external compression:

pg_basebackup -D - -Ft | gzip > backup.tar.gz

Using Databasus for compressed backups

Managing compression settings, storage uploads and retention policies manually gets tedious. Databasus, a backup automation tool for PostgreSQL, handles compression automatically with configurable algorithms and levels. It supports both individual developers and enterprise teams who need reliable automated backups without the scripting overhead.

Installing Databasus

Using Docker:

docker run -d \
  --name databasus \
  -p 4005:4005 \
  -v ./databasus-data:/databasus-data \
  --restart unless-stopped \
  databasus/databasus:latest

Or with Docker Compose:

services:
  databasus:
    container_name: databasus
    image: databasus/databasus:latest
    ports:
      - "4005:4005"
    volumes:
      - ./databasus-data:/databasus-data
    restart: unless-stopped

Start the service:

docker compose up -d

Configuring compressed backups

Access the web interface at http://localhost:4005 and create your account, then:

  1. Add your database — Click "New Database" and enter your PostgreSQL connection details
  2. Select storage — Choose your backup destination: S3, Google Cloud Storage, local storage or other supported options
  3. Configure compression — Databasus compresses backups automatically. Select your preferred algorithm and compression level based on your storage vs speed priorities
  4. Select schedule — Set backup frequency matching your recovery requirements
  5. Click "Create backup" — Databasus validates settings and begins scheduled compressed backups

Databasus manages compression, encryption, retention and notifications in one place. No scripts to maintain.

Optimizing compression for your data

Not all PostgreSQL data compresses equally. Text-heavy tables compress extremely well. Binary data, already-compressed images, or encrypted columns don't compress much further.

Checking current compression potential

Estimate how well your data will compress:

-- Check table sizes
SELECT
    schemaname || '.' || tablename AS table_name,
    pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename))) AS total_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)) DESC
LIMIT 10;

Tables with lots of text, JSON, or repeated values compress well. Tables storing binary blobs or pre-compressed data don't.
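
For a quick empirical check, dump one large table from that list and compare the raw and compressed byte counts. A sketch, assuming a hypothetical table named big_table:

# Uncompressed size of a single table's dump, in bytes
pg_dump -t big_table mydb | wc -c

# Compressed size of the same dump
pg_dump -t big_table mydb | gzip -6 | wc -c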

Data type impact on compression

Data that compresses well:

  • Text and varchar columns with natural language
  • JSON/JSONB with repeated keys
  • Timestamps and dates
  • Numeric data with patterns
  • NULL values (essentially free)

Data that compresses poorly:

  • UUID columns (random by design)
  • Already-compressed bytea data
  • Encrypted columns
  • Random binary data

If your database is mostly UUIDs and encrypted blobs, expect 20-40% compression. If it's mostly text and JSON, expect 80-95% compression.

Balancing compression level and speed

Higher compression levels save storage but take longer. Choose based on your priorities:

  • Storage-constrained: Use zstd level 15-19 or gzip level 9
  • Balanced: Use zstd level 3-5 or gzip level 6
  • Speed-focused: Use lz4 or zstd level 1

For automated daily backups, balanced settings work well. For archival backups stored long-term, maximum compression saves ongoing storage costs.
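
If you script daily backups yourself, a cron entry with balanced settings is usually enough. A minimal sketch, assuming credentials come from ~/.pgpass and a hypothetical /backups directory exists; note that % must be escaped in crontab entries:

# Daily at 02:00: custom-format dump at level 5, dated filename
0 2 * * * pg_dump -Fc -Z 5 -f /backups/mydb_$(date +\%F).dump mydb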

Compression during backup windows

If your backup window is tight, fast compression helps:

# Fast compression for tight backup windows
pg_dump -Fc -Z 1 -f backup.dump mydb

# Or use parallel compression with directory format
pg_dump -Fd -Z 1 -j 4 -f backup_dir mydb

Level 1 compression still achieves 60-80% of maximum compression in a fraction of the time.

Verifying compressed backups

Compressed backups can be corrupted just like uncompressed ones. Verify integrity before relying on them:

# Verify gzip file integrity
gunzip -t backup.sql.gz

# Verify zstd file integrity
zstd -t backup.sql.zst

# List contents of pg_dump custom format
pg_restore -l backup.dump > /dev/null

For critical backups, periodically restore to a test database:

# Test restore to temporary database
createdb restore_test
pg_restore -d restore_test backup.dump

# Verify row counts match production
psql -c "SELECT count(*) FROM important_table" restore_test

# Cleanup
dropdb restore_test

Common compression mistakes

A few common issues undermine compression benefits.

Double compression

Compressing already-compressed data wastes CPU and sometimes makes files larger:

# Wrong — custom format is already compressed
pg_dump -Fc mydb | gzip > backup.dump.gz

# Right — just use custom format compression
pg_dump -Fc -Z 6 -f backup.dump mydb

If you need a different algorithm than zlib, use plain format (pg_dump in PostgreSQL 16 and later can also compress with lz4 and zstd directly):

# Right — plain format with zstd
pg_dump -Fp mydb | zstd > backup.sql.zst

Ignoring disk space during backup

When you pipe a dump straight into a compressor, only the compressed output is written to disk. But if you compress after creating an uncompressed file, you need space for both:

# Needs 2x disk space
pg_dump -Fp -f backup.sql mydb
gzip backup.sql

# Needs only compressed size
pg_dump -Fp mydb | gzip > backup.sql.gz

Stream compression when disk space is limited.

Using maximum compression for frequent backups

Level 9 or 19 compression takes 5-10x longer than default levels while only saving 10-20% more space. Reserve maximum compression for archival backups, not daily ones.

Conclusion

Compression is one of the easiest wins for PostgreSQL backup management. Built-in pg_dump compression handles most cases, and external tools like zstd offer better ratios when needed. The key is matching compression settings to your priorities — storage costs, backup speed, or restoration time. Start with default settings (custom format at level 6), measure your results, and adjust based on what matters most for your infrastructure.
