PostgreSQL 17 introduced built-in incremental backups. This article describes the results of a workshop dedicated to studying this new feature.
The Setup
We used the following Docker Compose setup to simulate a cluster with traffic that we want to back up. For this we created the following docker-compose.yml with several containers:
services:
  postgres_main:
    image: postgres:17
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_USER: postgres
      POSTGRES_DB: testdb
      PGDATABASE: testdb
    volumes:
      - ./docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d
      - pg_data:/var/lib/postgresql/data
      - wal_archive:/mnt/wal_archive
      - full_backup:/mnt/full_backup
      - incremental_backup:/mnt/incremental_backup
    command: >
      postgres -c archive_mode=on
      -c archive_command='cp %p /mnt/wal_archive/%f'
      -c summarize_wal=on
  postgres_restore:
    image: postgres:17
    profiles:
      - restore
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_USER: postgres
      POSTGRES_DB: testdb
    volumes:
      - pg_data_restore:/var/lib/postgresql/data
      - wal_archive:/mnt/wal_archive
      - full_backup:/mnt/full_backup
      - incremental_backup:/mnt/incremental_backup
    command: >
      postgres -c restore_command='cp /mnt/wal_archive/%f %p'
  cli:
    image: cli
    build: .
    stop_grace_period: 1s
    environment:
      PGPASSWORD: postgres
      PGUSER: postgres
      PGDATABASE: testdb
      PGHOST: postgres_main
    volumes:
      - pg_data_restore:/mnt/data/restore
      - pg_data:/var/lib/postgresql/data
      - wal_archive:/mnt/wal_archive
      - full_backup:/mnt/full_backup
      - incremental_backup:/mnt/incremental_backup
      - ./checksum.py:/usr/local/bin/checksum.py
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        chown -R 999 /mnt/wal_archive
        chown -R 999 /mnt/full_backup
        chown -R 999 /mnt/incremental_backup
        chown -R 999 /mnt/data/restore
        sleep infinity
volumes:
  pg_data:
  pg_data_restore:
  wal_archive:
  full_backup:
  incremental_backup:
Notes about the docker compose file
- postgres_main contains the database we wish to back up
- postgres_restore is the database where we want to restore the backup
- cli is used to prepare the data for restoration; it lets us connect to all clusters.
⚠️ Make sure to use PG17!
The main idea:
1. Take a full backup as a starting point
2. Take an incremental backup
3. Take another incremental backup
4. Repeat step 3
1. Initial full Backup
The first step consists of creating an initial full backup with the following command:
pg_basebackup --pgdata=/mnt/full_backup
Some info about the pg_basebackup command
- The target directory is a mount point for the full backup; it is also shared between containers (via the volumes).
- --pgdata specifies the target directory where the backup will be stored. In this case the output will be written to /mnt/full_backup.
Output of the command
pg_basebackup will create a bunch of files; we will focus on the 2 important ones:
backup_label: this is a legacy description of the backup. Here is an extract:
START WAL LOCATION: 0/4000028 (file 000000010000000000000004)
CHECKPOINT LOCATION: 0/4000080
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2025-04-24 09:08:31 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1
This file records the WAL location where the backup starts, as well as the location of the checkpoint. The other fields speak for themselves.
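To make these fields easier to work with, here is a minimal sketch that parses a backup_label into a dictionary and converts the two WAL locations into integers. The helper names `parse_backup_label` and `lsn_to_int` are our own and not part of any PostgreSQL tooling; the sample text is the extract from this article.

```python
def parse_backup_label(text: str) -> dict[str, str]:
    """Turn each 'KEY: value' line of a backup_label into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # ignore lines that do not look like 'KEY: value'
            fields[key.strip()] = value.strip()
    return fields


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/4000028' (two hex halves) into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


sample = """\
START WAL LOCATION: 0/4000028 (file 000000010000000000000004)
CHECKPOINT LOCATION: 0/4000080
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2025-04-24 09:08:31 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1
"""

label = parse_backup_label(sample)
# The start location carries a trailing "(file ...)" note, so keep the first token only.
start = lsn_to_int(label["START WAL LOCATION"].split()[0])
checkpoint = lsn_to_int(label["CHECKPOINT LOCATION"])
print(f"checkpoint record is {checkpoint - start} bytes after the backup start")
```

In this extract the checkpoint record sits 0x58 bytes after the start location, both within the same WAL segment.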
backup_manifest: (available since PG 13) is linked to the feature we are currently talking about: the incremental backup.
This file serves as the reference for the incremental backup: pg_basebackup uses it to identify the backup that the increment builds on, while the WAL summaries determine which blocks have changed since then. Don’t hesitate to order some training and/or check out the documentation [https://www.postgresql.org/docs/current/backup-manifest-files.html]
Here is an extract of this backup_manifest:
{
  "Path": "base/16384/3766",
  "Size": 16384,
  "Last-Modified": "2025-04-24 08:56:45 GMT",
  "Checksum-Algorithm": "CRC32C",
  "Checksum": "3c0ea625"
},
For this specific “extract” of the file ‘base/16384/3766’ we have:
- The checksum, that is the fingerprint 3c0ea625
- The last modification date 2025-04-24 08:56:45 GMT
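Since the manifest is plain JSON, looking up the entry of a given file takes only a few lines. The following is a rough sketch: `index_manifest_files` is a hypothetical helper name, and the sample manifest is trimmed down to the single entry shown above (a real manifest lists every file of the backup plus additional top-level sections).

```python
import json


def index_manifest_files(manifest_text: str) -> dict[str, dict]:
    """Map each file path in a backup_manifest to its manifest entry."""
    manifest = json.loads(manifest_text)
    # Simplification: entries without a plain "Path" key are skipped.
    return {
        entry["Path"]: entry
        for entry in manifest.get("Files", [])
        if "Path" in entry
    }


# Trimmed sample for illustration only, not a complete manifest.
sample_manifest = """
{
  "Files": [
    {
      "Path": "base/16384/3766",
      "Size": 16384,
      "Last-Modified": "2025-04-24 08:56:45 GMT",
      "Checksum-Algorithm": "CRC32C",
      "Checksum": "3c0ea625"
    }
  ]
}
"""

files = index_manifest_files(sample_manifest)
entry = files["base/16384/3766"]
print(entry["Checksum-Algorithm"], entry["Checksum"], entry["Last-Modified"])
```

With a real backup, the text would come from reading /mnt/full_backup/backup_manifest instead of the inline sample.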
Understanding the Checksum
We can check this fingerprint ourselves with a short snippet.
Prerequisites:
- python3
- pip install crc32c (you may need --break-system-packages)
#!/usr/bin/env python3
import sys

import crc32c


def main():
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <filename>")
        sys.exit(1)
    filename = sys.argv[1]
    try:
        # Read the whole data file into memory
        with open(filename, 'rb') as f:
            data = f.read()
    except Exception as e:
        print(f"Failed to read file: {e}")
        sys.exit(1)
    checksum = crc32c.crc32c(data)
    print(f"CRC32C (normal) : 0x{checksum:08x}")
    # Same value with the byte order reversed
    le_bytes = checksum.to_bytes(4, byteorder='big')[::-1]
    print(f"CRC32C (little-endian) : 0x{le_bytes.hex()}")


if __name__ == "__main__":
    main()
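The script above reads the whole file into memory, which can be a lot for a relation segment of up to 1 GB. As an alternative, here is a dependency-free sketch that computes CRC32C (the Castagnoli polynomial) chunk by chunk. The function names are our own, and a pure Python loop like this is much slower than the crc32c package, so treat it as an illustration of the algorithm rather than a replacement.

```python
def _make_table() -> list[int]:
    """Build the 256-entry lookup table for the reflected CRC32C polynomial."""
    poly = 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c_update(crc: int, data: bytes) -> int:
    """Continue a CRC32C computation; start with crc=0 for the first chunk."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Compute the CRC32C of a file while holding only one chunk in memory."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = crc32c_update(crc, chunk)
    return crc


# Standard check value for CRC32C over the ASCII string "123456789".
print(f"{crc32c_update(0, b'123456789'):08x}")  # e3069283
```

Feeding the data in several pieces gives the same result as a single call, which is what makes the chunked file read possible.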
As long as there is no write traffic on the PostgreSQL cluster, the data files stay the same, and so do their checksums.
To verify this, execute the checksum Python script (the script is in the path) with the data file as its parameter.
NOTE: You can find the file where the data of a specific table is stored with the following query:
SELECT
  relname,
  'base/' || pg_database.oid || '/' || relfilenode AS filename,
  pg_database.oid AS db_oid,
  pg_database.datname AS database,
  nspname AS schema
FROM pg_class
JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
-- pg_class only lists relations of the current database, so join on that one
JOIN pg_database ON pg_database.datname = current_database()
WHERE relfilenode IS NOT NULL
  AND relname LIKE 'pgbench%';
Which in this case returns the output:
| relname | filename | database | schema |
|---|---|---|---|
| pgbench_accounts | base/16384/16397 | testdb | public |
| pgbench_accounts_pkey | base/16384/16405 | testdb | public |
| pgbench_branches | base/16384/16398 | testdb | public |
| pgbench_branches_pkey | base/16384/16401 | testdb | public |
| pgbench_history | base/16384/16399 | testdb | public |
| pgbench_tellers | base/16384/16400 | testdb | public |
| pgbench_tellers_pkey | base/16384/16403 | testdb | public |
So based on this, we need to look at the file base/16384/16397 to determine the checksum for the pgbench_accounts relation.
Now execute the Python script:
./checksum.py /var/lib/docker/volumes/demo-postgres-backup-incremental_pg_data/_data/base/16384/16397
>>> 0x57679e47 # CRC32C
>>>> { "Path": "base/16384/16397", "Size": 536870912, "Last-Modified": "2025-04-24 11:44:14 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "57679e47"},
To see a change in the checksum, let’s modify some data: we update a column in the pgbench_accounts table.
UPDATE pgbench_accounts SET abalance = abalance + 10;
Let’s check the fingerprint again:
./checksum.py /var/lib/docker/volumes/demo-postgres-backup-incremental_pg_data/_data/base/16384/16397
>>> 0x06cc374f # CRC32C
>>>> { "Path": "base/16384/16397", "Size": 1073741824, "Last-Modified": "2025-04-24 12:37:28 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "06cc374f" },
‼️ When running a long update query, we can inspect the data directory and notice that the file is modified (the checksum will be different), even if the transaction is not yet committed.
If we cancel the query, the data is not modified from the logical point of view (the transaction is rolled back), but the data on disk still contains the uncommitted modifications. Thanks to MVCC, those row versions are not visible to any transaction, but the file is modified and the checksum will be different.
⚠️ WAL summarization needs to be enabled (summarize_wal=on, see our docker-compose)
2. Now Let’s Increment!
The first increment contains the diff from the full backup. Then the second increment references the previous one and should only contain the diff from the first incremental.
To set this up we will use the following command:
pg_basebackup --checkpoint=fast --incremental=/mnt/full_backup/backup_manifest --pgdata=/mnt/incremental_backup/0/
- --checkpoint=fast is set so that we do not have to wait for the next regular checkpoint
- --incremental is where the magic happens: it must point to the backup_manifest of the last increment or full backup and is the origin of the diff
- --pgdata specifies the destination directory where the incremental backup will be stored
The creation of a new increment can be repeated multiple times, where each increment contains only a copy of the blocks/pages that have changed since the backup it references.
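Chaining increments means that each run has to point --incremental at the manifest of the most recent backup. The following sketch builds the pg_basebackup command line for the next increment. It assumes that increments live in numbered directories (0, 1, 2, ...) as in this article, and `next_incremental_command` is a hypothetical helper name. The function only returns the argument list; running it (for example with subprocess.run) is left to the caller.

```python
from pathlib import Path


def next_incremental_command(full_dir: str, incr_root: str) -> list[str]:
    """Build the pg_basebackup call for the next increment in the chain."""
    root = Path(incr_root)
    # Assumption: existing increments are directories named 0, 1, 2, ...
    existing = []
    if root.is_dir():
        existing = sorted(
            int(p.name) for p in root.iterdir() if p.is_dir() and p.name.isdigit()
        )
    if existing:
        # Reference the latest increment so only newer changes are copied.
        reference = root / str(existing[-1])
        next_index = existing[-1] + 1
    else:
        # No increment yet: the full backup is the origin of the diff.
        reference = Path(full_dir)
        next_index = 0
    return [
        "pg_basebackup",
        "--checkpoint=fast",
        f"--incremental={reference / 'backup_manifest'}",
        f"--pgdata={root / str(next_index)}",
    ]


print(" ".join(next_incremental_command("/mnt/full_backup", "/mnt/incremental_backup")))
```

The numeric sort matters once there are ten or more increments, because a plain string sort would place 10 before 2.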
Integrating Incremental Backups for Restoration
Now let’s put the incremental backup back together with the full backup and begin the database restoration process.
In order to do this we will use pg_combinebackup to merge the full backup with the incremental backups.
This command accepts multiple arguments, allowing us to specify as many incremental backups as we need. The first argument must be the full backup, followed by the increments in chronological order. If this order is not respected, we will run into issues.
pg_combinebackup -d -o /mnt/data/restore /mnt/full_backup /mnt/incremental_backup/0 /mnt/incremental_backup/1
This command will give us a full backup in /mnt/data/restore. So now we can start a restored database from the combined backup.
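Because the argument order matters, it can help to generate the pg_combinebackup call instead of typing it by hand. Below is a small sketch under the same assumption as in this article, namely that the increments are stored in numbered directories (0, 1, 2, ...) created in chronological order; `combine_command` is a hypothetical helper name and only builds the argument list.

```python
from pathlib import Path


def combine_command(full_dir: str, incr_root: str, output_dir: str) -> list[str]:
    """Build a pg_combinebackup call: full backup first, then the increments."""
    root = Path(incr_root)
    increments = []
    if root.is_dir():
        # Sort numerically so that increment 10 comes after 2, not before it.
        increments = sorted(
            (p for p in root.iterdir() if p.is_dir() and p.name.isdigit()),
            key=lambda p: int(p.name),
        )
    return ["pg_combinebackup", "-o", output_dir, full_dir, *map(str, increments)]
```

Calling `combine_command("/mnt/full_backup", "/mnt/incremental_backup", "/mnt/data/restore")` with increments 0 and 1 present yields the same argument order as the command above, without the debug flag.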
In terms of backup size, we can see that the full backup is much larger than the increment:
| Size | Item |
|---|---|
| 1.6G | full_backup |
| 25M | incremental_backup |
And the more changes there are in the database, the larger the incremental backup becomes:
| Size | Changes | Comment |
|---|---|---|
| 25M | 0 | |
| 25M | 1 | change of 2 records in a partitioned table |
| 28M | 2 | change of 1000 records in a partitioned table |
RTO (Recovery Time Objective) and RPO (Recovery Point Objective):
Let’s talk about the impact on the recovery time objective (RTO). This is the time needed to rebuild a full backup from all the increments and bring up a recovery database.
Different scenarios are possible:
- One full backup once a week and one incremental backup every day.
- One nightly full backup and incremental backups on an hourly rate.
Note: The second scenario is only relevant for a database with a lot of traffic.
The most time-consuming part is the replay of the WAL files, which means that we need to reduce the amount of WAL to replay. One way to do this is to take the last incremental backup as close as possible to the recovery target. The second scenario is a good candidate to cover this. With incremental backups, not all of the data in the database is transferred every hour, but only the blocks that were modified since the previous incremental backup.
Another, independent scenario is to use the increments and the full backup to create a new full backup once in a while. The next increment is then based on this new full backup. In this way backup traffic on the cluster can be reduced.
Summary:
In this blog post we have analysed how to use incremental backups and how to put them back together into a new data directory from which we can start a restoration of the original database.
Along the way, the checksum and the manifest files that are used for incremental backups were explained and analysed.
As an outcome, we see that with incremental backups the subsequent backups get smaller, since only what changed since the last backup run is saved.



