(Originally posted at: https://kgrz.io/postgresql-backup-notes.html)
Some of my very rough notes when going nearly cover-to-cover of the
PostgreSQL backup and
restore
documentation section. It's one of the most detailed pieces of
documentation I've ever read, so this acts as a very high-level
summary. Most of the content is useful for general context surrounding
data backups of any kind.
I'm using /data
directory to signify the data storage directory. This
depends on the chosen configuration, however.
Backups
Three main types of backup strategies:
- SQL dump/load (stop-the-world)
- Backup
/data
directory (stop-the-world) - Continuous Archiving
The first two typically need lots of extra space on the database server
to store the backup before you can upload it to some off-site storage.
"stop-the-world" in this context is not an official nomenclature. In
these strategies, most likely, the database needs to be shut down at
some point.
Continous archive-based backups are used for a leader-follower setup, or
even delta backups—where there's a base backup, and subsequent data as
deltas that can be used to restore the entire database.
Application-supported remote backups are quite simple, and so this is
the best strategy if the database servers are space-constrained.
SQL Dump/Load
This is the easiest strategy. pg_dump
takes a backup, while
pg_restore
command consumes the output of that backup. This strategy
is the simplest to cron-ify a backup, without external dependencies:
take a backup, upload the files to remote storage, test the backup on a
different machine, and do this every night.
pg_dump
saves the database into a.sql
statement. Requires large
enough space to hold both the database and the backup script.File sizes might be limited by kernel/OS, so that's something to look
ahead while deciding to use this.Restore from the
pg_dump
output might also need extra configuration
tweaking around connection times: too less, and the database might
close the connection before the entire script has run.
Copy /data
directory
PostgreSQL's directory layout is straight-forward—once you get to know
it. Most of the data is put in one directory, and this includes the two
main components needed for any future restores: the data files, and the
temporary append-only log files. If the database is shut down, you're
free to copy over the data
directory to another machine, and start off
from it. Configuration files typically aren't placed in the data
directory, so they might need to be copied as well.
Any strategies that have to rely on the file-system layout of
PostgreSQL, or features provided by the file system itself.
Two routes here: frozen snapshots of the file system, or using tools
like rsync, tar etc.
frozen snapshots
If the underlying file system supports atomic volume snapshots (btrfs,
zfs, Apple's APFS for example), one can snapshot the entiredata
directory. Lots of caveats around how good the snapshot mechanism is
implemented exist.The backup can be taken without stopping the server. During restore,
this strategy would require replaying the logs as there might be some
commits that weren't turned to data files from the append only log.
rsync
, tar
, et al.
rsync
,gzip
,tar
thedata
directory. These utilities don't
take consistent snapshots of the disk, so it's best to shutdown the
server. Shutting down the server forces a full flush of the data to
disk.An example two step process with
rsync
:
run rsync shutdown the server rsync --checksum
What's interesting is that this two-step process is similar to the one
used in online backups section. This is like a one-step delta backup
process if we stretch it enough: the first backup is a base backup that
contains the data committed till that point, then the second rsync
takes the delta and copies that over.
Continuous Archiving
This system can be used to setup a replicated system, consisting of a
leader and potentially multiple followers. The data from the leader is
pushed, and each of the followers might pull the data. Where this data
is stored is customisable. There are many ways to setup replication in
PostgreSQL, and the documentation for it is exhaustive. The archival
part deals with the first part: taking the backup and pushing it
somewhere.
This strategy piggy-backs on the fact that a WAL log may be used to
replay and restore a database. There are many caveats and configuration
tweaks to how long the WAL log files are retained, the size of those log
files and the naming of the files. It's best to ship the log files as
and when they are created to an external storage service. Rather than do
this manually via rsync
et. al., PostgreSQL provides a way:
archive_command
setting in the configuration, which takes a script.
WAL logs should be secured while transmission and remote storage,
because these contain the actual data. (that goes for the main
database too, fwiw)archive_command
should exit with0
code. Otherwise, the command
gets retried. Thepg_wal
directory may potentially get filled, and
cause the server to crash!archive_command
should be designed to ensure it doesn't override
existing files on the remote system.Missing WAL logs from the archive might hamper future restore, so
regular base backups will help keep the error surface area a little
small.old base backup + too many WAL logs to restore increase the restore
time. It's important to determine the maths behind this to figure out
how much downtime you might need and tweak the base backup frequency,
and WAL file size accordingly.
General mechanism:
- One base backup as a starting point
- Continuous deltas in the form of the append-only log (WAL) files
The base backup marks the point where the backup would start
(checkpoint).
base backup
Two ways to take a base backup:
-
Use the
pg_basebackup
command from an external machine (or the same
machine, with a different data directory setting), providing the
connection info to connect to the leader.- Multiple commands can be run from multiple machines, but might depend on replication slots configuration on the leader.
- Might use one/two connections depending on the variant of backup used: copy WAL logs at the end (1) or stream WAL logs parallelly (2).
- Does not run if
/data
directory is not empty.
Two-step process via
rsync
. PostgreSQL provides two SQL statements
for signalling the server that the user is taking a backup, and that
a checkpoint has to be created:pg_start_backup
,pg_stop_backup
.
SELECT pg_start_backup('some_label') rsync /data SELECT * from pg_stop_backup();
Restore
stop-the-world restores
Data dumps taken with pg_dump
or the file system strategy mentioned
above can be restored by pg_restore
or just starting the server.
Needless to say, this strategy causes either data loss or needs
downtime, depending on the operations chosen.
If a system has a simple
pg_dump
cron job that ships the archive to
remote storage, when the leader crashes or dies, the time to
detection, copying the archive to the follower,pg_restore
completion times is the amount of downtime that's required.The cron job, if configured at a certain time in the day, differs from
the time the crash happens, the delta in the data until that time on
the leader is a potential loss in data.When the leader crashes/dies, but you still have access to the
physical data disks, recovery using file system snapshot is possible,
and that may potentially recover all the data up till the point of the
last commit. Because this recovery would also have the WAL files
handy, the replay will make sure as much data as possible is
recovered.
Continuous Archive restores
If the system is setup with continuous archiving, it may be possible to
recover all the data. Restore times depend on how fast the base backup
archive, WAL logs can be copied over to the new server, and the WAL log
replay.
Replication
There are many ways to do this, too, depending on the underlying infra:
shared disk (two machines accessing the same disk), file system
replication (a write to one drive is mirrored to a different machine
atomically), side-car middlewares that execute a given statement on
multiple machines simultaneously, or even application-level middlewares
that do this. Streaming/Point-in-time replication is one preferred
approach that can piggy back on the continuous archive backup strategy.
Streaming/Point-in-time replication strategy uses wal logs shipped to a
remote server using archive_command
from the leader to be used to
replay the logs on a follower continously.
- Note that in streaming replication is possible without using
archive_command
, provided the data ingestion throughput never exceeds the rate of the follower streaming the logs directly from the leader, and applying them locally (also depends on the network latency).
If the follower is not able to keep up with the logs, the logs on
leader might get recycled, and the follower will keep waiting for the
now-non-existent WAL file. Force-starting the follower in case of
failure will result in data loss.
Top comments (0)