Manager Handbook for Centralized AntDB-T — P6

Physical backup
Suitable scenarios:
•Both full and incremental backups are required
•The database must be recoverable quickly
•A whole cluster is migrated between instances of the same version
AntDB's physical backup tools
•Linux file copy commands (cp, scp, rsync), optionally packed with tar (commonly known as "cold backup"; the database must be stopped first)
•pg_basebackup
pg_basebackup
pg_basebackup — get a base backup of an AntDB cluster
Synopsis

pg_basebackup [option...]

Description
pg_basebackup is used to take a base backup of a running AntDB database cluster. The backup is taken without affecting other clients of the database, and can be used both for point-in-time recovery and as the starting point for a log-shipping or streaming-replication standby server.
pg_basebackup makes an exact copy of the database cluster's files, while making sure the server enters and leaves backup mode automatically. Backups are always taken of the entire database cluster; it is not possible to back up individual databases or database objects. For selective backups, another tool such as pg_dump must be used.
Keep the following in mind when backing up with pg_basebackup:
•Applications can read from and write to the database normally during the backup
•Only the entire database cluster can be backed up, not a single database
•The backup must be run as a superuser or as a user with the REPLICATION privilege
•The backup can be taken from either the primary or a standby server, provided pg_hba.conf on that server permits the replication connection

pg_basebackup --help
    -D, --pgdata=DIRECTORY receive base backup into directory
    -F, --format=p|t       output format (plain (default), tar)
    -r, --max-rate=RATE    maximum transfer rate to transfer data directory(in kB/s, or use suffix "k" or "M")
    -X, --wal-method=none|fetch|stream    include required WAL files with specified method
    -z, --gzip             compress tar output
    -Z, --compress=0-9     compress tar output with given compression level
    -c, --checkpoint=fast|spread  set fast or spread checkpointing
    -P, --progress         show progress information

Options
The following command-line options control the location and format of the output:

-D directory
--pgdata=directory
Set the destination directory to write the output. If the directory does not exist, pg_basebackup creates the directory (and any missing parent directories). If it already exists, it must be empty.

When the backup is in tar mode, the destination directory can be specified as - (dash), thus writing the tar file to stdout.

This option is required.

-F format
--format=format
Select the format for the output. The format can be one of the following:

p
plain
Write the output as plain files, with the same layout as the data directory and tablespaces of the source server. When the cluster has no additional tablespaces, the entire database is placed in the target directory. If the cluster contains additional tablespaces, the main data directory is placed in the target directory, but all other tablespaces are placed in the same absolute path as they have on the source server.
This is the default format.
t
tar
Write the output as tar files in the target directory. The main data directory's contents are written to a file named base.tar, and each other tablespace is written to a separate tar file named after that tablespace's OID.
If the target directory is specified as - (dash), the tar contents are written to standard output, suitable for piping to (for example) gzip. This is only allowed if the cluster has no additional tablespaces and WAL streaming is not used.
-R
--write-recovery-conf
Create a standby.signal file and append the connection settings to the postgresql.auto.conf file in the target directory (or within the base archive file when using tar format). This eases setting up a standby server from the results of the backup.
The postgresql.auto.conf file records the connection settings and, if specified, the replication slot that pg_basebackup is using, so that streaming replication will use the same settings later on.

-T olddir=newdir
--tablespace-mapping=olddir=newdir
Relocate the tablespace in directory olddir to newdir during the backup. To be effective, olddir must exactly match the path specification of the tablespace as it is defined on the source server. (It is not an error, however, if the backup contains no tablespace in olddir.) newdir is a directory in the receiving host's file system. As with the main target directory, newdir does not have to exist already, but if it does, it must be empty. Both olddir and newdir must be absolute paths. If a path happens to contain an = sign, escape it with a backslash. This option can be specified multiple times for multiple tablespaces.

If you relocate a table space in this way, the symbolic links in the main data directory are updated to point to the new location. Therefore, the new data directory can already be used by a new server instance where all table spaces are in the updated location.
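The escaping rule for -T can be wrapped in a small helper. The following is only a sketch: the function names escape_eq and tablespace_mapping are hypothetical, and the paths in the usage comment are placeholders rather than values from this handbook.

```shell
# Escape every "=" in a path so it can appear inside -T olddir=newdir.
escape_eq() {
    printf '%s\n' "$1" | sed 's/=/\\=/g'
}

# Build the argument for one -T option from two absolute paths.
# Returns non-zero if either path is not absolute.
tablespace_mapping() {
    case "$1" in
        /*) ;;
        *) echo "olddir must be an absolute path: $1" >&2; return 1 ;;
    esac
    case "$2" in
        /*) ;;
        *) echo "newdir must be an absolute path: $2" >&2; return 1 ;;
    esac
    printf '%s=%s\n' "$(escape_eq "$1")" "$(escape_eq "$2")"
}

# Usage (placeholder paths):
#   pg_basebackup -D /backup/data -T "$(tablespace_mapping /opt/ts /backup/ts)"
```

Checking for absolute paths up front gives a clearer error than letting the backup start and fail later.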
--waldir=waldir
Sets the directory to write WAL (write-ahead log) files to. By default, WAL files are placed in the pg_wal subdirectory of the target directory, but this option can be used to place them elsewhere. waldir must be an absolute path. As with the main target directory, waldir does not have to exist already, but if it does, it must be empty. This option can only be specified when the backup is in plain format.
-X method
--wal-method=method
Include the required WAL (write-ahead log) files in the backup. This includes all write-ahead logs generated during the backup. Unless the method none is specified, a server can be started directly in the target directory without consulting the log archive, which makes the output a completely standalone backup.
The following methods of collecting write-ahead logs are supported:
n
none
Do not include write-ahead logs in backups.


f
fetch
The write-ahead log files are collected at the end of the backup. Therefore, the wal_keep_size parameter on the source server must be set high enough that the required log data is not removed before the end of the backup. If the required log data has been recycled before it can be transferred, the backup fails and is unusable.
If the tar format is used, the write-ahead log files are included in the base.tar file.
s
stream
Stream write-ahead log data while the backup is being taken. This method opens a second connection to the server and streams the write-ahead log in parallel with the backup, so it requires two replication connections rather than one. As long as the client can keep up with the write-ahead log data, no extra write-ahead log needs to be retained on the source server.
If the tar format is used, the write-ahead log files are written to a separate file named pg_wal.tar (if the server version is earlier than 10, the file is named pg_xlog.tar).
This is the default.
-z
--gzip
Enable gzip compression of the tar file output, using the default compression level. Compression is only available when using the tar format, and the suffix ".gz" is automatically appended to all tar file names.
-Z level
--compress=level
Enable gzip compression of the tar file output and specify the compression level (0 to 9, where 0 is no compression and 9 is the best compression). Compression is only available when using the tar format, and the suffix ".gz" is automatically appended to all tar file names.
The following command-line options control the generation of backups and the invocation of the program:
-c fast|spread
--checkpoint=fast|spread
Set the checkpoint mode to fast or spread (default) (see Section 25.3.3).
-C
--create-slot
Specifies that a replication slot named by the --slot option should be created before starting a backup. If the slot already exists, an error is thrown.
-l label
--label=label
Set the label for the backup. If not specified, a default value of "pg_basebackup base backup" is used.
-n
--no-clean
By default, when pg_basebackup aborts with an error, it removes any directories it has created up to that point (such as the target directory and the write-ahead log directory). This option disables that cleanup and is therefore useful for debugging.
Note that tablespace directories are not cleaned up in either case.
-N
--no-sync
By default, pg_basebackup waits for all files to be written safely to disk. This option makes pg_basebackup return without waiting, which is faster, but means that a subsequent operating system crash can leave the base backup corrupt. This option is generally useful for testing and should not be used when creating a production backup.
-P
--progress
Enable progress reporting. Turning this on delivers an approximate progress report during the backup. Since the database may change during the backup, this is only an approximation and may not end at exactly 100%. In particular, when WAL is included in the backup, the total amount of data cannot be estimated in advance, and in this case the estimated target size increases once it passes the total estimate without WAL.
-r rate
--max-rate=rate
Sets the maximum transfer rate at which data is collected from the source server. This helps limit the impact of pg_basebackup on the server. Values are in kilobytes per second. Use the suffix M for megabytes per second. The suffix k is also accepted and changes nothing, since the value is already in kilobytes. Valid values are between 32 kilobytes per second and 1024 megabytes per second.
This option always affects the transfer of the data directory. The transfer of WAL files is affected only if the collection method is fetch.
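A wrapper script may want to check a rate value before starting a long backup. The helper below is a minimal sketch of the range rule described above; the function name rate_to_kb is hypothetical, and for simplicity it accepts plain integers only.

```shell
# Convert a --max-rate value (for example 512, 512k or 10M) to kB/s and
# check that it lies within 32 kB/s .. 1024 MB/s.
# Sketch only: accepts plain integers without leading zeros.
rate_to_kb() {
    case "$1" in
        *M) num=${1%M}; mult=1024 ;;
        *k) num=${1%k}; mult=1 ;;
        *)  num=$1;     mult=1 ;;
    esac
    case "$num" in
        ''|0?*|*[!0-9]*)
            echo "not a plain integer rate: $1" >&2
            return 1 ;;
    esac
    kb=$((num * mult))
    if [ "$kb" -lt 32 ] || [ "$kb" -gt 1048576 ]; then
        echo "rate out of range (32k to 1024M): $1" >&2
        return 1
    fi
    echo "$kb"
}

# Usage (placeholder target directory):
#   rate_to_kb 10M >/dev/null && pg_basebackup -D /backup/data -r 10M
```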
-S slotname
--slot=slotname
This option can only be used together with -X stream. It causes WAL streaming to use the specified replication slot. If the base backup is intended to be used as a streaming-replication standby that uses a replication slot, the standby should then use the same replication slot name as primary_slot_name. This ensures that the primary server does not remove any WAL data that is needed between the end of the base backup and the start of streaming replication on the new standby.
The specified replication slot must already exist unless the option -C is also used.
If this option is not specified and the server supports temporary replication slots (version 10 and later), a temporary replication slot is automatically used for WAL streaming.
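Combining -C, -S and -R gives a common way to seed a new standby. The wrapper below is a sketch under assumptions: the function name backup_for_standby is hypothetical, and the host, slot name and directory in the usage comment are placeholders.

```shell
# Take a base backup intended for a new streaming-replication standby.
#   $1 = primary host, $2 = replication slot name, $3 = target directory
# -C creates the slot named by -S before the backup starts (it fails if the
# slot already exists), -X stream streams WAL through that slot, and -R
# writes standby.signal plus the connection settings.
backup_for_standby() {
    pg_basebackup -h "$1" -D "$3" -X stream -C -S "$2" -R -P
}

# Usage (placeholder values):
#   backup_for_standby primary.example.com standby1_slot /data/standby1
```

Because the slot is permanent, it keeps WAL on the primary until the standby connects, so it should be dropped if the standby is never started.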
-v
--verbose
Enable verbose mode. Some extra steps are output during startup and shutdown, and if progress reporting is also enabled, the exact file name that is currently being processed is displayed.
--manifest-checksums=algorithm
Specifies the checksum algorithm that should be applied to each file included in the backup manifest. Currently, the available algorithms are NONE, CRC32C, SHA224, SHA256, SHA384, and SHA512. The default value is CRC32C.

The SHA hash functions provide a cryptographically secure digest of each file for users who wish to verify that the backup has not been tampered with, while the CRC32C algorithm provides a checksum that is much faster to calculate. CRC32C is good at catching errors due to accidental changes but is not resistant to malicious modifications. Note that, to be useful against an adversary who has access to the backup, the backup manifest must be stored securely elsewhere or otherwise verified not to have been modified since the backup was taken.
pg_verifybackup can be used to check the integrity of a backup against the backup manifest.
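A backup can be checked against its manifest immediately after it is taken. The following is a sketch, assuming pg_verifybackup is installed alongside pg_basebackup; the function name backup_and_verify is hypothetical. It uses plain format because that layout can be verified in place.

```shell
# Take a plain-format backup with SHA256 manifest checksums, then verify it.
#   $1 = target directory (must not exist yet, or must be empty)
# pg_verifybackup only runs if pg_basebackup succeeded.
backup_and_verify() {
    pg_basebackup -D "$1" -F p -X stream --manifest-checksums=SHA256 &&
    pg_verifybackup "$1"
}

# Usage (placeholder directory):
#   backup_and_verify /backup/data
```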

--manifest-force-encode
Forces all file names in the backup manifest to be hex-encoded. If this option is not specified, only non-UTF8 file names are hex-encoded. This option is mainly intended for testing that tools which read a backup manifest file handle this case properly.


--no-estimate-size
Prevents the server from estimating the total amount of backup data that will be streamed, resulting in the backup_total column in the pg_stat_progress_basebackup view being NULL at all times.
Without this option, the backup starts by enumerating the size of the entire database, then goes back and sends the actual contents. This can make the backup take slightly longer, in particular before the first data is sent. This option is useful for avoiding that delay when the estimation takes too long.
This option is not allowed when using --progress.
--no-manifest
Disables generation of a backup manifest. If this option is not specified, the server generates and sends a backup manifest that can be verified using pg_verifybackup. The manifest is a list of every file present in the backup, except for any WAL files that may be included. It also stores the size, last modification time, and an optional checksum for each file.
--no-slot
Prevents temporary replication slots from being created for backups.
By default, if log streaming is selected but no slot name is given with the -S option, a temporary replication slot is created (if supported by the source server).
The main purpose of this option is to allow a base backup to be taken when the server has no free replication slots. Using a replication slot is almost always preferred, because it prevents WAL that is needed during the backup from being removed.
--no-verify-checksums
Disables verification of checksums, if they are enabled on the server the base backup is taken from.
By default, checksums are verified and a checksum failure results in a non-zero exit status. However, the base backup is not removed in that case, as if the --no-clean option had been used. Checksum verification failures are also reported in the pg_stat_database view.
The following command-line options control connections to the source server:
-d connstr
--dbname=connstr
Specifies the parameters used to connect to the server, as a connection string; these override any conflicting command-line options.
For consistency with other client applications, this option is called --dbname. However, because pg_basebackup does not connect to any particular database in the cluster, any database name in the connection string is ignored.

-h host
--host=host
Specifies the host name of the machine running the server. If the value starts with a slash, it is used as a directory for Unix domain sockets. The default value is taken from the PGHOST environment variable (if set), otherwise a Unix domain socket connection is attempted.
-p port
--port=port
Specifies the TCP port on which the server is listening for connections or the local Unix domain socket file extension. By default, the value from the PGPORT environment variable (if set) or a default value compiled in the program is used.
-s interval
--status-interval=interval
Specifies the number of seconds between status packets sent back to the source server. A smaller value allows the backup progress to be monitored more accurately from the server. A value of zero disables the periodic status updates completely, although an update is still sent when the server requests one, to avoid disconnection due to timeout. The default value is 10 seconds.
-U username
--username=username
Specifies the user name for the connection.
-w
--no-password
Never issue a password prompt. If the server requires password authentication and a password is not available by other means (for example, a .pgpass file), the connection attempt fails. This option is useful in batch jobs and scripts where no user is present to enter a password.
-W
--password
Force pg_basebackup to prompt for a password before connecting to the source server.
This option is never essential, because pg_basebackup automatically prompts for a password if the server demands password authentication. However, pg_basebackup then wastes a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.
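For unattended runs with -w, the password is typically supplied through a password file. The helper below is a sketch: the function name add_pgpass_entry is hypothetical, it writes to the file named by PGPASSFILE (falling back to ~/.pgpass), and it uses * in the database field so the entry matches regardless of which database name the connection reports. Passwords containing : or \ would need escaping, which this sketch does not do.

```shell
# Append one entry to the password file and restrict its permissions.
#   $1 = host, $2 = port, $3 = user, $4 = password
# Line format: hostname:port:database:username:password
add_pgpass_entry() {
    file=${PGPASSFILE:-$HOME/.pgpass}
    printf '%s:%s:*:%s:%s\n' "$1" "$2" "$3" "$4" >> "$file"
    # The file is ignored by the client unless it is private to the user.
    chmod 600 "$file"
}

# Usage (placeholder values):
#   add_pgpass_entry host_ip 5432 user_name 'secret'
#   pg_basebackup -h host_ip -p 5432 -U user_name -w -D /backup/data
```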
Other options are also available:
-V
--version
Print the pg_basebackup version and exit.
-?
--help
Displays help for pg_basebackup command-line arguments and exits.

Examples

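A commonly used backup command (tar format, WAL fetched at the end of the backup, fast checkpoint). Note that -D names a directory, into which base.tar is written:
$ pg_basebackup -h host_ip -p port -U user_name -D /data/test/test_backup -F t -X f -c fast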

To create a base backup of the server mydbserver and store it in the local directory /usr/local/pgsql/data:
$ pg_basebackup -h mydbserver -D /usr/local/pgsql/data

To create a backup of the local server with one compressed tar file for each tablespace, store it in the directory backup, and show a progress report while running:
$ pg_basebackup -D backup -Ft -z -P

To create a backup of a local database that has a single tablespace and compress it with bzip2:
$ pg_basebackup -D - -Ft -X fetch | bzip2 > backup.tar.bz2

To create a backup of a local database where the tablespaces in /opt/ts are relocated to ./backup/ts:
$ pg_basebackup -D backup/data -T /opt/ts=$(pwd)/backup/ts

If compression is required, add -Z 5 (gzip compression level 0-9; 0 means no compression, 9 means best compression ratio). The tar files in the target directory then get the suffix .gz:
$ pg_basebackup -h host_ip -p port -U user_name -D /data/test/test_backup -F t -X f -c fast -Z 5

To use another compression algorithm, write the tar output to standard output and pipe it to the compressor (here zstd with 12 threads):
$ pg_basebackup -h host_ip -p port -U user_name -F t -X f -c fast -D - | zstd -T12 - > test_backup.tar.zst
