A Brief Overview of Base Backup & Point-in-Time Recovery in PostgreSQL

#apache #postgressql #apacheage #bitnine

Backups in online databases can be categorised into two main categories:

physical backups
logical backups

Although both have their own advantages and disadvantages, logical backup mainly has one major disadvantage being that it takes too much time for performing the backup. Specifically, it takes an unusually long amount of time to backup a very large database and even more time to restore this database from the backup.

In PostgreSQL however, full physical online backups have been available since version 8.0 and a snapshot of the whole running database cluster is known as a base backup.

Point-In-Time Recovery (PITR) is also a feature that has been available since version 8.0 helps restore the database cluster to any point in time by making use of a base backup and archive logs that are developed using the continuous archiving feature. Let's take an example and say that you made a critical mistake such as deleting all the tables, this feature allows you to restore the database back to the point just before the mistake was made.

In this short overview, we will look at the following topics:

- What base backup is

Firstly, the regular technique to design a base backup are as follows:

(1) Issue the pg_backup_start command (Version 14 or earlier, pg_start_backup)
(2) Take a snapshot of the database cluster with the archiving command you want to use
(3) Issue the pg_backup_stop command (Version 14 or earlier, pg_stop_backup)

This is a relatively simple and easy to use procedure that is helpful for system administrators as it requires no special tools and only uses common tools such as copy command or any similar tool used for archiving and creating a backup. Additionally, no table locks are required and all of the database users can query the database while being unaffected by the backup operations.

- How PITR works

Now let's take a look at how PITR works. Let's assume that you made a crucial mistake at 5:15 PKT on 21st January 2023. Now you will remove the current database cluster and restore the new one using the backup that has been made before. Next, you will set the parameter_restore_command and also set a time for the parameter recovery_target_time to the point where the mistake was made. Now when PostgreSQL boots up, enters into PITR recovery mode and if there is a recovery file in the backup cluster it starts the recovery mode.

- What timelineId is

In PostgreSQL, a timeline is used to differentiate the original database cluster from the recovered one and is one of the most fundamental concepts of PITR. let's take a look at timelineId in this section.

Each timeline is given a timelineId which is basically a 4-byte unsigned integer that starts at 1.
Each database cluster is assigned an individual timelineId and the timelineId of the original database cluster is created by the initdb utility and is 1 by default. And whenever a database cluster is recovered, the timelineId is incremented by 1

- What timeline history file is

When a PITR process is completed, a timeline history file is created with names like

00000003.history
This is created under the archival directory. This file keeps a record of which timeline it was initially branched off from and at what time.

The timeline history file consists of at least one line and each of the lines are composed of the following three things:

timelineId - it is a timelineId of the archive logd that are used to recover.
LSN - This points to the location where the switch of the WAL segment takes place
Reason - a human readable explanation of why the timeline was changed.

DEV Community

A Brief Overview of Base Backup & Point-in-Time Recovery in PostgreSQL

Backups in online databases can be categorised into two main categories:

Top comments (0)