Summary of Chapter# 10: "Base Backup & Point-in-Time Recovery" from the book "The Internals of PostgreSQL" Part-2

#postgres #apacheage #database #agenssql

This blog aims to assist you in understanding the final concepts of Chapter:10 [Base Backup & Point-in-Time Recovery] from the book The Internals of PostgreSQL.

Note: Ensure that you have a thorough understanding of
Chapter 10 Part-1 and basics of PostgreSQL before we proceed to Chapter 10 Part-2, as it forms the foundation for our exploration.

So, Let's Continue:

Differences in PITR vs. Normal Recovery

WAL segments/Archive logs source

Normal recovery mode: pg_wal subdirectory (or pg_xlog in versions 9.6 or earlier).
PITR mode: Archive logs are read from an archival directory defined in archive_command.

Checkpoint location source

Normal recovery mode: Read from the pg_control file.
PITR mode: Read from a backup_label file.

PITR Process Steps:

Read the 'CHECKPOINT LOCATION' from the backup_label file to find the REDO point.
Read relevant parameters like restore_command and recovery_target_time from postgresql.conf (versions 12 or later) or recovery.conf (versions 11 or earlier).
Start replaying WAL data from the REDO point, obtained from the 'CHECKPOINT LOCATION.' WAL data is read from archive logs copied to a temporary area using the restore_command parameter.
Replay continues until reaching the specified recovery_target_time or the end of the archiving logs if no target time is set.
When recovery completes, a timeline history file (e.g., '00000002.history') is created in the pg_wal (or pg_xlog) subdirectory. If the archiving log feature is enabled, a similar file is created in the archival directory.
Commit and abort actions in WAL records include timestamps, allowing PostgreSQL to decide whether to continue recovery based on the recovery_target_time.
If the timestamp in an XLOG record exceeds the recovery_target_time, the PITR process concludes.

timelineId and Timeline History File

A timeline in PostgreSQL is used to distinguish between the original database cluster and the recovered ones.
It is a central concept of PITR.

timelineId

Each timeline is given a corresponding timelineId, a 4-byte unsigned integer starting at 1.
An individual timelineId is assigned to each database cluster.
The timelineId of the original database cluster created by the initdb utility is 1. Whenever a database cluster recovers, the timelineId is increased by 1.

Relation of timelineId between an original and a recovered database clusters in PostgreSQL is depicted in the figure below:

** Relation of WAL segment files between an original and a recovered database clusters** in PostgreSQL is depicted in the figure below:

Timeline History File

When a PITR process completes, a timeline history file with names like '00000002.history' is created under the archival directory and the pg_xlog subdirectory (in versions 10 or later, pg_wal subdirectory). This file records which timeline it branched off from and when.
The timeline history file contains at least one line, and each line is composed of the following three items:
(1) timelineId – The timelineId of the archive logs used to recover.
(2)LSN – The LSN location where the WAL segment switches happened.
(3) reason – A human-readable explanation of why the timeline was changed.

Point-in-Time Recovery with Timeline History File

The timeline history file plays an important role in the second and subsequent PITR processes.
The parameter recovery_target_time sets the time you made the new mistake, and the parameter recovery_target_timeline is set at '2' in order to recover along its timeline.

Recover the database at 12:15:00 along the timelineId 2 in PostgreSQL is depicted in figure below:

(1) PostgreSQL reads the value of 'CHECKPOINT LOCATION' from the backup_label file.
(2) Some values of parameters are read from the recovery.conf; in this example, restore_command, recovery_target_time, and recovery_target_timeline.
(3) PostgreSQL reads the timeline history file '00000002.history' which is corresponding to the value of the parameter recovery_target_timeline.
(4) PostgreSQL does replaying WAL data by the following steps:
1. From the REDO point to the LSN '0/A000198', which is written in the 00000002.history file, PostgreSQL reads and replays WAL data of appropriate archive logs whose timelineId is 1.
1. From the one after LSN '0/A000198' to the one before the timestamp '2023-9-15 12:15:00', PostgreSQL reads and replays WAL data (of appropriate archive logs) whose timelineId is 2.
(5) When the recovery process completes, the current timelineId will advance to 3, and a new timeline history file named 00000003.history is created in the pg_wal subdirectory (pg_xlog if versions 9.6 or earlier) and the archival directory.
When you do PITR more than once, you should explicitly set a timelineId for using the appropriate timeline history file.
In this way, timeline history files are not only history logs of database cluster, but also the recovery instruction documents for PITR process.