DEV Community

Cover image for Summary of Chapter# 10: "Base Backup & Point-in-Time Recovery" from the book "The Internals of PostgreSQL" Part-2
Vinay Kumar Talreja
Vinay Kumar Talreja

Posted on • Edited on

Summary of Chapter# 10: "Base Backup & Point-in-Time Recovery" from the book "The Internals of PostgreSQL" Part-2

This blog aims to assist you in understanding the final concepts of Chapter:10 [Base Backup & Point-in-Time Recovery] from the book The Internals of PostgreSQL.

Note: Ensure that you have a thorough understanding of
Chapter 10 Part-1 and basics of PostgreSQL before we proceed to Chapter 10 Part-2, as it forms the foundation for our exploration.

So, Let's Continue:

Differences in PITR vs. Normal Recovery

WAL segments/Archive logs source

  • Normal recovery mode: pg_wal subdirectory (or pg_xlog in versions 9.6 or earlier).

  • PITR mode: Archive logs are read from an archival directory defined in archive_command.

Checkpoint location source

  • Normal recovery mode: Read from the pg_control file.

  • PITR mode: Read from a backup_label file.

PITR Process Steps:

  • Read the 'CHECKPOINT LOCATION' from the backup_label file to find the REDO point.

  • Read relevant parameters like restore_command and recovery_target_time from postgresql.conf (versions 12 or later) or recovery.conf (versions 11 or earlier).

  • Start replaying WAL data from the REDO point, obtained from the 'CHECKPOINT LOCATION.' WAL data is read from archive logs copied to a temporary area using the restore_command parameter.

  • Replay continues until reaching the specified recovery_target_time or the end of the archiving logs if no target time is set.

  • When recovery completes, a timeline history file (e.g., '00000002.history') is created in the pg_wal (or pg_xlog) subdirectory. If the archiving log feature is enabled, a similar file is created in the archival directory.

  • Commit and abort actions in WAL records include timestamps, allowing PostgreSQL to decide whether to continue recovery based on the recovery_target_time.

  • If the timestamp in an XLOG record exceeds the recovery_target_time, the PITR process concludes.


timelineId and Timeline History File

  • A timeline in PostgreSQL is used to distinguish between the original database cluster and the recovered ones.

  • It is a central concept of PITR.

timelineId

  • Each timeline is given a corresponding timelineId, a 4-byte unsigned integer starting at 1.

  • An individual timelineId is assigned to each database cluster.

  • The timelineId of the original database cluster created by the initdb utility is 1. Whenever a database cluster recovers, the timelineId is increased by 1.

Relation of timelineId between an original and a recovered database clusters in PostgreSQL is depicted in the figure below:

Image description

** Relation of WAL segment files between an original and a recovered database clusters** in PostgreSQL is depicted in the figure below:

Image description

Timeline History File

  • When a PITR process completes, a timeline history file with names like '00000002.history' is created under the archival directory and the pg_xlog subdirectory (in versions 10 or later, pg_wal subdirectory). This file records which timeline it branched off from and when.

  • The timeline history file contains at least one line, and each line is composed of the following three items:

  • (1) timelineId – The timelineId of the archive logs used to recover.

  • (2)LSN – The LSN location where the WAL segment switches happened.

  • (3) reason – A human-readable explanation of why the timeline was changed.


Point-in-Time Recovery with Timeline History File

  • The timeline history file plays an important role in the second and subsequent PITR processes.

  • The parameter recovery_target_time sets the time you made the new mistake, and the parameter recovery_target_timeline is set at '2' in order to recover along its timeline.

Recover the database at 12:15:00 along the timelineId 2 in PostgreSQL is depicted in figure below:

Image description

  • (1) PostgreSQL reads the value of 'CHECKPOINT LOCATION' from the backup_label file.

  • (2) Some values of parameters are read from the recovery.conf; in this example, restore_command, recovery_target_time, and recovery_target_timeline.

  • (3) PostgreSQL reads the timeline history file '00000002.history' which is corresponding to the value of the parameter recovery_target_timeline.

  • (4) PostgreSQL does replaying WAL data by the following steps:

    1. From the REDO point to the LSN '0/A000198', which is written in the 00000002.history file, PostgreSQL reads and replays WAL data of appropriate archive logs whose timelineId is 1.
    1. From the one after LSN '0/A000198' to the one before the timestamp '2023-9-15 12:15:00', PostgreSQL reads and replays WAL data (of appropriate archive logs) whose timelineId is 2.
  • (5) When the recovery process completes, the current timelineId will advance to 3, and a new timeline history file named 00000003.history is created in the pg_wal subdirectory (pg_xlog if versions 9.6 or earlier) and the archival directory.

  • When you do PITR more than once, you should explicitly set a timelineId for using the appropriate timeline history file.

  • In this way, timeline history files are not only history logs of database cluster, but also the recovery instruction documents for PITR process.


I hope, this blog has helped you in understanding the final concepts of Base Backup & Point-in-Time Recovery in PostreSQL.

Check out summary of Chapter : 11 Part-1

If you want to understand PostgreSQL In-Depth.

Top comments (0)