HRmemon

Posted on Jul 9, 2023

PostgreSQL: Base Backup and Point-in-Time Recovery Explained

#postgres #beginners #programming

In this era of innovation, we find ourselves on the brink of extraordinary discoveries and transformative breakthroughs. From groundbreaking technologies to scientific marvels, our world is constantly evolving, pushing the boundaries of what we once believed was impossible. In this journey of knowledge and imagination, we explore the wonders that await us.

In this content, we delve into the realm of database backup and recovery, specifically focusing on PostgreSQL’s base backup and Point-in-Time Recovery (PITR) functionalities. We’ll uncover the power of base backups in capturing a consistent state of databases and the advantages they offer, especially for large-scale systems. Furthermore, we’ll unravel the remarkable capabilities of PITR, which allow us to restore databases to specific points in time, offering a safety net against mistakes and unforeseen issues.

Join us as we navigate through the intricate concepts of base backup and PITR, understanding their significance and learning how to leverage these features in PostgreSQL. The world of data management holds immense potential, and it’s time to unlock its secrets.

Understanding Base Backup

When it comes to online database backup, PostgreSQL offers two primary methods: logical and physical backups. While logical backups have their advantages, they can be time-consuming, especially for large databases. This is where physical backups come to the rescue. PostgreSQL introduced online physical full backup support in version 8.0, and the snapshot of a running database cluster, known as a base backup, forms the foundation of this backup approach.

A base backup allows you to capture a consistent state of your database at a particular point in time. It enables you to back up and restore large databases efficiently, saving valuable time in critical scenarios. This feature is highly beneficial for practical systems that require fast and reliable backup and restoration capabilities.

Exploring Point-in-Time Recovery (PITR)

In addition to base backups, PostgreSQL provides Point-in-Time Recovery (PITR) functionality, starting from version 8.0. PITR allows you to restore your database cluster to any specific point in time using a base backup and the archive logs generated by continuous archiving.

Imagine making a critical mistake in your database, such as accidentally truncating all tables. With PITR, you can restore your database to the state just before the mistake occurred, effectively undoing the damage caused by human error or system failures. This capability offers valuable peace of mind and safeguards your data against unforeseen issues.

Key Concepts in Chapter 10

In Chapter 10 of PostgreSQL, we delve deeper into the intricacies of Base Backup and Point-in-Time Recovery. Here's an overview of the topics covered:

1. Base Backup

We start by exploring the standard procedure for creating a base backup using low-level commands. This procedure involves three key steps:

Execute the pg_backup_start command (or pg_start_backup in versions prior to 14).
Take a snapshot of the database cluster using the archiving command of your choice.
Conclude the backup process by issuing the pg_backup_stop command (or pg_stop_backup in versions prior to 14).

This simple procedure is designed to be user-friendly, allowing database system administrators to create base backups using common tools like the copy command or similar archiving tools. Notably, this approach doesn't require table locks, ensuring uninterrupted query execution for all users—a significant advantage over other open-source RDBMS solutions.

For those seeking an even simpler method, PostgreSQL offers the pg_basebackup utility, which internally employs the low-level commands mentioned above.

2. pg_backup_start (pg_start_backup)

The pg_backup_start command serves as the initial step in preparing for a base backup. It performs several crucial operations, including:

Enabling full-page write mode.
Switching to the current Write-Ahead Log (WAL) segment file (versions 8.4 and later).
Performing a checkpoint to create a REDO point explicitly.
Creating a backup_label file, located in the top-level base directory. This file contains vital information about the base backup, such as the checkpoint location.

The backup_label file holds essential details, including the checkpoint location, backup method, backup source (primary or standby), start time, label, and start timeline. These details play a vital role in the recovery process and ensure the integrity of the restored database.

3. pg_backup_stop (pg_stop_backup)

The pg_backup_stop command is the final step in completing a base backup. It carries out the following operations:

Reverting to non-full-page write mode if it was changed during pg_backup_start.
Writing a XLOG record to mark the end of the backup.
Switching the Write-Ahead Log (WAL) segment file.
Creating a backup history file, which contains the contents of the backup_label file along with the timestamp of the pg_backup_stop execution.
Deleting the backup_label file, as it is no longer required in the original database cluster once the backup is complete.

By executing these operations, the backup process is finalized, and you are left with a comprehensive base backup that can be used for point-in-time recovery and other crucial tasks.

How Point-in-Time Recovery Works

In PostgreSQL Chapter 10, we explore the inner workings of Point-in-Time Recovery (PITR). PITR allows PostgreSQL to replay Write-Ahead Log (WAL) data from archive logs onto a base backup, starting from the REDO point created by the pg_backup_start command, up to the desired recovery target.

Let's understand how PITR works through the following steps:

Identifying the Recovery Target: Suppose you made a mistake at 12:05 GMT on July 9, 2023. To recover from this mistake, you need to remove the existing database cluster and restore it using a base backup created before the mistake occurred. Additionally, you'll set the restore_command parameter and the recovery_target_time parameter in either the recovery.conf file (version 11 or earlier) or the postgresql.conf file (version 12 or later).
Enabling PITR Mode: When PostgreSQL starts up, it enters PITR mode if it detects a recovery.conf file (version 11 or earlier) or a recovery.signal file (version 12 or later), along with a backup_label in the database cluster.
Replaying WAL Data: PostgreSQL replays the WAL data from the base backup's REDO point to the specified recovery target. The WAL data is read from the archive logs, which are copied from the archival directory to a temporary area using the command specified in the restore_command parameter. In the example, PostgreSQL replays the WAL data until the timestamp '2023-7-9 12:05:00', as specified by the recovery_target_time parameter. If no recovery target is set, PostgreSQL will replay the WAL data until the end of the archive logs.
Completion and Timeline History: Once the recovery process is complete, PostgreSQL creates a timeline history file, such as '00000002.history', in the pg_xlog subdirectory (or pg_wal subdirectory in version 10 or later). If the archiving log feature is enabled, the same named file is also created in the archival directory. This file plays a crucial role in managing the timeline history of the database.

During the replay process, PostgreSQL checks the timestamps of commit and abort actions recorded in the XLOG data. If a recovery target time is set, PostgreSQL compares the target time with each timestamp and decides whether to continue the recovery or not. If a timestamp exceeds the recovery target time, the PITR process concludes.

It's important to note that the PITR process closely resembles the normal recovery process described in Chapter 9. However, there are two key differences: the source of WAL segments/archive logs and the location from which the checkpoint location is read.

In PITR mode, WAL segments and archive logs are read from the archival directory specified in the archive_command parameter, rather than the pg_xlog or pg_wal subdirectory under the base directory. Additionally, the checkpoint location is read from the backup_label file, as opposed to the pg_control file used in normal recovery mode.

By following these steps, PostgreSQL achieves Point-in-Time Recovery, enabling precise restoration of the database cluster to a specific point in time.

Stay tuned as we continue to explore more topics and features in PostgreSQL. The world of database management is full of exciting possibilities!

timelineId and Timeline History File

In PostgreSQL, timelines play a crucial role in distinguishing between the original database cluster and the recovered ones during Point-in-Time Recovery (PITR). This section explores two key aspects associated with timelines: timelineId and timeline history files.

10.3.1. timelineId

Each timeline in PostgreSQL is assigned a unique timelineId, represented by a 4-byte unsigned integer starting at 1.

A timelineId is specific to each database cluster. The original database cluster created using the initdb utility has a timelineId of 1. When a database cluster is recovered, the timelineId is incremented by 1. For example, in the previous example, the recovered cluster had a timelineId of 2.

To understand the PITR process in terms of timelineId, let's consider the following steps:

Remove the current database cluster and restore a base backup from the past to return to the starting point of recovery. This is represented by the red arrow curve in following figure:

Start the PostgreSQL server, which replays the WAL data from the archive logs, starting from the REDO point created by pg_backup_start and continuing until the recovery target, following the initial timeline (timelineId 1). This is represented by the blue arrow line in the above figure
Assign a new timelineId (in this case, 2) to the recovered database cluster, and PostgreSQL continues running on the new timeline.

Above figure illustrates the relationship between the timelineId of the original and recovered database clusters. It shows how the timelineId changes during the recovery process, reflecting the transition from the original cluster to the recovered one.

In Chapter 9, we briefly mentioned that the first 8 digits of a WAL segment filename correspond to the timelineId of the respective segment. When the timelineId changes, the WAL segment filenames are also updated accordingly.

To provide further clarity on the recovery process, let's consider an example where we recover the database cluster using two archive logs: '000000010000000000000009' and '00000001000000000000000A'. In this case, the newly recovered database cluster is assigned the timelineId 2, and PostgreSQL creates WAL segments starting from '00000002000000000000000A'. The following figure illustrates this situation.

10.3.2. Timeline History File

Once a PITR process is complete, PostgreSQL creates a timeline history file with a name like '00000002.history'. This file is stored in both the archival directory and the pg_xlog subdirectory (or pg_wal subdirectory in version 10 or later). The timeline history file records the timeline from which it branched off and the timestamp of the branching.

The naming rule for a timeline history file follows this pattern:

"8-digit new timelineId".history

A timeline history file consists of at least one line, and each line contains the following three items:

timelineId: The timelineId of the archive logs used for recovery.
LSN: The Log Sequence Number (LSN) location where the WAL segment switch occurred.
reason: A human-readable explanation of why the timeline was changed.

Here's an example of a timeline history file:

postgres> cat /home/postgres/archivelogs/00000002.history
1     0/A000198   before 2023-7-9 12:05:00.861324+00

The example indicates the following information:

The recovered database cluster (timelineId 2) is based on the base backup with a timelineId of 1 and is recovered up until just before '2023-7-9 12:05:00.861324+00' by replaying the archive logs until the LSN 0/A000198.

Each timeline history file provides a complete history of an individual recovered database cluster. Moreover, these files are essential for the PITR process itself. The details of their usage will be explained in the following sections.

By understanding timelineId and the significance of timeline history files, you gain insights into the history and progress of database recovery during PITR in PostgreSQL.

Stay tuned for the next sections, where we'll continue exploring the fascinating world of PostgreSQL and its advanced features!

Note: The usage of recovery.conf was replaced by recovery.signal (touch /usr/local/pgsql/data/recovery.signal) and postgresql.conf starting from version 12.

Point-in-Time Recovery with Timeline History File

In Point-in-Time Recovery (PITR) processes, the timeline history file plays a crucial role, particularly in the second and subsequent recovery attempts. Let's explore how the timeline history file is utilized in a second recovery scenario.

Suppose you made a mistake at 12:15:00 in the recovered database cluster with a timelineId of 2. To recover the database cluster at the target time along timelineId 2, you need to create a new recovery.conf file with the following configuration:

restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = "2023-7-9 12:15:00 GMT"
recovery_target_timeline = 2

By restarting the PostgreSQL server and entering PITR mode, you can initiate the recovery process. The process is illustrated in the following figure

![ Recover the database at 12:15:00 along the timelineId 2.](

)

Here's a step-by-step breakdown of the recovery process:

PostgreSQL reads the value of 'CHECKPOINT LOCATION' from the backup_label file.
Relevant values from the recovery.conf file are read, including restore_command, recovery_target_time, and recovery_target_timeline.
PostgreSQL reads the timeline history file '00000002.history', corresponding to the recovery_target_timeline parameter.
PostgreSQL replays the Write-Ahead Log (WAL) data by following these steps:
- From the REDO point to the LSN '0/A000198' recorded in the 00000002.history file, PostgreSQL reads and replays the appropriate archive logs belonging to timelineId 1.
- From the point after LSN '0/A000198' to the point just before the timestamp '2023-7-9 12:15:00', PostgreSQL reads and replays the appropriate archive logs belonging to timelineId 2.
When the recovery process is complete, the current timelineId advances to 3. A new timeline history file named '00000003.history' is created in the pg_xlog subdirectory (or pg_wal subdirectory for version 10 or later) and the archival directory.

An example of the new timeline history file '00000003.history' is as follows:

postgres> cat /home/postgres/archivelogs/00000003.history
1         0/A000198     before 2023-7-9 12:05:00.861324+00
2         0/B000078     before 2023-7-9 12:15:00.927133+00

When performing PITR multiple times, it is essential to explicitly set the recovery_target_timeline parameter to use the appropriate timeline history file.

Timeline history files serve as both historical logs of the database cluster and recovery instruction documents for the PITR process. They provide valuable insights into the recovery history and guide PostgreSQL in determining the appropriate replay of archive logs.

By leveraging timeline history files, PostgreSQL ensures accurate and effective Point-in-Time Recovery, giving you the ability to recover your database cluster to specific points in time.

Stay tuned as we continue our journey through PostgreSQL and its advanced features!

Conclusion

As we conclude our exploration of PostgreSQL's base backup and Point-in-Time Recovery, we stand amazed by the immense capabilities and resilience offered by these features. The ability to efficiently capture a consistent state of databases through base backups and restore them to precise points in time through PITR showcases the remarkable advancements in database management.

With base backups, the process of backing up and restoring large databases becomes streamlined and time-efficient, providing practical systems with fast and reliable backup and restoration capabilities. The simplicity and user-friendliness of the base backup procedure, along with the option to use the convenient pg_basebackup utility, make it an invaluable tool for database administrators.

Moreover, Point-in-Time Recovery grants us the power to undo critical mistakes, whether caused by human error or system failures. The ability to roll back a database to a specific moment in time offers peace of mind and ensures the integrity of our data.

Throughout this journey, we have delved into the intricacies of base backup and PITR, exploring the essential steps, commands, and concepts associated with these processes. From understanding the snapshot-like nature of base backups to comprehending the role of timeline history files in PITR, we have gained insights into the inner workings of PostgreSQL's data recovery mechanisms.

As we continue to embrace the possibilities that lie ahead, let us harness the power of base backup and Point-in-Time Recovery in PostgreSQL, ensuring the security, reliability, and flexibility of our databases. The world of data management awaits us with endless opportunities, and armed with this knowledge, we are ready to embark on new and exciting endeavours.

Reference

Chapter 10: Base Backup & Point-in-Time Recovery
Streamlined and grammar correction by ChatGPT

Top comments (1)

John Donath • Oct 9

What a helpful and clearifying document!!!

DEV Community