DEV Community

HRmemon

Transaction Log and WAL Segment Files

Welcome to another exciting chapter of our PostgreSQL blog series! In this chapter, we will dive into the fascinating world of transaction logs and WAL (Write-Ahead Logging) segment files. Understanding the internals of PostgreSQL's transaction log is crucial for database administrators and developers alike, so let's get started.

9.2 Transaction Log and WAL Segment Files

The transaction log in PostgreSQL is central to ensuring data integrity and to recovering from failures. Logically, PostgreSQL writes XLOG (transaction log) records into a virtual file that is addressed with 8-byte positions, which gives it a length of 2^64 bytes (16 exabytes). Handling a single file of that size is not practical, so PostgreSQL divides the transaction log into smaller files known as WAL (Write-Ahead Logging) segments, each 16 MB in size by default.

The size of WAL segment files can be configured using the --wal-segsize option during the creation of a PostgreSQL cluster using the initdb command (version 11 or later). By dividing the transaction log into manageable segments, PostgreSQL ensures efficient utilization of disk space and simplifies the management of log files.

Let's take a closer look at the naming convention of WAL segment files. The file name is a hexadecimal 24-digit number generated based on the timelineId and the LSN (Log Sequence Number). The timelineId is a 4-byte unsigned integer that is primarily used for Point-in-Time Recovery (PITR) purposes. For the sake of simplicity, we'll consider the timelineId as fixed to 0x00000001 for this chapter.

The first WAL segment file is named 000000010000000000000001. As each segment fills with XLOG records, the next file takes the next name in ascending order: once the first segment is completely used, the next segment is named 000000010000000000000002. With the default 16 MB segment size, the last 8 digits run up to 000000FF. After 0000000100000000000000FF has been filled, the middle 8-digit number is incremented and the last 8 digits start over, so the next file is 000000010000000100000000.
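The naming rule can be sketched in a few lines of C. This is a minimal illustration, not PostgreSQL's own code: the function name wal_segment_name and the constants are assumptions made for this example, and the segment size is fixed to the default 16 MB.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed default from the text: 16 MB segments. */
#define WAL_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024)
#define WAL_FILENAME_LEN 24

/*
 * Hypothetical helper: build the 24-hex-digit name of the segment that
 * contains byte position `lsn` of the virtual log file.
 * `out` must have room for 24 characters plus the terminating NUL.
 */
void wal_segment_name(uint32_t timeline, uint64_t lsn,
                      char out[WAL_FILENAME_LEN + 1])
{
    /* Segments covered by one value of the middle 8 digits (4 GB of log). */
    const uint64_t segs_per_group = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;
    /* Sequential segment number counted from the start of the log. */
    const uint64_t segno = lsn / WAL_SEGMENT_SIZE;

    int n = snprintf(out, WAL_FILENAME_LEN + 1,
                     "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
                     timeline,
                     (uint32_t) (segno / segs_per_group),
                     (uint32_t) (segno % segs_per_group));
    assert(n == WAL_FILENAME_LEN);
    (void) n; /* silence unused-variable warnings when asserts are disabled */
}
```

Called with timeline 1 and byte position 0x1000000, the helper fills the buffer with 000000010000000000000001, and byte position 0x100000000 yields 000000010000000100000000, matching the rollover described above.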

9.3 Internal Layout of WAL Segment

Now that we have an understanding of how WAL segment files are named and divided, let's explore their internal layout. A WAL segment file, 16 MB in size by default, is internally divided into pages of 8192 bytes (8 KB). The first page of the segment starts with a header defined by the structure XLogLongPageHeaderData, while every subsequent page starts with a header defined by the structure XLogPageHeaderData. XLOG records are written sequentially after the page header, from the beginning of the page toward its end.
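To make the page arithmetic concrete, here is a small sketch that maps a byte position in the log to a page inside its segment. The WalPosition type and the wal_locate function are hypothetical names used only for this example, and the sizes are the defaults quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define WAL_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024) /* default segment size */
#define WAL_PAGE_SIZE    UINT64_C(8192)               /* page size from the text */

/* Hypothetical helper type: where a byte position falls inside its segment. */
typedef struct WalPosition
{
    uint64_t page_no;     /* 0-based page index within the segment */
    uint64_t page_offset; /* byte offset within that page */
    int      long_header; /* 1 if the page starts with XLogLongPageHeaderData */
} WalPosition;

WalPosition wal_locate(uint64_t lsn)
{
    WalPosition pos;
    uint64_t seg_offset = lsn % WAL_SEGMENT_SIZE; /* offset inside the segment */

    pos.page_no = seg_offset / WAL_PAGE_SIZE;
    pos.page_offset = seg_offset % WAL_PAGE_SIZE;
    /* Only the first page of a segment carries the long header. */
    pos.long_header = (pos.page_no == 0);

    /* A 16 MB segment holds 2048 pages of 8 KB each. */
    assert(pos.page_no < WAL_SEGMENT_SIZE / WAL_PAGE_SIZE);
    return pos;
}
```

With these defaults a segment holds 2048 pages, so page_no always lies between 0 and 2047.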

By dividing the WAL segment file into pages, PostgreSQL ensures efficient storage and retrieval of XLOG records. The page header information provides crucial details for organizing and managing the log data effectively.

9.4 Internal Layout of XLOG Record

An XLOG record consists of a header portion and associated data portions. Let's start by examining the structure of the header portion. In this chapter, we'll focus on the structure used in PostgreSQL versions 9.4 and earlier, as it underwent changes in version 9.5.

The XLOG record's header portion is defined by the structure XLogRecord, which contains several essential fields:

typedef struct XLogRecord
{
   uint32          xl_tot_len;   /* total length of the entire record */
   TransactionId   xl_xid;       /* transaction ID */
   uint32          xl_len;       /* total length of rmgr data */
   uint8           xl_info;      /* flag bits, see below */
   RmgrId          xl_rmid;      /* resource manager for this record */
   XLogRecPtr      xl_prev;      /* pointer to the previous record in the log */
   pg_crc32        xl_crc;       /* CRC for this record */
} XLogRecord;

The XLogRecord structure represents the general header portion of an XLOG record. It contains information such as the total length of the record, the transaction ID, the length of resource manager (rmgr) data, flag bits, the resource manager ID, a pointer to the previous record in the log, and a CRC value for integrity checks.
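If you want to experiment with this header layout outside the PostgreSQL source tree, the following sketch may help. The four typedefs are stand-ins chosen for this example (the real definitions live in PostgreSQL's headers), and the two helper functions are hypothetical. The sketch shows the padding the compiler inserts before xl_prev and a simple sanity check on the two length fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs (assumptions for this sketch, not the real headers). */
typedef uint32_t TransactionId;
typedef uint8_t  RmgrId;
typedef uint64_t XLogRecPtr;
typedef uint32_t pg_crc32;

typedef struct XLogRecord
{
    uint32_t      xl_tot_len; /* total length of the entire record */
    TransactionId xl_xid;     /* transaction ID */
    uint32_t      xl_len;     /* total length of rmgr data */
    uint8_t       xl_info;    /* flag bits */
    RmgrId        xl_rmid;    /* resource manager for this record */
    XLogRecPtr    xl_prev;    /* pointer to the previous record in the log */
    pg_crc32      xl_crc;     /* CRC for this record */
} XLogRecord;

/* Bytes of compiler padding between xl_rmid and xl_prev. */
size_t xlog_header_padding_bytes(void)
{
    return offsetof(XLogRecord, xl_prev)
           - (offsetof(XLogRecord, xl_rmid) + sizeof(RmgrId));
}

/*
 * Hypothetical sanity check: the total length has to cover at least the
 * header plus the resource manager data.
 */
int xlog_record_lengths_plausible(const XLogRecord *rec)
{
    assert(rec != NULL);
    return rec->xl_tot_len >= sizeof(XLogRecord) + rec->xl_len;
}
```

On common platforms the two one-byte fields are followed by two bytes of padding so that xl_prev starts at offset 16. The exact value of sizeof(XLogRecord) depends on the platform's alignment rules.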

The xl_rmid and xl_info fields are related to resource managers, which are responsible for handling operations associated with WAL features, such as writing and replaying XLOG records. Each resource manager has its own identifier (RmgrId), and the number of resource managers tends to increase with each PostgreSQL version.

Resource managers handle specific operations within the XLOG records. For example, when an INSERT statement is issued, the xl_rmid and xl_info fields in the XLOG record's header portion are set to RM_HEAP_ID (the heap resource manager) and XLOG_HEAP_INSERT, respectively. During database recovery, the matching function of that resource manager, here heap_xlog_insert(), replays the XLOG record.
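The dispatch idea can be illustrated with a toy table of function pointers. Everything in this sketch is invented for illustration: the Toy names, the identifier values, and the action codes are assumptions and do not match PostgreSQL's real resource manager list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy identifiers for illustration only; not PostgreSQL's actual values. */
enum { TOY_RM_XLOG = 0, TOY_RM_HEAP = 1, TOY_RM_COUNT = 2 };
enum { TOY_HEAP_INSERT = 0x00, TOY_HEAP_DELETE = 0x10 };

typedef enum ToyAction
{
    TOY_ACT_UNKNOWN,
    TOY_ACT_XLOG,
    TOY_ACT_HEAP_INSERT,
    TOY_ACT_HEAP_DELETE
} ToyAction;

/* Only the two header fields that drive dispatch are modelled here. */
typedef struct ToyRecord
{
    uint8_t xl_rmid; /* which resource manager owns the record */
    uint8_t xl_info; /* which operation of that resource manager */
} ToyRecord;

typedef ToyAction (*ToyRedoFn)(const ToyRecord *rec);

ToyAction toy_xlog_redo(const ToyRecord *rec)
{
    (void) rec;
    return TOY_ACT_XLOG;
}

/* The resource manager inspects xl_info to pick the operation to replay. */
ToyAction toy_heap_redo(const ToyRecord *rec)
{
    switch (rec->xl_info)
    {
        case TOY_HEAP_INSERT: return TOY_ACT_HEAP_INSERT;
        case TOY_HEAP_DELETE: return TOY_ACT_HEAP_DELETE;
        default:              return TOY_ACT_UNKNOWN;
    }
}

/* One redo function per resource manager, indexed by xl_rmid. */
static const ToyRedoFn toy_rmgr_table[TOY_RM_COUNT] = {
    [TOY_RM_XLOG] = toy_xlog_redo,
    [TOY_RM_HEAP] = toy_heap_redo
};

ToyAction toy_replay(const ToyRecord *rec)
{
    assert(rec != NULL);
    if (rec->xl_rmid >= TOY_RM_COUNT)
        return TOY_ACT_UNKNOWN;
    return toy_rmgr_table[rec->xl_rmid](rec);
}
```

The two-level lookup mirrors the text: xl_rmid selects the resource manager, and xl_info selects the operation inside it.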

It's worth noting that in version 9.5 and later, the XLogRecord structure was modified to refine the XLOG record format. One of the changes involved the removal of the xl_len field, which reduced the record's size by a few bytes.

Now that we have explored the structure of the XLOG record's header portion, let's move on to the data portion, which underwent changes in version 9.5.

9.4.3 Data Portion of XLOG Record (version 9.5 or later)

In PostgreSQL version 9.4 and earlier, XLOG records did not have a common format. Each resource manager had to define its own format, which made it challenging to maintain the source code and implement new WAL-related features. To address this issue, starting from version 9.5, a common structured format that does not depend on resource managers was introduced.

The data portion of an XLOG record in version 9.5 and later can be divided into two parts: the header and the data. Let's take a look at the common XLOG record format shown in Fig. 9.9.

The header part of the data portion can contain zero or more XLogRecordBlockHeaders and zero or one XLogRecordDataHeaderShort (or XLogRecordDataHeaderLong). At least one of these headers must be present. If the record stores a full-page image (backup block), the XLogRecordBlockHeader includes an XLogRecordBlockImageHeader. If the block is compressed, it also includes an XLogRecordBlockCompressHeader.

The data part consists of zero or more block data and zero or one main data. The block data corresponds to the XLogRecordBlockHeaders, while the main data corresponds to the XLogRecordDataHeader.
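As a rough sketch of the "zero or one" main data header, the function below chooses between the short and the long form depending on the main data length. The function name and buffer convention are hypothetical. The marker values 255 and 254 follow my reading of PostgreSQL's xlogrecord.h and should be checked against your source tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Marker bytes as I understand them from xlogrecord.h (please verify). */
#define XLR_BLOCK_ID_DATA_SHORT 255
#define XLR_BLOCK_ID_DATA_LONG  254

#define MAIN_DATA_HEADER_MAX 5 /* 1 marker byte + 4 length bytes */

/*
 * Hypothetical helper: write the main data header for `main_len` bytes of
 * main data into `buf` and return the number of header bytes written.
 * Returns 0 when there is no main data, because then no header is needed.
 */
size_t write_main_data_header(uint8_t buf[MAIN_DATA_HEADER_MAX],
                              uint32_t main_len)
{
    assert(buf != NULL);

    if (main_len == 0)
        return 0;

    if (main_len <= UINT8_MAX)
    {
        /* Short form: marker byte followed by a 1-byte length. */
        buf[0] = XLR_BLOCK_ID_DATA_SHORT;
        buf[1] = (uint8_t) main_len;
        return 2;
    }

    /* Long form: marker byte followed by an unaligned 4-byte length. */
    buf[0] = XLR_BLOCK_ID_DATA_LONG;
    memcpy(buf + 1, &main_len, sizeof(main_len));
    return 1 + sizeof(main_len);
}
```

Small main data, such as the xl_heap_insert structure discussed later, fits the 2-byte short form, which is one reason many records shrink in the new format.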

Starting from version 9.5, PostgreSQL can compress full-page images within XLOG records using its built-in LZ-family compression method (pglz). The feature is enabled by setting the wal_compression parameter to on. Compression reduces the I/O cost of writing records and slows the consumption of WAL segment files, at the price of additional CPU work.

Let's explore a few examples of XLOG records in version 9.5 or later, as shown in Fig. 9.10.

9.4.3.1 Backup Block

A backup block created by an INSERT statement is illustrated in Fig. 9.10(a). It consists of four data structures and one data object:

  • The structure XLogRecord (header portion)
  • The structure XLogRecordBlockHeader, including an XLogRecordBlockImageHeader
  • The structure XLogRecordDataHeaderShort
  • A backup block (block data)
  • The structure xl_heap_insert (main data)

The XLogRecordBlockHeader contains variables that identify the block in the database cluster, such as the relfilenode, fork number, and block number. The XLogRecordBlockImageHeader includes the length of the block and its offset number. These two header structures can store the same data as the BkpBlock used in version 9.4.

The XLogRecordDataHeaderShort stores the length of the xl_heap_insert structure, which is the main data of the record. The main data of a backup block record is not used except in special cases, such as logical decoding and speculative insertions. During replay, this redundant data is ignored. Improvements in this area may be considered in the future.

It's important to note that the main data of backup block records depends on the statements that create them. For example, an UPDATE statement appends xl_heap_lock or xl_heap_update.

9.4.3.2 Non-Backup Block

Next, let's examine a non-backup block record created by an INSERT statement, as shown in Fig. 9.10(b). It consists of four data structures and one data object:

  • The structure XLogRecord (header portion)
  • The structure XLogRecordBlockHeader
  • The structure XLogRecordDataHeaderShort
  • An inserted tuple (specifically, an xl_heap_header structure and the entire inserted data)
  • The structure xl_heap_insert (main data)

The XLogRecordBlockHeader contains the relfilenode, fork number, and block number to specify the block into which the tuple is inserted, as well as the length of the data portion of the inserted tuple. The XLogRecordDataHeaderShort stores the length of the new xl_heap_insert structure, which serves as the main data of this record.

The new xl_heap_insert structure only contains two values: the offset number of the tuple within the block and a visibility flag. The simplification of the xl_heap_insert structure was possible because most of the data contained in the old structure is already stored in the XLogRecordBlockHeader.
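To get a feel for how compact such a record is, the sketch below adds up the pieces listed above for a non-backup-block INSERT record. The byte sizes are assumptions based on my reading of the version 9.5 headers, and the function name is hypothetical, so treat the numbers as an estimate to verify rather than a specification.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes in bytes for version 9.5; verify against your source tree. */
enum
{
    SZ_XLOG_RECORD       = 24, /* XLogRecord (header portion) */
    SZ_BLOCK_HEADER      = 4,  /* XLogRecordBlockHeader itself */
    SZ_REL_FILE_NODE     = 12, /* relfilenode stored after the block header */
    SZ_BLOCK_NUMBER      = 4,  /* block number stored after the relfilenode */
    SZ_DATA_HEADER_SHORT = 2,  /* XLogRecordDataHeaderShort */
    SZ_HEAP_HEADER       = 5,  /* xl_heap_header */
    SZ_HEAP_INSERT       = 3   /* xl_heap_insert: offset number + flags */
};

/*
 * Hypothetical estimate of the size of a non-backup-block INSERT record
 * whose tuple carries `tuple_data_len` bytes of data.
 */
size_t insert_record_size(size_t tuple_data_len)
{
    /* Block data = xl_heap_header followed by the tuple data. */
    size_t block_data = SZ_HEAP_HEADER + tuple_data_len;

    /* Assumption: the block data length is kept in a 16-bit field. */
    assert(block_data <= 65535);

    return SZ_XLOG_RECORD
         + SZ_BLOCK_HEADER + SZ_REL_FILE_NODE + SZ_BLOCK_NUMBER
         + SZ_DATA_HEADER_SHORT
         + block_data
         + SZ_HEAP_INSERT;
}
```

Under these assumptions the fixed overhead is 54 bytes, and every byte of tuple data adds one byte on top of that.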

9.4.3.3 Checkpoint

Finally, let's consider an example of a checkpoint record, as shown in Fig. 9.10(c). It is composed of three data structures:

  • The structure XLogRecord (header portion)
  • The structure XLogRecordDataHeaderShort, containing the length of the main data
  • The structure CheckPoint (main data)

In version 9.5 and later, the xl_heap_header structure is defined in src/include/access/heapam_xlog.h, while the CheckPoint structure is defined in src/include/catalog/pg_control.h.

Although the new format may seem a bit complex, it is well-designed to facilitate parsing by resource managers. Additionally, the size of many types of XLOG records is usually smaller than in the previous format. You can calculate the sizes of these records and compare them using the information provided in Figs. 9.8 and 9.10. It's worth noting that the new checkpoint record may have a larger size compared to the previous format due to the inclusion of additional variables.

In the next section, we'll explore further aspects of XLOG records, including the handling of block images and non-backup blocks in more detail. Stay tuned for an insightful continuation of our exploration into PostgreSQL's transaction log and WAL segment files.
