DEV Community

Youssef
Youssef

Posted on

PostgreSQL Summary Pt8

Continuing on part 7, i'll be discussing XLOG records, and the part they play in database recovery.

Writing of XLOG Records

We'll explore the internal function exec_simple_query() by invoking the following query: INSERT INTO tbl VALUES ('A');

The pseudo code of exec_simple_query():

exec_simple_query() @postgres.c

(1) ExtendCLOG() @clog.c                  /* Write the state of this transaction
                                           * "IN_PROGRESS" to the CLOG.
                                           */
(2) heap_insert()@heapam.c                /* Insert a tuple, creates a XLOG record,
                                           * and invoke the function XLogInsert.
                                           */
(3)   XLogInsert() @xlog.c (9.5 or later, xloginsert.c)
                                          /* Write the XLOG record of the inserted tuple
                                           *  to the WAL buffer, and update page's pd_lsn.
                                           */
(4) finish_xact_command() @postgres.c     /* Invoke commit action.*/   
      XLogInsert() @xlog.c  (9.5 or later, xloginsert.c)
                                          /* Write a XLOG record of this commit action 
                                           * to the WAL buffer.
                                           */
(5)   XLogWrite() @xlog.c                 /* Write and flush all XLOG records on 
                                           * the WAL buffer to WAL segment.
                                           */
(6) TransactionIdCommitTree() @transam.c  /* Change the state of this transaction 
                                           * from "IN_PROGRESS" to "COMMITTED" on the CLOG.
                                           */
Enter fullscreen mode Exit fullscreen mode

Explanation:

  1. ExtendCLOG() writes the state of this transaction IN_PROGRESS in the (in-memory) CLOG.
  2. heap_insert() inserts a heap tuple into the target page on the shared buffer pool, creates this page's XLOG record, and invokes XLogInsert()
  3. XLogInsert() writes the XLOG record created by the heap_insert() to the WAL buffer at LSN_1, and then updates the modified page's pd_lsn from LSN_0 to LSN_1
  4. finish_xact_command(), which is invoked to commit this transaction, creates this commit action's XLOG record, and then XLogInsert() writes this record into the WAL buffer at LSN_2.
  5. XLogWrite() writes and flushes all XLOG records on the WAL buffer to the WAL segment file. If the parameter wal_sync_method is set to 'open_sync' or 'open_datasync', the records are synchronously written because the function writes all records with the open() system call specified the flag O_SYNC or O_DSYNC. If the parameter is set to 'fsync''fsync_writethrough' or 'fdatasync', the respective system call – fsync()fcntl() with F_FULLFSYNC option, or fdatasync() – will be executed. In any case, all XLOG records are ensured to be written into the storage
  6. TransactionIdCommitTree() changes the state of this transaction from IN_PROGRESS to COMMITTED on the CLOG.

WAL Writer Process

It is a background process to check the WAL buffer periodically and write all unwritten XLOG records into the WAL segments.

The purpose of this process is to improves performance by avoiding bursts of of XLOG records writes when a large amount of data are committed at once. 

This process cannot be disabled, and it checks the WAL buffer every 200 ms by default, this interval can by changed using the configuration parameter wal_writer_delay.'

Checkpoint Processing in PostgreSQL

Checkpointing in PostgreSQL is performed by the checkpointer background process, it has two responsibilities:

  1. Preparation of database recovery.
  2. Cleaning of dirty pages on the shared buffer pool.

It starts when one of the following occurs:

  1. A superuser issues CHECKPOINT command manually.
  2. The interval time set for checkpoint_timeout from the previous checkpoint has exceeded 300secs.
  3. (version 9.4 or earlier) The number of WAL segment files (3 by default) set for checkpoint_segments has been consumed since the previous checkpoint.
  4. (version 9.5 or later) The total size of the WAL segment files in the pg_xlog (called pg_wal in version 10 or later) has exceeded the value of the parameter max_wal_size (1GB (64 files) by default).
  5. PostgreSQL server stops in smart or fast mode.

pg_control File

This file contains fundamental information required by the checkpointer process, which is essential for database recovery.

This file stores over 40 items, however i'll be talking about three specific items:

  1. State: The state of database server at the time of the latest checkpointing starts.
    There are seven states in total, some of which are:
    1. start up is the state that system is starting up.

    1. shut down is the state that system is going down normally by the shutdown command
    2. in production is the state that system is running.
  2. Latest checkpoint location: LSN Location of the latest checkpoint record. 

  3. Prior checkpoint location (deprecated in version 11): LSN Location of the prior checkpoint record .

A pg_control file is stored in the global subdirectory under the base-directory; its contents can be shown using the pg_controldata utility.

To summarize

The checkpointer creates the checkpoint record which contains the REDO point, stores the checkpoint location into the pg_control file.
This allows PostgreSQL to recover itself by replaying WAL data from the REDO point which is obtained from the checkpoint record provided by the pg_control file.

References

  1. https://www.interdb.jp/pg/pgsql09.html

Top comments (0)