Internals of PostgreSQL: Chapter 6: Vacuum Processing

#apacheage #postgressql #opensource #database

In PostgreSQL, vacuum processing is a vital maintenance procedure that guarantees the database's effective operation. Dead tuples are mostly removed, and transaction ids are frozen. An overview of vacuum processing and its many components is given in this chapter.

6.1 Concurrent Vacuum Outline:
The concurrent VACUUM process, which involves the following tasks for certain tables or all of the database's tables, is high-level summarized in this section.

Defragmenting live tuples for each page, eliminating index tuples referring to dead tuples, and removing dead tuples are all included in the process of removing dead tuples.
Old transaction ids that need to be frozen: Update frozen txid-related system catalogues and freeze old transaction ids of tuples, if necessary.
removing any extraneous clog components if at all possible.
Updating the processed tables' Free Space Map (FSM) and Visibility Map (VM).
Updating a number of hoover processing-related statistics and system catalogues.

Each table has three blocks, one for processing the freeze and removing index tuples, one for removing dead tuples and updating FSM/VM on a page-by-page basis, and one for cleanup and updating statistics/system catalogues. The process is organized as pseudocode. Final steps include post-processing activities including statistics and system catalogue updates and clog file removal.

6.2 Visibility Map:
To increase the effectiveness of vacuum processing, PostgreSQL version 8.4 added the Visibility Map (VM). The VM is a per-table map that keeps track of each page's visibility in the table file. During vacuum processing, it aids in skipping pages without dead tuples, lowering overall costs.

The VM functions by letting vacuum processing know which pages contain dead tuples so that those pages can be skipped. Each VM is kept in its own file with the 'vm' suffix and is made up of one or more 8 KB pages. The table file, as well as the Free Space Map (FSM) file, are linked to the VM file.

6.3 Freeze Processing:
In a vacuum, there are two freeze processing modes: the leisurely mode and the eager mode.

6.3.1 Lazy Mode:
Using the target tables' Visibility Maps (VM), lazy mode freeze processing only searches pages that have dead tuples. The oldest transaction id (OldestXmin) less the vacuum_freeze_min_age configuration parameter is used to determine the freezeLimit txid. Tuples whose transaction ids (t_xmin) are less than the freezeLimit txid are frozen by the freeze procedure.

Given that it can skip pages without dead tuples, the lazy mode might not completely freeze all of the tuples. During this mode, the statistics pertaining to vacuuming are updated.

6.3.2 Eager Mode:
The eager mode makes up for lazy mode's drawbacks. Regardless of the presence of dead tuples, it analyses every page, updates the relevant system catalogues, deletes pointless clog files, and does thorough freeze processing.

When the vacuum_freeze_table_age configuration parameter is less than OldestXmin, eager mode is activated (the oldest frozen transaction id for the database). All pages are scanned in this mode, and each page has duties related to freezing and updating completed.

Other vacuum processing-related topics are briefly covered in this chapter, including the parallel vacuuming of indexes and index phase cleanup that is possible with the VACUUM command (as of version 13). It makes it clear that autovacuum does not support the PARALLEL option, which is unique to the VACUUM command.

It also notes the addition of information about page visibility and whether or not tuples are frozen to the Visibility Map (VM) in version 9.6. During vacuum processing, the enhanced VM offers more effective visibility checking.

The chapter covers the Free Space Map's (FSM) significance in vacuum processing. Each table's pages are tracked by the FSM, allowing vacuum to identify which pages need additional processing. It describes how the accuracy of the FSM is maintained by updating it during hoover operations.

The effect of vacuum processing on database performance is also discussed in this chapter. It highlights the necessity of frequent vacuuming to avoid bloat and keep up good query efficiency. It emphasizes the importance of configuring the hoover system with the proper parameters based on the workload and database consumption patterns.

The autovacuum procedure, which automates the execution of hoover operations based on established thresholds and configuration parameters, is the chapter's final topic. It illustrates how autovacuum minimizes the requirement for manual intervention by operating in the background to ensure continual database upkeep.