DEV Community

Youssef
Youssef

Posted on

PostgreSQL Summary Pt6

In this part i'll be discussing the VACUUM process in Postgres.

VACUUM is a maintenance process that facilitates the persistent operation of PostgreSQL. Its two main tasks are removing dead tuples and freezing transaction ids.

To remove dead tuples, vacuum processing provides two modes:

  1. Concurrent VACUUM: Often called VACUUM, removes dead tuples for each page of the table file. Other transactions can read the table while this process is running.
  2. Full VACUUM: Removes dead tuples and defragments live tuples. Other transactions *cannot access tables while Full VACUUM is running.

Outline of Concurrent VACUUM

Vacuum processing performs the following tasks for specified tables or all tables in the database.

  1. Removing dead tuples
    • Remove dead tuples and defragment live tuples for each page.
    • Remove index tuples that point to dead tuples.
  2. Freezing old txids
    • Freeze old txids of tuples if necessary.
    • Update frozen txid related system catalogs (pg_database and pg_class).
    • Remove unnecessary parts of the clog if possible.
  3. Others
    • Update the FSM and VM of processed tables.
    • Update several statistics (pg_stat_all_tables, etc).

Following is pseudocode of concurrent VACUUM, and it can be described in three blocks.

FOR each table
    Acquire ShareUpdateExclusiveLock lock for the target table
    /* The first block */
    Scan all pages to get all dead tuples, and freeze old tuples if necessary 
    Remove the index tuples that point to the respective dead tuples if exists

    /* The second block */
    FOR each page of the table
        Remove the dead tuples, and Reallocate the live tuples in the page
        Update FSM and VM
    END FOR

  /* The third block */
   Clean up indexes
   Truncate the last page if possible
   Update both the statistics and system catalogs of the target table
   Release ShareUpdateExclusiveLock lock
END FOR

/* Post-processing */
Update statistics and system catalogs
Remove both unnecessary files and pages of the clog if possible
Enter fullscreen mode Exit fullscreen mode

First Block

This block performs freeze processing and removes index tuples that point to dead tuples.
First, PostgreSQL scans a target table to build a list of dead tuples and freeze old tuples if possible. The list is stored in maintenance_work_mem in local memory.
Then it performs a cleanup stage, where it removes index tuples by referring to the dead tuple list.
This process is costly. In version 10 or earlier, the cleanup stage is always executed. In version 11 or later, if the target index is B-tree, whether the cleanup stage is executed or not is decided by the configuration parameter vacuum_cleanup_index_scale_factor.

Second Block

This block removes dead tuples and updates both the FSM and VM for each page of the table.

  • Note: unnecessary line pointers are not removed and they will be reused in future. This is because, if line pointers are removed, all index tuples of the associated indexes must be updated.

Third Block

The third block performs the cleanup after the deletion of the indexes, and also updates both the statistics and system catalogs related to vacuum processing for each target table. Moreover, if the last page has no tuples, it is truncated from the table file.

Post-processing

When vacuum processing is complete, PostgreSQL updates both several statistics and system catalogs related to vacuum processing, and it removes unnecessary parts of the clog if possible.

Visibility Map

Since vacuum processing involves scanning whole tables, it is a costly process. VM was introduced in version 8.4 to reduce this cost.

Each table has an individual visibility map that holds the visibility of each page in the table file, allowing Postgres to quickly identify dead tuples. This allows VACUUM to process only the dead tuples and skip the rest.

Freeze Processing

Freeze processing has two modes:

Lazy Mode

freeze processing scans only pages that contain dead tuples using the respective VM of the target tables.

When starting freeze processing, PostgreSQL calculates the freezeLimit txid and freezes tuples whose t_xmin is less than the freezeLimit txid.
The freezeLimit txid is defined as:
freezeLimit_txid = (OldestXmin − vacuum_freeze_min_age)
where OldestXmin is the oldest txid among currently running transactions.

Eager Mode

scans all pages regardless of whether each page contains dead tuples or not, and it also updates system catalogs related to freeze processing and removes unnecessary parts of the clog if possible.

Eager mode is performed when the following condition is satisfied.
pg_database.datfrozenxid < (OldestXmin − vacuum_freeze_table_age)
where pg_database.datfrozenxid represents the columns of the pg_database system catalog and holds the oldest frozen txid for each database.

Autovacuum Daemon

Vacuum processing has been automated with the autovacuum daemon; thus, the operation of PostgreSQL has become extremely easy.

References

  1. https://www.interdb.jp/pg/pgsql06.html

Top comments (0)