In this part i'll be discussing the VACUUM process in Postgres.
VACUUM is a maintenance process that facilitates the persistent operation of PostgreSQL. Its two main tasks are removing dead tuples and freezing transaction ids.
To remove dead tuples, vacuum processing provides two modes:
- Concurrent VACUUM: Often called VACUUM, removes dead tuples for each page of the table file. Other transactions can read the table while this process is running.
- Full VACUUM: Removes dead tuples and defragments live tuples. Other transactions *cannot access tables while Full VACUUM is running.
Outline of Concurrent VACUUM
Vacuum processing performs the following tasks for specified tables or all tables in the database.
- Removing dead tuples
- Remove dead tuples and defragment live tuples for each page.
- Remove index tuples that point to dead tuples.
- Freezing old txids
- Freeze old txids of tuples if necessary.
- Update frozen txid related system catalogs (pg_database and pg_class).
- Remove unnecessary parts of the clog if possible.
- Others
- Update the FSM and VM of processed tables.
- Update several statistics (pg_stat_all_tables, etc).
Following is pseudocode of concurrent VACUUM, and it can be described in three blocks.
FOR each table
Acquire ShareUpdateExclusiveLock lock for the target table
/* The first block */
Scan all pages to get all dead tuples, and freeze old tuples if necessary
Remove the index tuples that point to the respective dead tuples if exists
/* The second block */
FOR each page of the table
Remove the dead tuples, and Reallocate the live tuples in the page
Update FSM and VM
END FOR
/* The third block */
Clean up indexes
Truncate the last page if possible
Update both the statistics and system catalogs of the target table
Release ShareUpdateExclusiveLock lock
END FOR
/* Post-processing */
Update statistics and system catalogs
Remove both unnecessary files and pages of the clog if possible
First Block
This block performs freeze processing and removes index tuples that point to dead tuples.
First, PostgreSQL scans a target table to build a list of dead tuples and freeze old tuples if possible. The list is stored in maintenance_work_mem in local memory.
Then it performs a cleanup stage, where it removes index tuples by referring to the dead tuple list.
This process is costly. In version 10 or earlier, the cleanup stage is always executed. In version 11 or later, if the target index is B-tree, whether the cleanup stage is executed or not is decided by the configuration parameter vacuum_cleanup_index_scale_factor.
Second Block
This block removes dead tuples and updates both the FSM and VM for each page of the table.
- Note: unnecessary line pointers are not removed and they will be reused in future. This is because, if line pointers are removed, all index tuples of the associated indexes must be updated.
Third Block
The third block performs the cleanup after the deletion of the indexes, and also updates both the statistics and system catalogs related to vacuum processing for each target table. Moreover, if the last page has no tuples, it is truncated from the table file.
Post-processing
When vacuum processing is complete, PostgreSQL updates both several statistics and system catalogs related to vacuum processing, and it removes unnecessary parts of the clog if possible.
Visibility Map
Since vacuum processing involves scanning whole tables, it is a costly process. VM was introduced in version 8.4 to reduce this cost.
Each table has an individual visibility map that holds the visibility of each page in the table file, allowing Postgres to quickly identify dead tuples. This allows VACUUM to process only the dead tuples and skip the rest.
Freeze Processing
Freeze processing has two modes:
Lazy Mode
freeze processing scans only pages that contain dead tuples using the respective VM of the target tables.
When starting freeze processing, PostgreSQL calculates the freezeLimit txid and freezes tuples whose t_xmin is less than the freezeLimit txid.
The freezeLimit txid is defined as:
freezeLimit_txid = (OldestXmin − vacuum_freeze_min_age)
where OldestXmin is the oldest txid among currently running transactions.
Eager Mode
scans all pages regardless of whether each page contains dead tuples or not, and it also updates system catalogs related to freeze processing and removes unnecessary parts of the clog if possible.
Eager mode is performed when the following condition is satisfied.
pg_database.datfrozenxid < (OldestXmin − vacuum_freeze_table_age)
where pg_database.datfrozenxid represents the columns of the pg_database system catalog and holds the oldest frozen txid for each database.
Autovacuum Daemon
Vacuum processing has been automated with the autovacuum daemon; thus, the operation of PostgreSQL has become extremely easy.
Top comments (0)