Introduction
Concurrent VACUUM is a process that removes dead tuples, defragments live tuples, and freezes old txids for specified tables or all tables in a PostgreSQL database.
The process consists of three blocks:
- scanning for dead tuples and freezing old tuples.
- removing dead tuples and updating the FSM and VM on a page-by-page basis.
- performing cleanup and updating statistics and system catalogs.
The process uses a ring buffer and does not cache processed pages in shared buffers. The PARALLEL option can be used to process vacuuming index and cleaning index phases in parallel for multiple indexes.
Visibility Map
The visibility map (VM) is a feature introduced in PostgreSQL version 8.4 to reduce the cost of vacuum processing.
Each table has an individual VM that holds the visibility of each page in the table file.
The VM holds information about which pages contain dead tuples, which allows vacuum processing to skip pages that do not have dead tuples.
Each VM is composed of one or more 8 KB pages and is stored with the 'vm' suffix.
In version 9.6, the VM was enhanced to show page visibility and information about whether tuples are frozen or not in each page, which improves the efficiency of freeze processing.
Freeze Processing
Freeze processing in PostgreSQL has two modes: lazy mode and eager mode.
- lazy mode: Freeze processing scans only pages that contain dead tuples using the respective VM of the target tables,
- eager mode Scans all pages regardless of whether they contain dead tuples or not.
The freezeLimit txid is calculated based on the oldest transaction ID (OldestXmin) among currently running transactions, and tuples whose t_xmin is less than the freezeLimit txid are frozen.
The statistics related to vacuuming are updated before completing the vacuum process. While lazy mode may not be able to freeze tuples completely, eager mode scans all pages and updates system catalogs related to freeze processing.
Removing Unnecessary Clog Files
The clog (or commit log) in PostgreSQL stores transaction states and is used to determine if a transaction is committed or aborted.
When the minimum frozen transaction ID (datfrozenxid) in the pg_database system catalog table is updated, PostgreSQL can remove unnecessary clog files and pages.
To check the frozen transaction IDs in the database, you can use the following SQL query:
SELECT datname, datfrozenxid FROM pg_database;
To see the clog files and their sizes, you can use the ls
command on the pg_xact
directory (or pg_clog
in version 9.6 or earlier
Full Vacuum
In PostgreSQL, VACUUM is used to remove dead tuples and free up space in tables, but it doesn't reduce the size of the table file itself. This can lead to wasted disk space and performance issues.
To address this, PostgreSQL provides the Full VACUUM mode. When Full VACUUM is executed, PostgreSQL acquires an AccessExclusiveLock lock for the table and creates a new table file.
It then copies only live tuples from the old file to the new file, freezes tuples if necessary, removes the old file, rebuilds all associated table indexes, updates the Free Space Map (FSM) and Visibility Map (VM), and updates associated statistics and system catalogs.
During Full VACUUM, nobody can access the table, and at most twice the disk space of the table is used temporarily.
The extension pg_freespacemap can be used to check the average freespace ratio of a table and determine when Full VACUUM may be necessary.
Top comments (0)