Hannan2910

Posted on May 10, 2023

Vacuum Processing

#database #postgres #apache #opensource

Today im gonna talk about Vacuum Processing in PostgreSQL. This might be a bit long so bear with me.

Vacuum Processing

Vacuum processing is an essential maintenance process in PostgreSQL that helps ensure the reliable operation of the database by removing dead tuples and freezing transaction IDs. Dead tuples are rows in tables that are no longer needed and can be safely removed, while freezing transaction IDs prevents the possibility of data corruption. Vacuum processing has two modes: Concurrent VACUUM and Full VACUUM. Concurrent VACUUM removes dead tuples for each page of the table file, and other transactions can read the table while this process is running. In contrast, Full VACUUM removes dead tuples and defragments live tuples for the whole file, and other transactions cannot access tables while Full VACUUM is running.

Despite being crucial to PostgreSQL's performance, the development and improvement of vacuum processing have been slower compared to other functions. Until version 8.0, vacuum processing had to be executed manually, but automation was introduced in 2005 with the implementation of the autovacuum daemon. Since vacuum processing involves scanning whole tables, it can be a costly process. However, PostgreSQL version 8.4 introduced the Visibility Map (VM) to improve the efficiency of removing dead tuples by keeping track of pages containing only live tuples, which allows the vacuum process to skip unnecessary scans. In version 9.6, further improvements were made to the freeze process by enhancing the VM. The PostgreSQL documentation includes detailed sections on various aspects of vacuum processing, such as concurrent VACUUM, the VM, freeze processing, removing unnecessary clog files, the autovacuum daemon, and Full VACUUM, providing valuable information to optimize vacuum processing for PostgreSQL databases.

Outline of Concurrent Vacuum

Vacuum processing in PostgreSQL performs the following tasks for specified tables or all tables in the database:

Removing dead tuples: It removes dead tuples from tables and defragments live tuples on a page-by-page basis. It also removes index tuples that point to dead tuples.
Freezing old txids: It freezes old transaction IDs (txids) of tuples if necessary. This involves updating frozen txid-related system catalogs and ensuring data integrity.
Removing unnecessary parts of the clog: It removes unnecessary portions of the commit log (clog) if possible, freeing up space.

In more detail, the vacuum process consists of three blocks within a loop for each table and post-processing steps:

The first block scans the target table, identifying dead tuples and freezing old tuples. It builds a list of dead tuples stored in memory.
The second block removes dead tuples, defragments live tuples, and updates the free space map (FSM) and visibility map (VM) on a page-by-page basis.
The third block performs cleanup after removing index tuples and updates statistics and system catalogs related to vacuum processing for each table. If the last page of a table is empty, it is truncated from the table file.

Finally, post-processing involves updating statistics, system catalogs, and removing unnecessary parts of the commit log.

Overall, vacuum processing in PostgreSQL aims to optimize database performance by removing dead tuples, freezing transaction IDs, and maintaining data integrity while minimizing resource usage.

Visibility Map

The introduction of the Visibility Map (VM) in PostgreSQL 8.4 aimed to reduce the cost of vacuum processing. The VM is a table-specific map that tracks page visibility, indicating whether pages contain dead tuples or not. By using the VM, the vacuum process can skip pages that do not have dead tuples, optimizing performance and reducing resource usage.

The VM is stored as one or more 8 KB pages in a separate file. It complements the Free Space Map (FSM) and provides valuable information about page visibility. This allows the vacuum process to efficiently target and clean up pages that require attention, improving the overall efficiency of vacuum processing in PostgreSQL.

Freeze Processing

Freeze processing in PostgreSQL has two modes: lazy mode and eager mode. In lazy mode, the freeze process scans only the pages that contain dead tuples, using the visibility map (VM) of the target tables. This mode skips pages that don't have dead tuples, resulting in more efficient processing. On the other hand, eager mode scans all pages, regardless of whether they have dead tuples or not. It updates relevant system catalogs, removes unnecessary files and pages from the commit log, and provides a more thorough freeze process. The mode used depends on specific conditions, with lazy mode being the default but eager mode being triggered when certain criteria are met.

To improve the efficiency of the eager mode, PostgreSQL introduced enhancements in version 9.6. The VM was enhanced to include information about whether all tuples on a page are frozen. This improvement allows the freeze process in eager mode to skip pages that contain only frozen tuples, further optimizing the processing. By leveraging the VM's information, unnecessary scanning of pages with fully frozen tuples is avoided, resulting in faster and more efficient freeze processing.

In summary, freeze processing in PostgreSQL has two modes: lazy mode and eager mode. The default mode is lazy, where only pages with dead tuples are scanned. Eager mode, triggered under specific conditions, scans all pages and performs comprehensive freeze processing. The introduction of the visibility map and subsequent enhancements in eager mode in version 9.6 improved the efficiency by allowing the skipping of pages with fully frozen tuples. These modes help optimize the freeze process and maintain data integrity in the database.

Removing Unnecessary Clog Files

In PostgreSQL, the clog (commit log) is responsible for storing transaction states. When the minimum frozen transaction ID, represented by pg_database.datfrozenxid, is updated, PostgreSQL attempts to remove unnecessary clog files. By identifying the clog file containing the minimum frozen transaction ID, PostgreSQL can safely remove older clog files and their associated pages, freeing up disk space while maintaining the necessary transaction information for data integrity in the database.

Autovacuum Daemon

With the introduction of the autovacuum daemon in PostgreSQL, vacuum processing has been automated, making the operation of the database much more convenient. The autovacuum daemon triggers multiple autovacuum_worker processes at regular intervals. By default, it wakes up every minute (as defined by autovacuum_naptime) and initiates three workers (as defined by autovacuum_max_workers). These autovacuum workers execute vacuum processing concurrently for different tables, ensuring a gradual and efficient cleanup of dead tuples while minimizing the impact on ongoing database activities. This automation greatly simplifies the management of vacuuming and contributes to the overall performance and stability of the PostgreSQL database.

Full VACUUM

Concurrent VACUUM, although important for maintaining database performance, has limitations in reducing table size. Even if a significant number of dead tuples are removed, the physical size of the table remains unchanged. This is demonstrated in the given example (Figure 6.8) where a table with three pages and six tuples per page has dead tuples removed through a DELETE command followed by a VACUUM command. Despite the removal of dead tuples, the table size does not shrink. This inefficiency leads to wasted disk space and negatively impacts database performance, as accessing even a subset of tuples requires loading the entire pages from disk. To address this issue, additional operations like table reorganization or cluster are required to physically compact the table and reclaim unused space.

Thats all for today I know its long but is encapsulate the whole topic. I hope you find this helpful.

DEV Community

Vacuum Processing

Vacuum Processing

Outline of Concurrent Vacuum

Visibility Map

Freeze Processing

Removing Unnecessary Clog Files

Autovacuum Daemon

Full VACUUM

Top comments (0)

Read next

9 Open Source Libraries to Supercharge Your Next Project 🔋⚡️

🔥How to fetch HTML from server

Code Archeologist: AI-Powered Git Repository Analysis with PostgreSQL

Building a Full-Stack User Management System with Next.js 14, GraphQL, Prisma, and PostgreSQL