Introduction to Heap Table File Layout
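PostgreSQL stores table data in heap table files, each divided into fixed-length pages (8 KB by default). Pages are numbered sequentially from 0, and each page contains heap tuples, line pointers, and header data. The header holds metadata about the page: pd_lsn (the LSN of the XLOG record for the last change to the page), pd_checksum (the checksum value), pd_lower (the end of the line pointer array), pd_upper (the start of the newest heap tuple), and pd_special (the end of the page in a heap table; indexes use it to mark a special area). A tuple identifier (TID) identifies a tuple within the table. It consists of two values: the block number of the page that holds the tuple and the offset number of the line pointer that points to it.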
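To make the header fields concrete, here is a minimal Python sketch that decodes the 24-byte page header from a raw page. It assumes the field order of PostgreSQL's PageHeaderData struct and a little-endian machine (the on-disk format uses the host's byte order). The names `parse_page_header`, `FIELDS`, and the synthetic empty page are made up for illustration and are not part of PostgreSQL.

```python
import struct

PAGE_SIZE = 8192  # default PostgreSQL block size

# Assumed layout: pd_lsn (two 32-bit halves), pd_checksum, pd_flags,
# pd_lower, pd_upper, pd_special, pd_pagesize_version, pd_prune_xid.
# "<" assumes a little-endian host.
PAGE_HEADER = struct.Struct("<IIHHHHHHI")
FIELDS = ("xlogid", "xrecoff", "pd_checksum", "pd_flags",
          "pd_lower", "pd_upper", "pd_special",
          "pd_pagesize_version", "pd_prune_xid")


def parse_page_header(page: bytes) -> dict:
    """Decode the header at the start of a page (hypothetical helper)."""
    header = dict(zip(FIELDS, PAGE_HEADER.unpack_from(page, 0)))
    # Free space is the gap between the line pointer array and the tuples.
    header["free_space"] = header["pd_upper"] - header["pd_lower"]
    return header


# Synthetic empty heap page: no line pointers and no tuples, so pd_lower
# sits right after the header and pd_upper/pd_special sit at the page end.
empty_page = bytearray(PAGE_SIZE)
PAGE_HEADER.pack_into(empty_page, 0,
                      0, 0,                # pd_lsn
                      0, 0,                # pd_checksum, pd_flags
                      PAGE_HEADER.size,    # pd_lower
                      PAGE_SIZE,           # pd_upper
                      PAGE_SIZE,           # pd_special
                      PAGE_SIZE | 4,       # page size combined with layout version
                      0)                   # pd_prune_xid

header = parse_page_header(bytes(empty_page))
print(header["pd_lower"], header["pd_upper"], header["free_space"])
```

On a real database, the pageinspect extension's page_header() function reports the same fields without any manual decoding.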
Writing Heap Tuples
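When a new tuple is added to a page, it is placed at the end of the page's free space, so heap tuples are stacked from the bottom of the page upward. At the same time, a new line pointer is appended to the line pointer array, which grows downward from just after the header. The new line pointer points to the new tuple, pd_upper is updated to point to the start of the new tuple, and pd_lower is updated to point to the end of the line pointer array. The gap between pd_lower and pd_upper is the page's remaining free space. This process repeats for each tuple added to the page.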
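The bookkeeping described above can be sketched with a toy page model in Python. This is a simplified illustration, not PostgreSQL's implementation: the `HeapPage` class is hypothetical, it assumes a 24-byte header and 4-byte line pointers, and it ignores tuple headers and alignment padding.

```python
PAGE_SIZE = 8192
HEADER_SIZE = 24        # assumed page header size
LINE_POINTER_SIZE = 4   # assumed size of one line pointer


class HeapPage:
    """Toy model of a heap page (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.data = bytearray(PAGE_SIZE)
        self.line_pointers: list[tuple[int, int]] = []  # (offset, length)
        self.pd_lower = HEADER_SIZE   # end of the line pointer array
        self.pd_upper = PAGE_SIZE     # start of the newest tuple

    def insert(self, tuple_bytes: bytes) -> int:
        """Store a tuple and return its 1-based line pointer number."""
        needed = len(tuple_bytes) + LINE_POINTER_SIZE
        if self.pd_upper - self.pd_lower < needed:
            raise ValueError("not enough free space on page")
        # The tuple goes at the end of the free space, so pd_upper moves down.
        self.pd_upper -= len(tuple_bytes)
        self.data[self.pd_upper:self.pd_upper + len(tuple_bytes)] = tuple_bytes
        # A new line pointer is appended, so pd_lower moves up.
        self.line_pointers.append((self.pd_upper, len(tuple_bytes)))
        self.pd_lower += LINE_POINTER_SIZE
        return len(self.line_pointers)


page = HeapPage()
first = page.insert(b"tuple-1")    # 7 bytes
second = page.insert(b"tuple-22")  # 8 bytes
print(first, second, page.pd_lower, page.pd_upper)
```

After two inserts, the line pointer array has grown by 8 bytes from the top of the page and the tuple area has grown by 15 bytes from the bottom, with the free space shrinking in between.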
Reading Heap Tuples
There are two typical access methods for reading heap tuples: sequential scan and B-tree index scan. In a sequential scan, every tuple in every page is read by walking all the line pointers in each page in order. In a B-tree index scan, the index file is searched for the key, and the matching index tuple supplies the TID of the desired heap tuple. The TID is then used to read that heap tuple directly, without scanning the rest of the pages.
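The difference between the two access methods can be sketched in Python with a toy table. Everything here is an assumption made for illustration: the table is a list of pages, each page is a list of rows indexed by line pointer position, and a plain dict stands in for the B-tree, since only the key-to-TID mapping matters for this example.

```python
from typing import Iterator, NamedTuple


class TID(NamedTuple):
    block: int   # page number, starting at 0
    offset: int  # line pointer number, starting at 1


Row = tuple[int, str]  # (key, value)

# Toy table: two pages with two tuples each.
table: list[list[Row]] = [
    [(1, "a"), (2, "b")],
    [(3, "c"), (4, "d")],
]


def sequential_scan(pages: list[list[Row]], key: int) -> Iterator[tuple[TID, Row]]:
    """Visit every line pointer of every page and yield the matching rows."""
    for block, page in enumerate(pages):
        for offset, row in enumerate(page, start=1):
            if row[0] == key:
                yield TID(block, offset), row


def fetch_by_tid(pages: list[list[Row]], tid: TID) -> Row:
    """Jump straight to one page and one line pointer."""
    return pages[tid.block][tid.offset - 1]


# Stand-in for a B-tree index: maps each key to the TID of its heap tuple.
index = {
    row[0]: TID(block, offset)
    for block, page in enumerate(table)
    for offset, row in enumerate(page, start=1)
}

scan_hits = list(sequential_scan(table, 3))
index_hit = fetch_by_tid(table, index[3])
print(scan_hits, index_hit)
```

Both paths return the same row, but the sequential scan touches every line pointer in the table while the index path touches a single page and a single line pointer.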
Helpful Sources:
https://github.com/apache/age
https://www.interdb.jp/pg/pgsql01.html