DEV Community

m-hashir
m-hashir

Posted on

The Internals of PostgreSQL - Chapter 1

This chapter describes the logical and physical structure of a database cluster as well as the internal layouts of the heap table file and the methods of writing and reading data to a table in PostgreSQL.
PostgreSQL runs a single database cluster upon which a collection of databases are managed. It does not actually run a group of database servers as PostgreSQL runs on a single host. Logically, a database is a collection of database objects which are data structures that store or reference data. There are many examples of it such as a table or a view. In PostgreSQL, even the databases are database objects themselves. These objects are managed by their object identifiers (or OIDs) which are unsigned 4- byte integers stored in system catalogs, which stores their relation.
In regard to physical structure, the cluster is basically one directory knowns as base directory, the path of which is usually the environmental variable PGDATA. A database is usually a subdirectory under the base subdirectory and the database subdirectory stores files of tables and indexes. The base directory also contains other subdirectories for data and configuration files. Tablespaces in PostgreSQL are different than in conventional RDBMS, such that it is one directory that contains data outside of base directory.

The layout of the complete cluster is in the form of many subdirectories, in which the base subdirectory contains the database subdirectories. The base directory of the cluster also contains subdirectories for configuration and other metadata. The databases, which are subdirectories of the base subdirectory, have names similar to their OIDs. In regard to the files for tables and indexes, they are managed by their OIDs while these data files are managed by relfilenode, both of which can be idenitical. However, using some commands such as TRUNCATE, CLUSTER, REINDEX, the relfilenode values can be changed. The database subdirectories also contain files for free space map and visibility map, which can be referred to as the 1st and 2nd fork of the data files. Tablespaces are files present outside of the base directory whose directory is addressed by a symbolic link and link name is same as that of the tablespace.
Data files are divided into pages of a fixed length of 8 KB, which are numbered starting from 0 and new ones are appended when the current one is full. They include three kinds of data: heap tuple which is the record itself, line pointers which hold a pointer to the heap tuple and header data which contains general information about the page.
When data is being written, for a table with one page and one heap tuple, the pd_lower points to the first line pointer, and both the line pointer and pd_upper point to the first tuple. When a second tuple is inserted, the second line pointer is added after the first one, pd_lower is updated to the second line pointer, and pd_upper to the second tuple, with other header data adjusted accordingly. Data is read in two manners, the first being sequential scan where all pages are read by all line pointers in page and B-tree index scan where an index file contains index tuples consisting of an index key and a TID pointing to the target heap tuple. When the index tuple with the desired key is found, the corresponding TID is used to directly retrieve the desired heap tuple, eliminating the need for additional page scanning.

Top comments (0)