DEV Community

Ahmed Mohamed


A quick recap for chapter 1 in "The Internals of PostgreSQL" book

To understand how PostgreSQL works internally, it helps to know how a database cluster is organized, both logically and physically, and how data is laid out inside its files. This article summarizes chapter 1 of the book: the logical and physical structure of a PostgreSQL database cluster and the internal layout of heap table files.

Logical Structure of Database Cluster

A PostgreSQL database cluster is a collection of databases, each containing database objects such as tables, indexes, sequences, and views. (Despite the name, a "database cluster" here is not a group of servers.) Internally, every database object is identified by an object identifier (OID), which is an unsigned 4-byte integer. The mapping between objects and their OIDs is stored in the system catalog that matches the object type: for example, the OIDs of databases are stored in pg_database, and those of heap tables in pg_class. A PostgreSQL server runs on a single host and manages a single database cluster.
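The following Python sketch illustrates the two ideas in this paragraph: an OID must fit in an unsigned 4-byte integer, and names are resolved to OIDs through the catalog for their object type. The dictionaries only stand in for pg_database and pg_class, and their names and OIDs are hypothetical; against a real server you would query those catalogs instead.

```python
import struct

# An OID is an unsigned 4-byte integer, so its range is 0 .. 2**32 - 1.
OID_MAX = 2**32 - 1


def as_oid(value: int) -> bytes:
    """Check that value fits in an OID and return it as 4 bytes.

    Little-endian is an arbitrary choice for this sketch; PostgreSQL
    itself stores values in the host's native byte order.
    """
    if not 0 <= value <= OID_MAX:
        raise ValueError(f"{value} does not fit in an unsigned 4-byte OID")
    return struct.pack("<I", value)


# Hypothetical catalog contents: names and OIDs are illustrative only.
PG_DATABASE = {"sampledb": 16384}   # stands in for pg_database
PG_CLASS = {"sampletbl": 18740}     # stands in for pg_class


def lookup_oid(catalog: dict, name: str) -> int:
    """Resolve an object name to its OID via the given (toy) catalog."""
    return catalog[name]


if __name__ == "__main__":
    db_oid = lookup_oid(PG_DATABASE, "sampledb")
    print(db_oid, len(as_oid(db_oid)))  # prints: 16384 4
```

Keeping one dictionary per catalog mirrors the fact that different object types are recorded in different system catalogs rather than in one global table.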

[Figure: Logical structure of a database cluster]

Physical Structure of Database Cluster

Physically, a PostgreSQL database cluster is a single directory, called the base directory, that contains subdirectories and many files. When initdb initializes a new database cluster, it creates the base directory under the specified path; this path is usually stored in the PGDATA environment variable. Each database is a subdirectory of the base subdirectory ($PGDATA/base), named after the database's OID, and each table and index is stored as (at least) one file under the subdirectory of the database to which it belongs. Several other subdirectories hold particular data and configuration files.
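As a rough illustration, this Python sketch builds the path of a relation's main data file from the pieces described above. It assumes the database lives in the default tablespace, where objects sit under $PGDATA/base/<database OID>/; objects in other tablespaces are reached through pg_tblspc instead, which the sketch ignores. The file name is the relation's relfilenode (normally equal to its OID, as the next section explains). The helper names, the PGDATA path, and the OIDs in the usage lines are hypothetical.

```python
from pathlib import PurePosixPath


def database_dir(pgdata: str, db_oid: int) -> PurePosixPath:
    """Directory of a database in the default tablespace: $PGDATA/base/<db OID>."""
    return PurePosixPath(pgdata) / "base" / str(db_oid)


def relation_file(pgdata: str, db_oid: int, relfilenode: int) -> PurePosixPath:
    """Main data file of a table or index, named after its relfilenode."""
    return database_dir(pgdata, db_oid) / str(relfilenode)


if __name__ == "__main__":
    # Hypothetical values: the PGDATA location and the OIDs are illustrative.
    print(relation_file("/usr/local/pgsql/data", 16384, 18740))
    # prints: /usr/local/pgsql/data/base/16384/18740
```

PurePosixPath is used because it only manipulates path strings and never touches the file system, which is all this sketch needs.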

[Figure: An example of a database cluster]

Layout of Files Associated with Tables and Indexes

Each table or index smaller than 1 GB is stored as a single file under the directory of the database it belongs to. Tables and indexes are identified internally by their OIDs, while their data files are named by relfilenode, a value stored in pg_class. The relfilenode of a table or index usually matches its OID, but the two can diverge, because commands such as TRUNCATE, REINDEX, and CLUSTER assign a new relfilenode.
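When a table or index grows beyond 1 GB, PostgreSQL does not keep enlarging the first file; it creates additional segment files named relfilenode.1, relfilenode.2, and so on. The Python sketch below models that naming rule from a relation's total size. It assumes the default 1 GB segment size (PostgreSQL can be built with a different one), and the helper name segment_files and the example relfilenode are hypothetical.

```python
# Assumed default segment size of 1 GB; a PostgreSQL build may use another value.
SEGMENT_SIZE = 1024 ** 3


def segment_files(relfilenode: int, size_bytes: int,
                  segment_size: int = SEGMENT_SIZE) -> list[str]:
    """Names of the files holding a relation of the given total size.

    The first segment is named after the relfilenode itself; further
    segments get the suffixes .1, .2, ... in order.
    """
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    # Integer ceiling division; a relation always has at least its first file.
    count = max(1, (size_bytes + segment_size - 1) // segment_size)
    return [str(relfilenode)] + [f"{relfilenode}.{i}" for i in range(1, count)]


if __name__ == "__main__":
    print(segment_files(18740, 100))              # prints: ['18740']
    print(segment_files(18740, 2 * 1024**3 + 1))  # prints: ['18740', '18740.1', '18740.2']
```

The sketch only reasons about file names; it does not model when PostgreSQL actually creates a new segment on disk.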

Internal Layout of a Heap Table File

A data file (whether it belongs to a heap table or an index, or is a free space map or visibility map) is divided into fixed-length pages, also called blocks, of 8192 bytes (8 KB) by default. Each page of a heap table contains three kinds of data: heap tuples, line pointers, and header data. Heap tuples are the record data itself, and they are stacked from the bottom of the page upward. Line pointers form a simple array in which each entry points to one heap tuple, so the array acts as an index to the tuples. Header data, defined by the PageHeaderData structure, is allocated at the beginning of the page and holds general information about the page.
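To make the page layout concrete, here is a deliberately simplified Python model of one heap page. It assumes the default 8 KB page, a 24-byte PageHeaderData, 4-byte line pointers, and 8-byte maximum alignment (typical of 64-bit builds). The attributes pd_lower and pd_upper mirror the real header fields: pd_lower marks the end of the line pointer array and pd_upper the start of the most recently added tuple, so the free space is the gap between them. Line pointers are numbered from 1 (the offset number). Tuple headers, line pointer flags, and the byte encoding of the header are left out, and the class name HeapPage is hypothetical.

```python
# Sizes from the PostgreSQL page format; MAXALIGN assumes a typical 64-bit build.
BLCKSZ = 8192            # default page size in bytes
PAGE_HEADER_SIZE = 24    # size of PageHeaderData
LINE_POINTER_SIZE = 4    # size of one line pointer
MAXALIGN = 8             # assumed maximum alignment


def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) & ~(MAXALIGN - 1)


class HeapPage:
    """Toy heap page: line pointers follow the header, tuples are stacked
    from the end of the page, and free space sits in between."""

    def __init__(self) -> None:
        self.pd_lower = PAGE_HEADER_SIZE  # end of the line pointer array
        self.pd_upper = BLCKSZ            # start of the newest tuple
        # Line pointers are kept as (offset, length) pairs in a Python list
        # rather than encoded into the page bytes.
        self.line_pointers = []
        self.data = bytearray(BLCKSZ)

    def free_space(self) -> int:
        return self.pd_upper - self.pd_lower

    def add_tuple(self, tuple_data: bytes) -> int:
        """Store a tuple and return its offset number (starting at 1)."""
        aligned = maxalign(len(tuple_data))
        if aligned + LINE_POINTER_SIZE > self.free_space():
            raise ValueError("page is full")
        self.pd_upper -= aligned  # tuples grow toward the start of the page
        self.data[self.pd_upper:self.pd_upper + len(tuple_data)] = tuple_data
        self.line_pointers.append((self.pd_upper, len(tuple_data)))
        self.pd_lower += LINE_POINTER_SIZE  # one more line pointer
        return len(self.line_pointers)

    def get_tuple(self, offset_number: int) -> bytes:
        """Follow a line pointer to its tuple."""
        offset, length = self.line_pointers[offset_number - 1]
        return bytes(self.data[offset:offset + length])


if __name__ == "__main__":
    page = HeapPage()
    first = page.add_tuple(b"tuple-1")
    second = page.add_tuple(b"tuple-2")
    print(first, second, page.pd_lower, page.pd_upper, page.free_space())
    # prints: 1 2 32 8176 8144
```

Because lookups go through the line pointer array, a tuple's offset number stays the same even if its bytes are moved within the page; only the pointer has to be updated.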

[Figure: Page layout of a heap table file]

Conclusion

Understanding the logical and physical structure of a PostgreSQL database cluster, along with the internal layout of heap table files, is essential for developers who work closely with PostgreSQL. With this knowledge, you'll be better equipped to reason about how PostgreSQL stores your data and to manage your databases.
