Introduction
PostgreSQL, an open-source object-relational database system, has gained immense popularity for its robustness, extensibility, and flexibility. As developers and database enthusiasts, we often interact with PostgreSQL at a high level, utilising its powerful features to build and manage our applications. But have you ever wondered what lies beneath the surface? Understanding its internal mechanisms is crucial for developers and database administrators alike.
Luckily for us, renowned author Hironobu SUZUKI has written a comprehensive guide in the form of a book called The Internals of PostgreSQL. I will be covering each chapter of the book in separate blog posts as part of a step-by-step blog series centred around understanding PostgreSQL.
In this first blog, I will be covering Chapter 1 'Database Cluster, Databases and Tables' which serves as a gateway into understanding the inner workings of PostgreSQL and all its subsystems from the ground up.
Database Clusters
Prior to reading this chapter, I had encountered the term 'database cluster' several times while using MongoDB and PostgreSQL but had never really tried to understand what it meant. Fortunately for me, this first chapter starts off by explaining what a database cluster is both logically and physically.
Logical Structure
In PostgreSQL, a database cluster is a collection of databases managed by one PostgreSQL server running on a single host. This may seem obvious but there is a common misconception that a database cluster is a group of database servers which this book explains is entirely untrue.
A database is a collection of database objects used to store or reference data such as tables, indexes, views, functions, sequences, and so on. In PostgreSQL, databases are also database objects which are internally managed through OIDs (Object Identifiers). OIDs are unsigned 4-byte integers that are stored in system catalogs according to the type of objects.
Physical Structure
The physical structure of a database cluster in PostgreSQL involves directories. The cluster is the base directory and its path is usually set to PGDATA. Under the base directory are subdirectories for tables. The database directory names are identical to their OIDs. At least one file (tables or indexes)* are stored under the subdirectory along with configuration files. These data files are managed by a variable called relfilenode whose values usually match respective OIDs.
PostgreSQL also supports tablespaces which are directories that store data outside of the base directory. It is created when you issue the CREATE TABLESPACE statement.
*of size less than 1GB. If greater than 1GB, a new file is created and used.
Internal Layout of a Heap Table File
The data file is covered above is divided into pages or blocks of a fixed length*. These pages/blocks are numbered sequentially from 0. These sequential numbers are called block numbers. If the file has reached its capacity, PostgreSQL adds an empty page to the end of the file to increase the size.
A page/block contains 3 kinds of data:
- heap tuple(s): it is a record data itself. It is covered in more detail in Section 5.2 and Chapter 9.
- line pointer(s): holds a pointer to each heap tuple and is 4-bytes long.
- header data: contains general information about the page and is 24-bytes long. Its major variables are pd_lsn, pd_checksum, pd_lower, pd_upper, and pd_special.
*default size is 8 KB
The Methods of Writing and Reading Tuples
Writing Heap Tuples
A table consisting of just one page which contains one heap tuple would have the pd_lower values pointing to the first line pointer. Both the line pointer and pd_upper would point to the first heap tuple. If you add a second heap tuple, it will of course come after the first. The second line pointer will be pushed onto the first and point to the second tuple. pd_lower will then point to the second line pointer and pd_upper will point to this second heap tuple.
Reading Heap Tuples
There are two typical access methods:
- Sequential scan: all tuples in all pages are read sequentially by scanning all line pointers in each page. This means that PostgreSQL goes through each page and examines all line pointers to retrieve the tuples. This method is useful when you want to scan the entire table.
- B-tree index scan: an index file contains index tuples consisting of an index key and a TID (tuple identifier) that points to the target heap tuple. If the index tuple with the desired key is found, PostgreSQL retrieves the corresponding heap tuple using the obtained TID value. By using the index, PostgreSQL can directly locate the desired tuple without unnecessary scanning of all pages.
Top comments (0)