DEV Community

Cover image for PostgreSQL Database Cluster, Databases, and Tables: A Beginner's Guide
hammadsaleemm
hammadsaleemm

Posted on

PostgreSQL Database Cluster, Databases, and Tables: A Beginner's Guide

PostgreSQL is a powerful open-source relational database management system (RDBMS) used by many organizations and individuals around the world. If you are new to PostgreSQL, it's essential to understand the logical and physical structure of a PostgreSQL database cluster. In this article, we'll introduce you to the basic concepts of a PostgreSQL database cluster, databases, and tables.

1.1. Logical Structure of Database Cluster

A PostgreSQL database cluster is a collection of databases managed by a PostgreSQL server running on a single host. A database is a collection of database objects that include tables, indexes, views, sequences, and functions. In PostgreSQL, databases themselves are also database objects and are logically separated from each other. All other database objects belong to their respective databases and are internally managed by object identifiers (OIDs). The structure snapshot is as follows:

Image description

The relations between database objects and the respective OIDs are stored in appropriate system catalogs, depending on the type of objects. For example, OIDs of databases and heap tables are stored in pg_database and pg_class, respectively. You can find out the OIDs you want to know by issuing SQL queries. Like this:

Image description

1.2. Physical Structure of Database Cluster

A PostgreSQL database cluster consists of a base directory, which is created when you initialize a new database cluster using the initdb utility. The base directory contains several subdirectories and many files. A database is a subdirectory under the base directory, and each table and index is at least one file stored under the subdirectory of the database to which it belongs.

PostgreSQL also supports tablespaces, which are directories that contain some data outside of the base directory. Tablespaces are a feature unique to PostgreSQL and are different from tablespaces in other RDBMS. A snapshot is attached for a better understanding:

Image description

1.2.1. Layout of a Database Cluster

The layout of the PostgreSQL database cluster is described in the official documentation. Main files and subdirectories are listed under the base directory, including pg_xlog, pg_wal, pg_twophase, and pg_multixact.

1.2.2. Layout of Databases

A database is a subdirectory under the base directory, and the database directory names are identical to the respective OIDs. For example, when the OID of the database sampledb is 16384, its subdirectory name is 16384.

1.2.3. Layout of Files Associated with Tables and Indexes

Each table or index whose size is less than 1GB is a single file stored under the database directory it belongs to. Tables and indexes are internally managed by individual OIDs, while data files are managed by the variable relfilenode. The relfilenode values of tables and indexes basically match their respective OIDs.

1.2.4. The Internal Layout of a Heap Table File

Inside the data file (heap table and index, as well as the free space map and visibility map), the file is divided into fixed-length pages or blocks. The default page size in PostgreSQL is 8192 bytes (8 KB). Each page is numbered sequentially from 0, and such numbers are called block numbers. If a file is filled up, PostgreSQL adds a new empty page to the end of the file to increase its size.

The internal layout of a page within a table contains three kinds of data: heap tuples, line pointers, and header data. Heap tuples are record data that are stacked in order from the bottom of the page. Line pointers are 4 bytes long and hold a pointer to each heap tuple. They form a simple array, which serves as an index to the tuples. Each index is numbered sequentially from 1 and called an offset number. When a new tuple is added to the page, a new line pointer is also pushed onto the array to point to the new one.

A snapshot is as follows:

Image description

1.2.5. Reading and Writing Tuples:

o read heap tuples, there are two typical access methods: sequential scan and B-tree index scan. In sequential scan, all tuples in all pages are sequentially read by scanning all line pointers in each page. This method can be time-consuming for large tables.

On the other hand, B-tree index scan is a more efficient method of accessing tuples. An index file contains index tuples, each of which is composed of an index key and a TID pointing to the target heap tuple. If the index tuple with the key that you are looking for has been found, PostgreSQL reads the desired heap tuple using the obtained TID value. This method avoids unnecessary scanning in the pages and provides a faster way to access specific tuples within a table.

In conclusion, understanding the methods of writing and reading heap tuples is essential for working with data in PostgreSQL. By using efficient methods such as B-tree index scans, developers can optimize their applications and improve their system's performance. A snapshot is as follows:

Image description

1.2.6. Conclusion

Understanding the logical and physical structure of a PostgreSQL database cluster is essential for managing and optimizing database performance. We hope that this article has given you a basic understanding of PostgreSQL database clusters, databases, and tables. I will be sharing more as I go deeper into this vast and interesting field...

References:

https://www.interdb.jp/pg/pgsql01.html#_1.1.

Top comments (0)