PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language. It is designed to handle a wide range of workloads, from small single-machine applications to large-scale, high-performance web applications with many concurrent users. It offers many features that make it a popular choice for developers, including ACID compliance, extensibility, and a large community of developers contributing to its development and support.
In this post we will dive into the structure and layout of Postgres databases.
Firstly, a Database Cluster is a collection of databases in a PostgreSQL server. A database subsequently, is a collection of Database Objects which are data structures used to either store or reference data. All Database Objects are managed by their unique Object Identifiers (OIDs).
The Database Cluster is physically a single directory known as the base directory. Furthermore, a database is a subdirectory under the base directory. The tables and indexes are stored as files under their respective database directories. Tables and Indexes under 1GB are stored as one file, while those with a larger size are split into multiple files.
Tablespaces in Postgres are additional data areas outside the base directory.
Data Files in Postgres are divided into pages of length 8192 bytes. They are numbered sequentially and the internal layout depends on the data file types
Reading and Writing Heap Tuples
In PostgreSQL, when a new tuple is inserted into a table, it is placed after the previous one on the same page. The page directory (pd_lower and pd_upper) is then updated to point to the new tuple. Other header data is also rewritten accordingly.
There are two common methods to access the heap tuples in PostgreSQL: sequential scan and B-tree index scan. Sequential scan reads all tuples in all pages by scanning all line pointers in each page. B-tree index scan uses an index file containing index tuples with keys and TIDs pointing to the target heap tuple. If the index tuple is found, PostgreSQL reads the desired heap tuple using the obtained TID value, avoiding unnecessary scanning in the pages.
Top comments (0)