DEV Community: Muhammad Adil Shahid

Understanding the Internal Layout of the Heap Table in PostgreSQL

Muhammad Adil Shahid — Fri, 21 Jul 2023 12:16:41 +0000

A data file that holds a heap table or an index is divided into fixed-length pages (also called blocks), 8192 bytes (8 KB) by default. When all pages in the file are full, PostgreSQL appends a new empty page to the end of the file.

A page within a table contains three kinds of data:

Heap tuple(s): A heap tuple is the record data itself. Heap tuples are stacked in order from the bottom of the page.
Line pointer(s): A line pointer, also known as an item pointer, is 4 bytes long and points to one heap tuple. The line pointers form a simple array that serves as an index to the tuples.
Header data: Header data sits at the beginning of the page and contains general information about the page. It is defined by the structure PageHeaderData and is 24 bytes long.

Writing Heap Tuples

When heap tuples are written, two fields of the page header matter most:

pd_lower points to the end of the line pointer array.
pd_upper points to the beginning of the newest heap tuple.

The gap between pd_lower and pd_upper is the free space of the page.
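The way the two pointers move toward each other can be sketched in a few lines of Python. This is a toy model, not PostgreSQL's real on-disk format: the Page class and its methods are hypothetical names, and alignment padding is ignored. The sizes (8192-byte page, 24-byte header, 4-byte line pointer) come from the description above.

```python
PAGE_SIZE = 8192        # default page size in bytes
HEADER_SIZE = 24        # size of PageHeaderData
LINE_POINTER_SIZE = 4   # size of one line pointer


class Page:
    """Toy heap page: line pointers grow forward, tuples grow backward."""

    def __init__(self):
        self.pd_lower = HEADER_SIZE   # end of the line pointer array
        self.pd_upper = PAGE_SIZE     # start of the newest heap tuple
        self.line_pointers = []       # one (offset, length) pair per tuple
        self.data = bytearray(PAGE_SIZE)

    def free_space(self):
        return self.pd_upper - self.pd_lower

    def insert(self, tuple_bytes):
        """Store a tuple; return its 1-based offset number, or None if full."""
        needed = len(tuple_bytes) + LINE_POINTER_SIZE
        if needed > self.free_space():
            return None
        # The tuple is placed just below the previous one (bottom-up).
        self.pd_upper -= len(tuple_bytes)
        self.data[self.pd_upper:self.pd_upper + len(tuple_bytes)] = tuple_bytes
        # A new line pointer is appended right after the existing ones.
        self.line_pointers.append((self.pd_upper, len(tuple_bytes)))
        self.pd_lower += LINE_POINTER_SIZE
        return len(self.line_pointers)

    def read(self, offset_number):
        """Follow the line pointer to fetch the tuple bytes."""
        offset, length = self.line_pointers[offset_number - 1]
        return bytes(self.data[offset:offset + length])
```

Each insert lowers pd_upper by the tuple length and raises pd_lower by 4 bytes, so the page is full when the two meet.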

Reading Heap Tuples

PostgreSQL reads heap tuples in two typical ways:

In a sequential scan, all tuples in all pages are read in order by scanning every line pointer in each page.
In a B-tree index scan, PostgreSQL searches an index file that contains index tuples. Each index tuple is composed of an index key and a TID that points to the target heap tuple. Once the index tuple with the wanted key is found, PostgreSQL reads the heap tuple using the TID.
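The difference between the two access methods can be sketched as follows. The sample table, the name_index dictionary, and both function names are hypothetical; a real B-tree keeps its keys ordered, but a dictionary is enough to show how a TID (block number, offset number) leads straight to one heap tuple.

```python
# A table is a list of pages; each page is a list of tuples.
# A TID is (block number, 1-based offset number within that block).
table = [
    [("alice", 30), ("bob", 25)],     # block 0
    [("carol", 41), ("dave", 19)],    # block 1
]

# Hypothetical index on the name column: key -> TID.
name_index = {
    "alice": (0, 1),
    "bob": (0, 2),
    "carol": (1, 1),
    "dave": (1, 2),
}


def sequential_scan(table, predicate):
    """Visit every tuple in every page, in order."""
    for page in table:
        for heap_tuple in page:
            if predicate(heap_tuple):
                yield heap_tuple


def index_scan(table, index, key):
    """Find the TID in the index, then fetch that single heap tuple."""
    tid = index.get(key)
    if tid is None:
        return None
    block, offset = tid
    return table[block][offset - 1]
```

The sequential scan touches every page, while the index scan reads only the page named by the TID.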

References:

https://www.interdb.jp/pg/pgsql01.html

A brief overview on Heap Only Tuple and Index Only Scans

Muhammad Adil Shahid — Sat, 08 Jul 2023 20:12:00 +0000

Here we discuss two PostgreSQL features that are very useful for improving performance:

Heap Only Tuple (HOT)
Index Only Scan

Heap Only Tuple

In PostgreSQL, updating a row does not overwrite it. The updated row is added as a new version of the tuple, and the old version is marked as deleted. The problem is that, normally, every index on the table also needs a new index tuple pointing to the new version, even when no indexed column changed, and the old heap tuple and old index tuples must be cleaned up later.

HOT reduces this cost. When an update does not change any indexed column and the page that holds the old version has enough free space, PostgreSQL stores the new version on the same page and does not insert new index tuples. The existing index tuple still leads to the old version, which links to the new one.
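The decision can be written as a small predicate. This is a simplified sketch: a HOT update requires both that the new version fits on the page of the old version and that no indexed column is modified. The function name and parameters are hypothetical, and the real check in PostgreSQL considers more detail.

```python
def can_use_hot(changed_columns, indexed_columns, new_tuple_size, page_free_space):
    """Return True when this toy model would allow a HOT update."""
    # Changing an indexed column forces new index tuples, so HOT is ruled out.
    touches_index = bool(set(changed_columns) & set(indexed_columns))
    # The new version must fit on the same page as the old version.
    fits_on_page = new_tuple_size <= page_free_space
    return fits_on_page and not touches_index
```

For example, updating a non-indexed balance column on a page with room to spare qualifies, while updating an indexed id column does not.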

Index Only Scan

In PostgreSQL, an ordinary index scan finds matching entries in the index and then visits the table to fetch the column values. Those table visits cost extra disk I/O and can make the query slow.

Index-only scans address this. When every column the SELECT statement needs is included in the index key, PostgreSQL can return the values directly from the index without reading the table page, provided the visibility map shows that all tuples on that page are visible. An index-only scan can be used only when these criteria are met:

The query must retrieve only columns that are included in the index.
The index type must support index-only scans, as B-tree indexes do.
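An index entry does not say whether its tuple is visible, so PostgreSQL consults the visibility map and falls back to the table only for pages that are not marked all-visible. The sketch below illustrates that fallback; the function, the data layout, and the "visible" flag are hypothetical simplifications.

```python
def index_only_scan(index_entries, visibility_map, heap):
    """Return (keys, heap_fetches) for a toy index-only scan.

    index_entries:  list of (key, (block, offset)) pairs
    visibility_map: one boolean per block, True = all tuples visible
    heap:           list of pages, each a list of dicts with a "visible" flag
    """
    results = []
    heap_fetches = 0
    for key, (block, offset) in index_entries:
        if visibility_map[block]:
            # The page is all-visible, so the index alone is enough.
            results.append(key)
        else:
            # Visibility is unknown, so the heap tuple must be checked.
            heap_fetches += 1
            if heap[block][offset - 1]["visible"]:
                results.append(key)
    return results, heap_fetches
```

The fewer pages are marked all-visible, the more heap fetches are needed and the smaller the benefit of the index-only scan.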

References

https://www.interdb.jp/pg/pgsql07.html
https://www.postgresql.org/docs/current/indexes-index-only-scans.html

Vacuum Processing in PostgreSQL

Muhammad Adil Shahid — Thu, 29 Jun 2023 19:26:24 +0000

Vacuum processing is one of the core maintenance mechanisms in PostgreSQL. Its main job is to remove dead tuples from the database. It has two modes:

Concurrent VACUUM
Full VACUUM

Concurrent VACUUM:

Concurrent VACUUM, often simply called VACUUM, removes dead tuples from each page of the table while other transactions can still read the table. It consists of three blocks:

First Block:

This block performs freeze processing and removes index tuples that point to dead tuples.

Second Block:

This block removes the dead tuples and updates both the free space map (FSM) and the visibility map (VM).

Third Block:

This block performs cleanup after the index deletion and updates both the statistics and the system catalogs.

Post-processing:

When vacuum processing is complete, PostgreSQL updates the statistics and system catalogs related to vacuum processing and removes unnecessary parts of the clog (commit log) if possible.

Visibility Map:

Vacuum processing is costly, so the visibility map (VM) was introduced to reduce that cost. Each table has its own visibility map, which records for every page of the table file whether the page contains dead tuples. Vacuum processing can skip the pages that have none.
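The saving from the visibility map can be illustrated with a short sketch. The function name and data layout are hypothetical, and marking a page all-visible right after cleaning it is a simplification of the real rules.

```python
def vacuum(pages, visibility_map):
    """Remove dead tuples page by page, skipping pages the VM marks clean.

    pages:          list of pages, each a list of dicts with a "dead" flag
    visibility_map: one boolean per page, True = no dead tuples on the page
    Returns (pages_scanned, tuples_removed).
    """
    scanned = 0
    removed = 0
    for block, page in enumerate(pages):
        if visibility_map[block]:
            continue  # the VM says there is nothing to clean here
        scanned += 1
        live = [t for t in page if not t["dead"]]
        removed += len(page) - len(live)
        page[:] = live
        # Simplification: after cleanup the page holds only live tuples.
        visibility_map[block] = True
    return scanned, removed
```

On a table where most pages are already clean, almost every iteration takes the early `continue`, which is where the cost reduction comes from.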

Freeze Processing:

Freeze processing has two modes:

Lazy mode: Freeze processing scans only the pages that contain dead tuples, using the visibility map of the target table.
Eager mode: Freeze processing scans all pages to inspect every tuple in the table, updates the relevant system catalogs, and removes unnecessary files if possible.

Full VACUUM:

Concurrent VACUUM is essential but not sufficient, because it usually cannot reduce the size of the table file even after many dead tuples are removed. Full VACUUM not only removes the dead tuples but also reduces the table size. Two things should be considered:

Nobody can access the table while Full VACUUM is running.
Up to twice the disk space of the table is used temporarily.

References:

https://www.interdb.jp/pg/pgsql06.html

Concurrency Control in PostgreSQL

Muhammad Adil Shahid — Thu, 29 Jun 2023 18:16:15 +0000

Concurrency control is the mechanism that maintains atomicity and isolation when several transactions run concurrently in the database.

There are three types of concurrency control techniques:

Multi-version Concurrency Control (MVCC) keeps multiple versions of the data so that readers do not block writers and writers do not block readers. PostgreSQL and some other RDBMSs use a variation of MVCC called Snapshot Isolation (SI).
Strict Two-Phase Locking (S2PL) uses locks when accessing shared resources in the database. While one transaction holds a lock on a resource, other transactions that need a conflicting lock on it must wait.
Optimistic Concurrency Control (OCC) avoids taking locks during a transaction, on the assumption that conflicts are rare. As its name suggests, OCC lets transactions proceed optimistically; if a conflict occurs, it rolls back the transaction that caused the conflict.

Transaction ID:

When a transaction starts, the transaction manager assigns it a unique identifier called the transaction ID (txid).

PostgreSQL reserves three special txids:
0 means an invalid txid.
1 means the bootstrap txid, which is used only during the initialization of the database cluster.
2 means the frozen txid, which MVCC treats as older than every other txid.

Commit Log:

The commit log (clog) holds the status of each transaction. It is allocated in shared memory and is used throughout transaction processing.
There are four transaction statuses in the commit log:

IN_PROGRESS when the transaction is in progress.
COMMITTED when the transaction completed successfully.
ABORTED when the transaction was rolled back, for example after an error.
SUB_COMMITTED for sub-transactions.
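Four statuses fit in two bits, so one byte can hold the status of four transactions. The sketch below models the commit log as such a packed array indexed by txid. The class and method names are hypothetical, and the real clog is organized in pages in shared memory rather than one flat array.

```python
from enum import IntEnum


class XidStatus(IntEnum):
    IN_PROGRESS = 0
    COMMITTED = 1
    ABORTED = 2
    SUB_COMMITTED = 3


class CommitLog:
    """Toy commit log: two bits of status per txid, four txids per byte."""

    def __init__(self, capacity):
        # Every txid starts as IN_PROGRESS because all bits are zero.
        self.bits = bytearray((capacity + 3) // 4)

    def set_status(self, txid, status):
        byte, slot = divmod(txid, 4)
        shift = slot * 2
        cleared = self.bits[byte] & ~(0b11 << shift) & 0xFF
        self.bits[byte] = cleared | (int(status) << shift)

    def get_status(self, txid):
        byte, slot = divmod(txid, 4)
        return XidStatus((self.bits[byte] >> (slot * 2)) & 0b11)
```

Looking up a status is a plain array access by txid, which is why the commit log can be consulted cheaply during visibility checks.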

Transaction Snapshot:

A transaction snapshot is a dataset that records, at a certain point in time and for an individual transaction, whether each transaction is active. Here an active transaction is one that is in progress or has not yet started.
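PostgreSQL prints a snapshot in the textual form 'xmin:xmax:xip_list', which the original text above does not show. xmin is the earliest txid that is still active, xmax is the first txid not yet assigned, and xip_list holds the active txids between them. Assuming that format, a minimal sketch of the "is this transaction active?" test looks like this; both function names are hypothetical.

```python
def parse_snapshot(text):
    """Parse 'xmin:xmax:xip_list' into (xmin, xmax, set_of_active_txids)."""
    xmin, xmax, xip = text.split(":")
    xip_list = {int(x) for x in xip.split(",") if x}
    return int(xmin), int(xmax), xip_list


def is_active(txid, snapshot):
    """Return True if txid counts as active in the given snapshot."""
    xmin, xmax, xip_list = snapshot
    if txid < xmin:
        return False   # finished before the snapshot was taken
    if txid >= xmax:
        return True    # not yet started when the snapshot was taken
    return txid in xip_list
```

With the snapshot '100:104:100,102', txids 100 and 102 are active, 101 and 103 are not, and everything from 104 onward is treated as active.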

Serializable Snapshot Isolation:

Snapshot Isolation keeps a transaction from seeing data that another transaction has written but not yet committed, but it still permits certain serialization anomalies. Serializable Snapshot Isolation (SSI) provides the highest isolation level, SERIALIZABLE: it detects those anomalies and aborts one of the conflicting transactions, so the outcome matches some serial order of execution.

References:

https://www.interdb.jp/pg/pgsql05.html

Foreign Data Wrappers (FDWs) in PostgreSQL

Muhammad Adil Shahid — Wed, 28 Jun 2023 20:49:59 +0000

In this post we look at foreign data wrappers.
Foreign Data Wrappers (FDWs) are libraries that let PostgreSQL access data in external sources as if it were in local tables. They implement part of the SQL standard called SQL Management of External Data (SQL/MED), which manages foreign tables that live on a remote server.

A query on a foreign table is processed in the following steps:

The analyzer/analyser creates the query tree of the input SQL, using the definitions of the foreign tables stored in the system catalogs.
To connect to the remote server, the planner uses a specific library. For example, postgres_fdw uses libpq to connect to a remote PostgreSQL server.
The use_remote_estimate option controls whether EXPLAIN is used for cost estimation. If use_remote_estimate is on, the planner executes EXPLAIN commands to estimate the cost of each path.
The planner creates plain SQL statements from the scan paths of the foreign tables in the plan tree. This process is known as deparsing.
After deparsing, the executor sends the plain SQL statements to the remote server. How they are sent depends on the developer of each extension. For instance, mysql_fdw, the foreign data wrapper for MySQL, sends the SQL statement without using a transaction.

The FDW then receives the result from the remote server and converts it into data that PostgreSQL can read.
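Deparsing is the reverse of parsing: a structured description of a scan is turned back into SQL text for the remote server. The sketch below shows only the idea. deparse_select is a hypothetical helper, not part of postgres_fdw, whose real deparsing code is written in C and also handles quoting, data types, and joins.

```python
def deparse_select(table, columns, conditions=()):
    """Build a plain SELECT statement from a toy scan description."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if conditions:
        # Each condition is wrapped in parentheses before being combined.
        sql += " WHERE " + " AND ".join(f"({c})" for c in conditions)
    return sql
```

A scan of columns id and data on a table public.tbl_a with the condition id < 10 would become `SELECT id, data FROM public.tbl_a WHERE (id < 10)`; the table and column names here are made up for illustration.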

References:

https://www.interdb.jp/pg/pgsql04.html

Learning the Science behind Query Processing, its Cost Estimation and Join Operations in PostgreSQL

Muhammad Adil Shahid — Tue, 27 Jun 2023 20:20:24 +0000

Here we discuss query processing, query cost estimation, and join operations. Let's get started.

I. Query Processing

Query processing consists of five parts, explained below:
1. Parser:
This is the first part of query processing. It generates a parse tree from the SQL statement and checks the syntax of the statement while building the tree.
2. Analyzer/Analyser:
The analyzer/analyser takes the parse tree from the parser and performs semantic analysis on it. After this analysis, it generates another tree known as the query tree. The root of the query tree contains metadata about the query, such as its type (SELECT, INSERT, etc.).
3. Rewriter:
Next comes the rewriter. It transforms the query tree according to the rules of the rule system, which are listed in pg_rules, whenever a rule applies. Views are implemented this way: a query on a view is rewritten into a query on the underlying tables.
4. Planner:
The planner receives the query tree from the rewriter and generates the plan tree, which describes how the query will be executed.
5. Executor:
The executor takes the plan tree produced by the planner and executes it. The plan tree is composed of plan nodes, and each node contains the information the executor needs for processing.

II. Cost Estimation in Single-Table Query

PostgreSQL's query optimization is based on cost. Costs are estimated by the functions defined in costsize.c. There are three kinds of cost:

The startup cost is the cost expended before the first tuple is fetched.
The run cost is the cost of fetching all tuples.
The total cost is the sum of the startup and run costs.
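As a worked example, the cost of a sequential scan can be computed by hand. The sketch assumes a query with a single condition in its WHERE clause and the default values of the cost parameters seq_page_cost (1.0), cpu_tuple_cost (0.01), and cpu_operator_cost (0.0025). The function name and the table size in the usage note are illustrative assumptions.

```python
SEQ_PAGE_COST = 1.0         # cost of reading one page sequentially
CPU_TUPLE_COST = 0.01       # cost of processing one tuple
CPU_OPERATOR_COST = 0.0025  # cost of evaluating one operator


def seq_scan_cost(n_tuples, n_pages):
    """Return (startup_cost, run_cost, total_cost) of a sequential scan."""
    startup_cost = 0.0  # the first tuple can be returned immediately
    cpu_run_cost = (CPU_TUPLE_COST + CPU_OPERATOR_COST) * n_tuples
    disk_run_cost = SEQ_PAGE_COST * n_pages
    run_cost = cpu_run_cost + disk_run_cost
    return startup_cost, run_cost, startup_cost + run_cost
```

For a table with 10000 tuples stored in 45 pages, the CPU part is 0.0125 × 10000 = 125 and the disk part is 1.0 × 45 = 45, so the run cost and the total cost are both about 170.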

Creating the Plan Tree of a Single-Table Query

The planner creates a plan for executing the query in three steps:

Preprocessing: The planner simplifies target lists, normalizes Boolean expressions, and flattens AND/OR expressions.
Getting the cheapest access path: The planner estimates the costs of all possible access paths and selects the cheapest one. For this purpose, it creates a RelOptInfo structure to store the access paths and their costs, and picks the cheapest path from it.
Plan tree: In the last step, the planner generates the plan tree from the cheapest access path. The root of this tree is a PlannedStmt structure. It contains 19 fields; four representative ones are commandType, which stores the type of operation; rtable, which stores the range table entries; relationOids, which stores the OIDs of the tables related to the query; and planTree, which stores the plan tree composed of plan nodes.

Working of the Executor:
The executor processes the query by working through the plan nodes from the end of the plan tree up to the root. The EXPLAIN command helps in understanding how the executor works, because it shows the plan tree almost as it is.

III. Join Operations

PostgreSQL supports three join operations, discussed below:

Nested loop join is the most fundamental join. It iterates through each row of one table and matches it against the rows of the other table. It is effective when one table is much smaller than the other.
Merge join traverses both tables in the sorted order of their join keys and matches the rows that have equal join keys.
Hash join, as its name suggests, uses a hash function. A hash table is built from one table and then probed to find the matching rows of the other table.
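The three strategies produce the same result and differ only in how they find matching rows. The following sketch implements each one for an equality join over lists of dictionaries. The function names are hypothetical, and the real executor works on tuples and plan nodes rather than Python lists.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, key):
    """Compare every outer row with every inner row."""
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]


def merge_join(left, right, key):
    """Sort both inputs by the join key, then advance through them together."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                result.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return result


def hash_join(build, probe, key):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[key]].append(b)
    return [(b, p) for p in probe for b in table.get(p[key], [])]
```

The nested loop performs one comparison per pair of rows, while the merge join and hash join each pass over their inputs once after sorting or building the hash table, which is why they tend to win on larger tables.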

References:

https://www.interdb.jp/pg/pgsql03.html
https://www.interdb.jp/pg/pgsql0302.html
https://www.interdb.jp/pg/pgsql0303.html

Understanding the process and memory architecture in PostgreSQL

Muhammad Adil Shahid — Mon, 26 Jun 2023 19:14:25 +0000

PostgreSQL is a client/server type relational database management system.

It has a multi-process architecture.
It runs on a single host.

Now we will discuss the types of processes it uses.

Processes in PostgreSQL server

1. Postgres Server Process

The Postgres server process is the parent of all other processes. In earlier versions it was called postmaster.

All other types of processes are started by this process.
By default, the Postgres server listens on port 5432.
More than one PostgreSQL server can run on the same host, but each must use a different port number.

2. Backend Process
A backend process (also called postgres) is started by the Postgres server process. It handles all the queries issued by one connected client.

Because a backend process can operate on only one database, you have to specify the database when connecting to the server.
It communicates with its client over a single TCP connection.
PostgreSQL allows multiple clients to connect simultaneously. The max_connections configuration parameter sets the maximum number of clients that can be connected at a time; its default value is 100.

3. Background Process
Each server-wide feature has its own background process. For example:

The checkpointer performs checkpoints, which write dirty buffers from memory to disk.
The autovacuum launcher periodically requests vacuum processing, which removes dead tuples to free up space.

4. Replication Associated Process
The processes that manage database replication are known as replication associated processes.

They perform streaming replication, in which changes made on the primary database are also applied to its replicas.

5. Background Worker Process
Custom background processes implemented by users are known as background worker processes.

In addition to the conventional database operations, these processes can perform other tasks. In short, users can add customized tasks alongside the regular operations.

Memory architecture in PostgreSQL server

Next comes the memory architecture of PostgreSQL, which has two types of memory areas:

1. Local Memory Area

This area is allocated by backend processes for their own use.
It is further divided into sub-areas whose sizes are either fixed or variable.

2. Shared Memory Area

This area is allocated by PostgreSQL server when it starts up.
It is divided into fixed sub-areas.

References:

https://www.interdb.jp/pg/pgsql02.html

Exploring the Structure and Layout of a Database Cluster

Muhammad Adil Shahid — Sun, 25 Jun 2023 18:23:03 +0000

Before diving into the structure of a database cluster, we first need to understand what a database cluster is.

A database cluster is a collection of databases managed by a single PostgreSQL server that runs on a single host. Now let's discuss its structure.

Logical Structure of Database Clusters

Just as a database cluster is a collection of databases, a database is a collection of database objects. These objects are used to store and reference data. Examples of database objects are tables, indexes, views, sequences, and functions.
PostgreSQL manages these database objects internally by object identifiers (OIDs).
In PostgreSQL, databases themselves are also database objects, and they are logically separated from each other.

Physical Structure of Database Cluster

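A database cluster is basically one directory, called the base directory. Each database is a subdirectory under the base subdirectory of that directory.

Each database subdirectory contains at least one file for each of its tables and indexes. The base directory also holds other subdirectories with particular data, as well as configuration files.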

Layout of database cluster, databases and tables

The base directory (the database cluster) contains some files and subdirectories.
The databases are subdirectories under the base subdirectory, and each is named after the OID of its database.
Tables and indexes are managed internally by individual OIDs.
Their data files are managed by the relfilenode variable, which gives the name of the data file.
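Put together, the location of a table's data file follows from the database OID and the table's relfilenode. The helper below is a hypothetical sketch of that rule for tables stored in the default location; the directory and numbers in the usage note are made-up examples.

```python
from pathlib import PurePosixPath


def data_file_path(pgdata, database_oid, relfilenode):
    """Return the path of a table or index data file under the base directory."""
    # Layout: <base directory>/base/<database OID>/<relfilenode>
    return PurePosixPath(pgdata) / "base" / str(database_oid) / str(relfilenode)
```

For a cluster in /usr/local/pgsql/data, a database with OID 16384, and a table whose relfilenode is 18740, the data file would be /usr/local/pgsql/data/base/16384/18740.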

Tablespaces

A tablespace is an additional data area outside the base directory.
When you issue a CREATE TABLESPACE command, the tablespace is created under the directory you specify.