<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Haseeb Ashraf</title>
    <description>The latest articles on DEV Community by Haseeb Ashraf (@thehaseebashraf).</description>
    <link>https://dev.to/thehaseebashraf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1103421%2F06d9382c-d38e-4471-9fb3-64e845404fb7.jpg</url>
      <title>DEV Community: Haseeb Ashraf</title>
      <link>https://dev.to/thehaseebashraf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thehaseebashraf"/>
    <language>en</language>
    <item>
      <title>A Brief Overview of Base Backup &amp; Point-in-Time Recovery in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Sat, 15 Jul 2023 07:37:14 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/a-brief-overview-of-base-backup-point-in-time-recovery-in-postgresql-57oj</link>
      <guid>https://dev.to/thehaseebashraf/a-brief-overview-of-base-backup-point-in-time-recovery-in-postgresql-57oj</guid>
      <description>&lt;h2&gt;
  Backups in online databases fall into two main categories:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;physical backups&lt;/li&gt;
&lt;li&gt;logical backups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although both have their own advantages and disadvantages, a logical backup has one major disadvantage: it is slow. Backing up a very large database takes an unusually long time, and restoring that database from the backup takes even longer.&lt;/p&gt;

&lt;p&gt;In PostgreSQL, however, full physical online backups have been available since &lt;em&gt;version 8.0&lt;/em&gt;; a snapshot of the whole running database cluster is known as a &lt;strong&gt;base backup&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Point-In-Time Recovery (PITR)&lt;/strong&gt;, also available since &lt;em&gt;version 8.0&lt;/em&gt;, restores the database cluster to any point in time using a &lt;strong&gt;base backup&lt;/strong&gt; and the &lt;strong&gt;archive logs&lt;/strong&gt; produced by the continuous archiving feature. Say, for example, that you made a critical mistake such as deleting all the tables: this feature allows you to restore the database to the point just before the mistake was made. &lt;/p&gt;

&lt;p&gt;In this short overview, we will look at the following topics: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- What base backup is&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, the standard procedure for taking a base backup is as follows: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;(1) Issue the pg_backup_start command (Version 14 or earlier, pg_start_backup)&lt;/li&gt;
&lt;li&gt;(2) Take a snapshot of the database cluster with the archiving command you want to use&lt;/li&gt;
&lt;li&gt;(3) Issue the pg_backup_stop command (Version 14 or earlier, pg_stop_backup)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a relatively simple and easy-to-use procedure that is convenient for system administrators, as it requires no special tools; common tools such as the copy command or any similar archiving tool are enough to create the backup. Additionally, no table locks are required, and all database users can keep querying the database unaffected while the backup runs.&lt;/p&gt;
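
&lt;p&gt;As a sketch, the three steps above could be performed from &lt;code&gt;psql&lt;/code&gt; and a shell roughly as follows (the backup label and destination path are hypothetical):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- step 1 (version 15+; use pg_start_backup on 14 or earlier)&lt;br&gt;
SELECT pg_backup_start('nightly-backup');&lt;br&gt;
-- step 2, from a shell: snapshot the cluster with an ordinary copy tool&lt;br&gt;
-- $ tar -czf /backups/base.tar.gz -C $PGDATA .&lt;br&gt;
-- step 3&lt;br&gt;
SELECT pg_backup_stop();&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;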

&lt;p&gt;&lt;strong&gt;- How PITR works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now let's take a look at how PITR works. Assume that you made a crucial mistake at &lt;code&gt;5:15 PKT on 21st January 2023&lt;/code&gt;. You would remove the current database cluster and restore a new one from the base backup made earlier. Next, you would set the &lt;code&gt;restore_command&lt;/code&gt; parameter and set the &lt;code&gt;recovery_target_time&lt;/code&gt; parameter to the point just before the mistake was made. When PostgreSQL boots up, it enters PITR mode and starts recovery if a recovery.signal file (recovery.conf in version 11 or earlier) is present in the restored cluster.&lt;/p&gt;
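
&lt;p&gt;As an illustration, the two parameters mentioned above could be set in &lt;code&gt;postgresql.conf&lt;/code&gt; roughly as follows (the archive path and target time are hypothetical):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;restore_command = 'cp /mnt/server/archivedir/%f %p'&lt;br&gt;
recovery_target_time = '2023-01-21 05:14:00+05'&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
An empty &lt;code&gt;recovery.signal&lt;/code&gt; file in the data directory then tells the server to start in recovery mode (version 11 and earlier used &lt;code&gt;recovery.conf&lt;/code&gt; instead).&lt;/p&gt;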

&lt;p&gt;&lt;strong&gt;- What timelineId is&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In PostgreSQL, a timeline is used to distinguish the original database cluster from the recovered one, and it is one of the most fundamental concepts of PITR. Let's take a look at the &lt;strong&gt;timelineId&lt;/strong&gt; in this section.&lt;/p&gt;

&lt;p&gt;Each timeline is given a &lt;strong&gt;timelineId&lt;/strong&gt;, a 4-byte unsigned integer that starts at 1.&lt;br&gt;
Each database cluster is assigned an individual &lt;strong&gt;timelineId&lt;/strong&gt;: the timelineId of the original database cluster, created by the &lt;code&gt;initdb&lt;/code&gt; utility, is 1 by default, and whenever a database cluster is recovered, the &lt;strong&gt;timelineId&lt;/strong&gt; is incremented by 1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- What timeline history file is&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a PITR process completes, a timeline history file with a name like &lt;code&gt;00000003.history&lt;/code&gt; is created under the archival directory. This file records which timeline the recovered cluster branched off from and at what point. &lt;/p&gt;

&lt;p&gt;The timeline history file consists of at least one line, and each line is composed of the following three fields: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timelineId - the timelineId of the archive logs used in the recovery.&lt;/li&gt;
&lt;li&gt;LSN - the log sequence number of the location where the WAL segment switch took place.&lt;/li&gt;
&lt;li&gt;Reason - a human-readable explanation of why the timeline was changed.&lt;/li&gt;
&lt;/ul&gt;
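
&lt;p&gt;As an illustration, a line in a hypothetical &lt;code&gt;00000002.history&lt;/code&gt; file might look like this (the LSN and timestamp are invented for the example):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;1  0/A000198  before 2023-01-21 05:14:00+05&lt;/code&gt;&lt;br&gt;
This says the recovery used archive logs of timelineId 1 up to LSN 0/A000198, stopping before the given target time.&lt;/p&gt;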

</description>
      <category>apache</category>
      <category>postgressql</category>
      <category>apacheage</category>
      <category>bitnine</category>
    </item>
    <item>
      <title>An overview of Write Ahead Logging - WAL in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Fri, 14 Jul 2023 23:00:33 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/an-overview-of-write-ahead-logging-wal-in-postgresql-25mi</link>
      <guid>https://dev.to/thehaseebashraf/an-overview-of-write-ahead-logging-wal-in-postgresql-25mi</guid>
      <description>&lt;p&gt;Even when a system failure occurs, a database management system must not lose any data, as that data is extremely crucial; hence transaction logs are an essential part of any database management system.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;transaction log&lt;/strong&gt; is a historical log of all the past transactions that have occurred, and it ensures that no data is lost during events such as a power outage or a server crash.&lt;/p&gt;

&lt;p&gt;In computer science, &lt;strong&gt;WAL&lt;/strong&gt; is an acronym for &lt;strong&gt;Write Ahead Logging&lt;/strong&gt;, a protocol for writing both the changes and the actions made in the database to a log before they are applied; in &lt;strong&gt;PostgreSQL&lt;/strong&gt;, the term refers to the &lt;strong&gt;Write Ahead Log&lt;/strong&gt; itself. In this short overview, we will take a look at the following subsections:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- The logical and physical structure of WAL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The logical and physical structure of &lt;strong&gt;WAL&lt;/strong&gt; follows the conventional insertion and database recovery techniques used in other databases as well as in PostgreSQL. &lt;br&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; writes all data modifications into persistent storage to prepare for failures, and this historical data is known as &lt;strong&gt;XLOG&lt;/strong&gt; records or WAL data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- The internal layout of WAL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Logically speaking, PostgreSQL writes &lt;em&gt;XLOG&lt;/em&gt; records into the transaction log, a virtual file with an 8-byte address space. Physically, a WAL segment is a 16 MB file by default, internally divided into pages of 8192 bytes (8 KB).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Writing of WAL data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now, moving on to the writing of XLOG records, consider issuing the following statement: &lt;br&gt;
&lt;code&gt;testdb=# INSERT INTO tbl VALUES ('A');&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
Invoking the above statement calls the internal function &lt;code&gt;exec_simple_query()&lt;/code&gt;, which writes the XLOG records into the WAL buffer and flushes them from there to the WAL segment.&lt;br&gt;
&lt;strong&gt;WAL writer process&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Writing operations are usually done using &lt;strong&gt;DML (Data Manipulation Language)&lt;/strong&gt;, but even non-DML operations can perform writes in PostgreSQL. The WAL writer is a background process that periodically checks the WAL buffer and writes any unwritten XLOG records into the WAL segments.&lt;/p&gt;
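
&lt;p&gt;For instance, you can watch the WAL position advance as records are written; the following functions exist since version 10 (the 9.x series used the older &lt;code&gt;pg_current_xlog_*&lt;/code&gt; names):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;testdb=# SELECT pg_current_wal_insert_lsn(); -- insert position in the WAL buffer&lt;br&gt;
testdb=# SELECT pg_current_wal_lsn();        -- position written out to the segment files&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;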

</description>
      <category>apacheage</category>
      <category>postgres</category>
      <category>bitnine</category>
      <category>apache</category>
    </item>
    <item>
      <title>A Brief Overview of Buffer Manager in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Wed, 12 Jul 2023 10:22:37 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/a-brief-overview-of-buffer-manager-in-postgresql-5ge6</link>
      <guid>https://dev.to/thehaseebashraf/a-brief-overview-of-buffer-manager-in-postgresql-5ge6</guid>
      <description>&lt;p&gt;The job of a &lt;strong&gt;buffer manager&lt;/strong&gt; is to manage and oversee the data transfers that take place between persistent storage and shared memory, so this part of the DBMS can have a significant impact on the performance of the RDBMS. PostgreSQL's buffer manager works very efficiently.&lt;/p&gt;

&lt;p&gt;In this brief overview, we will take a look at the following sections and get an understanding of the topics:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Buffer manager structure&lt;/strong&gt;&lt;br&gt;
  The buffer manager in PostgreSQL consists of a &lt;em&gt;buffer table&lt;/em&gt;, &lt;em&gt;buffer descriptors&lt;/em&gt;, and a &lt;em&gt;buffer pool&lt;/em&gt;. The buffer pool stores data file pages, e.g. pages of tables and indexes, as well as &lt;em&gt;freespace maps&lt;/em&gt; and &lt;em&gt;visibility maps&lt;/em&gt;. The buffer pool is an array, which basically means that each slot stores a single page of a data file. The indices of the buffer pool array are known as &lt;strong&gt;buffer_ids&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Buffer manager locks&lt;/strong&gt;&lt;br&gt;
  The buffer manager uses many different locks for many different purposes. Here we will look at the locks that are necessary for understanding the upcoming sections.&lt;/p&gt;

&lt;p&gt;Firstly, there are &lt;strong&gt;Buffer Table Locks&lt;/strong&gt;, which protect the integrity of the data contained in the entire buffer table. They can be used in both exclusive and shared modes and are considered relatively lightweight locks. Whenever an entry is inserted or deleted, an exclusive lock is held by the backend process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- How the buffer manager works&lt;/strong&gt;&lt;br&gt;
  The buffer manager comes into play whenever a backend process wants to access a desired page. To perform this task, it uses the &lt;em&gt;ReadBufferExtended&lt;/em&gt; function.&lt;br&gt;
How the &lt;em&gt;ReadBufferExtended&lt;/em&gt; function behaves depends on three different logical cases. Additionally, we will look at the PostgreSQL &lt;em&gt;clock sweep page replacement algorithm&lt;/em&gt; in the final section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Ring buffer&lt;/strong&gt;&lt;br&gt;
  Whenever PostgreSQL is reading or writing a very large table, a ring buffer is used instead of a buffer pool. A small and temporary buffer area is assigned as a ring buffer in the main memory. Bulk reading, bulk writing and Vacuum-processing are the three main operations where ring buffer is used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Flushing of dirty pages&lt;/strong&gt;&lt;br&gt;
  The background writer and the checkpointer handle the flushing of dirty pages to storage. Both processes perform the same function; however, each has different behaviors and roles.&lt;/p&gt;

&lt;p&gt;The role of the background writer is to reduce the impact of the intensive writing caused by checkpointing. The checkpointer, on the other hand, writes a checkpoint record to the WAL segment and flushes the dirty pages whenever checkpointing starts.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>An Overview of Heap Only Tuple and Index-Only Scans</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Tue, 11 Jul 2023 09:25:23 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/an-overview-of-heap-only-tuple-and-index-only-scans-3b9g</link>
      <guid>https://dev.to/thehaseebashraf/an-overview-of-heap-only-tuple-and-index-only-scans-3b9g</guid>
      <description>&lt;p&gt;In this short article, we will take a look at two features related to index scan. Namely, &lt;strong&gt;Heap Only Tuple&lt;/strong&gt; and &lt;strong&gt;Index-Only Scans&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Heap Only Tuple&lt;/strong&gt; was initially implemented in &lt;em&gt;Version 8.3&lt;/em&gt; of PostgreSQL, and its main purpose is to use the table and index pages effectively when an updated row is stored in the same table page as the old row. Heap Only Tuple processing, also known as HOT, helps reduce the need for VACUUM processing. &lt;/p&gt;

&lt;p&gt;When updating a row without &lt;strong&gt;HOT&lt;/strong&gt;, inserting the new index tuple consumes index page space, and thus the costs of inserting index tuples and of vacuuming are very high. &lt;strong&gt;HOT&lt;/strong&gt; helps reduce the impact of these problems.&lt;/p&gt;

&lt;p&gt;When updating a row with &lt;strong&gt;HOT&lt;/strong&gt;, if the updated row is stored in the same table page that stores the previous row, PostgreSQL does not insert a corresponding index tuple and instead links the old and new tuples using HOT-related flag bits.&lt;/p&gt;

&lt;p&gt;Now, let's take a look at &lt;strong&gt;Index-Only Scans&lt;/strong&gt;. To reduce I/O cost, an index-only scan uses the index key directly, without accessing the table pages, whenever a &lt;em&gt;SELECT&lt;/em&gt; statement references only indexed columns. This method is provided by practically all commercial RDBMSs, and PostgreSQL introduced this feature in &lt;em&gt;Version 9.2&lt;/em&gt;.&lt;/p&gt;
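
&lt;p&gt;As a quick illustration on a hypothetical table &lt;code&gt;tbl&lt;/code&gt; with an index on &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;EXPLAIN&lt;/code&gt; shows whether the planner avoids the heap:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;testdb=# CREATE INDEX tbl_id_idx ON tbl (id);&lt;br&gt;
testdb=# EXPLAIN SELECT id FROM tbl WHERE id BETWEEN 10 AND 20;&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
When only indexed columns are referenced and the visibility map permits it, the plan should report an &lt;em&gt;Index Only Scan&lt;/em&gt; node instead of an ordinary index scan.&lt;/p&gt;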

</description>
    </item>
    <item>
      <title>A Short Overview on Vacuum Processing in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Tue, 27 Jun 2023 16:46:04 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/a-short-overview-on-vacuum-processing-in-postgresql-2kf4</link>
      <guid>https://dev.to/thehaseebashraf/a-short-overview-on-vacuum-processing-in-postgresql-2kf4</guid>
      <description>&lt;p&gt;&lt;strong&gt;Vacuum processing&lt;/strong&gt; in PostgreSQL is a maintenance process that helps to facilitate persistent operations.&lt;br&gt;
The main tasks of vacuum processing are to &lt;em&gt;remove dead tuples&lt;/em&gt; and &lt;em&gt;freeze transaction ids&lt;/em&gt; that are no longer active.&lt;/p&gt;

&lt;p&gt;Two different modes are provided by vacuum processing to remove dead tuples, namely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Concurrent VACUUM&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full VACUUM&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Concurrent VACUUM&lt;/strong&gt;, often referred to simply as VACUUM, removes dead tuples for each page of the table file while other transactions can still read the table as the process runs.&lt;/p&gt;

&lt;p&gt;On the other hand, &lt;strong&gt;Full VACUUM&lt;/strong&gt; removes dead tuples and also defragments the whole file, but while &lt;strong&gt;Full VACUUM&lt;/strong&gt; is running, the table cannot be accessed by other transactions.&lt;/p&gt;
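
&lt;p&gt;Both modes are invoked with plain SQL; for a hypothetical table &lt;code&gt;tbl&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;testdb=# VACUUM tbl;      -- concurrent VACUUM: other transactions may still read the table&lt;br&gt;
testdb=# VACUUM FULL tbl; -- full VACUUM: rewrites the table file and locks out other transactions&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;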

&lt;p&gt;Work on improving vacuum has been rather slow compared to other functionality, despite it being essential in &lt;strong&gt;PostgreSQL&lt;/strong&gt;. It is an expensive process because vacuum processing involves scanning the whole table. &lt;/p&gt;

&lt;p&gt;When it comes to &lt;em&gt;freezing old txids&lt;/em&gt;, vacuum processing also removes unnecessary parts of the clog (commit log) that are no longer actively used, where possible. &lt;/p&gt;

&lt;p&gt;In the &lt;strong&gt;first block&lt;/strong&gt;, freeze processing is performed and index tuples that point to dead tuples are removed. First, PostgreSQL scans a target table to build a list of dead tuples and, if possible, freezes old tuples. The list is stored in &lt;em&gt;maintenance_work_mem&lt;/em&gt; in the local memory area.&lt;/p&gt;

&lt;p&gt;After the scanning is done, PostgreSQL removes index tuples by referring to the &lt;strong&gt;dead tuple list&lt;/strong&gt;. Internally, this process is known as the &lt;em&gt;"cleanup stage"&lt;/em&gt; and, needless to say, it is expensive.&lt;/p&gt;

&lt;p&gt;Next, in the &lt;strong&gt;second block&lt;/strong&gt;, dead tuples are removed and both the FSM and VM are updated page-by-page.&lt;/p&gt;

&lt;p&gt;Lastly, the &lt;strong&gt;third block&lt;/strong&gt; performs a cleanup after the indexes have been deleted and it also updates both the system catalogs and statistics that are related to the vacuum processing for each of the target tables.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>postgressql</category>
      <category>apache</category>
      <category>apacheage</category>
    </item>
    <item>
      <title>An Overview of Concurrency Control in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Mon, 26 Jun 2023 19:41:44 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/on-overview-of-concurrency-control-in-postgresql-1ilp</link>
      <guid>https://dev.to/thehaseebashraf/on-overview-of-concurrency-control-in-postgresql-1ilp</guid>
      <description>&lt;p&gt;Today, we will take a look at concurrency control in PostgreSQL.&lt;br&gt;
It is a &lt;strong&gt;control mechanism&lt;/strong&gt; that maintains &lt;strong&gt;isolation&lt;/strong&gt; and &lt;strong&gt;atomicity&lt;/strong&gt;, two of the important properties required to run many transactions in parallel in the database. &lt;br&gt;
Overall, concurrency control can be categorized into three main techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Multi-version Concurrency Control (MVCC)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Strict Two-Phase Locking (S2PL)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Optimistic Concurrency Control (OCC)&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whenever a data item is read by a transaction, the system selects one of the item's versions so that isolation of the individual transactions is ensured. &lt;br&gt;
The main reason MVCC is considered advantageous is that readers and writers don't block each other.&lt;/p&gt;
The main reason MVCC is considered advantageous is that readers and writers don't block each other.&lt;/p&gt;

&lt;p&gt;PostgreSQL and some other RDBMSs use a variation of MVCC known as &lt;strong&gt;Snapshot Isolation (SI)&lt;/strong&gt;.&lt;br&gt;
Some database systems implement &lt;strong&gt;SI&lt;/strong&gt; using rollback segments. Whenever an item is read, PostgreSQL selects the appropriate version of the item for the individual transaction by applying &lt;strong&gt;visibility check rules&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Whenever a transaction is initiated in PostgreSQL, a unique identifier known as a &lt;strong&gt;transaction id (txid)&lt;/strong&gt; is assigned to it. This txid is a 32-bit unsigned integer with approximately 4.2 billion unique values. Within a transaction, the built-in &lt;em&gt;txid_current()&lt;/em&gt; function returns the current txid.&lt;/p&gt;
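
&lt;p&gt;A quick illustration (the value returned depends on the server's history; since version 13, &lt;code&gt;pg_current_xact_id()&lt;/code&gt; is the preferred equivalent):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;testdb=# BEGIN;&lt;br&gt;
testdb=# SELECT txid_current(); -- returns the txid assigned to this transaction&lt;br&gt;
testdb=# COMMIT;&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;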

&lt;p&gt;The above was a short overview of concurrency control in PostgreSQL.&lt;/p&gt;

</description>
      <category>bitnine</category>
      <category>postgres</category>
      <category>postgressql</category>
      <category>apacheage</category>
    </item>
    <item>
      <title>Foreign Data Wrappers in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Fri, 23 Jun 2023 21:44:26 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/foreign-data-wrappers-in-postgresql-5l1</link>
      <guid>https://dev.to/thehaseebashraf/foreign-data-wrappers-in-postgresql-5l1</guid>
      <description>&lt;p&gt;In this article, we will discuss two of the most interesting and practical features in PostgreSQL. Namely, &lt;strong&gt;Foreign Data Wrappers (FDW)&lt;/strong&gt; and &lt;strong&gt;Parallel Query&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A table that is located on a remote server is known as a &lt;em&gt;foreign table&lt;/em&gt;. In PostgreSQL, the feature used to manage foreign tables with plain SQL is known as &lt;strong&gt;Foreign Data Wrappers (FDW)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A foreign table on a remote server can be accessed after installing the required extension and tweaking the appropriate settings. Furthermore, even join operations can be executed against foreign tables located on different servers, just as with local tables.&lt;/p&gt;
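
&lt;p&gt;With the &lt;code&gt;postgres_fdw&lt;/code&gt; extension, for example, the setup might be sketched as follows (the server name, host, table definition, and credentials below are hypothetical):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE EXTENSION postgres_fdw;&lt;br&gt;
CREATE SERVER remote_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'remote.example.com', dbname 'remotedb');&lt;br&gt;
CREATE USER MAPPING FOR CURRENT_USER SERVER remote_srv OPTIONS (user 'remote_user', password 'secret');&lt;br&gt;
CREATE FOREIGN TABLE remote_tbl (id int, data text) SERVER remote_srv OPTIONS (table_name 'tbl');&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
After this, &lt;code&gt;remote_tbl&lt;/code&gt; can be queried like a local table.&lt;/p&gt;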

&lt;p&gt;Here is a brief overview of how the &lt;strong&gt;FDWs&lt;/strong&gt; perform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The analyzer generates a query tree from the input SQL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A connection to the remote server is made using the executor/planner.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the &lt;em&gt;use_remote_estimate&lt;/em&gt; option is turned on (it is off by default), the planner executes the EXPLAIN command on the remote server to obtain a cost estimate for each plan path.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next, the executor will transfer a plain text SQL statement to the remote server and will consequently receive the results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lastly, the received data is processed by the executor if necessary. E.g. the executor will perform a join processing in case a multi-table query is executed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are also a number of useful multi-table operations in PostgreSQL, some of which are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sort operations:&lt;/strong&gt;&lt;br&gt;
Sort operations such as ORDER BY are processed on the local server. The local server fetches all the required rows from the remote server before the sort operation is executed. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Aggregate functions:&lt;/strong&gt;&lt;br&gt;
Similar to sort operations, aggregate functions are also processed on the local server. Some examples of aggregate functions include: AVG() and COUNT(). The executor sends the relevant query to the remote server and then retrieves the relevant query results.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>apache</category>
      <category>postgres</category>
      <category>postgressql</category>
      <category>bitnine</category>
    </item>
    <item>
      <title>Query Processing in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Fri, 23 Jun 2023 12:02:45 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/query-processing-in-postgresql-2ank</link>
      <guid>https://dev.to/thehaseebashraf/query-processing-in-postgresql-2ank</guid>
      <description>&lt;p&gt;Query processing in &lt;strong&gt;PostgreSQL&lt;/strong&gt; is its most complicated subsystem, and it efficiently processes the supported SQL. In this article, we will look at query processing, focusing in particular on query optimising.&lt;/p&gt;

&lt;p&gt;In PostgreSQL, parallel query, implemented in &lt;em&gt;version 9.6&lt;/em&gt;, uses multiple background worker processes, but all queries issued by a connected client are handled by a single backend process. This backend process consists of the five subsystems shown below. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parser&lt;/strong&gt;&lt;br&gt;
This generates a parse tree from the SQL statement written in plain text.&lt;br&gt;
The root node of this parse tree is the &lt;em&gt;SelectStmt&lt;/em&gt; structure defined in &lt;em&gt;parsenodes.h&lt;/em&gt;, and each element of the parse tree corresponds to an element of the SELECT query. Since the parser only checks the input syntax when generating the parse tree, it reports an error only when there is a syntax error in the query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analyser&lt;/strong&gt;&lt;br&gt;
This carries out a semantic analysis of the parse tree and generates a query tree.&lt;br&gt;
The &lt;em&gt;Query&lt;/em&gt; structure defined in &lt;em&gt;parsenodes.h&lt;/em&gt; is the root of the query tree, and it includes the metadata of its corresponding query. A query tree includes things such as a &lt;em&gt;targetlist&lt;/em&gt;, the list of columns that form the result of the query, and a join tree, which stores the FROM and WHERE clauses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rewriter&lt;/strong&gt;&lt;br&gt;
If applicable rules exist, the rewriter transforms the query tree using the rule system.&lt;br&gt;
The rewriter is the system that realizes the rule system: it looks at the rules stored in the &lt;em&gt;pg_rules&lt;/em&gt; system catalog and rewrites the query tree accordingly. The rule system itself is quite intriguing, but we will omit it from our short summary to prevent it from becoming too long.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Planner&lt;/strong&gt;&lt;br&gt;
The planner generates the plan tree that can be executed most effectively from the query tree.&lt;br&gt;
The planner in PostgreSQL is a purely cost-based optimisation system; it supports neither rule-based optimisation nor hints, and it is the most complicated subsystem in an RDBMS. A plan tree consists of elements called &lt;em&gt;plan nodes&lt;/em&gt; connected to the &lt;em&gt;plantree&lt;/em&gt; list, and each plan node holds the information that the executor requires for processing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Executor&lt;/strong&gt;&lt;br&gt;
This executes the query by accessing the tables and indices in the order given by the plan tree.&lt;br&gt;
The executor reads and writes tables and indexes in the database cluster with the help of the buffer manager, which will be described in upcoming summaries. It uses memory areas such as &lt;em&gt;temp_buffers&lt;/em&gt; and &lt;em&gt;work_mem&lt;/em&gt;, allocated in advance, and creates temporary files if necessary.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>bitnine</category>
      <category>apacheage</category>
      <category>postgres</category>
      <category>postgressql</category>
    </item>
    <item>
      <title>Process and Memory architecture in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Wed, 21 Jun 2023 14:05:58 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/process-and-memory-architecture-in-postgresql-13p6</link>
      <guid>https://dev.to/thehaseebashraf/process-and-memory-architecture-in-postgresql-13p6</guid>
      <description>&lt;p&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt; is a client/server type relational database management system with a multi-process architecture that runs on a single host.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;PostgreSQL&lt;/strong&gt; server is a collection of several processes cooperatively managing one database cluster that contains the following types of processes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A &lt;strong&gt;postgres server process&lt;/strong&gt; is a parent of all processes related to a database management system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Every single &lt;strong&gt;backend process&lt;/strong&gt; is responsible for managing all queries and statements that are generated by any connected client.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Different &lt;strong&gt;background processes&lt;/strong&gt; help with processes of each feature for the database management system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;replication associated process&lt;/strong&gt; is responsible for performing the streaming replication process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lastly, the &lt;strong&gt;background worker process&lt;/strong&gt;, can work on any processing that is demanded by the users.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;postgres server process&lt;/strong&gt; starts up when the &lt;em&gt;pg_ctl&lt;/em&gt; command is executed with the start option. Next, it allocates a shared memory area, starts various background processes and a few other necessary processes, and waits for connection requests from clients. A backend process is started whenever it receives a connection request from a client.&lt;/p&gt;
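
&lt;p&gt;For reference, the startup command mentioned above looks like this (the data directory path is hypothetical):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ pg_ctl -D /usr/local/pgsql/data start&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;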

&lt;p&gt;A &lt;strong&gt;backend process&lt;/strong&gt; is started by the postgres server process and handles all queries issued by the connected client. It communicates with the client over a single TCP connection and terminates when the client disconnects. As a backend process is allowed to operate on only one database, the client has to specify explicitly which database it wants to use when connecting.&lt;/p&gt;

&lt;p&gt;Now let's talk about the memory architecture of &lt;strong&gt;postgreSQL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The memory architecture of postgreSQL can be classified into two major classes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Local memory area - this is assigned by each backend process for its own use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shared memory area - this is used collectively by all the processes running on a PostgreSQL server.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to these, PostgreSQL also allocates various memory areas as mentioned below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Sub areas for the various access control mechanisms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sub areas for various background processes, e.g. autovacuum and the checkpointer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sub areas for transaction processing such as two-phase-commit and save-point.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was a short summary of the memory and process architecture of PostgreSQL.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>apacheage</category>
      <category>bitnine</category>
      <category>apache</category>
    </item>
    <item>
      <title>An Overview of Database Clusters, Databases, and Tables in PostgreSQL</title>
      <dc:creator>Haseeb Ashraf</dc:creator>
      <pubDate>Sat, 17 Jun 2023 23:11:32 +0000</pubDate>
      <link>https://dev.to/thehaseebashraf/an-overview-of-database-clusters-databases-and-tables-in-postgresql-428d</link>
      <guid>https://dev.to/thehaseebashraf/an-overview-of-database-clusters-databases-and-tables-in-postgresql-428d</guid>
      <description>&lt;p&gt;In &lt;strong&gt;PostgreSQL&lt;/strong&gt;, a database cluster is not a group of database servers; rather, it is a collection of databases managed by a single PostgreSQL server running on a single host. &lt;/p&gt;

&lt;p&gt;The logical structure of a database cluster is a collection of database objects, and these objects are logically separate from one another. All PostgreSQL database objects are internally managed by assigning each of them its own &lt;strong&gt;object identifier (OID)&lt;/strong&gt;. These &lt;strong&gt;OIDs&lt;/strong&gt; are 4-byte unsigned integers, and the relations between database objects and their OIDs are stored in the appropriate system catalogs, based on the kind of object.&lt;/p&gt;

&lt;p&gt;As for the physical structure, the database cluster is essentially a single directory, known as the &lt;strong&gt;base directory&lt;/strong&gt;, which contains a few subdirectories and several files. In this structure, a database is a &lt;em&gt;subdirectory&lt;/em&gt; under the base directory, and each table and index is a single file saved in the &lt;em&gt;subdirectory&lt;/em&gt; of the database to which it belongs.&lt;/p&gt;

&lt;p&gt;A &lt;em&gt;tablespace&lt;/em&gt; in PostgreSQL is an extra data area outside the base directory. It is created in the directory specified when the &lt;strong&gt;CREATE TABLESPACE&lt;/strong&gt; statement is issued. &lt;/p&gt;
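
&lt;p&gt;For example (the directory and names are hypothetical; the directory must already exist and be owned by the PostgreSQL server user):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';&lt;br&gt;
CREATE TABLE t (a int) TABLESPACE fast_ssd;&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;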

&lt;p&gt;The internal layout of the &lt;strong&gt;Heap Table File&lt;/strong&gt; is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Heap tuple(s) &lt;br&gt;
These are the data records themselves, stacked in order from the bottom of the page.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Line pointer(s)&lt;br&gt;
Each is a 4-byte pointer that points to a heap tuple.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Header data&lt;br&gt;
This is defined by the structure &lt;em&gt;PageHeaderData&lt;/em&gt; and is placed at the beginning of the page.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In conclusion, this was a short overview of database clusters, databases, and tables in PostgreSQL.&lt;/p&gt;

</description>
      <category>database</category>
      <category>apache</category>
      <category>apacheage</category>
      <category>postgres</category>
    </item>
  </channel>
</rss>
