Understanding internals of PostgreSQL - Chapter 02

Welcome back to my series on the internals of PostgreSQL. In the previous post, I shared a summary of my learnings from chapter 01 of the book "The Internals of PostgreSQL" by Hironobu SUZUKI. In this post, I'll be sharing the key takeaways from chapter 02, which covers the process and memory architecture of PostgreSQL. Understanding these concepts is crucial for anyone working with PostgreSQL databases, as it can help you optimize database performance and avoid common mistakes. So, let's dive in!

Process Architecture

The process architecture of PostgreSQL is a key component of its client/server type relational database management system. It features a multi-process architecture that runs on a single host. A PostgreSQL server is a collection of multiple processes that cooperatively manage a database cluster. These processes include

the postgres server process
backend processes
background processes

There are other processes like replication associated processes and background worker process. The postgres server process is the parent of all processes related to the database cluster management, and it starts up by executing the pg_ctl utility with start option. The backend process, which is also known as postgres, handles all queries and statements issued by a connected client. PostgreSQL allows multiple clients to connect simultaneously, and the maximum number of clients is controlled by the configuration parameter max_connections. To deal with the issue of frequent connection and disconnection with a PostgreSQL server, a pooling middleware such as pgbouncer or pgpool-II can be used. Finally, the background processes are responsible for performing various features such as VACUUM and CHECKPOINT processes.

Memory Architecture

PostgreSQL's memory architecture can be broadly classified into two categories - local memory area and shared memory area. Each backend process allocates a local memory area for query processing, which is further divided into several sub-areas with fixed or variable sizes. The sub-areas include a transaction processing area, a query processing area, and various sub-areas for parsing, planning, and execution of queries. The major sub-areas in the local memory area are

work_mem: Executor uses this area for sorting tuples by ORDER BY and DISTINCT operations, and for joining tables by merge-join and hash-join operations.
maintenance_work_mem: Some kinds of maintenance operations (e.g., VACUUM, REINDEX) use this area.
temp_buffers: Executor uses this area for storing temporary tables.

On the other hand, a shared memory area is allocated by the PostgreSQL server during startup and is used by all the processes of the server. It is divided into several fixed-sized sub-areas, such as the buffer cache, the shared memory context, and the WAL buffers. The major sub-areas in the shared memory area are

shared buffer pool: used for loading pages within tables and indexes from a persistent storage and operating them directly.
WAL buffer: a buffering area for WAL data, which is a transaction log used to ensure that no data is lost in case of server failures.
commit log: used for storing the states of all transactions for Concurrency Control mechanism.

Additionally, PostgreSQL also allocates several areas for access control mechanisms, background processes like autovacuum and checkpointer, transaction processing, and other functionalities.

In conclusion, PostgreSQL's process and memory architecture are key components that make it a powerful and flexible database management system. Its multi-process architecture enables efficient handling of multiple clients and database management, while its memory architecture optimizes query processing and storage. Overall, PostgreSQL's robust architecture provides a solid foundation for reliable and scalable database operations.