m-hashir
Internals of PostgreSQL - Chapter 3

This chapter describes how PostgreSQL implements query processing. Beyond basic query processing, PostgreSQL supports three related features: Foreign Data Wrappers (FDW), parallel query, and JIT compilation. A backend process handles all queries issued by a connected client and has five subsystems: the parser, analyzer, rewriter, planner, and executor.
The parser generates a parse tree that the subsequent subsystems read. It checks only syntax, so it reports syntax errors but does not check semantics; a query referencing a nonexistent table, for example, passes the parser. The analyzer, in turn, runs semantic analysis on the parse tree and generates a query tree. The rewriter transforms the query tree according to PostgreSQL's rule system. The planner then receives the rewritten query tree and generates a plan tree to be executed; it performs cost-based optimization and is the most complex subsystem in an RDBMS. A plan tree is composed of plan nodes, each holding the information the executor needs for processing, and the executor traverses the plan tree from the leaf nodes to the root. The executor reads and writes tables and indexes in the database cluster via the buffer manager, uses several memory areas (creating temporary files if necessary), and relies on the concurrency control mechanism to maintain consistency.
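The flow of a query through the five subsystems can be pictured as a simple pipeline. The following Python sketch is purely illustrative (the function names and dictionary shapes are my own; the real subsystems are C code inside a backend process):

```python
# Hypothetical sketch of the backend's five-stage query pipeline.
# Each stage consumes the previous stage's output, mirroring
# parser -> analyzer -> rewriter -> planner -> executor.

def parser(sql):            # syntax check only -> parse tree
    return {"raw_stmt": sql}

def analyzer(parse_tree):   # semantic analysis -> query tree
    return {"query": parse_tree}

def rewriter(query_tree):   # apply the rule system -> rewritten query tree
    return query_tree

def planner(query_tree):    # cost-based optimization -> plan tree
    return {"plan": "SeqScan", "source": query_tree}

def executor(plan_tree):    # walk the plan nodes, return result tuples
    return [("row",)]

def handle_query(sql):
    return executor(planner(rewriter(analyzer(parser(sql)))))

print(handle_query("SELECT 1;"))
```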
As stated before, query optimization in PostgreSQL is cost-based. Costs are dimensionless, non-absolute values that serve as indicators of relative performance; they are estimated by functions defined in costsize.c. There are three kinds of costs:

  1. Start-up: cost expended before the first tuple is fetched
  2. Run: cost to fetch all tuples
  3. Total: sum of the start-up and run costs

The cost of a sequential scan is estimated by the cost_seqscan() function. Its start-up cost is zero, and its run cost is defined by the following formula:

‘run cost’ = ‘cpu run cost’ + ‘disk run cost’ = (cpu_tuple_cost + cpu_operator_cost) × N_tuple + seq_page_cost × N_page

For index scans, the common cost function cost_index() is used. Its start-up cost is calculated by:

‘start-up cost’ = {ceil(log2(N_index,tuple)) + (H_index + 1) × 50} × cpu_operator_cost

where N_index,tuple is the number of index tuples and H_index is the height of the index tree. Furthermore, the run cost is calculated by:

‘run cost’ = (‘index cpu cost’ + ‘table cpu cost’) + (‘index IO cost’ + ‘table IO cost’)

Here the table IO cost is defined by the following formula:

‘table IO cost’ = max_IO_cost + indexCorrelation² × (min_IO_cost − max_IO_cost)

The total cost is the sum of the start-up and run costs.
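The sequential-scan formula is easy to check by hand. This sketch plugs in PostgreSQL's default cost parameters (cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025, seq_page_cost = 1.0); the tuple and page counts are illustrative assumptions, not real statistics:

```python
# Sketch of the arithmetic inside cost_seqscan(), using PostgreSQL's
# default GUC values. The real function is C code in costsize.c.
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025
SEQ_PAGE_COST = 1.0

def seqscan_run_cost(n_tuple, n_page):
    cpu_run_cost = (CPU_TUPLE_COST + CPU_OPERATOR_COST) * n_tuple
    disk_run_cost = SEQ_PAGE_COST * n_page
    return cpu_run_cost + disk_run_cost

# Assumed example: 10,000 tuples stored in 45 pages.
# Start-up cost is zero, so this is also the total cost.
print(seqscan_run_cost(10_000, 45))  # → 170.0
```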

The sort path is used for sorting operations such as ORDER BY and for the preprocessing of merge join operations; its cost is estimated by the cost_sort() function. If all tuples fit in work_mem, quicksort is used; otherwise a temporary file is created and external merge sort is used. The start-up cost is the cost of sorting the target tuples, while the run cost is the cost of reading through the sorted tuples. They are defined by the following formulae:

‘start-up cost’ = C + comparison_cost × N_sort × log2(N_sort)

‘run cost’ = cpu_operator_cost × N_sort

where C is the total cost of the input path and N_sort is the number of tuples to be sorted.
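For the in-memory (quicksort) case, the two formulae can be computed directly. In this sketch comparison_cost is taken as 2 × cpu_operator_cost, and the input cost C and tuple count are illustrative assumptions:

```python
import math

# Sketch of cost_sort()'s in-memory case. C stands for the total cost
# of the path feeding the sort; comparison_cost is assumed to be
# 2 x cpu_operator_cost here.
CPU_OPERATOR_COST = 0.0025
COMPARISON_COST = 2.0 * CPU_OPERATOR_COST

def sort_costs(n_sort, input_total_cost):
    startup = input_total_cost + COMPARISON_COST * n_sort * math.log2(n_sort)
    run = CPU_OPERATOR_COST * n_sort
    return startup, run

# Assumed example: sorting 10,000 tuples produced by a path of cost 170.
startup, run = sort_costs(10_000, 170.0)
print(startup, run)
```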

The planner performs three steps: preprocessing, getting the cheapest access path, and creating the plan tree. Preprocessing includes simplification, normalization, and flattening. Getting the cheapest access path is done by estimating the costs of all possible access paths and choosing the cheapest one. The last step generates a plan tree from the cheapest path; the tree's root structure has four representative fields, and the plan nodes themselves have seven representative fields, such as the start-up and total costs, rows, and the left and right subtrees.
In single-table queries, the executor takes the plan nodes from the end (leaves) of the plan tree to the root and invokes the functions that process them.
PostgreSQL supports three join operations, which are as follows:

Nested Loop Join

This is the most fundamental join operation, and PostgreSQL supports five variations of it.

  1. Nested Loop Join: It has zero start-up cost and a run cost proportional to the product of the sizes of the outer and inner tables: ‘run cost’ = (cpu_operator_cost + cpu_tuple_cost) × N_outer × N_inner + C_inner × N_outer + C_outer, where N_outer and N_inner are the numbers of tuples in the outer and inner tables and C_outer and C_inner are their single-scan costs.
  2. Materialized Loop Join: To optimize the scanning cost of the inner table in a nested loop join, PostgreSQL supports materialized nested loop join. Before execution, the inner table tuples are either stored in work_mem or a temporary file, allowing for more efficient processing during the join operation.
  3. Indexed Nested Loop Join: When an index exists on the inner table and can be used to lookup tuples that satisfy the join condition, PostgreSQL employs the indexed nested loop join. It performs efficiently by processing the join operation based on a single loop of the outer table. The index scan path is used to look up inner tuples based on the outer table's attribute value, known as a parameterized index path.
  4. Other Variations: Using an index on the outer table, particularly when its attributes are involved in the join condition or act as access predicates, can significantly reduce the cost of nested loop joins by narrowing the search range.
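The difference between the plain and materialized variants comes down to how often the inner table is scanned. This toy Python sketch (list-of-tuples "relations", with a Python list standing in for work_mem) is an illustration of the idea, not PostgreSQL's C implementation:

```python
# Toy nested loop join vs. materialized nested loop join.

def nested_loop_join(outer, inner, key_outer, key_inner):
    result = []
    for o in outer:
        for i in inner:            # inner relation rescanned per outer tuple
            if o[key_outer] == i[key_inner]:
                result.append(o + i)
    return result

def materialized_loop_join(outer, inner, key_outer, key_inner):
    materialized = list(inner)     # scanned once; stands in for work_mem
    result = []
    for o in outer:
        for i in materialized:     # cheap in-memory rescan
            if o[key_outer] == i[key_inner]:
                result.append(o + i)
    return result

a = [(1, "x"), (2, "y")]
b = [(2, "b2"), (3, "b3")]
print(nested_loop_join(a, b, 0, 0))  # → [(2, 'y', 2, 'b2')]
```

Both variants produce the same result; the materialized form only changes where the inner tuples are read from on each outer iteration.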

Merge Join

Merge join can be used only for natural joins and equi-joins, and its cost is estimated by the initial_cost_mergejoin() and final_cost_mergejoin() functions. It has several variations:

  1. Merge Join: If all tuples can be stored in memory, the sorting operations are carried out in memory itself; otherwise temporary files are used.
  2. Materialized Merge Join: As with the nested loop join, PostgreSQL supports a materialized merge join, which materializes the inner table to make the inner-table scan more efficient.
  3. Other Variations: Similar to the nested loop join, the merge join also has variations in which index scanning of the outer table can be carried out.
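The core idea of a merge join is to sort both inputs on the join key and then combine them in a single coordinated pass. This toy sketch over lists of tuples illustrates the equi-join case (it is not PostgreSQL's implementation):

```python
# Toy merge join: sort both inputs on the join key, then merge.
# For each outer tuple, advance the inner cursor past smaller keys,
# then emit every inner tuple with an equal key.

def merge_join(outer, inner, key=lambda t: t[0]):
    outer = sorted(outer, key=key)   # the cost_sort() preprocessing step
    inner = sorted(inner, key=key)
    result, j = [], 0
    for o in outer:
        while j < len(inner) and key(inner[j]) < key(o):
            j += 1
        k = j                        # rescan from the first matching key
        while k < len(inner) and key(inner[k]) == key(o):
            result.append(o + inner[k])
            k += 1
    return result

a = [(2, "y"), (1, "x")]
b = [(3, "b3"), (2, "b2")]
print(merge_join(a, b))  # → [(2, 'y', 2, 'b2')]
```

Sorting dominates the start-up cost, which is why merge join pays off when the inputs are large or already sorted (e.g. delivered by an index scan).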

Hash Join

The hash join in PostgreSQL is limited to natural joins and equi-joins. It operates differently depending on table sizes: a two-phase in-memory hash join is used when the inner table is small (at most 25% of work_mem), and a hybrid hash join with the skew method is used for larger tables. Cost estimation is complex but generally assumes no hash conflicts, with start-up and run costs depending on the sizes of the outer and inner tables.

  1. In-Memory Hash Join: The in-memory hash join in PostgreSQL operates within work_mem, using a hash table divided into batches of buckets, as determined by ExecChooseHashTableSize(). It consists of a build phase, where inner-table tuples are inserted into the batch, and a probe phase, where each outer-table tuple is compared with the inner tuples in the batch to check the join conditions. This method improves join performance.
  2. Hybrid Hash Join With Skew: When the inner table's tuples cannot fit into a single batch in work_mem, PostgreSQL utilizes the hybrid hash join with the skew algorithm. It involves multiple batches, some stored in work_mem and others in temporary files, to accommodate and process the tuples efficiently.
  3. Index Scans in Hash Join: Hash join in PostgreSQL uses index scans if possible.
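The build and probe phases of the in-memory case can be sketched in a few lines. Here a Python dict stands in for the work_mem hash table, and batching and skew handling are omitted entirely (this is an illustration, not PostgreSQL's C implementation):

```python
# Toy in-memory hash join.
# Build phase: hash the (smaller) inner relation.
# Probe phase: stream the outer relation and look up matches.

def hash_join(outer, inner, key_outer, key_inner):
    # build phase: dict standing in for the work_mem hash table
    table = {}
    for i in inner:
        table.setdefault(i[key_inner], []).append(i)
    # probe phase: one pass over the outer relation
    result = []
    for o in outer:
        for i in table.get(o[key_outer], []):
            result.append(o + i)
    return result

a = [(1, "x"), (2, "y")]
b = [(2, "b2"), (3, "b3")]
print(hash_join(a, b, 0, 0))  # → [(2, 'y', 2, 'b2')]
```

Because each relation is scanned exactly once, hash join shines for large unsorted inputs under equi-join conditions, at the price of the build phase's start-up cost.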
