DEV Community

Pawan Kukreja
Pawan Kukreja

Posted on

[Summary] Chapter#03 "The Internals of PostgreSQL" Query Processing (Part-03)

Join Operation:
PostgreSQL supports three join operations:

  • Nested Loop Join It is the fundamental join operation and it can be used in any join condition. It does not need any start-up cost. The run cost of nested loop join is proportional to product size of outer and inner table.

Materialised Nested Loop Join:
Materialised nested loop join used to reduce the total scanning cost of the inner table. Before running a nested loop join, the executor writes the inner table tuple to the work_mem or temporary file by scanning the inner table. It uses the inner table tuple more efficiently.

  • Merge Join
    Merge join is only used in natural joins and equi joins, and the cost of merge join is estimated by initial_cost_mergejoin() and final_cost_mergejoin() functions. Cost estimation is start-up cost of merge join is sum of sorting costs of both inner and outer tables, start-up cost is O(Nouter log2(Nouter)+Ninner log2(Ninner)).

  • Hash Join
    Hash join can be used only in natural joins and equi-joins. It depends on the size of table, if it is small enough then it will be two-phase in memory hash join, otherwise hybrid hash join is used with skew method, and the cost estimation is omitted.

  • In-Memory Hash Join:
    Hash table area is called batch in PostgreSQL and batch has hash slots called buckets and number of buckets described by ExecChooseHashTableSize() function.

  • Hybrid Hash Join with Skew:
    It is used when the tuple of the inner table cannot be stored into one batch in work_mem. In the first build and prob phase, PostgreSQL prepares multiple batches. The number of batches is the same as the number of buckets.

Join Access Paths and Join Nodes:

  • Join access path: An access path of the nested loop join is the join path structure and other join access paths, Merger path and hash path are based on it.
  • Join Nodes: This subsection shows the three join nodes:
  • NestedLoopNodes
  • MergeJoinNodes
  • HashJoinNodes

Creating the Plan Tree of Multiple-Table Query:

Preprocessing:
Preprocessing for multiple-table query will be described:

  • Planning and converting CTE
  • Pulling Subqueries Up
  • Transforming an outer join to an inner join

Getting Cheapest Path:
It is a very expensive process if planners consider all combinations of indexes and join methods and it will be infeasible if the number of tables exceeds a certain level. If a number less than 12 planner can get optimal results by applying dynamic programming, otherwise genetic algorithms.

Reference

Top comments (0)