[Summary] Chapter#03 "The Internals of PostgreSQL" Query Processing (Part-03)

#apacheag #postgres #bitnine #devcommunity

Join Operation:
PostgreSQL supports three join operations:

Nested Loop Join It is the fundamental join operation and it can be used in any join condition. It does not need any start-up cost. The run cost of nested loop join is proportional to product size of outer and inner table.

Materialised Nested Loop Join:
Materialised nested loop join used to reduce the total scanning cost of the inner table. Before running a nested loop join, the executor writes the inner table tuple to the work_mem or temporary file by scanning the inner table. It uses the inner table tuple more efficiently.

Merge Join
Merge join is only used in natural joins and equi joins, and the cost of merge join is estimated by initial_cost_mergejoin() and final_cost_mergejoin() functions. Cost estimation is start-up cost of merge join is sum of sorting costs of both inner and outer tables, start-up cost is O(Nouter log2(Nouter)+Ninner log2(Ninner)).
Hash Join
Hash join can be used only in natural joins and equi-joins. It depends on the size of table, if it is small enough then it will be two-phase in memory hash join, otherwise hybrid hash join is used with skew method, and the cost estimation is omitted.
In-Memory Hash Join:
Hash table area is called batch in PostgreSQL and batch has hash slots called buckets and number of buckets described by ExecChooseHashTableSize() function.
Hybrid Hash Join with Skew:
It is used when the tuple of the inner table cannot be stored into one batch in work_mem. In the first build and prob phase, PostgreSQL prepares multiple batches. The number of batches is the same as the number of buckets.

Join Access Paths and Join Nodes:

Join access path: An access path of the nested loop join is the join path structure and other join access paths, Merger path and hash path are based on it.
Join Nodes: This subsection shows the three join nodes:
NestedLoopNodes
MergeJoinNodes
HashJoinNodes

Creating the Plan Tree of Multiple-Table Query:

Preprocessing:
Preprocessing for multiple-table query will be described:

Planning and converting CTE
Pulling Subqueries Up
Transforming an outer join to an inner join

Getting Cheapest Path:
It is a very expensive process if planners consider all combinations of indexes and join methods and it will be infeasible if the number of tables exceeds a certain level. If a number less than 12 planner can get optimal results by applying dynamic programming, otherwise genetic algorithms.

Reference

DEV Community

[Summary] Chapter#03 "The Internals of PostgreSQL" Query Processing (Part-03)

Top comments (0)

Read next

Is sourcehut git access denied for anyone?

Frontend Challenge - December Edition

Context Dump: Simplifying AI File Preparation

Enhancing User Experience with React-Joyride: How I made User onboarding seamless