DEV Community

Pawan Kukreja
Pawan Kukreja

Posted on • Updated on

[Summary] Chapter#03 "The Internals of PostgreSQL" Query Processing (Part-02)

Cost Estimation in Single-Table Query:

PostgreSQL query is optimised on the basis of cost, and costs are estimated by function defined in costsize.c. All operations executed by the executor have corresponding cost functions, like sequential scans are estimated by cost_seqscan().
There are three types of cost:

  • Startup: Cost expanded before the first tuple is fetched. Like startup cost of the index scan node is the cost to read index pages to access the first tuple in the target table.

  • Run: This cost fetches all tuples.

  • Total: This cost is the sum of both startup and Run costs.

Sequential Scan:
This cost is estimated by cost_seqscan() function.

Index Scan:
PostgreSQL supports index methods like BTree, GiST, GIN, BRIN, and the cost of index scan is estimated using cost_index() function.

  • Sort: The cost of sorting is estimated by using cost_sort() function, if all tuples to be sorted are stored in work_mem and it uses the quick sort algorithm. The start-up cost of the sort path is the cost of sorting the target tuples, therefore the cost is O(Nsort * log2(Nsort)) where Nsort is the number of tuples to be sorted.

Creating the Plan Tree of Single-Table Query:
This plan describes “How a plan tree of a single table query is created and How a plan tree of a multiple-table query is created”.
Planner in PostgreSQL performs three steps as below:

  • Carry out preprocessing

  • Get the cheapest access path by estimating the costs of all possible access paths

  • Create the plan tree from the cheapest path
    Access paths are only used inside the planner to create the plan tree and the most fundamental data structure of access paths is the Path structure, and it is the unit of processing for estimating the cost.

Preprocessing:
Here are the main steps for preprocessing:

  • Simplification target list clauses and so on.
    For eg: 2+2 is rewritten to 4 by eval_const_expressions() function

  • Normalizing Boolean expressions
    For eg: NOT(NOT a) is rewritten as a

  • Flattening AND/OR expressions
    In PostgreSQL internals, AND and OR are n-ary operators and the planner always assumes that all nested AND and OR expressions are to be flattered.

Getting the cheapest Access path:
Following operation will be performed for selection of the cheapest path.

  • Create a RelOptInfo structure to store the access paths and the corresponding costs.
  • Estimate the costs of all possible access paths and add the access paths to the RelOptInfo structure.
  • Get the cheapest access path in the pathlist of the RelOptInfo structure
  • Estimate LIMIT, ORDER BY and ARREGISFDD costs if necessary.

Creating a Plan Tree:
The root of the plan tree is a PlannedStmt structure and it contain nineteen fields and following are four fields:

  • CommandType
  • rtable
  • relationOids
  • Plantree Plantree is composed of various plan nodes and plan nodes contains fourteen field. Following are seven representative fields
  • start-up cost

  • total_ cost

  • rows

  • targetlist

  • qual

  • lefttree

  • righttree

HOW THE EXECUTOR PERFORMS

In single-table queries, the executor takes the plan nodes in an order from the end of the plan tree to the root and then invokes the function that performs the processing of the corresponding nodes, each plan node has functions that are meant for executing the respective operation.

Reference

Top comments (0)