DEV Community

Raja Rakshak
Raja Rakshak

Posted on • Updated on

Internals of PostgreSQL : Chapter 3: Query Processing

The query processing subsystem of PostgreSQL is described in this chapter. It gives a general overview of the subsystems, such as the parser, analyzer, rewriter, planner, and executor, that are involved in query processing.

A summary of each subsystem is given below:

Parser: The parser creates a parse tree from a plain text SQL statement. If there are any syntax mistakes, it returns an error and checks the input's syntax. It does not, however, run any semantic analyses.

Analyzer: The analyzer creates a query tree by performing a semantic analysis on the parse tree the parser produced. It assigns metadata to the query tree, such as the specific clauses and the type of command (SELECT, INSERT, etc.).

Planner: The planner creates an optimized plan tree using the query tree that the rewriter provided. The planner determines the most practical plan by using a cost-based optimization approach to assess the costs of various processes. The planner in PostgreSQL only supports cost-based optimization and does not support hints or rule-based optimization.

Executor: Using the plan tree the planner developed as a guide, the executor runs the query by gaining access to the tables and indexes. When reading and writing data, it does so in the sequence listed in the plan tree. The concurrency management technique is used by the executor to preserve transaction consistency and isolation while utilizing memory spaces and temporary files as necessary.

The idea of costs in PostgreSQL's query optimization is also explained in this chapter. Costs are dimensionless quantities that are used to compare the adequacy of operation performance. Start-up costs, ongoing costs, and total costs are the three different sorts of expenses. The expenses associated with fetching each tuple are broken down into three categories: start-up costs, run costs, and total costs. The plan tree and related expenses for a query can be viewed using the EXPLAIN command.

Start-up cost, run cost, and total cost are the three different categories of costs in PostgreSQL. The cost incurred prior to fetching the first tuple is known as the start-up cost, whereas the cost incurred to fetch all tuples is known as the run cost. The sum of the start-up and operating costs is the total cost.

Let's look more closely at how the costs for sequential scans, index scans, and sort operations are estimated.

Sequential Scan: The cost_seqscan() function estimates the price of a sequential scan. The sum of the CPU run cost and the disc run cost is used to determine the run cost of a sequential scan.

Run cost = (CPU tuple cost + CPU operator cost) * Ntuple + seq_page_cost * Npage,

where seq_page_cost, CPU tuple cost, and CPU operator cost are parameters set in the postgresql.conf file. Ntuple and Npage represent the number of tuples and pages in the table, respectively.

Index Scan: The cost_index() function can be used to calculate the price of an index scan. The start-up cost plus the run cost make up the index scan cost.

2.1 Beginning Cost:
The price to read the index pages in order to access the first tuple in the target table is the start-up cost of an index scan. The formula is used to compute it:

Start-up costs are calculated as follows: 50 * cpu_operator_cost, ceiling(log2(Nindex_tuple)) + (Hindex + 1),

where Hindex is the height of the index tree and Nindex_tuple is the number of tuples in the index.

2.2 Run Cost: The CPU and IO expenses for the index and the table are included in the run cost of an index scan. The table CPU cost and table IO cost are not assessed if Index-Only Scans can be used.

The following formulas are used to compute the index CPU cost, table CPU cost, index IO cost, and table IO cost:

Index IO cost = ceil(selectivity * Nindex_page) * random_page_cost, Index CPU cost = selectivity * Nindex_tuple * (cpu_index_tuple_cost + qual_op_cost), Table CPU cost = selectivity * Ntuple * cpu_tuple,

Nindex_page is the number of index pages, cpu_index_tuple_cost and random_page_cost are parameters provided in the postgresql.conf file, and qual_op_cost is the evaluating cost of the index. Selectivity is the percentage of the search range of the index by the specified WHERE clause.
The total run cost is the sum of the index CPU cost, table CPU cost, index IO cost, and table IO cost.

Sort Operation:  The cost_sort() function can be used to calculate the price of a sorting operation. The start-up cost and the running cost are both included in the sort cost. The start-up cost relates to the expense of getting the data ready for sorting, whereas the run cost relates to the expense of doing the actual sorting.
The cost_sort() function considers factors such the quantity of tuples to be sorted, their average width, and the amount of memory required for sorting.

These are the fundamental ideas guiding cost estimate in PostgreSQL single-table queries. Depending on the precise configuration options, the qualities of the database, and the query, the real costs could change.

Top comments (0)