PostgreSQL, a powerful open-source relational database management system, uses a sophisticated query optimisation method based on cost estimation. Cost-based optimisation lets the database engine compare the relative performance of different operations and select the most efficient execution plan. In this blog article, we will examine the concept of cost estimation in single-table queries and how PostgreSQL computes the costs associated with various operations.
PostgreSQL Costs:
PostgreSQL costs are dimensionless values that let the user compare the relative performance of operations. It is important to remember that costs are relative indicators, not absolute performance metrics. The routines the query optimizer uses to estimate costs are defined in 'costsize.c', and each operation the executor performs has a corresponding cost function.
There are three main kinds of cost:
- Start-up cost
- Run cost
- Total cost
The start-up cost is the cost incurred before the first tuple is fetched, while the run cost is the cost incurred to fetch all the tuples. The total cost is the sum of the start-up and run costs.
Sequential Scan
Let's investigate the cost estimation procedure for a sequential scan, a fundamental PostgreSQL operation. A sequential scan has no start-up cost, and its run cost is calculated as follows:
run cost = (cpu_tuple_cost + cpu_operator_cost) * Ntuple + seq_page_cost * Npage
Here, 'Ntuple' denotes the total number of tuples in the table and 'Npage' the number of table pages. The 'cpu_tuple_cost', 'cpu_operator_cost', and 'seq_page_cost' parameters are configurable and are set in the 'postgresql.conf' file; their defaults are 0.01, 0.0025, and 1.0, respectively.
Let's take a table named 'tbl' with 10,000 tuples spread across 45 pages as an example. Using the formula above with the default parameter values, the run cost is:
run cost = (0.01 + 0.0025) * 10,000 + 1.0 * 45 = 170.0
Since the start-up cost is zero, the total cost of the sequential scan is also 170.0. This is the value shown by PostgreSQL's 'EXPLAIN' command.
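The calculation above can be sketched in a few lines of Python. This is a simplified illustration using the default planner parameters, not the actual planner code from 'costsize.c':

```python
# Default planner parameters from postgresql.conf
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025
SEQ_PAGE_COST = 1.0

def seq_scan_run_cost(ntuple: int, npage: int) -> float:
    """Run cost of a sequential scan, per the formula above."""
    return (CPU_TUPLE_COST + CPU_OPERATOR_COST) * ntuple + SEQ_PAGE_COST * npage

# Table 'tbl': 10,000 tuples on 45 pages -> about 170.0
print(seq_scan_run_cost(10_000, 45))
```

Running this reproduces the 170.0 figure reported by 'EXPLAIN' for the example table.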
Index Scan
In PostgreSQL, the 'cost_index()' function is used to estimate the cost of an index scan. The cost estimation procedure is the same regardless of the index access method used (for example, B-tree, GiST, GIN, or BRIN).
PostgreSQL takes both the start-up cost and the run cost into account when estimating the cost of an index scan. The start-up cost represents the cost of reading the index pages needed to access the first tuple in the target table. It is given by the following equation:
start-up cost = {ceil(log2(Nindex_tuple)) + (Hindex + 1) * 50} * cpu_operator_cost
Here, 'Nindex_tuple' is the number of tuples in the index and 'Hindex' is the height of the index tree.
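The start-up formula can likewise be sketched in Python. The index size and tree height below are illustrative values, not taken from a real index:

```python
import math

CPU_OPERATOR_COST = 0.0025  # default from postgresql.conf

def index_scan_startup_cost(n_index_tuples: int, index_height: int) -> float:
    """Start-up cost of an index scan, per the formula above."""
    return (math.ceil(math.log2(n_index_tuples))
            + (index_height + 1) * 50) * CPU_OPERATOR_COST

# Hypothetical index: 10,000 tuples, tree height 1
# -> (ceil(log2(10000)) + 2 * 50) * 0.0025 = (14 + 100) * 0.0025, about 0.285
print(index_scan_startup_cost(10_000, 1))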
The run cost of an index scan is the sum of the CPU and I/O costs of both the index and the table. The cost formulas depend on factors such as selectivity, tuple counts, and configuration parameters.
Selectivity, the proportion of rows that satisfy the supplied WHERE clause, plays a central role in cost estimation. PostgreSQL estimates selectivity using statistics from the pg_stats system view, such as histogram bounds and most common values.
By taking these variables into account, PostgreSQL can estimate the cost of index scans accurately and make well-informed query optimisation decisions.
Conclusion:
Cost estimation is a key component of PostgreSQL's query optimisation mechanism. By evaluating the costs associated with different operations, the query optimizer can choose the most efficient execution plan for single-table queries.
In this blog article, we explored the cost estimation processes for sequential scans and index scans, along with the main variables involved.
This blog is a summary of Chapter 3, Part 2 of the book The Internals of PostgreSQL by Hironobu Suzuki.