Raja Rakshak

Posted on Jun 23, 2023 • Edited on Jun 29, 2023

Internals of PostgreSQL : Chapter 3: Query Processing

#apacheage #postgres #postgressql #database

The query processing subsystem of PostgreSQL is described in this chapter. It gives a general overview of the subsystems, such as the parser, analyzer, rewriter, planner, and executor, that are involved in query processing.

A summary of each subsystem is given below:

Parser: The parser creates a parse tree from a plain text SQL statement. If there are any syntax mistakes, it returns an error and checks the input's syntax. It does not, however, run any semantic analyses.

Analyzer: The analyzer creates a query tree by performing a semantic analysis on the parse tree the parser produced. It assigns metadata to the query tree, such as the specific clauses and the type of command (SELECT, INSERT, etc.).

Planner: The planner creates an optimized plan tree using the query tree that the rewriter provided. The planner determines the most practical plan by using a cost-based optimization approach to assess the costs of various processes. The planner in PostgreSQL only supports cost-based optimization and does not support hints or rule-based optimization.

Executor: Using the plan tree the planner developed as a guide, the executor runs the query by gaining access to the tables and indexes. When reading and writing data, it does so in the sequence listed in the plan tree. The concurrency management technique is used by the executor to preserve transaction consistency and isolation while utilizing memory spaces and temporary files as necessary.

The idea of costs in PostgreSQL's query optimization is also explained in this chapter. Costs are dimensionless quantities that are used to compare the adequacy of operation performance. Start-up costs, ongoing costs, and total costs are the three different sorts of expenses. The expenses associated with fetching each tuple are broken down into three categories: start-up costs, run costs, and total costs. The plan tree and related expenses for a query can be viewed using the EXPLAIN command.

Start-up cost, run cost, and total cost are the three different categories of costs in PostgreSQL. The cost incurred prior to fetching the first tuple is known as the start-up cost, whereas the cost incurred to fetch all tuples is known as the run cost. The sum of the start-up and operating costs is the total cost.

Let's look more closely at how the costs for sequential scans, index scans, and sort operations are estimated.

Sequential Scan: The cost_seqscan() function estimates the price of a sequential scan. The sum of the CPU run cost and the disc run cost is used to determine the run cost of a sequential scan.

Run cost = (CPU tuple cost + CPU operator cost) * Ntuple + seq_page_cost * Npage,

where seq_page_cost, CPU tuple cost, and CPU operator cost are parameters set in the postgresql.conf file. Ntuple and Npage represent the number of tuples and pages in the table, respectively.

Index Scan: The cost_index() function can be used to calculate the price of an index scan. The start-up cost plus the run cost make up the index scan cost.

2.1 Beginning Cost:
The price to read the index pages in order to access the first tuple in the target table is the start-up cost of an index scan. The formula is used to compute it:

Start-up costs are calculated as follows: 50 * cpu_operator_cost, ceiling(log2(Nindex_tuple)) + (Hindex + 1),

where Hindex is the height of the index tree and Nindex_tuple is the number of tuples in the index.

2.2 Run Cost: The CPU and IO expenses for the index and the table are included in the run cost of an index scan. The table CPU cost and table IO cost are not assessed if Index-Only Scans can be used.

The following formulas are used to compute the index CPU cost, table CPU cost, index IO cost, and table IO cost:

Index IO cost = ceil(selectivity * Nindex_page) * random_page_cost, Index CPU cost = selectivity * Nindex_tuple * (cpu_index_tuple_cost + qual_op_cost), Table CPU cost = selectivity * Ntuple * cpu_tuple,

Nindex_page is the number of index pages, cpu_index_tuple_cost and random_page_cost are parameters provided in the postgresql.conf file, and qual_op_cost is the evaluating cost of the index. Selectivity is the percentage of the search range of the index by the specified WHERE clause.
The total run cost is the sum of the index CPU cost, table CPU cost, index IO cost, and table IO cost.

Sort Operation: The cost_sort() function can be used to calculate the price of a sorting operation. The start-up cost and the running cost are both included in the sort cost. The start-up cost relates to the expense of getting the data ready for sorting, whereas the run cost relates to the expense of doing the actual sorting.
The cost_sort() function considers factors such the quantity of tuples to be sorted, their average width, and the amount of memory required for sorting.

These are the fundamental ideas guiding cost estimate in PostgreSQL single-table queries. Depending on the precise configuration options, the qualities of the database, and the query, the real costs could change.

Struggling with slow API calls?

Dan Mindru walks through how he used Sentry's new Trace View feature to shave off 22.3 seconds from an API call.

Get a practical walkthrough of how to identify bottlenecks, split tasks into multiple parallel tasks, identify slow AI model calls, and more.

Top comments (0)

Try REST API Generation for Snowflake

DevOps for Private APIs. Automate the building, securing, and documenting of internal/private REST APIs with built-in enterprise security on bare-metal, VMs, or containers.

Auto-generated live APIs mapped from Snowflake database schema
Interactive Swagger API documentation
Scripting engine to customize your API
Built-in role-based access control

Learn more

DEV Community

Internals of PostgreSQL : Chapter 3: Query Processing

Struggling with slow API calls?

Top comments (0)

Try REST API Generation for Snowflake

Read next

Unlocking Efficiency: The Best Developer Tools for 2025

How to Draw Shapes on a Tkinter Canvas in Python in 2025?

Mastering the Money Matters of Open Source: Navigating the Financial Landscape

Open Source NFT Platforms: Empowering Indie Creators

Okay