DEV Community

Sadeed Ahmad
Sadeed Ahmad

Posted on

The Internals of PostgreSQL: Query Processing - Part 1

Query processing is a critical aspect of any database management system, and PostgreSQL excels in this domain with its efficient and versatile query processing capabilities. In this article, we delve into some valuable insights to gain a comprehensive understanding of PostgreSQL's query processing mechanisms.

Backend Subsystems

We cover the first two systems in this part 1.

Parser
The parser generates a parse tree from an SQL statement in plain text.

Analyzer
The analyzer carries out a semantic analysis of a parse tree and generates a query tree.

Image description

Parser

The parser in PostgreSQL is responsible for transforming an SQL statement written in plain text into a parse tree. This parse tree serves as a structured representation of the SQL statement, allowing subsequent subsystems to interpret and analyze it. The root node of the parse tree is the SelectStmt structure, defined in parsenodes.h.

Image description
The numbering of elements in the SELECT query and the corresponding parse tree elements are identical. For instance, the first item in the target list, which is the column 'id' of the table, is numbered (1), while the WHERE clause is numbered (4), and so on.
The parser is responsible for checking the syntax of the input query when generating a parse tree. Therefore, it only reports an error if there is a syntax error in the query.
However, the parser does not verify the semantics of the input query. For example, if the query contains a table name that does not exist, the parser does not report an error. The analyzer is responsible for performing semantic checks.

Analyzer

After the parser generates a parse tree, the analyzer performs a semantic analysis and produces a query tree.
The Query structure, which is defined in parsenodes.h, serves as the root of the query tree. This structure contains metadata about the query.

Image description

Here is a brief description of the query tree:
The targetlist is a list of columns that represent the output of the query. In this example, the targetlist consists of two columns: 'id' and 'data'. If the input query tree uses the '*' (asterisk) symbol, the analyzer/analyser will replace it with all of the columns explicitly.
The range table is a list of relations that are used in the query.
The join tree stores the FROM and WHERE clauses.
The sort clause is a list of SortGroupClause.

Top comments (0)