1.2.3 Query Tree Node Types: Query, RangeTblEntry, TargetEntry

#postgres #database #internals #query

After the Analyze stage from 1.2.2, the raw parse tree has become a Query tree. This section is about the shape of that result. How it differs from a raw parse tree, what abstraction sits at its core, and what assumptions the next stages (rewriter and planner) get to start from when they receive a Query tree.

Tokens descend into catalog coordinates

A raw parse tree and a Query tree are both trees, but they sit on different planes. A raw parse tree mirrors the syntactic structure of SQL text, so its nodes look a lot like tokens. Where a table name appeared, you find a node carrying that name as a token. Where a column appeared, you find another name token. The types of expressions are not yet determined.

A Query tree is what those tokens look like after they have been resolved against the catalog. The same positions hold different representations now. Where a table name was, there is now an RTE carrying that table's OID and lock mode. Where a column name was, there is now an index pair: "the M-th attribute of the N-th item in rtable". Where an operator token sat, there is now an OID pointing to the function that implements that operator. The same shape of tree carries the same kinds of information at the same positions, but the information has dropped one level down from surface SQL text into catalog coordinates.

For SELECT id, name FROM users WHERE age > 18, the contrast looks like this:

The descent into catalog coordinates is right there in the picture. Table names become OIDs, column names become pairs of (rtable index, attribute number), operator tokens become typed expression nodes. The rewriter and planner work on those coordinates and have nothing left to resolve.

Range table: every FROM input in one place

The Query carries a single List that holds every item appearing in the FROM clause. PG calls this list the range table, and an item in it a RangeTblEntry. A plain table, a subquery (FROM (SELECT ...)), a result of an explicit JOIN, a set-returning function call (FROM generate_series(...)), an inline row set built with VALUES (...), a CTE (WITH ... AS (...)). Whatever form a FROM input takes, it gets folded into a single entry in this list.

The kind tag lives in one field, rtekind, on the RTE. There is no separate node type per kind of input; the same RangeTblEntry object carries different rtekind values to distinguish them. The object type is one, and the value of rtekind decides which auxiliary fields are meaningful. For a plain table the OID and lock mode fields matter; for a subquery the sub-Query field matters; and so on.

You can see the payoff of this consolidation in the shape of the rewriter and planner code. Both stages, having received a Query, work to a single model: "take the N items in rtable, glue them together via jointree, and evaluate targetList against the result." A subquery, a JOIN, a function call all flow through the same model. Adding a new kind of FROM input is a matter of adding one line to RTEKind and filling in a few kind-specific branches.

The range table plays another role. Column references look at it. In a Query tree, a column is not a name; it is two integers, "attribute M of rtable item N". A column reference's meaning is bound to the order of items in the range table. That is why a rewriter or planner that adds, removes, or reorders rtable items must update the indices in existing column references at the same time.

Target list: SELECT, INSERT, UPDATE, RETURNING all in one shape

The other key List on a Query is the target list. The output columns of a SELECT, the values to assign in INSERT/UPDATE, and the columns returned by RETURNING all live in lists of the same shape with the same kind of node. One target list per Query for the SELECT/INSERT/UPDATE body, plus a second one for RETURNING when present.

The shape is one, but its meaning shifts slightly with context. Each entry has a field called resno (short for result number, indicating where the entry lands in the result), and the interpretation of this number depends on whether you are looking at SELECT, INSERT, or UPDATE. Take this table and see how SELECT and INSERT differ.

CREATE TABLE users (
    id    serial PRIMARY KEY,  -- attno 1
    name  text,                 -- attno 2
    age   int                   -- attno 3
);

For SELECT, resno is the ordinal position of the output column.

SELECT name, age FROM users;

This SELECT's target list has two entries. The first, holding the name expression, has resno 1; the second, holding age, has resno 2. They land in the first and second positions of the result set, respectively. Even though name has attno 2 and age has attno 3 in the table, the result positions in SELECT are independent of those attnos: they go 1, 2, in the order they appear in the SELECT list.

For INSERT, resno is the attribute number of the destination column.

INSERT INTO users (name, age) VALUES ('alice', 20);

This INSERT's target list also has two entries, but the first's resno is not 1, the second's not 2. They are 2 and 3, the attnos of name and age in the table definition. The number that lands in resno is not the order in which the columns were written, but the position of the column in the table itself. UPDATE's SET clause follows the same rule: attno in resno.

The fact that resno carries different meanings in different contexts feels odd at first, but folding two meanings into one List buys clear leverage. Expression evaluation code, target list traversal, RETURNING handling all assume one node type. The branching shrinks. The context-dependent meaning is settled when the List is built by the transformation stage, and downstream stages just read what is already there in resno.

The target list also carries items invisible to the user. When an expression appears in ORDER BY or GROUP BY but not in the SELECT list, that expression is added to the target list with a "junk" mark. Sorting and grouping can only happen by evaluating the value of that expression, so a place is needed to hold and evaluate it. Junk columns get evaluated but stripped from the final output. The visible result columns are clean from the application side, while internally a temporary column needed for sorting tags along.

Where the Query goes

Right after a Query tree is built, the rewriter takes it. RULE application, view expansion, and row-level security (RLS) policy injection all happen directly on the Query tree at this stage. The output is still a Query tree, and it can grow from zero to several Queries (when a rule expands one statement into several). Details are covered in chapter 1.3.

The rewriter hands the finished Query off to the planner. The planner's job is to convert a Query tree into a PlannedStmt. The strategy for how this SQL will be executed gets baked into the PlannedStmt: which indexes to use, what JOIN order to take, whether to go parallel. Chapter 1.4 covers this in depth.

What is interesting is that the next stage, the executor, does not look at the Query tree. The Query definition in parsenodes.h says it in one line: "the Query structure is not used by the executor." The executor's input is a PlannedStmt, and the Query tree, once turned into a plan, does not flow further down. The Query tree is the common currency of parser/analyzer/rewriter/planner, and that is where its lifetime ends.

This separation of lifetimes is reflected directly in how prepared statements work. A prepared statement is a mechanism that holds the analyzed result or plan in memory so that running the same SQL repeatedly does not have to start from the parser every time. There are two layers of holding. One layer holds the Query tree (already resolved against the catalog) in memory; the layer above holds the PlannedStmt that has gone through the planner. What is commonly called the plan cache refers to these two layers together. A held Query tree lets the next execution start from the planner; a held PlannedStmt is handed straight to the executor. If the catalog changes in between, an invalidation message arrives, the held results are discarded, and analysis or planning runs again. Because the per-stage outputs of parser, analyzer, rewriter, planner, and executor are cleanly separated, deciding which layer to invalidate and where to restart is straightforward.

What this means in practice

First, EXPLAIN works in the same shape for SELECT, INSERT, UPDATE, DELETE, and MERGE. Five kinds of statement live in the same Query data structure, and the planner produces a single PlannedStmt regardless of kind. From the application side, you don't have to wonder whether you need a different tool to inspect an INSERT plan than a SELECT plan. From the tooling side, monitoring tools can deal with plans through one interface without branching on statement kind.

Second, the same SQL can produce different Query trees depending on when it was analyzed. Because catalog coordinates get baked into the Query tree, the same SQL text produces a different tree if the catalog state seen at analysis time differs. A prepared statement that produces different plans across two connections, a plan cache that re-analyzes after a catalog invalidation, the same code pointing at different tables under different search_path settings: all of these emerge on top of this fact. From the application side, this is a single-line summary of where the intuition "same text, same behavior" breaks.