DEV Community

Cover image for 1.3.3 INSERT/UPDATE/DELETE Targetlist Normalization (Default Expansion)
JoongHyuk Shin
JoongHyuk Shin

Posted on

1.3.3 INSERT/UPDATE/DELETE Targetlist Normalization (Default Expansion)

The two rewriter tasks we have seen so far, view expansion and RLS policy injection, were unpacked under the assumption that the query is a SELECT. INSERT/UPDATE/DELETE go through one more preparatory stage on top of those. Even when the user only writes INSERT INTO accounts (owner) VALUES ('alice'), what PostgreSQL ends up looking at is a query where the id, balance, and created_at slots already contain expressions. That preparation does not exist for SELECT. The subject of this section is how that preparation works, and why it has to happen before user-defined rules get applied.

DML preparation has to happen before rule application

The rewriter is divided into two phases. The first phase applies user-defined rules; the second applies view expansion and RLS to each resulting Query (fireRIRrules, seen in 1.3.1 and 1.3.2). SELECT queries have essentially nothing to do in the first phase. Only INSERT/UPDATE/DELETE receive additional processing inside the first phase.

The first step of that additional processing is targetlist normalization. For INSERT, the rewriter fills in default expressions for columns the user did not name (when those columns have defaults), and replaces any DEFAULT keyword the user wrote with the actual expression. UPDATE goes through the same normalization function, but with less to do because there are no missing slots to fill. DELETE skips the stage entirely.

Why does this normalization have to come before rule application? The answer comes from looking at how the RULE system (1.3.1) refers to columns. The accounts table used throughout this section is defined as follows.

CREATE TABLE accounts (
    id          integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    owner       text NOT NULL,
    balance     numeric DEFAULT 0,
    created_at  timestamptz DEFAULT now(),
    deleted_at  timestamptz
);
Enter fullscreen mode Exit fullscreen mode

The last column, deleted_at, has no DEFAULT clause and is not an identity column. In other words, it is a column with no default. This shape is common in the soft-delete pattern, and the way this column is handled later becomes one interesting case in this section.

Suppose an audit rule is attached to this table.

CREATE RULE log_account_insert AS ON INSERT TO accounts
    DO ALSO INSERT INTO audit_log (account_id, owner_name, logged_at)
                          VALUES (NEW.id,    NEW.owner,  now());
Enter fullscreen mode Exit fullscreen mode

The rule says "whenever someone inserts a row into accounts, also insert a row into audit_log." Inside the rule body, NEW.id and NEW.owner are virtual expressions that mean "the id column, the owner column of the row the user is about to insert." When the rule author writes these references, they make one assumption: no matter what INSERT the user throws, that row will have an id slot and an owner slot, with values in them.

Now suppose the normalization step did not exist, and the rule received the user's original query as is. The user runs INSERT INTO accounts (owner) VALUES ('alice'). The original query's targetlist contains a single entry, owner = 'alice'. The id is missing. When the rule fires, it looks into the trigger query's targetlist to resolve NEW.id. There is no matching slot. At this point PostgreSQL's rule mapping (the precise mechanism is covered later in this section) replaces the unmatched slot with a NULL constant. So the rule itself does not error out. But what ends up in audit_log is (NULL, 'alice', ...). Meanwhile the user's base INSERT goes through the planner separately, picks up an actual integer (say 17) from the sequence into the id slot, and lands (17, 'alice', ...) in accounts. The audit table is supposed to track "what row got inserted," and yet it never receives the new row's id. That is not what the rule author had in mind.

The opposite case: normalization has finished, then the rule fires. The rewriter has already filled the three missing columns (id, balance, created_at) with expressions, so the trigger query the rule sees has all four columns sitting in their proper slots. NEW.id points to the sequence expression now occupying that slot, and NEW.owner points to 'alice'. The rule mapping has nothing to fall back to NULL for. The account_id slot in the audit_log INSERT receives the same sequence expression, so at execution time audit_log gets a value tied to the same new row id that accounts received. The tracking the rule author originally assumed finally works.

The same INSERT may have some columns explicitly given and others omitted. The rule has no way to tell. The rewriter has to run before the rule and fill in the missing columns in attno order, so that the user's query the rule sees is in the standard shape. The position of this preparation is determined by what the next stage expects as input.

This is qualitatively different from what the rewriter does for SELECT. View expansion and RLS are transformations that add conditions that were not there in the user's query; DML preparation is a transformation that fills in slots that were left empty in the user's query. Both touch the tree, but with different motivations.

INSERT auto-injects default expressions into missing columns

Suppose the user fires this query against the accounts table just defined.

INSERT INTO accounts (owner) VALUES ('alice');
Enter fullscreen mode Exit fullscreen mode

The targetList of the Query tree produced by the analyzer contains only the single entry owner = 'alice'. The id, balance, created_at, and deleted_at are missing. The rewriter fills the empty slots as follows.

  • id is defined as GENERATED ALWAYS AS IDENTITY, a SQL-standard identity column. PostgreSQL creates a dedicated sequence (a catalog object called accounts_id_seq) for such columns and, on every INSERT, draws the next integer from that sequence and fills the column with it. What the rewriter places into this slot is the expression node that retrieves the next value from the sequence (NextValueExpr).
  • balance has DEFAULT 0 written in the column definition. That definition is stored as an expression in the pg_attrdef catalog, and the rewriter pulls the expression out of there and places it into the slot.
  • created_at follows the same route, and the now() function call expression goes into its slot.
  • deleted_at has no default. The rewriter does not place any expression into this slot.

After three columns get filled, the targetList ends up shaped like (NextValueExpr node, 'alice', 0, now(), empty) across five positions. The user's query had only one column (owner), but the query PostgreSQL ends up looking at has four columns filled with expressions and one (deleted_at) left as an empty slot. The decision of what to fill in each column (or whether to leave it empty) belongs to a single responsibility function (build_column_default); when an expression is produced, it gets wrapped in a new TargetEntry and inserted into the targetList.

The decision follows three priority steps. First, if the column is an identity column, the node that retrieves the next value from a sequence. Second, if the column definition contains DEFAULT <expression>, that expression. Third, if neither applies but the column's data type itself has a default, that type default. In the example above, id took the first path, while balance and created_at took the second. deleted_at falls through all three.

Columns without a default: how the empty slot resolves to NULL

When all three priority steps fail, build_column_default returns NULL, and the rewriter does not create a TargetEntry for that column in the INSERT targetList. The attno slot itself exists, since the table has five columns by definition, but one of those slots (here, deleted_at) is simply left empty rather than carrying a TargetEntry. This might look like it breaks the promise stated earlier ("the query the rule sees is in the standard shape with every column sitting in its slot"). If the slot is empty, surely the rule pointing to NEW.deleted_at should get the wrong value or trigger an error.

That is not how it works. PostgreSQL has been designed so that every stage downstream of the empty slot interprets it as NULL in a consistent way. A concrete scenario makes this clear. Suppose another rule is also attached to accounts.

CREATE RULE forward_deleted_at AS ON INSERT TO accounts
    DO ALSO INSERT INTO sync_queue (account_id, deleted_at)
                         VALUES (NEW.id, NEW.deleted_at);
Enter fullscreen mode Exit fullscreen mode

This rule references NEW.deleted_at. But the normalized targetList of the user's INSERT INTO accounts (owner) VALUES ('alice') does not contain a TargetEntry for deleted_at (no default, so no expression was placed there). When the rule mapping step looks into the trigger query's targetList for the deleted_at slot, it does not find one.

At this point PostgreSQL's rule mapping substitutes a NULL constant in place of the missing TargetEntry. So the deleted_at slot of the sync_queue INSERT receives NULL. The rule's promise is upheld in a normal way. Pointing to NEW.deleted_at simply produces NULL; there is no rule breakage, no neighboring column getting accidentally dragged into the slot.

The base table side works the same way. When the planner produces the plan for the INSERT and sees no TargetEntry for deleted_at in the normalized targetList, it fills that slot with a NULL constant to complete the row. The row that lands in accounts ends up as (1, 'alice', 0, now(), NULL).

At both of those sites, the sync_queue INSERT produced by rule mapping and the base accounts row produced by the planner, the NULL is filled in the same way. There is no inconsistency where one site produces NULL and the other produces some other value. That is why the rewriter does not bother building an explicit NULL expression for the targetList. Inserting a NULL TargetEntry and leaving the slot empty have the same end result, so PostgreSQL chose to leave the slot empty and keep the tree lighter.

The DEFAULT keyword arrives as a SetToDefault placeholder and gets resolved

The previous section covered automatic filling when the user omits a column. The user can also write the DEFAULT keyword explicitly to say "fill this slot with the default expression."

INSERT INTO accounts VALUES (DEFAULT, 'bob', DEFAULT, DEFAULT);
Enter fullscreen mode Exit fullscreen mode

When this SQL passes through the analyzer, each DEFAULT written by the user becomes a SetToDefault placeholder node in the Query tree. SetToDefault is just a temporary dummy placed where an expression node belongs; it is not an executable expression. The node definition comment in PostgreSQL says this directly.

/*
 * Placeholder node for a DEFAULT marker in an INSERT or UPDATE command.
 *
 * This is not an executable expression: it must be replaced by the actual
 * column default expression during rewriting.
 */
Enter fullscreen mode Exit fullscreen mode

The expression that replaces a SetToDefault is built by the same responsibility function shown earlier. The same three-step priority applies, so each DEFAULT slot receives the corresponding identity / pg_attrdef / type-default expression. The core mechanism is the same up to this point. What differs is where the placeholder lives in the tree, and that depends on the form of the INSERT.

Single-row INSERT is substituted directly in the targetList

For the single-row INSERT case above, since there is only one row, the four values (DEFAULT, 'bob', DEFAULT, DEFAULT) go straight into the Query's targetList. The three SetToDefault nodes sit at positions 1, 3, and 4 of the targetList. The rewriter walks the targetList column by column, and whenever it encounters a SetToDefault, it substitutes the expression produced by the responsibility function in place. In terms of the Query tree structure from 1.3.1, one row in the targetList represents the expression for one column, and that expression flips from SetToDefault to the actual default expression.

Multi-row INSERT runs through a VALUES RTE in between

For multi-row INSERT, which inserts several rows in one statement, the Query tree shape is slightly different.

INSERT INTO accounts VALUES (DEFAULT, 'bob',   DEFAULT, DEFAULT),
                            (DEFAULT, 'carol', 500,     DEFAULT),
                            (DEFAULT, 'dave',  DEFAULT, '2026-01-01');
Enter fullscreen mode Exit fullscreen mode

With three rows, the targetList cannot hold three rows at once (targetList is shaped as one expression per column). Instead, PostgreSQL bundles the rows into a separate node and attaches it to the Query tree's from-list. That node is called a VALUES RTE. RTE is short for RangeTblEntry (introduced in 1.2.3), a single data-source unit referenced from the from-list (a table, a sub-query, a function, and here, the VALUES bundle). A VALUES RTE holds a list of rows internally, and each row in turn holds a list of values. The seven SetToDefault placeholders from the INSERT above are scattered across positions 1, 3, and 4 of the three rows inside the VALUES RTE.

Why does the tree shape differ for what looks like the same INSERT? A single-row INSERT is the case where producing one row is the entire result of the Query, while a multi-row INSERT is closer to "produce a set of rows and pour them into the table." PostgreSQL handles the latter the same way it handles a regular sub-query. A VALUES RTE sits in the same kind of slot a SELECT's result set would, an "inline data source." That is how INSERT INTO accounts SELECT ... and INSERT INTO accounts VALUES (...), (...) end up sharing the same rewriter and planner code paths.

The multi-row path walks each value of each row inside the VALUES RTE, and whenever it sees a SetToDefault, it substitutes the expression produced by the same responsibility function. The expression produced is identical to the single-row case; only the location it lands in differs.

There is one small optimization. A VALUES RTE may carry 100 or 1,000 rows. If none of them contain a SetToDefault, the rewriter skips the expensive list rebuild entirely. Rather than reconstructing all those rows, it passes the original list through to the next stage. The reason PostgreSQL has this fast path is the practical observation that multi-row INSERT is typically used for bulk loads, where the DEFAULT keyword almost never appears.

UPDATE only tidies what the user wrote; DELETE leaves the targetlist alone

UPDATE goes through the same normalization function as INSERT, but it does much less work than INSERT does. There is no need to look for missing columns to fill. The SET clause of an UPDATE only puts the columns the user explicitly listed into the targetList, and the columns the user did not list must keep the existing row's values. So there is nothing to fill in for empty slots. The two things the rewriter does for UPDATE are: first, resolve any DEFAULT keyword the user wrote into the actual expression (the same SetToDefault substitution mechanism as in INSERT); second, combine separate-line updates of parts of the same column into a single expression. The second item needs a brief explanation.

A column in PostgreSQL can be more than a simple scalar (int, text, etc.); it can be a composite type or an array. You can write a SET clause that updates only part of such a column. For example, if address is a composite-type column shaped like (street, city, zip), you can update two different fields of the same column in one UPDATE.

UPDATE accounts
   SET address.street = '101 Pine St',
       address.city   = 'Seattle'
   WHERE id = 1;
Enter fullscreen mode Exit fullscreen mode

Or you can update different subscripts of an array column tags:

UPDATE accounts
   SET tags[2] = 'premium',
       tags[4] = 'verified'
   WHERE id = 1;
Enter fullscreen mode Exit fullscreen mode

The two SET items look like the same column name (address, tags) appearing twice in the SQL text, but they really refer to different parts of the column. PostgreSQL combines the two lines into a single expression that operates on the column as a whole. address.street = '101 Pine St' followed by address.city = 'Seattle' becomes "take the original address, change street first, then take the result and change city" as one expression. As a result, even though the column name appears twice in the SET clause, at execution time the column is updated once.

When the entire column is assigned twice (UPDATE t SET address = X, address = Y, replacing the whole column twice), that is no longer a partial update but a genuine conflict, and the rewriter throws a syntax error. The combining only happens when the two SET items point to different parts of the same column.

DELETE is simpler still. In the DML branch, DELETE has a one-line case body that reads "Nothing to do here." DELETE produces no column values. It only decides which rows to remove, so there is nothing to do to the targetList itself. None of the things we have seen so far (INSERT's default filling, the DEFAULT keyword substitution for INSERT/UPDATE, UPDATE's partial-update combining) apply to DELETE. The amount of work the rewriter does is set by how much responsibility the user's query has for producing row data.

That covers the DML's targetlist normalization. We have walked through the filling-in-empty-slots half of DML rewriting. The other half, redirecting references to point at different places, is covered in 1.3.4 through RETURNING mapping and DML-on-view transformation.

What this means in practice

First, when an INSERT feels slow, default expressions are the first thing to suspect. Even if the user does not name a column, the default expression for it still runs on every INSERT. If an expression in pg_attrdef calls a function (now(), uuid_generate_v4()) or a sub-query, that function or sub-query runs once per missing column on every INSERT. Sequence consumption, statistics counters, side-effect function calls, all of these happen exactly as if the user had provided the value explicitly. When INSERT performance looks off, the fastest first check is to inspect what each default expression evaluates to via \d+ <table>. Avoiding heavy sub-queries in default expressions follows the same reasoning.

Second, RULE definitions can safely reference NEW.<column> even for columns without a default. With a rule like the forward_deleted_at example from the body text, there is no need to separately check whether the user named deleted_at. If they did, the value flows through as written; if they did not, NULL flows through. Missing data is uniformly carried as NULL, so the rule author does not have to branch on every possible combination of "which columns did the user actually write." This safety guarantee shows up often in INSERT triggers and audit rules.

Top comments (0)