JoongHyuk Shin

Posted on May 14 • Edited on Jun 21

1.3.1 Rule System and View Expansion

#postgres #database #internals #query

This section is about how PostgreSQL makes a view reference disappear. When a user writes SELECT * FROM my_view, the planner no longer sees my_view in that query. As soon as PostgreSQL receives the query, it moves the SELECT definition that the view name points to into that spot. Where and how that substitution happens, and how the fact that the mechanism rides on PostgreSQL's old RULE system surfaces in application code, is the subject of this section.

View and materialized view: what is the difference

PostgreSQL has two kinds of views: the plain view and the materialized view.

A plain view holds no data. A view definition is a SELECT line (or a larger query), and every time you SELECT from the view, you get back the result of running that definition SELECT freshly. The view itself has no storage. Only the view definition is registered in the catalog, and each time a query comes in, that definition gets expanded to read data from the base tables. The subject of this section is the expansion of plain views.

A materialized view stores its data. It runs the view definition's SELECT once, keeps the result on disk, and later SELECTs read that stored result directly. When the base table data changes, the materialized view's result is not refreshed automatically; the user has to explicitly run REFRESH MATERIALIZED VIEW to recompute it. So the rewriter leaves materialized views alone, without expanding them.

Aspect	Plain view	Materialized view
Data storage	None	Yes (on disk)
Behavior on SELECT	Runs definition SELECT each time	Reads stored result
When updated	Reflects base table changes immediately	Updated only on `REFRESH`
Rewriter handling	Expanded into definition SELECT	Left as is

With this picture in mind, it becomes clear that this section deals with how PostgreSQL moves the definition SELECT into place for plain views.

PostgreSQL's RULE system

PostgreSQL has a mechanism called the RULE system. It is the system responsible for rewriting the query itself, right after the query arrives at the backend and before it is actually executed. As noted in the 1.3 chapter introduction, the rewriter is the stage that changes the form of a query while preserving its meaning, and the RULE system is the tool that decides which rules to apply for that form change. (Transformations that change the meaning itself, such as subquery unnesting or predicate push-down, are the planner's responsibility in PostgreSQL and are covered in chapter 1.4.)

A rule is a bundle of four elements.

Element	Meaning	Example
Target relation	Which table/view it applies to	the `users` table
Event type	Which command triggers it	one of INSERT, UPDATE, DELETE, SELECT
Condition	Under what condition it fires (optional)	`WHERE NEW.role = 'admin'`
Action	What it does when fired	replace the original command with another query, or run an additional query

This bundle is stored as one row in the pg_rewrite catalog. A user can create a rule directly with the CREATE RULE command, and PostgreSQL also registers one rule automatically when a view is created.

Comparing it to the trigger, which does a similar job, makes the position of the RULE clear. A trigger acts at the executor stage, hooking into each individual row. This row was inserted, so add a row to the audit table, that kind of thing. A rule rewrites the query as a whole at the rewriter stage. Query A came in, so rewrite it as query B and run that, that kind of thing. If a trigger is a "per-row side effect," a rule is a "per-query transformation."

Here is one example of a rule a user might define directly. Suppose you want to record into an audit table every time an INSERT happens on the users table.

CREATE RULE log_user_insert AS
    ON INSERT TO users
    DO ALSO INSERT INTO user_audit_log (user_id, action, ts)
            VALUES (NEW.id, 'INSERT', now());

DO ALSO means "run the original INSERT as is, and additionally run this audit INSERT too." With this rule in place, when you fire INSERT INTO users (id, name) VALUES (1, 'alice'), the result of passing through the rewriter is two Query trees. One is the original INSERT into users, the other is the INSERT into user_audit_log. Both queries go through the planner and executor and get executed. This is what it looks like for one SQL text to fan out into multiple statements as it passes through the rewriter. (DO INSTEAD replaces the original query; DO ALSO, or the default, adds to the original query.)

A user-defined rule is one use case of the RULE system. But the rule you encounter most often in ordinary applications is not one a user made directly. It is the SELECT rule that PostgreSQL registers automatically when you create a view.

A view is a special case of the RULE system

Creating a view in PostgreSQL means registering two things at once.

One is the registration of an empty table. When you run CREATE VIEW my_view AS SELECT ..., PostgreSQL adds a row to pg_class. It gets an OID like an ordinary table, and column definitions go into pg_attribute too. The only difference is that the relkind column is marked 'v' (view). Looking at just this row, a view is an empty table. No file is even created to store data.

The other is the registration of a SELECT substitution promise. PostgreSQL adds a row to the pg_rewrite catalog, and that row is the promise: "when this view is queried with SELECT, instead of returning an empty result, run the SELECT written in the view definition and return that result." The formal name of this promise is the ON SELECT DO INSTEAD SELECT rule. Unpacked word by word:

ON SELECT: which event it fires on. When a query that reads this relation with SELECT comes in.
DO INSTEAD: how the original action is handled. Instead of the original action (querying this empty view as is).
SELECT ...: what gets run. The SELECT written in the view definition.

Putting the three parts together gives the promise: "when this view is queried with SELECT, do not return 0 rows from the empty table; run the view definition's SELECT and return that result." PostgreSQL does not keep a view as a separate data structure; it expresses the concept of a view with one empty table and this one-line promise. Mapping it to the general rule definition from the previous section: the view's automatic rule has the view itself as target, SELECT as event, no condition, and the execution of the definition SELECT as action.

This automatic SELECT rule for views has two constraints. There is exactly one ON SELECT rule per view, and that rule fires unconditionally. A single line of PostgreSQL code captures this fact directly.

"RIR" stands for "Retrieve-Instead-Retrieve", that is an ON SELECT DO INSTEAD SELECT rule (which has to be unconditional and where only one rule can exist on each relation).

The abbreviation RIR shows up all over the PostgreSQL code, and its identity is in this one line. "retrieve" is the SELECT keyword from the POSTQUEL era, and its trace remains in the abbreviation. Two historical names you will often encounter in PostgreSQL material are worth a brief note.

POSTGRES (all caps): the name of the database project started at UC Berkeley in 1986. Led by Michael Stonebraker as the follow-on research project, it meant the next generation after Ingres.
POSTQUEL: the name of the query language that the POSTGRES project used. It was designed as the successor to QUEL (Ingres's query language). It is not SQL.

When SQL was added in 1994 (the name at that point was Postgres95), POSTQUEL was replaced by SQL, and in 1996 the project renamed itself to PostgreSQL, but traces of that era remain in abbreviations like RIR and in some function names.

The rewriter works in two steps

The rewriter has a single entry point function, but inside it two steps run in sequence.

Step 1: applying the INSERT/UPDATE/DELETE rules that the user defined with CREATE RULE. The audit rule we saw earlier fires at this step. One input query can fan out into zero or several queries.
Step 2: applying RIR rules. That is, firing the view's automatic ON SELECT rule to expand a view reference into its definition SELECT. It is applied to each of the queries produced by Step 1.

What the separation of the two steps means is clear. View expansion always happens last. Whether a user-defined rule fans one query into several or changes it into something else, if a view reference remains in the result, it all gets resolved in Step 2. By the time it reaches the planner, no view reference remains anywhere.

Step 2 does one thing. It walks the query's range table, and when it meets a view reference, it expands the view's definition SELECT into that spot.

The expansion mechanism resolves cleanly thanks to the unification of the RTE data structure we saw in chapter 1.2. In the Query that the analyzer produced, the view reference spot holds an RTE of kind RTE_RELATION (that is, an RTE pointing at an ordinary table or view). After passing through rewriter Step 2, that spot becomes kind RTE_SUBQUERY (an RTE that holds a sub-Query inside it), and the view definition's SELECT sits inside it. The item at slot N of the range table just changes kind and stays in the same spot.

Suppose a view v_active_users is defined as SELECT id, name FROM users WHERE deleted_at IS NULL. Here is what the one line SELECT * FROM v_active_users looks like before and after the rewriter.

Slot 1 of the range table stays in place. Only the kind changed from RTE_RELATION to RTE_SUBQUERY, and the view definition sits inside it whole. Thanks to this unification, the planner, the stage after the rewriter, does not know the concept of a view at all. The Query the planner receives is just an ordinary SELECT with a sub-query expanded into a slot. This is why, when you look at EXPLAIN output, the view name disappears and base table names show up directly.

A view definition can contain another view inside it. In that case the expanded sub-Query is recursively passed to the same function to resolve the inner view too. The problem is when a view references itself (either directly or through another view). Left as is, that becomes an infinite recursion of expanding and expanding again. To prevent this, PostgreSQL carries a list of the OIDs of views currently being processed, and when it meets a view already in that list, it throws the error infinite recursion detected in rules for relation "..." and aborts query execution. It chose to cut things off with a clear error rather than spin forever.

Step 2 exceptions: materialized views and EXCLUDED

The Step 2 view expansion logic has two explicit exceptions.

First, materialized views are not expanded. As we saw earlier, a materialized view reads its stored result directly, so there is no need to expand its definition on every query. Step 2 passes an RTE with relkind 'm' straight through.

Second, the virtual name EXCLUDED in INSERT ... ON CONFLICT is not expanded. To understand EXCLUDED, we need to briefly touch PostgreSQL's UPSERT syntax.

PostgreSQL has the INSERT ... ON CONFLICT (...) DO UPDATE SET ... syntax. Commonly called UPSERT, it means "attempt the INSERT, but if it hits a unique constraint or similar and a conflict occurs, run an UPDATE in its place instead." For example, in a table that records a user's login count, you can handle "if it is a new user, add it; if it already exists, increment the count" in one query.

INSERT INTO users (id, name, login_count)
VALUES (1, 'alice', 1)
ON CONFLICT (id) DO UPDATE SET
    name = EXCLUDED.name,                   -- the incoming value 'alice'
    login_count = users.login_count + 1;    -- the existing row's value + 1

If a row with id = 1 already exists, the INSERT conflicts, and the UPDATE in the ON CONFLICT clause runs against that row. Here the UPDATE must be able to reference both rows: the row already in the DB (referenced as users.column) and the row that tried to come in via INSERT but was blocked (referenced as EXCLUDED.column). The name EXCLUDED carries the meaning "the values excluded because of the conflict." In the example above, EXCLUDED.name is 'alice', and users.login_count points at the existing count stored in the DB.

EXCLUDED is neither a real table nor a view. It is a virtual row name that exists only inside the UPSERT syntax. Internally PostgreSQL does represent EXCLUDED as an RTE_RELATION RTE, but the view expansion logic must not wrongly fire and try to expand this RTE (there is no definition to expand). So when the rewriter meets this RTE, it skips it with a separate branch.

The RULE system has many pitfalls and is not recommended

The fact that view expansion rides on the RULE system is a trace of PostgreSQL's history. The RULE system existed from the POSTGRES era, and views were implemented as one branch of that system. The idea was that if a user directly defined INSERT/UPDATE/DELETE rules on a view with CREATE RULE, that view could be used as an update target.

The RULE system itself is powerful but full of pitfalls. The same rule getting evaluated multiple times, cascading happening differently than intended, permission checks behaving subtly: problems come up often. In PostgreSQL 9.3, the automatically updatable view feature was introduced to sidestep those pitfalls. When a user fires INSERT/UPDATE/DELETE against a simple view (one that, say, just picks some columns from a single base table), PostgreSQL analyzes the view definition and automatically converts it into INSERT/UPDATE/DELETE against the base table, with no rule definition needed. For complex views that cannot meet the automatically-updatable conditions, solving it with an INSTEAD OF trigger is the standard approach today. PostgreSQL's official documentation recommends triggers over rules for INSERT/UPDATE/DELETE.

In many cases, tasks that could be performed by rules on
INSERT/UPDATE/DELETE are better done with triggers. Triggers are
notationally a bit more complicated, but their semantics are much
simpler to understand. Rules tend to have surprising results when
the original query contains volatile functions: volatile functions
may get executed more times than expected in the process of carrying
out the rules.

The ON SELECT rule generated automatically when a view is created is something users rarely deal with directly. PostgreSQL installs it, and the rewriter expands it. The case where a user faces the RULE system is usually when they want to make a complex view updatable that automatically updatable views cannot handle, and even then an INSTEAD OF trigger is safer and less work.

In other RDBMS engines, view expansion is usually handled as a separate stage. PostgreSQL's decision was to fold view expansion into one branch of general rule application. The result of that unification is that data structures and code paths gather in one place, but the cost is that the RULE system, an old abstraction, is exposed to users as is.

What this means in practice

When you change a view definition with ALTER, every query that calls that view gets expanded with the new definition at its next analysis. A view definition is stored as a SELECT rule in the catalog, and view expansion has the rewriter re-read that catalog entry at execution time and expand it. When the catalog changes, an invalidation message propagates to the plan cache, and on the next execution the result is planned with a fresh expansion using the new definition (see 1.1.2 plan cache). From the application side, this means you can use a view as a code abstraction. Just changing the view definition consistently updates the behavior of every query that uses that view from the next call onward.