JoongHyuk Shin

Posted on May 4 • Edited on May 22

1.1.1 Life of a Query

#postgres #database #internals #protocol

This section is the map for the rest of the book. The five stages introduced in the 1.1 chapter overview (message dispatch, parser + analyzer, rewriter, planner, executor) are traced here through the actual code: which functions implement each stage, and in what order they get called. The mechanics of each of the five stages are unpacked in later chapters. Here, only the skeleton matters: how a backend starts up, how it receives messages, and where the first fork in the road appears.

One term to settle first. In this book, a statement refers to a single unit of SQL command that the client sends to the backend. One SELECT * FROM users, one INSERT INTO users ..., each is one statement. A single request may carry several statements joined by ;, and in that case each is handled as a separate statement. Every section from here on uses "statement" in this sense.

One backend process owns one query

Every time a client connects, PostgreSQL forks a backend process for it (the parent is postmaster). That process stays alive until the client disconnects, and it handles every query that client sends, by itself. Unlike the thread-pool model common in other RDBMSs, PG uses one OS process per connection. The reasons behind that decision are taken up in 6.1.1.

The actual entry point of that backend is a function called PostgresMain. The name is grand; what it does is unexpectedly simple. Two things, then off it goes.

First, it installs signal handlers. Signals are asynchronous notifications the OS delivers to a process (for example, SIGTERM is a request to shut down, SIGUSR1 is for PG-internal communication). A backend has to react to signals from postmaster and from other backends, so each signal is wired to a handler ahead of time. Signals and IPC in general are covered in 6.3.

Second, it initializes the transaction system. Every SQL statement in PG, even without an explicit BEGIN, runs inside some transaction. The transaction system is the core PG machinery that tracks BEGIN/COMMIT boundaries, MVCC visibility, XID assignment, and so on. Transactions and MVCC are the subject of all of chapter 3. For now, it's enough to know that this machinery is set up before the backend ever sees a SQL statement.

Once those two preparations are done, the real work of the backend begins. An infinite loop.

for (;;)
{
    ...
    ReadyForQuery(whereToSendOutput);
    ...
    firstchar = ReadCommand(&input_message);
    ...
    switch (firstchar)
    {
        case PqMsg_Query:        // 'Q', simple query
            exec_simple_query(query_string);
            break;
        case PqMsg_Parse:        // 'P', extended: parse
            exec_parse_message(...);
            break;
        case PqMsg_Bind:         // 'B', extended: bind
            exec_bind_message(&input_message);
            break;
        case PqMsg_Execute:      // 'E', extended: execute
            exec_execute_message(portal_name, max_rows);
            break;
        case PqMsg_Sync:         // 'S', end of an extended cycle
            finish_xact_command();
            send_ready_for_query = true;
            break;
        ...
    }
}

This loop is the entire life of a backend.

"Announce that I'm ready, read one message, dispatch on its type." Repeat forever. When the client closes the connection, an 'X' (Terminate) message arrives, the loop exits, and the process dies.

The first fork in the road is visible right here. There's the 'Q' path and the 'P' / 'B' / 'E' path. That split is the difference between the simple query protocol and the extended query protocol.

Simple vs extended

Simple is the case where a single message contains the SQL text in full. Type SELECT 1; into psql and hit enter, and that's what flies across the wire. The backend receives that one message and runs the full five-stage cycle (dispatch, parser + analyzer, rewriter, planner, executor) before returning the result.

Extended does the same job but splits it into four messages ('P', 'B', 'E', 'S'). Splitting the stages opens up plan reuse, parameter safety, and pipelining. The semantic differences between the two protocols and how they play out in practice are unpacked in section 1.1.2.

Optimizable vs utility

Everything described so far assumes optimizable statements: SELECT/INSERT/UPDATE/DELETE. These have paths to optimize. The planner decides between sequential and index scan, hash join and nested loop, one join order or another.

But statements like CREATE TABLE, VACUUM, SET, and BEGIN (the utility statements) are different. There's nothing for a cost model to optimize. They're DDL or system commands, with no path to choose. In that case the planner produces only an empty shell of a plan and hands the actual work to a utility-statement handler. The executor never gets called on this path.

The detailed branching is the subject of 1.1.3. The takeaway here is just one thing: not every query in PG goes through the planner.

The big picture

We can now compress the journey of a SQL line into a single diagram.

Client
   │
   │  'Q' (or 'P' + 'B' + 'E')
   ▼
PostgresMain main loop
   │
   ▼
exec_simple_query
   │
   ├─ pg_parse_query           → raw parse tree     (1.2.1, 1.2.3)
   │
   ├─ pg_analyze_and_rewrite   → list of Query nodes (1.2.2, 1.3)
   │
   ├─ pg_plan_queries          → execution plan      (1.4 chapter)
   │     └─ utility produces an empty shell          (1.1.3)
   │
   ├─ PortalStart + PortalRun  → tuple pulling       (1.5)
   │
   └─ PortalDrop + finish_xact_command
   │
   ▼
ReadyForQuery → back to the top of the loop

Each box in this diagram corresponds to a chapter in the book. 1.2 is parser and analyzer, 1.3 is rewriter, 1.4 is planner, 1.5 is the executor. All of part 1 is essentially one zoomed-in view of this diagram.

Working on another RDBMS engine, I once found this aspect of PG surprising. PG accepts a multi-statement query like SELECT 1; SELECT 2; as a single simple-query message. What's even more surprising is the transaction handling. Without an explicit BEGIN/COMMIT, all those statements get bundled into a single implicit transaction block, and if even one of them fails, the whole batch rolls back.

At first I assumed this was just standard behavior. Comparing the client protocols of other major databases made it clear this is a PG-specific decision. MySQL has CLIENT_MULTI_STATEMENTS off by default, so multi-statement queries are simply rejected (you have to flip the flag explicitly because of SQL injection risk). Even with the flag on, statements are processed sequentially, and because autocommit is the default, each one commits as its own transaction. Oracle accepts only one statement per OCI call, so to bundle multiple statements you have to wrap them in an anonymous PL/SQL block (BEGIN ... END;). SQL Server accepts multiple statements in a T-SQL batch, but atomic handling still requires an explicit BEGIN TRANSACTION. None of the three does what PG does: bundle automatically as soon as the message arrives.

What this means in practice

This five-stage skeleton turns out to be the foundation for two diagnostic tools you'll use in operations.

First, you can see exactly where EXPLAIN's output comes from. EXPLAIN runs only as far as stage 4 (plan); it skips stage 5 (execute). EXPLAIN ANALYZE actually runs through stage 5 and measures it. That's why EXPLAIN ANALYZE produces real load and shouldn't be casually run in production: an EXPLAIN ANALYZE UPDATE ... actually updates rows. The familiar BEGIN; EXPLAIN ANALYZE UPDATE ...; ROLLBACK idiom exists for exactly this reason.

Second, the fact that one backend means one process means one query at a time explains why connection pooling matters so much. A backend's main loop is essentially single-threaded. While one client runs a long query, that backend can't do anything else. Connection counts therefore drive memory and scheduling costs linearly, and a pooler like PgBouncer becomes effectively mandatory. The answer to "why are PostgreSQL connections so expensive?" lives inside this one-line main loop.

DEV Community