DEV Community

Philip McClarence

The Complete Guide to PostgreSQL SQL Query Analysis & Optimization

Most PostgreSQL performance work is wasted because it starts from the wrong end. Someone notices a slow query, skim-reads EXPLAIN, pattern-matches to "missing index," adds one, and moves on. Sometimes that works. Often it doesn't — and when it doesn't, the next attempt is usually an even blunter instrument: "just add more RAM," "just use a read replica," "just cache it."

This guide is a systematic alternative. The argument is that a large fraction of single-query latency problems in OLTP workloads fall into one of a small number of bottleneck categories, each with a characteristic EXPLAIN signature and a well-understood fix. (Lock contention, vacuum bloat, replication lag, and the generic-plan vs custom-plan behaviour of prepared statements are real and common, but they are cluster-level or protocol-level problems rather than single-plan problems; this guide is strictly about the latter.) If you can name the category in sixty seconds of reading the plan, the fix usually follows in minutes.

We'll work through the full workflow end-to-end on a real query against a real PostgreSQL 17 database, then map the eight bottleneck categories to the eight deep-dive articles that make up this series. Every EXPLAIN snippet below is captured from an actual run against a 500,000-row sim_bp_orders table on a Neon Postgres 17.8 database — not a synthetic example.

The workflow

  1. Read the EXPLAIN plan — specifically the three signals that matter most: estimated-vs-actual row counts, access path at each scan node, and where time is actually spent.
  2. Categorise the bottleneck — translate the plan signals into one of eight categories.
  3. Apply the matching fix — index, rewrite, tune memory, or restructure the query.
  4. Verify with a second EXPLAIN — before/after is how you know you actually fixed something.

That's it. The rest of this article walks through each step on a concrete example.
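To follow along, every plan in this article was captured with timing and buffer statistics enabled — something like this invocation (the exact options used are implied by the output below, not stated explicitly):

```sql
-- ANALYZE executes the query for real; BUFFERS adds shared-buffer counts.
-- Both are needed to see actual times and the Buffers: lines shown below.
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, user_id, total_amount_cents, created_at
FROM sim_bp_orders
WHERE status = 'pending'
ORDER BY total_amount_cents DESC
LIMIT 50;
```

Be aware that `ANALYZE` runs the statement: wrap data-modifying queries in `BEGIN; ... ROLLBACK;` when profiling them.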

A typical slow query

Our running example is a dashboard query: "show me the fifty highest-value pending orders."

SELECT order_id,
       user_id,
       total_amount_cents,
       created_at
FROM sim_bp_orders
WHERE status = 'pending'
ORDER BY total_amount_cents DESC
LIMIT 50;

The table is 500,000 rows, with roughly 20% in status = 'pending'. There's a primary key on order_id and indexes on user_id and created_at, but no index on status or total_amount_cents. We've disabled parallel execution (SET max_parallel_workers_per_gather = 0) for this example so the plan reads cleanly. Here's the plan:

Limit  (cost=13270.89..13271.02 rows=50 width=20) (actual time=50.873..50.883 rows=50 loops=1)
  Buffers: shared hit=3689
  ->  Sort  (cost=13270.89..13521.64 rows=100300 width=20) (actual time=50.871..50.877 rows=50 loops=1)
        Sort Key: sim_bp_orders.total_amount_cents DESC
        Sort Method: top-N heapsort  Memory: 30kB
        Buffers: shared hit=3689
        ->  Seq Scan on sim_bp_orders  (cost=0.00..9939.00 rows=100300 width=20) (actual time=0.011..37.781 rows=100252 loops=1)
              Filter: ((status)::text = 'pending'::text)
              Rows Removed by Filter: 399748
              Buffers: shared hit=3689
 Planning Time: 0.073 ms
 Execution Time: 50.908 ms

Fifty-one milliseconds is not a disaster on its own. It's the kind of number that gets shrugged at until a hundred of these queries run concurrently on a busy application server, at which point CPU saturates and every request starts stacking.

Step 1 — Read the plan

Three signals tell you nearly everything about a plan node.

Signal 1 — how the table is accessed. At the leaf of this plan is Seq Scan on sim_bp_orders. A sequential scan means the planner's cost model decided reading every row was cheaper than any available index — sometimes because no useful index exists, sometimes because existing indexes don't match the query shape, occasionally because statistics are misleading the cost estimate. On small tables, or when the query needs a large fraction of the table anyway, a seq scan is often genuinely the cheapest plan. But on a 500k-row table with a selective filter and an ORDER BY ... LIMIT 50, it's the wrong shape.

Signal 2 — rows removed by filter. Rows Removed by Filter: 399,748, against 100,252 matching rows. The scan touched every row in the table. The filter selectivity is ~20% — not pathological by itself — but that's 400,000 discarded rows of pure waste every time the dashboard refreshes. An index on the filter column would let PostgreSQL skip them entirely.

Signal 3 — planner estimate vs reality. rows=100,300 estimated vs rows=100,252 actual. Essentially perfect. If the ratio had been ten-to-one or worse in either direction, the plan would be built on bad assumptions and ANALYZE would be the first move. Here, statistics are healthy.
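Had the estimate been badly off, refreshing statistics is a one-liner; for skewed columns you can also raise the per-column sample size first (the 500 below is an illustrative value, not a recommendation):

```sql
ANALYZE sim_bp_orders;                       -- refresh planner statistics for this table

ALTER TABLE sim_bp_orders
    ALTER COLUMN status SET STATISTICS 500;  -- larger sample for a skewed column
ANALYZE sim_bp_orders;                       -- re-sample under the new target
```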

There's a fourth node worth naming: the Sort above the scan is a top-N heapsort. Unlike a full sort, a top-N heapsort streams all input rows through a heap of size N (50 here) — it reads all 100,252 pending rows but only ever holds 50 in memory. That's why the reported Memory: 30kB is so small. Even so, it's 100,252 rows of unnecessary work: an index on (total_amount_cents DESC) WHERE status = 'pending' would let the planner walk the index from the largest value downward and stop after fifty entries.

Step 2 — Categorise the bottleneck

Once you've read the plan, map what you see to one of eight categories. Each category has a characteristic signature; each maps to a deep-dive article in this series.

| Plan signal | Bottleneck category | Fix article |
| --- | --- | --- |
| You can't even read the plan confidently | Plan literacy | Reading EXPLAIN / EXPLAIN ANALYZE Output |
| Seq Scan with large row counts, many Rows Removed by Filter | Missing or wrong index | Index Usage & Optimisation |
| Nested Loop joining large tables; Hash Join spilling to disk | Join strategy | Join Optimisation |
| CTE Scan feeding a filter; SubPlan running per outer row | Subquery / CTE structure | Subquery & CTE Optimisation |
| HashAggregate or Sort spilling, expensive window functions | Aggregate or window tuning | Aggregate & Window Function Tuning |
| Index exists but isn't being used; function on indexed column | WHERE clause shape | WHERE Clause Optimisation |
| Plan is "fine" but the query itself is the problem | Query rewriting | Query Rewriting Techniques |
| SELECT *, implicit casts, deep pagination, N+1 from ORM | Anti-pattern | Anti-Patterns & Common Mistakes |

Our example plan maps cleanly. A Seq Scan with 400,000 rows removed by filter, sitting under an ORDER BY ... LIMIT that can't exploit any existing index, is the textbook signature for the Missing or wrong index category. The Sort above it is solvable in the same stroke — a single partial index can eliminate both the scan and the sort.

Step 3 — Apply the fix

The fix is a partial index, with the non-filter columns tucked into INCLUDE so the planner can serve the query from the index alone without touching the heap:

CREATE INDEX CONCURRENTLY idx_sim_bp_orders_pending_by_amount
    ON sim_bp_orders (total_amount_cents DESC)
    INCLUDE (order_id, user_id, created_at)
    WHERE status = 'pending';

Four non-obvious choices:

  • Partial index. Only pending orders are indexed, because that's the only status the query cares about. A full index on (status, total_amount_cents) would work too; it would contain roughly 5× more entries. Partial indexes only help queries whose WHERE clause implies the index's predicate — so if this dashboard later adds WHERE status IN ('pending', 'processing'), the planner will skip this index.
  • Sort direction in the index. Specifying total_amount_cents DESC means the planner can scan the btree in the direction that produces rows in the needed order without an explicit Sort node.
  • Tiebreaker. In a real dashboard you'd almost always want a tiebreaker column — ORDER BY total_amount_cents DESC isn't deterministic for ties, and two rows with equal totals would shuffle between pages; adding , order_id DESC to both the index and the query fixes that.
  • Covering (INCLUDE). The SELECT list is satisfied entirely from index tuples, which lets the planner serve the query as an Index Only Scan without heap fetches. Index Only Scan also requires the visibility map to mark the relevant heap pages all-visible, so on a write-heavy table where autovacuum can't keep up, you may still see heap fetches even with a covering index.
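Making the tiebreaker bullet concrete — a hypothetical variant of the index and query with order_id promoted from INCLUDE into the key columns, so the index order fully determines the page order (not the exact index built above):

```sql
-- Tiebreaker variant: order_id participates in the sort order,
-- so it must be a key column, not an INCLUDE payload column.
CREATE INDEX CONCURRENTLY idx_sim_bp_orders_pending_amount_id
    ON sim_bp_orders (total_amount_cents DESC, order_id DESC)
    INCLUDE (user_id, created_at)
    WHERE status = 'pending';

-- The query's ORDER BY must match the index key order exactly:
SELECT order_id, user_id, total_amount_cents, created_at
FROM sim_bp_orders
WHERE status = 'pending'
ORDER BY total_amount_cents DESC, order_id DESC
LIMIT 50;
```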

CREATE INDEX CONCURRENTLY avoids taking an ACCESS EXCLUSIVE lock on the table, so normal reads and writes continue while the index builds. It still takes weaker locks (SHARE UPDATE EXCLUSIVE) twice, waits for transactions that hold old snapshots on the target table to finish before advancing between phases, and runs two passes over the table — so it's slower than CREATE INDEX in wall-clock time, and a single long-running transaction that has touched this table can stall the build indefinitely. On a 500k-row table the build takes seconds; on a 500M-row table it can take hours. The application stays up the whole time. A partial index on status = 'pending' still pays write cost when rows are inserted into or updated out of that state — so if pending is a high-churn status, weigh the read win against the write overhead.
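One operational footnote: if a CREATE INDEX CONCURRENTLY build fails or is cancelled, it leaves an INVALID index behind that still pays write overhead but serves no reads. A standard catalog check (not output from the run above):

```sql
-- indisvalid = false means the concurrent build did not complete
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE indexrelid = 'idx_sim_bp_orders_pending_by_amount'::regclass;

-- An invalid index can't be repaired in place; drop and rebuild it:
-- DROP INDEX CONCURRENTLY idx_sim_bp_orders_pending_by_amount;
```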

Step 4 — Verify

Same query, same data, index in place:

Limit  (cost=0.42..2.18 rows=50 width=20) (actual time=0.021..0.031 rows=50 loops=1)
  Buffers: shared hit=5
  ->  Index Only Scan using idx_sim_bp_orders_pending_by_amount on sim_bp_orders
        (cost=0.42..3544.68 rows=100300 width=20)
        (actual time=0.021..0.026 rows=50 loops=1)
        Heap Fetches: 0
        Buffers: shared hit=5
 Planning Time: 0.186 ms
 Execution Time: 0.045 ms

0.045 ms, down from 50.9 ms — roughly 1100× faster. Buffers dropped from 3,689 hit to 5 hit. The Sort node is gone entirely: the index is already sorted in the right order. The Filter line is gone: the partial index guarantees every row it contains already satisfies status = 'pending'. Heap Fetches: 0 means the visibility map covered every leaf page we touched, so PostgreSQL served all 50 tuples from the index without reading a single heap page.

Two caveats on the headline number. First, EXPLAIN ANALYZE's Execution Time measures server-side SQL execution only — it excludes network round-trip, client-side tuple deserialisation, and connection pool overhead. Real application latency for this query is probably closer to 2–10 ms depending on your region. Second, the measurement is on a hot cache with an immediately-post-vacuum visibility map; a colder cache would show Buffers: shared read=N instead of all hit. The meaningful improvement is the ~700× drop in buffer reads — that's what translates into lower CPU under concurrency.

The eight categories

The workflow above treats "spot the category" as a two-sentence step. In practice, each category has its own rules, exceptions, and non-obvious variants. The rest of this series is eight standalone articles, each diving into one category.

1. Reading EXPLAIN / EXPLAIN ANALYZE output

Before you can optimise a plan, you have to be able to read one. EXPLAIN reports the planner's estimated plan; EXPLAIN ANALYZE executes the query and reports what actually happened. The deep dive covers every common node type, the meaning of loops, Buffers, Memory, Workers Planned vs Launched, and the five most common ways to misread a plan. → Reading EXPLAIN / EXPLAIN ANALYZE Output.

2. Index usage and optimisation

A large share of single-query OLTP latency problems come down to indexing — either missing, or present but not matching the query shape. But "add an index" understates what's actually required: choosing columns in the right order, deciding between full and partial indexes, using INCLUDE for covering indexes, expression indexes for computed predicates, GIN/GiST/BRIN for the data types where btrees are wrong, and knowing when not to add one. → Index Usage & Optimisation.

3. Join optimisation

The planner picks between Nested Loop, Hash Join, and Merge Join based on cost estimates. Each has a regime where it's best, and the worst joins are the ones using the wrong strategy — usually a Nested Loop on two large tables. → Join Optimisation.

4. Subquery and CTE optimisation

PostgreSQL 12 changed CTE semantics — what used to always be materialised is now inlined by default, except when you ask for materialisation explicitly. That change made many old "CTE as optimisation fence" tricks silently stop working. → Subquery & CTE Optimisation.
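For reference, the keywords involved — on PostgreSQL 12+ you choose the behaviour per CTE (table and predicate borrowed from this article's running example):

```sql
-- Inlined by default on PG 12+ when the CTE is referenced once
-- and has no side effects; the planner can push predicates into it.
WITH pending AS (
    SELECT * FROM sim_bp_orders WHERE status = 'pending'
)
SELECT count(*) FROM pending;

-- Restore the old optimisation-fence behaviour explicitly:
WITH pending AS MATERIALIZED (
    SELECT * FROM sim_bp_orders WHERE status = 'pending'
)
SELECT count(*) FROM pending;
```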

5. Aggregate and window function tuning

GROUP BY and window functions look declarative, but the planner has strong opinions about how to execute them: HashAggregate versus GroupAggregate, partial and parallel aggregation, window frame optimisation. Sorts and hashes that spill to disk are almost always the visible symptom, and work_mem is almost always the knob. → Aggregate & Window Function Tuning.
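The knob itself, for orientation — a session-local bump is the usual diagnostic first step (64MB is an illustrative value; work_mem applies per sort/hash operation, not per query, so raise it globally with care):

```sql
SHOW work_mem;            -- 4MB on a stock install

SET work_mem = '64MB';    -- session-local; each Sort/Hash node gets this much
-- Re-run EXPLAIN ANALYZE and watch the Sort Method line change from
-- "external merge  Disk: NkB" to "quicksort  Memory: NkB".
```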

6. WHERE clause optimisation

An index is only useful if the WHERE clause is sargable. Wrapping an indexed column in a function (lower(email) = '...'), doing implicit casts (varchar_column = 123), or comparing on the wrong side of an operator all silently disable indexes that look like they should apply. → WHERE Clause Optimisation.
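A sketch of the classic case, on a hypothetical users table (not part of this article's schema):

```sql
-- Not sargable: the function wraps the indexed column,
-- so a plain btree on (email) cannot be used.
SELECT user_id FROM users WHERE lower(email) = 'ada@example.com';

-- Fix: an expression index that matches the predicate exactly.
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Alternatively, store the canonical form at write time
-- (or use citext) and query the column directly.
```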

7. Query rewriting techniques

Sometimes the plan is "fine" but the query itself is asking the wrong question. Correlated subqueries can usually become lateral joins; NOT IN with NULLs should be NOT EXISTS; offset pagination past a few hundred pages should be keyset pagination; DISTINCT over a large set is often GROUP BY in disguise. → Query Rewriting Techniques.
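The keyset rewrite, sketched against this article's table — it assumes a matching (created_at DESC, order_id DESC) index, and the :last_* parameters are the keys of the final row on the previous page:

```sql
-- Offset pagination: page 2000 still produces and discards 99,950 rows.
SELECT order_id, total_amount_cents
FROM sim_bp_orders
ORDER BY created_at DESC, order_id DESC
LIMIT 50 OFFSET 99950;

-- Keyset pagination: seek straight to the next page via the index.
SELECT order_id, total_amount_cents
FROM sim_bp_orders
WHERE (created_at, order_id) < (:last_created_at, :last_order_id)
ORDER BY created_at DESC, order_id DESC
LIMIT 50;
```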

8. Anti-patterns and common mistakes

The final category is queries that are wrong by construction: SELECT * in hot paths, implicit type casts that silently disable indexes, missing LIMIT on exploratory joins, N+1 patterns coming out of ORMs, inserting one row at a time instead of batching. → Anti-Patterns & Common Mistakes.
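The batching point in one contrast (column list and values are illustrative, assuming the remaining columns have defaults):

```sql
-- N round trips, N parses, N WAL flushes:
INSERT INTO sim_bp_orders (user_id, status, total_amount_cents) VALUES (1, 'pending', 4200);
INSERT INTO sim_bp_orders (user_id, status, total_amount_cents) VALUES (2, 'pending', 1500);

-- One round trip:
INSERT INTO sim_bp_orders (user_id, status, total_amount_cents)
VALUES (1, 'pending', 4200),
       (2, 'pending', 1500);
```

For genuinely large loads, COPY is faster still.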

Where to start

If you can already read EXPLAIN confidently, the highest-value articles are probably Index Usage and Query Rewriting, because those are where the largest wins hide. If reading the plans in this article felt like work, start with Reading EXPLAIN and come back here.

Slow queries are not mysterious. They fall into a small number of categories, each with a characteristic plan signature and a well-understood fix. Learn to recognise the signatures and most of the rest follows.


#postgres #performance #database #sql

Canonical version with the full series linked: https://mydba.dev/blog/postgres-query-analysis-complete-guide
