Philip McClarence

Posted on May 1

PostgreSQL WHERE Clause Optimization

#database #postgres #sql #performance

The single question that decides whether an index helps your query is: can the planner match the WHERE clause against the index? If the answer is yes, you get an index or bitmap scan and the query returns quickly. If the answer is no — because you wrapped the indexed column in a function, used an implicit cast, or combined conditions with OR in a way the planner can't decompose — the index is silently unused and the table is sequentially scanned.

The catch is that "the planner can match the predicate" isn't a yes-or-no rule; it's a long list of conditions. This article is the sixth in the Complete Guide to PostgreSQL SQL Query Analysis & Optimization series and covers the conditions most often violated in production SQL. Every EXPLAIN block is captured from the series' Neon Postgres 17.8 database.

Sargable predicates — the rule in one sentence

A predicate is sargable (Search ARGument ABLE) when it compares an indexed column, or a leading prefix of an indexed expression, against a constant or parameter — without wrapping the indexed value in a function the planner can't invert. The term isn't formal PostgreSQL terminology, but it's the right mental model: sargable ⇒ indexable; non-sargable ⇒ sequential scan, no matter how many indexes you add.

The canonical non-sargable predicate is a function on the column:

-- Not sargable — the index on email can't help.
SELECT * FROM sim_bp_users WHERE lower(email) = 'user42@example.com';

An index on email doesn't help here because the planner's test is lower(email) = constant, and the index doesn't store lower(email). The fix is either:

Normalise on write. Store emails lowercased; query against the raw column. Most applications should have been doing this anyway.
Expression index on the function. CREATE INDEX ON sim_bp_users (lower(email)) — the index stores the lowercased value, and lower(email) = 'x' becomes sargable against it.
citext extension. A case-insensitive text type with its own operator class. Indexes on citext columns work for equality and pattern operators; which exact cases are index-usable depends on the operator class and the collation semantics. citext is usually the cleanest solution for "case-insensitive equality everywhere" in the schema; for prefix-heavy workloads, an expression index with text_pattern_ops (covered in the index usage article) is often a better fit because its semantics are simpler.

A real capture shows the difference. Non-sargable lower(email) LIKE 'user12%' against 200k rows:

Parallel Seq Scan on sim_bp_users
    Filter: (lower((email)::text) ~~ 'user12%'::text)
    Rows Removed by Filter: 94444
 Execution Time: 122.833 ms

Sargable email LIKE 'user12%' with the existing text_pattern_ops index:

Index Only Scan using idx_sim_bp_users_email_pattern on sim_bp_users
    Index Cond: ((email ~>=~ 'user12'::text) AND (email ~<~ 'user13'::text))
    Heap Fetches: 0
 Execution Time: 24.757 ms

Same data, same 20-row output, 5× faster — and the ratio widens with table size.

Implicit casts that silently disable indexes

PostgreSQL's type system is strict, but it will coerce types when the operator allows it. The implicit coercion happens at the constant side of the comparison usually, which is safe. When it happens at the column side, it's a silent index-bypass:

-- Sargable — PG casts '123' to int; index on int_col still applies.
SELECT * FROM t WHERE int_col = '123';

-- Not sargable — PG casts text_col to int, wrapping the column.
SELECT * FROM t WHERE text_col = 123;

In the second form, the planner sees int_col_cast(text_col) = 123 and the cast prevents the index on text_col from matching. The fix is usually "use the right type in the query," but occasionally a text column genuinely needs to index-match integer literals — in which case, an expression index on the cast solves it: CREATE INDEX ON t ((text_col::int)). Rare, but real.

More insidious: the varchar(N) ↔ text case. status varchar(20) is indexed; the query does WHERE status = 'pending'. PostgreSQL picks the right operator and the index is used. Change the column type to citext or an application-specific domain, and operator resolution can pick a different candidate — sometimes applying a cast on the column side and silently disabling the index. Schema-type changes are a plan-breaking migration; re-run EXPLAIN on the key queries after any column-type change.

The leftmost-prefix rule (quickly, with the consequences)

A composite btree on (a, b, c) helps queries that use:

a = ?
a = ? AND b = ?
a = ? AND b = ? AND c = ?
a = ? AND b < ?
a = ? AND b = ? ORDER BY c

It does not help queries that use only b or only c, or only the range portion of a leading column plus equality on a trailing one. The planner can use a prefix of the index's columns starting from the leading one.

The practical implication: for a composite index, put equality predicates first and range/ORDER BY columns last. An index on (tenant_id, created_at) serves a tenant-scoped time-range filter cleanly; (created_at, tenant_id) forces a seq scan for the same query on a specific tenant.

A common mistake is trying to "cover multiple access patterns with one composite index." If the app filters sometimes by status, sometimes by user_id, and sometimes by both, neither (status, user_id) nor (user_id, status) serves both single-column filters efficiently. You usually want two single-column indexes — the planner will combine them with a BitmapAnd when both are filtered — or one composite index plus one lone single-column index on whichever column is the more common filter in isolation.

OR across indexed columns — the BitmapOr pattern

OR in a WHERE clause used to be a classic "can't use an index" gotcha. Modern PostgreSQL handles the common case well via BitmapOr. Each branch of the OR produces a bitmap from its respective index; the bitmaps are unioned; a single heap scan visits only the matching pages:

SELECT user_id, email
FROM sim_bp_users
WHERE email = 'user42@example.com'
   OR username = 'user42';

Bitmap Heap Scan on sim_bp_users
  Recheck Cond: (((email)::text = 'user42@example.com'::text)
              OR ((username)::text = 'user42'::text))
  ->  BitmapOr
        ->  Bitmap Index Scan on idx_sim_bp_users_email_pattern
              Index Cond: ((email)::text = 'user42@example.com'::text)
        ->  Bitmap Index Scan on idx_sim_bp_users_username_pattern
              Index Cond: ((username)::text = 'user42'::text)
 Execution Time: 4.406 ms

4.4 ms. Both branches of the OR hit an index; the BitmapOr merges the two TID bitmaps (automatically deduplicating tuple IDs that appeared in both branches, since the bitmap is a set structure indexed by TID); the Bitmap Heap Scan visits each matched page once, rechecks the combined condition, and emits matching rows. No rewrite needed.

OR becomes a problem when only some of the branches are indexable, or when the branches match most of the table. In those cases the planner often falls back to a seq scan because the total estimated cost of two bitmap scans + union + recheck is similar to a single scan. If the optimizer picks a seq scan for an OR you thought would hit an index, check each branch individually — the non-sargable one is usually the culprit.

OR → UNION ALL — when the planner won't decompose

For the classic "OR across tables" case — WHERE t.x = 1 OR u.y = 2 in a join — the planner can't always produce a BitmapOr because the two sides are in different relations. The rewrite:

-- Before: OR across joined tables.
SELECT o.order_id
FROM sim_bp_orders o JOIN sim_bp_users u ON u.user_id = o.user_id
WHERE o.status = 'pending' OR u.status = 'suspended';

-- After: UNION (not UNION ALL — we want deduplication).
SELECT o.order_id
FROM sim_bp_orders o JOIN sim_bp_users u ON u.user_id = o.user_id
WHERE o.status = 'pending'
UNION
SELECT o.order_id
FROM sim_bp_orders o JOIN sim_bp_users u ON u.user_id = o.user_id
WHERE u.status = 'suspended';

Each branch of the UNION is a separate query the planner can optimise independently — one can use an index on o.status, the other on u.status, and the dedup at the top removes overlap. This only wins when both branches are individually selective; if one branch matches most of the table, UNION isn't faster.

UNION vs UNION ALL matters for correctness: UNION dedupes (expensive if the output is large and has many overlaps); UNION ALL doesn't (faster, but returns duplicate rows for the overlap). Default to UNION if you're rewriting an OR to preserve equivalent semantics.

`LIKE '%needle%'` — leading wildcards

Standard btree indexes can only help LIKE when the pattern has a fixed prefix. LIKE 'user12%' is range-scannable (with text_pattern_ops or C collation); LIKE '%user12%' isn't — there's no way to translate it into a range on a sorted index.

The fix is a trigram index (pg_trgm extension, GIN):

CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_email_trgm ON sim_bp_users USING gin (email gin_trgm_ops);

-- Now this becomes a GIN index scan instead of a seq scan:
SELECT * FROM sim_bp_users WHERE email LIKE '%user12%';

Trigram GIN indexes store overlapping 3-character substrings of the text. The query engine decomposes %user12% the same way and looks up candidate rows in the index. The match set is usually narrow enough that the subsequent heap scan is cheap, even though it has to re-verify each candidate against the full pattern.

GIN indexes have write amplification — inserts are roughly 3× slower than for a btree, and updates trigger full re-indexing of the changed row. Use trigram GINs sparingly on high-write tables.

`IS NULL`, `IS NOT NULL`, and three-valued logic

IS NULL is sargable against a btree and against most specialised indexes. column IS NULL can use an index if the index covers nulls (the default for btrees), producing a fast point-scan of the null rows. This is worth knowing because "find the records that haven't been processed yet" is a common pattern on append-mostly tables.

The failure mode is <> with nullable columns. status <> 'completed' excludes rows where status is NULL — NULL is not-equal to everything but also not-not-equal. If you actually want "all rows where status is not completed or is unknown," you have to write it explicitly: status <> 'completed' OR status IS NULL, or status IS DISTINCT FROM 'completed' (which treats NULL as a value).

NOT IN (SELECT ...) on a nullable inner column is the same trap at a higher level: if any row in the subquery has NULL for the compared column, NOT IN returns no rows at all. Use NOT EXISTS (see the subquery/CTE article) unless you've proven the inner column is NOT NULL.

Function calls on the constant side — safe

The non-sargable warning applies to functions on the column side, not the constant side. WHERE created_at > now() - interval '1 day' is sargable because now() - interval '1 day' evaluates to a constant (once per query), and the planner compares the indexed created_at to that constant.

The subtlety is that functions marked VOLATILE (like random()) can't be evaluated once and cached; they're re-evaluated per row, which changes the plan in surprising ways. User-defined functions default to VOLATILE unless you explicitly mark them STABLE or IMMUTABLE. If you're calling a UDF in a WHERE clause that should be constant for the duration of the query, mark it STABLE — otherwise the planner treats it as volatile and loses optimisation opportunities.

Partial index predicate implication

Partial indexes have their own sargability requirement, in addition to the usual one: the query's WHERE clause must imply the partial index's predicate, from the planner's perspective. The planner uses a built-in theorem prover on predicates, which handles equality, inequality, and simple boolean structure. It doesn't handle:

Function calls (WHERE lower(status) = 'pending' won't match a partial index on WHERE status = 'pending' because the function disables the implication).
OR-wrapped forms that don't obviously decompose.
Casts that the theorem prover doesn't recognise as reversible.

When a partial index isn't being used, the most common reason is that the query's predicate isn't obviously implying the partial predicate. Rewrite the query to match the partial predicate as literally as possible.

A diagnostic recipe

When an index exists but a query isn't using it:

Look at the Filter: line in EXPLAIN. If the filter mentions the indexed column with any function around it, that's the non-sargable form. Rewrite.
Check column types match literal types. WHERE int_column = '123' is fine; WHERE text_column = 123 casts the column and loses the index.
Check the Index Cond: line for the expected index. If the index is available but the plan shows Filter: instead of Index Cond:, the planner decided the predicate couldn't use the index — look for functions or casts on the column.
Try SET enable_seqscan = off; just for the session. The resulting plan tells you what the planner would use if forced. If it's still a seq scan or a bizarre fallback, the predicate is genuinely unindexable.
For partial indexes, read the partial predicate carefully. The query's WHERE clause has to imply it literally, not just semantically.

Next steps

When the predicates are right but the query itself is structured awkwardly, the next article — Query Rewriting Techniques — covers the systematic transformations that turn expensive SQL into cheap SQL without changing results: DISTINCT → GROUP BY, keyset pagination, batch operations, and the other rewrites every production SQL writer eventually needs.

postgres #performance #database #sql

Originally published at mydba.dev/blog/postgres-where-clause-optimization.

DEV Community

PostgreSQL WHERE Clause Optimization

Sargable predicates — the rule in one sentence

Implicit casts that silently disable indexes

The leftmost-prefix rule (quickly, with the consequences)

OR across indexed columns — the BitmapOr pattern

OR → UNION ALL — when the planner won't decompose

`LIKE '%needle%'` — leading wildcards

`IS NULL`, `IS NOT NULL`, and three-valued logic

Function calls on the constant side — safe

Partial index predicate implication

A diagnostic recipe

Next steps

postgres #performance #database #sql

Top comments (0)

Sargable predicates — the rule in one sentence

Implicit casts that silently disable indexes

The leftmost-prefix rule (quickly, with the consequences)

OR across indexed columns — the BitmapOr pattern

OR → UNION ALL — when the planner won't decompose

LIKE '%needle%' — leading wildcards

IS NULL, IS NOT NULL, and three-valued logic

Function calls on the constant side — safe

Partial index predicate implication

A diagnostic recipe

Next steps

postgres #performance #database #sql

`LIKE '%needle%'` — leading wildcards

`IS NULL`, `IS NOT NULL`, and three-valued logic