
Peter Radilov

Why PostgreSQL Ignores Your Index

You created an index, but the query still does a Seq Scan? The cost in EXPLAIN is some mysterious number and you have no idea how to interpret it? Reads with an index are slower than without one? Let's figure out how PostgreSQL actually works with indexes — using hands-on examples with 4 million rows.

This is Part 2 of a series based on my internal PostgreSQL talks. I spent a long time working with other technologies, and when I came back to PostgreSQL, I revisited my own notes to refresh the fundamentals. The fundamental mechanics of indexes haven't changed, but some useful tools have appeared over the past 5 years — more on that at the end.

Setup: 4 Million Rows

For our experiments, let's create a table with no indexes and no primary key — 2 million records each for 'hans' and 'paul':

DROP TABLE IF EXISTS t_test;
CREATE TABLE t_test (id serial, name text);
INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);
INSERT INTO t_test (name) SELECT 'paul' FROM generate_series(1, 2000000);
SELECT name, count(*) FROM t_test GROUP BY name;

No Indexes — Disaster

EXPLAIN ANALYZE SELECT * FROM t_test WHERE id = 432332;
Seq Scan on t_test  (cost=0.00..71622.00 rows=1 width=9)
                    (actual time=15.304..126.473 rows=1 loops=1)
  Filter: (id = 432332)
  Rows Removed by Filter: 3999999
Planning Time: 0.561 ms
Execution Time: 126.488 ms

~126 milliseconds. PostgreSQL scanned all 4 million rows, discarded 3,999,999, and found one. Everyone knows an index will solve this, but before building one, we need to understand the query plan.

EXPLAIN and "Parrots"

Let's look at cost=0.00..71622.00 again — what is this number?

Let's disable parallelism for a clean experiment:

SET max_parallel_workers_per_gather TO 0;

The cost is composed of the number of disk blocks and the cost of processing rows:

SELECT pg_relation_size('t_test') / 8192.0; -- ~21622 blocks of 8 KB
SHOW cpu_tuple_cost;    -- 0.01 (cost of processing a row)
SHOW cpu_operator_cost; -- 0.0025 (cost of an operator/function)

The cost formula for our query:

SELECT (pg_relation_size('t_test') / 8192.0) * 1
       + count(id) * 0.01
       + count(id) * 0.0025
FROM t_test;

This gives us ~71622 — roughly what EXPLAIN shows.

The cost in EXPLAIN has nothing to do with actual time. It's an abstract coefficient — "parrots," as we say in Russian. You can't convert it to milliseconds. Comparing costs between two different queries is also meaningless because it doesn't account for hardware. Within a single query, it's useful for understanding which part is more expensive. Between queries — only as a rough estimate.
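The per-block weight in the formula above (the `* 1`) also comes from a planner setting. A quick way to see the cost constants the planner uses (these are the stock PostgreSQL defaults; yours may be tuned in postgresql.conf):

```sql
-- Planner cost constants (PostgreSQL defaults shown in comments)
SHOW seq_page_cost;     -- 1.0: one sequentially read 8 KB block
SHOW random_page_cost;  -- 4.0: one randomly read block, e.g. during index-driven heap access
```

The 4:1 ratio between random and sequential reads is exactly why the planner can prefer a Seq Scan over an index, as we'll see below.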

Building an Index

CREATE INDEX idx_id ON t_test (id);
EXPLAIN SELECT * FROM t_test WHERE id = 43242;
Index Scan using idx_id on t_test  (cost=0.43..8.45 rows=1 width=9)
  Index Cond: (id = 43242)

The cost dropped from 71,622 to 8.45 — nearly 8,500x cheaper. Execution time goes from 126ms to sub-millisecond.

By default, PostgreSQL builds a BTree index. It provides high concurrency, but indexes aren't free: you pay with disk space and slower writes, since every INSERT, UPDATE, and DELETE must maintain the index as well.
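You can measure that price directly. A sketch (the exact sizes depend on your build and page layout):

```sql
-- Compare the footprint of the table and its new index
SELECT pg_size_pretty(pg_relation_size('t_test')) AS table_size,
       pg_size_pretty(pg_relation_size('idx_id')) AS index_size;
```

For a narrow integer column like id, expect the index to be a significant fraction of the table itself.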

Indexes and Sorting

BTree is used not just for lookups:

EXPLAIN SELECT * FROM t_test ORDER BY id DESC LIMIT 10;
Limit  (cost=0.43..0.74 rows=10 width=9)
  ->  Index Scan Backward using idx_id on t_test  (cost=0.43..125505.43 rows=4000000 width=9)

PostgreSQL doesn't sort — it simply walks the index in reverse (Backward) and stops after 10 rows.

EXPLAIN SELECT min(id), max(id) FROM t_test;
Result  (cost=0.92..0.93 rows=1 width=8)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.43..0.46 rows=1 width=4)
          ->  Index Only Scan using idx_id on t_test  (cost=0.43..128791.43 rows=4000000 width=4)
                Index Cond: (id IS NOT NULL)
  InitPlan 2 (returns $1)
    ->  Limit  (cost=0.43..0.46 rows=1 width=4)
          ->  Index Only Scan Backward using idx_id on t_test t_test_1  (cost=0.43..128791.43 rows=4000000 width=4)
                Index Cond: (id IS NOT NULL)

For min — one row from the start of the tree. For max — one from the end (Backward). Total cost — less than one.

Bitmap Scans: Multiple Conditions

PostgreSQL can scan the same index several times within one query when conditions are combined with OR:

EXPLAIN SELECT * FROM t_test WHERE id = 30 OR id = 50;
Bitmap Heap Scan on t_test  (cost=8.88..16.85 rows=2 width=9)
  Recheck Cond: ((id = 30) OR (id = 50))
  ->  BitmapOr  (cost=8.88..8.88 rows=2 width=0)
        ->  Bitmap Index Scan on idx_id  (cost=0.00..4.44 rows=1 width=0)
              Index Cond: (id = 30)
        ->  Bitmap Index Scan on idx_id  (cost=0.00..4.44 rows=1 width=0)
              Index Cond: (id = 50)

PostgreSQL scans the index twice (once per condition), merges the results via BitmapOr into a page bitmap, and only then goes to the table (Bitmap Heap Scan) for the actual rows.

Why PostgreSQL Ignores Your Index

Let's add an index on name:

CREATE INDEX idx_name ON t_test (name);

Search for a non-existent name:

EXPLAIN SELECT * FROM t_test WHERE name = 'hans2';
Index Scan using idx_name on t_test  (cost=0.43..8.45 rows=1 width=9)
  Index Cond: (name = 'hans2'::text)

The index works. Notice: rows=1, even though 'hans2' doesn't exist in the table. PostgreSQL never estimates rows=0 — even for non-existent data it expects at least one row, otherwise there would be no point in attempting a scan.

Now let's search for something that covers almost the entire table:

EXPLAIN SELECT * FROM t_test WHERE name = 'hans' OR name = 'paul';
Seq Scan on t_test  (cost=0.00..81622.00 rows=3000003 width=9)
  Filter: ((name = 'hans'::text) OR (name = 'paul'::text))

The index is not used. When the query covers a large portion of the table (roughly 70%+), PostgreSQL decides it's cheaper to read everything from disk directly rather than bounce between the index and the table.

This is a key insight: execution plans depend on the data and are not static. The same query with different parameters can execute in completely different ways. The planner makes decisions based on statistics — what it knows about the data distribution. If the statistics are stale, the decision may be suboptimal.

The term for this is selectivity: what fraction of the table a condition affects. High selectivity (few rows) — an index is beneficial. Low selectivity (many rows) — Seq Scan is cheaper.
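The planner's view of the distribution lives in pg_stats, and you can query it to see why it treats name = 'hans' as low-selectivity:

```sql
-- What ANALYZE collected about the name column
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 't_test' AND attname = 'name';
```

For our table, expect n_distinct = 2 with 'hans' and 'paul' each at a frequency near 0.5 — exactly the numbers behind the Seq Scan decision above.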

Physical Data Layout

Suppose we want to read a range: the first 10,000 records. With an index it's fast — if the data is stored sequentially on disk.

But what if the data is scattered?

CREATE TABLE t_random AS SELECT * FROM t_test ORDER BY random();
CREATE INDEX idx_random ON t_random (id);
VACUUM ANALYZE t_random;

Let's compare:

-- Original table (data in order)
EXPLAIN (analyze true, buffers true) SELECT * FROM t_test WHERE id < 10000;
Index Scan using idx_id on t_test  (cost=0.43..346.88 rows=10083 width=9)
                                   (actual time=0.010..1.219 rows=9999 loops=1)
  Index Cond: (id < 10000)
  Buffers: shared hit=3 read=82
Planning Time: 0.108 ms
Execution Time: 1.668 ms

85 blocks (hit + read) — data is stored together, read sequentially. Now the randomized table:

-- Randomized table (data shuffled)
EXPLAIN (analyze true, buffers true) SELECT * FROM t_random WHERE id < 10000;
Bitmap Heap Scan on t_random  (cost=196.41..18136.79 rows=10320 width=9)
                              (actual time=1.826..79.262 rows=9999 loops=1)
  Recheck Cond: (id < 10000)
  Heap Blocks: exact=7981
  Buffers: shared hit=801 read=7210
  ->  Bitmap Index Scan on idx_random  (cost=0.00..193.83 rows=10320 width=0)
        Buffers: shared hit=3 read=27
Planning Time: 0.289 ms
Execution Time: 79.843 ms

8,011 blocks (shared hit + read) instead of 85 — nearly 100x more disk accesses. 79ms instead of 1.6ms. Same 10,000 rows, same index, but the data is scattered across disk. PostgreSQL even switched from Index Scan to Bitmap Heap Scan, because with low correlation that's cheaper.

Correlation

PostgreSQL knows about this through statistics:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename IN ('t_test', 't_random') AND attname = 'id'
ORDER BY 1, 2;
 tablename | attname | correlation
-----------+---------+--------------
 t_random  | id      | -0.003704906
 t_test    | id      |            1

Correlation measures how closely the physical order of data on disk matches the logical order:

  • ~1 — data is ordered, blocks are read sequentially (t_test)
  • ~0 — data is randomly scattered, each row means a separate disk access (t_random)

CLUSTER

You can physically sort the data by an index:

CLUSTER t_random USING idx_random;
VACUUM ANALYZE t_random;

After this, range queries are fast again. But the downsides are significant:

  • Locks the entire table for the duration — even SELECT waits
  • Only works with one index
  • Not maintained automatically — after new inserts, data becomes unordered again
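To confirm CLUSTER actually reordered the heap, re-run the correlation query from above (the VACUUM ANALYZE is needed so the statistics reflect the new layout):

```sql
-- After CLUSTER + VACUUM ANALYZE, correlation should return from ~0 to ~1
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 't_random' AND attname = 'id';
```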

Index Only Scan

If a query only accesses columns that are in the index, PostgreSQL can skip the table entirely:

EXPLAIN SELECT id FROM t_test WHERE id = 34234;
Index Only Scan using idx_id on t_test  (cost=0.43..8.45 rows=1 width=4)
  Index Cond: (id = 34234)

Everything needed (id) is already in the index — no table access required. But if we request all columns:

EXPLAIN SELECT * FROM t_test WHERE id = 34234;
Index Scan using idx_id on t_test  (cost=0.43..8.45 rows=1 width=9)
  Index Cond: (id = 34234)

Now it's a regular Index Scan — the name column isn't in the index, so PostgreSQL goes to the table.

To cover all needed columns — use a covering index with INCLUDE:

CREATE INDEX idx_random_cover ON t_random (id) INCLUDE (name);
EXPLAIN SELECT * FROM t_random WHERE id = 34234;
Index Only Scan using idx_random_cover on t_random  (cost=0.43..4.45 rows=1 width=9)
  Index Cond: (id = 34234)

Now it's Index Only Scan again — even for SELECT *, because name is included in the index via INCLUDE.

Functional Indexes

You can build an index on the result of a function. The only requirement is that the function must be immutable — same input, same output, no side effects (PostgreSQL marks such functions IMMUTABLE):

CREATE INDEX idx_cos ON t_random (cos(id));
EXPLAIN SELECT * FROM t_random WHERE cos(id) = 10;
Index Scan using idx_cos on t_random  (cost=0.43..8.45 rows=1 width=9)
  Index Cond: (cos((id)::double precision) = '10'::double precision)

A typical use case: an index on lower(email) for case-insensitive search.
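A sketch of that use case, assuming a hypothetical users table with an email column:

```sql
-- The index stores lower(email), not email itself
CREATE INDEX idx_users_email_lower ON users (lower(email));

-- The index applies only when the query repeats the same expression
SELECT * FROM users WHERE lower(email) = lower('Alice@Example.com');
```

If the query says WHERE email = '...' instead, this index is ignored — the expression must match what was indexed.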

Partial Indexes

If 99% of your data is 'hans' and 'paul', and you never search for them:

DROP INDEX idx_name; -- replace the full index from earlier
CREATE INDEX idx_name ON t_test (name) WHERE name NOT IN ('hans', 'paul');

The index will be tiny and will only be updated when "atypical" values are inserted.
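A sketch of the planner using it — the query's WHERE clause must provably imply the index predicate:

```sql
-- The planner can prove 'john' NOT IN ('hans', 'paul'), so the partial index qualifies
EXPLAIN SELECT * FROM t_test WHERE name = 'john';

-- The partial index is a tiny fraction of a full index on name
SELECT pg_size_pretty(pg_relation_size('idx_name'));
```

A query like WHERE name = 'hans' cannot use this index at all, which is exactly the point: you stop paying index-maintenance cost for values you never look up.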

Index Types

PostgreSQL ships with 6 built-in index access methods (SELECT amname FROM pg_am WHERE amtype = 'i'). We've covered BTree — let's look at two more.

GiST and Fuzzy Search (pg_trgm)

CREATE EXTENSION IF NOT EXISTS pg_trgm;

The pg_trgm extension splits strings into trigrams and computes the distance between them:

SELECT 'abcde' <-> 'abdeacb'; -- a number from 0 to 1
SELECT show_trgm('abcdef');

Let's load 2,354 Austrian cities and try searching with a misspelling:

CREATE TABLE t_location (name text);
-- data import

SELECT * FROM t_location
ORDER BY name <-> 'Kramertneusiedel'
LIMIT 3;

The strings 'Gramatneusiedl' and 'Kramertneusiedel' sound similar, and K instead of G is a common mistake. Without an index — full table scan. With a GiST index:

CREATE INDEX idx_trgm ON t_location USING GiST(name gist_trgm_ops);

Now the search uses the index. A trigram index also works with LIKE and regular expressions:

EXPLAIN SELECT * FROM t_location WHERE name LIKE '%neusi%';
Bitmap Heap Scan on t_location  (cost=4.31..18.42 rows=2 width=15)
  Recheck Cond: (name ~~ '%neusi%'::text)
  ->  Bitmap Index Scan on idx_trgm  (cost=0.00..4.31 rows=2 width=0)
        Index Cond: (name ~~ '%neusi%'::text)

BTree is useless for LIKE '%...%' — it can't search for a substring in the middle. But GiST + pg_trgm handles it because it works with trigrams, not character ordering.

GIN and Full-Text Search

GIN indexes work with lexemes — canonical word forms:

SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars');

'car' and 'cars' → same lexeme. 'want' and 'wanted' → same lexeme. This lets you find documents by any word form:

SELECT to_tsvector('english', 'A car, I want a car...')
    @@ to_tsquery('english', 'wanted');        -- true
SELECT to_tsvector('english', 'A car, I want a car...')
    @@ to_tsquery('english', 'wanted & bmw');   -- false

Two approaches to setup:

1. Functional index:

CREATE INDEX idx_fts ON t_fts USING gin(to_tsvector('english', comment));

2. Separate column with a trigger (faster for search, more expensive on insert):

ALTER TABLE t_fts ADD COLUMN ts tsvector;

CREATE TRIGGER tsvectorupdate
BEFORE INSERT OR UPDATE ON t_fts
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(ts, 'pg_catalog.english', 'comment');

CREATE INDEX idx_fts ON t_fts USING gin(ts);
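Either way, the search itself looks the same (a sketch, assuming the comment column was populated earlier). And on PG 12+ there is a third option worth knowing — a stored generated column, which keeps ts in sync without trigger code:

```sql
-- Search via the indexed tsvector column
SELECT * FROM t_fts WHERE ts @@ to_tsquery('english', 'wanted');

-- PG 12+ alternative to the trigger: PostgreSQL maintains the column itself
ALTER TABLE t_fts ADD COLUMN ts2 tsvector
    GENERATED ALWAYS AS (to_tsvector('english', comment)) STORED;
```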

GiST and Exclusion Constraints

With the btree_gist extension, you can build constraints that prevent overlapping ranges — for example, for reservations:

CREATE EXTENSION btree_gist;

CREATE TABLE t_reservation (
    room int,
    "from" date,
    "to" date,
    EXCLUDE USING GiST (room WITH =, daterange("from", "to") WITH &&)
);
INSERT INTO t_reservation VALUES (10, '2017-01-01', '2017-03-03'); -- OK
INSERT INTO t_reservation VALUES (13, '2017-01-01', '2017-03-03'); -- OK (different room)
INSERT INTO t_reservation VALUES (13, '2017-02-02', '2017-08-14'); -- ERROR! (overlap)

PostgreSQL won't allow booking the same room for an overlapping period — the check is performed at the index level.

What Changed in PostgreSQL Over 5 Years

The fundamental mechanics of indexes haven't changed, but useful tools have appeared:

  • BRIN indexes (Block Range Index) deserve a special mention. For data with high correlation (a topic we covered above), BRIN can be orders of magnitude more compact than BTree. A table with a billion rows and timestamps is the ideal BRIN use case.
  • GIN for pg_trgm. This article showed GiST + pg_trgm, but starting with PG 13, GIN support for trigrams improved significantly. For large datasets, GIN + pg_trgm can be faster than GiST + pg_trgm.
  • REINDEX CONCURRENTLY (PG 12) — index rebuilding without an exclusive table lock.
  • pg_repack — an extension that does what CLUSTER does, but without locking the table. In production, CLUSTER is practically unusable, while pg_repack solves the same problem.
  • Parallel index builds became the norm. Parallel Seq Scan and Index Scan also improved significantly — disabling parallelism is only useful for demos, not for production.
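A minimal BRIN sketch on our perfectly correlated t_test (the size comparison is the point, not exact numbers):

```sql
-- BRIN stores one summary entry per block range instead of one entry per row
CREATE INDEX idx_brin ON t_test USING brin (id);

-- Compare footprints: BRIN is typically orders of magnitude smaller than BTree
SELECT pg_size_pretty(pg_relation_size('idx_id'))   AS btree_size,
       pg_size_pretty(pg_relation_size('idx_brin')) AS brin_size;
```

On t_random, by contrast, a BRIN index would be nearly useless — every block range spans the whole id domain, which is why correlation matters so much here.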

This is Part 2 of the series. In Part 1, we covered transactions, locks, and VACUUM.
