Covering Indexes and Index-Only Scans: The Read Win You Are Missing

#postgres #database #backend #performance

Book: Database Playbook: Choosing the Right Store for Every System You Build
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

Your index is used. EXPLAIN says Index Scan. The query is still slow.

You check the plan again. The index lookup is fast. The time goes somewhere after it, on a line that reads Heap Fetches or just shows a fat Buffers: shared hit number you didn't expect. The index found the rows. Then Postgres went back to the table to read the columns you actually asked for.

That second trip is the heap fetch. For a query that returns three columns out of a 40-column table, you're paying a random page read per row to grab data the index could have carried itself. Covering indexes remove that trip. The plan flips from Index Scan to Index Only Scan, and the heap stays cold.

This is one of the cheapest read wins in Postgres, and most teams never reach for it because the Index Scan line already looked like a success.

What an index actually stores

A B-tree index stores the indexed columns plus a pointer to the row in the heap (the ctid). When you query columns that aren't in the index, Postgres walks the tree to find matching rows, then follows each pointer back to the table to read the rest.

Take an orders table and the common dashboard query:

CREATE INDEX idx_orders_customer
ON orders (customer_id);

SELECT customer_id, total_cents, status
FROM orders
WHERE customer_id = $1;

The index has customer_id and the row pointer. It does not have total_cents or status. So Postgres finds every matching ctid in the index, then reads each of those rows from the heap to get the other two columns. If a customer has 4,000 orders scattered across the table, that's up to 4,000 random heap reads.

The index did its job. The heap fetch is the cost nobody budgeted for.

INCLUDE: carry the extra columns in the index

Postgres 11 added INCLUDE columns. They live in the leaf pages of the B-tree but aren't part of the key, so they don't affect ordering or uniqueness. They're along for the ride, available to the planner without a heap trip.

CREATE INDEX idx_orders_customer_covering
ON orders (customer_id)
INCLUDE (total_cents, status);

Now the index holds everything the query reads: customer_id in the key, total_cents and status as payload. Postgres can answer the whole SELECT from the index pages. That plan is an Index Only Scan.

You could also put those columns in the key:

CREATE INDEX idx_orders_customer_key
ON orders (customer_id, total_cents, status);

This also produces an index-only scan, but it makes the index bigger in the inner nodes and changes the sort order. Use the key position only when you filter or sort on those columns too. When they're purely output, INCLUDE is the cleaner choice: smaller tree, same read win.

Confirm it with EXPLAIN

Never assume the plan. Run it. The flag that matters is BUFFERS, and the line that matters is Heap Fetches.

Before the covering index:

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, total_cents, status
FROM orders
WHERE customer_id = 4823;

Index Scan using idx_orders_customer on orders
  Index Cond: (customer_id = 4823)
  Buffers: shared hit=12 read=3891
Planning Time: 0.1 ms
Execution Time: 14.7 ms

Read 3,891 buffers for a few thousand matching rows. That's the heap. Each random read is a page the index sent you back to fetch.

After creating idx_orders_customer_covering:

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, total_cents, status
FROM orders
WHERE customer_id = 4823;

Index Only Scan using idx_orders_customer_covering on orders
  Index Cond: (customer_id = 4823)
  Heap Fetches: 0
  Buffers: shared hit=46
Planning Time: 0.1 ms
Execution Time: 1.2 ms

Two things changed. The scan type is now Index Only Scan. And Heap Fetches: 0 confirms Postgres never touched the table. Buffer reads dropped from thousands to dozens because the index pages already had every column.

The Heap Fetches line is the honest metric. An index-only scan with high heap fetches is an index-only scan in name only.

The visibility-map caveat nobody mentions first

Here's where index-only scans surprise people. The plan can say Index Only Scan and still read the heap. You'll see it as a non-zero Heap Fetches.

The index doesn't store row visibility. Postgres uses MVCC, so a given row version might be visible to your transaction or not, and that information lives in the heap tuple, not the index. To answer a query purely from the index, Postgres needs another way to know a row is visible to everyone.

That's the visibility map. It's a bitmap with two bits per heap page. One of them marks a page as all-visible: every tuple on that page is visible to all current transactions. When a query's matching rows live on all-visible pages, Postgres trusts the map and skips the heap. When a page isn't marked all-visible, it has to fetch the tuple to check. That fetch is what Heap Fetches counts.

The visibility map is maintained by VACUUM. A page becomes all-visible after vacuum processes it. So right after a burst of writes, the pages holding your new rows are not all-visible yet, and an index-only scan on them degrades into ordinary heap fetches until vacuum catches up.

-- after a heavy insert/update batch, force the map current
VACUUM (ANALYZE) orders;

You can watch the gap directly:

SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';

If last_autovacuum is old and n_dead_tup is climbing, your index-only scans are quietly paying heap fetches. On a high-churn table this is the difference between the plan you tested at 2am on a quiet table and the plan you got under production write load. Tune autovacuum to run more aggressively on tables you depend on for index-only scans:

ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_vacuum_insert_scale_factor = 0.02
);

Lower scale factors make autovacuum fire sooner, keeping more pages all-visible and more of your scans heap-free.

When a covering index pays off, and when it doesn't

The win is largest for queries that return few columns from a wide table and match many rows per lookup. Dashboard aggregates, lookup-by-foreign-key, hot read paths an ORM hits on every request. Those are the ones where the heap fetch dominates.

It's not free. Every INCLUDE column is copied into the index, so the index grows and every write that touches those columns updates the index too. A covering index on a column that changes on every UPDATE adds write amplification you'll feel. Put stable, read-heavy columns in INCLUDE. Don't drag a frequently-mutated blob along for a read you run once a day.

A few rules that hold up:

Cover the columns a hot query reads, not every column in the table.
Keep filter and sort columns in the key; keep pure-output columns in INCLUDE.
Check Heap Fetches: 0 in EXPLAIN (ANALYZE, BUFFERS), not just the scan type.
Keep autovacuum healthy on tables you serve index-only scans from.
Drop the plain index the covering one replaces, so you're not paying for both.

A covering index is the same row of data, stored once more in a shape that answers your read without a second trip. When the read matters and the columns are stable, it's close to free performance. When you skip the BUFFERS check, it's a plan that looks fixed and isn't.

If this was useful

This post pulled from the indexing chapter of the Database Playbook: Choosing the Right Store for Every System You Build. The book covers covering indexes alongside the rest of the read-path toolkit: partial indexes, expression indexes, the visibility map's role in vacuum tuning, and how all of it shifts when you move off Postgres onto a store with a different storage engine. If you've ever shipped an index and watched the query stay slow, the indexing chapter is the longer version of why.