PostgreSQL Covering Indexes: Eliminate Heap Fetches with INCLUDE

#database #performance #postgres #sql

PostgreSQL Covering Indexes: Eliminate Heap Fetches with INCLUDE

You have an index on customer_id. Your query filters by customer_id and selects customer_name and customer_email. PostgreSQL finds the matching rows in the index (fast), then fetches each row from the heap table to get the name and email (slow). Those heap fetches are random I/O operations scattered across the table. On a 100M-row table returning 1,000 rows, that is 1,000 random reads -- and they dominate the query execution time. A covering index eliminates them entirely.

How Index Lookups Actually Work

Every standard B-tree index lookup is two steps:

Index scan: find matching row pointers (TIDs) in the index -- fast, sequential access
Heap fetch: retrieve actual row data from the heap table -- slow, random I/O

The heap fetch is the bottleneck. For single-row lookups it's barely noticeable. For queries returning hundreds or thousands of rows, it dominates execution time.

An index-only scan skips step 2 entirely. If all columns the query needs exist in the index, PostgreSQL reads everything from the index. No heap access, no random I/O.

The INCLUDE Clause (PostgreSQL 11+)

Before PostgreSQL 11, covering indexes required composite indexes on all columns: CREATE INDEX ON customers (customer_id, customer_name, customer_email). This works but has a cost -- the index maintains sort order on all three columns, wasting CPU on sorts for columns nobody searches or orders by.

INCLUDE adds columns to the index leaf pages without including them in the sort key:

CREATE INDEX CONCURRENTLY idx_customers_covering
    ON customers (customer_id)
    INCLUDE (customer_name, customer_email);

The key column (customer_id) is the search key for WHERE, ORDER BY, and joins. The included columns are stored alongside but not sorted -- they exist solely to enable index-only scans.

Detecting Covering Index Opportunities

Look for index scans with heap fetches in EXPLAIN output:

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_name, customer_email
FROM customers
WHERE customer_id BETWEEN 1000 AND 2000;

Watch for:

Index Scan using idx_customers_id -- the heap was visited for each row
Heap Fetches: 1000 -- 1,000 trips to the heap table
High Buffers: shared hit=... from random heap access

The goal: Index Only Scan with Heap Fetches: 0.

Find candidates system-wide:

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan > 100
ORDER BY idx_tup_fetch DESC
LIMIT 20;

Tables with high idx_tup_fetch are performing many heap fetches. Cross-reference with your most frequent queries.

Practical Examples

Dashboard Query

A reporting dashboard showing recent orders:

-- The query
SELECT order_id, customer_name, order_total, order_status
FROM orders
WHERE created_at >= now() - interval '7 days'
ORDER BY created_at DESC
LIMIT 50;

-- The covering index
CREATE INDEX CONCURRENTLY idx_orders_recent_covering
    ON orders (created_at DESC)
    INCLUDE (order_id, customer_name, order_total, order_status);

This single index handles the WHERE filter, ORDER BY, LIMIT, and returns all selected columns -- entirely from the index. No heap access.

INCLUDE vs Composite

-- INCLUDE: customer_name is retrieved but never searched
CREATE INDEX ON orders (order_date)
    INCLUDE (customer_name, order_total);

-- Composite: both columns are used in WHERE or ORDER BY
CREATE INDEX ON orders (customer_id, order_date);

Use INCLUDE when extra columns are only in the SELECT list. Use composite when columns appear in WHERE or ORDER BY.

The Vacuum Dependency

Index-only scans require pages to be marked "all-visible" in the visibility map. PostgreSQL can skip the heap only for these pages. Vacuum maintains the visibility map.

If vacuum falls behind:

Pages are not marked all-visible
The planner falls back to regular index scans with heap fetches
Your covering index provides zero benefit

-- Check visibility map health
SELECT
    relname,
    n_live_tup,
    n_dead_tup,
    last_autovacuum,
    last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'customers';

If last_autovacuum is stale and dead tuples are accumulating, tune autovacuum to run more frequently on that table. A covering index without healthy vacuum is a wasted investment.

Prevention

When writing a new query that selects specific columns from an indexed table, ask: "Can I add these columns to the index with INCLUDE to eliminate heap fetches?"

Keep covering indexes focused. Don't add every column -- include only the columns your most frequent queries select. Different queries needing different columns should get targeted covering indexes, not one massive index.

Monitor heap fetch counts over time. A query showing Heap Fetches: 0 today may regress if vacuum falls behind or if a new column is added to the SELECT list. Track idx_tup_fetch on important indexes -- a sudden increase signals regression.

Review covering indexes when query patterns change. Application refactors that add columns to SELECT clauses break index-only scans silently -- the query works, but performance drops.

Originally published at mydba.dev/blog/postgres-covering-index