Philip McClarence

Posted on Apr 13

PostgreSQL Partial Indexes: Targeted Indexing for Faster Queries

#database #performance #postgres #sql

PostgreSQL Partial Indexes: Targeted Indexing for Faster Queries

A PostgreSQL partial index is an index with a WHERE clause that only indexes rows matching a condition. Instead of indexing every row in the table, you index exactly the subset your queries need.

Despite being one of PostgreSQL's most powerful features, partial indexes are wildly underused. Most developers learn about B-tree indexes and never discover the WHERE clause option.

The Problem with Full Indexes

Consider a 50-million-row orders table. An index on status includes all 50 million entries -- even though 95% have status = 'completed' and your application only queries the 2.5 million active, pending, or failed orders.

Three problems compound:

Size waste: A full index might be 400 MB. The same index on just the non-completed rows: 20 MB. That 380 MB difference matters for buffer cache efficiency.

Write overhead: Every INSERT, UPDATE, DELETE touching an indexed column must update the full index, even for completed orders your queries never touch.

Missing constraints: How do you enforce "email must be unique among active users, but deleted users can share emails"? A regular unique index can't. A partial unique index can.

Finding Partial Index Candidates

Look for columns with heavily skewed value distributions:

-- Columns where one value dominates (>80% of rows)
SELECT
    tablename,
    attname AS column_name,
    n_distinct,
    most_common_vals,
    most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
    AND n_distinct BETWEEN 2 AND 20
    AND most_common_freqs[1] > 0.8
ORDER BY most_common_freqs[1] DESC;

Check that queries actually filter on the minority values:

SELECT
    substring(query, 1, 120) AS query_preview,
    calls,
    mean_exec_time AS avg_ms
FROM pg_stat_statements
WHERE query ILIKE '%WHERE%status%'
    AND calls > 100
ORDER BY total_exec_time DESC
LIMIT 10;

Find oversized indexes:

SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 10 * 1024 * 1024
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

Creating Partial Indexes

Status column pattern

-- Instead of indexing ALL orders:
-- CREATE INDEX idx_orders_status ON orders (status);

-- Index only the rows your queries target:
CREATE INDEX CONCURRENTLY idx_orders_active_status
    ON orders (status)
    WHERE status IN ('pending', 'processing', 'failed');

Soft-delete pattern

-- Index active users only
CREATE INDEX CONCURRENTLY idx_users_active_email
    ON users (email)
    WHERE deleted_at IS NULL;

Partial unique constraint

-- Email unique among active users only
-- Deleted users can have duplicate emails
CREATE UNIQUE INDEX CONCURRENTLY idx_users_unique_active_email
    ON users (email)
    WHERE deleted_at IS NULL;

This works with upsert too:

INSERT INTO users (email, name, deleted_at)
VALUES ('user@example.com', 'Jane Smith', NULL)
ON CONFLICT (email) WHERE deleted_at IS NULL
DO UPDATE SET name = EXCLUDED.name;

The ON CONFLICT clause must exactly match the partial index's WHERE condition. Character-for-character. A mismatch produces a confusing error: "there is no unique or exclusion constraint matching the ON CONFLICT specification."

Boolean flag pattern

-- Index only unprocessed jobs (tiny fraction of the table)
CREATE INDEX CONCURRENTLY idx_jobs_unprocessed
    ON background_jobs (created_at)
    WHERE is_processed = false;

-- Index only active subscriptions
CREATE INDEX CONCURRENTLY idx_subscriptions_active
    ON subscriptions (customer_id, plan_id)
    WHERE is_active = true AND cancelled_at IS NULL;

Verifying the Planner Uses It

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';

You should see Index Scan using idx_orders_active_status. If you see a sequential scan, the query's WHERE clause doesn't match the index condition closely enough for the planner to prove compatibility.

The planner can only use a partial index when the query condition implies the index condition. WHERE status = 'pending' implies WHERE status IN ('pending', 'processing', 'failed'), so it works. WHERE status != 'completed' might not be recognized as equivalent, depending on the planner.

Size Comparison

The benefit scales with selectivity:

Rows Indexed	Index Size Reduction	Write Overhead Reduction
5% of table	~95%	Very significant
20% of table	~80%	Significant
50% of table	~50%	Moderate
80% of table	~20%	Minimal -- use regular index

If the WHERE clause covers most of the table, a regular index is simpler and performs just as well.

Prevention: Building It Into Your Process

When you add a status column, a soft-delete flag, or any column with a skewed distribution, consider the partial index in the same migration. Don't wait until the table has millions of rows.

Use naming conventions that document the scope: idx_orders_active_status or idx_users_pending_email immediately communicate which rows are indexed. Avoid idx_orders_status_partial -- the name should describe what's indexed, not that it's partial.

Watch for the ON CONFLICT gotcha. If you plan to use a partial unique index with upserts, make sure the team knows the WHERE must match exactly.

Test with production-scale data. A partial index on 1,000 rows in dev doesn't prove much. The performance difference becomes dramatic at millions of rows.

DEV Community

PostgreSQL Partial Indexes: Targeted Indexing for Faster Queries

PostgreSQL Partial Indexes: Targeted Indexing for Faster Queries

The Problem with Full Indexes

Finding Partial Index Candidates

Creating Partial Indexes

Status column pattern

Soft-delete pattern

Partial unique constraint

Boolean flag pattern

Verifying the Planner Uses It

Size Comparison

Prevention: Building It Into Your Process

Top comments (0)