Philip McClarence

Finding the Missing Indexes Your Queries Are Begging For

The most common source of missing indexes in production is not carelessness -- it is the development/production gap. A developer adds a new query, tests it against a development database with 1,000 rows where the sequential scan completes in 2ms, and ships it to production where the same table has 50 million rows. The sequential scan now takes 3 seconds. No errors, no crashes, just a slow endpoint that users tolerate until they stop tolerating it.

Finding that one missing index is easy once someone complains. Finding ALL the missing indexes across your entire database -- before users notice -- is the real challenge.

Identifying Tables That Need Indexes

Start by finding tables with disproportionate sequential scan activity:

SELECT
    schemaname, relname AS table_name,
    seq_scan, idx_scan,
    CASE WHEN (seq_scan + idx_scan) > 0
        THEN round(100.0 * seq_scan / (seq_scan + idx_scan), 1)
        ELSE 0
    END AS seq_scan_pct,
    seq_tup_read,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 100
  AND pg_relation_size(relid) > 10 * 1024 * 1024  -- tables > 10MB
ORDER BY seq_tup_read DESC
LIMIT 15;

Key interpretation points:

  • Table size matters. An 8KB table with 95% sequential scans is fine -- PostgreSQL correctly chooses a seq scan when the whole table fits in a few pages. A 500MB table with the same ratio almost certainly needs an index.
  • Use seq_tup_read, not just seq_scan. A table might have few scan events, but each one reads millions of rows. Sort by tuples read to find the highest I/O impact.
  • Rule of thumb: any table over 100MB with more than 10% sequential scan access warrants investigation.
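That rule of thumb can be turned into a quick filter query. This is a sketch using the same illustrative thresholds (100MB, 10%); tune both for your workload:

```sql
SELECT
    schemaname, relname AS table_name,
    seq_scan, idx_scan,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE pg_relation_size(relid) > 100 * 1024 * 1024          -- tables > 100MB
  AND (seq_scan + idx_scan) > 0
  AND 100.0 * seq_scan / (seq_scan + idx_scan) > 10        -- > 10% seq scans
ORDER BY seq_tup_read DESC;
```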

Finding WHICH Columns to Index

Knowing a table is being sequentially scanned is only half the problem. You need to identify the filter columns from the actual queries. Check pg_stat_statements:

SELECT query, calls, mean_exec_time, total_exec_time, rows
FROM pg_stat_statements
WHERE query ILIKE '%orders%'
ORDER BY total_exec_time DESC
LIMIT 10;

Then confirm with EXPLAIN:

EXPLAIN
SELECT * FROM orders
WHERE customer_id = 12345 AND status = 'pending'
ORDER BY created_at DESC LIMIT 10;

The output reveals which columns appear in Filter: conditions and which drive the Sort Key: -- these are your index candidates.
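For the query above on an unindexed table, the plan might look roughly like this (node shapes are real PostgreSQL output; the costs and row counts are purely illustrative):

```
Limit  (cost=152345.10..152345.13 rows=10 width=88)
  ->  Sort  (cost=152345.10..152370.35 rows=10100 width=88)
        Sort Key: created_at DESC
        ->  Seq Scan on orders  (cost=0.00..151672.00 rows=10100 width=88)
              Filter: ((customer_id = 12345) AND (status = 'pending'::text))
```

The Filter: line names the equality columns and the Sort Key: line names the ordering column -- together they suggest the composite index built in the next section.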

From Recommendation to Verification

Step 1: Create the index based on query pattern analysis

-- Column order matters: highest selectivity first,
-- equality filters before range, sort columns last
CREATE INDEX CONCURRENTLY idx_orders_customer_status_created
ON orders (customer_id, status, created_at DESC);

Why this column order? customer_id leads because it has the highest selectivity (each customer has few orders relative to the total). status is second for the equality filter. created_at DESC is last to support the ORDER BY without an additional sort step.
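You can sanity-check the selectivity assumption against the planner's own statistics. This sketch reads pg_stats for the three candidate columns; a larger n_distinct generally means an equality filter on that column is more selective (negative values are a fraction of the row count, e.g. -1 means all-distinct):

```sql
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'orders'
  AND attname IN ('customer_id', 'status', 'created_at');
```

If status turned out to be more selective than customer_id in your data, you would swap the first two columns.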

Step 2: Verify the planner uses it

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE customer_id = 12345 AND status = 'pending'
ORDER BY created_at DESC LIMIT 10;

Look for Index Scan using idx_orders_customer_status_created. If you still see a sequential scan, update statistics:

ANALYZE orders;

Step 3: Monitor over 24 hours

SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'orders'
  AND indexrelname = 'idx_orders_customer_status_created';

If idx_scan is climbing and the table's seq_scan count has leveled off, the index is working. If idx_scan stays at zero, check for type mismatches between query parameters and indexed columns.
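A common mismatch is comparing a bigint column against a numeric parameter (some drivers bind all numbers as numeric). The planner then casts the column to numeric, which disqualifies the bigint index. A sketch, assuming customer_id is bigint:

```sql
-- bigint column compared to a numeric value: the column is cast, index skipped
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345::numeric;

-- matching types: the index qualifies
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
```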

Preventing Missing Indexes

The best time to catch missing indexes is immediately after deployment:

  • Post-release check: After every release, compare pg_stat_user_tables for tables where seq_scan jumped relative to pre-deploy baseline.
  • Continuous monitoring: Track sequential scan ratios over time, not just during performance incidents.
  • Development guidelines: Require EXPLAIN output review for any new query on tables over 10 million rows. Catch the problem before it reaches production.
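The post-release check can be a two-step snapshot comparison. A minimal sketch, assuming a scratch table named seq_scan_baseline and an arbitrary threshold of 1,000 new scans:

```sql
-- Before the deploy: snapshot the cumulative seq_scan counters
DROP TABLE IF EXISTS seq_scan_baseline;
CREATE TABLE seq_scan_baseline AS
SELECT schemaname, relname, seq_scan
FROM pg_stat_user_tables;

-- After the deploy: find tables whose seq_scan count jumped
SELECT c.schemaname, c.relname,
       c.seq_scan - b.seq_scan AS new_seq_scans
FROM pg_stat_user_tables c
JOIN seq_scan_baseline b USING (schemaname, relname)
WHERE c.seq_scan - b.seq_scan > 1000
ORDER BY new_seq_scans DESC;
```

Counters in pg_stat_user_tables are cumulative since the last statistics reset, so differencing against a snapshot isolates just the post-deploy activity.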

The development/production gap will always exist -- dev databases are small by necessity. The mitigation is catching missing indexes quickly after deployment, not hoping developers predict production query plans.

Full article with monitoring screenshots: Finding the Missing Indexes Your Queries Are Begging For
