Postgres Tells You Your Query Was Slow. Not Which Index Was Wasted.

TL;DR: Postgres has pg_stat_user_indexes. It tells you how many times each index was scanned. It does not tell you whether the slow query you're chasing actually used the index you added for it, or whether you're maintaining indexes the planner never picks. I built a 3-file analyzer — a query wrapper, a logs table, a dashboard — and the first time I ran it against my own production database, 20 of my 51 indexes had never been scanned. 78% of my total index disk was being maintained for nothing.


The Gap in Postgres's Stats

Open pg_stat_user_indexes right now:

SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public';

You'll see one row per index:

  • idx_scan — how many times the index was used
  • idx_tup_read — tuples read via the index
  • idx_tup_fetch — tuples fetched from the heap after the index hit

That's it.

You won't see:

  • Which queries used which index
  • Whether the slow query you wrote a CREATE INDEX for is actually using it
  • How much each unused index is costing you per INSERT
  • Whether the planner picked your composite index over a single-column one (and made the single-column one redundant)
  • Plan diffs when the same query starts going through a different index next week

For a hobby project, fine. For a production database with hot tables, you're guessing.

I'm building a multi-tenant AI sales chatbot. The schema has 51 indexes spread across stores, products, conversations, messages, leads, webhook logs, and the rest of the tables. Some I added intentionally. Some came along with migrations as scaffolding. Some I'd genuinely forgotten about. I had no idea which ones were earning their keep.

pg_stat_user_indexes told me idx_conversations_store_id had been scanned 13 times this month. That number was useless on its own. Was it the chat handler? The merchant dashboard? The webhook? Did the planner pick it because it was the only viable plan, or because a composite index that's already on the table would have done the same job for free? No way to know.

So I built my own observability. Three files. One afternoon.


File 1: The Wrapper (query-logger.ts)

The core idea: don't run queries directly. Run them through a wrapper that captures the plan, measures execution, and logs everything asynchronously.

Why Not pg_stat_statements or auto_explain?

I started with pg_stat_statements and auto_explain. The first gives you per-query stats but not plans — it tells you a query is slow without telling you which index the planner picked. The second writes plans to the Postgres log, which means parsing log files instead of querying a table. I wanted plans + dimensions in one row I could JOIN against pg_stat_user_indexes. Hence the wrapper.

EXPLAIN Without ANALYZE

EXPLAIN (FORMAT JSON) returns the planner's chosen plan without executing the query. Cost is sub-millisecond for most queries. It gives you a tree of nodes — Index Scan, Bitmap Heap Scan, Seq Scan — each tagged with the relation and the index it touches.

// Last checked against Postgres 16 — https://www.postgresql.org/docs/current/sql-explain.html
// SUMMARY is what makes plain EXPLAIN include "Planning Time"; without ANALYZE it's omitted by default.
async function getPlan(sql: string, params: unknown[]) {
  const result = await db.query(`EXPLAIN (FORMAT JSON, SUMMARY) ${sql}`, params);
  return result.rows[0]["QUERY PLAN"][0];
}

function extractIndexes(plan: any): string[] {
  const indexes: string[] = [];
  const walk = (node: any) => {
    if (node["Index Name"]) indexes.push(node["Index Name"]);
    (node["Plans"] ?? []).forEach(walk);
  };
  walk(plan["Plan"]);
  return indexes;
}

function planContainsSeqScan(plan: any): boolean {
  let found = false;
  const walk = (node: any) => {
    if (node["Node Type"] === "Seq Scan") found = true;
    (node["Plans"] ?? []).forEach(walk);
  };
  walk(plan["Plan"]);
  return found;
}

The Wrapper

import { db } from "@/lib/db";
import { supabaseAdmin } from "@/lib/supabase/admin";

interface QueryMeta {
  endpoint: string;
  tableName: string;
  storeId?: string;
}

export async function loggedQuery(
  sql: string,
  params: unknown[],
  meta: QueryMeta
) {
  const plan = await getPlan(sql, params);

  const start = Date.now();
  const result = await db.query(sql, params);
  const duration = Date.now() - start;

  // Fire-and-forget log — never blocks the response
  supabaseAdmin
    .from("query_logs")
    .insert({
      query_hash: hash(sql),
      sql_preview: sql.slice(0, 200),
      table_name: meta.tableName,
      endpoint: meta.endpoint,
      store_id: meta.storeId,
      indexes_used: extractIndexes(plan),
      seq_scan: planContainsSeqScan(plan),
      planning_ms: plan["Planning Time"],
      execution_ms: duration,
      rows_returned: result.rowCount,
    })
    .then(() => {})
    .catch(() => {}); // Silent fail — monitoring never breaks the app

  return result;
}

(hash(sql) is SHA-1 over the SQL string with $1, $2, etc. stripped via regex — "same query, different parameters" collapses into one group.)
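
A minimal sketch of that helper, using Node's built-in crypto module. The exact normalization (collapsing whitespace, lowercasing) is a choice, not a requirement:

import { createHash } from "crypto";

// Strip positional parameters and collapse whitespace so the same query
// shape always hashes to the same value, regardless of its parameters.
export function hash(sql: string): string {
  const normalized = sql
    .replace(/\$\d+/g, "?")
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
  return createHash("sha1").update(normalized).digest("hex");
}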

The One Pattern That Matters: Fire-and-Forget

Same rule as every other observability layer:

.then(() => {}).catch(() => {}); // Silent fail

The log insert is not awaited. If the database is overloaded, if the table is locked behind VACUUM, if the row blows up some constraint — the user-facing query still goes through.

In testing, the log insert takes 8–25ms. The actual query takes 5–800ms. If I awaited the log, a cheap 5ms read would more than double in latency for zero user benefit.

Monitoring must never slow down the thing it's monitoring. That's the only rule that matters here.

There's a second cost worth naming: EXPLAIN plans the query, then the actual db.query plans it again. Two plans per measurement. For most queries it's microseconds. For planner-heavy queries with lots of joins, it adds up. Solution: sample. I run the wrapper on 1 in 10 queries, controlled by an env var. Enough signal, low overhead.
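
The sampling gate itself is tiny. A sketch, assuming an env var named QUERY_LOG_SAMPLE_RATE (the name is illustrative):

const SAMPLE_RATE = Number(process.env.QUERY_LOG_SAMPLE_RATE ?? "0.1");

export async function maybeLoggedQuery(
  sql: string,
  params: unknown[],
  meta: QueryMeta
) {
  // Pay for EXPLAIN + the log insert on only a fraction of traffic
  if (Math.random() < SAMPLE_RATE) {
    return loggedQuery(sql, params, meta);
  }
  return db.query(sql, params);
}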


File 2: The Table (query_logs)

CREATE TABLE query_logs (
  id BIGSERIAL PRIMARY KEY,
  query_hash TEXT NOT NULL,
  sql_preview TEXT,
  table_name TEXT,
  endpoint TEXT,
  store_id UUID,
  indexes_used TEXT[],
  seq_scan BOOLEAN DEFAULT false,
  planning_ms NUMERIC(10,3),
  execution_ms INT,
  rows_returned INT,
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_query_logs_hash ON query_logs(query_hash);
CREATE INDEX idx_query_logs_created ON query_logs(created_at);
CREATE INDEX idx_query_logs_indexes_used ON query_logs USING GIN (indexes_used);

The columns are the dimensions. Each one answers a question Postgres's stats can't:

  • query_hash — group identical queries. The same chat-search query with different store_id is one logical query. Hash the SQL with parameters stripped.
  • indexes_used — array of index names the planner picked. The GIN index lets you ask "show me every query that touched idx_products_store_id" in milliseconds.
  • seq_scan — true if the plan contains a Seq Scan node. Fast filter for "queries that fell off the index entirely."
  • planning_ms + execution_ms — separate them. A query with 50ms planning and 5ms execution is a different problem from 5ms planning and 50ms execution.
  • rows_returned — combined with execution time, surfaces queries where the index scan retrieved 100k rows just to filter down to 12.

One detail that's easy to miss: indexes_used as TEXT[], not TEXT. A single query can scan three indexes (composite + bitmap OR + index-only scan). Store it as a comma-separated string and you'll spend the rest of your life writing LIKE '%idx_name%' queries. Use the array. Use the GIN index. Move on.
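
That "every query that touched idx_products_store_id" question becomes a single array-containment filter, and the containment operator (@>) is exactly what the GIN index accelerates:

SELECT query_hash,
       min(sql_preview)             AS sample_sql,
       count(*)                     AS calls,
       round(avg(execution_ms), 1)  AS avg_ms
FROM query_logs
WHERE indexes_used @> ARRAY['idx_products_store_id']
  AND created_at > now() - interval '30 days'
GROUP BY query_hash
ORDER BY calls DESC;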


File 3: The Dashboard

The killer query — the one that makes this whole exercise worth the afternoon — is the join you've been waiting for: indexes that exist in Postgres but never appear in any logged plan.

WITH plan_indexes AS (
  SELECT DISTINCT unnest(indexes_used) AS index_name
  FROM query_logs
  WHERE created_at > now() - interval '30 days'
)
SELECT
  s.indexrelname AS index_name,
  s.relname AS table_name,
  pg_size_pretty(pg_relation_size(s.indexrelid)) AS size,
  s.idx_scan AS pg_scan_count,
  CASE WHEN p.index_name IS NULL
    THEN 'NEVER PLANNED'
    ELSE 'used'
  END AS status
FROM pg_stat_user_indexes s
LEFT JOIN plan_indexes p ON p.index_name = s.indexrelname
WHERE s.schemaname = 'public'
ORDER BY pg_relation_size(s.indexrelid) DESC;

That's it. That's the question Postgres can't answer alone: which of my indexes does the planner never pick over a real workload window?

The other views I added on top of query_logs:

  • Slowest query groups — query_hash ordered by avg(execution_ms), with indexes_used displayed alongside. Now "this query is slow" becomes "this query is slow and it's using idx_X — is idx_X doing what I thought?" (a sketch of this view follows the list)
  • Queries that fell to Seq Scan — WHERE seq_scan = true, grouped by query_hash. Often the index you added doesn't match the predicate exactly (wrong column order, missing WHERE clause).
  • Index swap candidates — pairs of indexes where one is a strict prefix of another. The shorter one is usually dead weight.
  • Planning time spikes — queries where planning_ms > execution_ms. Almost always a sign the planner is fighting too many indexes on the table.
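
Here's a sketch of the slowest-query-groups view referenced above. The correlated subquery keeps the per-call aggregates honest while still collecting every index the query shape has been seen using:

WITH grouped AS (
  SELECT query_hash,
         min(sql_preview)             AS sample_sql,
         count(*)                     AS calls,
         round(avg(execution_ms), 1)  AS avg_ms
  FROM query_logs
  WHERE created_at > now() - interval '30 days'
  GROUP BY query_hash
)
SELECT g.*,
       (SELECT array_agg(DISTINCT i)
        FROM query_logs l, unnest(l.indexes_used) AS i
        WHERE l.query_hash = g.query_hash) AS indexes_seen
FROM grouped g
ORDER BY g.avg_ms DESC
LIMIT 20;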

The UI is intentionally boring. Stat cards: total queries logged, distinct query shapes, % seq-scan, count of indexes never planned. A table per view. No charts that take longer to read than the underlying number.


What It Found

Real numbers, the day I built the dashboard:

Metric                                                    Value
Total indexes (public schema, excluding primary keys)     51
Indexes with idx_scan = 0                                  20
Indexes with idx_scan between 1 and 50                     8
Total index disk                                           3,720 kB
Disk used by never-scanned indexes                         2,896 kB (78%)

Read that last row again. 78% of the disk Postgres was using for indexes on this database was sitting in b-trees the planner had never once chosen.

Pre-launch is exactly the right time to build this lens. The 78% is real, and the makeup is honest: roughly half of the zero-scan indexes are on a paused workspace whose feature isn't running, four more are on Messenger-related tables still gated behind Meta's app review. Those will earn their keep eventually. The rest — and the boundary cases sitting at 2 or 14 scans — are the actual question. The dashboard's job today isn't to drop anything. It's to give me a queued list to revisit 30 days after the product takes real traffic, when "zero scans" means waste and not "feature hasn't shipped." That list took one afternoon to build. The point isn't the headline number — it's that without the lens, I couldn't have separated dormant from wasted at all.

The 2,552 kB Index Nobody Has Ever Used

The single most surprising finding: idx_products_embedding, the pgvector index for semantic search, is 2,552 kB on its own — 94% of the index disk on the products table, and around two-thirds of the entire database's index disk. The planner has never once chosen it.

Semantic search hasn't run at production volume yet — chat is gated to admins until Meta clears the Messenger app — so this isn't waste, it's a dormant feature. But that's exactly what makes the dashboard valuable. The day customers start chatting at scale, this index either lights up or it doesn't, and I'll know within hours whether semantic search is actually using it or quietly falling back to ILIKE.

The Barely-Used Tier

idx_conversations_store_id at 13 scans, idx_leads_store_id at 14, idx_products_status at 4, idx_webhook_logs_store_id at 2. These are the boundary cases — indexes the planner has picked once or twice and otherwise ignored. They're the exact set worth watching: some will graduate to actively used as traffic grows, others will sit at 14 scans for the next month and join the drop list.

That's the loop. pg_stat_user_indexes tells you how many times each index was scanned. It can't tell you whether each scan was the only way the query could have run, or whether the zero-scan indexes are dormant scaffolding or genuine waste. Without a dashboard like this you can't even ask the question.


5 Things I Learned Building This

1. Index stats don't equal index value

idx_scan = 0 is a candidate, not a verdict. On a mature database, it usually does mean drop. On a young one, it means "the planner has never picked this yet" — could be redundant, could be dormant scaffolding for a feature you haven't shipped. Either way, treat it as a question. And idx_scan = 50,000 doesn't mean an index is earning its keep either; if a sibling index would have been picked instead, the high scan count is just an artifact of which one the planner sorted first. Plans tell the truth. Stats tell you what the planner did, not what it would have done without a given index.

2. EXPLAIN without ANALYZE is your friend

EXPLAIN ANALYZE runs the query. EXPLAIN alone just plans it. The plan is what you usually want. Reach for ANALYZE when you specifically need actual row counts vs. estimates — but for "which index would the planner pick for this," EXPLAIN alone is enough and orders of magnitude cheaper.

3. Sample — don't measure every query

Wrap every query and your monitoring becomes a meaningful fraction of your DB load. For index usage — fundamentally a frequency question — 10% sampling captures ~99% of the signal at 10% of the cost. Confidence intervals on aggregate stats stay tighter than the noise floor you're chasing anyway. Tail-latency hunting is the exception: chasing the slowest 1% of queries needs higher sampling or full coverage. For "which indexes does the planner pick," 10% is plenty.

4. The dimensions are the product

Same lesson as every other observability piece I've written. query_logs only answers questions you thought to ask when you designed the schema. endpoint, table_name, seq_scan, indexes_used as a typed array — each column is a question you'll get to ask cheaply later. Add them when you build, not when you have a problem.

5. Indexes are a cost, not a feature

Every secondary index has to be updated on every INSERT and on every UPDATE that touches its columns. On a hot table with 8 indexes, that's up to 8 b-tree maintenance operations per write. Most teams treat CREATE INDEX as free because the read got faster now. The cost shows up six months later in INSERT latency that nobody traces back to "we added an index for that one report."


What to Add When You're Ready

Plan diff over time. Same query_hash, different indexes_used today vs. last week is a regression alarm. Cardinality changed. Statistics went stale. ANALYZE didn't run.
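
A sketch of that diff, comparing the set of indexes each query shape used this week against the week before:

WITH per_week AS (
  SELECT query_hash,
         (created_at > now() - interval '7 days') AS this_week,
         array_agg(DISTINCT i ORDER BY i)         AS index_set
  FROM query_logs, unnest(indexes_used) AS i
  WHERE created_at > now() - interval '14 days'
  GROUP BY query_hash, (created_at > now() - interval '7 days')
)
SELECT cur.query_hash,
       prev.index_set AS last_week,
       cur.index_set  AS this_week
FROM per_week cur
JOIN per_week prev
  ON prev.query_hash = cur.query_hash
 AND prev.this_week = false
 AND cur.this_week  = true
WHERE cur.index_set IS DISTINCT FROM prev.index_set;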

Cost-of-write per index. Multiply each table's INSERT/UPDATE rate by the number of indexes that touch the modified columns. Indexes on rarely-modified columns are nearly free. Indexes on hot-update columns are budget items.
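
A rough first cut of that number, straight from pg_stat_user_tables. It ignores which columns each index actually covers (and HOT updates, which skip index maintenance), so read it as an upper bound:

SELECT t.relname,
       t.n_tup_ins + t.n_tup_upd                        AS writes,
       count(*) FILTER (WHERE NOT i.indisprimary)       AS secondary_indexes,
       (t.n_tup_ins + t.n_tup_upd)
         * count(*) FILTER (WHERE NOT i.indisprimary)   AS rough_write_cost
FROM pg_stat_user_tables t
JOIN pg_index i ON i.indrelid = t.relid
GROUP BY t.relname, t.n_tup_ins, t.n_tup_upd
ORDER BY rough_write_cost DESC;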

Bloat tracking. pg_stat_user_indexes doesn't tell you when an index is fragmented and needs REINDEX. Add a column tracking the ratio between live tuples and index size — a sudden divergence is almost always bloat.

Seq Scan threshold alerts. A query that flips from Index Scan to Seq Scan in production is usually a missing or stale index. Catch it the day it happens, not the day the table grows enough to make it user-visible.
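
With query_logs in place, the alert query is straightforward: shapes that were index-backed all of last week but hit a Seq Scan in the last day.

SELECT query_hash,
       min(sql_preview) AS sample_sql,
       count(*)         AS seq_scans_today
FROM query_logs
WHERE created_at > now() - interval '1 day'
  AND seq_scan
  AND query_hash IN (
    SELECT query_hash
    FROM query_logs
    WHERE created_at BETWEEN now() - interval '8 days'
                         AND now() - interval '1 day'
    GROUP BY query_hash
    HAVING bool_and(NOT seq_scan)
  )
GROUP BY query_hash;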

Counterfactual planning. Run the same query with SET enable_indexscan = off and compare plan costs. If the cost barely moves, the index is decorative.
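
The mechanics, wrapped in a transaction so the settings never leak to other sessions. Disabling bitmap and index-only scans as well stops the planner from sneaking the index back in through another node type. The query here is illustrative; substitute one of your own:

BEGIN;
-- baseline: what the planner picks today (use a representative value)
EXPLAIN SELECT * FROM products WHERE store_id = '00000000-0000-0000-0000-000000000000';
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
SET LOCAL enable_indexonlyscan = off;
-- counterfactual: the plan and cost with index access taken away
EXPLAIN SELECT * FROM products WHERE store_id = '00000000-0000-0000-0000-000000000000';
ROLLBACK;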


The Bottom Line

Three files. One afternoon. About 350 lines.

A query wrapper that captures plans. A table with the dimensions you need to slice by. A dashboard that joins query plans to index stats — because that join is the question Postgres structurally cannot answer on its own.

You don't need pgBadger or pganalyze (those are great if you have the budget). You need the smallest possible instrument that answers "which of my indexes does the planner never actually pick" — because that's the question your pg_stat_user_indexes view can't.

The first time I ran mine, it told me 20 of 51 indexes had never been scanned, and 78% of my index disk was being maintained for nothing. Some of that is pre-launch noise. Some of it isn't. I now have a queued list of indexes to revisit 30 days after the product takes real traffic — and I have it because I built the lens before I needed it.

Build the lens before you ship. The schema is simplest now, and the question "which of these indexes will the planner actually use?" is one your future self will pay to answer if you don't pay to answer it cheaply today.


I'm building Provia — an AI sales chatbot for Arabic-speaking e-commerce stores. Follow for more posts on building AI products from Gaza on a tight budget.
