RAXXO Studios

Posted on • Originally published at raxxo.shop

The 7 Postgres Indexes That Took My API From 400ms to 40ms

  • I took one of my APIs from 400ms p95 to 40ms p95 by fixing 7 missing or wrong Postgres indexes.

  • Most of my slow queries were not slow because of bad SQL, they were slow because Postgres had to scan the whole table.

  • Partial indexes and covering indexes did more for me than plain B-tree indexes on primary columns.

  • EXPLAIN ANALYZE and pg_stat_statements are the two tools that find the real bottlenecks.

The API in question powers my analytics dashboard. Nothing fancy. A few endpoints that read from a Postgres database with around 8 million rows spread across 14 tables. It had gotten slow. 400ms p95 on the hot endpoints, 900ms p99, timeouts on the reports page when I filtered by date range. I spent a Saturday looking at indexes instead of rewriting code, and that Saturday is the reason the dashboard now answers in 40ms. This post is what I changed, in order, with the numbers before and after. Postgres indexes are the lever I keep underestimating, and I want to stop doing that.

I am going to assume you know roughly what an index is. What I am not assuming is that you have seen the specific index shapes that saved me the most time. Partial indexes, covering indexes, and expression indexes do not get enough airtime in the tutorials I learned from. They should.

The 400ms Baseline: What My Queries Actually Looked Like

Before touching anything, I turned on pg_stat_statements and let it run for a full day. This is step zero. If you do not know which queries are slow, you will optimize the wrong things. I have done it. I have spent an hour tuning a query that accounts for 0.4% of total query time while the one eating 41% of the database kept running untouched.
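
One setup note: CREATE EXTENSION alone is not enough here, because pg_stat_statements has to be loaded at server start. A minimal config sketch (values illustrative, and the change requires a restart):

```ini
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'   # needs a restart
pg_stat_statements.max = 5000     # distinct statements to track
pg_stat_statements.track = top    # only top-level statements
```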


```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT
  substring(query, 1, 80) AS query,
  calls,
  round(total_exec_time::numeric, 1) AS total_ms,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 1) AS pct
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

The results were humbling. Four queries were responsible for 78% of total execution time. Every one of them was doing a sequential scan on tables with millions of rows. I ran EXPLAIN (ANALYZE, BUFFERS) on each and confirmed it. Seq Scan, Seq Scan, Seq Scan, and a Nested Loop with a Seq Scan on the inner side.
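
For reference, the check looks like this; the plan lines are paraphrased, and your row counts will obviously differ:

```sql
-- BUFFERS shows how many 8kB pages the query touched,
-- which is usually the real story behind a slow query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 50;

-- Pre-index, the plan contained lines of this shape:
--   Seq Scan on events  (cost=... rows=...) (actual time=... rows=...)
--     Filter: (user_id = 42)
--     Buffers: shared hit=... read=...
```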

This is the honest baseline I started from:

| Endpoint | p50 | p95 | p99 | Worst query |
|---|---|---|---|---|
| GET /events | 120ms | 410ms | 920ms | filter by user_id + created_at |
| GET /reports | 280ms | 890ms | 2100ms | aggregate by date range |
| GET /users/:id/summary | 60ms | 220ms | 480ms | join users to events |
| GET /search | 180ms | 520ms | 1400ms | ILIKE on title |

I want to be clear that this was not a case of Postgres being slow. Postgres was doing exactly what I told it to do. I had not told it how to find the rows I cared about without reading every row, so it read every row. That is on me.

The 7 Indexes That Actually Moved the Needle

These are the seven indexes I added, in the order I added them, with the before and after for the query that changed most.

1. Composite index on (user_id, created_at DESC)

The events table is ordered by write time. My queries are ordered by (user_id, created_at DESC) because I always want a user's newest events first. A single-column index on user_id is not enough. Postgres can find the user's rows but then has to sort them by created_at, which on a busy user is a lot of rows.


```sql
CREATE INDEX CONCURRENTLY idx_events_user_created
ON events (user_id, created_at DESC);
```

Before: 410ms p95. After: 24ms p95. One index, 17x faster, and this was the single biggest win of the day. CONCURRENTLY matters. On a table I cannot take offline, a non-concurrent CREATE INDEX holds a lock that blocks writes. Always concurrent for production tables.

2. Partial index for unread events

A specific subset of events is "unread": those where read_at IS NULL. That subset is less than 2% of total rows. A full index would store an entry for every row in the table. A partial index only stores the rows that match its WHERE predicate, so it is smaller, faster to read, and faster to maintain.


```sql
CREATE INDEX CONCURRENTLY idx_events_unread
ON events (user_id, created_at DESC)
WHERE read_at IS NULL;
```

Before: the notification badge count query took 180ms. After: 3ms. The index is 94% smaller than the full equivalent, which also means it fits in cache and stays there.

3. Covering index for the hottest list query

This is the one I wish I had learned five years earlier. When Postgres uses an index to find rows, it still has to fetch the row data from the heap unless all the columns you need are inside the index itself. A covering index adds those columns via INCLUDE, so the heap fetch never happens. This is called an index-only scan, and it is as fast as Postgres gets.


```sql
CREATE INDEX CONCURRENTLY idx_events_list_covering
ON events (user_id, created_at DESC)
INCLUDE (title, type, read_at);
```

EXPLAIN went from Index Scan + Heap Fetch to Index Only Scan. The list endpoint that renders the dashboard feed dropped from 96ms to 11ms. The trade-off is disk space. My covering index is 2.3x the size of the plain version. On an 8 million row table, that is about 240 MB. Worth it for a hot endpoint.

4. Expression index for case-insensitive search

My search endpoint was doing LOWER(title) LIKE '%term%'. A pattern with a leading wildcard cannot use a B-tree index, so it is a sequential scan every time. For the prefix case (LOWER(title) LIKE 'term%'), I can use an expression index on LOWER(title) with the right operator class.


```sql
CREATE INDEX CONCURRENTLY idx_events_title_lower
ON events (LOWER(title) text_pattern_ops);
```

Searches that start from the beginning of the title dropped from 520ms to 14ms. For full substring search I eventually added pg_trgm and a GIN index, but that is another post. Start with the cheap win. Most of my search queries were prefix matches anyway.
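
For completeness, the trigram setup looks roughly like this; gin_trgm_ops comes from the pg_trgm extension and handles the '%term%' case a B-tree never will:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index: supports LIKE/ILIKE with leading wildcards.
CREATE INDEX CONCURRENTLY idx_events_title_trgm
ON events USING GIN (title gin_trgm_ops);
```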

5. GIN index on JSONB metadata

My events have a JSONB metadata column with things like device type, country, referrer. I was filtering on metadata ->> 'country' = 'DE' all over the place. Postgres supports GIN indexes on JSONB, and specifically the jsonb_path_ops operator class, which is smaller and faster than the default for containment checks. The catch is that the index only serves the @> containment operator, so those filters had to become metadata @> '{"country": "DE"}'.


```sql
CREATE INDEX CONCURRENTLY idx_events_metadata_gin
ON events USING GIN (metadata jsonb_path_ops);
```

Country filter queries dropped from 320ms to 18ms. The index is larger than a B-tree (28% of the table size in my case), but it covers containment queries on any key inside the JSONB object. One index, many queries, all fast.
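
The operator choice is the whole game here, so a concrete before/after pair:

```sql
-- Does NOT use the GIN index: ->> extracts text, then compares.
SELECT count(*) FROM events WHERE metadata ->> 'country' = 'DE';

-- Can use idx_events_metadata_gin: containment on the JSONB itself.
SELECT count(*) FROM events WHERE metadata @> '{"country": "DE"}';
```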

6. BRIN index for time-series scans

This one is niche but saved me on the reports endpoint. When your data is naturally ordered on disk by a column (in my case, created_at, because I never update old events), a BRIN index stores a tiny min/max summary per range of pages instead of one entry per row. It can be about 1000x smaller than an equivalent B-tree. For range scans across millions of rows, it is often fast enough and cheap enough to leave on every table.


```sql
CREATE INDEX CONCURRENTLY idx_events_created_brin
ON events USING BRIN (created_at) WITH (pages_per_range = 32);
```

The report aggregate that scanned 90 days of data dropped from 890ms to 62ms. The index is 4 MB on an 8 million row table. Four megabytes.
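
One check worth doing before trusting BRIN: it only prunes well when the physical row order tracks the column. The planner statistics expose this as correlation:

```sql
-- correlation near 1.0 (or -1.0) means rows are laid out on disk
-- in created_at order, which is what lets BRIN skip page ranges.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';
```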

7. Foreign key index on the join side

I had a foreign key from events.organization_id to organizations.id but no index on events.organization_id. Postgres does not create one for you when you add a foreign key. The join from organizations to events was a sequential scan on the events table every time.


```sql
CREATE INDEX CONCURRENTLY idx_events_org_id
ON events (organization_id);
```

The organization summary endpoint dropped from 480ms to 9ms. I went back and audited every foreign key in my schema. I was missing three more. I added them all.
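
The audit itself can be done from the catalogs. A rough sketch (it checks column membership, not leading-column order, so treat hits as candidates to review):

```sql
-- Foreign keys whose referencing columns are not part of any
-- index on the referencing table. Approximate by design.
SELECT c.conrelid::regclass AS "table", c.conname AS fk
FROM pg_constraint c
WHERE c.contype = 'f'
  AND NOT EXISTS (
    SELECT 1
    FROM pg_index i
    WHERE i.indrelid = c.conrelid
      AND c.conkey <@ i.indkey::int2[]
  );
```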

Where I Got Indexes Wrong

I also made mistakes. A few of the indexes I added first turned out to be useless or actively harmful, and I want to save you the debugging time.

Indexes I never used. My first instinct was to index every column that appeared in a WHERE clause. Postgres has pg_stat_user_indexes to tell you which indexes are actually being read. After a week of production traffic, I queried it and found three indexes with zero scans. I dropped them.


```sql
SELECT
  schemaname, relname, indexrelname,
  idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Every unused index is pure cost. It slows down writes, bloats backups, competes for cache. If it is not being scanned, drop it.

Indexes on low-cardinality columns. I indexed a status column with 4 distinct values. Postgres looked at the stats and chose a sequential scan anyway, because fetching 40% of the rows through an index is slower than just reading the table. The correct fix was a partial index, like the unread one above. Whenever a column has few distinct values but you mostly query one specific value, a partial index wins over a full index.
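
To make that concrete with a sketch (the column and value names here are illustrative): if queries almost always want the one rare status, index only that slice.

```sql
-- Hypothetical: 'pending' is the rare value everyone queries.
-- The index stays tiny and the other statuses cost nothing.
CREATE INDEX CONCURRENTLY idx_events_pending
ON events (created_at DESC)
WHERE status = 'pending';
```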

Indexing before migration finishes. I once added an index during a schema migration on a table that was being heavily written to. Without CONCURRENTLY, the migration held an ACCESS EXCLUSIVE lock for 90 seconds and blocked production writes the entire time. That incident is why every index I create now has CONCURRENTLY in the statement, no exceptions, even in development, because muscle memory is how you avoid incidents.
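
A related failure mode worth knowing: if a CONCURRENTLY build fails or is cancelled, Postgres leaves behind an invalid index that the planner never uses but writes still have to maintain. It can be found like this:

```sql
-- Invalid indexes left by failed CONCURRENTLY builds.
-- Skipped by the planner, but still updated on every write.
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
```

The fix is DROP INDEX CONCURRENTLY followed by a fresh build.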

Over-covering indexes. My second attempt at the covering index on events had INCLUDE (title, type, read_at, metadata, raw_payload). Postgres refused to do an index-only scan anyway because raw_payload was too large. Covering indexes make sense when the included columns are small and hot. Large blobs belong in the heap.

Monitoring: How I Know an Index Is Earning Its Keep

After the dust settled, I set up three queries I now run weekly. Together they tell me whether my indexes are doing useful work.

Index hit ratio. The ratio of index reads coming from cache vs disk. Below 99% on a busy table is a sign the working set does not fit in memory or the indexes are too big.


```sql
SELECT
  relname,
  idx_blks_read,
  idx_blks_hit,
  round(100.0 * idx_blks_hit / nullif(idx_blks_hit + idx_blks_read, 0), 2) AS hit_pct
FROM pg_statio_user_indexes
WHERE idx_blks_read > 0
ORDER BY idx_blks_read DESC
LIMIT 20;
```

Bloat. Indexes accumulate dead tuples over time, especially on tables with lots of updates. The pgstattuple extension gives you a real number. Above 30% bloat, I run REINDEX CONCURRENTLY (available in Postgres 12+). Before 12 I used pg_repack.
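
For B-tree indexes, pgstatindex from the pgstattuple extension gives the number directly; avg_leaf_density is roughly the inverse of bloat:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density of 90 means leaf pages are ~90% full;
-- bloat is roughly 100 minus this number.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_events_user_created');
```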

Unused indexes. The same pg_stat_user_indexes query from above, run on a fresh snapshot once a month. Anything at zero scans gets dropped.

None of this is exotic. All of it is built into Postgres. Most of it I learned by reading the docs after the fact, instead of before. If you do this before you go to production, you are ahead of where I was.

Bottom Line

Indexes are the highest leverage work I have ever done on a database and the one I keep putting off because it feels less fun than writing new features. That is a trap. A Saturday on indexes gave me a 10x speedup on the user-visible parts of my dashboard and retired three tickets about slowness that had been open for weeks. The tools to find the right indexes are already in your database. pg_stat_statements for the hottest queries. EXPLAIN ANALYZE for the execution plan. pg_stat_user_indexes for usage. If you have not opened any of those this month, open one today. Your p95 will thank you.
