OutworkTech

Posted on Jun 2

Database Indexing Mistakes That Kill SaaS Performance at Scale

#database #performance #saas #sql

Your API is fast. Your code is clean. Your architecture looks solid on paper.

Then you hit 500,000 records and everything slows down. Queries that ran in 12ms now take 4 seconds. Your dashboards lag. Users start filing support tickets. Your on-call engineer is staring at a query plan at midnight wondering what went wrong.

Nine times out of ten, the answer is indexing. Not missing indexes — wrong indexes. Indexes that exist but don't help. Indexes that actively hurt write performance without meaningfully improving reads.

This is a breakdown of the most damaging database indexing mistakes in production SaaS systems — and how to fix them before they become incidents.

Mistake 1: Indexing Everything "Just in Case"

The most common mistake isn't under-indexing. It's over-indexing out of anxiety.

New engineers especially fall into this pattern — add an index on every column that appears in a WHERE clause, just to be safe. Seems responsible. It isn't.

Every index you add is a write tax. On every INSERT, UPDATE, and DELETE, PostgreSQL (or MySQL) has to update every index on that table. On a table with 8 indexes, every write touches 8 data structures.

At low volume, this is invisible. At 10,000 writes per minute, it becomes your bottleneck.

The fix:

Audit your indexes regularly. In PostgreSQL:

SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

Any index with idx_scan = 0 or near zero hasn't been used since your last stats reset. That's a candidate for removal — not immediately, but after investigation.

Mistake 2: Not Understanding Index Selectivity

An index on a boolean column (is_active, is_deleted) is almost always useless.

Here's why: selectivity measures how many distinct values exist relative to total rows. A boolean column has two values. If 95% of your rows have is_active = true, an index on that column tells the query planner almost nothing useful. It will often skip the index entirely and do a full table scan — correctly.

-- This index is nearly useless on a table where 95% of rows are active
CREATE INDEX idx_users_is_active ON users(is_active);

-- This is what you probably need instead
CREATE INDEX idx_users_active_created ON users(created_at)
WHERE is_active = true;

The second example is a partial index — it only indexes rows matching the condition. Smaller, faster, and actually selective.

Rule of thumb: If a column has fewer than 10-20 distinct values relative to table size, a plain index on it alone will underperform. Use partial indexes or composite indexes instead.

Mistake 3: Getting Composite Index Column Order Wrong

Composite indexes are powerful and widely misunderstood.

PostgreSQL can use a composite index (a, b, c) for queries filtering on a, or a and b, or a and b and c. It cannot efficiently use it for queries filtering on just b or just c.

-- Index created
CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, created_at);

-- This query uses the index efficiently ✓
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- This query does NOT efficiently use the index ✗
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2025-01-01';

The second query skips user_id — the leading column — so the index is effectively useless for it.

The fix:

Put the most selective column first, and design composite indexes around your actual query patterns — not your table schema. Run EXPLAIN ANALYZE on your real queries before creating indexes.

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123
AND status = 'pending'
ORDER BY created_at DESC;

If you see Seq Scan on a large table, you have an indexing problem. If you see Index Scan with high cost, you have a column order problem.

Mistake 4: Ignoring Index Bloat

Indexes degrade over time. This surprises most engineers who treat indexes as a set-and-forget solution.

In PostgreSQL, when rows are updated or deleted, the old index entries are not immediately removed. They become dead tuples — bloat that the index still has to scan through. On high-churn tables (orders, events, logs, sessions), this bloat accumulates fast.

A table with 1 million live rows can have an index sized for 8 million rows due to bloat. Every query through that index is doing 8x the work it should.

Check your index bloat:

SELECT
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
ORDER BY pg_relation_size(indexrelid) DESC;

The fix:

Schedule regular REINDEX CONCURRENTLY on high-churn tables during low-traffic windows:

REINDEX INDEX CONCURRENTLY idx_orders_user_status_date;

CONCURRENTLY is critical — a standard REINDEX locks the table. On a production SaaS, that lock will cause an incident.

Also make sure autovacuum is properly tuned. The default settings are conservative and often insufficient for high-write SaaS workloads.

Mistake 5: Using Indexes on Low-Cardinality Columns in Multi-Tenant Systems

This one is specific to SaaS and almost always overlooked.

In a multi-tenant system, most queries include a tenant_id filter. The natural instinct is to index tenant_id. But if you have 50 large tenants sharing a table with 10 million rows, tenant_id alone is low-cardinality for those tenants — each one owns 200,000 rows.

An index scan on tenant_id = 'large-tenant-uuid' returns 200,000 rows. PostgreSQL may decide a sequential scan is faster. Your "indexed" query is still slow.

-- Insufficient for large tenants
CREATE INDEX idx_events_tenant ON events(tenant_id);

-- Much better — tenant + time range covers real query patterns
CREATE INDEX idx_events_tenant_created ON events(tenant_id, created_at DESC);

-- Even better for specific query patterns
CREATE INDEX idx_events_tenant_type_created ON events(tenant_id, event_type, created_at DESC)
WHERE event_type IN ('purchase', 'refund', 'signup');

The real fix for multi-tenant systems at serious scale is table partitioning by tenant_id — but that's a separate architectural decision. Composite indexes with time-range columns are the practical first step.

Mistake 6: Not Indexing Foreign Keys

This one causes slow deletes and JOINs that no one can explain.

In PostgreSQL, foreign key columns are not automatically indexed. When you delete a parent row, PostgreSQL has to check all child tables for referencing rows — and without an index on the foreign key column, it does a sequential scan on every child table for every delete.

-- You have this
ALTER TABLE orders ADD CONSTRAINT fk_orders_user
  FOREIGN KEY (user_id) REFERENCES users(id);

-- PostgreSQL does NOT automatically create this
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- You have to create it manually

On small tables this is invisible. On a user_id column in an orders table with 50 million rows, deleting or updating a user triggers a full table scan. That 4-second delete? This is often why.

The fix:

After every foreign key constraint, immediately create an index on the referencing column. Make it a team convention — part of your migration checklist.

Mistake 7: Not Using `EXPLAIN ANALYZE` Before Deploying Index Changes

Most indexing decisions are made by intuition. Intuition is wrong often enough to matter.

EXPLAIN ANALYZE shows you exactly what the query planner is doing — which indexes it uses, which it ignores, how many rows it actually scanned versus estimated, and where the time is actually being spent.

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.tenant_id = 'abc-123'
  AND o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;

What to look for:

Seq Scan on large tables → missing index
Rows Removed by Filter: 89420 → index exists but wrong columns, low selectivity
Buffers: shared hit=0 read=45000 → index is there but cold, or bloated
High actual time on a node despite index use → index bloat or statistics out of date

Run ANALYZE tablename to refresh planner statistics if query plans look wrong.

The Indexing Checklist for SaaS Systems

Before your next migration goes to production:

[ ] Does every foreign key column have an index?
[ ] Are composite index columns ordered by selectivity, not convenience?
[ ] Are boolean or low-cardinality filters using partial indexes instead of full indexes?
[ ] Have you run EXPLAIN ANALYZE on the top 10 slowest queries this week?
[ ] Do you have a process for identifying and removing unused indexes?
[ ] Are high-churn tables scheduled for regular REINDEX CONCURRENTLY?
[ ] Is autovacuum tuned for your actual write volume, not PostgreSQL defaults?
[ ] In multi-tenant tables, do indexes include tenant_id as the leading column?

The Bigger Point

Indexes are not a performance feature you add when things get slow. They are a design decision you make alongside your schema — and revisit as your query patterns evolve.

The teams that handle scale well aren't the ones with the most indexes. They're the ones who understand what each index costs, what it buys, and when to remove the ones that are no longer earning their keep.

A database that's fast at 10,000 rows and fast at 50 million rows doesn't happen by accident. It happens because someone treated query planning as a first-class engineering concern — not an afterthought.

This post is part of OutworkTech's backend engineering series. If you missed the previous posts — How to Version APIs Without Breaking Production and REST vs GraphQL vs gRPC — they cover the API layer that sits on top of the database decisions discussed here.

OutworkTech builds and scales backend systems, APIs, and SaaS infrastructure for companies that need engineering depth without the overhead. If your database is becoming a bottleneck — let's talk.

Top comments (1)

Echo • Jun 2

This is the stuff nobody catches in code review. The classic "we have an index on (a, b) but the query filters on b alone" gotcha scales into a 30s page-load before anyone notices. Adding EXPLAIN to CI for the top 5 queries per endpoint is the cheap fix.