Safdar Wahid

Posted on Apr 21 • Originally published at blog.easecloud.io

Indexing Strategies for Faster Database Queries

#database #postgressql #sql #performance

TL;DR

Indexes = direct lookups — milliseconds vs full table scans (seconds).
B-tree for most queries — Supports =, <, >, BETWEEN, LIKE 'prefix%', ORDER BY.
Index WHERE / JOIN / ORDER BY columns — Otherwise full scan.
Composite index order matters — (a, b, c) works for a, a+b, a+b+c — not b or c alone. Equality first, then range.
Partial indexes — WHERE active = true = smaller, faster.
Covering indexes with INCLUDE — Query never touches table.
Maintenance — Drop unused (idx_scan = 0), remove duplicates, REINDEX, ANALYZE.
Common mistakes — Indexing everything, wrong column order, low-selectivity columns (booleans), functions like YEAR(date) =, skipping maintenance.

Database indexes are the most powerful tool for query optimization. Without indexes, databases scan entire tables to find matching rows. With proper indexes, databases locate data directly. The difference between scanning millions of rows and looking up a handful determines whether queries take seconds or milliseconds.

How Database Indexes Work

Indexes are auxiliary data structures that enable fast data lookup. Think of a book's index: instead of reading every page to find a topic, you look up the topic in the index and go directly to the relevant pages.

Scenario	Operation	Performance Impact
Without index	Full table scan — every row read and evaluated	For a million-row table: reads 1 million rows
With index	B-tree traversal + fetch matching rows	Traverses 3–4 levels, each = one disk read

With an index, databases perform index lookups. Finding matching rows requires only the tree traversal plus fetching the actual rows. For queries returning few rows, this is orders of magnitude faster.

Indexes have costs. They consume storage space. They slow down INSERT, UPDATE, and DELETE operations because indexes must be maintained. Index too much, and write performance suffers.

Query Planner Decision Factors:

Estimates costs for different strategies
Chooses the cheapest execution plan
Sometimes full scans are faster than index lookups — particularly when queries return most rows

Types of Indexes

B-tree indexes suit equality and range queries. They efficiently handle =, <, >, BETWEEN, and LIKE 'prefix%' conditions. Most databases use B-tree as the default index type.

-- B-tree index (default)
CREATE INDEX idx_users_email ON users(email);

-- Supports these queries efficiently
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM users WHERE email LIKE 'user%';
SELECT * FROM users WHERE email BETWEEN 'a%' AND 'm%';

Hash indexes provide fast equality lookups only. They don't support range queries. Some databases use hash indexes for specific use cases.

GIN indexes (Generalized Inverted Index) suit full-text search and array columns. PostgreSQL uses GIN for JSONB containment queries and full-text search.

-- GIN index for JSONB
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- Supports JSONB containment queries
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';

GiST indexes (Generalized Search Tree) suit geometric data and range types. PostGIS uses GiST for spatial queries.

Partial indexes index only a subset of rows — useful when queries consistently filter on a condition.

-- Partial index for active users only
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

Expression indexes index computed expressions rather than raw columns.

-- Index on lowercase email for case-insensitive matching
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Choosing What to Index

Index these:

Columns used in WHERE clauses — If queries filter on a column, it likely needs an index
Columns used in JOIN conditions — Joining tables on unindexed columns requires scanning
Columns used in ORDER BY — Indexes can provide pre-sorted data, eliminating sort operations

Analyze query patterns before indexing. Look at actual queries your application runs. Slow query logs reveal what needs optimization.

Selectivity Guidelines:

Column Type	Selectivity	Index Benefit
Highly selective (many unique values)	High	Benefits greatly from indexing
Low selectivity (few unique values, e.g., boolean)	Low	Rarely benefits from indexing

Primary keys — Automatically indexed
Foreign keys — Often need explicit indexes for join performance

Don't index everything. Each index adds write overhead and storage. Index strategically based on query patterns.

Use EXPLAIN to verify index usage. Query plans show whether indexes are used and how queries execute.

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

-- Look for "Index Scan" vs "Seq Scan" in output

Composite Index Strategy

Composite indexes cover multiple columns. A single index on (a, b, c) can support queries filtering on a, or a and b, or a and b and c.

Column order matters critically. An index on (a, b) supports queries on a alone, but not queries on b alone.

-- Composite index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Efficiently supports:
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2025-01-01';

-- Does NOT efficiently support:
SELECT * FROM orders WHERE order_date > '2025-01-01';  -- Can't use leading column

Put equality conditions before range conditions. In an index (a, b), if queries use a = value AND b > value, this works well. If queries use a > value AND b = value, the index is less effective for b.

Consider covering indexes. Including all columns a query needs in the index avoids fetching the actual table rows. PostgreSQL's INCLUDE clause adds columns without affecting sort order.

-- Covering index (PostgreSQL)
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (total, status);

-- Query can be satisfied entirely from the index
SELECT total, status FROM orders WHERE customer_id = 123;

Index Maintenance and Monitoring

Monitor index usage statistics. Databases track how often indexes are used. Unused indexes waste resources.

-- PostgreSQL: Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

Remove unused indexes. Indexes not used for queries only slow down writes. Periodically audit and drop unused indexes.

Rebuild fragmented indexes. Over time, indexes become fragmented, reducing efficiency. Periodic rebuilding restores performance.

-- PostgreSQL: Rebuild index
REINDEX INDEX idx_orders_customer;

-- MySQL: Optimize table (rebuilds indexes)
OPTIMIZE TABLE orders;

Monitor index size relative to table size. Indexes larger than expected may indicate problems.

Check for duplicate indexes. Multiple indexes on the same columns waste resources. A composite index (a, b) makes a single-column index on a redundant.

Analyze tables regularly. Statistics help query planners make good decisions. Outdated statistics lead to poor query plans.

-- PostgreSQL
ANALYZE orders;

-- MySQL
ANALYZE TABLE orders;

Common Indexing Mistakes

Mistake	Problem	Solution
Indexing every column	Wastes resources, adds write overhead	Index only columns in query conditions
Ignoring column order in composite indexes	Index on `(a, b)` doesn't help `b` alone	Put equality first, range last
Over-indexing for write-heavy workloads	Slow INSERT/UPDATE/DELETE	Balance read vs write performance
Not considering index-only scans	Queries need table access	Design covering indexes
Indexing low-selectivity columns	Boolean index points to half the table	Rarely worth it
Functions preventing index usage	`WHERE YEAR(date) = 2025` can't use index	Rewrite as `date >= '2025-01-01' AND date < '2026-01-01'`
Forgetting index maintenance	Fragmentation reduces efficiency	Plan for regular rebuilding

Database-Specific Considerations

PostgreSQL offers diverse index types — B-tree, Hash, GIN, GiST, BRIN, and more — suited to different data types and access patterns.

MySQL clusters data with the primary key. Secondary indexes include the primary key, affecting size and lookup performance. Choose primary keys carefully.

-- MySQL: Primary key affects all secondary indexes
CREATE TABLE users (
    id BIGINT PRIMARY KEY,  -- Clustered
    email VARCHAR(255),
    INDEX idx_email (email)  -- Includes id implicitly
);

SQL Server distinguishes clustered and non-clustered indexes. One clustered index per table determines physical row order.

Cloud databases may have specific index features. Amazon Aurora, Google Cloud SQL, and Azure SQL have optimization features beyond their base engines.

Conclusion

Database indexing is both science and art. The science: B-trees, composite column order, covering indexes — clear, measurable behavior. The art: choosing which indexes to create, balancing read vs write overhead, knowing when a full table scan is actually faster.

The process:

Start with the slowest queries
Run EXPLAIN ANALYZE
Look for full table scans (Seq Scan in PostgreSQL, ALL in MySQL)
Add indexes for columns in WHERE, JOIN, and ORDER BY clauses
Build composite indexes with equality columns first, range columns last
Monitor index usage — unused indexes waste resources
Rebuild occasionally. Drop duplicates.
Always verify with EXPLAIN that your index is actually being used

An index that isn't used is just wasted storage and slower writes. Done right, indexes transform query performance from unbearable to instant. Done wrong, they add complexity without benefit.

👉 Talk to Our Engineers | See Case Studies

FAQs

1. How do I know if a query needs an index?

Run EXPLAIN ANALYZE before and after adding the index. Look for:

Before: Seq Scan (PostgreSQL) or ALL (MySQL) — database reads every row
After: Index Scan or Index Only Scan — database used your index

If a query is slow and EXPLAIN shows a full table scan on a large table, that's your candidate. Also check the rows estimate vs actual — wildly inaccurate estimates often indicate missing statistics or a need for ANALYZE.

2. What's the difference between a composite index and multiple single-column indexes?

Aspect	Composite Index `(a, b, c)`	Multiple Single-Column Indexes on `a`, `b`, `c`
Structure	One B-tree sorted by `a`, then `b`, then `c`	Three separate B-trees
Supports `a` alone	✅ Yes (leading column)	✅ Yes
Supports `a + b`	✅ Excellent	⚠️ Can combine (bitmap scans) but less efficient
Supports `b` alone	❌ No	✅ Yes
Storage	One index	Three indexes (more storage)
Rule of thumb	Use for columns frequently used together in filters	Use for columns frequently filtered independently

3. How many indexes is too many?

There's no magic number — it depends on your read/write ratio. Every INSERT, UPDATE, and DELETE must update every index on that table. If your write throughput is high, many indexes become expensive.

Signs of over-indexing:

Write queries are slow, but EXPLAIN shows they're waiting on index updates
High idx_blks_written relative to idx_blks_read (PostgreSQL)
Unused indexes (idx_scan = 0)

Start with: indexes for primary keys, foreign keys, and the top 5–10 slowest queries. Add more as needed. Remove unused indexes quarterly. A lean index set beats a bloated one.

DEV Community