DEV Community

Cover image for Indexing Strategies for Faster Database Queries
Safdar Wahid
Safdar Wahid

Posted on • Originally published at blog.easecloud.io

Indexing Strategies for Faster Database Queries

TL;DR

  • Indexes = direct lookups — milliseconds vs full table scans (seconds).
  • B-tree for most queries — Supports =, <, >, BETWEEN, LIKE 'prefix%', ORDER BY.
  • Index WHERE / JOIN / ORDER BY columns — Otherwise full scan.
  • Composite index order matters(a, b, c) works for a, a+b, a+b+c — not b or c alone. Equality first, then range.
  • Partial indexesWHERE active = true = smaller, faster.
  • Covering indexes with INCLUDE — Query never touches table.
  • Maintenance — Drop unused (idx_scan = 0), remove duplicates, REINDEX, ANALYZE.
  • Common mistakes — Indexing everything, wrong column order, low-selectivity columns (booleans), functions like YEAR(date) =, skipping maintenance.

Database indexes are the most powerful tool for query optimization. Without indexes, databases scan entire tables to find matching rows. With proper indexes, databases locate data directly. The difference between scanning millions of rows and looking up a handful determines whether queries take seconds or milliseconds.


How Database Indexes Work

Indexes are auxiliary data structures that enable fast data lookup. Think of a book's index: instead of reading every page to find a topic, you look up the topic in the index and go directly to the relevant pages.

Scenario Operation Performance Impact
Without index Full table scan — every row read and evaluated For a million-row table: reads 1 million rows
With index B-tree traversal + fetch matching rows Traverses 3–4 levels, each = one disk read

With an index, databases perform index lookups. Finding matching rows requires only the tree traversal plus fetching the actual rows. For queries returning few rows, this is orders of magnitude faster.

Full table scan: 10,000ms reads 1M rows. Index scan: 10ms reads 3-4 levels. Indexes turn seconds into milliseconds.

Indexes have costs. They consume storage space. They slow down INSERT, UPDATE, and DELETE operations because indexes must be maintained. Index too much, and write performance suffers.

Query Planner Decision Factors:

  • Estimates costs for different strategies
  • Chooses the cheapest execution plan
  • Sometimes full scans are faster than index lookups — particularly when queries return most rows

Types of Indexes

B-tree indexes suit equality and range queries. They efficiently handle =, <, >, BETWEEN, and LIKE 'prefix%' conditions. Most databases use B-tree as the default index type.

-- B-tree index (default)
CREATE INDEX idx_users_email ON users(email);

-- Supports these queries efficiently
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM users WHERE email LIKE 'user%';
SELECT * FROM users WHERE email BETWEEN 'a%' AND 'm%';
Enter fullscreen mode Exit fullscreen mode

Hash indexes provide fast equality lookups only. They don't support range queries. Some databases use hash indexes for specific use cases.

GIN indexes (Generalized Inverted Index) suit full-text search and array columns. PostgreSQL uses GIN for JSONB containment queries and full-text search.

-- GIN index for JSONB
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- Supports JSONB containment queries
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';
Enter fullscreen mode Exit fullscreen mode

GiST indexes (Generalized Search Tree) suit geometric data and range types. PostGIS uses GiST for spatial queries.

Partial indexes index only a subset of rows — useful when queries consistently filter on a condition.

-- Partial index for active users only
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
Enter fullscreen mode Exit fullscreen mode

Expression indexes index computed expressions rather than raw columns.

-- Index on lowercase email for case-insensitive matching
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
Enter fullscreen mode Exit fullscreen mode

Choosing What to Index

Index these:

  • Columns used in WHERE clauses — If queries filter on a column, it likely needs an index
  • Columns used in JOIN conditions — Joining tables on unindexed columns requires scanning
  • Columns used in ORDER BY — Indexes can provide pre-sorted data, eliminating sort operations

Analyze query patterns before indexing. Look at actual queries your application runs. Slow query logs reveal what needs optimization.

Selectivity Guidelines:

Column Type Selectivity Index Benefit
Highly selective (many unique values) High Benefits greatly from indexing
Low selectivity (few unique values, e.g., boolean) Low Rarely benefits from indexing
  • Primary keys — Automatically indexed
  • Foreign keys — Often need explicit indexes for join performance

Don't index everything. Each index adds write overhead and storage. Index strategically based on query patterns.

Use EXPLAIN to verify index usage. Query plans show whether indexes are used and how queries execute.

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

-- Look for "Index Scan" vs "Seq Scan" in output
Enter fullscreen mode Exit fullscreen mode

Composite Index Strategy

Composite indexes cover multiple columns. A single index on (a, b, c) can support queries filtering on a, or a and b, or a and b and c.

Column order matters critically. An index on (a, b) supports queries on a alone, but not queries on b alone.

-- Composite index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Efficiently supports:
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2025-01-01';

-- Does NOT efficiently support:
SELECT * FROM orders WHERE order_date > '2025-01-01';  -- Can't use leading column
Enter fullscreen mode Exit fullscreen mode

Put equality conditions before range conditions. In an index (a, b), if queries use a = value AND b > value, this works well. If queries use a > value AND b = value, the index is less effective for b.

Consider covering indexes. Including all columns a query needs in the index avoids fetching the actual table rows. PostgreSQL's INCLUDE clause adds columns without affecting sort order.

-- Covering index (PostgreSQL)
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (total, status);

-- Query can be satisfied entirely from the index
SELECT total, status FROM orders WHERE customer_id = 123;
Enter fullscreen mode Exit fullscreen mode

Index Maintenance and Monitoring

Monitor index usage statistics. Databases track how often indexes are used. Unused indexes waste resources.

-- PostgreSQL: Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
Enter fullscreen mode Exit fullscreen mode

Remove unused indexes. Indexes not used for queries only slow down writes. Periodically audit and drop unused indexes.

Rebuild fragmented indexes. Over time, indexes become fragmented, reducing efficiency. Periodic rebuilding restores performance.

Index fragmentation: 40% before causes bloat and slow scans. REINDEX restores 0% fragmentation.

-- PostgreSQL: Rebuild index
REINDEX INDEX idx_orders_customer;

-- MySQL: Optimize table (rebuilds indexes)
OPTIMIZE TABLE orders;
Enter fullscreen mode Exit fullscreen mode

Monitor index size relative to table size. Indexes larger than expected may indicate problems.

Check for duplicate indexes. Multiple indexes on the same columns waste resources. A composite index (a, b) makes a single-column index on a redundant.

Analyze tables regularly. Statistics help query planners make good decisions. Outdated statistics lead to poor query plans.

-- PostgreSQL
ANALYZE orders;

-- MySQL
ANALYZE TABLE orders;
Enter fullscreen mode Exit fullscreen mode

Common Indexing Mistakes

Mistake Problem Solution
Indexing every column Wastes resources, adds write overhead Index only columns in query conditions
Ignoring column order in composite indexes Index on (a, b) doesn't help b alone Put equality first, range last
Over-indexing for write-heavy workloads Slow INSERT/UPDATE/DELETE Balance read vs write performance
Not considering index-only scans Queries need table access Design covering indexes
Indexing low-selectivity columns Boolean index points to half the table Rarely worth it
Functions preventing index usage WHERE YEAR(date) = 2025 can't use index Rewrite as date >= '2025-01-01' AND date < '2026-01-01'
Forgetting index maintenance Fragmentation reduces efficiency Plan for regular rebuilding

Database-Specific Considerations

PostgreSQL offers diverse index types — B-tree, Hash, GIN, GiST, BRIN, and more — suited to different data types and access patterns.

MySQL clusters data with the primary key. Secondary indexes include the primary key, affecting size and lookup performance. Choose primary keys carefully.

-- MySQL: Primary key affects all secondary indexes
CREATE TABLE users (
    id BIGINT PRIMARY KEY,  -- Clustered
    email VARCHAR(255),
    INDEX idx_email (email)  -- Includes id implicitly
);
Enter fullscreen mode Exit fullscreen mode

SQL Server distinguishes clustered and non-clustered indexes. One clustered index per table determines physical row order.

Cloud databases may have specific index features. Amazon Aurora, Google Cloud SQL, and Azure SQL have optimization features beyond their base engines.


Conclusion

Database indexing is both science and art. The science: B-trees, composite column order, covering indexes — clear, measurable behavior. The art: choosing which indexes to create, balancing read vs write overhead, knowing when a full table scan is actually faster.

The process:

  1. Start with the slowest queries
  2. Run EXPLAIN ANALYZE
  3. Look for full table scans (Seq Scan in PostgreSQL, ALL in MySQL)
  4. Add indexes for columns in WHERE, JOIN, and ORDER BY clauses
  5. Build composite indexes with equality columns first, range columns last
  6. Monitor index usage — unused indexes waste resources
  7. Rebuild occasionally. Drop duplicates.
  8. Always verify with EXPLAIN that your index is actually being used

An index that isn't used is just wasted storage and slower writes. Done right, indexes transform query performance from unbearable to instant. Done wrong, they add complexity without benefit.

👉 Talk to Our Engineers | See Case Studies


FAQs

1. How do I know if a query needs an index?

Run EXPLAIN ANALYZE before and after adding the index. Look for:

  • Before: Seq Scan (PostgreSQL) or ALL (MySQL) — database reads every row
  • After: Index Scan or Index Only Scan — database used your index

If a query is slow and EXPLAIN shows a full table scan on a large table, that's your candidate. Also check the rows estimate vs actual — wildly inaccurate estimates often indicate missing statistics or a need for ANALYZE.

2. What's the difference between a composite index and multiple single-column indexes?

Aspect Composite Index (a, b, c) Multiple Single-Column Indexes on a, b, c
Structure One B-tree sorted by a, then b, then c Three separate B-trees
Supports a alone ✅ Yes (leading column) ✅ Yes
Supports a + b ✅ Excellent ⚠️ Can combine (bitmap scans) but less efficient
Supports b alone ❌ No ✅ Yes
Storage One index Three indexes (more storage)
Rule of thumb Use for columns frequently used together in filters Use for columns frequently filtered independently

3. How many indexes is too many?

There's no magic number — it depends on your read/write ratio. Every INSERT, UPDATE, and DELETE must update every index on that table. If your write throughput is high, many indexes become expensive.

Signs of over-indexing:

  • Write queries are slow, but EXPLAIN shows they're waiting on index updates
  • High idx_blks_written relative to idx_blks_read (PostgreSQL)
  • Unused indexes (idx_scan = 0)

Start with: indexes for primary keys, foreign keys, and the top 5–10 slowest queries. Add more as needed. Remove unused indexes quarterly. A lean index set beats a bloated one.

Top comments (0)