DEV Community

Vivek Kumar
Vivek Kumar

Posted on

SQL Indexes Explained: Write Queries That Don't Make Your Database Cry

SQL Indexes Explained: Write Queries That Don't Make Your Database Cry

Every developer eventually hits the moment where a query that looked perfectly fine suddenly takes 12 seconds to run in production. You stare at it in disbelief — it's not that complicated! Then someone suggests adding an index, you do, and the query drops to 20ms. Magic? Not quite. Once you understand what indexes actually do, you'll know exactly when to add them, how to design them well, and when not to use them.

This guide is for developers who write SQL regularly but want to move beyond "just add an index" to actually understanding the mechanics. We'll cover B-tree indexes, composite indexes, covering indexes, and the common mistakes that leave performance on the table — using PostgreSQL syntax throughout (MySQL and SQL Server are nearly identical).


What Is an Index, Really?

An index is a separate data structure that the database maintains alongside your table. Think of it like the index at the back of a textbook: instead of reading every page to find "normalization," you look it up alphabetically and jump straight to the right page.

Without an index, the database does a sequential scan — it reads every row in the table to find what you need. For a table with 10 rows, that's fine. For a table with 10 million rows, it's painful.

-- Without an index on email, this scans every row
SELECT * FROM users WHERE email = 'alice@example.com';
Enter fullscreen mode Exit fullscreen mode

With an index on email, the database uses a tree structure to find the matching row in a fraction of the time.


B-Tree: The Default Index Type

When you create an index without specifying a type, you get a B-tree (balanced tree). B-tree indexes store values in sorted order, organized as a tree where each level narrows down the search. They work brilliantly for:

  • Exact matches: WHERE status = 'active'
  • Range queries: WHERE created_at >= '2025-01-01'
  • Prefix searches: WHERE name LIKE 'Smi%'
  • Sorting: ORDER BY last_name

Creating a Simple Index

-- Syntax
CREATE INDEX idx_users_email ON users (email);

-- Confirm it exists
\d users   -- in psql, or:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'users';
Enter fullscreen mode Exit fullscreen mode

Now the query below uses the index instead of scanning the table:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';
Enter fullscreen mode Exit fullscreen mode

You'll see something like Index Scan using idx_users_email in the query plan — a good sign.


Composite Indexes: Indexing Multiple Columns

A composite (multi-column) index covers more than one column in a single index structure. This is where index design gets interesting.

-- A composite index for order lookups by customer and date
CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at);
Enter fullscreen mode Exit fullscreen mode

The critical rule: the leftmost prefix principle.

The database can only use a composite index if the query filters on the leftmost column(s) in order. Think of it like a phone book sorted by last name, then first name: you can find everyone named "Smith", or find "John Smith" — but you can't efficiently search for everyone named "John" without scanning the whole book.

-- ✅ Uses idx_orders_customer_date (filters on customer_id)
SELECT * FROM orders WHERE customer_id = 42;

-- ✅ Uses idx_orders_customer_date (filters on both columns in order)
SELECT * FROM orders WHERE customer_id = 42 AND created_at >= '2025-01-01';

-- ❌ Cannot use the index (skips customer_id, goes straight to created_at)
SELECT * FROM orders WHERE created_at >= '2025-01-01';
Enter fullscreen mode Exit fullscreen mode

For the last query, you'd need a separate index on created_at alone.

General rule: Put equality columns first, range columns last.

-- Good: filter by status (equality) first, then date range
CREATE INDEX idx_orders_status_date ON orders (status, created_at);

-- A query that benefits
SELECT * FROM orders WHERE status = 'pending' AND created_at >= NOW() - INTERVAL '7 days';
Enter fullscreen mode Exit fullscreen mode

Covering Indexes: Eliminating Table Lookups

Even after finding the matching rows in an index, the database usually has to go back to the main table to fetch the columns you actually need. This is called a heap lookup (or table lookup). For very high-frequency queries, you can eliminate this extra step with a covering index — an index that contains all the columns your query needs.

-- Without covering index: finds row IDs in index, then fetches full rows from table
SELECT email, full_name FROM users WHERE status = 'active';

-- Create a covering index that includes the SELECT columns
CREATE INDEX idx_users_status_covering ON users (status) INCLUDE (email, full_name);
-- (INCLUDE available in PostgreSQL 11+, SQL Server 2005+)
Enter fullscreen mode Exit fullscreen mode

With this index, the database can answer the entire query from the index alone — never touching the main table. This is called an index-only scan and is extremely fast for read-heavy queries.

You can verify it worked with:

EXPLAIN ANALYZE SELECT email, full_name FROM users WHERE status = 'active';
-- Look for "Index Only Scan" in the output
Enter fullscreen mode Exit fullscreen mode

Partial Indexes: Index Only What You Query

If you only ever query a subset of rows, a partial index (PostgreSQL) lets you index just that subset:

-- Only index orders that are still pending (the ones you actually look up)
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
Enter fullscreen mode Exit fullscreen mode

This index is much smaller and faster to maintain than a full index on created_at. It's perfect for tables where most queries target a minority of rows — think "active users", "unprocessed jobs", or "open tickets".


Common Indexing Mistakes

1. Over-Indexing

Every index speeds up reads but slows down writes. When you INSERT, UPDATE, or DELETE a row, the database must update every index on that table. A table with 12 indexes on a write-heavy workload is going to hurt.

-- Anti-pattern: indexes on every column "just in case"
CREATE INDEX idx_users_first_name ON users (first_name);
CREATE INDEX idx_users_last_name  ON users (last_name);
CREATE INDEX idx_users_phone      ON users (phone);
CREATE INDEX idx_users_created_at ON users (created_at);
-- ...and five more you never use

-- Better: only index columns that appear in WHERE, JOIN, or ORDER BY
-- for high-traffic queries
Enter fullscreen mode Exit fullscreen mode

Audit your indexes regularly. Most databases let you check whether an index is actually being used.

2. LIKE Wildcards Defeat Indexes

A trailing wildcard (LIKE 'Smi%') can use an index. A leading wildcard (LIKE '%smith') cannot — it forces a full scan because the index is sorted by the start of the string, not the end.

-- ✅ Can use an index on last_name
WHERE last_name LIKE 'Smi%'

-- ❌ Forces a full scan — index is useless here
WHERE last_name LIKE '%smith'
Enter fullscreen mode Exit fullscreen mode

For full-text searching within strings, consider a full-text index or a tool like Elasticsearch instead.

3. Functions on Indexed Columns Kill the Index

Wrapping an indexed column in a function prevents the optimizer from using the index:

-- ❌ The function LOWER() breaks index usage on email
WHERE LOWER(email) = 'alice@example.com'

-- ✅ Option 1: Normalize data at insert time (store email already lowercased)
-- ✅ Option 2: Create a functional index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
WHERE LOWER(email) = 'alice@example.com'  -- now uses the functional index
Enter fullscreen mode Exit fullscreen mode

The same applies to DATE(timestamp_col), YEAR(date_col), and similar functions.

4. Low-Cardinality Columns

A column like is_active (only TRUE or FALSE) is a poor candidate for a B-tree index — the optimizer may decide a full scan is faster because the index doesn't narrow things down much. Partial indexes (as shown above) are a better choice for low-cardinality filtering.


How to Check If Your Index Is Being Used

Always verify with EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 42
  AND created_at >= '2025-01-01';
Enter fullscreen mode Exit fullscreen mode

Look for these keywords in the output:

Output keyword Meaning Good?
Index Scan Found rows via index, then fetched from table
Index Only Scan Answered entirely from the index (covering index) ✅✅
Bitmap Index Scan Built a bitmap of matching rows from index
Seq Scan Full table scan — index not used ⚠️ investigate

If you see a Seq Scan on a large table and you have an index, something is wrong: the query may not match the index's column order, a function may be wrapping the column, or the table might be too small for the optimizer to bother.


Key Takeaways

Indexing well is one of the highest-leverage skills in SQL. Here's the short version:

  • B-tree indexes are the default and work for equality, range, prefix, and sort queries
  • Composite indexes follow the leftmost prefix rule — equality columns first, range columns last
  • Covering indexes (with INCLUDE) eliminate heap lookups for read-heavy queries
  • Partial indexes keep things lean when you only query a known subset of rows
  • Every index has a write overhead — index strategically, not speculatively
  • Functions on indexed columns break index usage — use functional indexes or normalize data at write time
  • Always verify with EXPLAIN ANALYZE — never assume an index is being used

A well-indexed table can handle orders of magnitude more load without any code changes. And once you start reading EXPLAIN ANALYZE output comfortably, you'll have a superpower that most developers don't.


Which indexing trick has saved you the most query time? Drop a comment — I'd love to hear about real production wins (or the horror stories that led to them). If this was useful, share it with a colleague who's still hitting query timeouts!

Top comments (0)