The Indexing Paradox That Cost Me 3 Seconds Per Query
I learned the hard way that adding indexes doesn't always make queries faster. After indexing a status column in my PostgreSQL database, I watched in horror as my query time jumped from 50ms to 3 seconds. What went wrong?
The truth is that the wrong index can be worse than no index at all. Today, I'll show you exactly when to index, when not to index, and how databases actually use your indexes under the hood.
What Is Database Indexing? The Library Analogy
Imagine searching for a book in a library with 10 million books arranged randomly. You'd have to check every single book—that's a full table scan in database terms.
Now imagine the same library with a card catalog organized alphabetically by author. You can find any book in seconds by looking up the author first. That's an index.
A database index is a separate data structure that stores a sorted copy of specific columns along with pointers to the actual table rows. Think of it as a roadmap that helps the database skip irrelevant data and jump directly to what you need.
How Indexes Are Stored: B-Tree vs Hash
B-Tree Index (Most Common)
PostgreSQL and MySQL primarily use B-Tree (Balanced Tree) indexes. Here's how they work:
              [50]
             /    \
         [25]      [75]
        /    \    /    \
    [10]  [40]  [60]  [90]
      |     |     |     |
    Data  Data  Data  Data
Key characteristics:
- Data is stored in a sorted, hierarchical tree structure
- Each node contains multiple keys and pointers
- Searches take O(log n) time—incredibly fast even with millions of rows
- Perfect for range queries: WHERE price BETWEEN 100 AND 500
- Supports sorting: ORDER BY created_at DESC
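Here's a quick sketch of both patterns. The table and column names are illustrative, but the idea is exactly what a B-Tree gives you:
-- Illustrative indexes for the two query shapes above
CREATE INDEX idx_products_price ON products(price);
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Range query: the B-Tree jumps to 100 and walks the sorted leaves up to 500
SELECT * FROM products WHERE price BETWEEN 100 AND 500;

-- Sorting: the index can be read in (reverse) order, often skipping a sort step
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20;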
When you create an index on the email column:
CREATE INDEX idx_users_email ON users(email);
PostgreSQL builds a B-Tree where:
- Email values are sorted alphabetically
- Each leaf node points to the actual table row
- The tree automatically balances itself for optimal search performance
Hash Index (Specialized Use)
Hash indexes use a hash function to map values to buckets:
email: "john@example.com" → Hash Function → Bucket #47 → Row pointer
Characteristics:
- Lightning-fast for exact matches: WHERE email = 'john@example.com'
- Useless for range queries or sorting
- Rarely used in practice (B-Tree handles most cases better)
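If you want to experiment with one anyway, PostgreSQL makes it a one-word change (the index name here is just an example):
-- Hash index: equality lookups only
CREATE INDEX idx_users_email_hash ON users USING HASH (email);

-- Can use the hash index:
SELECT * FROM users WHERE email = 'john@example.com';

-- Cannot use it (hash indexes have no ordering), so the planner picks another plan:
SELECT * FROM users WHERE email > 'j' ORDER BY email;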
How Indexes Speed Up Queries: The Execution Plan
Let's see indexing in action with a real example.
Without Index (Full Table Scan)
-- Table: users (1,000,000 rows)
SELECT * FROM users WHERE email = 'john@example.com';
Database thinks:
- "I need to check every single row"
- Scans all 1,000,000 rows
- Finds 1 matching row
- Time: ~500ms
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
-- Result:
Seq Scan on users (cost=0.00..25833.00 rows=1 width=100) (actual time=450.234..450.234 rows=1)
Filter: (email = 'john@example.com'::text)
Rows Removed by Filter: 999999
Planning Time: 0.123 ms
Execution Time: 500.456 ms
With Index (Index Scan)
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'john@example.com';
Database thinks:
- "I have an index on email! Let me use it"
- Jumps to 'john@example.com' in the B-Tree (3-4 hops maximum)
- Follows pointer directly to the matching row
- Time: ~0.5ms (1000x faster!)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
-- Result:
Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=100) (actual time=0.234..0.235 rows=1)
Index Cond: (email = 'john@example.com'::text)
Planning Time: 0.089 ms
Execution Time: 0.456 ms
The index transformed a 500ms query into a sub-millisecond operation!
The Golden Rule: Index High-Cardinality Columns
Cardinality = the number of unique values in a column.
High Cardinality = Fast Queries (Index These!)
Examples:
- Email addresses (almost all unique)
- User IDs (completely unique)
- Phone numbers (mostly unique)
- Order numbers (unique)
Why they're fast:
-- Index on email: 1,000,000 rows, 999,000 unique values
SELECT * FROM users WHERE email = 'alice@example.com';
-- Returns: 1 row (index narrows search to 0.0001% of data)
The index eliminates 99.9999% of rows instantly. The database reads the index (a few KB) instead of scanning the entire table (potentially GB).
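Not sure how unique a column really is? You can measure it before indexing. A minimal sketch in PostgreSQL (the exact count scans the whole table, so prefer the statistics view on big tables):
-- Exact ratio of distinct values to rows (1.0 means every value is unique)
SELECT COUNT(DISTINCT email)::numeric / COUNT(*) AS cardinality_ratio
FROM users;

-- Cheaper: the planner's estimate (negative n_distinct means a fraction of rows;
-- -1 means "all distinct")
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'users' AND attname = 'email';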
Low Cardinality = Slow Queries (DON'T Index These!)
Examples:
- Status columns: "pending", "completed", "cancelled" (3 unique values)
- Gender: "male", "female", "other" (3 unique values)
- Boolean flags: true/false (2 unique values)
- Country codes: ~200 unique values in a 10M row table
Why they're slow:
-- Index on status: 1,000,000 orders, only 3 unique values
SELECT * FROM orders WHERE status = 'pending';
-- Returns: ~333,333 rows (33% of table)
What happens under the hood:
- Database checks index: finds 333,333 row pointers
- Reads index blocks (expensive I/O)
- For each pointer, performs random access to table (333,333 random I/Os!)
- Result: SLOWER than a sequential table scan
A sequential scan reads data in order (faster disk I/O), while index lookups cause random disk seeks. When you're retrieving 30%+ of a table, sequential wins.
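You don't have to take my word for it; the planner makes this exact trade-off. In a sketch like this (illustrative index on the orders table above), PostgreSQL will typically ignore the index for a low-selectivity filter:
-- Hypothetical low-cardinality index, for demonstration only
CREATE INDEX idx_orders_status ON orders(status);

-- Ask the planner for its plan without executing the query
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- Expect a Seq Scan (or a Bitmap Heap Scan): the planner estimates that chasing
-- ~333,000 row pointers would cost more than reading the table sequentially.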
The Breaking Point: When Indexes Hurt Performance
Problem 1: Low Selectivity
-- BAD INDEX: Only 3 possible values
CREATE INDEX idx_orders_status ON orders(status);
-- This query becomes SLOWER with the index
SELECT * FROM orders WHERE status = 'pending'; -- Returns 40% of rows
-- Solution: don't index this column; let the database do a full scan
DROP INDEX idx_orders_status;
Rule of thumb: If your query returns more than 5-10% of table rows, an index on that column will likely hurt performance.
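A quick way to estimate that percentage before deciding (a sketch, reusing the orders table from above):
-- What fraction of the table would this predicate return?
SELECT COUNT(*) FILTER (WHERE status = 'pending') * 100.0 / COUNT(*) AS pct_of_table
FROM orders;
-- Above roughly 5-10%, an index on status is unlikely to pay off.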
Problem 2: Index Overhead on Writes
Every index you create:
- Increases INSERT time (database must update index)
- Increases UPDATE time (if indexed columns change)
- Increases DELETE time (database must remove from index)
- Consumes storage space (indexes can be 20-50% of table size)
-- Table with 5 indexes
INSERT INTO users (name, email, phone, country, status) VALUES (...);
-- Database must:
-- 1. Insert row into table
-- 2. Update index on name
-- 3. Update index on email
-- 4. Update index on phone
-- 5. Update index on country
-- 6. Update index on status
-- Result: every INSERT now has to maintain six structures instead of one
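Before adding index number six, it's worth auditing what the table already carries. A minimal check against PostgreSQL's built-in pg_indexes view:
-- List every index already defined on the users table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';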
Problem 3: Index Bloat
Indexes can become fragmented over time, especially with frequent updates:
-- Check index sizes (PostgreSQL); large, frequently updated indexes are the usual bloat suspects
SELECT schemaname,
       relname AS tablename,
       indexrelname AS indexname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild bloated indexes
REINDEX INDEX idx_users_email;
The Right Way: Smart Indexing Strategy
✅ DO Index These:
- Primary Keys (automatic in most databases)
- Foreign Keys (for JOIN performance)
CREATE INDEX idx_orders_user_id ON orders(user_id);
- Frequently Queried High-Cardinality Columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_products_sku ON products(sku);
- Columns in WHERE, JOIN, ORDER BY clauses (if high-cardinality)
CREATE INDEX idx_orders_created_at ON orders(created_at);
- Composite Indexes for multi-column queries
-- Query: WHERE user_id = ? AND status = ?
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
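One caveat worth spelling out: column order in a composite index matters. A B-Tree on (user_id, status) helps queries that filter on user_id (alone or together with status), but not queries that filter on status alone. A quick sketch using the index above (the literal values are made up):
-- Served well by idx_orders_user_status (leftmost column is constrained)
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';

-- Not served efficiently: status is not the leftmost column of the index
SELECT * FROM orders WHERE status = 'pending';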
❌ DON'T Index These:
- Low-Cardinality Columns
-- BAD: Only 3-5 possible values
❌ CREATE INDEX idx_users_status ON users(status);
❌ CREATE INDEX idx_products_is_active ON products(is_active);
- Small Tables (fewer than ~1,000 rows; a full scan is faster)
- Columns Rarely Used in Queries
- Frequently Updated Columns (unless absolutely necessary)
Measuring Impact: Before and After
Always measure! Use EXPLAIN ANALYZE:
-- Before indexing
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- Seq Scan: 245ms
-- Create index
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- After indexing
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- Index Scan: 0.8ms (300x faster!)
Conclusion: Index Smartly, Not Blindly
Database indexing is a powerful optimization tool, but only when used correctly:
- Index high-cardinality columns (email, IDs, unique values)
- Avoid indexing low-cardinality columns (status, boolean, small enums)
- Measure with EXPLAIN ANALYZE before and after
- Monitor index usage and remove unused indexes (a query for spotting them is sketched below)
- Remember: Every index speeds reads but slows writes
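For the monitoring point above, PostgreSQL tracks per-index usage counters. A minimal sketch for spotting indexes that have never been scanned since statistics were last reset (double-check replicas and rare batch jobs before dropping anything):
SELECT schemaname,
       relname AS tablename,
       indexrelname AS indexname,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;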
The goal isn't to index everything—it's to index strategically. Your database will thank you with blazing-fast queries and efficient resource usage.
Did this help you optimize your database? 👏 Clap if you learned something new!
Want more performance tips? 🔔 Follow me for deep dives into database optimization, backend architecture, and real-world engineering lessons.
Know someone struggling with slow queries? 📤 Share this article and help them speed up their database!
Drop your indexing war stories in the comments—I'd love to hear what worked (or didn't work) for you! 💬
Tags: #Database #PostgreSQL #MySQL #Performance #Indexing #BackendDevelopment #SQL #DatabaseOptimization #Engineering