The Indexing Paradox That Cost Me 3 Seconds Per Query
I learned the hard way that adding indexes doesn't always make queries faster. After indexing a status column in my PostgreSQL database, I watched in horror as my query time jumped from 50ms to 3 seconds. What went wrong?
The truth is that the wrong index can be worse than no index at all. Today, I'll show you exactly when to index, when not to index, and how databases actually use your indexes under the hood.
What Is Database Indexing? The Library Analogy
Imagine searching for a book in a library with 10 million books arranged randomly. You'd have to check every single book—that's a full table scan in database terms.
Now imagine the same library with a card catalog organized alphabetically by author. You can find any book in seconds by looking up the author first. That's an index.
A database index is a separate data structure that stores a sorted copy of specific columns along with pointers to the actual table rows. Think of it as a roadmap that helps the database skip irrelevant data and jump directly to what you need.
How Indexes Are Stored: B-Tree vs Hash
B-Tree Index (Most Common)
PostgreSQL and MySQL primarily use B-Tree (Balanced Tree) indexes. Here's how they work:
              [50]
             /    \
         [25]      [75]
        /    \    /    \
    [10]  [40]  [60]  [90]
      |     |     |     |
    Data  Data  Data  Data
Key characteristics:
- Data is stored in a sorted, hierarchical tree structure
- Each node contains multiple keys and pointers
- Searches take O(log n) time—incredibly fast even with millions of rows
- Perfect for range queries: WHERE price BETWEEN 100 AND 500
- Supports sorting: ORDER BY created_at DESC
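Here's a quick sketch of both patterns. The table and column names are illustrative, but the idea is exactly what a B-Tree gives you:
-- Illustrative indexes for the two query shapes above
CREATE INDEX idx_products_price ON products(price);
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Range query: the B-Tree jumps to 100 and walks the sorted leaves up to 500
SELECT * FROM products WHERE price BETWEEN 100 AND 500;

-- Sorting: the index can be read in (reverse) order, often skipping a sort step
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20;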
When you create an index on the email column:
CREATE INDEX idx_users_email ON users(email);
PostgreSQL builds a B-Tree where:
- Email values are sorted alphabetically
- Each leaf node points to the actual table row
- The tree automatically balances itself for optimal search performance
Hash Index (Specialized Use)
Hash indexes use a hash function to map values to buckets:
email: "john@example.com" → Hash Function → Bucket #47 → Row pointer
Characteristics:
- Lightning-fast for exact matches: WHERE email = 'john@example.com'
- Useless for range queries or sorting
- Rarely used in practice (B-Tree handles most cases better)
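If you want to experiment with one anyway, PostgreSQL makes it a one-word change (the index name here is just an example):
-- Hash index: equality lookups only
CREATE INDEX idx_users_email_hash ON users USING HASH (email);

-- Can use the hash index:
SELECT * FROM users WHERE email = 'john@example.com';

-- Cannot use it (hash indexes have no ordering), so the planner picks another plan:
SELECT * FROM users WHERE email > 'j' ORDER BY email;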
How Indexes Speed Up Queries: The Execution Plan
Let's see indexing in action with a real example.
Without Index (Full Table Scan)
-- Table: users (1,000,000 rows)
SELECT * FROM users WHERE email = 'john@example.com';
Database thinks:
- "I need to check every single row"
- Scans all 1,000,000 rows
- Finds 1 matching row
- Time: ~500ms
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
-- Result:
Seq Scan on users (cost=0.00..25833.00 rows=1 width=100) (actual time=450.234..450.234 rows=1)
Filter: (email = 'john@example.com'::text)
Rows Removed by Filter: 999999
Planning Time: 0.123 ms
Execution Time: 500.456 ms
With Index (Index Scan)
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'john@example.com';
Database thinks:
- "I have an index on email! Let me use it"
- Jumps to 'john@example.com' in the B-Tree (3-4 hops maximum)
- Follows pointer directly to the matching row
- Time: ~0.5ms (1000x faster!)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
-- Result:
Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=100) (actual time=0.234..0.235 rows=1)
Index Cond: (email = 'john@example.com'::text)
Planning Time: 0.089 ms
Execution Time: 0.456 ms
The index transformed a 500ms query into a sub-millisecond operation!
The Golden Rule: Index High-Cardinality Columns
Cardinality = the number of unique values in a column.
High Cardinality = Fast Queries (Index These!)
Examples:
- Email addresses (almost all unique)
- User IDs (completely unique)
- Phone numbers (mostly unique)
- Order numbers (unique)
Why they're fast:
-- Index on email: 1,000,000 rows, 999,000 unique values
SELECT * FROM users WHERE email = 'alice@example.com';
-- Returns: 1 row (index narrows search to 0.0001% of data)
The index eliminates 99.9999% of rows instantly. The database reads the index (a few KB) instead of scanning the entire table (potentially GB).
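Not sure how unique a column really is? You can measure it before indexing. A minimal sketch in PostgreSQL (the exact count scans the whole table, so prefer the statistics view on big tables):
-- Exact ratio of distinct values to rows (1.0 means every value is unique)
SELECT COUNT(DISTINCT email)::numeric / COUNT(*) AS cardinality_ratio
FROM users;

-- Cheaper: the planner's estimate (negative n_distinct means a fraction of rows;
-- -1 means "all distinct")
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'users' AND attname = 'email';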
Low Cardinality = Slow Queries (DON'T Index These!)
Examples:
- Status columns: "pending", "completed", "cancelled" (3 unique values)
- Gender: "male", "female", "other" (3 unique values)
- Boolean flags: true/false (2 unique values)
- Country codes: ~200 unique values in a 10M row table
Why they're slow:
-- Index on status: 1,000,000 orders, only 3 unique values
SELECT * FROM orders WHERE status = 'pending';
-- Returns: ~333,333 rows (33% of table)
What happens under the hood:
- Database checks index: finds 333,333 row pointers
- Reads index blocks (expensive I/O)
- For each pointer, performs random access to table (333,333 random I/Os!)
- Result: SLOWER than a sequential table scan
A sequential scan reads data in order (faster disk I/O), while index lookups cause random disk seeks. When you're retrieving 30%+ of a table, sequential wins.
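You don't have to take my word for it; the planner makes this exact trade-off. In a sketch like this (illustrative index on the orders table above), PostgreSQL will typically ignore the index for a low-selectivity filter:
-- Hypothetical low-cardinality index, for demonstration only
CREATE INDEX idx_orders_status ON orders(status);

-- Ask the planner for its plan without executing the query
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- Expect a Seq Scan (or a Bitmap Heap Scan): the planner estimates that chasing
-- ~333,000 row pointers would cost more than reading the table sequentially.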
The Breaking Point: When Indexes Hurt Performance
Problem 1: Low Selectivity
-- BAD INDEX: Only 3 possible values
CREATE INDEX idx_orders_status ON orders(status);
-- This query becomes SLOWER with the index
SELECT * FROM orders WHERE status = 'pending'; -- Returns 40% of rows
-- Solution: don't index this column; let the database do a full scan
DROP INDEX idx_orders_status;
Rule of thumb: If your query returns more than 5-10% of table rows, an index on that column will likely hurt performance.
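A quick way to estimate that percentage before deciding (a sketch, reusing the orders table from above):
-- What fraction of the table would this predicate return?
SELECT COUNT(*) FILTER (WHERE status = 'pending') * 100.0 / COUNT(*) AS pct_of_table
FROM orders;
-- Above roughly 5-10%, an index on status is unlikely to pay off.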
Problem 2: Index Overhead on Writes
Every index you create:
- Increases INSERT time (database must update index)
- Increases UPDATE time (if indexed columns change)
- Increases DELETE time (database must remove from index)
- Consumes storage space (indexes can be 20-50% of table size)
-- Table with 5 indexes
INSERT INTO users (name, email, phone, country, status) VALUES (...);
-- Database must:
-- 1. Insert row into table
-- 2. Update index on name
-- 3. Update index on email
-- 4. Update index on phone
-- 5. Update index on country
-- 6. Update index on status
-- Result: every INSERT now has to maintain six structures instead of one
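Before adding index number six, it's worth auditing what the table already carries. A minimal check against PostgreSQL's built-in pg_indexes view:
-- List every index already defined on the users table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';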
Problem 3: Index Bloat
Indexes can become fragmented over time, especially with frequent updates:
-- Check index sizes (PostgreSQL); large, frequently updated indexes are the usual bloat suspects
SELECT schemaname,
       relname AS tablename,
       indexrelname AS indexname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild bloated indexes
REINDEX INDEX idx_users_email;
The Right Way: Smart Indexing Strategy
✅ DO Index These:
- Primary Keys (automatic in most databases)
- Foreign Keys (for JOIN performance)
CREATE INDEX idx_orders_user_id ON orders(user_id);
- Frequently Queried High-Cardinality Columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_products_sku ON products(sku);
- Columns in WHERE, JOIN, ORDER BY clauses (if high-cardinality)
CREATE INDEX idx_orders_created_at ON orders(created_at);
- Composite Indexes for multi-column queries
-- Query: WHERE user_id = ? AND status = ?
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
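One caveat worth spelling out: column order in a composite index matters. A B-Tree on (user_id, status) helps queries that filter on user_id (alone or together with status), but not queries that filter on status alone. A quick sketch using the index above (the literal values are made up):
-- Served well by idx_orders_user_status (leftmost column is constrained)
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';

-- Not served efficiently: status is not the leftmost column of the index
SELECT * FROM orders WHERE status = 'pending';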
❌ DON'T Index These:
- Low-Cardinality Columns
-- BAD: Only 3-5 possible values
❌ CREATE INDEX idx_users_status ON users(status);
❌ CREATE INDEX idx_products_is_active ON products(is_active);
- Small Tables (fewer than ~1,000 rows; a full scan is faster)
- Columns Rarely Used in Queries
- Frequently Updated Columns (unless absolutely necessary)
Measuring Impact: Before and After
Always measure! Use EXPLAIN ANALYZE:
-- Before indexing
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- Seq Scan: 245ms
-- Create index
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- After indexing
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- Index Scan: 0.8ms (300x faster!)
Conclusion: Index Smartly, Not Blindly
Database indexing is a powerful optimization tool, but only when used correctly:
- Index high-cardinality columns (email, IDs, unique values)
- Avoid indexing low-cardinality columns (status, boolean, small enums)
- Measure with EXPLAIN ANALYZE before and after
- Monitor index usage and remove unused indexes (a query for spotting them is sketched below)
- Remember: Every index speeds reads but slows writes
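For the monitoring point above, PostgreSQL tracks per-index usage counters. A minimal sketch for spotting indexes that have never been scanned since statistics were last reset (double-check replicas and rare batch jobs before dropping anything):
SELECT schemaname,
       relname AS tablename,
       indexrelname AS indexname,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;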
The goal isn't to index everything—it's to index strategically. Your database will thank you with blazing-fast queries and efficient resource usage.
Did this help you optimize your database? 👏 Clap if you learned something new!
Want more performance tips? 🔔 Follow me for deep dives into database optimization, backend architecture, and real-world engineering lessons.
Know someone struggling with slow queries? 📤 Share this article and help them speed up their database!
Drop your indexing war stories in the comments—I'd love to hear what worked (or didn't work) for you! 💬
Tags: #Database #PostgreSQL #MySQL #Performance #Indexing #BackendDevelopment #SQL #DatabaseOptimization #Engineering