
Prayush Adhikari

Database Indexing Explained Like You're 5 (Then Like You're a Senior Dev)

Hey there! So you want to understand database indexing? Cool. Let me break this down in a way that actually makes sense, starting from the absolute basics and working our way up to the stuff that'll make you look like a genius in your next code review.

The 5-Year-Old Explanation

Imagine you have a massive book with 10,000 pages about different animals. No table of contents, no organization, just random animals on random pages.

Now, your teacher asks: "Find me information about elephants."

What do you do? You flip through every single page until you find elephants. Page 1, nope. Page 2, nope. Page 3, nope... this is going to take forever, right?

That's your database without indexes.

Now imagine the same book, but at the front, there's a special page that says:

  • Elephants: Page 4,521
  • Lions: Page 892
  • Penguins: Page 7,234

Boom. You jump straight to page 4,521. Found it in seconds.

That's an index.

Simple, right? Now let's get into the real stuff.

The Real World Problem

Let's say you're building an app. Could be anything: a blog, an e-commerce site, whatever. You've got a users table with 1 million records. Someone tries to log in, and you run this query:

SELECT * FROM users WHERE email = 'prayush@example.com';

Without an index, your database does what's called a full table scan. It literally checks every single one of those 1 million rows. Every. Single. One.

On my machine, that query took 2.3 seconds without an index.

With an index? 0.003 seconds.

Yeah, you read that right. From 2.3 seconds to 3 milliseconds. That's not just fast, that's "your users don't rage-quit your app" fast.

How Indexes Actually Work (Getting Technical)

Alright, now we're getting into the good stuff. Let's talk about what's really happening under the hood.

The B-Tree Structure

Most databases use something called a B-Tree for indexes. And no, despite what everyone thinks, the B doesn't stand for "binary." Nobody knows for sure what it does stand for; "balanced" and "Bayer" (after Rudolf Bayer, one of its inventors) are the usual guesses.

Here's the deal: a B-Tree organizes your data in a tree structure where:

  • The root node contains pointers to other nodes
  • Intermediate nodes contain more pointers
  • Leaf nodes contain the actual data (or pointers to the actual data)

When you search for something, the database traverses this tree. Instead of checking 1 million rows, it might only need to check 3-4 nodes to find what it's looking for.

Let me show you what I mean with a real example from a project I worked on:

-- Creating a simple users table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255),
    username VARCHAR(100),
    created_at TIMESTAMP
);

-- Insert a million records (not showing the full script here, trust me)
-- Now let's see the difference

Without index:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'random@email.com';
-- Seq Scan on users (cost=0.00..25833.00 rows=1 width=89) (actual time=2347.921..2347.922 rows=1 loops=1)

With index:

CREATE INDEX idx_users_email ON users(email);

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'random@email.com';
-- Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=89) (actual time=0.038..0.039 rows=1 loops=1)

The numbers don't lie. The planner's cost estimate dropped from 25833 to 8.44, and the actual time went from about 2,348 ms to 0.04 ms. That's massive.
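
If you're curious how shallow that tree really is, PostgreSQL's pageinspect extension can show you. A quick sketch, assuming you have the privileges to install the extension (its functions are superuser-only by default):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- level = how many levels sit above the leaf pages; 2 or 3 is typical even for millions of rows
SELECT level FROM bt_metap('idx_users_email');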

Types of Indexes (Because One Size Doesn't Fit All)

1. Single Column Index

The basic one we've been talking about:

CREATE INDEX idx_email ON users(email);

Use this when you're frequently searching by a single column.

2. Composite Index

Multiple columns in one index:

CREATE INDEX idx_email_username ON users(email, username);

Here's where it gets interesting. The order matters. This index is great for:

SELECT * FROM users WHERE email = 'x' AND username = 'y'; -- Fast
SELECT * FROM users WHERE email = 'x'; -- Also fast
SELECT * FROM users WHERE username = 'y'; -- Slow! Doesn't use the index

Why? Because the index is organized by email first, then username. It's like a phone book sorted by last name, then first name. You can't efficiently find someone by just their first name.
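Don't just take the phone book analogy on faith; ask the planner. Roughly what you should see, though exact plans and costs depend on your data and statistics:

EXPLAIN SELECT * FROM users WHERE email = 'x' AND username = 'y';
-- expect: Index Scan using idx_email_username

EXPLAIN SELECT * FROM users WHERE username = 'y';
-- expect: Seq Scan on users (the index's leading column isn't constrained)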

3. Unique Index

Enforces uniqueness AND speeds up queries:

CREATE UNIQUE INDEX idx_unique_email ON users(email);

Perfect for emails, usernames, or anything that should be unique.
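Worth knowing, at least in PostgreSQL: declaring a UNIQUE constraint creates the backing unique index for you, so pick one approach instead of doing both. A sketch (the constraint name is just illustrative):

-- This creates a unique index behind the scenes; no separate CREATE UNIQUE INDEX needed
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);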

4. Partial Index

This one's cool. Index only part of your data:

CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;

If you only ever query active users, why index the inactive ones? Save space, save time.
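One gotcha: the planner only reaches for a partial index when the query's WHERE clause implies the index's condition. Roughly:

-- Can use idx_active_users
SELECT * FROM users WHERE is_active = true AND email = 'prayush@example.com';

-- Can't use it: nothing here tells the planner that is_active is true
SELECT * FROM users WHERE email = 'prayush@example.com';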

5. Full-Text Index

For searching text content:

CREATE INDEX idx_post_content ON posts USING GIN(to_tsvector('english', content));

This is what makes your blog search actually work. I use this on my own projects and it's a game changer for search functionality.
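For completeness, here's roughly what a query against that index looks like. The to_tsvector call has to match the expression in the index definition, or the index won't be used:

SELECT * FROM posts
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'indexing & performance');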

The Dark Side: When Indexes Hurt You

Here's what nobody tells beginners: indexes aren't free. They come with a cost.

Storage Cost

Every index is basically a copy of the indexed columns stored in a separate structure. More indexes = more disk space. I learned this the hard way when my database size doubled because I got index-happy.

Write Performance

Every time you INSERT, UPDATE, or DELETE, the database has to update all your indexes too.

Real talk: I once had a table with 12 indexes. Inserts were taking forever. Dropped it down to 4 strategic indexes, and suddenly my batch inserts went from 5 minutes to 30 seconds.

The Wrong Index Is Worse Than No Index

True story: I once created an index on a column with only 3 possible values (status: pending, active, inactive). The database was spending more time maintaining and reading the index than it would have just scanning the table.

Rule of thumb: If a column has low cardinality (not many unique values), skip the index.
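One exception worth carving out: if the value you query for is rare (say, a worker polling for the handful of pending rows), a partial index on just that slice can still earn its keep. A sketch with made-up table and column names:

-- Only pays off when 'pending' rows are a tiny fraction of the table
CREATE INDEX idx_pending_only ON jobs(created_at) WHERE status = 'pending';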

Common Indexing Mistakes (Learn from My Pain)

Mistake 1: Indexing Everything

I did this. You'll probably do this too. Don't.

The Fix: Index based on your actual queries. Use EXPLAIN ANALYZE to see what's slow, then index specifically for that.

Mistake 2: Ignoring Index Order in Composite Indexes

Remember that email + username example? Put the most selective column first, and make sure the leading column is one your queries actually filter on; otherwise the index won't get used at all.

-- Bad: username has duplicates, email is unique
CREATE INDEX idx_bad ON users(username, email);

-- Good: email is unique, username has duplicates  
CREATE INDEX idx_good ON users(email, username);

Mistake 3: Not Analyzing Your Indexes

Indexes can get bloated over time. In PostgreSQL:

REINDEX TABLE users;

I do this monthly on my production databases. Keeps things running smoothly.
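
One caveat before you put that in a cron job: a plain REINDEX takes a lock that blocks writes on the table while it runs. On PostgreSQL 12 and newer, the concurrent variant avoids that, at the cost of extra time and temporary disk space:

-- Rebuilds the table's indexes without blocking concurrent writes (PostgreSQL 12+)
REINDEX TABLE CONCURRENTLY users;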

Mistake 4: Forgetting About Covering Indexes

If your query only needs data that's in the index, the database doesn't even need to touch the main table.

CREATE INDEX idx_covering ON users(email, username, created_at);

-- This query only touches the index, super fast
SELECT email, username FROM users WHERE email = 'x';
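Related trick if you're on PostgreSQL 11 or newer: the INCLUDE clause stashes the extra columns in the index without making them part of the search key, which keeps the key itself small:

-- email is the searchable key; username and created_at just ride along in the leaf pages
CREATE INDEX idx_covering_include ON users(email) INCLUDE (username, created_at);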

Mistake 5: Over-Optimizing Too Early

I wasted weeks optimizing queries that ran once a day on 1000 rows. Focus on what matters: frequent queries on large tables.

Advanced Strategies (Senior Dev Territory)

Strategy 1: Index-Only Scans

Structure your indexes so queries can be answered without touching the table:

CREATE INDEX idx_posts_covering ON posts(user_id, created_at, title);

-- This query is lightning fast
SELECT title, created_at FROM posts WHERE user_id = 123 ORDER BY created_at DESC;
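One PostgreSQL-specific wrinkle: an index-only scan can only skip the table for pages the visibility map marks as all-visible, which depends on vacuum having run recently. If EXPLAIN ANALYZE shows a big "Heap Fetches" number on an Index Only Scan, that's usually why:

-- Refreshes statistics and the visibility map so index-only scans can actually skip the heap
VACUUM (ANALYZE) posts;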

Strategy 2: Conditional Indexes with Expressions

You can index the result of a function:

CREATE INDEX idx_lower_email ON users(LOWER(email));

-- Now this is fast
SELECT * FROM users WHERE LOWER(email) = 'prayush@example.com';
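The flip side: the query has to use the exact same expression as the index definition. Filter on the raw column and the expression index is ignored:

-- Does NOT use idx_lower_email; the planner sees a different expression
SELECT * FROM users WHERE email = 'Prayush@Example.com';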

Strategy 3: Index Hints (Use Sparingly)

Sometimes the query planner gets it wrong. In MySQL:

SELECT * FROM users USE INDEX (idx_email) WHERE email = 'x';

But honestly, if you need hints, you probably need better indexes.

Strategy 4: Monitoring Index Usage

Find unused indexes and kill them:

-- PostgreSQL
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

I do this every month. Found 15 unused indexes once. Dropped them all, freed up 2GB of space.
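
Before dropping anything, it's worth checking how much space each suspect is eating. Something along these lines works, since pg_stat_user_indexes exposes the index OID that pg_relation_size expects:

SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;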

Real-World Example: My Blog Optimization

Let me show you something from an actual project. I have a blog posts table:

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    title VARCHAR(255),
    content TEXT,
    status VARCHAR(20),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

Initial problem: Homepage query was taking 800ms to load recent posts.

SELECT * FROM posts 
WHERE status = 'published' 
ORDER BY created_at DESC 
LIMIT 10;

My indexing solution:

-- Partial index for published posts only
CREATE INDEX idx_published_posts ON posts(created_at DESC) 
WHERE status = 'published';

-- For author pages
CREATE INDEX idx_author_posts ON posts(user_id, created_at DESC)
WHERE status = 'published';

-- For full-text search
CREATE INDEX idx_post_search ON posts 
USING GIN(to_tsvector('english', title || ' ' || content));
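A sanity check worth doing after adding indexes like these: re-run the query under EXPLAIN ANALYZE and confirm the planner actually picks the partial index. Your plan and numbers will differ, but roughly:

EXPLAIN ANALYZE
SELECT * FROM posts
WHERE status = 'published'
ORDER BY created_at DESC
LIMIT 10;
-- You want an Index Scan using idx_published_posts, not a Seq Scan followed by a Sort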

Results:

  • Homepage: 800ms → 12ms
  • Author pages: 450ms → 8ms
  • Search: Actually works now (was timing out before)

Three strategic indexes. Not twelve random ones. That's the key.

The Practical Checklist

Here's what I do for every new table:

  1. Primary key gets an index automatically (don't create one manually)
  2. Foreign keys ALWAYS get indexed
   CREATE INDEX idx_posts_user_id ON posts(user_id);
  3. Columns in WHERE clauses get indexed (if the query runs often)
  4. Columns in ORDER BY get indexed (especially with LIMIT)
  5. Run EXPLAIN ANALYZE on slow queries before creating indexes
  6. Monitor index usage monthly and drop the dead ones

Tools I Actually Use

pgAdmin

For PostgreSQL, pgAdmin's query analyzer is solid. Shows you exactly what's happening.

MySQL Workbench

Same thing for MySQL. The visual EXPLAIN is chef's kiss.

Database Monitoring

I run a self-hosted Grafana dashboard that tracks:

  • Query execution times
  • Index usage stats
  • Table scan counts
  • Cache hit ratios

Want a tutorial on setting this up? Let me know because monitoring is half the battle.

The Bottom Line

Indexing isn't rocket science, but it's not something you can just wing either. Here's the real strategy:

  1. Start without indexes (except primary keys and foreign keys)
  2. Build your app, collect real usage data
  3. Find the slow queries (EXPLAIN ANALYZE is your friend)
  4. Add indexes strategically
  5. Monitor and adjust
  6. Don't over-optimize early

And remember: the best index is the one that solves a real performance problem. Not the one that looks cool in your schema.


Let's Connect

I'm Prayush Adhikari, grinding through computer engineering and breaking down complex topics into actually useful content. If database performance gets you excited (or frustrated), you're in the right place.

Got questions? Want me to cover something specific about databases? Drop a comment. I read every single one.

Next up, I'm thinking about diving into database sharding or maybe query optimization techniques. What would you rather read about? Let me know.

Now go index something. But not everything.
