Iniko

Posted on Oct 8

Indexing, Hashing & Query Optimization in DBMS

#programming #beginners #architecture #database

Databases don’t just store data — they must retrieve it efficiently. As table sizes grow into the thousands, millions, or more, naive queries (scanning every row) become too slow. That’s where indexing and hashing come in — they act like shortcuts into your data. Combined with smart query design, they dramatically speed up lookups.

Let’s break down how these concepts work, when to use them, and what trade-offs to watch out for.

What Is Indexing?

An index is a data structure built over a table (or one or more columns) to help the database find rows faster. Rather than scanning every row, the DBMS can use the index to jump directly to relevant records.

It’s analogous to the index at the back of a book — it doesn’t contain the full content, but pointers to where things live.

Types of Indexes

Primary Index: Automatically associated with the table’s primary key.

Secondary (or Secondary) Index: Manually created index on a non-key column to speed queries.

Clustering Index: Controls the physical order of data in the table (rows are stored in index order).

Non-Clustering Index: Separate structure; the table’s rows aren’t physically stored in index order, but index entries point to them.

Most relational databases use B-Tree or B+-Tree variants for indexes:

B-Tree: Internal nodes and leaf nodes both can store keys and pointers.

B+-Tree: Internal nodes only store keys (no direct data pointers); actual data pointers (or record references) live in the leaf nodes. Leaf nodes are often linked to each other, improving range scans (you can traverse leaf nodes sequentially).

Because of this design, B+-Tree indexes are especially good for range queries (e.g. BETWEEN, <, >) and sorted access.

Hash Indexing (or Hash-Based Indexing)

A hash index uses a hash function: a key is passed through the hash function, and it maps to a bucket (or slot). Records with that key go into that bucket (or chain). Lookups compute the hash and go directly to the appropriate bucket.

Strength: Excellent for equality searches (column = value).

Weakness: Poor for range queries or sorted operations (you can’t easily traverse “next bucket” in order).

So when your queries often ask “find rows matching this exact key?”, a hash index shines. But for queries like “find rows between A and B”, a B+-Tree style index is far better.

Examples: Students Table

Suppose you have:

CREATE TABLE Students (
roll_no INT PRIMARY KEY,
name VARCHAR(100),
age INT,
grade CHAR(1)
);

You might do:

-- Default B-Tree index (often automatic for primary key)
CREATE INDEX idx_roll_btree ON Students (roll_no);

-- If your database supports hash indexing:
CREATE INDEX idx_roll_hash ON Students USING HASH (roll_no);

Then:

SELECT * FROM Students WHERE roll_no = 50;
→ both B-Tree and hash index can perform well, but hash may be faster for exact match.

SELECT * FROM Students WHERE roll_no BETWEEN 10 AND 100;
→ B-Tree / B+-Tree index is much better suited, because entries are stored in sorted order.

Choosing Between B-Tree and Hash Index
Query Type Best Choice Reason
Exact match (=) Hash or B-Tree Hash is optimal for equality lookups
Range queries (BETWEEN, <, >) B-Tree / B+-Tree Maintains sorted order for efficient traversal
Sequential scans / ordered output B+-Tree Leaf nodes are linked, enabling fast in-order scanning

Other Considerations / Trade-Offs

Storage Overhead: Every index consumes additional disk/memory space.

Insert / Update / Delete Cost: More indexes = slower writes, because you must update all relevant indexes.

Low Cardinality Columns: Don’t index columns with very few distinct values (e.g. gender, status) — the benefit is minimal.

Index Fragmentation / Maintenance: Over time, indexes may become less efficient and need rebuilding, reorganization, or defragmentation.

Query Optimization & Index Strategy

Indexing is one of the most powerful levers for speeding up queries, but it must go hand-in-hand with how you write your SQL and how the database executes it.

Use the EXPLAIN (or EXPLAIN ANALYZE / EXPLAIN PLAN) command to see how your database is executing queries. It shows whether an index is being used, how many rows are being scanned, and more.

Avoid SELECT * — fetch only the columns you need (less I/O).

Use composite indexes (multi-column) when your queries often filter on multiple columns together (e.g. WHERE col1 = X AND col2 = Y).

Keep your indexes lean and tuned. Don’t over-index — too many indexes slows down writes and consumes space.

Final Thoughts

Indexing and hashing are essential tools for building performant database systems. They are not silver bullets but, when used smartly, can transform a slow system into a responsive one.

Start by analyzing your most frequent, slowest queries. Add or adjust indexes based on patterns. Measure impact with EXPLAIN and query profiling. Over time you'll develop instincts for which columns — and which kinds of indexing (B-Tree, hash, composite) — will yield the best performance.

DEV Community

Indexing, Hashing & Query Optimization in DBMS

Top comments (0)