Saifulhaq S

Posted on Oct 8

Indexing, Hashing & Query Optimization in Databases

#algorithms #database #performance

When working with databases, simply storing data isn’t enough — efficient retrieval is essential. As your dataset scales to thousands or millions of entries, poorly optimized queries can slow down your system. Techniques like indexing and hashing are crucial tools to speed up lookups, just like the index section of a book helps you find topics quickly.

Let’s break down how these work, the differences between them, and when to use each.

⸻

What Is an Index?

An index is a specialized data structure that accelerates access to rows in a database table. Rather than scanning every record to satisfy a query, the database can use the index to “jump” directly to likely matches.

You can think of it like the index pages in a textbook: they don’t contain the whole content, but lead you to exactly where certain topics are discussed.
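Here is a rough Python sketch of that "jump". It assumes a tiny in-memory table, and the rows list and sorted index are hypothetical stand-ins, since real indexes live on disk pages. Without an index, a lookup checks every row. With a sorted list of (key, position) pairs, binary search finds the position in O(log n) steps and then goes straight to the row.

```python
from bisect import bisect_left

# Hypothetical in-memory "table": rows stored in insertion order.
rows = [
    {"roll_no": 42, "name": "Asha"},
    {"roll_no": 7, "name": "Ben"},
    {"roll_no": 19, "name": "Chen"},
    {"roll_no": 3, "name": "Dina"},
]


def full_scan(table, key):
    """No index: inspect every row, O(n)."""
    return [row for row in table if row["roll_no"] == key]


# The "index": (key, position) pairs sorted by key. Like a book index,
# it stores only keys and locations, not the rows themselves.
index = sorted((row["roll_no"], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _ in index]


def index_lookup(table, key):
    """With an index: binary-search the keys, O(log n), then jump to each row."""
    i = bisect_left(index_keys, key)
    matches = []
    while i < len(index) and index[i][0] == key:
        matches.append(table[index[i][1]])
        i += 1
    return matches


print(full_scan(rows, 19))     # [{'roll_no': 19, 'name': 'Chen'}]
print(index_lookup(rows, 19))  # [{'roll_no': 19, 'name': 'Chen'}]
```

Both calls return the same row. The difference is how much work each one does as the table grows.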

Types of Indexes
• Primary index: Built on the primary key column; most DBMSs create it automatically when you declare a PRIMARY KEY.
• Secondary index: Created by the user (or DBA) on non-primary-key columns to speed up queries.
• Clustering index: Determines the physical order in which rows are stored, so a table can have at most one.
• Non-clustering (or non-clustered) index: A separate data structure that points to the actual rows without rearranging them.
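The sketch below contrasts the last two types. It assumes rows are plain (roll_no, name, age) tuples in a Python list, and the names clustered_rows and age_index are illustrative only. The clustered version keeps the rows themselves sorted by roll_no. The non-clustered index on age is a separate sorted list of pointers back into those rows.

```python
from bisect import bisect_left

# Clustered: the rows themselves are stored sorted by roll_no, so the data
# doubles as the index (which is why a table gets only one such ordering).
clustered_rows = sorted(
    [(42, "Asha", 20), (7, "Ben", 19), (19, "Chen", 21), (3, "Dina", 20)]
)
clustered_keys = [row[0] for row in clustered_rows]


def get_by_roll_no(roll_no):
    """Binary-search the sorted rows directly; the match is the row itself."""
    i = bisect_left(clustered_keys, roll_no)
    if i < len(clustered_keys) and clustered_keys[i] == roll_no:
        return clustered_rows[i]
    return None


# Non-clustered: a separate sorted list of (age, position) entries pointing
# back into clustered_rows; the rows are not rearranged by age.
age_index = sorted((row[2], pos) for pos, row in enumerate(clustered_rows))
age_keys = [age for age, _ in age_index]


def get_by_age(age):
    """Search the index, then follow each pointer to fetch the row (an extra hop)."""
    i = bisect_left(age_keys, age)
    found = []
    while i < len(age_index) and age_index[i][0] == age:
        found.append(clustered_rows[age_index[i][1]])
        i += 1
    return found


print(get_by_roll_no(19))  # (19, 'Chen', 21)
print(get_by_age(20))      # [(3, 'Dina', 20), (42, 'Asha', 20)]
```

Real engines differ in what a non-clustered entry points to. InnoDB secondary indexes, for example, store the primary key value rather than a physical position.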

B-Tree and B+Tree Indexing

Most relational databases use B-Tree or B+Tree structures for their default indexes:
• In a B-Tree, both internal and leaf nodes hold keys together with their data pointers.
• In a B+Tree, internal nodes hold only keys (used for navigation), and all data pointers reside in the leaf nodes.
• Leaf nodes in a B+Tree are linked in key order. A range query (e.g. BETWEEN) therefore locates its first match once and then walks along the leaves.

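These trees stay balanced, and each node holds many keys (a high fan-out), so the tree remains shallow. Lookups, inserts, and deletes each take O(log n) time, which typically means only a few page reads even for millions of rows.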
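Here is a minimal Python sketch of the B+Tree layout described above. It assumes the tree is bulk-loaded once from sorted, unique keys, and it omits the node splits and merges that real implementations perform on insert and delete. The fan-out of 4 is far smaller than a real page would hold. Internal nodes store only separator keys, while the leaves store the data pointers plus a next link. A BETWEEN-style scan therefore descends the tree once and then walks sideways through the leaves.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted keys
        self.values = values  # data pointers (here: just row payloads)
        self.next = None      # link to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no data pointers
        self.children = children  # len(children) == len(keys) + 1


def first_key(node):
    """Smallest key under a node: follow the leftmost children down to a leaf."""
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]


def bulk_load(pairs, fanout=4):
    """Build a B+Tree bottom-up from non-empty (key, value) pairs with unique keys."""
    pairs = sorted(pairs)
    leaves = [
        Leaf([k for k, _ in pairs[i:i + fanout]], [v for _, v in pairs[i:i + fanout]])
        for i in range(0, len(pairs), fanout)
    ]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right  # chain the leaves for range scans
    level = leaves
    while len(level) > 1:
        # Each parent groups up to `fanout` children; the separators are the
        # smallest keys of every child except the first.
        level = [
            Internal([first_key(c) for c in level[i + 1:i + fanout]], level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def find_leaf(node, key):
    """Descend from the root, picking the child whose key range contains key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    """Exact-match lookup: a single root-to-leaf descent."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_scan(root, lo, hi):
    """Values with lo <= key <= hi (like BETWEEN): descend once, then follow leaf links."""
    out = []
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out


root = bulk_load([(n, f"student-{n}") for n in range(1, 101)])
print(search(root, 50))          # student-50
print(range_scan(root, 10, 13))  # ['student-10', 'student-11', 'student-12', 'student-13']
```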

⸻

Hash Indexing

A hash index works differently. It applies a hash function to the key value and uses the resulting hash to determine which “bucket” the record belongs to.
• Strengths: Ideal for exact-match queries like column = constant.
• Weaknesses: Poor for range-based queries (BETWEEN, <, >, etc.) and sorting operations.

Because hashing effectively scatters data across buckets without inherent order, it’s not useful when query patterns involve ordering or searching within ranges.


📘 Use Case: When your application frequently runs exact-match lookups.
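Below is a toy hash index in Python. It assumes Python's built-in hash() as the hash function and a fixed number of buckets, whereas real implementations also resize and handle overflow. It shows why equality is cheap and ranges are not: an exact lookup touches one bucket, while a range query has to inspect every bucket and sort the matches afterwards.

```python
class HashIndex:
    """Toy hash index: key -> bucket -> [(key, row_id), ...]. Illustrative only."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Python's built-in hash() stands in for the DBMS's hash function.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row_id):
        self._bucket(key).append((key, row_id))

    def lookup(self, key):
        # Exact match: hash once, then check only the entries in that one bucket.
        return [rid for k, rid in self._bucket(key) if k == key]

    def range_query(self, lo, hi):
        # Hashing scatters neighboring keys across buckets, so a range query
        # must inspect every bucket and sort the matches afterwards.
        hits = [(k, rid) for bucket in self.buckets for k, rid in bucket if lo <= k <= hi]
        return [rid for _, rid in sorted(hits)]


idx = HashIndex()
for row_id, roll_no in enumerate([50, 12, 99, 7, 33]):
    idx.insert(roll_no, row_id)

print(idx.lookup(50))           # [0]
print(idx.range_query(10, 60))  # [1, 4, 0]  (row ids for roll_no 12, 33, 50)
```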

🧾 Example: Students Table

Let’s create a sample table to see how indexing helps:

```sql
CREATE TABLE Students (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(100),
    age     INT,
    grade   CHAR(1)
);
```

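Step 1: Create Indexes

```sql
-- B-Tree index (the default index type in most DBMSs).
-- Note: roll_no is already the PRIMARY KEY, which most DBMSs index automatically,
-- so this explicit index is redundant and is shown only for comparison.
CREATE INDEX idx_roll_btree ON Students (roll_no);

-- Hash index (PostgreSQL syntax; support and syntax vary by DBMS)
CREATE INDEX idx_roll_hash ON Students USING HASH (roll_no);
```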

Step 2: Run Queries

```sql
-- Equality check (served well by a hash or a B-Tree index)
SELECT * FROM Students WHERE roll_no = 50;

-- Range query (needs an ordered index: B-Tree or B+Tree)
SELECT * FROM Students WHERE roll_no BETWEEN 10 AND 100;
```

👉 The B-Tree index handles both cases efficiently, while the hash index helps only with equality lookups.
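You can check which index a query actually uses by asking the query planner. This minimal sketch uses Python's built-in sqlite3 module and assumes SQLite as the engine. SQLite implements its indexes as B-Trees and has no hash indexes, so it demonstrates only the B-Tree half. The sample rows are made up, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    "roll_no INT PRIMARY KEY, name VARCHAR(100), age INT, grade CHAR(1))"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [(n, f"student-{n}", 18 + n % 7, "ABCDF"[n % 5]) for n in range(1, 201)],
)


def plan(sql):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# SQLite automatically keeps a B-Tree index for the PRIMARY KEY,
# and both the equality lookup and the range query can use it.
eq_plan = plan("SELECT * FROM Students WHERE roll_no = 50")
range_plan = plan("SELECT * FROM Students WHERE roll_no BETWEEN 10 AND 100")
print(eq_plan)     # a SEARCH ... USING INDEX line rather than a full SCAN
print(range_plan)  # likewise a SEARCH, this time with a range constraint
```

A plan that starts with SEARCH means an index narrows the rows. SCAN means the whole table is read.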

📊 When to Use Which Index

| Use Case | Best Index | Reason |
|---|---|---|
| Equality lookups (=) | Hash or B-Tree | Hash is fastest for exact matches (O(1) on average vs. O(log n)) |
| Range queries | B-Tree / B+Tree | Maintains sorted order |
| Sequential access | B+Tree | Linked leaf nodes allow ordered scans without re-descending the tree |
| Write-heavy or storage-constrained tables | Minimal indexing | Every index uses space and slows down inserts/updates |

⚠️ Important Considerations

Storage Overhead: Every index consumes additional space.

Write Performance: More indexes = slower INSERT, UPDATE, DELETE.

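Low-Cardinality Columns: Avoid indexing columns with few unique values (e.g., gender, status). A filter on such a column matches a large share of the table, so the optimizer often prefers a full scan and the index only adds write overhead.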

Maintenance: Indexes can fragment over time and may need rebuilding.
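A quick way to spot a low-cardinality column is to compare its number of distinct values with its row count. The sketch below assumes a hypothetical 1,000-row table and uses the common ratio definition of selectivity (distinct values / rows). Real optimizers rely on their own statistics and cost models, so treat any cut-off as a rule of thumb rather than a fixed threshold.

```python
def selectivity(values):
    """Distinct values / total rows: near 1.0 is highly selective, near 0 is low cardinality."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


def match_fraction(values, target):
    """Share of rows an equality filter (column = target) would return."""
    values = list(values)
    return sum(v == target for v in values) / len(values) if values else 0.0


# Hypothetical 1,000-row table: roll_no is unique, status has only two values.
roll_nos = list(range(1, 1001))
statuses = ["inactive" if n % 10 == 0 else "active" for n in roll_nos]

print(selectivity(roll_nos))               # 1.0
print(selectivity(statuses))               # 0.002
print(match_fraction(statuses, "active"))  # 0.9
```

Because status = 'active' matches about 90% of rows, fetching them through an index would cost one index probe per row, so a plain table scan is usually cheaper.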

🧠 Query Optimization

Indexes are one of the most effective tools for query optimization, and they work best combined with:

Query planning: Use EXPLAIN (or EXPLAIN ANALYZE, where supported) to check whether a query actually uses an index or falls back to a full table scan.

Proper filtering: Avoid SELECT *; fetch only what’s needed.

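Composite indexes: Combine multiple columns in one index for common query patterns. Column order matters: an index on (grade, age) helps queries that filter on grade, or on grade and age, but usually not queries that filter on age alone.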
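Here is a small sketch showing EXPLAIN and the column-order rule together. It uses Python's built-in sqlite3 module and assumes SQLite, whose plan command is EXPLAIN QUERY PLAN; the rows are made up and the index name idx_grade_age is illustrative. The same composite index serves a filter on grade and age but is typically ignored for a filter on age alone. Plan wording differs between SQLite versions and other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    "roll_no INT PRIMARY KEY, name VARCHAR(100), age INT, grade CHAR(1))"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [(n, f"student-{n}", 18 + n % 7, "ABCDF"[n % 5]) for n in range(1, 201)],
)


def plan(sql):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


both_cols = "SELECT name FROM Students WHERE grade = 'A' AND age > 20"
age_only = "SELECT name FROM Students WHERE age > 20"

before = plan(both_cols)  # no index on grade/age yet: a full table SCAN
conn.execute("CREATE INDEX idx_grade_age ON Students (grade, age)")  # illustrative name
after = plan(both_cols)   # expected: a SEARCH using idx_grade_age
skipped = plan(age_only)  # age is not the leftmost column, so the index is not used here

print(before, after, skipped, sep="\n")
```

Some engines can still use such an index in special cases; SQLite's skip-scan, for example, can kick in after ANALYZE has gathered statistics. Confirm with the plan rather than assuming either way.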

✅ Summary

| Concept | Description |
|---|---|
| Indexing | Data structure for faster lookups |
| B-Tree / B+Tree | Supports ordering and range queries |
| Hash Indexing | Best for equality checks |
| Query Optimization | Uses indexes and execution plans for efficiency |

🚀 Final Thoughts

Efficient indexing and query design are what make large-scale applications fast and reliable.
Understanding how and when to use B-Tree or Hash indexes can significantly improve your database performance.

Start small — analyze your queries, create the right indexes, and monitor performance.
A few thoughtful indexes can turn your slowest queries into instant results.
