DEV Community

Mugiil .B


Indexing, Hashing & Query I/O in DBMS

Efficient data retrieval is one of the most important goals in any database system.
When we query a table, the DBMS must decide how to find the required data. Scanning the entire table always works, but it gets slower as the table grows.

That’s where Indexing, Hashing, and Query I/O optimization come into play.

📚 1️⃣ Indexing in DBMS
💡 What is an Index?

An index is a data structure that improves the speed of data retrieval operations on a table — similar to how an index in a book helps you find topics quickly.

Without an index, the DBMS must perform a full table scan, checking every row.
With an index, it can go almost directly to the matching records, usually in a handful of page reads.

🧱 Types of Indexes
| Type | Description |
|---|---|
| Primary Index | Built on the primary key; records are stored in sorted order. |
| Secondary Index | Created on non-primary attributes for faster lookup. |
| Clustered Index | Reorders the actual data to match the index. |
| Non-Clustered Index | Keeps a separate structure pointing to the actual data. |
| Dense Index | Every record has an entry. |
| Sparse Index | Only some records have entries (less space, more traversal). |
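To make the dense vs. sparse distinction concrete, here is a minimal Python sketch. Everything in it is a hypothetical stand-in: `blocks` plays the role of sorted data pages, `BLOCK_SIZE` is an arbitrary choice, and the two lookup functions are illustrations rather than how any particular DBMS implements its indexes.

```python
from bisect import bisect_right

# Hypothetical sorted data file, split into fixed-size blocks (pages).
BLOCK_SIZE = 3
records = [(k, f"row-{k}") for k in (5, 10, 15, 20, 25, 30, 35, 40)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per record (key -> (block number, offset)).
dense_index = {
    key: (b, off)
    for b, block in enumerate(blocks)
    for off, (key, _) in enumerate(block)
}

# Sparse index: one entry per block (the first key stored in that block).
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    pos = dense_index.get(key)
    if pos is None:
        return None
    b, off = pos
    return blocks[b][off][1]


def sparse_lookup(key):
    # Find the last block whose first key is <= the search key...
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    # ...then scan inside that block (the "more traversal" part).
    for k, row in blocks[b]:
        if k == key:
            return row
    return None


print(len(dense_index), len(sparse_keys))
print(dense_lookup(25), sparse_lookup(25))
```

The sparse index stores far fewer entries, but it only works because the data itself is sorted, and each lookup pays for a short scan inside the chosen block.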
🧠 Example:
CREATE INDEX idx_name
ON Employees (name);

Now the query SELECT * FROM Employees WHERE name = 'John';
can use the index to find matching rows instead of scanning the whole table 🚀
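If you want to check that the index is actually picked up, one option is to ask the database for its query plan. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table layout, the sample rows, and the `plan` helper are assumptions made for illustration, and `EXPLAIN QUERY PLAN` is SQLite's syntax (other systems use their own `EXPLAIN` variants and wording).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the article only names the table and the `name` column.
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO Employees (name, dept) VALUES (?, ?)",
    [("John", "Sales"), ("Asha", "HR"), ("Ravi", "IT")],
)

QUERY = "SELECT * FROM Employees WHERE name = 'John'"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)  # no index yet, so SQLite has to scan the table
conn.execute("CREATE INDEX idx_name ON Employees (name)")
plan_after = plan(QUERY)   # the plan should now mention idx_name

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies between SQLite versions, so it is safer to look for the index name in the output than to compare whole strings.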

🧩 2️⃣ Hashing in DBMS

Hashing is another data access method — instead of sorting and searching, it uses a hash function to compute the location of data directly.

⚡ How it Works:
Hash Function → Hash(Key) = Address

Each key is converted into an address (or bucket) where the record is stored.

🧱 Example:

If Hash(101) → 5, record with key 101 will be stored in bucket 5.
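Here is a small sketch of that idea in Python. The modulo hash function and the bucket count of 8 are assumptions, picked only because 101 % 8 happens to equal 5 and so matches the example; a real DBMS would use its own hash function and bucket layout.

```python
NUM_BUCKETS = 8  # assumed bucket count, chosen so that key 101 lands in bucket 5


def hash_address(key: int) -> int:
    # Simple modulo hash: maps an integer key to a bucket number.
    return key % NUM_BUCKETS


# Each bucket can hold several (key, record) pairs, like a disk page.
buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(key, record):
    buckets[hash_address(key)].append((key, record))


def lookup(key):
    # Only one bucket is inspected, no matter how many buckets exist.
    for k, record in buckets[hash_address(key)]:
        if k == key:
            return record
    return None


insert(101, "record for key 101")
print(hash_address(101), lookup(101))
```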

🔹 Advantages

Very fast access for equality searches (e.g. WHERE id = 101).

No need to traverse indexes or sort data.

🔹 Disadvantages

Not efficient for range queries (BETWEEN, <, >, etc.)

May cause collisions (different keys map to same bucket).
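The range-query weakness is easy to see in a toy comparison. In this sketch a Python `dict` stands in for a hash index and a sorted list stands in for an ordered index; the key values and function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

ids = [101, 105, 110, 120, 133, 150]       # hypothetical key values

hash_index = {k: f"row-{k}" for k in ids}  # dict stands in for a hash index
sorted_index = sorted(ids)                 # sorted list stands in for an ordered index


def equality_hash(key):
    # One hash computation, one bucket probe.
    return hash_index.get(key)


def range_hash(lo, hi):
    # Hash order is unrelated to key order, so every entry must be checked.
    return sorted(k for k in hash_index if lo <= k <= hi)


def range_sorted(lo, hi):
    # Binary search for both ends, then read one contiguous slice.
    start = bisect_left(sorted_index, lo)
    end = bisect_right(sorted_index, hi)
    return sorted_index[start:end]


print(equality_hash(110))
print(range_hash(105, 133), range_sorted(105, 133))
```

Both range functions return the same keys, but the hash version has to touch every entry, while the sorted version only touches the matching slice.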

⚙️ Collision Handling Techniques

Open Addressing — find another empty slot.

Chaining — use a linked list for multiple keys in the same bucket.
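Both techniques can be sketched in a few lines of Python. The table size, the modulo hash `h`, and the function names are assumptions for illustration. The open addressing variant shown is linear probing (just one of several probing strategies), and deletion is left out to keep the sketch short.

```python
SIZE = 8  # assumed table size


def h(key: int) -> int:
    return key % SIZE


# Chaining: each slot holds a list of (key, value) pairs.
chained = [[] for _ in range(SIZE)]


def chain_put(key, value):
    chained[h(key)].append((key, value))


def chain_get(key):
    for k, v in chained[h(key)]:
        if k == key:
            return v
    return None


# Open addressing with linear probing: each slot holds at most one pair.
slots = [None] * SIZE


def probe_put(key, value):
    for step in range(SIZE):
        i = (h(key) + step) % SIZE
        if slots[i] is None or slots[i][0] == key:
            slots[i] = (key, value)
            return i  # report which slot was used
    raise RuntimeError("hash table is full")


def probe_get(key):
    for step in range(SIZE):
        i = (h(key) + step) % SIZE
        if slots[i] is None:
            return None  # reached an empty slot, so the key is absent
        if slots[i][0] == key:
            return slots[i][1]
    return None


# 101 and 109 both hash to slot 5, so they collide.
for key in (101, 109):
    chain_put(key, f"row-{key}")
first_slot = probe_put(101, "row-101")
second_slot = probe_put(109, "row-109")
print(chained[5], first_slot, second_slot)
```

With chaining, both keys end up in the same bucket's list. With linear probing, the second key moves on to the next free slot.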

💾 3️⃣ Query I/O (Input/Output)

When a query runs on data that does not fit in memory, the DBMS often spends much of its time on I/O operations: reading and writing data pages from disk.
Optimizing I/O is key to improving performance.

🔍 Query I/O Workflow

Parse & validate SQL query.

Use the optimizer to choose the best plan (index scan, hash join, etc.).

Execute the plan, fetching the required data pages into the buffer cache.

Return the result to the user.
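The buffer cache step is where disk I/O is saved or spent. Here is a toy sketch of a page cache that counts simulated disk reads. The `BufferCache` class, its capacity, and the LRU eviction policy are all assumptions for illustration; real systems use their own (often more elaborate) replacement policies.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU page cache that counts simulated disk reads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> page contents
        self.disk_reads = 0

    def _read_from_disk(self, page_id):
        self.disk_reads += 1  # stands in for a real, slow disk read
        return f"contents of page {page_id}"

    def get_page(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # cache hit: mark as recently used
            return self.pages[page_id]
        page = self._read_from_disk(page_id)  # cache miss: go to disk
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return page


cache = BufferCache(capacity=2)
for page_id in (1, 2, 1, 3, 2):
    cache.get_page(page_id)
print(cache.disk_reads, list(cache.pages))
```

Every page request that hits the cache avoids a disk read, which is why queries that touch fewer pages (or the same pages repeatedly) tend to run faster.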

🔧 Ways to Optimize Query I/O

Use appropriate indexes on frequently searched columns.

Avoid SELECT * (fetch only needed columns).

Use joins carefully: prefer joining on indexed columns.

Partition large tables for faster access.

Analyze query plans (EXPLAIN in SQL).
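Two of these tips, fetching only the needed columns and reading query plans, can be tried together. The sketch below again uses Python's `sqlite3` module with an assumed `Employees` schema and a hypothetical `plan` helper. It relies on SQLite's `EXPLAIN QUERY PLAN`, so the wording of the output is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema for illustration only.
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("CREATE INDEX idx_name ON Employees (name)")


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# SELECT * needs every column, so each index hit also needs a table lookup.
plan_star = plan("SELECT * FROM Employees WHERE name = 'John'")

# Selecting only the indexed column lets SQLite answer from the index alone.
plan_narrow = plan("SELECT name FROM Employees WHERE name = 'John'")

print("SELECT *   :", plan_star)
print("SELECT name:", plan_narrow)
```

When the plan reports a covering index, the query can be answered from the index without reading the table's own pages, which means less I/O.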

🧾 Quick Summary
| Concept | Description | Use Case |
|---|---|---|
| Indexing | Sorted lookup structure for fast search | Range queries |
| Hashing | Direct address computation | Equality search |
| Query I/O | Disk operations during query execution | Performance tuning |

💡 Takeaway:
Indexes and hashing make searches lightning-fast ⚡, while efficient I/O management keeps your queries scalable and optimized. Together, they’re the core of any high-performance DBMS.
