DEV Community

Mugiil .B



Indexing, Hashing & Query I/O in DBMS

Efficient data retrieval is one of the most important goals of any database system.
When we query a table, the DBMS must decide how to find the required data, and scanning the entire table is slow when the table is large.

That's where Indexing, Hashing, and Query I/O optimization come into play.

📚 1️⃣ Indexing in DBMS
💡 What is an Index?

An index is a data structure that speeds up data retrieval on a table, much as the index of a book helps you find topics quickly.

Without an index, the DBMS must perform a full table scan, checking every row.
With an index, it can jump directly to the matching record.
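Here is a minimal Python sketch of the difference. An in-memory list stands in for a table; real indexes are usually disk-based B+ trees, not Python lists. The rows and helper functions are hypothetical. The scan examines every row. The index is a sorted list of (name, position) pairs, which is binary-searched with `bisect` so that only the matching rows are fetched.

```python
import bisect

# Hypothetical table: (id, name) rows in insertion order, not sorted by name.
rows = [(1, "Maya"), (2, "John"), (3, "Alice"), (4, "Zoe"), (5, "Bob")]


def full_table_scan(rows, name):
    """Check every row; return the matches and how many rows were examined."""
    matches = [row for row in rows if row[1] == name]
    return matches, len(rows)


# A minimal "index": (name, row position) pairs kept sorted by name.
index = sorted((row[1], pos) for pos, row in enumerate(rows))
index_keys = [name for name, _ in index]


def index_lookup(rows, name):
    """Binary-search the sorted index, then fetch only the matching rows."""
    start = bisect.bisect_left(index_keys, name)
    end = bisect.bisect_right(index_keys, name)
    return [rows[pos] for _, pos in index[start:end]]


print(full_table_scan(rows, "John"))  # ([(2, 'John')], 5)
print(index_lookup(rows, "John"))     # [(2, 'John')]
```

The binary search needs about log2(n) comparisons instead of n. That gap is what makes the index worthwhile as the table grows.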

🧱 Types of Indexes
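| Type | Description |
|---|---|
| Primary Index | Built on the ordering key of a sorted data file (usually the primary key); the records themselves are stored in that key's order. |
| Secondary Index | Built on a non-ordering attribute (a column the file is not sorted by) for faster lookups on that column. |
| Clustered Index | Physically orders the table's rows to match the index key, so a table can have at most one. |
| Non-Clustered Index | Keeps a separate structure whose entries point to the rows, which stay in their own order. |
| Dense Index | Has an index entry for every search-key value (every record). |
| Sparse Index | Has entries for only some records, typically one per data block. It needs less space, but the data must be sorted by the key and a lookup must scan within the block. |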
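The dense/sparse distinction can be sketched in Python, assuming a hypothetical data file of records sorted by key and grouped into blocks of three. The dense index keeps one entry per record. The sparse index keeps only the first key of each block, so a lookup first finds the right block and then scans inside it.

```python
import bisect

# Hypothetical data file: records sorted by key, grouped into blocks of 3.
BLOCK_SIZE = 3
records = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e"), (60, "f"), (70, "g")]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one (key, block number) entry per record.
dense_index = [(key, b) for b, block in enumerate(blocks) for key, _ in block]
dense_keys = [key for key, _ in dense_index]

# Sparse index: one entry per block (the block's first key).
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    """Binary-search the per-record entries, then read exactly one block."""
    i = bisect.bisect_left(dense_keys, key)
    if i == len(dense_keys) or dense_keys[i] != key:
        return None
    block = blocks[dense_index[i][1]]
    return next(value for k, value in block if k == key)


def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan inside it."""
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None


print(len(dense_index), len(sparse_keys))  # 7 3
print(dense_lookup(50), sparse_lookup(50))  # e e
```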
🧠 Example:
CREATE INDEX idx_name
ON Employees (name);

Now, SELECT * FROM Employees WHERE name = 'John';
can use the index to find matching rows faster, if the query optimizer chooses it 🚀
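One way to check whether a query actually uses the index is to ask the database for its plan. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the `Employees` columns and rows are hypothetical. SQLite's `EXPLAIN QUERY PLAN` describes the chosen plan, and its exact wording varies between SQLite versions.

```python
import sqlite3

# In-memory SQLite database with a hypothetical Employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO Employees (name, dept) VALUES (?, ?)",
    [("John", "Sales"), ("Alice", "IT"), ("Bob", "HR")],
)

QUERY = "SELECT * FROM Employees WHERE name = 'John'"


def query_plan(sql):
    """Join the 'detail' column (the last one) of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(QUERY)  # no index yet: SQLite reports a SCAN of the table
conn.execute("CREATE INDEX idx_name ON Employees (name)")
plan_after = query_plan(QUERY)   # now reports a SEARCH using idx_name

print(plan_before)
print(plan_after)
```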

🧩 2️⃣ Hashing in DBMS

Hashing is another data access method: instead of sorting and searching, it uses a hash function to compute the location of the data directly.

⚡ How it Works:
Hash Function → Hash(Key) = Address

Each key is converted into an address (or bucket) where the record is stored.

🧱 Example:

If Hash(101) β†’ 5, record with key 101 will be stored in bucket 5.

πŸ”Ή Advantages

Very fast access for equality searches (e.g. WHERE id = 101).

No need to traverse indexes or sort data.

πŸ”Ή Disadvantages

Not efficient for range queries (BETWEEN, <, >, etc.)

May cause collisions (different keys map to same bucket).
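Both disadvantages show up with the same hypothetical key mod 8 hash function. Consecutive keys scatter across every bucket, so a range query cannot be answered from a single bucket. Two different keys can also collide.

```python
NUM_BUCKETS = 8


def hash_key(key: int) -> int:
    """Hypothetical hash function: the key modulo the number of buckets."""
    return key % NUM_BUCKETS


# Range query: keys 100..107 land in eight different buckets, so
# "WHERE id BETWEEN 100 AND 107" would have to read every bucket.
range_buckets = {hash_key(k) for k in range(100, 108)}
print(sorted(range_buckets))         # [0, 1, 2, 3, 4, 5, 6, 7]

# Collision: different keys hash to the same bucket.
print(hash_key(101), hash_key(109))  # 5 5
```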

βš™οΈ Collision Handling Techniques

Open Addressing β€” find another empty slot.

Chaining β€” use a linked list for multiple keys in the same bucket.
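Both techniques can be sketched as small Python hash tables. These are illustrative classes, not a DBMS API, and open addressing is shown in its simplest form, linear probing. They use Python's built-in `hash()`, which returns the integer itself for small non-negative ints. They also omit deletion, which under open addressing needs extra care such as tombstone markers.

```python
class ChainedHashTable:
    """Chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: update it
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None


class LinearProbingHashTable:
    """Open addressing with linear probing: on a collision, try the next slot."""

    def __init__(self, num_slots=8):
        self.slots = [None] * num_slots

    def put(self, key, value):
        n = len(self.slots)
        start = hash(key) % n
        for step in range(n):
            i = (start + step) % n
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key):
        n = len(self.slots)
        start = hash(key) % n
        for step in range(n):
            i = (start + step) % n
            if self.slots[i] is None:
                return None  # an empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


chained = ChainedHashTable()
chained.put(101, "A")
chained.put(109, "B")  # 101 % 8 == 109 % 8 == 5: both go in bucket 5's list

probing = LinearProbingHashTable()
probing.put(101, "A")  # lands in slot 5
probing.put(109, "B")  # slot 5 is taken, so it moves on to slot 6
```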

πŸ’Ύ 3️⃣ Query I/O (Input/Output)

When a query runs on a disk-based DBMS, much of its time typically goes to I/O operations: reading and writing data pages from disk.
Optimizing I/O is therefore key to improving performance.
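A back-of-the-envelope estimate makes the I/O argument concrete. The numbers are hypothetical: 1,000,000 rows, 100 rows per data page, and an index whose nodes each hold 200 entries. A full scan reads every data page. An index lookup reads roughly one page per index level plus one data page.

```python
def full_scan_page_reads(num_rows: int, rows_per_page: int) -> int:
    """A full table scan reads every data page once (integer ceiling division)."""
    return (num_rows + rows_per_page - 1) // rows_per_page


def index_height(num_keys: int, fanout: int) -> int:
    """Fewest levels whose capacity (fanout ** levels) covers num_keys."""
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


def index_lookup_page_reads(num_rows: int, fanout: int) -> int:
    """One page per index level (root to leaf) plus one data page for the row."""
    return index_height(num_rows, fanout) + 1


print(full_scan_page_reads(1_000_000, 100))     # 10000
print(index_lookup_page_reads(1_000_000, 200))  # 4
```

Real costs also depend on caching, since upper index levels are usually already in memory. The several-orders-of-magnitude gap is the point of the estimate.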

πŸ” Query I/O Workflow

Parse and validate the SQL query.

Use the optimizer to choose the best execution plan (index scan, hash join, etc.).

Execute the plan, fetching the needed data pages into the buffer cache (pages already cached need no disk read).

Return the result to the user.
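The buffer cache in this workflow can be sketched as a small LRU (least recently used) page cache. This `BufferPool` class is hypothetical. Real systems typically use LRU variants such as the clock algorithm, but the idea is the same: a page that is already cached costs no disk read.

```python
from collections import OrderedDict


class BufferPool:
    """A tiny LRU page cache that counts how many requests had to go to 'disk'."""

    def __init__(self, capacity: int, disk: dict):
        self.capacity = capacity
        self.disk = disk            # page_id -> contents (stand-in for the disk)
        self.cache = OrderedDict()  # cached pages, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.cache:
            self.cache.move_to_end(page_id)  # hit: mark as most recently used
            return self.cache[page_id]
        self.disk_reads += 1                 # miss: simulate a disk read
        page = self.disk[page_id]
        self.cache[page_id] = page
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used page
        return page


disk = {pid: f"page-{pid}" for pid in range(10)}
pool = BufferPool(capacity=3, disk=disk)
for pid in [0, 1, 2, 0, 1, 2, 3, 0]:
    pool.get_page(pid)
print(pool.disk_reads)  # 5: three hits on pages 0-2 cost no disk reads
```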

πŸ”§ Ways to Optimize Query I/O

Use appropriate indexes on frequently searched columns.

*Avoid SELECT ** (fetch only needed columns).

Use joins carefully β€” prefer indexed joins.

Partition large tables for faster access.

Analyze query plans (EXPLAIN in SQL).
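The first two tips can be checked with a query plan, as in this `sqlite3` sketch with a hypothetical schema. Selecting every column forces SQLite to look up the table row after searching `idx_name`. Selecting only `id` and `name` can be answered from the index alone, which SQLite reports as a covering index. This works because in SQLite an `INTEGER PRIMARY KEY` column is an alias for the rowid, which every index entry already stores.

```python
import sqlite3

# In-memory database with a hypothetical Employees table and an index on name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.execute("CREATE INDEX idx_name ON Employees (name)")


def query_plan(sql):
    """Join the 'detail' column (the last one) of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_star = query_plan("SELECT * FROM Employees WHERE name = 'John'")
plan_narrow = query_plan("SELECT id, name FROM Employees WHERE name = 'John'")

print(plan_star)    # searches idx_name, then still has to read each table row
print(plan_narrow)  # reported as a COVERING INDEX: no table row lookup needed
```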

🧾 Quick Summary
| Concept | Description | Use Case |
|---|---|---|
| Indexing | Sorted lookup structure for fast search | Range and equality queries |
| Hashing | Direct address computation | Equality search |
| Query I/O | Disk operations during query execution | Performance tuning |

πŸ’‘ Takeaway:
Indexes and hashing make searches lightning-fast ⚑, while efficient I/O management keeps your queries scalable and optimized. Together, they’re the core of any high-performance DBMS.
