What is Database Indexing by Hash Indexing approach?

Hash indexing is a database indexing technique that uses a hash function to compute an index value for each record in a table. The index value is used to locate the record quickly when a search is performed. In this answer, we will explain the hash indexing approach and provide an example of how it works.

Under the Hood Structure

The hash index structure consists of an array of buckets, each of which contains a linked list of records that have the same index value. The hash function takes the key value of a record and computes an index value, which is used to locate the bucket that contains the linked list of records with the same index value. When a record is inserted into the table, its key is hashed to determine the bucket where it should be stored. If there are no other records in the bucket with the same hash value, the record is inserted at the head of the linked list. If there are other records in the bucket with the same hash value, the new record is inserted at the end of the list.

Example

Suppose we have a table of employees with the following attributes:

EmployeeID
FirstName
LastName
Department
Salary

We want to create a hash index on the EmployeeID attribute. We start by choosing a hash function that takes an EmployeeID value and generates a unique hash code for that value. For example, we might use the following hash function:

hash(EmployeeID) = EmployeeID % 10

This function takes the EmployeeID value and returns the remainder when it is divided by 10. This gives us a hash code in the range 0-9.

Next, we create an array of 10 buckets to hold the records:

Bucket[0] -> null
Bucket[1] -> null
...
Bucket[9] -> null

When a record is inserted into the table, its EmployeeID value is hashed using the hash function to determine the bucket where it should be stored. For example, if the EmployeeID is 123456, the hash code would be 6 (hash(123456) = 123456 % 10 = 6), so the record would be stored in bucket 6.

If there are no other records in bucket 6, the new record is inserted at the head of the linked list:

Bucket[0] -> null
Bucket[1] -> null
...
Bucket[5] -> null
Bucket[6] -> [123456, John, Smith, Sales, 50000, null]
Bucket[7] -> null
...
Bucket[9] -> null

If there are other records in bucket 6, the new record is inserted at the end of the linked list:

Bucket[0] -> null
Bucket[1] -> null
...
Bucket[5] -> null
Bucket[6] -> [123456, John, Smith, Sales, 50000, null] -> [789012, Jane, Doe, Marketing, 60000, null]
Bucket[7] -> null
...
Bucket[9] -> null

To retrieve a record from the hash table, the EmployeeID value is hashed using the hash function to determine the bucket where the record should be stored. For example, if we want to retrieve the record with EmployeeID 123456, we hash the value to get a hash code of 6, then traverse the linked list in bucket 6 until we find the record with the matching EmployeeID:

hash(123456) = 6

Process

When a search is performed on a table that has a hash index, the search key is passed to the hash function, which computes an index value. The index value is used to locate the bucket that contains the linked list of records with the same index value. The linked list is then searched sequentially until the record with the matching key value is found.

Design

The design of a hash index depends on several factors, including the size of the table, the expected number of records, and the characteristics of the data.

The number of buckets is chosen to balance memory usage and search performance. Collisions occur when two or more records have the same index value, and are handled using chaining or open addressing.