DEV Community

Cover image for Understanding Hash Indexing in PostgreSQL: Hash (Part 3)
hammadsaleemm
hammadsaleemm

Posted on

Understanding Hash Indexing in PostgreSQL: Hash (Part 3)

The first article described PostgreSQL indexing engine, the second one dealt with the interface of access methods, and now we are ready to discuss specific types of indexes. Let's start with hash index.

When it comes to database indexing, PostgreSQL provides several options, one of which is hash indexing. In this article, we will discuss the general theory behind hash indexing, its structure, and how it works in PostgreSQL.

What is Hash Indexing?

A hash index is a type of database indexing technique that maps keys to values using a hash function. It is similar to a regular array, but instead of being indexed by an integer, it is indexed by a hash value. Hashing is a way to associate a small number with a value of any data type. In PostgreSQL, hash indexing is used for fast data access where keys are relatively small and the data set is static.

Hash Index Structure

The hash index in PostgreSQL is structured in the following way:

Meta Page: This is the first page in the hash index, containing information about what is inside the index.
Bucket Pages: The main pages of the index, storing data as "hash code - TID" pairs.
Overflow Pages: Structured the same way as bucket pages and used when one page is insufficient for a bucket.
Bitmap Pages: These pages keep track of overflow pages that are currently clear and can be reused for other buckets.

How Hash Indexing Works in PostgreSQL:

When inserting into the hash index, the hash function is computed for the key, which always returns the "integer" type in PostgreSQL, which is in the range of 232 ≈ 4 billion values. The number of buckets initially equals two and dynamically increases to adjust to the data size. The bucket number is computed from the hash code using bit arithmetic, and the TID is stored in the appropriate bucket.

When searching the hash index, the hash function is computed for the key, and the bucket number is obtained. The contents of the bucket are then searched for the matching TIDs with the appropriate hash codes. Since stored "hash code - TID" pairs are ordered, this is done efficiently.

However, collisions can occur where different keys have the same four-byte hash codes. In such cases, the access method asks the general indexing engine to verify each TID by rechecking the condition in the table row.

Mapping Data Structures to Pages

From the perspective of the buffer cache manager, all information and index rows must be packed into pages. In PostgreSQL hash indexing, four kinds of pages are used: meta pages, bucket pages, overflow pages, and bitmap pages.

When the index increases, PostgreSQL creates twice as many buckets and pages as were last created. Overflow pages are allocated as needed and are tracked in bitmap pages, which are also allocated as needed.

Conclusion

In conclusion, hash indexing is a useful technique for fast data access in static data sets with relatively small keys. PostgreSQL's hash indexing uses a hash function to map keys to values, and the data is stored in buckets. While collisions can occur, the index structure is designed to handle them efficiently. Understanding the structure and workings of hash indexing in PostgreSQL can help optimize database performance.

Top comments (0)