Cuckoo Hashing: An Efficient Collision Resolution Technique with O(1) Lookup Time

#systemdesign #hashing #software #programmers

Overview

Cuckoo Hashing is an advanced technique used to resolve hash collisions efficiently. Unlike traditional hashing methods, which may degrade to O(n) in the worst case, Cuckoo Hashing ensures a worst-case time complexity of O(1) for lookups, insertions, and deletions.

Concept of Cuckoo Hashing

Cuckoo Hashing employs a "kick-out" strategy when a collision occurs during insertion. It utilizes two separate hash tables (h1 and h2) and two hash functions (f1(x) and f2(x)). When inserting a data element d1 into h1, if a collision occurs, the existing element in h1 is displaced (kicked out) and moved to h2 based on the second hash function. This process repeats between h1 and h2 until an empty position is found.

To prevent infinite loops during insertion, a predefined constant (max_hash_loop) is used to limit reinsertions. If this limit is reached, the hash tables are resized, and a rehashing operation is performed to accommodate new elements while maintaining O(1) insertion time complexity.

Key Features:

Utilizes two hash tables for efficient collision resolution.
Employs two distinct hash functions to determine data placement.
Supports fast insertion, search, and deletion operations.
Implements a rehashing mechanism to handle infinite insertion loops.

Implementation of Cuckoo Hashing in C++

#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>

using namespace std;

#define TABLE_SIZE 11  // Initial hash table size
#define MAX_REHASH 10  // Maximum reinsertion attempts before rehashing

class CuckooHashTable {
private:
    vector<int> table1, table2;
    vector<bool> occupied1, occupied2;

    int hash1(int key) { return key % TABLE_SIZE; }
    int hash2(int key) { return (key / TABLE_SIZE) % TABLE_SIZE; }

    void rehash() {
        cout << "Rehashing required!" << endl;
        vector<int> oldTable1 = table1, oldTable2 = table2;
        vector<bool> oldOccupied1 = occupied1, oldOccupied2 = occupied2;

        int newSize = TABLE_SIZE * 2;
        table1.assign(newSize, -1);
        table2.assign(newSize, -1);
        occupied1.assign(newSize, false);
        occupied2.assign(newSize, false);

        for (int i = 0; i < oldTable1.size(); i++) {
            if (oldOccupied1[i]) insert(oldTable1[i]);
            if (oldOccupied2[i]) insert(oldTable2[i]);
        }
    }

public:
    CuckooHashTable() {
        table1.assign(TABLE_SIZE, -1);
        table2.assign(TABLE_SIZE, -1);
        occupied1.assign(TABLE_SIZE, false);
        occupied2.assign(TABLE_SIZE, false);
    }

    void insert(int key) {
        int pos1 = hash1(key);
        int pos2 = hash2(key);
        int loopCounter = 0;
        int currKey = key;

        while (loopCounter < MAX_REHASH) {
            if (!occupied1[pos1]) {
                table1[pos1] = currKey;
                occupied1[pos1] = true;
                return;
            }
            swap(currKey, table1[pos1]);

            if (!occupied2[pos2]) {
                table2[pos2] = currKey;
                occupied2[pos2] = true;
                return;
            }
            swap(currKey, table2[pos2]);

            pos1 = hash1(currKey);
            pos2 = hash2(currKey);
            loopCounter++;
        }

        rehash();
        insert(currKey);
    }

    bool search(int key) {
        return (occupied1[hash1(key)] && table1[hash1(key)] == key) ||
               (occupied2[hash2(key)] && table2[hash2(key)] == key);
    }

    void remove(int key) {
        if (occupied1[hash1(key)] && table1[hash1(key)] == key) {
            occupied1[hash1(key)] = false;
            table1[hash1(key)] = -1;
            return;
        }
        if (occupied2[hash2(key)] && table2[hash2(key)] == key) {
            occupied2[hash2(key)] = false;
            table2[hash2(key)] = -1;
            return;
        }
        cout << "Key not found!" << endl;
    }

    void display() {
        cout << "Table 1: ";
        for (int i = 0; i < TABLE_SIZE; i++)
            cout << (occupied1[i] ? to_string(table1[i]) : "-") << " ";
        cout << "\nTable 2: ";
        for (int i = 0; i < TABLE_SIZE; i++)
            cout << (occupied2[i] ? to_string(table2[i]) : "-") << " ";
        cout << "\n";
    }
};

int main() {
    CuckooHashTable hashTable;
    hashTable.insert(10);
    hashTable.insert(20);
    hashTable.insert(30);
    hashTable.insert(25);
    hashTable.insert(35);
    hashTable.insert(40);
    hashTable.insert(50);

    hashTable.display();

    cout << "Searching 25: " << (hashTable.search(25) ? "Found" : "Not Found") << endl;
    cout << "Searching 100: " << (hashTable.search(100) ? "Found" : "Not Found") << endl;

    hashTable.remove(30);
    hashTable.display();
    return 0;
}

Explanation:

Data Structure Overview:

Two separate hash tables (table1 and table2).
Boolean vectors (occupied1 and occupied2) track occupied slots.

Hash Functions:

hash1(key) = key % TABLE_SIZE
hash2(key) = (key / TABLE_SIZE) % TABLE_SIZE

Insertion Process:

Try inserting in table1. If occupied, swap with the existing element.
Try inserting the displaced element in table2. If occupied, continue swapping.
If a reinsertion loop is detected (MAX_REHASH reached), rehashing is triggered.

Search and Deletion:

Searching checks both tables.
Deletion marks the slot as unoccupied.

Conclusion

Cuckoo Hashing provides an efficient solution for resolving hash collisions with a guaranteed worst-case O(1) lookup time. By using two hash tables and a kick-out mechanism, it ensures high performance and avoids long probing sequences. The rehashing mechanism allows dynamic resizing when insertion loops occur, maintaining efficiency even in dynamic workloads. This method is particularly useful in applications where quick access time is critical, such as caches, networking applications, and real-time systems.

References:

https://en.wikipedia.org/wiki/Cuckoo_hashing

Top comments (1)

Ravindra Kumar • Feb 25 '25

Good !