Hash tables are one of the most important data structures in computer science. They power everything from PHP's associative arrays to database indexes, and understanding how they work will make you a better developer. Let's dive deep! 🚀
Table of Contents
- What is a Hash Table?
- How Hash Tables Work
- Hash Functions
- Collision Resolution
- Load Factor and Amortized Analysis
- Complete PHP Implementation
- Time Complexity
- Real-World Applications
What is a Hash Table?
A hash table (also called hash map) is a data structure that implements an associative array - a structure that maps keys to values. Think of it as a super-fast lookup table.
Key characteristics:
- Average O(1) time complexity for insert, delete, and search
- Uses a hash function to compute an index into an array of buckets
- Handles collisions when multiple keys hash to the same index
<?php
// PHP's built-in associative array IS a hash table!
$userAges = [
"Alice" => 25,
"Bob" => 30,
"Charlie" => 35
];
echo $userAges["Alice"]; // 25 - O(1) lookup!
?>
How Hash Tables Work
The magic of hash tables comes from hashing - transforming a key into an array index.
The Process
1. Key → Hash Function → Hash Code → Index
2. Store/Retrieve value at that index
Visual Example
Key: "Alice"
Hash Function: hash("Alice") = 2458734
Index: 2458734 % 10 = 4
Array:
Index: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Value: - - - - "Alice" - - - - -
Hash Functions
A good hash function should:
- Be deterministic - same input always produces same output
- Distribute uniformly - minimize collisions
- Be fast - O(1) computation time
- Minimize collisions - different inputs should produce different outputs
Simple Hash Function Example
<?php
function simpleHash($key, $tableSize) {
$hashValue = 0;
$len = strlen($key);
for ($i = 0; $i < $len; $i++) {
$hashValue += ord($key[$i]);
}
return $hashValue % $tableSize;
}
// Example
echo simpleHash("Alice", 10) . "\n"; // Output: 5
echo simpleHash("Bob", 10) . "\n"; // Output: 5 # Collision!
?>
Better Hash Function (Polynomial Rolling Hash)
<?php
function betterHash($key, $tableSize) {
$hashValue = 0;
$len = strlen($key);
for ($i = 0; $i < $len; $i++) {
$hashValue += ord($key[$i]) * pow(31, $i);
}
return abs($hashValue) % $tableSize;
}
echo betterHash("Alice", 10) . "\n"; // Different results
echo betterHash("Bob", 10) . "\n"; // Less likely to collide
?>
PHP's Built-in Hash Functions
<?php
// PHP has several built-in hash functions
echo crc32("Alice") . "\n"; // Fast, but collisions possible
echo md5("Alice") . "\n"; // 128-bit hash
echo sha1("Alice") . "\n"; // 160-bit hash
echo hash('sha256', "Alice"); // 256-bit hash
?>
Collision Resolution
No matter how good your hash function is, collisions will happen when two keys hash to the same index. There are two main strategies to handle this:
1. Chaining (Separate Chaining)
Each bucket contains an array (or linked list) of all entries that hash to that index.
Index Bucket (Array)
0 → []
1 → [["Alice", 25], ["Zoe", 28]]
2 → [["Bob", 30]]
3 → []
4 → [["Charlie", 35], ["Dave", 22]]
Visual Representation:
Put("Alice", 25) → hash("Alice") = 1
Put("Bob", 30) → hash("Bob") = 2
Put("Zoe", 28) → hash("Zoe") = 1 ← Collision! Add to bucket 1
Pros:
- Simple to implement
- No limit on load factor (can exceed 1.0)
- Deletion is straightforward
- Performance degrades gracefully
Cons:
- Extra memory for arrays/linked lists
- Poor cache performance
- Overhead per entry
2. Open Addressing
All entries are stored directly in the array. When a collision occurs, probe for the next available slot.
Linear Probing
Check the next slot, then the next, then the next...
Insert "Alice" at index 4, but it's occupied:
Try: index 4 → occupied
index 5 → occupied
index 6 → empty ✓ (insert here)
Pros:
- Memory efficient (no extra pointers)
- Cache-friendly (contiguous memory)
- Better performance at low load factors
Cons:
- Clustering - long sequences of occupied slots
- Load factor must stay < 1.0 (typically < 0.7)
- Deletion is complex (need tombstones)
- Performance degrades sharply as table fills
Load Factor and Amortized Analysis
The load factor (α) is the ratio of entries to buckets:
α = n / m
where n = number of entries
m = number of buckets
Why Does HashMap Stay O(1)?
Hash tables resize when load factor exceeds a threshold (typically 0.75). But resizing costs O(n) to rehash all entries. How is it still O(1)?
The answer: Amortized Analysis 📊
The Math
Let's trace insertions with initial capacity 4:
Insertions 1-3: O(1) each, no resize
Insertion 4: Triggers resize (3/4 = 0.75)
Cost: O(4) to rehash → new size 8
Insertions 5-6: O(1) each
Insertion 7: Triggers resize (6/8 = 0.75)
Cost: O(8) to rehash → new size 16
Insertions 8-12: O(1) each
Insertion 13: Triggers resize (12/16 = 0.75)
Cost: O(16) to rehash → new size 32
Total cost for n insertions:
- Regular insertions: n × O(1) = O(n)
- Resize costs: O(4) + O(8) + O(16) + ... + O(n)
- This is a geometric series: O(2n) = O(n)
- Total: O(n) + O(2n) = O(3n) = O(n)
Average per operation: O(n) / n = O(1) ✨
Why It Works
- Resizes are rare - each element participates in at most log(n) resizes
- Gaps grow exponentially - after resizing to k, you get k/2 cheap insertions
- Cost is distributed - expensive resize is "paid for" by cheap operations
Complete PHP Implementation
Hash Table with Chaining
Let's build a production-ready hash table in PHP using chaining for collision resolution:
<?php
/**
* Hash Table implementation using Chaining (Separate Chaining)
*
* Features:
* - Dynamic resizing at 0.75 load factor
* - O(1) average time complexity
* - Supports any hashable key type
*/
class HashTable {
private array $buckets;
private int $capacity;
private int $size;
private const LOAD_FACTOR_THRESHOLD = 0.75;
public function __construct(int $initialCapacity = 16) {
$this->capacity = $initialCapacity;
$this->size = 0;
$this->buckets = array_fill(0, $this->capacity, []);
}
/**
* Hash function to compute index from key
*/
private function hash($key): int {
if (is_string($key)) {
$hashValue = 0;
$len = strlen($key);
for ($i = 0; $i < $len; $i++) {
$hashValue = ($hashValue * 31 + ord($key[$i]));
}
return abs($hashValue) % $this->capacity;
}
// For numeric or other types
return abs(crc32(serialize($key))) % $this->capacity;
}
/**
* Resize the hash table when load factor exceeds threshold
*/
private function resize(): void {
echo "Resizing from {$this->capacity} to " . ($this->capacity * 2) . "\n";
$oldBuckets = $this->buckets;
$this->capacity *= 2;
$this->buckets = array_fill(0, $this->capacity, []);
$this->size = 0;
// Rehash all entries
foreach ($oldBuckets as $bucket) {
foreach ($bucket as $entry) {
$this->put($entry['key'], $entry['value']);
}
}
}
/**
* Insert or update key-value pair
* Time Complexity: O(1) average, O(n) worst case
*/
public function put($key, $value): void {
// Check if resize is needed
if ($this->size / $this->capacity >= self::LOAD_FACTOR_THRESHOLD) {
$this->resize();
}
$index = $this->hash($key);
// Check if key already exists and update
foreach ($this->buckets[$index] as &$entry) {
if ($entry['key'] === $key) {
$entry['value'] = $value;
return;
}
}
// Key doesn't exist, add new entry
$this->buckets[$index][] = ['key' => $key, 'value' => $value];
$this->size++;
}
/**
* Get value by key
* Time Complexity: O(1) average, O(n) worst case
*/
public function get($key) {
$index = $this->hash($key);
foreach ($this->buckets[$index] as $entry) {
if ($entry['key'] === $key) {
return $entry['value'];
}
}
return null; // Key not found
}
/**
* Remove key-value pair
* Time Complexity: O(1) average, O(n) worst case
*/
public function remove($key) {
$index = $this->hash($key);
foreach ($this->buckets[$index] as $i => $entry) {
if ($entry['key'] === $key) {
$value = $entry['value'];
array_splice($this->buckets[$index], $i, 1);
$this->size--;
return $value;
}
}
return null; // Key not found
}
/**
* Check if key exists
*/
public function containsKey($key): bool {
return $this->get($key) !== null;
}
/**
* Get all keys
*/
public function keys(): array {
$keys = [];
foreach ($this->buckets as $bucket) {
foreach ($bucket as $entry) {
$keys[] = $entry['key'];
}
}
return $keys;
}
/**
* Get all values
*/
public function values(): array {
$values = [];
foreach ($this->buckets as $bucket) {
foreach ($bucket as $entry) {
$values[] = $entry['value'];
}
}
return $values;
}
/**
* Get all entries as associative array
*/
public function toArray(): array {
$result = [];
foreach ($this->buckets as $bucket) {
foreach ($bucket as $entry) {
$result[$entry['key']] = $entry['value'];
}
}
return $result;
}
/**
* Get size
*/
public function size(): int {
return $this->size;
}
/**
* Check if empty
*/
public function isEmpty(): bool {
return $this->size === 0;
}
/**
* Get load factor
*/
public function getLoadFactor(): float {
return $this->size / $this->capacity;
}
/**
* Display hash table structure (for debugging)
*/
public function displayStructure(): void {
echo "\n=== Hash Table Structure ===\n";
for ($i = 0; $i < $this->capacity; $i++) {
echo "Bucket $i: ";
if (empty($this->buckets[$i])) {
echo "empty";
} else {
$entries = [];
foreach ($this->buckets[$i] as $entry) {
$entries[] = "{$entry['key']}=>{$entry['value']}";
}
echo "[" . implode(", ", $entries) . "]";
}
echo "\n";
}
echo "\n";
}
public function __toString(): string {
$items = [];
foreach ($this->buckets as $bucket) {
foreach ($bucket as $entry) {
$items[] = "'{$entry['key']}' => '{$entry['value']}'";
}
}
return "{" . implode(", ", $items) . "}";
}
}
// ==================== DEMONSTRATION ====================
echo "=== Hash Table with Chaining Demo ===\n\n";
// Create hash table
$ht = new HashTable(4);
// Insert key-value pairs
echo "Inserting key-value pairs...\n";
$ht->put("Alice", 25);
$ht->put("Bob", 30);
$ht->put("Charlie", 35);
$ht->put("David", 28);
$ht->put("Eve", 32);
$ht->put("Frank", 27);
echo "\nHash Table: " . $ht . "\n";
echo "Size: " . $ht->size() . "\n";
echo "Load Factor: " . number_format($ht->getLoadFactor(), 2) . "\n";
$ht->displayStructure();
// Get values
echo "--- Getting Values ---\n";
echo "Alice's age: " . $ht->get("Alice") . "\n";
echo "Bob's age: " . $ht->get("Bob") . "\n";
// Update value
echo "\n--- Updating Value ---\n";
$ht->put("Alice", 26);
echo "Alice's new age: " . $ht->get("Alice") . "\n";
// Check existence
echo "\n--- Checking Existence ---\n";
echo "Contains 'Charlie': " . ($ht->containsKey("Charlie") ? "true" : "false") . "\n";
echo "Contains 'Zoe': " . ($ht->containsKey("Zoe") ? "true" : "false") . "\n";
// Remove entry
echo "\n--- Removing Entry ---\n";
$removed = $ht->remove("David");
echo "Removed David (age $removed)\n";
echo "Hash Table: " . $ht . "\n";
// Get all keys and values
echo "\n--- All Keys and Values ---\n";
echo "Keys: [" . implode(", ", $ht->keys()) . "]\n";
echo "Values: [" . implode(", ", $ht->values()) . "]\n";
// Convert to array
echo "\n--- Convert to Array ---\n";
print_r($ht->toArray());
?>
Hash Table with Open Addressing
Now let's implement open addressing with linear probing:
<?php
/**
* Hash Table implementation using Open Addressing (Linear Probing)
*
* Features:
* - Linear probing for collision resolution
* - Tombstone markers for deletion
* - Dynamic resizing at 0.7 load factor
* - Memory efficient
*/
class HashTableOpenAddressing {
private array $keys;
private array $values;
private array $deleted;
private int $capacity;
private int $size;
private const LOAD_FACTOR_THRESHOLD = 0.7;
public function __construct(int $initialCapacity = 16) {
$this->capacity = $initialCapacity;
$this->size = 0;
$this->keys = array_fill(0, $this->capacity, null);
$this->values = array_fill(0, $this->capacity, null);
$this->deleted = array_fill(0, $this->capacity, false);
}
/**
* Hash function
*/
private function hash($key): int {
if (is_string($key)) {
$hashValue = 0;
$len = strlen($key);
for ($i = 0; $i < $len; $i++) {
$hashValue = ($hashValue * 31 + ord($key[$i]));
}
return abs($hashValue) % $this->capacity;
}
return abs(crc32(serialize($key))) % $this->capacity;
}
/**
* Linear probing: try next slot
*/
private function probe(int $index): int {
return ($index + 1) % $this->capacity;
}
/**
* Resize the hash table
*/
private function resize(): void {
echo "Resizing from {$this->capacity} to " . ($this->capacity * 2) . "\n";
$oldKeys = $this->keys;
$oldValues = $this->values;
$oldDeleted = $this->deleted;
$oldCapacity = $this->capacity;
$this->capacity *= 2;
$this->keys = array_fill(0, $this->capacity, null);
$this->values = array_fill(0, $this->capacity, null);
$this->deleted = array_fill(0, $this->capacity, false);
$this->size = 0;
// Rehash all non-deleted entries
for ($i = 0; $i < $oldCapacity; $i++) {
if ($oldKeys[$i] !== null && !$oldDeleted[$i]) {
$this->put($oldKeys[$i], $oldValues[$i]);
}
}
}
/**
* Insert or update key-value pair
*/
public function put($key, $value): void {
// Check if resize is needed
if ($this->size / $this->capacity >= self::LOAD_FACTOR_THRESHOLD) {
$this->resize();
}
$index = $this->hash($key);
$startIndex = $index;
// Linear probing
while ($this->keys[$index] !== null) {
// Update if key exists
if ($this->keys[$index] === $key) {
$this->values[$index] = $value;
$this->deleted[$index] = false;
return;
}
// If slot is deleted, we can reuse it
if ($this->deleted[$index]) {
break;
}
$index = $this->probe($index);
// Table is full
if ($index === $startIndex) {
throw new Exception("Hash table is full");
}
}
// Insert new entry
$this->keys[$index] = $key;
$this->values[$index] = $value;
$this->deleted[$index] = false;
$this->size++;
}
/**
* Get value by key
*/
public function get($key) {
$index = $this->hash($key);
$startIndex = $index;
while ($this->keys[$index] !== null || $this->deleted[$index]) {
if ($this->keys[$index] === $key && !$this->deleted[$index]) {
return $this->values[$index];
}
$index = $this->probe($index);
if ($index === $startIndex) {
break;
}
}
return null; // Key not found
}
/**
* Remove key-value pair using tombstone
*/
public function remove($key) {
$index = $this->hash($key);
$startIndex = $index;
while ($this->keys[$index] !== null || $this->deleted[$index]) {
if ($this->keys[$index] === $key && !$this->deleted[$index]) {
$value = $this->values[$index];
$this->deleted[$index] = true; // Mark as deleted (tombstone)
$this->size--;
return $value;
}
$index = $this->probe($index);
if ($index === $startIndex) {
break;
}
}
return null; // Key not found
}
/**
* Check if key exists
*/
public function containsKey($key): bool {
return $this->get($key) !== null;
}
/**
* Get size
*/
public function size(): int {
return $this->size;
}
/**
* Check if empty
*/
public function isEmpty(): bool {
return $this->size === 0;
}
/**
* Get load factor
*/
public function getLoadFactor(): float {
return $this->size / $this->capacity;
}
/**
* Display hash table structure (for debugging)
*/
public function displayStructure(): void {
echo "\n=== Hash Table Structure ===\n";
for ($i = 0; $i < $this->capacity; $i++) {
echo "Index $i: ";
if ($this->keys[$i] === null) {
echo "null";
} elseif ($this->deleted[$i]) {
echo "[DELETED] {$this->keys[$i]}=>{$this->values[$i]}";
} else {
echo "{$this->keys[$i]}=>{$this->values[$i]}";
}
echo "\n";
}
echo "\n";
}
public function __toString(): string {
$items = [];
for ($i = 0; $i < $this->capacity; $i++) {
if ($this->keys[$i] !== null && !$this->deleted[$i]) {
$items[] = "'{$this->keys[$i]}' => '{$this->values[$i]}'";
}
}
return "{" . implode(", ", $items) . "}";
}
}
// ==================== DEMONSTRATION ====================
echo "\n\n=== Hash Table with Open Addressing Demo ===\n\n";
// Create hash table
$ht = new HashTableOpenAddressing(8);
// Insert key-value pairs
echo "Inserting key-value pairs...\n";
$ht->put("Alice", 25);
$ht->put("Bob", 30);
$ht->put("Charlie", 35);
$ht->put("David", 28);
echo "\nHash Table: " . $ht . "\n";
echo "Size: " . $ht->size() . "\n";
echo "Load Factor: " . number_format($ht->getLoadFactor(), 2) . "\n";
$ht->displayStructure();
// Get values
echo "--- Getting Values ---\n";
echo "Alice's age: " . $ht->get("Alice") . "\n";
echo "Bob's age: " . $ht->get("Bob") . "\n";
// Update value
echo "\n--- Updating Value ---\n";
$ht->put("Alice", 26);
echo "Alice's new age: " . $ht->get("Alice") . "\n";
// Remove entry
echo "\n--- Removing Entry ---\n";
$removed = $ht->remove("Bob");
echo "Removed Bob (age $removed)\n";
echo "Hash Table: " . $ht . "\n";
$ht->displayStructure();
// Add more entries to see tombstones
echo "--- Adding More Entries ---\n";
$ht->put("Eve", 32);
$ht->put("Frank", 27);
$ht->displayStructure();
// Test resizing
echo "--- Testing Resizing ---\n";
$htSmall = new HashTableOpenAddressing(4);
for ($i = 0; $i < 8; $i++) {
$htSmall->put("key$i", $i * 10);
}
echo "Final size: " . $htSmall->size() . "\n";
?>
Time Complexity
Average Case (with good hash function)
| Operation | Time Complexity |
|---|---|
| Insert | O(1) amortized |
| Search | O(1) |
| Delete | O(1) |
| Space | O(n) |
Worst Case (all keys hash to same bucket)
| Operation | Chaining | Open Addressing |
|---|---|---|
| Insert | O(n) | O(n) |
| Search | O(n) | O(n) |
| Delete | O(n) | O(n) |
Comparison: Chaining vs Open Addressing
| Factor | Chaining | Open Addressing |
|---|---|---|
| Load Factor | Can exceed 1.0 | Must stay < 1.0 (< 0.7) |
| Memory | Extra overhead for arrays | Compact |
| Cache | Poor | Good |
| Deletion | Easy | Complex (tombstones) |
| Clustering | None | Severe |
| Implementation | Simple | Moderate |
| Degradation | Graceful | Sharp |
Real-World Applications
Hash tables are everywhere! Here are some common use cases:
1. Database Indexing
<?php
// Fast user lookup by ID
$userCache = new HashTable();
$userCache->put(12345, ['name' => 'Alice', 'email' => 'alice@example.com']);
$userCache->put(67890, ['name' => 'Bob', 'email' => 'bob@example.com']);
$user = $userCache->get(12345); // O(1) lookup
?>
2. Caching
<?php
class Cache {
private HashTable $cache;
public function __construct() {
$this->cache = new HashTable();
}
public function get($url) {
return $this->cache->get($url);
}
public function set($url, $data) {
$this->cache->put($url, $data);
}
}
$cache = new Cache();
$cache->set("https://api.example.com/users", $apiResponse);
$data = $cache->get("https://api.example.com/users"); // Fast!
?>
3. Counting Frequencies
<?php
function wordFrequency($text) {
$freq = new HashTable();
$words = explode(' ', strtolower($text));
foreach ($words as $word) {
$count = $freq->get($word) ?? 0;
$freq->put($word, $count + 1);
}
return $freq;
}
$text = "hello world hello php world";
$freq = wordFrequency($text);
echo "Frequency of 'hello': " . $freq->get('hello'); // 2
?>
4. Session Management
<?php
class SessionManager {
private HashTable $sessions;
public function __construct() {
$this->sessions = new HashTable();
}
public function createSession($userId) {
$sessionId = bin2hex(random_bytes(16));
$this->sessions->put($sessionId, [
'userId' => $userId,
'createdAt' => time()
]);
return $sessionId;
}
public function getSession($sessionId) {
return $this->sessions->get($sessionId);
}
}
?>
5. Two Sum Problem (LeetCode)
<?php
function twoSum($nums, $target) {
$ht = new HashTable();
foreach ($nums as $i => $num) {
$complement = $target - $num;
if ($ht->containsKey($complement)) {
return [$ht->get($complement), $i];
}
$ht->put($num, $i);
}
return null;
}
$result = twoSum([2, 7, 11, 15], 9); // [0, 1]
?>
6. Duplicate Detection
<?php
function hasDuplicate($array) {
$seen = new HashTable();
foreach ($array as $item) {
if ($seen->containsKey($item)) {
return true;
}
$seen->put($item, true);
}
return false;
}
echo hasDuplicate([1, 2, 3, 4, 5]); // false
echo hasDuplicate([1, 2, 3, 2, 5]); // true
?>
PHP's Built-in Associative Arrays
PHP's associative arrays ARE hash tables! Here's how they compare:
<?php
// PHP's built-in (uses hash table internally)
$builtin = [
"Alice" => 25,
"Bob" => 30
];
// Our implementation
$custom = new HashTable();
$custom->put("Alice", 25);
$custom->put("Bob", 30);
// Both are O(1) for lookups!
echo $builtin["Alice"]; // 25
echo $custom->get("Alice"); // 25
?>
When to use custom implementation:
- Learning purposes
- Need specific hash function
- Custom collision resolution strategy
- Memory optimization
- Performance profiling
When to use PHP's arrays:
- Production code (99% of cases)
- Already optimized in C
- Supports many built-in functions
- Battle-tested and reliable
Performance Tips
- Choose appropriate initial capacity - avoid frequent resizing
$ht = new HashTable(1000); // If you know you'll store ~750 items
- Monitor load factor - keep it under 0.75 for chaining, 0.7 for open addressing
if ($ht->getLoadFactor() > 0.8) {
// Consider manual resize or optimization
}
- Use good hash functions - PHP's built-in functions are usually best
// Good
$index = abs(crc32($key)) % $capacity;
// Better for strings
$index = abs(hash('fnv1a64', $key)) % $capacity;
- Cache hash values for frequently accessed keys
class CachedHashTable extends HashTable {
private array $hashCache = [];
private function hash($key): int {
if (!isset($this->hashCache[$key])) {
$this->hashCache[$key] = parent::hash($key);
}
return $this->hashCache[$key];
}
}
Conclusion
Hash tables are fundamental data structures that provide:
- ✅ O(1) average time complexity
- ✅ Flexible key-value storage
- ✅ Efficient memory usage
- ✅ Wide range of applications
Key Takeaways:
- Hash tables use a hash function to map keys to array indices
- Collisions are handled by chaining or open addressing
- Load factor determines when to resize (typically 0.75)
- Amortized analysis proves O(1) average performance
- PHP's associative arrays are built on hash tables
Now you understand how hash tables work under the hood! Use this knowledge to write better code and ace those technical interviews. 🚀
Further Reading
Did you find this helpful? Drop a ❤️ and let me know what data structure you'd like to see next!
Connect with me: linkedin
Happy coding! 💻✨
Top comments (0)