Summary of Leetcode explore card.

organizes data using hash functions in order to support quick insertion and search

Two different kinds:

Hash set

set data structure - no repeated values.Hash map

map data structure - store (key,value) pairs

# Hash Function

maps a big number or string to a small integer that can be used as index in hash table.

# Concepts

Further reading on concepts here & for java specific info

## Load Factor

`number of of elements stored in table / number of buckets`

average number of key-value pairs per bucket.

It offers

tradeoff between time and space complexity

Load factor of Java `Hashmap`

is `0.75`

## Capacity / Initial Capacity

`number of buckets`

initial capaity of Java `HashMap`

is `16`

## Threshold

`Current Capacity * Load Factor`

Capacity of Hashmap is doubled each time it reaches threshold

In java, Hashmap created with initial capacity 16 & load factor of 0.75. Therefore threshold is `16 * 0.75 = 12`

.

So, capacity of HashMap increased from 16 to 32 after 12th element added to HashMap.

## Rehashing

Hashing again. (eg. when capacity is increased)

When new HashMap object with new capacity is created, all old elements (key-value pairs) are placed into new object after recalculating their hashcode.

Initial capacity and load factor must be chosen such that they minimize the number of rehashing operations.

__ Example __

Load factor is 1.0f. (Rehashing take place after filling 100% of current capacity)

- saves space
- increases retrieval time of existing elements

Load factor is 0.5f.

- rehashing after 50% of current capacity filled.
- increase number of rehashing operations.

# Hash Table

use hash function to map keys to buckets

- insert new key
- hash function decides which bucket key is assigned
- key is stored in corresponding bucket

- search for key
- hash table uses hash function to find bucket
- search in bucket for key

eg. `y = x % 5`

is the hashing function

1987 & 2 are in same bucket. (bucket 2)

use hashing function, search bucket 2 for the key.

__Two basic operations__

- Insert
- Search

## Designing a Hash Table

### 1. Hash Function

assign key to bucket

`as uniformly as possible`

Ideally, key and bucket is a one-to-one mapping. However, there is always a trade off between `amount of buckets`

& `capacity of buckets`

.

Good Hash function must:

- efficiently computable
- uniformly distribute keys

### 2. Collision Reduction

collision is when > 1 key is assigned to 1 bucket

Ideally, hash function is one-to-one mapping for key & bucket.

Considerations

- How to organize values in bucket?
- What is too many values are assigned to one bucket?
- How to search for target value in bucket?

Must consider:

- capacity of bucket
- number of keys that might be mapped to the same bucket

given `N`

=number of keys in bucket

- N: small & constant -->
`array`

to store keys`O(N)`

- N: variable / large -->
`height balanced binary search tree`

to store keys`O(log(n))`

# Handling Collisions

situation where newly inserted key maps to an already occupied slot in hash table.

Since a hash function produces a small number for a key which is a big integer or string. There is a possibility 2 keys result in same value. (eg. Birthday Paradox, need 23 people for 50% chance of sharing same birthday)

## 1. Separate Chaining

each cell of hash table point to a linked list of records that have same hash function value.

requires additional memory outside table.

### Data Structures for Separate Chaining

- Linked List
- ArrayList
- Trees

### Separate Chaining - Complexity

Complexity depends on complexity of underlying data structure supporting Separate Chaining

#### Average

Average assumes uniform hashing

more details here

```
lf = load factor
n = number of elements
m = number of buckets
α = n / m
```

#### unsuccessful

`O(α)`

For unsuccessful find, linked list in separate chaining must be exhaustively searched. Average length of linked list in uniform hashing is load factor

#### successful

`O(1+(α/2))`

In successful find, linked list will definitely contain target key. On average, target key can be found in first half of linked list.

#### Worst

`O(N)`

#### Best

`O(1)`

## 2. Open Addressing

all elements stored in the hash table itself.

Since all elements stored in hash table, at any point size of table must be >= total number of keys.

Each table entry contains either a record or NIL. To search for element, examine table slots one by one.

__Insert(k)__

Keep probing until an empty slot is found & insert key

__Search(k)__

Keep probing until slot’s key == k or an empty slot is reached.

__Delete(k)__

Simply deleting a key may result in failures in search.

Slots of deleted keys are marked specially as “deleted”. Items can be inserted into deleted slot but search cannot stop at deleted stop.

### Open Addressing - Implementations

#### 1. Linear Probing

probe for the next spot

```
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
```

#### Problem - Primary clustering problem

tendency for a collision resolution scheme to create long runs of filled slots

nearthe hash position of keys.

blocks of occupied cells can form. The cluster can grow bigger and reduce performance of hashmap. (must iterate through entire cluster to find value)

#### 2. Quadratic Probing

Look for the next i

^{2}th slot in the ith iteration

```
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
```

#### Problem - Secondary Clustering

tendency to create long runs of filled slots

awayfrom hash position of keys

Still more efficient than primary clustering. It aims to *break up* primary clusters and probe more widely separated cells.

Primary and Secondary Clustering explained in more detail here

#### 3. Double Hashing

Use another hash function hash2(x) and look for i*hash2(x) slot in i

^{th}rotation.

```
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
```

#### Requirements

- second hash function can
**never evaluate to 0** - Ensure all cells can be probed

### Open Addressing - Complexity

More details here

```
m = Number of slots in the hash table
n = Number of keys to be inserted in the hash table
Load factor α = n/m ( < 1 )
```

#### Average - Unsuccessful

average analysis assume **uniform hashing**

__ Unsuccessful Search __: `O(1/(1 - α))`

__ insert __: `O(1/(1-α))`

Derivation

```
probability of successful search = p
Probability 1st probe successful = (m-n)/m = 1-α
Probability 2nd probe successful = (m-n)/(m-1) > 1-α
```

Therefore at every probe, probability is at least 1-α. So in worst case, for unsuccessful search minimum number of executions expected is `1/p = 1/(1-α)`

#### Average - Successful

`O((1/α) * log (1/(1-α))`

#### Best Case

`O(1)`

#### Worst Case

`O(N)`

## Open Addressing VS Chaining

Open Addressing: better cache performance

Chaining: less sensitive to functions

# Hash Set

Hashset is an implementation of `set`

data structures.

`set`

data structures stores **no repeated values**

no repeated values

Under the hood, Hashset is backed by java HashMap that maps all its keys to a single constant object

## Typical usage

check if value has appeared or not.

```
Set<Type> hashset = new HashSet<>();
for (Type key: keys) {
if (hashset.contains(key)) {
// there is duplicate in set
return true;
}
// no duplicat add key
hashset.add(key);
}
```

# Hash Table

hash functions are used to link key and values

implementation of `map`

data structure which stores data in `(key,value)`

pairs

build mapping relation between key & information

used when user needs more information than key.

## Under the Hood

Internally, it maintains an array of buckets.

When put(key,value) / get(key)

- hascode() method of key object is called to find the bucket index.
- If hashcode() > number of buckets, it is converted to a number <= number of buckets with formula
`index = hashCode(key) & (n-1).`

Bitwise AND operator is used so only lower bits of hashCode are considered.

__ Collision Resolution__

Java HashMap resolves collisions uses separate chaining (LinkedList)

Once hashCode of key object is calculated. Iterate through Linked List of bucket to find the value / put the value at end of linked list.

More detailed readings can be found on GeeksforGeeks (contains step by steps process / case studies.) For more brief summaries: DZone & javarevisited.

## Usages

```
// 1. initialize a hash map
Map<Integer, Integer> hashmap = new HashMap<>();
// 2. insert a new (key, value) pair
hashmap.putIfAbsent(0, 0);
hashmap.putIfAbsent(2, 3);
// 3. insert a new (key, value) pair or update the value of existed key
hashmap.put(1, 1);
hashmap.put(1, 2);
// 4. get the value of specific key
System.out.println("The value of key 1 is: " + hashmap.get(1));
// 5. delete a key
hashmap.remove(2);
// 6. check if a key is in the hash map
if (!hashmap.containsKey(2)) {
System.out.println("Key 2 is not in the hash map.");
}
// 7. get the size of the hash map
System.out.println("The size of hash map is: " + hashmap.size());
// 8. iterate the hash map
for (Map.Entry<Integer, Integer> entry : hashmap.entrySet()) {
System.out.print("(" + entry.getKey() + "," + entry.getValue() + ") ");
}
System.out.println("are in the hash map.");
// 9. clear the hash map
hashmap.clear();
// 10. check if the hash map is empty
if (hashmap.isEmpty()) {
System.out.println("hash map is empty now!");
}
```

### Get more information (other than key)

From array of indices, return

indicesof two numbers such that they add up to specific target

In this case, hashamp is needed, need to store the values and check that `target - current_value`

can be found in the hashmap.

#### Common problems

- array of integers, return list of indices where numbers add up to given target
- check if two strings are isomorphic. (replace 1 character with another can get from first string input to second string input)

### Aggregate by key

aggregate all information by key

The key to solving this category of problems is **deciding strategy when existing key is encountered** (eg. increment hashmap value by 1, replace original value with new value?)

#### Common problem

- find first non-repeating character in string and return its index, else return -1

## Deign Hash Key

designing the hash key is about `building a mapping relationship`

between original information & key used by hashmap.

`deciding what the hash key in hashmap should be`

Hash Key needs to guarantee

- All values belong to the same group will be mapped in the same group.
- Values which needed to be separated into different groups will not be mapped into the same group.

`hash function fulfills criteria 1 but NOT 2`

Mapping function should satisfy both.

### Example

- Anagrams HashMap to groups anagrams (eg. 'eat','ate' same group. 'act' different group) Hash Key mapping could be to use sorted string. 'eat' & 'ate' will be mapped to key 'aet'.

# more readings

https://home.cse.ust.hk/~dekai/2012H/lectures/l32_hash.pdf

http://courses.csail.mit.edu/6.006/fall11/lectures/lecture10.pdf

http://www.cs.toronto.edu/~ylzhang/csc263w15/slides/lec05-hashing.pdf

## Discussion (0)