Quick Intro into Bloom Filter

#systemdesign #datastructures #programming

Bloom filter is in-memory space efficient data structure for checking whether an object is in the set or not.

Difference between bloom filter and the usual hash table is that bloom filter use less memory and therefore has some drawbacks:

It cannot save the object itself / the pointer to the object, just some bit telling that tells the object already in the set or not
Bloom filter can return false positive result (but cannot return false negative)

Two components of bloom filter are:

Write data into bloom filter:

Put the object into k hash functions, the result will be k hash values in range [0, n)
for each hash value, set the element at A[hash value] to 1

Querying the bloom filter:

Put the object into k hash functions, the result will be k hash values in range [0, n)
for each hash value, get the value at A[hash value]
if all value is 1, then the object possibly already in set, else it is definitely not in set

Bloom filter can be used by applications that need efficient memory and can tolerate some false positive results. Some of the examples are:

Password Checking
- Some weak password are put inside the bloom filter, then use it to check the password provided by user. If the bloom filter returns true, the password is probably weak, ask the user to provide another password else we can confidently say that the password is strong.
Spellchecking
- Insert all valid English words into the bloom filter. When user type something, ask the bloom filter. If the bloom filter returns false, the word is definitely not valid else it is probably valid
Username checking
- When user register, after saving to the database, insert the username into bloom filter
- When another user register, check if the provided username in bloom filter or not
  - if yes, then it is probably already taken
  - else, it is definitely available
Speed up reading in LSM Tree Storage Engine
- in LSM Tree, the read involving checking multiple level of SSTable
- the idea is to maintain a bloom filter for each level of SSTable, before actually search the SSTable, first we look at the bloom filter
  - if false, then the key is definitely not in that level
  - else, it is probably in that level

DEV Community