Let's say you're building a feature to search for available usernames. You want to make sure that the username is not already taken.
GET /username?username=paul
You could check the database to see if the username is already taken, but that could be slow. Instead, you can use a bloom filter to check if the username is already taken without checking the database.
What is a bloom filter?
A bloom filter is a probabilistic data structure that can be used to test if an element is a member of a set. It is a space-efficient probabilistic data structure, that is, it uses space proportional to the number of elements inserted in the data structure, but it is not guaranteed to be 100% accurate.
The main advantage is that it is very fast. If the element is not in the set, then the bloom filter will always return false. If the element is in the set, then the bloom filter might return true, so you need to check the database to make sure. That reduces the number of database queries.
How does it work?
A bloom filter is a bit array of length n, where the value of each bit is initially 0. As new elements are added, they are hashed multiple times and the corresponding bits are set to 1.
When you want to check if an element is in the filter, you hash the element and check if the corresponding bits are set to 1. If they are, then the element is probably in the filter. If they are not, then the element is definitely not in the filter.
i.e. if you want to check if the username paul
is in the filter, you would hash paul
and check if the corresponding bits are set to 1. If they are, then the username is probably in the filter. If they are not, then the username is definitely not in the filter.
How to create a bloom filter
There are some packages that you can use to create a bloom filter. For example, bloom-filters is a JavaScript implementation of bloom filters.
Top comments (0)