DEV Community

Cover image for The Algorithm That Refuses to Give a Straight Answer (Just Like My Ex)
Developer-friendly
Developer-friendly

Posted on

The Algorithm That Refuses to Give a Straight Answer (Just Like My Ex)

Your Database Is Secretly Saying "I Don't Know" (And That's Genius)

Imagine you're trying to create an Instagram account.

You type:

Username: may_hua
Enter fullscreen mode Exit fullscreen mode

You click Check Availability.

One second later...

✅ Available!

Easy.

But have you ever wondered...

How did Instagram answer that so quickly?

Does it search through hundreds of millions of usernames every single time?

If yes...

I hope their database enjoys suffering.


The Obvious Solution

Let's pretend we're building our own social media app.

Our database looks like this.

alice
bob
charlie
david
emma
...
500,000,000 more usernames
Enter fullscreen mode Exit fullscreen mode

Someone types:

may_hua
Enter fullscreen mode Exit fullscreen mode

Our program asks the database,

"Hey... does this username exist?"

The database searches.

It finds the answer.

Done.

This works perfectly.

So...

Why am I writing a whole blog?

Because engineers are never satisfied.

Someone looked at this and said,

"Can we make it even faster?"

Classic engineer behavior.


The Real Problem

Imagine your website becomes famous.

(Delulu. ✨)

Now every second,

  • 20,000 people log in
  • 10,000 people sign up
  • thousands of people search for usernames

Every one of those requests asks the database.

Even though databases are incredibly fast, millions of unnecessary lookups still consume resources.

Then one engineer had a weird idea.

A very weird idea.


"What if we don't store the usernames?"

...

Wait.

What?

Isn't that the whole point?

Apparently not.

Welcome to the Bloom Filter.


Meet the Magic Light Board

Forget databases for a moment.

Imagine we have a board with only 20 light bulbs.

Initially they all look like this.

⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
Enter fullscreen mode Exit fullscreen mode

White means OFF.

That's all our Bloom Filter has.

No usernames.

No text.

No list.

Just lights.

You're probably thinking,

"How on earth is this supposed to remember 500 million usernames?"

Excellent question.

It doesn't.

At least... not directly.


The Magical Machine

Now imagine we own a magical machine.

You put a word into it.

apple
Enter fullscreen mode Exit fullscreen mode

The machine says

7
Enter fullscreen mode Exit fullscreen mode

You put another word.

banana
Enter fullscreen mode Exit fullscreen mode

It says

15
Enter fullscreen mode Exit fullscreen mode

Another.

orange
Enter fullscreen mode Exit fullscreen mode

It says

2
Enter fullscreen mode Exit fullscreen mode

This magical machine is called a hash function.

It simply converts any text into a number.

Don't worry about how it works today.

Just think of it as a machine that loves turning words into numbers.


Let's Add Our First Username

Suppose someone creates an account.

alice
Enter fullscreen mode Exit fullscreen mode

Our magical machine says

Turn on light #7.
Enter fullscreen mode Exit fullscreen mode

Now our board becomes

⚪ ⚪ ⚪ ⚪ ⚪ ⚪ 🟢 ⚪ ⚪ ⚪
⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
Enter fullscreen mode Exit fullscreen mode

That's it.

We don't store

alice
Enter fullscreen mode Exit fullscreen mode

We only turned on one light.

Strange.

Very strange.


But One Light Isn't Enough

Imagine this happens.

alice

↓

7
Enter fullscreen mode Exit fullscreen mode

Then...

banana

↓

7
Enter fullscreen mode Exit fullscreen mode

Uh oh.

Now both usernames point to the same light.

Who turned it on?

Alice?

Banana?

The light refuses to answer.

This is called a hash collision.

So Bloom Filters do something clever.

Instead of using one hash function...

they use several.


Three Magical Machines

Now we have three machines.

For the username

alice
Enter fullscreen mode Exit fullscreen mode

Machine 1 says

3
Enter fullscreen mode Exit fullscreen mode

Machine 2 says

10
Enter fullscreen mode Exit fullscreen mode

Machine 3 says

17
Enter fullscreen mode Exit fullscreen mode

Instead of turning on one light...

we turn on three.

⚪ ⚪ 🟢 ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ 🟢
⚪ ⚪ ⚪ ⚪ ⚪ ⚪ 🟢 ⚪ ⚪ ⚪
Enter fullscreen mode Exit fullscreen mode

Alice has left three little fingerprints.

Not an actual fingerprint.

Please don't press your thumb on your monitor.


Add Another Username

Now someone registers

bob
Enter fullscreen mode Exit fullscreen mode

The machines say

5

10

19
Enter fullscreen mode Exit fullscreen mode

Turn those lights on.

Notice something?

Light 10 was already ON.

That's completely fine.

Lights can be shared.

Nobody gets jealous.


Time for the Big Question

Someone types

alice
Enter fullscreen mode Exit fullscreen mode

Does she exist?

Our three machines produce

3

10

17
Enter fullscreen mode Exit fullscreen mode

Let's check the lights.

3 ✅ ON

10 ✅ ON

17 ✅ ON
Enter fullscreen mode Exit fullscreen mode

Everything is ON.

The Bloom Filter answers...

🤔 Maybe.

Wait.

Not...

✅ Yes?

Nope.

Only...

🤔 Maybe.

Why?

Because other usernames might have turned on those same lights.

The Bloom Filter isn't completely sure.


Another Person Arrives

Now someone searches

may_hua
Enter fullscreen mode Exit fullscreen mode

The machines produce

2

6

11
Enter fullscreen mode Exit fullscreen mode

Let's check.

2 ❌ OFF
Enter fullscreen mode Exit fullscreen mode

Game over.

The Bloom Filter immediately says

❌ Definitely NOT.

It doesn't even bother checking lights 6 or 11.

One OFF light is enough.


Why Is It So Confident?

Think about it.

If may_hua had ever been inserted,

light #2 would have been turned ON.

But it's OFF.

So the username could never have been added.

It's impossible.

This is why Bloom Filters can confidently say

"Definitely not."


The Weirdest Part

Imagine our database contains only

alice
bob
charlie
Enter fullscreen mode Exit fullscreen mode

Their lights accidentally create this pattern.

3

7

10
Enter fullscreen mode Exit fullscreen mode

Now someone searches

david
Enter fullscreen mode Exit fullscreen mode

And...

surprise!

David also hashes to

3

7

10
Enter fullscreen mode Exit fullscreen mode

Every light is ON.

The Bloom Filter says

🤔 Maybe.

The database checks.

...

David isn't there.

Oops.

The Bloom Filter made a mistake.

This is called a false positive.

And believe it or not...

that's perfectly okay.


But Can It Make the Opposite Mistake?

Suppose Alice really exists.

Can the Bloom Filter ever say

❌ Definitely not?

No.

Never.

Not even once.

That's the superpower of a Bloom Filter.

It can accidentally say

"Maybe"

when the answer is actually "No."

But it will never say

"Definitely not"

when the answer is actually "Yes."

Computer scientists call this:

  • ✅ No false negatives
  • ⚠️ Possible false positives

Fancy words.

Simple idea.


So... Why Is This Useful?

Imagine one million username checks.

Without a Bloom Filter,

the database receives one million requests.

With a Bloom Filter,

maybe 990,000 of them are answered immediately with

❌ Definitely not.

Only the remaining requests need to reach the database.

Less work.

Less waiting.

Happier servers.

Probably happier engineers too.


Where Is This Used?

Bloom Filters quietly work behind the scenes in many large systems.

They're used in things like:

  • Google Chrome Safe Browsing
  • Large databases
  • Distributed caches
  • Search engines
  • Storage systems

They all use the same idea:

"If we already know something definitely doesn't exist, why waste time asking again?"


Final Thoughts

When I first learned about Bloom Filters, I expected another complicated algorithm full of scary mathematics.

Instead, I found something beautifully simple.

It doesn't try to know everything.

It only tries to eliminate impossible answers.

Sometimes the smartest answer isn't

"Yes."

It's simply

"I'm absolutely sure the answer is no."

Top comments (0)