Developer-friendly

Posted on Jun 29

The Algorithm That Refuses to Give a Straight Answer (Just Like My Ex)

#programming #datastructures #algorithms #security

Your Database Is Secretly Saying "I Don't Know" (And That's Genius)

Imagine you're trying to create an Instagram account.

You type:

Username: may_hua

You click Check Availability.

One second later...

✅ Available!

Easy.

But have you ever wondered...

How did Instagram answer that so quickly?

Does it search through hundreds of millions of usernames every single time?

If yes...

I hope their database enjoys suffering.

The Obvious Solution

Let's pretend we're building our own social media app.

Our database looks like this.

alice
bob
charlie
david
emma
...
500,000,000 more usernames

Someone types:

may_hua

Our program asks the database,

"Hey... does this username exist?"

The database searches.

It finds the answer.

Done.

This works perfectly.

So...

Why am I writing a whole blog post?

Because engineers are never satisfied.

Someone looked at this and said,

"Can we make it even faster?"

Classic engineer behavior.

The Real Problem

Imagine your website becomes famous.

(Delulu. ✨)

Now every second,

20,000 people log in
10,000 people sign up
thousands of people search for usernames

Every one of those requests asks the database.

Even though databases are incredibly fast, millions of unnecessary lookups still consume resources.

Then one engineer had a weird idea.

A very weird idea.

"What if we don't store the usernames?"

...

Wait.

What?

Isn't that the whole point?

Apparently not.

Welcome to the Bloom Filter.

Meet the Magic Light Board

Forget databases for a moment.

Imagine we have a board with only 20 light bulbs.

Initially they all look like this.

⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪

White means OFF.

That's all our Bloom Filter has.

No usernames.

No text.

No list.

Just lights.

You're probably thinking,

"How on earth is this supposed to remember 500 million usernames?"

Excellent question.

It doesn't.

At least... not directly.

The Magical Machine

Now imagine we own a magical machine.

You put a word into it.

apple

The machine spits out a number.

You put another word.

banana

It spits out another number.

Another.

orange

Yet another number.

This magical machine is called a hash function.

It simply converts any text into a number.

Don't worry about how it works today.

Just think of it as a machine that loves turning words into numbers.
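If you're curious what such a machine could look like in code, here is a minimal Python sketch. It assumes SHA-256 from the standard library as the hash, and `magical_machine` is a made-up name for this post. The numbers it prints are simply whatever SHA-256 produces, not the ones any real service uses.

```python
import hashlib

NUM_LIGHTS = 20  # assumption: the 20-bulb board used in this post


def magical_machine(word: str) -> int:
    """Turn any text into a light number between 1 and NUM_LIGHTS."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    # Read the first 8 bytes as an integer, then wrap it onto the board.
    return int.from_bytes(digest[:8], "big") % NUM_LIGHTS + 1


for word in ["apple", "banana", "orange"]:
    print(word, "->", magical_machine(word))
```

The same word always gives the same number, which is the only property the rest of the post relies on.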

Let's Add Our First Username

Suppose someone creates an account.

alice

Our magical machine says

Turn on light #7.

Now our board becomes

⚪ ⚪ ⚪ ⚪ ⚪ ⚪ 🟢 ⚪ ⚪ ⚪
⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪

That's it.

We don't store

alice

We only turned on one light.

Strange.

Very strange.

But One Light Isn't Enough

Imagine this happens.

alice

↓

7

Then...

banana

↓

7

Uh oh.

Now both usernames point to the same light.

Who turned it on?

Alice?

Banana?

The light refuses to answer.

This is called a hash collision.

So Bloom Filters do something clever.

Instead of using one hash function...

they use several, so a username is mistaken for an existing one only when every one of its lights is already ON.
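With a single machine and only 20 lights, collisions like this are not bad luck, they are guaranteed once you have more than 20 usernames. The sketch below shows that under a few assumptions: SHA-256 as the hash, made-up usernames, and a hypothetical helper called `first_collision`.

```python
import hashlib

NUM_LIGHTS = 20  # assumption: same 20-bulb board as above


def light_for(username: str) -> int:
    """One machine: map a username to a light number between 1 and 20."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LIGHTS + 1


def first_collision(usernames):
    """Return (earlier_name, later_name, light) for the first shared light, or None."""
    owner = {}
    for name in usernames:
        light = light_for(name)
        if light in owner:
            return owner[light], name, light
        owner[light] = name
    return None


# 21 names but only 20 lights: at least two names must share a light.
names = [f"user_{i}" for i in range(NUM_LIGHTS + 1)]
print(first_collision(names))
```

In practice a collision usually shows up long before the 21st name.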

Three Magical Machines

Now we have three machines.

For the username

alice

Machine 1 says

3

Machine 2 says

10

Machine 3 says

17

Instead of turning on one light...

we turn on three.

⚪ ⚪ 🟢 ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ 🟢
⚪ ⚪ ⚪ ⚪ ⚪ ⚪ 🟢 ⚪ ⚪ ⚪

Alice has left three little fingerprints.

Not an actual fingerprint.

Please don't press your thumb on your monitor.
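One common way to get "three machines" without writing three different hash functions is to feed the same hash a different prefix each time. The sketch below assumes that trick, SHA-256, and a hypothetical helper named `lights_for`. Real hashes will not land on 3, 10, and 17 like the drawing above.

```python
import hashlib

NUM_LIGHTS = 20    # assumption: the 20-bulb board from this post
NUM_MACHINES = 3   # assumption: three machines, as in the example


def lights_for(username: str) -> list:
    """Return one 0-based light index per machine."""
    indexes = []
    for machine in range(NUM_MACHINES):
        # Prefixing the machine number makes each pass act like a separate hash.
        data = f"{machine}:{username}".encode("utf-8")
        digest = hashlib.sha256(data).digest()
        indexes.append(int.from_bytes(digest[:8], "big") % NUM_LIGHTS)
    return indexes


board = [False] * NUM_LIGHTS  # every light starts OFF
for index in lights_for("alice"):
    board[index] = True       # the username itself is never stored

print("".join("🟢" if on else "⚪" for on in board))
```

Two machines can occasionally pick the same light for one username, so a name may switch on fewer than three bulbs. That is harmless.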

Add Another Username

Now someone registers

bob

The machines give bob three light numbers, and one of them is 10.

Turn those lights on.

Notice something?

Light 10 was already ON.

That's completely fine.

Lights can be shared.

Nobody gets jealous.

Time for the Big Question

Someone types

alice

Does she exist?

Our three machines produce 3, 10, and 17.

Let's check the lights.

3 ✅ ON

10 ✅ ON

17 ✅ ON

Everything is ON.

The Bloom Filter answers...

🤔 Maybe.

Wait.

Not...

✅ Yes?

Nope.

Only...

🤔 Maybe.

Why?

Because other usernames might have turned on those same lights.

The Bloom Filter isn't completely sure.

Another Person Arrives

Now someone searches

may_hua

The machines produce 2, 6, and 11.

Let's check.

2 ❌ OFF

Game over.

The Bloom Filter immediately says

❌ Definitely NOT.

It doesn't even bother checking lights 6 or 11.

One OFF light is enough.
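Putting adding and checking together, a toy Bloom Filter fits in a few lines. This is a sketch under assumptions, not how any particular product does it: the class and method names are made up, the hash is salted SHA-256, and the board has 20 lights.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_lights: int = 20, num_machines: int = 3):
        self.num_lights = num_lights
        self.num_machines = num_machines
        self.lights = [False] * num_lights  # every light starts OFF

    def _indexes(self, item: str):
        # Lazily yield one light index per machine.
        for machine in range(self.num_machines):
            digest = hashlib.sha256(f"{machine}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_lights

    def add(self, item: str) -> None:
        for index in self._indexes(item):
            self.lights[index] = True

    def might_contain(self, item: str) -> bool:
        # all() stops at the first OFF light, so the remaining lights
        # are never even computed. False means "definitely not".
        return all(self.lights[index] for index in self._indexes(item))


bloom = BloomFilter()
bloom.add("alice")
bloom.add("bob")

print(bloom.might_contain("alice"))    # True, meaning "maybe"
print(bloom.might_contain("may_hua"))  # False unless may_hua's lights all happen to be ON
```

Because `all()` short-circuits, a single OFF light ends the check right there, just like the walkthrough above.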

Why Is It So Confident?

Think about it.

If may_hua had ever been inserted,

light #2 would have been turned ON.

But it's OFF.

So the username could never have been added.

It's impossible.

This is why Bloom Filters can confidently say

"Definitely not."

The Weirdest Part

Imagine our database contains only

alice
bob
charlie

Together, their lights happen to leave a particular pattern of ON bulbs on the board.

Now someone searches

david

And...

surprise!

David also hashes to lights that the others already switched on.

Every light is ON.

The Bloom Filter says

🤔 Maybe.

The database checks.

...

David isn't there.

Oops.

The Bloom Filter made a mistake.

This is called a false positive.

And believe it or not...

that's perfectly okay.
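To get a feel for how often this happens, you can throw names that were never added at a tiny filter and count the "maybe" answers. The sketch below assumes the same toy filter as before (salted SHA-256, 20 lights, 3 machines), made-up probe names, and the usual textbook approximation for the false-positive rate, (1 - e^(-k*n/m))^k, with m lights, k machines, and n stored items.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_lights: int = 20, num_machines: int = 3):
        self.num_lights = num_lights
        self.num_machines = num_machines
        self.lights = [False] * num_lights

    def _indexes(self, item: str):
        for machine in range(self.num_machines):
            digest = hashlib.sha256(f"{machine}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_lights

    def add(self, item: str) -> None:
        for index in self._indexes(item):
            self.lights[index] = True

    def might_contain(self, item: str) -> bool:
        return all(self.lights[index] for index in self._indexes(item))


def estimated_false_positive_rate(num_lights, num_machines, num_items):
    """Textbook approximation: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-num_machines * num_items / num_lights)) ** num_machines


bloom = BloomFilter(num_lights=20, num_machines=3)
for name in ["alice", "bob", "charlie"]:
    bloom.add(name)

# None of these names was ever added, so every "maybe" is a false positive.
probes = [f"stranger_{i}" for i in range(10_000)]
false_positives = sum(bloom.might_contain(name) for name in probes)

print(f"{false_positives} of {len(probes)} never-added names got a 'maybe'")
print(f"rough estimate: {estimated_false_positive_rate(20, 3, 3):.1%}")
```

The measured count and the estimate will not match exactly on a board this small, but making the board bigger pushes both down.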

But Can It Make the Opposite Mistake?

Suppose Alice really exists.

Can the Bloom Filter ever say

❌ Definitely not?

No.

Never.

Not even once.

That's the superpower of a Bloom Filter.

It can accidentally say

"Maybe"

when the answer is actually "No."

But it will never say

"Definitely not"

when the answer is actually "Yes."

Computer scientists call this:

✅ No false negatives
⚠️ Possible false positives

Fancy words.

Simple idea.
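The "no false negatives" promise is easy to check for yourself. This sketch assumes the same toy filter (salted SHA-256, made-up class and usernames), adds a thousand names, and then looks for any name the filter wrongly rejects.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_lights: int = 20, num_machines: int = 3):
        self.num_lights = num_lights
        self.num_machines = num_machines
        self.lights = [False] * num_lights

    def _indexes(self, item: str):
        for machine in range(self.num_machines):
            digest = hashlib.sha256(f"{machine}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_lights

    def add(self, item: str) -> None:
        for index in self._indexes(item):
            self.lights[index] = True

    def might_contain(self, item: str) -> bool:
        return all(self.lights[index] for index in self._indexes(item))


bloom = BloomFilter(num_lights=10_000, num_machines=3)
names = [f"user_{i}" for i in range(1_000)]
for name in names:
    bloom.add(name)

# Every added name turned its own lights ON, and lights never turn OFF,
# so no added name can ever be rejected.
missed = [name for name in names if not bloom.might_contain(name)]
print(len(missed))  # 0
```

This only holds because lights are never switched back OFF. A plain Bloom Filter like this one has no way to remove a username.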

So... Why Is This Useful?

Imagine one million username checks.

Without a Bloom Filter,

the database receives one million requests.

With a Bloom Filter,

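if nearly all of those usernames are brand new and the filter is tuned for about a 1% false-positive rate, roughly 990,000 of them are answered immediately with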

❌ Definitely not.

Only the remaining requests need to reach the database.

Less work.

Less waiting.

Happier servers.

Probably happier engineers too.
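Here is roughly how that guard could sit in front of a database. Everything in this sketch is an assumption for illustration: a Python `set` stands in for the real database, `username_taken` and `db_lookups` are made-up names, and the filter is the same toy one built on salted SHA-256.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_lights: int = 20, num_machines: int = 3):
        self.num_lights = num_lights
        self.num_machines = num_machines
        self.lights = [False] * num_lights

    def _indexes(self, item: str):
        for machine in range(self.num_machines):
            digest = hashlib.sha256(f"{machine}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_lights

    def add(self, item: str) -> None:
        for index in self._indexes(item):
            self.lights[index] = True

    def might_contain(self, item: str) -> bool:
        return all(self.lights[index] for index in self._indexes(item))


database = {"alice", "bob", "charlie"}  # stand-in for the real database
bloom = BloomFilter(num_lights=1_000)
for name in database:
    bloom.add(name)

db_lookups = 0  # counts how often we actually bother the database


def username_taken(username: str) -> bool:
    global db_lookups
    if not bloom.might_contain(username):
        return False            # "definitely not": skip the database entirely
    db_lookups += 1             # "maybe": only the database knows for sure
    return username in database


for candidate in ["alice", "may_hua", "david"]:
    print(candidate, "taken" if username_taken(candidate) else "available")
print("database lookups:", db_lookups)
```

The filter never changes the final answer, it only decides whether the database has to be asked. New sign-ups would also need to be added to the filter so it stays in sync.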

Where Is This Used?

Bloom Filters quietly work behind the scenes in many large systems.

They're used in things like:

Google Chrome Safe Browsing (in earlier versions)
Large databases
Distributed caches
Search engines
Storage systems

They all use the same idea:

"If we already know something definitely doesn't exist, why waste time asking again?"

Final Thoughts

When I first learned about Bloom Filters, I expected another complicated algorithm full of scary mathematics.

Instead, I found something beautifully simple.

It doesn't try to know everything.

It only tries to eliminate impossible answers.

Sometimes the smartest answer isn't

"Yes."

It's simply

"I'm absolutely sure the answer is no."

DEV Community
