Bloom filters felt like a purely academic data structure—until an agent pipeline started repeating work. At that point, they became immediately practical.
## Problem
The system needed a fast, low-cost way to check whether something had probably been seen before.
Not certainty. A strong enough signal to avoid redundant work.
## Failure Mode
The agent repeatedly:
- revisited identical document IDs
- re-triggered the same tool calls
- reprocessed items already handled minutes earlier
This created:
- unnecessary latency
- increased compute cost
- degraded pipeline efficiency
A lightweight pre-check layer was required.
## Approach
Introduce a Bloom filter as a front-line gate:
- If definitely new → process
- If possibly seen → verify via authoritative store
Properties leveraged:
- No false negatives
- Acceptable false positives
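A minimal sketch of that gate, assuming the `add`/`has` interface of the filter shown later in this post, with a plain in-memory `Set` standing in for the authoritative store (both names are illustrative):

```typescript
interface ProbabilisticSet {
  add(value: string): void;
  has(value: string): boolean;
}

// Returns true when the item should be processed as new.
// `authoritative` is a stand-in for a real exact-membership store.
function shouldProcess(
  id: string,
  filter: ProbabilisticSet,
  authoritative: Set<string>
): boolean {
  if (!filter.has(id)) {
    // Definitely new: the no-false-negative guarantee makes this branch safe.
    filter.add(id);
    authoritative.add(id);
    return true;
  }
  // Possibly seen: confirm with the exact (slower) check.
  if (authoritative.has(id)) return false;
  // False positive: treat as new and record it.
  filter.add(id);
  authoritative.add(id);
  return true;
}
```

The fast path never touches the authoritative store; only the "possibly seen" minority pays for the exact check.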
## Mental Model
A Bloom filter consists of:
- a fixed-size bit array
- multiple hash functions
- a probabilistic membership check
### Insert

- hash the value with each hash function
- set the corresponding bits to `1`

### Query

- if any bit is `0` → definitely not present
- if all bits are `1` → possibly present
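"Possibly present" has a quantifiable error rate. For a filter with `m` bits, `k` hash functions, and `n` inserted values, the standard estimate for the false-positive probability (assuming independent, uniformly distributed hashes) is:

```
p ≈ (1 − e^(−k·n / m))^k
```

With the 2048-bit, 4-hash filter below, this stays near 1% up to roughly 200 inserted values and climbs quickly after that, so size `m` to your expected `n`.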
## Implementation
```typescript
class BloomFilter {
  // One byte per slot for simplicity; a production filter would pack bits.
  private bits = new Uint8Array(2048);
  private readonly seeds = [17, 31, 53, 73];

  // djb2-style hash, seeded so each pass maps the value to a different bit.
  private hash(value: string, seed: number) {
    let hash = seed;
    for (let i = 0; i < value.length; i++) {
      hash = (hash * 33 + value.charCodeAt(i)) % this.bits.length;
    }
    return hash;
  }

  add(value: string) {
    for (const seed of this.seeds) {
      this.bits[this.hash(value, seed)] = 1;
    }
  }

  has(value: string) {
    return this.seeds.every(
      (seed) => this.bits[this.hash(value, seed)] === 1
    );
  }
}
```
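To sanity-check those properties, here is the filter exercised directly; the class is repeated inside the snippet so it runs standalone:

```typescript
class BloomFilter {
  private bits = new Uint8Array(2048);
  private readonly seeds = [17, 31, 53, 73];
  private hash(value: string, seed: number) {
    let hash = seed;
    for (let i = 0; i < value.length; i++) {
      hash = (hash * 33 + value.charCodeAt(i)) % this.bits.length;
    }
    return hash;
  }
  add(value: string) {
    for (const seed of this.seeds) {
      this.bits[this.hash(value, seed)] = 1;
    }
  }
  has(value: string) {
    return this.seeds.every((seed) => this.bits[this.hash(value, seed)] === 1);
  }
}

const filter = new BloomFilter();

// An empty filter has every bit at 0, so nothing reports as present.
console.log(filter.has("doc-123")); // false

filter.add("doc-123");

// No false negatives: once added, a value always reports as possibly present.
console.log(filter.has("doc-123")); // true
```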
## Where It Fit in My Agent Stack
I ended up using Bloom filters in three key places:
### 1. Event Deduplication
Before the agent processes anything, I filter out repeated inputs. This alone removed a lot of noise.
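A sketch of that pre-check, assuming a filter with the `add`/`has` interface above; the event shape here is hypothetical:

```typescript
interface ProbabilisticSet {
  add(value: string): void;
  has(value: string): boolean;
}

// Hypothetical input shape for illustration.
interface AgentEvent {
  id: string;
  payload: string;
}

// Drop events whose IDs have probably been seen already.
function dedupe(events: AgentEvent[], seen: ProbabilisticSet): AgentEvent[] {
  const fresh: AgentEvent[] = [];
  for (const event of events) {
    if (seen.has(event.id)) continue; // possibly seen: skip it
    seen.add(event.id);
    fresh.push(event);
  }
  return fresh;
}
```

Because false positives are possible, this silently drops a small fraction of genuinely new events — acceptable for filtering noise, not for anything that must never be lost.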
### 2. Retrieval Optimization
While scanning candidate documents, I skip anything that has likely been seen before. This reduced unnecessary lookups.
### 3. Tool Call Short-Circuiting
This was the biggest win.
Agents tend to repeat tool calls when context becomes messy. A Bloom filter doesn’t fix reasoning, but it stops the system from wasting cycles on the same targets again and again.
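A sketch of that short-circuit: key each call by tool name plus serialized arguments, and consult the filter before executing. The key scheme and names are illustrative, and real arguments would need deterministic serialization:

```typescript
interface ProbabilisticSet {
  add(value: string): void;
  has(value: string): boolean;
}

// Illustrative key: assumes args stringify deterministically.
function toolCallKey(tool: string, args: Record<string, unknown>): string {
  return `${tool}:${JSON.stringify(args)}`;
}

// Runs the call only if this (tool, args) pair probably hasn't run before.
async function callOnce<T>(
  tool: string,
  args: Record<string, unknown>,
  attempted: ProbabilisticSet,
  execute: () => Promise<T>
): Promise<T | undefined> {
  const key = toolCallKey(tool, args);
  if (attempted.has(key)) return undefined; // probably a repeat: skip
  attempted.add(key);
  return execute();
}
```

The `undefined` return lets the caller distinguish "skipped as a repeat" from a real result and decide whether to fall back to an exact check.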
## The Tradeoff I Respect
I don’t use Bloom filters when I need certainty.
I use them when I need:
- speed
- low memory usage
- a fast first-pass filter
They are not a source of truth.
They are a guardrail.
## Final Take
Bloom filters work best as a front-line defense against wasted effort.
They don’t fix reasoning.
They don’t improve intelligence.
What they do is enforce discipline in the system—quietly, efficiently, and at scale.
In agent pipelines, that’s often exactly what is missing.
## Discussion
How do you handle deduplication in your AI workflows?
- Redis / Postgres with exact checks?
- Probabilistic structures like Bloom or Cuckoo filters?
- Something hybrid?

