A Cuckoo filter as a Rust CLI. Like a Bloom filter it answers "is X in the set?" with "definitely absent" or "probably present" and stores no keys — but it adds the one thing Bloom famously can't do: delete. The hinges: (1) partial-key cuckoo hashing — an item's two candidate buckets are
i2 = i1 XOR hash(fingerprint), so you can find a stored fingerprint's alternate bucket from the fingerprint alone (alt(alt(i)) == i); (2) when buckets fill, it relocates occupants to pack the table to ~95%; and (signature) deletion. A companion to #274's Bloom filter — the same question, a different design.
📦 GitHub: https://github.com/sen-ltd/cuckoo-filter
What a Cuckoo filter is
Like a Bloom filter it answers "is X in the set?" with "definitely absent"
(always correct) or "probably present" (a false positive at a rate you pick),
storing no keys. What it adds is the thing a Bloom filter can't do: delete.
And each lookup touches at most two buckets, so it's friendlier to cache than a
Bloom filter's k scattered bit probes.
Instead of setting bits, it stores a short fingerprint of each item in one of
two candidate buckets of a hash table (4 slots each).
Here's the CLI, real output:
$ cuckoo check apple mango dragonfruit
apple probably present
mango probably present
dragonfruit definitely absent
$ cuckoo delete mango # ← a Bloom filter cannot do this
mango deleted
$ cuckoo check apple mango
apple probably present
mango definitely absent
Delete mango and it flips to "definitely absent" while apple stays present —
the defining difference from Bloom.
Hinge 1: partial-key cuckoo hashing
An item's two candidate buckets are
i1 = hash(x)
i2 = i1 XOR hash(fingerprint(x))
The second is the first XORed with something derived only from the fingerprint.
So given a fingerprint sitting in bucket i, you can compute its alternate
bucket without knowing the original item — and doing it twice returns you home.
That involution is the whole trick:
fn alt(&self, i: usize, fp: u16) -> usize {
let hf = mix64(fp as u64) as usize & self.mask();
i ^ hf
}
#[test]
fn alt_is_an_involution() {
let cf = CuckooFilter::new(1024, 12);
assert_eq!(cf.alt(cf.alt(i, fp), fp), i); // always returns home
}
Because the table size is a power of two, XOR keeps the alternate index in
range and alt(alt(i)) == i holds exactly.
Hinge 2: relocation buys a high load factor
On insert, if both candidate buckets are full, the filter doesn't give up — it
kicks: evict a random occupant, carry it to its alternate bucket, repeat
(up to 500 hops). Because any fingerprint can find its alternate home from itself
alone, this cascade works, and it packs the table tight:
let mut i = if self.next_rng() & 1 == 0 { i1 } else { i2 };
let mut carry = fp;
for _ in 0..MAX_KICKS {
let slot = (self.next_rng() as usize) % BUCKET;
std::mem::swap(&mut carry, &mut self.data[i * BUCKET + slot]); // evict
i = self.alt(i, carry);
if self.try_put(i, carry) { return true; }
}
high_load_factor packs a fixed table and asserts it passes 92% occupancy; with
4 slots per bucket it reaches ~95–99% in practice.
The signature property: delete
Find the fingerprint in either candidate bucket and clear the slot. A Bloom
filter can't — clearing bits there would risk removing other items:
pub fn delete(&mut self, item: &str) -> bool { /* clear fp from i1 or i2 */ }
delete_removes_membership adds two keys, deletes one, and checks the deleted
one reads absent while the other still reads present. (Caveat, as with any Cuckoo
filter: only delete items you actually inserted — deleting a never-added item
could clear a colliding item's fingerprint.)
Persistence
State is a compact binary blob — "CKF1" | num_buckets | fp_bits | count | — written hand-rolled, no serde:
slots
src/cuckoo.rs — filter: partial-key hashing, kicking, delete, (de)serialize (12 tests)
src/main.rs — clap CLI: new / add / check / delete / info, file or stdin
Takeaway
A Cuckoo filter keeps Bloom's strengths — small, no false negatives, no stored
keys — and adds deletion plus better cache locality. The keys are the
involution of partial-key cuckoo hashing, relocation for a high load
factor, and delete itself. Put next to a Bloom filter (#274), it's a clean
study in answering the same question with a different structure.
📦 GitHub: https://github.com/sen-ltd/cuckoo-filter

Top comments (0)