DEV Community

SEN LLC
SEN LLC

Posted on

Building a Cuckoo Filter CLI in Rust — the Deletion a Bloom Filter Can't Do, Partial-Key Cuckoo Hashing, and a High Load Factor via Relocation

A Cuckoo filter as a Rust CLI. Like a Bloom filter it answers "is X in the set?" with "definitely absent" or "probably present" and stores no keys — but it adds the one thing Bloom famously can't do: delete. The hinges: (1) partial-key cuckoo hashing — an item's two candidate buckets are i2 = i1 XOR hash(fingerprint), so you can find a stored fingerprint's alternate bucket from the fingerprint alone (alt(alt(i)) == i); (2) when buckets fill, it relocates occupants to pack the table to ~95%; and (signature) deletion. A companion to #274's Bloom filter — the same question, a different design.

📦 GitHub: https://github.com/sen-ltd/cuckoo-filter

Screenshot

What a Cuckoo filter is

Like a Bloom filter it answers "is X in the set?" with "definitely absent"
(always correct) or "probably present" (a false positive at a rate you pick),
storing no keys. What it adds is the thing a Bloom filter can't do: delete.
And each lookup touches at most two buckets, so it's friendlier to cache than a
Bloom filter's k scattered bit probes.

Instead of setting bits, it stores a short fingerprint of each item in one of
two candidate buckets of a hash table (4 slots each).

Here's the CLI, real output:

$ cuckoo check apple mango dragonfruit
apple        probably present
mango        probably present
dragonfruit  definitely absent

$ cuckoo delete mango          # ← a Bloom filter cannot do this
mango  deleted

$ cuckoo check apple mango
apple  probably present
mango  definitely absent
Enter fullscreen mode Exit fullscreen mode

Delete mango and it flips to "definitely absent" while apple stays present —
the defining difference from Bloom.

Hinge 1: partial-key cuckoo hashing

An item's two candidate buckets are

i1 = hash(x)
i2 = i1 XOR hash(fingerprint(x))
Enter fullscreen mode Exit fullscreen mode

The second is the first XORed with something derived only from the fingerprint.
So given a fingerprint sitting in bucket i, you can compute its alternate
bucket without knowing the original item — and doing it twice returns you home.
That involution is the whole trick:

fn alt(&self, i: usize, fp: u16) -> usize {
    let hf = mix64(fp as u64) as usize & self.mask();
    i ^ hf
}
Enter fullscreen mode Exit fullscreen mode
#[test]
fn alt_is_an_involution() {
    let cf = CuckooFilter::new(1024, 12);
    assert_eq!(cf.alt(cf.alt(i, fp), fp), i); // always returns home
}
Enter fullscreen mode Exit fullscreen mode

Because the table size is a power of two, XOR keeps the alternate index in
range and alt(alt(i)) == i holds exactly.

Hinge 2: relocation buys a high load factor

On insert, if both candidate buckets are full, the filter doesn't give up — it
kicks: evict a random occupant, carry it to its alternate bucket, repeat
(up to 500 hops). Because any fingerprint can find its alternate home from itself
alone, this cascade works, and it packs the table tight:

let mut i = if self.next_rng() & 1 == 0 { i1 } else { i2 };
let mut carry = fp;
for _ in 0..MAX_KICKS {
    let slot = (self.next_rng() as usize) % BUCKET;
    std::mem::swap(&mut carry, &mut self.data[i * BUCKET + slot]); // evict
    i = self.alt(i, carry);
    if self.try_put(i, carry) { return true; }
}
Enter fullscreen mode Exit fullscreen mode

high_load_factor packs a fixed table and asserts it passes 92% occupancy; with
4 slots per bucket it reaches ~95–99% in practice.

The signature property: delete

Find the fingerprint in either candidate bucket and clear the slot. A Bloom
filter can't — clearing bits there would risk removing other items:

pub fn delete(&mut self, item: &str) -> bool { /* clear fp from i1 or i2 */ }
Enter fullscreen mode Exit fullscreen mode

delete_removes_membership adds two keys, deletes one, and checks the deleted
one reads absent while the other still reads present. (Caveat, as with any Cuckoo
filter: only delete items you actually inserted — deleting a never-added item
could clear a colliding item's fingerprint.)

Persistence

State is a compact binary blob — "CKF1" | num_buckets | fp_bits | count |
slots
— written hand-rolled, no serde:

src/cuckoo.rs — filter: partial-key hashing, kicking, delete, (de)serialize (12 tests)
src/main.rs   — clap CLI: new / add / check / delete / info, file or stdin
Enter fullscreen mode Exit fullscreen mode

Takeaway

A Cuckoo filter keeps Bloom's strengths — small, no false negatives, no stored
keys — and adds deletion plus better cache locality. The keys are the
involution of partial-key cuckoo hashing, relocation for a high load
factor, and delete itself. Put next to a Bloom filter (#274), it's a clean
study in answering the same question with a different structure.

📦 GitHub: https://github.com/sen-ltd/cuckoo-filter

Top comments (0)