Venkat

Posted on Mar 6

🔐 Matching in the Dark: Zero‑Knowledge Filtering Using 32‑Bit Bitmasks

#algorithms #privacy #security #systemdesign

This is Part 2 of a series on building a privacy-first dating platform for HIV-positive communities. Read Part 1, "Building a Zero-Knowledge Dating Platform for HIV-Positive Communities", if you haven't already.

Imagine a database breach. Your dating app's servers are compromised.

For most users, that's embarrassing. For an HIV-positive person on a conventional dating platform, it can mean losing a job, losing housing, or losing family. The stakes are not hypothetical — they are documented, they are real, and they are why this system was built the way it was.

In Part 1, I explained the overall architecture: everything is encrypted client-side using TweetNaCl before it touches the backend. No names, no photos, no health status, no location, no lifestyle — nothing readable ever reaches the server.

But that creates a problem that isn't immediately obvious:

If the server is completely blind, how does it know who to match you with?

This article explains the first half of the answer: blind bitmask filtering using 32-bit integers.

This is the hard filter layer — gender, marital status, region, and other categorical attributes. The next article covers the soft filter layer — AI embeddings and Hamming distance for deeper compatibility.

Why Not Just Encrypt the Filters Too?

You might think: encrypt the filter values and compare the ciphertexts server-side. The problem is that secure encryption is randomized by design: the same value encrypted twice produces different ciphertext. Comparing encrypted values therefore requires either homomorphic encryption (expensive, complex, slow) or a deterministic scheme, which leaks which users share the same value.

We needed something the server could compare without understanding.

That's where integers come in.

☕ The Core Idea: Switches, Not Strings

The server cannot store or search strings like "Woman", "Single", "East", or "Espresso lover". But the server can compare integers.

A 32-bit integer is just 32 on/off switches. The frontend assigns meaning to each switch. The backend never sees the dictionary that explains what each switch means.

This is the key insight: meaning lives in the client. The server only handles math.

Every user profile generates two masks:

Identity Mask (i_mask) — "Who I am"
Preference Mask (p_mask) — "Who I want"

The frontend sets bits using:

i_mask |= (1 << bit);
p_mask |= (1 << bit);

Only the resulting integers are sent to the server. The dictionary that maps bits to human meaning never leaves the browser.
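One detail worth knowing when doing this in JavaScript: bitwise operators work on signed 32-bit integers, so setting bit 31 produces a negative number unless the result is coerced back to unsigned. Here is a minimal sketch, assuming hypothetical helpers named `setBit` and `hasBit` (illustrative names, not taken from the platform's code):

```javascript
// Hypothetical helpers; the names are illustrative, not the platform's API.
function setBit(mask, bit) {
  if (!Number.isInteger(bit) || bit < 0 || bit > 31) {
    throw new RangeError(`bit must be an integer in 0..31, got ${bit}`);
  }
  // `>>> 0` reinterprets the signed 32-bit result as an unsigned value.
  return (mask | (1 << bit)) >>> 0;
}

function hasBit(mask, bit) {
  return ((mask >>> bit) & 1) === 1;
}

let iMask = 0;
iMask = setBit(iMask, 1);   // 2
iMask = setBit(iMask, 31);  // stays positive thanks to `>>> 0`
```

Whether the unsigned coercion matters depends on how the backend parses the number; it only becomes relevant once the highest bit is in use.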

🧩 Bitmask Layout

Here's a simplified version of the layout used in this platform:

| Bits | Category | Values |
|------|----------|--------|
| 0–1 | Gender | `bit 0` = Man, `bit 1` = Woman |
| 2–3 | Marital Status | `bit 2` = Single, `bit 3` = Divorced |
| 4–7 | Region | `bit 4` = North, `bit 5` = South, `bit 6` = East, `bit 7` = West |
| 8–9 | Coffee Preference | `bit 8` = Espresso, `bit 9` = Latte |
| 10–31 | Reserved | Future attributes |

This table exists only in the frontend source code. The backend has no awareness of it. Even if someone reads the Erlang source, they will find no reference to gender, region, or coffee preferences — only integers and bitwise operations.
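A client-side dictionary for this layout might look roughly like the sketch below. The `BITS` object and `buildMask` function are assumptions made for illustration; the actual frontend code isn't shown in this article.

```javascript
// Illustrative client-side dictionary mirroring the layout table.
// The object and function names are assumptions, not the platform's code.
const BITS = Object.freeze({
  Man: 0, Woman: 1,
  Single: 2, Divorced: 3,
  North: 4, South: 5, East: 6, West: 7,
  Espresso: 8, Latte: 9,
});

function buildMask(labels) {
  let mask = 0;
  for (const label of labels) {
    const bit = BITS[label];
    if (typeof bit !== "number") throw new Error(`unknown label: ${label}`);
    mask |= 1 << bit;
  }
  return mask >>> 0; // keep the result unsigned
}

const exampleMask = buildMask(["Man", "North", "Latte"]); // 1 + 16 + 512 = 529
```

Only the return value of `buildMask` would leave the browser; the `BITS` object stays in the client bundle.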

☕ A Concrete Example

Let's walk through two real users being matched — the way the server experiences it.

User A, who she is:
Woman, Single, East, Espresso → bits 1, 2, 6, 8 → i_mask = 2 + 4 + 64 + 256 = 326

User A, who she wants:
Man, Single, East or North, Espresso → bits 0, 2, 4, 6, 8 → p_mask = 1 + 4 + 16 + 64 + 256 = 341

User B, who he is:
Man, Single, East, Espresso → bits 0, 2, 6, 8 → i_mask = 1 + 4 + 64 + 256 = 325

User B, who he wants:
Woman, Single, East, Espresso or Latte → bits 1, 2, 6, 8, 9 → p_mask = 2 + 4 + 64 + 256 + 512 = 838

The server receives four numbers: 326, 341, 325, 838.

It has no idea that 326 means "Woman from the East who drinks Espresso." It's just a number. What it can do is check whether these two people are mutually compatible, without knowing what compatibility means in human terms.
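Recomputing the four masks from the layout table is a useful sanity check. The sketch below sets the listed bit positions and prints the resulting integers; `maskFromBits` is a hypothetical helper introduced only for this example.

```javascript
// Hypothetical helper: turn a list of bit positions into an unsigned mask.
const maskFromBits = (bits) => bits.reduce((m, b) => (m | (1 << b)) >>> 0, 0);

// Bit positions taken from the layout table.
const aIdentity   = maskFromBits([1, 2, 6, 8]);     // Woman, Single, East, Espresso
const aPreference = maskFromBits([0, 2, 4, 6, 8]);  // Man, Single, North or East, Espresso
const bIdentity   = maskFromBits([0, 2, 6, 8]);     // Man, Single, East, Espresso
const bPreference = maskFromBits([1, 2, 6, 8, 9]);  // Woman, Single, East, Espresso or Latte

console.log(aIdentity, aPreference, bIdentity, bPreference); // 326 341 325 838
console.log(aIdentity.toString(2).padStart(10, "0"));        // 0101000110
```

The binary form on the last line is all the server effectively holds: a row of switches with no labels attached.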

⚡ The Matching Logic in Erlang

Three lines:

ISeeThem = (MyPMask band OtherIMask) =/= 0,
TheySeeMe = (OtherPMask band MyIMask) =/= 0,
IsMatch = ISeeThem andalso TheySeeMe.

band is bitwise AND. If User A's preference mask shares at least one set bit with User B's identity mask, and vice versa, it's a match. Both sides have to see each other.
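For readers who don't write Erlang, the same check reads as follows in JavaScript. This is a direct transliteration for illustration, with assumed function and parameter names; it is not the backend's code. The sample masks are the values the layout table yields for the two example users.

```javascript
// Transliteration of the Erlang snippet; names are illustrative.
function isMutualMatch(myIMask, myPMask, otherIMask, otherPMask) {
  const iSeeThem = (myPMask & otherIMask) !== 0;   // my preferences overlap their identity
  const theySeeMe = (otherPMask & myIMask) !== 0;  // their preferences overlap my identity
  return iSeeThem && theySeeMe;
}

// Masks derived from the layout table:
// User A: identity 326, preference 341. User B: identity 325, preference 838.
const matched = isMutualMatch(326, 341, 325, 838); // true
// One-sided interest is not enough: a preference mask of 0 overlaps nothing.
const oneSided = isMutualMatch(326, 341, 325, 0);  // false
```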

No strings. No JOINs on plaintext columns. No semantic understanding required. Just a CPU instruction that runs in nanoseconds across thousands of profiles.

🔒 Why This Is Genuinely Zero-Knowledge

The server has nothing to reverse the integers with.

On the backend, a stored mask is just an integer: nothing there maps its bits to Woman, Single, East, or Espresso. That opacity depends on the attacker lacking the bit layout, though. The layout ships in the public frontend code, so anyone who combines a leaked mask with that code can read which attributes are set. The limitations section below comes back to this.

A breach leaks nothing meaningful.

If the database is compromised, attackers get encrypted blobs and a list of integers. The integers reveal nothing about health status, preferences, or identity without the dictionary — which lives only in the browser.

It's fast.

Bitwise AND is one of the cheapest operations a CPU can perform. Matching 100,000 profiles takes milliseconds. There's no performance tradeoff for the privacy guarantee.
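To get a feel for the cost, here is a rough linear scan over synthetic data. Everything in it is an assumption made for illustration (the function name, the typed-array storage, the random masks); it says nothing about how the Erlang backend stores or iterates profiles, and timings will vary by machine.

```javascript
// Illustrative linear scan; array names and sizes are assumptions.
function findMatches(myIMask, myPMask, iMasks, pMasks) {
  const hits = [];
  for (let k = 0; k < iMasks.length; k++) {
    if ((myPMask & iMasks[k]) !== 0 && (pMasks[k] & myIMask) !== 0) hits.push(k);
  }
  return hits;
}

const N = 100000;
const iMasks = new Uint32Array(N);
const pMasks = new Uint32Array(N);
for (let k = 0; k < N; k++) {
  // Synthetic data only: random 10-bit masks, not real profiles.
  iMasks[k] = Math.floor(Math.random() * 1024);
  pMasks[k] = Math.floor(Math.random() * 1024);
}

const started = performance.now();
const hits = findMatches(326, 341, iMasks, pMasks);
console.log(`${hits.length} hits in ${(performance.now() - started).toFixed(2)} ms`);
```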

No false positives.

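If (maskA band maskB) =/= 0, the two masks are guaranteed to share at least one set bit. The comparison is exact rather than probabilistic, so the server never reports an overlap that isn't there. What it does not guarantee is agreement in every category: one shared bit is enough to pass.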
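The difference between "some bit overlaps" and "every category agrees" is easy to see with two masks. The sketch below contrasts the overlap test with a stricter containment test. The containment variant is shown purely for illustration and is not something this platform is stated to use; it also assumes each identity mask sets exactly one bit per category.

```javascript
// Overlap: passes if ANY bit is shared (the test used above).
const overlaps = (pMask, iMask) => (pMask & iMask) !== 0;
// Containment: passes only if EVERY identity bit is accepted by the preference mask.
// Assumes the identity mask sets exactly one bit per category.
const contains = (pMask, iMask) => ((pMask & iMask) >>> 0) === (iMask >>> 0);

const wantsManSingleEast = 0b0001000101; // bits 0, 2, 6 → 69
const womanSingleWest    = 0b0010000110; // bits 1, 2, 7 → 134

overlaps(wantsManSingleEast, womanSingleWest); // true: both have bit 2 (Single)
contains(wantsManSingleEast, womanSingleWest); // false: bits 1 and 7 are not accepted
```

Either test keeps the server blind to meaning; they differ only in how strictly the hard filters are applied.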

🏗️ How It Fits the Architecture

┌──────────────────────────┐
│        Frontend          │
│  (Vue + TweetNaCl)       │
├──────────────────────────┤
│ - Collect profile fields │
│ - Generate i_mask/p_mask │
│ - Encrypt profile vault  │
│ - Send: masks + blob     │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│         Backend          │
│   (Erlang + Mnesia)      │
├──────────────────────────┤
│ - Store encrypted blob   │
│ - Store bitmasks         │
│ - Bitwise AND matching   │
│ - Return matched IDs     │
└──────────────────────────┘

The backend returns matched user IDs. The frontend then fetches and decrypts those profiles locally. At no point does the server assemble a readable picture of anyone.
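The round trip can be modelled with a toy in-memory store. This is only a sketch of the data flow: the `BlindStore` class, its method names, and the placeholder bytes are all hypothetical, and the real backend is Erlang with Mnesia rather than JavaScript.

```javascript
// Toy in-memory stand-in for the backend; class and method names are hypothetical.
class BlindStore {
  constructor() { this.rows = new Map(); }

  // The server keeps the blob as opaque bytes and never parses it.
  put(id, iMask, pMask, blob) {
    this.rows.set(id, { iMask: iMask >>> 0, pMask: pMask >>> 0, blob });
  }

  // Returns only IDs; the caller fetches and decrypts blobs client-side.
  matchesFor(id) {
    const me = this.rows.get(id);
    if (!me) return [];
    const ids = [];
    for (const [otherId, row] of this.rows) {
      if (otherId === id) continue;
      if ((me.pMask & row.iMask) !== 0 && (row.pMask & me.iMask) !== 0) ids.push(otherId);
    }
    return ids;
  }
}

const store = new BlindStore();
store.put("a", 326, 341, new Uint8Array([0x9f, 0x12])); // placeholder ciphertext bytes
store.put("b", 325, 838, new Uint8Array([0x4c, 0xe0]));
store.put("c", 128, 128, new Uint8Array([0x00, 0x7a])); // bit 7 only, wants bit 7 only
const idsForA = store.matchesFor("a"); // ["b"]
```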

⚠️ What This Doesn't Protect Against

Honesty matters here — especially for a community where trust is everything.

Match count leakage. The server knows how many profiles match a given user, even if it doesn't know why. A user with very specific filters (only one bit set) might have a match count that's statistically revealing. This is a known limitation.

Timing analysis. A sophisticated attacker watching query patterns over time could infer rough filter characteristics from response times. This is mitigated by query normalisation, but not eliminated.

The dictionary is in the source code. The frontend is public. Anyone can read the bit-to-meaning mapping. The protection isn't that the dictionary is secret — it's that the server never has it, so a server-side breach reveals nothing. Client-side attacks (malware, compromised devices) are a separate threat model.
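To make that point concrete: once someone holds the layout, decoding a mask takes a few lines. The `LABELS` array and `decodeMask` function below are illustrative names, with the label order copied from the layout table.

```javascript
// Illustrative decoder: with the public layout, any mask is readable.
// LABELS mirrors the layout table; array index = bit position.
const LABELS = ["Man", "Woman", "Single", "Divorced",
                "North", "South", "East", "West", "Espresso", "Latte"];

function decodeMask(mask) {
  return LABELS.filter((_, bit) => ((mask >>> bit) & 1) === 1);
}

decodeMask(326); // ["Woman", "Single", "East", "Espresso"]
```

This is why the guarantee is about what the server holds, not about the integers being unreadable in principle.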

This layer only handles hard filters. It can't assess compatibility, shared values, or personality. That's what the embedding layer is for.

No system is perfectly zero-knowledge. The goal is to make the cost of a breach as close to zero as possible, for the people who have the most to lose.

🌑 Why This Matters

For HIV-positive users, every piece of data that touches a server is a potential liability. This bitmask system lets the platform filter by relationship style, region, lifestyle, and preferences — without the server ever learning what those preferences are.

It's not a perfect solution. But it moves the trust boundary from "trust us not to misuse your data" to "we architecturally cannot access your data." For people who have been let down by institutions before, that difference is everything.

The platform is live at HIVPositiveMatches.com — built on everything this series covers.

▶️ Coming Next: Zero-Knowledge AI Matching

The bitmask layer handles hard categorical filters. But compatibility is more than checkbox matching.

In Part 3, I'll cover:

How the browser generates semantic embeddings locally
How they're binarized into compact binary vectors
How the server computes similarity using Hamming distance
Why this reveals nothing about the underlying text

The bitmask layer finds possible matches. The embedding layer finds meaningful ones — without the server understanding either.

DEV Community