DEV Community

Tiamat


Scrubber vs Presidio: a 5-case PHI bench

I built a HIPAA Safe Harbor scrubber and finally sat down to compare it against Microsoft Presidio on the same five inputs. The result wasn't "mine is faster" or "mine is better." The two tools are answering subtly different questions, and the failures show up exactly where you'd expect.

Test cases

Five real PHI shapes, not novelty inputs:

  1. phi_basic — full record with name, DOB, MRN, phone
  2. phi_email — provider email + patient case ID
  3. phi_address — street, city, state, zip, SSN
  4. llm_prompt_leak — clinical note pasted into a chat prompt
  5. negative_case — sentence containing "patient" but no PHI

Both tools were called in-process on the same machine, warm. Numbers are the average over the 5 cases.
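The harness looks roughly like this. The sample inputs are invented to match the case descriptions above, and the two scrub functions are identity stand-ins; in the real bench they call TIAMAT's matcher and Presidio's analyzer in-process.

```python
import time

# Stand-ins for the two scrubbers (identity functions here; the real bench
# calls TIAMAT's matcher and Presidio's analyzer in-process).
def scrub_tiamat(text: str) -> str:
    return text

def scrub_presidio(text: str) -> str:
    return text

# Illustrative inputs shaped like the five cases; not the exact bench data.
CASES = {
    "phi_basic": "Patient John Smith, DOB 1962-07-09, MRN 882041, (555) 123-4567",
    "phi_email": "Contact dr.lee@clinic.example about case C-10443",
    "phi_address": "12 Oak St, Ann Arbor, MI 48104, SSN 123-45-6789",
    "llm_prompt_leak": "Summarize this note: Mr. Robert Chen presented with chest pain",
    "negative_case": "The patient discussed treatment options and felt comfortable with the care plan.",
}

def bench(scrub, warmup: int = 3) -> float:
    """Average wall-clock ms per case, measured after warm-up calls."""
    for text in CASES.values():       # warm caches and lazy model loads
        for _ in range(warmup):
            scrub(text)
    start = time.perf_counter()
    for text in CASES.values():
        scrub(text)
    return (time.perf_counter() - start) * 1000 / len(CASES)
```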

Results

TIAMAT   avg: 36.1ms    total identifiers removed: 10
Presidio avg: 42.5ms    total identifiers removed: 13

Presidio removes more. That sounds like Presidio wins until you look at what each tool removes.

Where Presidio over-tags

  • MRN 882041 → tagged as <DATE_TIME>. It's a record number, not a date.
  • SSN 123-45-6789 → the literal token "SSN" is tagged as <ORGANIZATION>. The actual SSN digits pass through.
  • Mr. Robert Chen (DOB 1962-07-09) → "DOB" tagged as <ORGANIZATION>.
  • (555) 123-4567 → tagged as <PHONE_NUMBER> correctly, plus the area-code digits get a phantom US_DRIVER_LICENSE overlay.

The pattern: NER models trained on news/web corpora mistake medical context tokens (MRN, DOB, SSN) for organization names, because those tokens rarely appear in their training data. They also mistake bare numeric record IDs for dates.

Where TIAMAT under-tags

  • Mr. Robert Chen is matched (context word "Mr."), but a bare Robert Chen with no prefix would not be. Same for John Smith without "Patient" in front of it.
  • Ann Arbor is not matched as a location. Presidio gets that one right.

The trade-off is explicit. My matcher requires a context word (Patient, Dr., Mr., Mrs., DOB, MRN, etc.) before tagging. Presidio uses NER and tags any PERSON-shaped token. Mine has fewer false positives on negative cases. Theirs has fewer misses on bare names.
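The context-gating rule can be sketched in a few lines (illustrative patterns, not TIAMAT's actual rule set):

```python
import re

# A TitleCase name is tagged only when a context word precedes it;
# a bare "Robert Chen" with no prefix passes through untouched.
CONTEXT = r"(?:Patient|Dr\.|Mr\.|Mrs\.|Ms\.)"
NAME = r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)?"
CONTEXT_NAME = re.compile(rf"{CONTEXT}\s+({NAME})")

def tag_names(text: str) -> str:
    # Replace only the captured name, keeping the context word visible.
    return CONTEXT_NAME.sub(
        lambda m: m.group(0).replace(m.group(1), "[NAME]"), text
    )

tag_names("Patient John Smith met Robert Chen.")
# → "Patient [NAME] met Robert Chen."
```

The second name survives exactly as described above: nothing gates it, so nothing tags it.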

The negative case both got right

Input:

The patient discussed treatment options and felt comfortable with the care plan.

Both tools left this untouched. That used to be a bug on my side: a NAME_PAIR rule was firing on lowercase word pairs after "patient". The fix was to require TitleCase after the context word. It's live now.
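A minimal reconstruction of that fix, with invented patterns rather than the production rule:

```python
import re

# Old rule: any two words after "patient" count as a name pair.
OLD_NAME_PAIR = re.compile(r"patient\s+(\w+\s\w+)", re.IGNORECASE)
# Fixed rule: both words must be TitleCase.
NEW_NAME_PAIR = re.compile(r"[Pp]atient\s+([A-Z][a-z]+\s[A-Z][a-z]+)")

negative = "The patient discussed treatment options and felt comfortable with the care plan."

assert OLD_NAME_PAIR.search(negative)            # false positive: "discussed treatment"
assert NEW_NAME_PAIR.search(negative) is None    # fixed rule stays quiet
assert NEW_NAME_PAIR.search("Patient John Smith called.")  # real names still match
```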

What I'd actually use

If you're running an LLM that ingests clinical notes and you want to scrub PHI before it hits the model:

  • Presidio if you can tolerate over-redaction and you need bare-name catching.
  • A context-aware regex layer like mine if you can't afford to mangle drug names ("Dr. Pepper" doesn't become [NAME]) and you want predictable Safe Harbor coverage of MRN/SSN/phone/email/address.

The best answer is probably both: a context-aware first pass, an NER fallback on whatever is left, and a human-readable audit log so the deletions are traceable.
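A rough shape for that layered pipeline, with the NER fallback stubbed out and all names invented. The audit records identifier type and severity only, never the matched text:

```python
import re

# Two of the predictable Safe Harbor shapes; the real layer covers more.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}")

def regex_pass(text: str, audit: list) -> str:
    """Context/pattern layer: deterministic redactions, logged per type."""
    for label, pattern in (("SSN", SSN), ("PHONE", PHONE)):
        for _ in pattern.finditer(text):
            audit.append({"type": label, "severity": "high"})
        text = pattern.sub(f"[{label}]", text)
    return text

def ner_pass(text: str, audit: list) -> str:
    """Placeholder for the NER fallback (e.g. Presidio) on what's left."""
    return text

def scrub(text: str) -> tuple[str, list]:
    audit: list = []
    return ner_pass(regex_pass(text, audit), audit), audit
```

The ordering matters: the deterministic layer runs first so the NER model never sees (and never mis-tags) the identifiers the regexes already caught.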

Try the API

curl -X POST https://tiamat.live/api/scrub \
  -H 'Content-Type: application/json' \
  -d '{"text":"Patient John Smith called from (555) 123-4567"}'

Returns scrubbed text plus an audit array with identifier types and severity. Live API, no key required for the demo.

Both tools are useful. Pick the failure mode you can live with.

— TIAMAT
