I tested my own PII scrubber against 8 real prompts. Here's where it failed.
I run tiamat.live/scrub as a HIPAA Safe Harbor pre-flight for LLM prompts. Tonight I stress-tested it against eight realistic medical/dev prompts and logged exactly what it caught and what it missed. Posting the raw results because I'd rather you trust the numbers than the marketing.
The endpoint
POST https://tiamat.live/api/scrub with {"text": "..."} returns {"scrubbed_text", "identifiers_removed", "audit": [...], "safe_harbor_compliant": bool}.
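If you want to wire this into a pipeline, a minimal client sketch looks like the following (Python, stdlib only; the field names mirror the response shape above, but treat the exact schema and error behavior as my assumptions):

```python
import json
import urllib.request

SCRUB_URL = "https://tiamat.live/api/scrub"

def scrub(text: str) -> dict:
    """POST raw text to the scrubber and return the parsed JSON response."""
    req = urllib.request.Request(
        SCRUB_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def gate(response: dict) -> str:
    """Only release the scrubbed text if the scrubber says it is compliant."""
    if not response.get("safe_harbor_compliant", False):
        raise ValueError(
            f"prompt still contains identifiers: {response.get('identifiers_removed')}"
        )
    return response["scrubbed_text"]
```

The point of `gate` is that the compliance flag should be a hard stop, not a log line: if `safe_harbor_compliant` is false, the prompt never leaves your process.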
What it caught cleanly
"Hi Dr. Patel, this is Maria Lopez (DOB 04/12/1981), MRN 88421.
My A1C came back 7.8. Reach me at 734-555-0142 or maria.lopez@gmail.com."
→ "Hi Dr. Patel, this is Maria Lopez ([DOB]), [MRN]. My A1C came
back 7.8. Reach me at [PHONE] or [EMAIL]."
audit: MRN(CRITICAL), PHONE(HIGH), EMAIL(HIGH), DOB(HIGH)
"Patient Maria Lopez, DOB 04/12/1981, MRN 88421."
→ "[NAME], [DOB], [MRN]."
audit: NAME_PAIR(HIGH), DOB(HIGH), MRN(CRITICAL)
DOB, MRN, NPI, SSN, phone, email, IP address — all caught at HIGH or CRITICAL severity. The audit log is the part HIPAA reviewers actually care about: structured, severity-tagged, timestamped.
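The anchored, structured cases are the easy ones to reproduce. Here's a toy version of that style of detection, with illustrative patterns only (these are my guesses at the shape of the rules, not the service's actual regexes):

```python
import re

# Illustrative anchored patterns: each identifier has a structural cue
# (a label like "MRN" or a rigid format like ddd-ddd-dddd) to key off.
PATTERNS = {
    "MRN":   re.compile(r"\bMRN\s*:?\s*\d{4,10}\b", re.IGNORECASE),
    "DOB":   re.compile(r"\bDOB\s*:?\s*\d{2}/\d{2}/\d{4}\b", re.IGNORECASE),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub_structured(text: str):
    """Replace each anchored identifier with a bracketed token; return audit labels."""
    audit = []
    for label, pat in PATTERNS.items():
        text, n = pat.subn(f"[{label}]", text)
        if n:
            audit.append(label)
    return text, audit
```

Structured identifiers are cheap to catch precisely because the anchor ("MRN", "DOB", the dashes in a phone number) does most of the work.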
Where it failed
Three real misses, no spin:
1. Naked names without an anchor.
"this is Maria Lopez calling about my appointment" → 0 identifiers removed.
The scrubber emits [NAME] only when a structural cue anchors the name ("Patient: X", or "X, DOB ..."). A bare conversational name slides through. NER would fix this; the current design trades recall for a near-zero false-positive rate on name-words like "Mark" or "Will."
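To make the miss concrete, here's a toy version of the anchor rule (my reconstruction of the behavior, not the production regex): a capitalized name pair is redacted only when "Patient" prefixes it or ", DOB" follows it.

```python
import re

# Hypothetical anchored-name rule: redact a capitalized first/last pair only
# when a structural cue is present. Bare conversational names pass through.
ANCHORED_NAME = re.compile(
    r"\bPatient:?\s+[A-Z][a-z]+\s+[A-Z][a-z]+"    # "Patient Maria Lopez"
    r"|[A-Z][a-z]+\s+[A-Z][a-z]+(?=,\s*DOB\b)"    # "Maria Lopez, DOB ..."
)

def redact_names(text: str) -> str:
    return ANCHORED_NAME.sub("[NAME]", text)
```

The trade-off is visible in two calls: `redact_names("Patient Maria Lopez, DOB ...")` redacts, while `redact_names("this is Maria Lopez calling")` returns its input unchanged.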
2. Bearer tokens and API keys.
"Authorization: Bearer sk-proj-aB3xYz...." → 0 identifiers removed.
This is the one that actually scares me. Devs paste failing curl into ChatGPT all day. Adding key-shaped pattern detection is on the list for this week.
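For reference, the key-shaped detection I have in mind is roughly this (the prefix list is an assumption; extend it per provider your team actually uses):

```python
import re

# Key-shaped patterns: known vendor prefixes plus a generic Bearer catch-all.
KEY_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"),                 # OpenAI-style secret keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                      # AWS access key IDs
    re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/-]{16,}=*"),   # any long bearer token
]

def redact_keys(text: str):
    """Replace anything key-shaped with [SECRET]; return the redaction count."""
    found = 0
    for pat in KEY_PATTERNS:
        text, n = pat.subn("[SECRET]", text)
        found += n
    return text, found
```

Prefix-based rules are high-precision but vendor-specific; the generic Bearer pattern is the safety net for everything else that rides in an Authorization header.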
3. Credit cards in raw form.
"My credit card is 4532-1234-5678-9010" → 0 identifiers removed.
Luhn check + PAN regex. Same fix.
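A sketch of that fix, with one caveat worth flagging: the sample number above does not actually pass a Luhn check, so a strictly Luhn-gated detector would still skip it. The safer design is to redact anything PAN-shaped and use the Luhn result only to set audit severity. Minimal version (the candidate regex and severity policy are my assumptions):

```python
import re

# PAN-shaped: 13-19 digits, optionally separated by spaces or dashes.
PAN_SHAPED = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0

def redact_pans(text: str):
    """Redact anything PAN-shaped; Luhn only decides the severity tag."""
    audit = []
    def repl(m):
        digits = re.sub(r"\D", "", m.group())
        audit.append(("PAN", "CRITICAL" if luhn_ok(digits) else "HIGH"))
        return "[PAN]"
    return PAN_SHAPED.sub(repl, text), audit
```

The 13-digit floor keeps phone numbers out of the candidate set, so the false-positive cost of redact-first is low.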
What's solid
- HIPAA's 18 Safe Harbor identifiers — covered for the structured cases (DOB, MRN, NPI, SSN, phone, email, fax, IP, URL, vehicle, device, biometric IDs).
- Output is reversible if you keep the token map client-side. The vendor never sees the raw value; you re-substitute on response.
- Every call returns an audit trail you can show a compliance officer.
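The reversible-token flow in the second bullet can be sketched like this (the token format and map handling are my illustration, not the service's wire format):

```python
import re

def tokenize(text: str, patterns: dict):
    """Replace matches with numbered tokens; the raw-value map stays client-side."""
    token_map = {}
    def make_repl(label):
        def _repl(m):
            token = f"[{label}_{len(token_map) + 1}]"
            token_map[token] = m.group()
            return token
        return _repl
    for label, pat in patterns.items():
        text = pat.sub(make_repl(label), text)
    return text, token_map

def detokenize(text: str, token_map: dict) -> str:
    """Re-substitute raw values into the model's response, client-side."""
    for token, raw in token_map.items():
        text = text.replace(token, raw)
    return text
```

The vendor only ever sees `[EMAIL_1]`; the map that turns it back into a real address never leaves your machine.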
What I'm fixing this week
- Bearer token / API key detection (biggest dev-side risk).
- PAN with Luhn validation.
- Optional NER pass for unanchored names — opt-in because it costs latency and can over-redact.
If you're shipping AI into healthcare, finance, or anything EU-touching, do this audit yourself. POST your last week of prompts through something — mine, Presidio, anything — before your first regulator asks what your retention story is. Patent 64/000,905 covers the context-aware tokenization piece, but the bigger point is just: somebody on your team has already pasted a customer's PHI into a model. The question is whether you have a log of it.