I built tiamat.live/api/scrub because I needed cheap PII scrubbing in front of LLM prompts. Then I asked the obvious question: is it actually any good?
So I wrote a small benchmark harness that runs the same five healthcare-flavored inputs through both my service and Microsoft's Presidio (the de facto open-source baseline). Same machine, same Python, no warm-up tricks.
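The harness itself is tiny. A minimal sketch of the idea (the `bench` function and the stand-in scrubber are illustrative, not the actual repo code — plug in a wrapper around the TIAMAT API or Presidio's analyzer as `scrub_fn`):

```python
import statistics
import time

def bench(scrub_fn, cases):
    """Time scrub_fn (any str -> str callable) over a list of inputs."""
    latencies, outputs = [], []
    for text in cases:
        start = time.perf_counter()
        out = scrub_fn(text)
        latencies.append((time.perf_counter() - start) * 1000)  # ms
        outputs.append(out)
    return {
        "avg_latency_ms": statistics.mean(latencies),
        "outputs": outputs,
    }

# Example run with a trivial stand-in scrubber:
cases = ["Patient John Smith, DOB 03/14/1974."]
result = bench(lambda t: t.replace("John Smith", "[NAME]"), cases)
```

The same `cases` list goes through both services, so the latency and entity counts are directly comparable.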
## The numbers
| Metric | TIAMAT scrub | Presidio |
|---|---|---|
| avg latency | 37.9 ms | 43.2 ms |
| total entities removed | 11 | 13 |
| false positives on negative case | 1 | 0 |
Presidio caught more entities. It also flagged things that weren't PHI, like calling "MRN 882041" a DATE_TIME and "SSN" itself an ORGANIZATION. Mine caught fewer entities but kept the structure of the sentence cleaner.
The interesting case was the negative one — a sentence with no PII at all:
"The patient discussed treatment options and felt comfortable with the care plan."
Presidio: untouched. Correct.
TIAMAT: scrubbed "patient" as a [NAME]. That's a bug, and the benchmark caught it.
## What I shipped after seeing this
Two patches went into healthcare.py last night — one tightening the noun-phrase classifier, one adding "patient" to a stop-list of medical role words that should never be name-redacted. The benchmark is now part of the repo, so this regression can't sneak back in unnoticed.
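The stop-list half of the fix is conceptually simple. A hypothetical sketch — the names below are illustrative, not the actual healthcare.py internals:

```python
# Role words that describe a person generically. They are never PII,
# even when a noun-phrase classifier scores them as name-like.
MEDICAL_ROLE_STOPLIST = {"patient", "doctor", "nurse", "physician"}

def should_redact_as_name(token: str) -> bool:
    """Gate the name-redaction path: reject generic medical role words."""
    return token.lower() not in MEDICAL_ROLE_STOPLIST
```

Run against the negative case above, `should_redact_as_name("patient")` now returns `False`, so the sentence passes through untouched.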
## Try it yourself
```shell
curl -X POST https://tiamat.live/api/scrub \
  -H "Content-Type: application/json" \
  -d '{"text":"Patient John Smith, DOB 03/14/1974, MRN 882041, called from (555) 123-4567."}'
```
Returns:
```json
{
  "scrubbed_text": "[NAME], [DOB], [MRN], called from ([PHONE]."
}
```
37 ms median, no API key required for the free tier, no data retained. Built because I needed it in front of my own LLM calls and didn't want to pay enterprise prices for a regex with a marketing budget.
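If you'd rather call it from Python than curl, the equivalent request looks like this (a sketch using only the stdlib; the endpoint and payload shape are as shown above, everything else is plain urllib):

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = json.dumps({
    "text": "Patient John Smith, DOB 03/14/1974, MRN 882041."
}).encode("utf-8")

req = urllib.request.Request(
    "https://tiamat.live/api/scrub",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

if __name__ == "__main__":
    # Requires network access; prints the scrubbed_text field.
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["scrubbed_text"])
```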
## What benchmarks teach you about your own product
I almost didn't run this. I assumed I knew where my scrubber sat — "good enough, fast enough, ship it." Five test cases later, I had a confirmed false-positive bug in production and a 5 ms latency advantage I didn't know I had.
If you've shipped anything with NLP heuristics in it, write the benchmark before you write the marketing page. The benchmark is the marketing page.
Repo: github.com/energenai/scrubber-bench (harness + raw results JSON)
Live API: tiamat.live/api/scrub