Every developer building AI on sensitive data eventually discovers the same problem: you can't send raw PII to OpenAI or Claude, so you strip it first.
You replace John Smith with UUID-a4f2bc19. You send the scrubbed prompt. You get back a response.
The response references UUID-a4f2bc19 throughout.
Now what?
The Restoration Gap
Most PII scrubbing guides stop at step one: strip the data. Presidio, spaCy NER, regex — all solid tools for detection and removal. But the workflow that actually works in production requires a second step: restoring real values in the response.
Here's why opaque placeholders like UUIDs fail:
Prompt: "Summarize the risk profile of UUID-a4f2bc19's loan application.
Annual income: REDACTED-001. Credit score: MASKED-002."
Model output: "UUID-a4f2bc19 presents moderate risk due to REDACTED-001
income and MASKED-002 credit history..."
That response is useless. Your downstream system doesn't know who UUID-a4f2bc19 is without a lookup. The model also tends to treat opaque identifiers as data rather than placeholders, which degrades reasoning quality.
Semantic Placeholders Fix Reasoning Quality
The first fix is using semantic placeholders instead of opaque ones:
[NAME_1] instead of UUID-a4f2bc19
[SSN_1] instead of REDACTED-001
[SCORE_1] instead of MASKED-002
LLMs understand what [NAME_1] represents structurally — it's a person identifier. The model reasons correctly about the loan belonging to a person even without knowing who. Response quality improves significantly.
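A minimal sketch of how semantic placeholders can be assigned (the function and variable names here are illustrative, not from any particular library). One detail worth getting right: the same value should always map to the same placeholder, so the model sees one consistent identifier per entity:

```python
from collections import defaultdict

def assign_placeholder(value, entity_type, mapping, counters):
    """Return a semantic placeholder like [NAME_1] and record the mapping."""
    # Reuse the existing placeholder if this exact value was seen before,
    # so the model reasons about one entity, not two.
    for key, seen in mapping.items():
        if seen == value and key.startswith(entity_type):
            return f"[{key}]"
    counters[entity_type] += 1
    key = f"{entity_type}_{counters[entity_type]}"
    mapping[key] = value
    return f"[{key}]"

mapping, counters = {}, defaultdict(int)
assign_placeholder("John Smith", "NAME", mapping, counters)  # "[NAME_1]"
assign_placeholder("Jane Doe", "NAME", mapping, counters)    # "[NAME_2]"
assign_placeholder("John Smith", "NAME", mapping, counters)  # "[NAME_1]" again
```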
The Full Loop
The complete privacy-preserving workflow has three steps:
Step 1 — Scrub
original = "Patient Jane Doe, DOB 1985-03-12, MRN 4829301 reports chest pain"
scrubbed = "Patient [NAME_1], DOB [DATE_1], MRN [ID_1] reports chest pain"
mapping = {"NAME_1": "Jane Doe", "DATE_1": "1985-03-12", "ID_1": "4829301"}
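A sketch of the scrub step, using regex only. This covers structured identifiers (SSNs, phones, emails); free-text entities like names require an NER model (e.g. Presidio or spaCy) on top. The `scrub` name and pattern set are illustrative:

```python
import re

PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(text):
    """Replace each match with a semantic placeholder; return (text, mapping)."""
    mapping, counters = {}, {}
    for etype, pattern in PATTERNS.items():
        def repl(m):
            counters[etype] = counters.get(etype, 0) + 1
            key = f"{etype}_{counters[etype]}"
            mapping[key] = m.group(0)  # remember the real value for restoration
            return f"[{key}]"
        text = pattern.sub(repl, text)
    return text, mapping

scrubbed, mapping = scrub("Call 555-867-5309 re: SSN 123-45-6789")
# scrubbed == "Call [PHONE_1] re: SSN [SSN_1]"
```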
Step 2 — Proxy to LLM
# Model reasons on [NAME_1], [DATE_1], [ID_1] — never sees real data
llm_response = "[NAME_1] is a 40-year-old patient. Based on [DATE_1] DOB and \
MRN [ID_1], recommend immediate cardiac evaluation."
Step 3 — Restore
final = llm_response
for key, value in mapping.items():
    final = final.replace(f"[{key}]", value)
# Result: "Jane Doe is a 40-year-old patient. Based on 1985-03-12 DOB and
# MRN 4829301, recommend immediate cardiac evaluation."
The model never saw real PHI. The clinician gets a complete, usable response with real patient data.
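The replace loop above is fine for most cases, but a single regex pass is safer: each placeholder token is looked up exactly once, so a restored value that happens to contain bracketed text can never be re-substituted. A hedged sketch (the `restore` name is illustrative):

```python
import re

def restore(text, mapping):
    # One pass over the text; unknown keys are left intact rather than dropped.
    return re.sub(
        r"\[([A-Z]+_\d+)\]",
        lambda m: mapping.get(m.group(1), m.group(0)),
        text,
    )

mapping = {"NAME_1": "Jane Doe", "DATE_1": "1985-03-12", "ID_1": "4829301"}
restore("[NAME_1], DOB [DATE_1], MRN [ID_1]", mapping)
# 'Jane Doe, DOB 1985-03-12, MRN 4829301'
```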
One API Call vs DIY
Building this yourself means maintaining: NER models, regex fallbacks, placeholder mapping storage, provider SDKs for each LLM, restoration logic, and zero-log policy enforcement.
Or you can POST to /api/proxy:
curl -X POST https://tiamat.live/api/proxy \
-H 'Content-Type: application/json' \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Summarize risk for John Smith, SSN 123-45-6789, income $94,000"}],
"scrub": true
}'
The proxy:
- Scrubs: John Smith → [NAME_1], 123-45-6789 → [SSN_1], $94,000 → [INCOME_1]
- Forwards the scrubbed prompt to OpenAI, so your PII never reaches their servers
- Receives response
- Restores placeholders with real values
- Returns complete response to you
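Conceptually, that loop is just the three steps composed. A toy sketch of the data flow, with stand-in functions (`scrub`, `call_llm`, `restore` here are illustrative stubs, not the actual service internals):

```python
def privacy_proxy(prompt, scrub, call_llm, restore):
    """Scrub -> forward -> restore. call_llm is any provider call that
    takes and returns plain text; the model only ever sees placeholders."""
    scrubbed, mapping = scrub(prompt)
    raw = call_llm(scrubbed)
    return restore(raw, mapping)

# Toy stand-ins, just to show the flow end to end.
toy_scrub = lambda t: (t.replace("John Smith", "[NAME_1]"), {"NAME_1": "John Smith"})
toy_llm = lambda t: f"Risk summary for {t}"
toy_restore = lambda t, m: t.replace("[NAME_1]", m["NAME_1"])

privacy_proxy("John Smith, income $94,000", toy_scrub, toy_llm, toy_restore)
# 'Risk summary for John Smith, income $94,000'
```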
from tiamat_privacy import TiamatClient

client = TiamatClient()
response = client.proxy(
    provider="openai",
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze risk for John Smith, SSN 123-45-6789"}],
    scrub=True,
)
print(response["choices"][0]["message"]["content"])
# Full response with real names restored — model never saw them
What Gets Detected
The scrubber handles: names, SSNs, emails, phone numbers, credit cards, IP addresses, API keys/secrets, dates of birth, addresses.
Test the standalone scrubber free (no auth, 50/day):
curl -X POST https://tiamat.live/api/scrub \
-H 'Content-Type: application/json' \
-d '{"text": "Call John at 555-867-5309 or john@example.com re: SSN 123-45-6789"}'
{
  "scrubbed": "Call [NAME_1] at [PHONE_1] or [EMAIL_1] re: [SSN_1]",
  "entities": {
    "NAME_1": "John",
    "PHONE_1": "555-867-5309",
    "EMAIL_1": "john@example.com",
    "SSN_1": "123-45-6789"
  }
}
Pricing
- /api/scrub: $0.001 per request
- /api/proxy: provider cost + 20%
- Free tier: 50 scrub/day, 10 proxy/day, no API key needed
Docs: tiamat.live/docs
The PII problem in LLM pipelines isn't just about what you send — it's about what comes back. Build the full loop.