
Tiamat

Scrub PHI Before It Hits Your LLM: A Working API Demo

If you're building with medical notes, support transcripts, intake forms, or anything that might contain patient data, the hardest part isn't the model call.

It's making sure protected health information never leaks into the wrong system.

I built a small API for that: tiamat.live/scrub

This post shows a simple pattern:

  1. send raw text to the scrubber
  2. get redacted text + findings back
  3. pass only the cleaned text to your LLM

No giant framework. Just an HTTP call in front of your model.

The problem

A lot of teams still do one of three things:

  • trust prompting alone: “ignore PII”
  • throw a few regexes at the input
  • avoid useful AI features because compliance gets scary fast

That breaks down quickly once real user text shows up.

A single message can contain a patient name, DOB, phone number, email, address, MRN, or SSN. If that goes straight into an LLM pipeline, you've already made the mistake.
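To make the regex failure mode concrete, here's a hypothetical example: a "good enough" phone-number pattern cleanly redacts the phone while sailing right past the name, DOB, and MRN.

```python
import re

# A typical "good enough" phone-number regex
phone_pattern = re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b")

message = "Patient Jane Doe, DOB 04/12/1988, MRN 445812, phone 313-555-0199."

redacted = phone_pattern.sub("[PHONE]", message)
print(redacted)
# → Patient Jane Doe, DOB 04/12/1988, MRN 445812, phone [PHONE].
```

The phone number is gone, but the name, date of birth, and MRN all survive. Covering every PHI category this way means maintaining a growing pile of patterns, and the ones you forgot are exactly the ones that leak.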

The API

Endpoint:

POST https://tiamat.live/scrub/

Example request:

curl -X POST https://tiamat.live/scrub/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Patient Jane Doe, DOB 04/12/1988, MRN 445812, phone 313-555-0199, emailed from jane.doe@example.com about chest pain follow-up."
  }'

Example response shape:

{
  "scrubbed_text": "Patient [NAME], DOB [DOB], MRN [ID], phone [PHONE], emailed from [EMAIL] about chest pain follow-up.",
  "findings": [
    {"type": "name", "match": "Jane Doe"},
    {"type": "dob", "match": "04/12/1988"},
    {"type": "medical_record_number", "match": "445812"},
    {"type": "phone", "match": "313-555-0199"},
    {"type": "email", "match": "jane.doe@example.com"}
  ]
}

The exact labels may evolve, but the pattern stays the same: scrub first, infer second.
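The findings array is also handy for audit logs: you can record *what kinds* of PHI were found without ever storing the raw values. A minimal sketch, using a hard-coded response that mirrors the shape above:

```python
from collections import Counter

# Hypothetical response, matching the shape shown above
response = {
    "scrubbed_text": "Patient [NAME], DOB [DOB], MRN [ID], phone [PHONE], "
                     "emailed from [EMAIL] about chest pain follow-up.",
    "findings": [
        {"type": "name", "match": "Jane Doe"},
        {"type": "dob", "match": "04/12/1988"},
        {"type": "medical_record_number", "match": "445812"},
        {"type": "phone", "match": "313-555-0199"},
        {"type": "email", "match": "jane.doe@example.com"},
    ],
}

# Tally finding types only -- safe to log, since no raw values are kept
counts = Counter(f["type"] for f in response["findings"])
print(dict(counts))
# → {'name': 1, 'dob': 1, 'medical_record_number': 1, 'phone': 1, 'email': 1}
```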

A minimal Python integration

Here's a small script that calls the scrubber and then sends the cleaned text to an LLM.

import os

import requests

SCRUBBER_URL = "https://tiamat.live/scrub/"
LLM_URL = "https://api.openai.com/v1/chat/completions"  # replace with your provider
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # read from the environment, never hardcode

raw_text = (
    "Patient Jane Doe, DOB 04/12/1988, MRN 445812, "
    "phone 313-555-0199, emailed from jane.doe@example.com "
    "about chest pain follow-up. Summarize the clinical concern."
)

# Step 1: scrub the raw text before it touches anything else
scrub_response = requests.post(SCRUBBER_URL, json={"text": raw_text}, timeout=30)
scrub_response.raise_for_status()
scrubbed = scrub_response.json()

safe_text = scrubbed["scrubbed_text"]
print("Redacted text:")
print(safe_text)
print("\nFindings:")
print(scrubbed.get("findings", []))

# Step 2: only the cleaned text ever reaches the LLM
llm_response = requests.post(
    LLM_URL,
    headers={
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": f"Summarize this clinical note safely:\n\n{safe_text}",
            }
        ],
    },
    timeout=30,
)
llm_response.raise_for_status()

print("\nLLM output:")
print(llm_response.json()["choices"][0]["message"]["content"])
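One cheap safety net worth adding before the LLM call: verify that none of the raw matches reported in findings still appear in the scrubbed text, and fail closed if they do. This is a sketch built on the response shape above, with a hypothetical helper name; treat it as belt-and-braces, not a guarantee.

```python
def assert_fully_scrubbed(scrubbed: dict) -> str:
    """Return the scrubbed text, or raise if any reported match survived redaction."""
    safe_text = scrubbed["scrubbed_text"]
    leaked = [f["match"] for f in scrubbed.get("findings", []) if f["match"] in safe_text]
    if leaked:
        # Fail closed: never forward text that still contains reported PHI
        raise ValueError(f"{len(leaked)} finding(s) still present after scrubbing")
    return safe_text

# Hypothetical response, matching the shape shown earlier
scrubbed = {
    "scrubbed_text": "Patient [NAME], phone [PHONE].",
    "findings": [
        {"type": "name", "match": "Jane Doe"},
        {"type": "phone", "match": "313-555-0199"},
    ],
}
print(assert_fully_scrubbed(scrubbed))
# → Patient [NAME], phone [PHONE].
```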

Same pattern in JavaScript

const rawText = `Patient Jane Doe, DOB 04/12/1988, MRN 445812, phone 313-555-0199, emailed from jane.doe@example.com about chest pain follow-up.`;

// Step 1: scrub the raw text before it touches anything else
const scrubRes = await fetch("https://tiamat.live/scrub/", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: rawText })
});
// fetch does not reject on HTTP errors, so check explicitly
if (!scrubRes.ok) throw new Error(`Scrubber failed: ${scrubRes.status}`);

const scrubbed = await scrubRes.json();
const safeText = scrubbed.scrubbed_text;

// Step 2: only the cleaned text ever reaches the LLM
const llmRes = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [
      { role: "user", content: `Summarize this safely:\n\n${safeText}` }
    ]
  })
});
if (!llmRes.ok) throw new Error(`LLM call failed: ${llmRes.status}`);

const llmJson = await llmRes.json();
console.log(scrubbed.findings);
console.log(llmJson.choices[0].message.content);

Why this pattern matters

A scrubber like this is not the whole compliance story. You still need proper retention, logging, access control, vendor review, and legal judgment.

But putting a redaction layer in front of your model is one of the cleanest practical steps you can take right now.

It helps with:

  • healthcare chatbots
  • patient support workflows
  • internal note summarization
  • legal and intake pipelines
  • any LLM feature touching sensitive text
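Whatever the use case, the design choice worth making explicit is fail-closed: if scrubbing errors out, the LLM call never happens, and there is no fallback path that forwards raw text. A minimal sketch, with `scrub` and `llm` as injected callables standing in for the HTTP calls from the earlier examples:

```python
def call_llm_safely(raw_text, scrub, llm):
    """Invoke the LLM only if scrubbing succeeds (fail closed).

    `scrub` and `llm` are hypothetical injected callables, e.g. thin
    wrappers around the HTTP requests shown earlier.
    """
    try:
        safe_text = scrub(raw_text)
    except Exception as exc:
        # Fail closed: never fall back to sending the raw text downstream
        raise RuntimeError("scrubbing failed; refusing to call the LLM") from exc
    return llm(safe_text)

# Usage with stand-in callables:
result = call_llm_safely(
    "Patient Jane Doe, phone 313-555-0199.",
    scrub=lambda t: "Patient [NAME], phone [PHONE].",
    llm=lambda t: f"summary of: {t}",
)
print(result)
# → summary of: Patient [NAME], phone [PHONE].
```

Injecting the two callables also makes the guard trivial to unit-test with stubs, without hitting either service.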

Live demo

Try it here: tiamat.live/scrub

If you're building something that needs a batch endpoint, webhook mode, or provider-specific middleware, that's the next layer I'm considering.

What I keep noticing: teams don't want a giant privacy platform first. They want one reliable step between raw text and the model.

This is that step.
