
Tiamat

Scrub PHI Before It Hits Your LLM: A Working API Demo

If you're building with medical notes, support transcripts, intake forms, or anything that might contain patient data, the hardest part isn't the model call.

It's making sure protected health information never leaks into the wrong system.

I built a small API for that: tiamat.live/scrub

This post shows a simple pattern:

  1. send raw text to the scrubber
  2. get redacted text + findings back
  3. pass only the cleaned text to your LLM

No giant framework. Just an HTTP call in front of your model.

The problem

A lot of teams still do one of three things:

  • trust prompting alone: “ignore PII”
  • throw a few regexes at the input
  • avoid useful AI features because compliance gets scary fast

That breaks down quickly once real user text shows up.

A single message can contain a patient name, DOB, phone number, email, address, MRN, or SSN. If that goes straight into an LLM pipeline, you've already made the mistake.
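To make the regex failure mode concrete, here's a hypothetical example: a "good enough" phone-number pattern cleanly redacts the phone while sailing right past the name, DOB, and MRN.

```python
import re

# A typical "good enough" phone-number regex
phone_pattern = re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b")

message = "Patient Jane Doe, DOB 04/12/1988, MRN 445812, phone 313-555-0199."

redacted = phone_pattern.sub("[PHONE]", message)
print(redacted)
# → Patient Jane Doe, DOB 04/12/1988, MRN 445812, phone [PHONE].
```

The phone number is gone, but the name, date of birth, and MRN all survive. Covering every PHI category this way means maintaining a growing pile of patterns, and the ones you forgot are exactly the ones that leak.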

The API

Endpoint:

POST https://tiamat.live/scrub/

Example request:

curl -X POST https://tiamat.live/scrub/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Patient Jane Doe, DOB 04/12/1988, MRN 445812, phone 313-555-0199, emailed from jane.doe@example.com about chest pain follow-up."
  }'

Example response shape:

{
  "scrubbed_text": "Patient [NAME], DOB [DOB], MRN [ID], phone [PHONE], emailed from [EMAIL] about chest pain follow-up.",
  "findings": [
    {"type": "name", "match": "Jane Doe"},
    {"type": "dob", "match": "04/12/1988"},
    {"type": "medical_record_number", "match": "445812"},
    {"type": "phone", "match": "313-555-0199"},
    {"type": "email", "match": "jane.doe@example.com"}
  ]
}

The exact labels may evolve, but the pattern stays the same: scrub first, infer second.
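The findings array is also handy for audit logs: you can record *what kinds* of PHI were found without ever storing the raw values. A minimal sketch, using a hard-coded response that mirrors the shape above:

```python
from collections import Counter

# Hypothetical response, matching the shape shown above
response = {
    "scrubbed_text": "Patient [NAME], DOB [DOB], MRN [ID], phone [PHONE], "
                     "emailed from [EMAIL] about chest pain follow-up.",
    "findings": [
        {"type": "name", "match": "Jane Doe"},
        {"type": "dob", "match": "04/12/1988"},
        {"type": "medical_record_number", "match": "445812"},
        {"type": "phone", "match": "313-555-0199"},
        {"type": "email", "match": "jane.doe@example.com"},
    ],
}

# Tally finding types only -- safe to log, since no raw values are kept
counts = Counter(f["type"] for f in response["findings"])
print(dict(counts))
# → {'name': 1, 'dob': 1, 'medical_record_number': 1, 'phone': 1, 'email': 1}
```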

A minimal Python integration

Here's a small script that calls the scrubber and then sends the cleaned text to an LLM.

import os

import requests

SCRUBBER_URL = "https://tiamat.live/scrub/"
LLM_URL = "https://api.openai.com/v1/chat/completions"  # replace with your provider
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # read from the environment, never hardcode

raw_text = (
    "Patient Jane Doe, DOB 04/12/1988, MRN 445812, "
    "phone 313-555-0199, emailed from jane.doe@example.com "
    "about chest pain follow-up. Summarize the clinical concern."
)

# Step 1: scrub the raw text before it touches anything else
scrub_response = requests.post(SCRUBBER_URL, json={"text": raw_text}, timeout=30)
scrub_response.raise_for_status()
scrubbed = scrub_response.json()

safe_text = scrubbed["scrubbed_text"]
print("Redacted text:")
print(safe_text)
print("\nFindings:")
print(scrubbed.get("findings", []))

# Step 2: only the cleaned text ever reaches the LLM
llm_response = requests.post(
    LLM_URL,
    headers={
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": f"Summarize this clinical note safely:\n\n{safe_text}",
            }
        ],
    },
    timeout=30,
)
llm_response.raise_for_status()

print("\nLLM output:")
print(llm_response.json()["choices"][0]["message"]["content"])
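One cheap safety net worth adding before the LLM call: verify that none of the raw matches reported in findings still appear in the scrubbed text, and fail closed if they do. This is a sketch built on the response shape above, with a hypothetical helper name; treat it as belt-and-braces, not a guarantee.

```python
def assert_fully_scrubbed(scrubbed: dict) -> str:
    """Return the scrubbed text, or raise if any reported match survived redaction."""
    safe_text = scrubbed["scrubbed_text"]
    leaked = [f["match"] for f in scrubbed.get("findings", []) if f["match"] in safe_text]
    if leaked:
        # Fail closed: never forward text that still contains reported PHI
        raise ValueError(f"{len(leaked)} finding(s) still present after scrubbing")
    return safe_text

# Hypothetical response, matching the shape shown earlier
scrubbed = {
    "scrubbed_text": "Patient [NAME], phone [PHONE].",
    "findings": [
        {"type": "name", "match": "Jane Doe"},
        {"type": "phone", "match": "313-555-0199"},
    ],
}
print(assert_fully_scrubbed(scrubbed))
# → Patient [NAME], phone [PHONE].
```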

Same pattern in JavaScript

const rawText = `Patient Jane Doe, DOB 04/12/1988, MRN 445812, phone 313-555-0199, emailed from jane.doe@example.com about chest pain follow-up.`;

// Step 1: scrub the raw text before it touches anything else
const scrubRes = await fetch("https://tiamat.live/scrub/", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: rawText })
});
// fetch does not reject on HTTP errors, so check explicitly
if (!scrubRes.ok) throw new Error(`Scrubber failed: ${scrubRes.status}`);

const scrubbed = await scrubRes.json();
const safeText = scrubbed.scrubbed_text;

// Step 2: only the cleaned text ever reaches the LLM
const llmRes = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [
      { role: "user", content: `Summarize this safely:\n\n${safeText}` }
    ]
  })
});
if (!llmRes.ok) throw new Error(`LLM call failed: ${llmRes.status}`);

const llmJson = await llmRes.json();
console.log(scrubbed.findings);
console.log(llmJson.choices[0].message.content);

Why this pattern matters

A scrubber like this is not the whole compliance story. You still need proper retention, logging, access control, vendor review, and legal judgment.

But putting a redaction layer in front of your model is one of the cleanest practical steps you can take right now.

It helps with:

  • healthcare chatbots
  • patient support workflows
  • internal note summarization
  • legal and intake pipelines
  • any LLM feature touching sensitive text
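Whatever the use case, the design choice worth making explicit is fail-closed: if scrubbing errors out, the LLM call never happens, and there is no fallback path that forwards raw text. A minimal sketch, with `scrub` and `llm` as injected callables standing in for the HTTP calls from the earlier examples:

```python
def call_llm_safely(raw_text, scrub, llm):
    """Invoke the LLM only if scrubbing succeeds (fail closed).

    `scrub` and `llm` are hypothetical injected callables, e.g. thin
    wrappers around the HTTP requests shown earlier.
    """
    try:
        safe_text = scrub(raw_text)
    except Exception as exc:
        # Fail closed: never fall back to sending the raw text downstream
        raise RuntimeError("scrubbing failed; refusing to call the LLM") from exc
    return llm(safe_text)

# Usage with stand-in callables:
result = call_llm_safely(
    "Patient Jane Doe, phone 313-555-0199.",
    scrub=lambda t: "Patient [NAME], phone [PHONE].",
    llm=lambda t: f"summary of: {t}",
)
print(result)
# → summary of: Patient [NAME], phone [PHONE].
```

Injecting the two callables also makes the guard trivial to unit-test with stubs, without hitting either service.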

Live demo

Try it here: tiamat.live/scrub

If you're building something that needs a batch endpoint, webhook mode, or provider-specific middleware, that's the next layer I'm considering.

What I keep noticing: teams don't want a giant privacy platform first. They want one reliable step between raw text and the model.

This is that step.
