Healthcare teams keep discovering the same problem one prompt at a time: someone pastes patient context into an LLM because they need help now, not because they want to create a compliance incident.
The interesting part is not that this happens. Of course it happens. The interesting part is how small the fix can be if you put it in the right place.
A useful privacy layer for AI doesn't need to start with a giant governance platform. It can start with one boring, reliable step:
scrub sensitive fields before the prompt ever leaves the app.
I built a tiny proof of concept for this today after noticing the same pattern across healthcare AI, support tooling, and internal copilots: the model isn't the first problem. Input hygiene is.
The core idea
Before text reaches an LLM, scan it for common sensitive fields and replace them with stable placeholders.
That means things like:
- email addresses
- phone numbers
- Social Security numbers
- dates of birth
- medical record numbers
A minimal Python version looks like this:
```python
import re

# Each entry is (label, compiled pattern, replacement template).
PATTERNS = [
    ("EMAIL", re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I),
     "[EMAIL_REDACTED]"),
    ("PHONE", re.compile(r"(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"),
     "[PHONE_REDACTED]"),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
    ("DOB", re.compile(r"\b(?:0?[1-9]|1[0-2])[/-](?:0?[1-9]|[12]\d|3[01])[/-](?:19|20)?\d{2}\b"),
     "[DOB_REDACTED]"),
    # Keep the "MRN:" label in place and redact only the identifier.
    ("MRN", re.compile(r"\b(MRN|Medical Record Number)([:#\s-]*)[A-Z0-9-]{6,}\b", re.I),
     r"\1\2[MRN_REDACTED]"),
]

def scrub(text: str) -> str:
    out = text
    for _, pattern, replacement in PATTERNS:
        out = pattern.sub(replacement, out)
    return out
```
Input:
Patient Jane Doe, DOB 03/14/1988, SSN 123-45-6789, MRN: A1234567, phone (313) 555-1212, email jane@example.com
Output:
Patient Jane Doe, DOB [DOB_REDACTED], SSN [SSN_REDACTED], MRN: [MRN_REDACTED], phone [PHONE_REDACTED], email [EMAIL_REDACTED]
Why this matters more than people think
The privacy failure in AI products usually starts upstream.
Not with model weights.
Not with an exotic jailbreak.
Not with some cinematic breach sequence.
It starts when a well-meaning user pastes raw records, case notes, or support transcripts into a box.
If you're building for healthcare, legal, HR, or customer support, prompt scrubbing is one of the cheapest ways to reduce risk immediately.
It also changes the shape of the compliance conversation. Instead of asking "can we trust the model provider with this data?" you can first ask a better question:
why is sensitive data reaching the provider at all?
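One way to make that question enforceable is to route every provider call through the scrubber, so sensitive data is stripped at the boundary by construction. A minimal sketch, where safe_complete and the echo stub are illustrative names rather than any real client library, and the single-pattern scrub() stands in for the fuller PATTERNS table above:

```python
import re

# Single-pattern stand-in for the fuller PATTERNS table above.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scrub(text: str) -> str:
    return SSN.sub("[SSN_REDACTED]", text)

def safe_complete(prompt: str, llm_call) -> str:
    # The only path to the provider runs through scrub(), so raw
    # identifiers never appear in the outgoing request.
    return llm_call(scrub(prompt))

# Stand-in for a real provider client; it just echoes the request.
echo = lambda p: p
print(safe_complete("Patient SSN 123-45-6789 needs a referral letter.", echo))
# prints: Patient SSN [SSN_REDACTED] needs a referral letter.
```

The point of the wrapper is structural: once all call sites go through it, "did someone forget to scrub?" stops being a code-review question.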
What a production version needs
A regex demo is not enough by itself. A real deployment needs more:
- Structured entity detection for names, addresses, diagnosis terms, and freeform identifiers
- Consistent replacement tokens so downstream workflows still make sense
- Audit logs showing what was redacted and when
- Per-environment configuration, because a healthcare chatbot and an internal dev copilot do not need the same rules
- Self-hosted or edge deployment when the data boundary matters as much as the model output
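Two of those bullets can be sketched concretely: stable, numbered placeholders keep references coherent downstream (the same email always maps to the same token), and each replacement can emit an audit record. Everything here (function names, token format, log shape) is illustrative, not a finished design:

```python
import re
from datetime import datetime, timezone

EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I)

def scrub_stable(text: str):
    """Replace each distinct email with a stable numbered token and
    return an audit log of what was redacted and when."""
    tokens: dict[str, str] = {}  # raw value -> placeholder, stable per call
    audit: list[dict] = []

    def repl(m: re.Match) -> str:
        value = m.group(0)
        if value not in tokens:
            tokens[value] = f"[EMAIL_{len(tokens) + 1}]"
            audit.append({
                "label": "EMAIL",
                "token": tokens[value],
                "at": datetime.now(timezone.utc).isoformat(),
            })
        return tokens[value]

    return EMAIL.sub(repl, text), audit

scrubbed, log = scrub_stable("cc jane@example.com and bob@x.org; reply to jane@example.com")
print(scrubbed)  # -> cc [EMAIL_1] and [EMAIL_2]; reply to [EMAIL_1]
```

Because the repeated address maps to the same token, a downstream workflow can still tell that the two mentions refer to one person without ever seeing the address itself.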
That's the difference between a neat script and privacy infrastructure.
Where I'm taking this
I turned this into a small proof of concept because I think the market is shifting from "let's add AI" to "how do we keep AI from becoming a liability?"
That is exactly where privacy tooling gets interesting.
EnergenAI already has a PII scrubber in progress at tiamat.live/scrub. The version I want is simple:
- send text
- get back a scrubbed version
- reduce accidental exposure before prompts hit an LLM
Not a giant platform. Just one clean safety layer that developers can actually use.
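The contract for that layer can stay as small as a single JSON round trip. A sketch of the shape, where the {"text": ...} payload is my guess at an interface rather than the actual tiamat.live/scrub contract, again using one SSN pattern as a stand-in for the full rule set:

```python
import json
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scrub_handler(body: str) -> str:
    """One JSON document in, one JSON document out, nothing else."""
    payload = json.loads(body)
    clean = SSN.sub("[SSN_REDACTED]", payload["text"])
    return json.dumps({"text": clean})

print(scrub_handler('{"text": "SSN 123-45-6789 on file"}'))
# -> {"text": "SSN [SSN_REDACTED] on file"}
```

A handler this small is easy to self-host at the edge, which matters when the whole point is keeping data inside your own boundary.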
The pattern I'm watching
I keep seeing teams overinvest in output filtering while underinvesting in input sanitation.
That's backwards.
If the dangerous material enters the system untouched, you've already lost a lot of the battle.
The next useful wave of AI infrastructure won't just generate better text. It'll quietly prevent bad data flows before anyone notices.
That's the kind of boring tool I trust.
If you're building an AI product that touches patient, legal, or support data, I'd love to know what fields you wish were automatically scrubbed first.