Most teams building LLM applications think about prompt injection. Far fewer think about what happens when their users send sensitive personal data to their model.
It's happening right now. Users paste credit card numbers into chatbots to ask billing questions. They share SSNs in healthcare chat interfaces. They drop email addresses and phone numbers into support bots without a second thought. That data hits your LLM, gets logged, potentially ends up in fine-tuning datasets, and almost certainly violates whatever compliance framework your enterprise customers are bound by.
PII filtering at the application layer is the fix — and it's simpler to implement than most teams expect.
## The Problem With Naive Regex
The obvious approach is regex. Match a credit card pattern, block it. Simple enough — until you realize that naive regex produces so many false positives it becomes useless in production.
A 16-digit number like 1234567890123456 matches every credit card regex pattern. But it's not a valid credit card. Any real Visa, Mastercard, or Amex number satisfies the Luhn algorithm — a checksum that eliminates the vast majority of random digit sequences.
```python
def luhn_valid(number: str) -> bool:
    digits = [int(d) for d in number if d.isdigit()]
    digits.reverse()
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```
Same story with SSNs. The pattern `\d{3}-\d{2}-\d{4}` matches millions of strings that aren't valid Social Security Numbers. A real validator also needs to reject:

- `000-XX-XXXX` — area 000 was never issued
- `666-XX-XXXX` — area 666 was never issued
- `9XX-XX-XXXX` — areas 900–999 are reserved
- `XXX-00-XXXX` — group 00 was never issued
- `XXX-XX-0000` — serial 0000 was never issued
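The segment rules above can be sketched as a small validator. This is a minimal illustration, and `ssn_valid` is a hypothetical helper name, not a library API:

```python
import re

SSN_PATTERN = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")

def ssn_valid(candidate: str) -> bool:
    """Pattern match plus segment checks: areas 000, 666, and 900-999
    were never issued; group 00 and serial 0000 are invalid."""
    m = SSN_PATTERN.match(candidate)
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area >= "900":  # string compare works for 3 digits
        return False
    if group == "00" or serial == "0000":
        return False
    return True
```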
Without these checks, your filter will flag order numbers, invoice IDs, and timestamps that happen to match the pattern. That's the kind of false positive rate that gets a feature turned off within a week.
## Flag Before You Redact
Here's a mistake teams make when rolling out PII filtering: they go straight to redaction, then spend weeks chasing false positives in production with no visibility into what got redacted or why.
A better approach is to start in flag mode. Detect hits and log them, but let content pass through unchanged. A week or two of real traffic gives you the data to validate accuracy before you commit to actually modifying content.
```python
# Flag mode -- detect and log, content unchanged
import requests

result = requests.post(
    "https://your-sentinel-endpoint/v1/scrub",
    headers={"X-Sentinel-Key": "sk_live_your_key"},
    json={"content": user_message, "tier": "standard"},
).json()

# pii_hits: number of PII matches found
# pii_types: categories detected (CREDIT_CARD, SSN, EMAIL, PHONE)
print(result["security"]["pii_hits"])   # e.g. 2
print(result["security"]["pii_types"])  # e.g. ["EMAIL", "PHONE"]

# safe_payload is unchanged in flag mode -- content passed through
```
Once you're confident the detection is accurate, switch to redact mode. PII gets replaced with typed placeholders before content ever reaches your LLM:
```python
# Redact mode -- PII replaced with placeholders
# Input:  "My card is 4532015112830366 and email is john@example.com"
# Output: "My card is [CREDIT_CARD] and email is [EMAIL]"
```
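A minimal sketch of what that transformation involves, assuming a regex pass with Luhn verification on digit runs. The patterns and helper names here are illustrative, not Sentinel's actual implementation:

```python
import re

CARD_RE = re.compile(r"\b\d{13,19}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def luhn_valid(number: str) -> bool:
    digits = [int(d) for d in number][::-1]
    return sum(d if i % 2 == 0 else (d * 2 - 9 if d > 4 else d * 2)
               for i, d in enumerate(digits)) % 10 == 0

def redact(text: str) -> str:
    # Only replace digit runs that pass the Luhn check -- this is what
    # keeps order numbers and invoice IDs from being redacted.
    text = CARD_RE.sub(
        lambda m: "[CREDIT_CARD]" if luhn_valid(m.group()) else m.group(),
        text,
    )
    return EMAIL_RE.sub("[EMAIL]", text)
```

Note that the non-Luhn-valid run in `"order 1234567890123456"` passes through untouched, while a real card number is replaced.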
The redacted text then flows through the rest of the security pipeline — injection detection, semantic similarity, everything — with the sensitive values already stripped.
## The Compliance Angle
For most startups this feels like a nice-to-have. For enterprise customers in regulated industries, it's a hard requirement.
- **PCI-DSS** — any system that processes, stores, or transmits cardholder data falls in scope. If your LLM reads credit card numbers, you're in scope. Redacting before the model sees them is one of the cleanest ways to limit that scope.
- **HIPAA** — patient data, even in free-text form, is PHI. An LLM processing support tickets in a healthcare context needs PII controls.
- **SOC 2** — auditors will ask what controls you have over sensitive data flowing through your AI stack. "We filter it before the model sees it" is a much better answer than "we rely on the model not to log it."
This is increasingly the difference between landing enterprise deals and losing them on a compliance questionnaire.
## Phase Coverage
Phase 1 of a solid PII filter covers the high-value patterns:
| Type | Pattern | Validation |
|---|---|---|
| Credit cards | 13–19 digit sequences | Luhn algorithm |
| SSNs | `\d{3}-\d{2}-\d{4}` | Segment validity checks |
| Email addresses | Standard RFC pattern | — |
| US phone numbers | E.164 + common formats | — |
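The Phase 1 coverage could be represented as a simple pattern registry. The regexes and names below are illustrative, not the product's exact definitions, and the credit card and SSN entries would still need their semantic validators applied after a pattern hit:

```python
import re

# Phase 1 pattern registry -- illustrative regexes only.
PII_PATTERNS = {
    "CREDIT_CARD": re.compile(r"\b\d{13,19}\b"),   # then Luhn-checked
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # then segment-checked
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def detect_types(text: str) -> list[str]:
    """Return the PII categories whose pattern matches somewhere in text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

A per-tenant Phase 2 extension would amount to merging customer-supplied patterns into this registry at request time.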
Phase 2 expands to IBANs (critical for European fintech), passport numbers, and custom regex patterns per tenant — so enterprise customers can bring their own PII definitions.
## Putting It Together
The full flow looks like this:
```
User message
  → PII pre-pass (flag or redact)
  → HTML injection detection
  → Fast-path regex (prompt injection patterns)
  → Deep-path vector similarity
  → LLM
```
PII filtering runs first, before any other processing. In redact mode, the sanitized text — with [CREDIT_CARD] and [EMAIL] in place of real values — flows through the rest of the pipeline. The injection detection never sees the raw PII. Neither does your LLM.
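That ordering can be expressed as a simple stage list. This is a sketch only: the check functions below are stand-ins for whatever injection detection you already run, and `pii_pre_pass` is a hardcoded stand-in for the redact step described earlier:

```python
from typing import Callable

def pii_pre_pass(text: str) -> str:
    # Stand-in for redact mode: real values replaced with placeholders.
    return text.replace("john@example.com", "[EMAIL]")

def html_injection_check(text: str) -> str:
    return text  # would raise or flag on suspicious markup

def fast_path_regex(text: str) -> str:
    return text  # would match known prompt-injection patterns

def deep_path_similarity(text: str) -> str:
    return text  # would compare against an attack-embedding corpus

# PII filtering always sits first, so later stages never see raw values.
PIPELINE: list[Callable[[str], str]] = [
    pii_pre_pass,
    html_injection_check,
    fast_path_regex,
    deep_path_similarity,
]

def prepare_for_llm(message: str) -> str:
    for stage in PIPELINE:
        message = stage(message)
    return message
```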
PII filtering is built into Sentinel as a pre-pass in the scrub pipeline, available on Teams and Enterprise plans. The flag → redact rollout approach, Luhn validation, and SSN segment checks are all live today.