Most teams building LLM applications think about prompt injection. Far fewer think about what happens when their users send sensitive personal data to their model.
It's happening right now. Users paste credit card numbers into chatbots to ask billing questions. They share SSNs in healthcare chat interfaces. They drop email addresses and phone numbers into support bots without a second thought. That data hits your LLM, gets logged, potentially ends up in fine-tuning datasets, and almost certainly violates whatever compliance framework your enterprise customers are bound by.
PII filtering at the application layer is the fix — and it's simpler to implement than most teams expect.
## The Problem With Naive Regex
The obvious approach is regex. Match a credit card pattern, block it. Simple enough — until you realize that naive regex produces so many false positives it becomes useless in production.
A 16-digit number like 1234567890123456 matches every credit card regex pattern. But it's not a valid credit card. Any real Visa, Mastercard, or Amex number satisfies the Luhn algorithm — a checksum that eliminates the vast majority of random digit sequences.
```python
def luhn_valid(number: str) -> bool:
    digits = [int(d) for d in number if d.isdigit()]
    digits.reverse()
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```
Same story with SSNs. The pattern `\d{3}-\d{2}-\d{4}` matches millions of strings that aren't valid Social Security Numbers. A real validator also needs to reject:

- `000-XX-XXXX` — area 000 was never issued
- `666-XX-XXXX` — area 666 was never issued
- `9XX-XX-XXXX` — areas 900–999 are reserved
- `XXX-00-XXXX` — group 00 was never issued
- `XXX-XX-0000` — serial 0000 was never issued
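The segment rules above can be sketched as a small validator. This is a minimal illustration, and `ssn_valid` is a hypothetical helper name, not a library API:

```python
import re

SSN_PATTERN = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")

def ssn_valid(candidate: str) -> bool:
    """Pattern match plus segment checks: areas 000, 666, and 900-999
    were never issued; group 00 and serial 0000 are invalid."""
    m = SSN_PATTERN.match(candidate)
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area >= "900":  # string compare works for 3 digits
        return False
    if group == "00" or serial == "0000":
        return False
    return True
```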
Without these checks, your filter will flag order numbers, invoice IDs, and timestamps that happen to match the pattern. That's the kind of false positive rate that gets a feature turned off within a week.
## Flag Before You Redact
Here's a mistake teams make when rolling out PII filtering: they go straight to redaction, then spend weeks chasing false positives in production with no visibility into what got redacted or why.
A better approach is to start in flag mode. Detect hits and log them, but let content pass through unchanged. A week or two of real traffic gives you the data to validate accuracy before you commit to actually modifying content.
```python
# Flag mode -- detect and log, content unchanged
import requests

result = requests.post(
    "https://your-sentinel-endpoint/v1/scrub",
    headers={"X-Sentinel-Key": "sk_live_your_key"},
    json={"content": user_message, "tier": "standard"},
).json()

# pii_hits: number of PII matches found
# pii_types: categories detected (CREDIT_CARD, SSN, EMAIL, PHONE)
print(result["security"]["pii_hits"])   # e.g. 2
print(result["security"]["pii_types"])  # e.g. ["EMAIL", "PHONE"]

# safe_payload is unchanged in flag mode -- content passed through
```
Once you're confident the detection is accurate, switch to redact mode. PII gets replaced with typed placeholders before content ever reaches your LLM:
```python
# Redact mode -- PII replaced with placeholders
# Input:  "My card is 4532015112830366 and email is john@example.com"
# Output: "My card is [CREDIT_CARD] and email is [EMAIL]"
```
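A minimal sketch of what that transformation involves, assuming a regex pass with Luhn verification on digit runs. The patterns and helper names here are illustrative, not Sentinel's actual implementation:

```python
import re

CARD_RE = re.compile(r"\b\d{13,19}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def luhn_valid(number: str) -> bool:
    digits = [int(d) for d in number][::-1]
    return sum(d if i % 2 == 0 else (d * 2 - 9 if d > 4 else d * 2)
               for i, d in enumerate(digits)) % 10 == 0

def redact(text: str) -> str:
    # Only replace digit runs that pass the Luhn check -- this is what
    # keeps order numbers and invoice IDs from being redacted.
    text = CARD_RE.sub(
        lambda m: "[CREDIT_CARD]" if luhn_valid(m.group()) else m.group(),
        text,
    )
    return EMAIL_RE.sub("[EMAIL]", text)
```

Note that the non-Luhn-valid run in `"order 1234567890123456"` passes through untouched, while a real card number is replaced.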
The redacted text then flows through the rest of the security pipeline — injection detection, semantic similarity, everything — with the sensitive values already stripped.
## The Compliance Angle
For most startups this feels like a nice-to-have. For enterprise customers in regulated industries, it's a hard requirement.
- **PCI-DSS** — any system that processes, stores, or transmits cardholder data falls in scope. If your LLM reads credit card numbers, you're in scope. Redacting before the model sees them is one of the cleanest ways to limit that scope.
- **HIPAA** — patient data, even in free-text form, is PHI. An LLM processing support tickets in a healthcare context needs PII controls.
- **SOC 2** — auditors will ask what controls you have over sensitive data flowing through your AI stack. "We filter it before the model sees it" is a much better answer than "we rely on the model not to log it."
This is increasingly the difference between landing enterprise deals and losing them on a compliance questionnaire.
## Phase Coverage
Phase 1 of a solid PII filter covers the high-value patterns:
| Type | Pattern | Validation |
|---|---|---|
| Credit cards | 13–19 digit sequences | Luhn algorithm |
| SSNs | `\d{3}-\d{2}-\d{4}` | Segment validity checks |
| Email addresses | Standard RFC pattern | — |
| US phone numbers | E.164 + common formats | — |
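The Phase 1 coverage could be represented as a simple pattern registry. The regexes and names below are illustrative, not the product's exact definitions, and the credit card and SSN entries would still need their semantic validators applied after a pattern hit:

```python
import re

# Phase 1 pattern registry -- illustrative regexes only.
PII_PATTERNS = {
    "CREDIT_CARD": re.compile(r"\b\d{13,19}\b"),   # then Luhn-checked
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # then segment-checked
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def detect_types(text: str) -> list[str]:
    """Return the PII categories whose pattern matches somewhere in text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

A per-tenant Phase 2 extension would amount to merging customer-supplied patterns into this registry at request time.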
Phase 2 expands to IBANs (critical for European fintech), passport numbers, and custom regex patterns per tenant — so enterprise customers can bring their own PII definitions.
## Putting It Together
The full flow looks like this:
```
User message
  → PII pre-pass (flag or redact)
  → HTML injection detection
  → Fast-path regex (prompt injection patterns)
  → Deep-path vector similarity
  → LLM
```
PII filtering runs first, before any other processing. In redact mode, the sanitized text — with [CREDIT_CARD] and [EMAIL] in place of real values — flows through the rest of the pipeline. The injection detection never sees the raw PII. Neither does your LLM.
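That ordering can be expressed as a simple stage list. This is a sketch only: the check functions below are stand-ins for whatever injection detection you already run, and `pii_pre_pass` is a hardcoded stand-in for the redact step described earlier:

```python
from typing import Callable

def pii_pre_pass(text: str) -> str:
    # Stand-in for redact mode: real values replaced with placeholders.
    return text.replace("john@example.com", "[EMAIL]")

def html_injection_check(text: str) -> str:
    return text  # would raise or flag on suspicious markup

def fast_path_regex(text: str) -> str:
    return text  # would match known prompt-injection patterns

def deep_path_similarity(text: str) -> str:
    return text  # would compare against an attack-embedding corpus

# PII filtering always sits first, so later stages never see raw values.
PIPELINE: list[Callable[[str], str]] = [
    pii_pre_pass,
    html_injection_check,
    fast_path_regex,
    deep_path_similarity,
]

def prepare_for_llm(message: str) -> str:
    for stage in PIPELINE:
        message = stage(message)
    return message
```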
PII filtering is built into Sentinel as a pre-pass in the scrub pipeline, available on Teams and Enterprise plans. The flag → redact rollout approach, Luhn validation, and SSN segment checks are all live today.