Mukunda Rao Katta

Posted on May 25

Stop sending your users' SSNs to the LLM

#hermeschallenge #ai #python #agents

The support ticket that woke me up

A user writes into support:

"Hi, I need help with my account. My SSN is 523-45-6789 and the card I used was 4111 1111 1111 1111."

Your support agent forwards the full message body to the LLM so it can draft a helpful reply. No redaction layer. The model sees the SSN and the card number in plaintext. It echoes them back in the reply. Now those values are in your inference logs, your context window history, and maybe your fine-tuning pipeline.

Nobody intended this. Nobody wrote code to make it happen. It happened because the default is to forward everything.

llm-pii-redact adds the redaction layer that should have been there from the start.

The shape of the fix

from llm_pii_redact import PiiRedactor

redactor = PiiRedactor()

user_message = (
    "Hi, I need help with my account. "
    "My SSN is 523-45-6789 and the card I used was 4111 1111 1111 1111."
)

redacted, session = redactor.redact(user_message)
print(redacted)
# "Hi, I need help with my account. My SSN is [SSN_1] and the card I used was [CC_1]."

You send the redacted string to the model. The model never sees the raw values. It drafts a reply referencing [SSN_1] and [CC_1] as placeholders.

After the model returns, you restore if you need to:

model_reply = "I can see your SSN [SSN_1] is already on file. The card [CC_1] looks correct."

restored = redactor.restore(model_reply, session)
print(restored)
# "I can see your SSN 523-45-6789 is already on file. The card 4111 1111 1111 1111 looks correct."

The session object holds the mapping. Same entity, same placeholder, within that session.

# Multiple occurrences of the same value get the same placeholder
text = "Email me at bob@example.com or forward to bob@example.com"
redacted, session = redactor.redact(text)
# "Email me at [EMAIL_1] or forward to [EMAIL_1]"
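One way to picture what the session holds is a mapping from each raw value to its placeholder, reused when the value repeats and inverted on restore. The sketch below is not the library's implementation. `EMAIL_RE`, `redact_emails`, and `restore` are hypothetical names, and the email pattern is deliberately simplistic.

```python
import re

# Hypothetical, deliberately simple email pattern (illustration only,
# not the pattern llm-pii-redact uses).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text, mapping=None):
    """Replace each email with a stable placeholder; return (redacted, mapping)."""
    mapping = {} if mapping is None else mapping  # raw value -> placeholder

    def _sub(match):
        value = match.group(0)
        if value not in mapping:
            # First sighting: allocate the next numbered placeholder.
            mapping[value] = f"[EMAIL_{len(mapping) + 1}]"
        return mapping[value]

    return EMAIL_RE.sub(_sub, text), mapping


def restore(text, mapping):
    """Swap placeholders back to the raw values they stand for."""
    for value, placeholder in mapping.items():
        text = text.replace(placeholder, value)
    return text


text = "Email me at bob@example.com or forward to bob@example.com"
redacted, mapping = redact_emails(text)
print(redacted)                            # Email me at [EMAIL_1] or forward to [EMAIL_1]
print(restore(redacted, mapping) == text)  # True

# Reusing the same mapping on a later message keeps the placeholder stable.
later, mapping = redact_emails("Reply to bob@example.com", mapping)
print(later)                               # Reply to [EMAIL_1]
```

Keying the mapping on the raw value is what makes repeated occurrences collapse to one placeholder. It also means the mapping itself is sensitive and should not be logged.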

Custom patterns slot in alongside the built-ins:

import re

redactor = PiiRedactor(custom_patterns=[
    ("EMPLOYEE_ID", re.compile(r"\bEMP-\d{6}\b")),
])

text = "Employee EMP-004821 submitted the request."
redacted, session = redactor.redact(text)
# "Employee [EMPLOYEE_ID_1] submitted the request."

What it does NOT do

It does not understand context. "My number is five five five" will not be caught. Regex sees surface patterns, not meaning.
It does not strip PII from structured data like JSON payloads or nested tool call arguments. It operates on strings.
It does not guarantee complete coverage. Novel PII formats, non-English names, or obfuscated values can slip through.
It does not redact images, audio, or any non-text content.

Think of it as a first-pass filter, not a compliance guarantee.
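On the structured-data limitation: because the library operates on strings, a caller can walk a payload and redact each string leaf. This is a minimal sketch under that assumption. `redact_leaves` is a hypothetical helper, and `stand_in` is a toy replacement for whatever redaction call you actually use.

```python
from typing import Any, Callable


def redact_leaves(data: Any, redact_text: Callable[[str], str]) -> Any:
    """Return a copy of `data` with `redact_text` applied to every string leaf."""
    if isinstance(data, str):
        return redact_text(data)
    if isinstance(data, dict):
        # Keys are left untouched; only values are walked.
        return {key: redact_leaves(value, redact_text) for key, value in data.items()}
    if isinstance(data, list):
        return [redact_leaves(item, redact_text) for item in data]
    if isinstance(data, tuple):
        return tuple(redact_leaves(item, redact_text) for item in data)
    return data  # numbers, booleans, None pass through unchanged


# Toy stand-in for a real redactor (assumption for this example only).
def stand_in(text: str) -> str:
    return text.replace("523-45-6789", "[SSN_1]")


payload = {"tool": "lookup", "args": {"note": "SSN is 523-45-6789", "retries": 3}}
clean = redact_leaves(payload, stand_in)
print(clean)
```

Non-string leaves pass through as-is, so a card number stored as an integer would not be caught by this walk.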

Inside the lib: why Luhn matters for credit cards

Most PII libraries detect credit card numbers with a regex like \b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b. That pattern catches real cards. It also catches a lot of things that are not cards.

ISBN-13 numbers are only 13 digits, so this exact pattern skips them, but looser card patterns that allow 13 to 19 digits do not. Internal product IDs, loyalty numbers, and license keys can all be 16 digits. A naive regex flags them all.

The Luhn algorithm is a simple checksum that card numbers from the major networks satisfy. A string that matches the 16-digit pattern but fails the Luhn check is almost certainly not a credit card number. Only about one in ten random digit strings passes by chance, so most internal IDs fail it.

llm-pii-redact only flags a credit card candidate if it passes both the regex and the Luhn check:

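def _luhn_check(number: str) -> bool:
    # Keep only digits so spaces and dashes in formatted numbers are ignored.
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    # A valid number sums to a multiple of 10.
    return total % 10 == 0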

This cuts false positives significantly without adding any dependency.
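To see the two checks working together, here is a small standalone sketch that pairs the regex quoted earlier with a Luhn check. `CARD_RE`, `luhn_ok`, and `find_cards` are names made up for this example and are not the library's API.

```python
import re

# Same 16-digit shape as the pattern quoted in the article.
CARD_RE = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits in `number`, ignoring separators."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:  # every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Return regex matches that also pass the Luhn check."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_ok(m.group(0))]


text = "Card 4111 1111 1111 1111, order ID 1234 5678 9012 3456."
print(find_cards(text))  # ['4111 1111 1111 1111']

# For a fixed 15-digit prefix, exactly one final digit satisfies Luhn,
# which is why roughly nine in ten look-alike strings are rejected.
prefix = "411111111111111"
passing = [d for d in "0123456789" if luhn_ok(prefix + d)]
print(passing)  # ['1']
```

The order ID has the right shape but its checksum comes to 64, not a multiple of 10, so it is dropped.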

When this is useful

Support pipelines. Any flow where raw user messages go into an LLM prompt is an exposure risk. Redact before forwarding.

RAG ingestion. Documents in your retrieval corpus may contain PII. Redacting at ingestion time keeps it out of retrieved chunks that land in prompts.

Fine-tuning datasets. Training data passed through this filter carries fewer raw PII values for the model to memorize.

Audit trails. Log the redacted prompt, not the original. Your inference logs stay clean without losing useful context about what the model was asked.

Multi-turn agents. The session object persists across redact calls, so the same entity in turn 3 gets the same placeholder it got in turn 1.

When NOT to use this

When you need the PII to complete the task. If your agent needs to look up an account by SSN, redacting it before the lookup step breaks the flow. Redact only on the way to the model, not on the way to your own systems.

When compliance is the requirement. This library reduces exposure. It is not a compliance framework. HIPAA, PCI-DSS, and GDPR have specific requirements around data handling that go far beyond regex redaction.

When the volume is high and latency matters. This runs Python regex against your prompt strings synchronously. On very high-throughput pipelines, the overhead adds up. Profile before committing to inline redaction on every call.
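A quick way to put a number on that overhead is to time the redaction call against a representative prompt. The sketch below uses a stand-in regex redactor, since the point is the measurement harness. `stand_in_redact` and `mean_seconds_per_call` are hypothetical names, and you would pass your real redaction call instead.

```python
import re
import time

# Stand-in pattern and redactor (assumptions for this example only).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def stand_in_redact(text: str) -> str:
    return SSN_RE.sub("[SSN_1]", text)


def mean_seconds_per_call(fn, sample: str, runs: int = 1000) -> float:
    """Average wall-clock time of fn(sample) over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        fn(sample)
    return (time.perf_counter() - start) / runs


sample = "My SSN is 523-45-6789. " * 200  # stands in for a long prompt
per_call = mean_seconds_per_call(stand_in_redact, sample)
print(f"{per_call * 1e6:.1f} microseconds per call")
```

Measure with prompts of realistic length and with every pattern you plan to enable, since cost grows with both.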

Install

pip install llm-pii-redact

Zero runtime dependencies. Python 3.9+.

Source: github.com/MukundaKatta/llm-pii-redact

36 tests covering built-in patterns, Luhn validation, session consistency, custom patterns, and restore fidelity.

Sibling libraries

These cover adjacent problems in the same space:

Lib	Boundary	Repo
tool-secret-scrubber	Credential and token patterns (API keys, tokens, secrets), distinct focus from user PII	MukundaKatta/tool-secret-scrubber
prompt-shield	Prompt injection detection, catches adversarial inputs trying to hijack the agent	MukundaKatta/prompt-shield
agenttap	Wire-level capture of the exact JSON sent to and from the LLM, redacts credentials in transport	MukundaKatta/agenttap
llm-output-validator	Validates the model response after it returns, rule-based checks on the output side	MukundaKatta/llm-output-validator

The typical stack: redact input with llm-pii-redact, scrub any leaked secrets with tool-secret-scrubber, detect injection attempts with prompt-shield, validate the model's reply with llm-output-validator.

What is next

A few things I want to add:

Structured input support. Accept a dict or list and walk the string leaves, so tool call arguments and RAG documents can be redacted without manual extraction.
Named entity recognition hook. Let users plug in a spaCy or transformers model for name detection instead of relying on the prefix list. The prefix approach works for common English names but misses everything else.
Async variant. An aredact method for use inside async agent loops without forcing a thread hop.
Redaction summary. Return counts of what was found, by type, alongside the redacted string. Useful for dashboards and audit logs.
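Until a redaction summary exists, a rough version can be derived from the redacted string itself by counting placeholders. This sketch assumes only the `[TYPE_N]` placeholder format shown earlier. `PLACEHOLDER_RE` and `summarize` are hypothetical names, not part of the library.

```python
import re
from collections import Counter

# Matches placeholders such as [SSN_1], [CC_2], or [EMPLOYEE_ID_1].
PLACEHOLDER_RE = re.compile(r"\[([A-Z][A-Z_]*)_(\d+)\]")


def summarize(redacted: str) -> Counter:
    """Count distinct placeholders per type in a redacted string."""
    distinct = set(PLACEHOLDER_RE.findall(redacted))  # {(type, index), ...}
    return Counter(kind for kind, _ in distinct)


redacted = "My SSN is [SSN_1] and the card I used was [CC_1]. Is [SSN_1] on file?"
summary = summarize(redacted)
print(dict(summary))
```

This counts distinct entities rather than occurrences, so the repeated [SSN_1] contributes once. It would also count any bracketed text in the original message that happens to look like a placeholder.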

If you use this in a support pipeline or RAG system and hit an edge case, open an issue. Real-world PII patterns are messier than test fixtures.

Built for the Hermes Agent Challenge. Part of a series of small, focused libraries for building safer and more predictable LLM-powered agents.
