Mukunda Rao Katta

Posted on May 25

llm-pii-redact: Remove PII From Prompts Before They Reach the Model

#hermeschallenge #ai #python #agents

1. The support ticket pipeline

A company had 80,000 customer support tickets in a database. They wanted to run an LLM over them to categorize complaint types, spot product defects, and find resolution patterns.

The tickets were unstructured text. Customer names, emails, phone numbers, and account numbers were all mixed into the body of the ticket. Sometimes a ticket had a partial SSN. Sometimes a customer typed their credit card number into the support chat by mistake.

Sending those tickets to an external LLM API was a GDPR problem. Sending them to an internal model was acceptable but still had audit requirements. The question was: could they redact the PII before the LLM saw the text, let the LLM work on the redacted version, and then restore the original values in the output if needed?

Yes. That is exactly what llm-pii-redact does.

2. Shape of the fix

from llm_pii_redact import PIIRedactor

redactor = PIIRedactor()

ticket = """
Customer alice@example.com called about billing error.
Phone: 555-867-5309. Card on file: 4532015112830366.
DOB on file: 1985-03-12. Case ref: SS# 078-05-1120.
"""

redacted, mapping = redactor.redact(ticket)
print(redacted)
# Customer [EMAIL_1] called about billing error.
# Phone: [PHONE_1]. Card on file: [CREDIT_CARD_1].
# DOB on file: [DATE_1]. Case ref: SS# [SSN_1].

# Send redacted text to the model (llm is a placeholder for your own client):
response = llm.complete(f"Categorize this support ticket: {redacted}")

# Restore original values in the response:
restored = redactor.restore(response, mapping)

The same value always gets the same placeholder within a session. If the customer's email appears three times in the ticket, all three become [EMAIL_1]. The mapping is a dict from placeholder to original value. You keep the mapping, use the redacted text with the model, then restore when you need human-readable output.
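To make the placeholder mechanics concrete, here is a minimal standalone sketch of the same idea for emails only. It is not the library's code: the regex is deliberately simple, and the names redact_emails and restore are illustrative.

```python
import re

# Deliberately simple email pattern for illustration; not the library's regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace each distinct email with a stable [EMAIL_n] placeholder."""
    seen = {}      # original value -> placeholder
    mapping = {}   # placeholder -> original value

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text, mapping):
    """Swap placeholders back to their original values."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


sample = "Mail alice@example.com, cc bob@example.com, then alice@example.com again."
redacted, mapping = redact_emails(sample)
print(redacted)
# Mail [EMAIL_1], cc [EMAIL_2], then [EMAIL_1] again.
print(restore(redacted, mapping) == sample)
# True
```

Restoring only works if the model echoes the placeholders unchanged, so it helps to tell the model to leave bracketed tokens as they are.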

For batch processing:

results = []
for ticket in tickets:
    # Each ticket is a dict; "body" is an assumed key for the ticket text.
    redacted, mapping = redactor.redact(ticket["body"])
    response = llm.complete(redacted)
    results.append({
        "ticket_id": ticket["id"],
        "response": response,
        "pii_mapping": mapping,
    })
    # Store mapping separately from response if needed for audit

Custom patterns:

from llm_pii_redact import PIIRedactor, PatternRule

# Add a custom pattern for internal account IDs like ACC-123456
redactor = PIIRedactor(extra_rules=[
    PatternRule(
        name="ACCOUNT_ID",
        pattern=r"\bACC-\d{6}\b",
    )
])
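Before wiring a custom pattern into the redactor, it is worth checking it against a few strings with the standard re module. This sketch uses only the regex from the rule above; the sample sentences are made up.

```python
import re

# Same pattern as the ACCOUNT_ID rule above.
ACCOUNT_ID = re.compile(r"\bACC-\d{6}\b")

samples = [
    "Refund issued to ACC-123456 yesterday.",  # exactly six digits: match
    "See ACC-1234567 for details.",            # seven digits: no match
    "Ticket XACC-123456 is a duplicate.",      # no word boundary before ACC: no match
]

matches = [ACCOUNT_ID.findall(s) for s in samples]
print(matches)
# [['ACC-123456'], [], []]
```

The \b anchors are what keep the rule from firing inside longer identifiers.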

3. What it does NOT do

It does not use NLP or named entity recognition. All detection is regex-based. A name typed in prose ("Alice called in") will not be caught unless you add a custom NER step. The library catches structured PII: emails, phone numbers, SSNs, credit cards, dates of birth in common formats.

It does not redact images, PDFs, or audio. Text only.

It does not guarantee 100% recall. Regex patterns miss unusual formats. A phone number written as "five five five eight six seven" will not be caught. Novel SSN formats may slip through. Use this as a first-pass filter, not a compliance guarantee.

It does not anonymize: it redacts with consistent placeholders. The same email always maps to [EMAIL_1] within a session. Two different emails in the same text get [EMAIL_1] and [EMAIL_2]. The placeholders are reversible. This is not k-anonymity. If reversibility is a problem, set reversible=False and the mapping is discarded after redaction.

The Luhn check applies to credit card patterns. It filters out numbers that match the credit card regex but fail Luhn validation. This reduces false positives on random 16-digit numbers. It does not validate the card against any network.
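For reference, the Luhn checksum itself is short. This is a generic sketch of the standard algorithm, not the library's implementation, and the function name luhn_valid is illustrative.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4532015112830366"))  # True: the card number from the ticket example
print(luhn_valid("1234567812345678"))  # False: 16 digits, but the checksum fails
```

Roughly one in ten random digit strings still passes, so the check narrows false positives rather than removing them.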

4. Inside the library

The repo is at MukundaKatta/llm-pii-redact. There are 36 tests.

Core types:

PIIRedactor: main class. Constructor takes extra_rules: list[PatternRule] = None, reversible: bool = True.
redact(text: str) -> tuple[str, dict]: returns redacted text and the placeholder-to-original mapping.
restore(text: str, mapping: dict) -> str: swaps placeholders back to original values.
PatternRule: dataclass with name (used as placeholder prefix), pattern (regex string), luhn_check: bool = False.
RedactorSession: internal class that holds the counter state and mapping for one redaction pass. One session per redact() call.

Built-in rules:

Email: RFC 5321-ish regex. Conservative.
Phone: US/international formats including dashes, dots, parentheses.
SSN: NNN-NN-NNNN format.
Credit card: 13-19 digit groups matching Visa/Mastercard/Amex/Discover patterns, Luhn-validated.
Date of birth: common date formats (YYYY-MM-DD, MM/DD/YYYY, DD-MM-YYYY). A date is redacted only when it appears near a DOB context keyword such as "DOB", "date of birth", or "born". Standalone dates are not redacted by default.
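The context-keyword gating on dates can be sketched with plain regexes. This is an assumption about how such a rule might work, not the library's code: the 20-character look-behind window, the keyword list, and the name find_dob_dates are all illustrative.

```python
import re

DATE_RE = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4}|\d{2}-\d{2}-\d{4})\b"
)
DOB_HINT_RE = re.compile(r"\b(?:dob|date of birth|born)\b", re.IGNORECASE)
WINDOW = 20  # assumed: how many characters before the date to scan for a keyword


def find_dob_dates(text):
    """Return dates that have a DOB keyword shortly before them."""
    hits = []
    for match in DATE_RE.finditer(text):
        context = text[max(0, match.start() - WINDOW):match.start()]
        if DOB_HINT_RE.search(context):
            hits.append(match.group(0))
    return hits


text = "DOB on file: 1985-03-12. Ticket opened 2024-01-15. Born 03/12/1985."
print(find_dob_dates(text))
# ['1985-03-12', '03/12/1985']
```

The ticket-opened date is left alone because no keyword sits in its window, which is the behaviour described for standalone dates.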

The ordering matters. Rules are applied in order. Email is applied before URL patterns to avoid fragmenting email addresses inside URLs. The default rule order is tuned for support ticket text.

Session isolation: each redact() call gets a fresh counter. [EMAIL_1] in one call is independent of [EMAIL_1] in another. If you need consistent placeholders across multiple related texts, pass a session object explicitly.
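As a rough illustration of what a shared session buys you, the sketch below keeps counters and mappings alive across several texts. SharedSession and redact_emails are hypothetical names for this example and are not the library's RedactorSession API.

```python
import re
from collections import defaultdict

# Simple illustrative email pattern; not the library's regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class SharedSession:
    """Hypothetical holder for counters and mappings that outlive one call."""

    def __init__(self):
        self.counters = defaultdict(int)  # rule name -> last number used
        self.by_value = {}                # (rule name, value) -> placeholder
        self.mapping = {}                 # placeholder -> original value

    def placeholder_for(self, kind, value):
        key = (kind, value)
        if key not in self.by_value:
            self.counters[kind] += 1
            placeholder = f"[{kind}_{self.counters[kind]}]"
            self.by_value[key] = placeholder
            self.mapping[placeholder] = value
        return self.by_value[key]


def redact_emails(text, session):
    """Redact emails, drawing placeholders from the shared session."""
    return EMAIL_RE.sub(
        lambda m: session.placeholder_for("EMAIL", m.group(0)), text
    )


session = SharedSession()
thread = [
    "From alice@example.com: my invoice is wrong.",
    "Reply to alice@example.com, cc bob@example.com.",
]
redacted_thread = [redact_emails(message, session) for message in thread]
print(redacted_thread)
# ['From [EMAIL_1]: my invoice is wrong.', 'Reply to [EMAIL_1], cc [EMAIL_2].']
```

With a fresh session per message, the second message would number both addresses from 1 again, independently of the first message.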

5. When this is useful, when it is not

Useful when:

You send customer-facing text (support tickets, chat logs, form submissions) to an LLM and want to strip structured PII before the API call.
You have audit requirements that demand a log of what PII was present in a given LLM request without storing the actual values in the log. Encrypt the mapping and store it separately; write only the placeholder text to the log.
You are building a RAG pipeline over documents that may contain PII and want to index and query on the redacted version.
You are doing evals or fine-tuning on real data and need to scrub PII from the training/eval set before it leaves your infrastructure.

Not useful when:

Your PII is primarily names in unstructured prose. The library will not catch "Alice Smith" in running text without NER.
You need compliance certification. This is a developer tool, not a GDPR compliance solution. Get a proper data governance review.
Your text contains PII in formats the built-in rules do not cover: phone number formats outside the US/international patterns, or national ID formats other than the US SSN. You will need to add custom PatternRule entries for those.
You need anonymization rather than redaction. If the placeholder-to-value mapping must never exist, a different approach (generalization, k-anonymization) is needed.

6. Install

The package is pending PyPI publication.

# PyPI (pending):
pip install llm-pii-redact

# From source:
git clone https://github.com/MukundaKatta/llm-pii-redact
cd llm-pii-redact
pip install -e .

No runtime dependencies. Python 3.9+.

# Run the tests:
pytest tests/ -v
# 36 tests, all passing

7. Siblings in the stack

Library	What it does
`tool-secret-scrubber`	Strip API keys, tokens, JWTs from tool call logs
`prompt-shield`	Pattern-based prompt injection detector
`agentguard`	Egress allowlist for agent tool calls
`tool-error-classify`	Closed ErrorKind enum for tool exceptions
`llm-output-validator`	Rule-based validation of LLM output text

The combination that covers the most ground: llm-pii-redact strips PII from the prompt before it reaches the model. tool-secret-scrubber strips secrets from tool outputs before they go back into the prompt. prompt-shield catches injection attempts. Together they cover three common risk vectors in LLM pipelines: PII in prompts, secrets in tool output, and prompt injection.

8. What comes next

The biggest gap right now is name detection. Structured PII (email, SSN, credit card) is well-handled by regex. Unstructured PII (names in prose, addresses in natural language) requires a different approach.

I plan to add an optional NER integration layer. When you install llm-pii-redact[spacy] or llm-pii-redact[flair], the redactor can use those models to detect names and addresses. The base library stays zero-dependency. The NER layer is opt-in.

Second: a redact_batch() method with consistent cross-document placeholder numbering. Right now each redact() call starts counters at 1. If you want [EMAIL_1] to refer to the same email across 50 related documents (a conversation thread, for example), you need a persistent session. I want a clean API for that.

Third: a confidence score per redaction. Right now it is binary: pattern matched or not. A confidence score would let you flag uncertain redactions for human review before the text goes to the model.

The 36 tests focus on the most common real-world patterns. Edge cases (international numbers, mixed-format dates, partial SSNs) are covered, but the test surface will keep growing as I run the library against real data.

Source: github.com/MukundaKatta/llm-pii-redact

Top comments (1)

Ilya Ploskovitov • Jul 21

Hi Mukunda, I'm the author of PII-Shield (redaction engine w/ a WASM SDK for Node/Python). Your llm-pii-redact reversible redact/restore pattern for the 80k-ticket GDPR case is a clean solve. One question: was the pain mostly PII reaching the LLM prompt itself, or does it also show up in your logs/observability stack once things are running? Even a short reply helps - happy to compare notes on false-positive handling either way.