PII Redaction Before LLM Calls: Keep User Data Out of Your Prompt Logs

A user message arrives: "My name is Sarah Chen, my email is sarah.chen@example.com, and my credit card is 4532 0151 2345 6789. I need help with my order."

Your agent logs this message, so your step log now contains a real name, an email address, and a credit card number. If that log is ever read by someone who should not see it, or shipped to a log aggregation service, you have a data breach.

llm-pii-redact strips common PII patterns from text before logging or before sending to the model.

The Shape of the Fix

from llm_pii_redact import PIIRedact

redact = PIIRedact()

user_message = "My name is Sarah Chen, email sarah.chen@example.com, card 4532-0151-2345-6789"

clean = redact.redact(user_message)
# "My name is [NAME], email [EMAIL], card [CREDIT_CARD]"

# Log the clean version
step_log.record_user_message(clean)

# Send the original to the model (or the clean version if your policy requires it)
response = call_llm(messages=[{"role": "user", "content": user_message}])

The redacted version goes to logs. You decide whether the original or redacted version goes to the model based on your privacy policy.

What It Does NOT Do

llm-pii-redact does not perform semantic NER (Named Entity Recognition). It uses regex patterns. It will catch email addresses, 16-digit credit card numbers (with a Luhn check), US-format phone numbers, dash-separated SSNs, and IPv4 addresses. It will not catch most names (only common patterns), ID formats specific to your domain, or PII embedded in images.

It does not guarantee complete redaction. Any regex-based approach has false negatives. It is a best-effort layer, not a compliance guarantee. For workloads regulated under HIPAA, GDPR, or PCI-DSS, you need a proper PII detection service, not a regex library.
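To make the false-negative point concrete, here is a minimal sketch that uses only Python's `re` module and an email pattern comparable to the one in this library. The sample strings are invented for illustration, and the pattern is a simplified copy, not necessarily what the package ships.

```python
import re

# Assumption: a simplified copy of the library's email pattern.
EMAIL = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

samples = [
    "Reach me at sarah.chen@example.com",             # standard form
    "Reach me at sarah dot chen at example dot com",  # spelled out
    "Reach me at sarah.chen [at] example.com",        # lightly obfuscated
]

redacted = [EMAIL.sub('[EMAIL]', s) for s in samples]

for before, after in zip(samples, redacted):
    status = "redacted" if before != after else "MISSED"
    print(f"{status}: {after}")
```

Only the first sample is rewritten. The other two carry the same address in a form a human reads easily, and the regex passes them through untouched.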

It does not redact LLM outputs. It takes input text and returns redacted text. If the model echoes back PII from its context in the response, you need to run redact() on the response too.

Inside the Library

The core patterns with their replacement tokens:

import re

PATTERNS = [
    # Email addresses
    (re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'), '[EMAIL]'),

    # Phone numbers (US formats: (555) 555-5555, 555-555-5555, 5555555555).
    # The digit lookarounds stop it from matching inside longer digit runs.
    (re.compile(r'(?<!\d)\(?\d{3}\)?[\s.\-]?\d{3}[\s.\-]?\d{4}(?!\d)'), '[PHONE]'),

    # SSN
    (re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), '[SSN]'),

    # IPv4 addresses
    (re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'), '[IP_ADDRESS]'),

    # Credit card numbers (Luhn-checked, handled separately)
]

class PIIRedact:
    def __init__(self, custom_patterns: list | None = None):
        self._patterns = list(PATTERNS)
        if custom_patterns:
            self._patterns.extend(custom_patterns)

    def redact(self, text: str) -> str:
        # Credit cards first: Luhn-check candidates before any other pattern
        # can match part of a card number.
        result = self._redact_credit_cards(text)
        for pattern, replacement in self._patterns:
            result = pattern.sub(replacement, result)
        return result

    def _luhn_check(self, number: str) -> bool:
        digits = [int(d) for d in number if d.isdigit()]
        total = 0
        reverse = digits[::-1]
        for i, d in enumerate(reverse):
            if i % 2 == 1:
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return total % 10 == 0

    def _redact_credit_cards(self, text: str) -> str:
        # Pattern: 16 digits in four groups of four, optionally separated by
        # spaces or dashes. Other lengths (e.g. 15-digit Amex) are not matched.
        card_pattern = re.compile(r'\b(?:\d{4}[\s\-]?){3}\d{4}\b')

        def replace_if_luhn(match):
            raw = match.group()
            digits_only = re.sub(r'\D', '', raw)
            if self._luhn_check(digits_only):
                return '[CREDIT_CARD]'
            return raw

        return card_pattern.sub(replace_if_luhn, text)

    def redact_dict(self, data: dict, keys: list[str] | None = None) -> dict:
        result = {}
        for k, v in data.items():
            if keys and k not in keys:
                result[k] = v
            elif isinstance(v, str):
                result[k] = self.redact(v)
            elif isinstance(v, dict):
                result[k] = self.redact_dict(v, keys)
            else:
                result[k] = v
        return result

The Luhn check reduces false positives for credit cards: only sequences that pass the Luhn algorithm are redacted. Without it, every 16-digit sequence (order IDs, tracking numbers) would be flagged. It does not eliminate false positives, because roughly one in ten arbitrary 16-digit numbers passes Luhn by chance.
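As a worked example of the checksum, the sketch below is a standalone version of the same algorithm, applied to the sample card from the introduction and to a copy with the last digit changed. The name `luhn_valid` and the sample values are illustrative, not part of the library.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

card_like = "4532 0151 2345 6789"   # sample card from the introduction
order_id = "4532 0151 2345 6780"    # same digits, last one changed

print(luhn_valid(card_like))  # True
print(luhn_valid(order_id))   # False

# For a fixed 15-digit prefix, exactly one final digit is valid.
prefix = "453201512345678"
valid_last = [d for d in "0123456789" if luhn_valid(prefix + d)]
print(valid_last)  # ['9']
```

The last lines show why the check is a filter rather than a proof: one final digit in ten always validates, so some order IDs will still be redacted.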

When to Use It

Use it for log redaction. Run redact() on every user message before writing to your step log. This is the primary use case: you want to retain logs for debugging while not storing raw PII.

Use it as a prompt filter when your privacy policy requires it. If you are building an agent for a healthcare or financial context and users should not be sending PII to the model, redact it before the model sees it.

Use redact_dict() on tool call inputs and outputs. If your tools return documents that may contain PII, redact the tool output before logging it.
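Tool outputs often nest lists inside dicts, and redact_dict() as shown above passes any value that is not a string or dict through unchanged. A minimal sketch of a recursive walker that also covers lists follows. `redact_nested` and `redact_email` are hypothetical helpers written for this example; `redact_email` stands in for `PIIRedact().redact` so the snippet runs without the library installed.

```python
import re
from typing import Any, Callable

EMAIL = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

def redact_email(text: str) -> str:
    # Stand-in for PIIRedact().redact (assumption, for a runnable sketch).
    return EMAIL.sub('[EMAIL]', text)

def redact_nested(value: Any, redact: Callable[[str], str]) -> Any:
    """Hypothetical helper: redact every string inside nested dicts and lists."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {k: redact_nested(v, redact) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_nested(v, redact) for v in value]
    return value  # numbers, booleans, and None pass through unchanged

tool_output = {
    "status": "ok",
    "matches": [
        {"owner": "sarah.chen@example.com", "score": 0.92},
        {"owner": "unknown", "score": 0.4},
    ],
}

clean_output = redact_nested(tool_output, redact_email)
print(clean_output)
```

The helper builds a new structure instead of mutating its input, so the original tool output can still be passed on to the model if your policy allows it.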

Do not rely on it as a compliance guarantee. In regulated industries, you need proper PII detection, not regex. Use this as a first line of defense, not the only one.

Install

pip install git+https://github.com/MukundaKatta/llm-pii-redact

# Or from PyPI
pip install llm-pii-redact

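A fuller integration, with custom patterns and response redaction:

import re

from llm_pii_redact import PIIRedact
from agent_step_log import StepLog

# step_log (a StepLog instance) and call_llm (your model client wrapper)
# are assumed to be set up elsewhere in your application.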

redact = PIIRedact(
    custom_patterns=[
        # Add domain-specific patterns
        (re.compile(r'\b[A-Z]{2}\d{6}[A-Z]\b'), '[PASSPORT]'),
        (re.compile(r'\bEmployee\s+#\d{5}\b'), '[EMPLOYEE_ID]'),
    ]
)

def handle_user_message(msg: str, run_id: str) -> str:
    # Redact before logging
    clean_for_log = redact.redact(msg)
    step_log.record_user_input(clean_for_log, run_id=run_id)

    # Original message goes to model (or use clean version if required by policy)
    response = call_llm(messages=[{"role": "user", "content": msg}])

    # Redact response before logging too
    clean_response = redact.redact(response.text)
    step_log.record_model_response(clean_response, run_id=run_id)

    return response.text

Sibling Libraries

Library	What it solves
`tool-secret-scrubber`	Strip API keys, tokens, and JWTs from tool logs
`llm-redact-secrets`	Redact credentials and secrets from LLM payloads
`agent-guard-rails`	Composable output filters including PII checks
`agent-step-log`	Step log where redacted versions should be written
`prompt-shield`	Prompt injection detection (a different threat)

The privacy stack: llm-pii-redact for user data, tool-secret-scrubber and llm-redact-secrets for credentials, agent-guard-rails for output filtering.

What's Next

Name detection: first/last name pairs using a lightweight lookup against a common name list. The current regex misses most names. A 10K-name list would catch the majority without requiring NLP.
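One way the lookup idea might work is sketched below. This is not the planned implementation: the name sets, the `CANDIDATE` pattern, and `redact_names` are all hypothetical, and the lists are tiny on purpose.

```python
import re

# Tiny illustrative lists; a real deployment would load thousands of names.
FIRST_NAMES = {"sarah", "james", "maria", "wei"}
LAST_NAMES = {"chen", "smith", "garcia", "patel"}

# Two adjacent capitalized words are candidates for a first/last name pair.
CANDIDATE = re.compile(r'\b([A-Z][a-z]+)\s+([A-Z][a-z]+)\b')

def redact_names(text: str) -> str:
    out = []
    pos = 0
    while True:
        m = CANDIDATE.search(text, pos)
        if m is None:
            out.append(text[pos:])
            break
        first, last = m.group(1).lower(), m.group(2).lower()
        if first in FIRST_NAMES and last in LAST_NAMES:
            out.append(text[pos:m.start()])
            out.append("[NAME]")
            pos = m.end()
        else:
            # Not a known pair: keep the first word and retry from the second,
            # so "Hello Sarah Chen" still finds "Sarah Chen".
            out.append(text[pos:m.end(1)])
            pos = m.end(1)
    return "".join(out)

print(redact_names("My name is Sarah Chen and I live in New York."))
print(redact_names("Hello Sarah Chen called."))
```

Requiring both words to appear in the lists keeps place names like "New York" intact, at the cost of missing any name that is not on the list or not capitalized.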

Reversible redaction: replace PII with deterministic placeholder tokens ([EMAIL_1], [EMAIL_2]) that can be reversed for authorized users who need to see the original. Useful for debugging workflows where you need to trace a specific user's interaction.
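A rough sketch of what numbered, reversible tokens could look like, limited to email addresses for brevity. The functions `redact_reversible` and `restore` and the `vault` mapping are hypothetical names for this example, not an existing API.

```python
import re

EMAIL = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

def redact_reversible(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a numbered token; return text and mapping."""
    token_for: dict[str, str] = {}  # original value -> token

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if value not in token_for:
            token_for[value] = f"[EMAIL_{len(token_for) + 1}]"
        return token_for[value]

    redacted = EMAIL.sub(replace, text)
    # Invert the mapping so authorized code can look up token -> original.
    vault = {token: value for value, token in token_for.items()}
    return redacted, vault

def restore(redacted: str, vault: dict[str, str]) -> str:
    """Put the original values back, given the token mapping."""
    for token, value in vault.items():
        redacted = redacted.replace(token, value)
    return redacted

text = "From a@example.com to b@example.org, cc a@example.com"
redacted, vault = redact_reversible(text)
print(redacted)  # From [EMAIL_1] to [EMAIL_2], cc [EMAIL_1]
print(restore(redacted, vault) == text)  # True
```

The mapping is itself PII, so it would need to live somewhere with tighter access control than the logs. In this sketch numbering restarts on every call; tokens that stay stable across a whole run would need a shared mapping.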

Audit mode: redact.audit(text) that returns the original text plus a list of what was found and where, without modifying the text. Useful for reviewing what the redactor would remove before applying it to production logs.
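An audit pass could be sketched along these lines. The `Finding` dataclass, the `audit` function, and the two-pattern list are assumptions made for illustration; the real feature may look different.

```python
import re
from dataclasses import dataclass

# Assumption: a reduced pattern list, labelled by kind instead of replacement.
AUDIT_PATTERNS = [
    (re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'), 'EMAIL'),
    (re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), 'SSN'),
]

@dataclass
class Finding:
    kind: str   # which pattern matched
    start: int  # offset of the first matched character
    end: int    # offset one past the last matched character
    text: str   # the matched substring

def audit(text: str) -> list[Finding]:
    """Report what would be redacted, without modifying the text."""
    findings = []
    for pattern, kind in AUDIT_PATTERNS:
        for m in pattern.finditer(text):
            findings.append(Finding(kind, m.start(), m.end(), m.group(0)))
    return sorted(findings, key=lambda f: f.start)

sample = "SSN 123-45-6789, email sarah.chen@example.com"
report = audit(sample)
for f in report:
    print(f.kind, f.start, f.end)
```

Because each finding carries the matched text, the report is as sensitive as the input and should not be written to the same logs the redactor is protecting.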

Built as part of the agent-stack family: composable Python primitives for production LLM agents.