PII in Your Prompt Logs Is a Liability: Redact Before You Send

#hermeschallenge #ai #python #agents

The Logs That Told Too Much

You added prompt logging on a Tuesday. It was a quick addition: write every prompt to a local file before the API call. Great for debugging. You found three bugs in two days.

On Thursday your team ran a security scan. The scan flagged 47 files. The prompt log contained full names, email addresses, Social Security numbers formatted as "SSN: 123-45-6789", and a handful of credit card numbers that users had pasted into the chat. All of it was sitting in plaintext on disk.

The problem was not the logging. Prompt logging is useful. The problem was logging raw text without redacting it first. Keeping a debug log and storing raw PII should not be coupled.

llm-pii-redact separates them. You redact the prompt before logging it and, if the model does not need the raw PII, before sending it to the model as well. When you need the original values back, you call restore(). The placeholders are reversible within a session.

The Shape of the Fix

from llm_pii_redact import Redactor

redactor = Redactor()

original = (
    "My name is Sarah Chen and my SSN is 487-34-9021. "
    "Please send the report to sarah.chen@example.com "
    "and bill card 4532015112830366."
)

redacted, mapping = redactor.redact(original)
print(redacted)
# Output is a single line; it is wrapped here for readability:
# My name is [NAME_1] and my SSN is [SSN_1].
# Please send the report to [EMAIL_1]
# and bill card [CARD_1].

# Log the redacted version, send to model, etc.
# Later, restore to original:
restored = redactor.restore(redacted, mapping)
assert restored == original

The mapping is a plain dict of {placeholder: original_value}. You control where it lives. Keep it in memory for a short session, or serialize it to a secure store for longer jobs.

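Credit card detection applies Luhn validation before flagging. A 16-digit string that fails the Luhn check is not treated as a card number. Only about one arbitrary digit string in ten passes the checksum, so this filters out roughly 90% of digit runs that merely happen to have the right length.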
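If you have not met the Luhn algorithm before, here is a minimal sketch of the checksum. The function name luhn_check matches the helper named in the internals sketch later in this post, but the body below is an illustration, not a copy of the library's source.

```python
def luhn_check(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_check("4532015112830366"))  # True: the card number from the example above
print(luhn_check("4532015112830367"))  # False: last digit changed
```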

You can add custom patterns:

from llm_pii_redact import Redactor, Pattern
import re

redactor = Redactor()
redactor.add_pattern(Pattern(
    name="EMPLOYEE_ID",
    regex=re.compile(r"\bEMP-\d{6}\b"),
))

text = "Employee EMP-004821 submitted the request."
redacted, mapping = redactor.redact(text)
# Employee [EMPLOYEE_ID_1] submitted the request.

Custom patterns use the same placeholder and mapping mechanism as built-in patterns. restore() works on custom placeholders without any extra configuration.

What It Does NOT Do

The library does not handle PII in structured data like JSON or CSV. It operates on plain strings. If your prompt is a JSON blob, serialize it to a string first, or extract and redact the relevant fields before building the prompt.
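The second option, redacting fields before building the prompt, might look like this. The sketch walks a parsed JSON value and redacts every string it finds. redact_emails is a stand-in so the example runs without the library, and redact_fields is a hypothetical helper name, not part of llm-pii-redact.

```python
import json
import re
from typing import Any, Callable

# Stand-in redactor so the example is self-contained; in practice you would
# pass a function that wraps Redactor.redact().
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(text: str) -> str:
    return EMAIL_RE.sub("[EMAIL]", text)


def redact_fields(node: Any, redact: Callable[[str], str]) -> Any:
    """Return a copy of a parsed JSON value with every string value redacted."""
    if isinstance(node, str):
        return redact(node)
    if isinstance(node, list):
        return [redact_fields(item, redact) for item in node]
    if isinstance(node, dict):
        # Keys are left alone; only values are redacted.
        return {key: redact_fields(value, redact) for key, value in node.items()}
    return node  # numbers, booleans, and None pass through unchanged


payload = json.loads(
    '{"user": {"contact": "sarah.chen@example.com", "age": 41},'
    ' "notes": ["cc sarah.chen@example.com"]}'
)
redacted_payload = redact_fields(payload, redact_emails)
print(json.dumps(redacted_payload))
```

With the real Redactor you would also need to collect the mapping returned for each field. How placeholder numbers behave across separate redact() calls is not covered here, so check for collisions before merging those mappings.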

It does not use an LLM or external API for PII detection. Everything is regex-based. That means it is fast and works offline, but it will miss PII in non-standard formats and non-English text.
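To make the "non-standard formats" caveat concrete, here is a small check using the hyphenated SSN pattern that this post lists for the built-in set. It uses only the standard re module, not the library, so treat it as an illustration of the failure mode rather than of the library's exact behavior.

```python
import re

# The SSN shape described in this post: three digits, two, then four, hyphen-separated.
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")

samples = [
    "SSN: 487-34-9021",  # canonical format
    "SSN: 487 34 9021",  # spaces instead of hyphens
    "SSN: 487349021",    # no separators at all
]
results = [bool(SSN_RE.search(sample)) for sample in samples]
for sample, found in zip(samples, results):
    print(f"{sample!r}: {'matched' if found else 'missed'}")
```

Only the first sample matches. If your users write identifiers in other shapes, a custom pattern is the place to handle that.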

The Luhn check filters out accidental false positives for credit cards, but it is not a security guarantee. It validates a checksum, not whether a card exists, so anyone who knows the library is in use can craft a number that passes Luhn without being a real card. That is an adversarial scenario, not the primary use case.

The mapping is not encrypted by default. If you persist it to disk or a database, you are responsible for securing it. The library does not touch storage.
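One baseline, assuming a POSIX system and a mapping small enough to keep in a JSON file, is to create the file with owner-only permissions. save_mapping and load_mapping are hypothetical helper names, not library functions. This limits who can read the file; it is not encryption.

```python
import json
import os
import tempfile


def save_mapping(mapping: dict[str, str], path: str) -> None:
    """Write the mapping as JSON to a new file created with mode 0o600."""
    # O_EXCL makes creation fail if the file already exists, so we never
    # write into a file that was created earlier with looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle)


def load_mapping(path: str) -> dict[str, str]:
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as workdir:
    mapping_path = os.path.join(workdir, "mapping.json")
    save_mapping({"[SSN_1]": "487-34-9021"}, mapping_path)
    loaded = load_mapping(mapping_path)

print(loaded)
```

For anything beyond a single-user machine, an encrypted secret store is the safer home for the mapping.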

Redaction is not anonymization. The placeholders are reversible. If the mapping is compromised, the original values are exposed. For true anonymization, replace PII with synthetic but realistic values instead of reversible placeholders.

Inside the Library

The built-in pattern set covers: US Social Security numbers (\d{3}-\d{2}-\d{4}), email addresses (RFC 5321 simplified), US and international phone numbers, and credit card numbers (15-16 digits, Luhn-checked).

Patterns are applied in order. Each match is replaced with [TYPE_N] where N is a counter per type. This means two email addresses in the same text become [EMAIL_1] and [EMAIL_2]. The mapping keys are the placeholder strings, so restoration is a simple string replacement pass.

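# Internals sketch (simplified)
import re

class Redactor:
    def redact(self, text: str) -> tuple[str, dict[str, str]]:
        mapping: dict[str, str] = {}
        counters: dict[str, int] = {}
        for pattern in self._patterns:
            def replace(match: re.Match, pattern=pattern) -> str:
                value = match.group()
                # luhn_check is a helper defined elsewhere; digit runs that
                # fail it are left in the text unchanged.
                if pattern.name == "CARD" and not luhn_check(value):
                    return value
                counter = counters.get(pattern.name, 0) + 1
                counters[pattern.name] = counter
                placeholder = f"[{pattern.name}_{counter}]"
                mapping[placeholder] = value
                return placeholder
            # sub() swaps each match at its own position, so an identical
            # substring earlier in the text is never replaced by mistake.
            text = pattern.regex.sub(replace, text)
        return text, mapping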
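The sketch above covers only redact(). Going by the description of restoration as "a simple string replacement pass", a standalone restore might look like the following. This is an assumption about the approach, not the library's source.

```python
def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap every placeholder in `text` back to its original value."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


mapping = {"[EMAIL_1]": "sarah.chen@example.com", "[EMAIL_10]": "ops@example.com"}
print(restore("Forward [EMAIL_10] to [EMAIL_1].", mapping))
# Forward ops@example.com to sarah.chen@example.com.
```

The closing bracket is what keeps [EMAIL_1] from matching inside [EMAIL_10]. One edge case this version ignores: if an original value itself contains text that looks like a placeholder, a later replacement could touch it.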

The 36 tests cover each built-in pattern independently, Luhn boundary cases (valid card passes, invalid card is ignored), custom pattern registration, restore round-trips, and overlapping patterns (SSN-like digits inside a phone number).

The one design choice I debated longest was whether restore() should accept the mapping as an argument or whether the Redactor instance should hold state. Accepting it as an argument won. Stateless restore means you can serialize the mapping and restore in a different process or on a different machine.

When It Helps and When It Doesn't

It helps for logging pipelines. Redact before writing to disk or sending to a log aggregator. Restore is rarely needed in this path since logs are for debugging, not for extracting the original data.
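In a Python logging pipeline, one place to hook this in is a logging.Filter attached to the handler that writes the log. The sketch below uses a stand-in redact_text function with a single SSN regex so it runs on its own. RedactingFilter is a hypothetical name; in practice the stand-in would call Redactor.redact() and either discard or store the mapping.

```python
import io
import logging
import re

# Stand-in for Redactor.redact(); swap in the real call in your pipeline.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_text(text: str) -> str:
    return SSN_RE.sub("[SSN]", text)


class RedactingFilter(logging.Filter):
    """Rewrite each record's message before the handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges msg and args, so PII passed as an argument
        # is redacted along with PII in the format string.
        record.msg = redact_text(record.getMessage())
        record.args = None
        return True  # keep the record, now redacted


stream = io.StringIO()  # stands in for a file or log aggregator
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())

logger = logging.getLogger("prompt_log")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("prompt=%s", "My SSN is 487-34-9021")
print(stream.getvalue().strip())  # prompt=My SSN is [SSN]
```

A filter on a handler only protects that handler. If the same logger feeds several destinations, each one needs the filter, or the filter goes on the logger itself.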

It helps when users paste sensitive data into prompts and the model does not need the real values to complete the task. A model summarizing a document does not need the actual SSN. Replace it with a placeholder. The summary is just as useful.

It helps less when the model's answer depends on the actual PII. If you ask the model to "validate this credit card number," redacting the number before sending it defeats the purpose.

It is not a substitute for access controls. Redacting prompts before logging is a defense-in-depth measure. It does not replace proper authentication, authorization, and data handling policies.

Install

pip install git+https://github.com/MukundaKatta/llm-pii-redact

Zero dependencies. Python 3.10+.

Quick start:

from llm_pii_redact import Redactor

r = Redactor()
safe_prompt, mapping = r.redact(user_prompt)
# log safe_prompt, send safe_prompt to model
# if you need originals: r.restore(safe_prompt, mapping)

Sibling Libraries

Library	What it does
`llm-output-validator`	Validate that output does not contain PII
`tool-secret-scrubber`	Strip API keys and tokens from tool logs
`llm-redact-secrets`	Redact API keys, JWTs, bearer tokens
`agent-redact`	Full redaction pipeline for agent sessions
`prompt-shield`	Pattern-based prompt injection detection

llm-pii-redact focuses on personal data. tool-secret-scrubber and llm-redact-secrets focus on credentials. They solve adjacent problems and are worth using together in a logging pipeline.

What's Next

The next feature is a streaming redaction mode. Currently, redact() operates on a complete string. For long prompts assembled incrementally, you need the full string before redacting. A streaming variant would process text in chunks, buffering at pattern boundaries.
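To give a feel for what "buffering at pattern boundaries" could mean, here is a rough sketch for one fixed-width pattern. It is not the planned implementation. redact_stream and MATCH_LEN are hypothetical names, and the approach only works as written when every possible match has the same known length.

```python
import re
from typing import Iterable, Iterator

# Assumption: a single pattern whose matches are always exactly 11 characters.
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")
MATCH_LEN = 11


def redact_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield redacted text, holding back a tail that could still begin a match."""
    buffer = ""
    count = 0

    def drain(final: bool) -> str:
        nonlocal buffer, count
        pieces, pos = [], 0
        for match in SSN_RE.finditer(buffer):
            count += 1
            pieces += [buffer[pos:match.start()], f"[SSN_{count}]"]
            pos = match.end()
        # Unless input has ended, keep the last MATCH_LEN - 1 characters:
        # they might be the start of a match that the next chunk completes.
        hold_from = len(buffer) if final else max(pos, len(buffer) - (MATCH_LEN - 1))
        pieces.append(buffer[pos:hold_from])
        buffer = buffer[hold_from:]
        return "".join(pieces)

    for chunk in chunks:
        buffer += chunk
        out = drain(final=False)
        if out:
            yield out
    tail = drain(final=True)
    if tail:
        yield tail


text = "SSN 487-34-9021, then 123-45-6789 again."
chunks = [text[i:i + 5] for i in range(0, len(text), 5)]
print("".join(redact_stream(chunks)))  # SSN [SSN_1], then [SSN_2] again.
```

Variable-width patterns such as email addresses are the hard part, because there is no fixed tail length that is guaranteed to contain the start of every unfinished match.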

The second planned feature is a "synthetic replacement" mode. Instead of [EMAIL_1], replace with a realistic but fake email address generated from a name pool. This makes redacted prompts more readable to humans while still protecting the real data.
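A rough sketch of the idea, limited to email addresses. The name pool, the redact_synthetic function, and the choice of example.org (a domain reserved for documentation) are all assumptions for illustration, not the planned design.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Hypothetical name pool; a real one would be much larger.
NAME_POOL = ["alex.morgan", "jamie.lee", "robin.patel", "casey.nguyen"]


def redact_synthetic(text: str) -> tuple[str, dict[str, str]]:
    """Replace each email with a fake one; return the text and a fake-to-real mapping."""
    mapping: dict[str, str] = {}   # fake -> real, same direction as placeholder mappings
    assigned: dict[str, str] = {}  # real -> fake, so a repeated address gets one fake

    def swap(match: re.Match) -> str:
        real = match.group()
        if real not in assigned:
            index = len(assigned)
            name = NAME_POOL[index % len(NAME_POOL)]
            fake = f"{name}{index + 1}@example.org"  # the number keeps fakes unique
            assigned[real] = fake
            mapping[fake] = real
        return assigned[real]

    return EMAIL_RE.sub(swap, text), mapping


redacted, mapping = redact_synthetic(
    "Mail sarah.chen@example.com and cc bob@corp.example, then sarah.chen@example.com again."
)
print(redacted)
print(mapping)
```

Fakes are assigned by order of first appearance rather than derived from the real value, so nothing about the original address leaks into the replacement. A production version would also need to make sure a generated fake never collides with a real address in the same text.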

The Hermes Agent Challenge gave me a clear forcing function: make the library usable in five lines of code for the common case. The Redactor class with no arguments and sensible defaults meets that bar. Custom patterns are opt-in. The basic redact-and-log workflow needs no configuration.

If you have patterns for PII formats specific to your region or industry, pull requests are welcome. The Pattern dataclass is simple and the test harness makes it easy to verify a new pattern without breaking existing ones.