Redacting PII in LLM Traces Without Losing Debuggability

#observability #llm #security #privacy

Book: Observability for LLM Applications
Me: xgabriel.com | GitHub

You instrumented your LLM app with OpenTelemetry. Every prompt, every completion, every tool call lands in your tracing backend as a nice searchable span. Then someone from security opens Langfuse, types a customer's last name into the search box, and gets back their full chat history, their email, and the last four digits of a card they pasted into a support thread.

That is a data-handling incident. The same instrumentation that lets you debug a bad answer also turned your trace store into an unsecured copy of every conversation your users ever had. And LLM traces are worse than ordinary logs, because the payload you most want to capture (the full prompt and the full completion) is exactly the free-text field where PII hides.

You do not fix this by capturing less. You fix it by deciding what leaves the process, and stripping the rest before it does.

Where PII actually leaks into spans

It is not one field. The OpenTelemetry GenAI semantic conventions put model content on the span as events and attributes, and user data leaks into several of them:

gen_ai.prompt / completion events. The obvious one. The user typed their address, their name, a phone number. It is now an event body on the span.
System prompt. Often holds retrieved context. In a RAG app, that retrieved context is rows from your own database — customer records, ticket history, account notes.
Tool-call arguments. An agent calls lookup_customer(email="..."). The argument is on the span as gen_ai.tool.call.arguments. Card numbers, SSNs, and emails ride along inside structured tool inputs constantly.
Resource and request attributes. enduser.id, custom user.email tags someone added "just for debugging," session metadata.
Error messages. A stack trace that interpolated the failing input. The classic place secrets and PII end up by accident.

The pattern is the same in all five: the data is correct telemetry, captured by code doing its job. The problem is that it is identifiable and it is now sitting in a backend with a search box and a broad access list.

Redact at the span processor, not the call site

The wrong place to redact is the call site. If every chat() wrapper has to remember to scrub its own arguments, one forgotten wrapper leaks forever, and you cannot audit coverage.

The right place is a single span processor that every span passes through on its way out of the process. One choke point. One policy. Easy to test, easy to audit.

In the OpenTelemetry Python SDK, that is a SpanProcessor whose on_end runs before the exporter. Here is the shape:

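import re

from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor

# Structured PII patterns. Deliberately simple: they favor speed over
# exhaustive coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# 13 to 16 digits, optionally separated by spaces or hyphens.
CARD = re.compile(r"\b(?:\d[ -]*?){13,16}\b")
# A lookbehind instead of \b, so a leading "+" is part of the match.
PHONE = re.compile(r"(?<!\w)\+?\d[\d -]{7,}\d\b")

SENSITIVE_KEYS = {
    "gen_ai.prompt",
    "gen_ai.completion",
    "gen_ai.tool.call.arguments",
    "user.email",
}

def scrub(text: str) -> str:
    # CARD runs before PHONE so a card number is not mislabeled as a phone.
    text = EMAIL.sub("[EMAIL]", text)
    text = CARD.sub("[CARD]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

class RedactingProcessor(SpanProcessor):
    def on_end(self, span: ReadableSpan) -> None:
        # Iterate over a copy so the loop never mutates what it is reading.
        attrs = dict(span.attributes or {})
        for key, value in attrs.items():
            if not isinstance(value, str):
                continue
            if key in SENSITIVE_KEYS or "prompt" in key:
                # Private SDK field; see the caveat below.
                span._attributes[key] = scrub(value)
        # Span events carry prompt and completion bodies too and need the
        # same pass; omitted here for brevity.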

One honest caveat about that snippet: writing to span._attributes reaches into SDK internals, because at on_end the public span.attributes map is read-only and the SDK gives you no in-place mutation hook. In production you do the same scrub inside a custom SpanExporter that wraps the spans before export, where you own the data you are about to send. The processor version above shows the policy in one place; the exporter wrapper is the supported path to enforce it.
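The exporter-wrapper shape described above might look roughly like the sketch below. To keep it runnable without the SDK installed, it uses plain stand-in classes: FakeSpan and ListExporter are hypothetical, and scrub is cut down to a single email pattern. In a real setup the wrapper would subclass SpanExporter from opentelemetry.sdk.trace.export and rebuild each span as a new read-only span carrying the cleaned attributes.

```python
import re
from dataclasses import dataclass, field, replace
from typing import Any, Mapping, Sequence

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scrub(text: str) -> str:
    # Stand-in for the fuller regex pass; only emails are handled here.
    return EMAIL.sub("[EMAIL]", text)


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for an ended, read-only span."""
    name: str
    attributes: Mapping[str, Any] = field(default_factory=dict)


class ListExporter:
    """Hypothetical inner exporter that just collects what it is given."""

    def __init__(self) -> None:
        self.exported = []

    def export(self, spans: Sequence[FakeSpan]) -> None:
        self.exported.extend(spans)


class RedactingExporter:
    """Wraps another exporter and scrubs copies of spans before delegating."""

    def __init__(self, inner: ListExporter) -> None:
        self._inner = inner

    def export(self, spans: Sequence[FakeSpan]) -> None:
        self._inner.export([self._redact(s) for s in spans])

    @staticmethod
    def _redact(span: FakeSpan) -> FakeSpan:
        attrs = {
            key: scrub(value) if isinstance(value, str) else value
            for key, value in span.attributes.items()
        }
        # Build a new span object instead of mutating the original.
        return replace(span, attributes=attrs)
```

Because the wrapper works on copies, nothing reaches into SDK internals, and the only data the inner exporter ever sees is already scrubbed.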

Regex is the floor, not the ceiling. It catches structured PII (emails, cards, phones) at almost no cost, though a loose pattern like the phone one will also flag long order numbers and timestamps. It will not catch a name typed in free text. For that you add a named-entity recognizer such as Microsoft Presidio in the same processor, after the regex pass:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def scrub_entities(text: str) -> str:
    results = analyzer.analyze(
        text=text,
        entities=["PERSON", "LOCATION", "US_SSN"],
        language="en",
    )
    return anonymizer.anonymize(
        text=text, analyzer_results=results
    ).text

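Presidio is slower than regex, so keep the regex pass on every span and budget the NER pass separately. If throughput forces you to run NER on only a sampled fraction of spans, remember that the unsampled spans still carry whatever names the regex missed; drop or truncate their free-text fields rather than exporting them as-is. The two passes together give you cheap coverage of the common cases and best-effort coverage of the hard ones.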

Hash what you still need to correlate

Stripping everything to [EMAIL] is safe and useless. The first thing you will want during an incident is "show me every trace from the user who reported this." If the email is [EMAIL] everywhere, you cannot group by user anymore.

The fix is a keyed hash. Replace the identifier with a deterministic token so the same input always maps to the same output, but the token cannot be reversed back to the value.

import hashlib
import hmac
import os

PII_HASH_KEY = os.environ["PII_HASH_KEY"].encode()

def pseudonymize(value: str) -> str:
    digest = hmac.new(
        PII_HASH_KEY, value.encode(), hashlib.sha256
    ).hexdigest()
    return f"u_{digest[:12]}"

Now alice@example.com becomes the same token, something like u_9f2c1a7b4e08, on every span (the exact value depends on your key). You can filter, group, and count by user without storing the email. The HMAC key is the part that matters: a plain SHA-256 of an email is trivially reversible by an attacker who hashes a list of candidate emails and looks for a match. The keyed HMAC closes that hole, as long as the key lives in your secret manager and not in the repo.

Apply the same treatment to anything you need as a correlation key — user ID, session ID, account number. Free text gets redacted; identifiers get pseudonymized.
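Grouping by token only works if the same user always produces the same bytes before hashing. A minimal sketch, assuming identifiers are normalized by trimming whitespace and lowercasing: that is a reasonable default for emails, but not for case-sensitive IDs. normalize_identifier and the hard-coded demo key are hypothetical; a real key would come from the secret manager.

```python
import hashlib
import hmac

# Demo key only; in practice load it from the environment or a secret manager.
DEMO_KEY = b"demo-key-not-for-production"


def normalize_identifier(value: str) -> str:
    # Hypothetical normalization: trim and lowercase so equivalent
    # spellings of the same email collapse to one form.
    return value.strip().lower()


def pseudonymize(value: str, key: bytes = DEMO_KEY) -> str:
    normalized = normalize_identifier(value).encode()
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    # 12 hex characters keep the token short; lengthen it if collisions
    # across a very large user base are a concern.
    return f"u_{digest[:12]}"
```

With this in place, "Alice@Example.com " and "alice@example.com" land in the same group, and rotating the key yields a fresh, unlinkable set of tokens.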

What to keep so you can still debug

The goal is not an empty trace. The goal is a trace that lets you reconstruct what happened without storing who it happened to. After redaction, keep:

Structure and timing. Span names, parent-child relationships, durations, status codes. Many LLM bugs come down to the wrong tool firing or a step timing out, and none of that is PII.
Which tools ran, with redacted arguments. That an agent called refund_order then email_customer is the whole story of a tool-misfire bug. The order ID inside can be hashed.
Token counts, model ID, cost, finish reason. All non-identifying, all load-bearing for cost and drift work.
The shape of the text. Length, language, whether a citation was present, whether the JSON parsed. You can record prompt.length and completion.has_citation as derived attributes and they leak nothing.
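The derived attributes in the last item can be computed before the text is scrubbed or dropped. A rough sketch: prompt.length and completion.has_citation come from the list above, while completion.length, completion.json_valid, and the bracketed-number citation format are assumptions made for illustration.

```python
import json
import re

# Hypothetical citation marker format such as "[1]" or "[12]".
CITATION = re.compile(r"\[\d+\]")


def derive_shape(prompt: str, completion: str) -> dict:
    """Describe the text with non-identifying attributes instead of storing it."""
    try:
        json.loads(completion)
        json_valid = True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        json_valid = False
    return {
        "prompt.length": len(prompt),
        "completion.length": len(completion),
        "completion.has_citation": bool(CITATION.search(completion)),
        "completion.json_valid": json_valid,
    }
```

Set these on the span alongside the redacted text, so a question like "did the model stop citing sources last Tuesday?" can be answered without reading anyone's words.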

You lose the ability to read a customer's exact words off the span. That is the trade. In exchange, your trace store stops being a liability, and the person doing the 2 a.m. debugging still has the call graph, the timings, the tool sequence, and a stable per-user token to group by.

A redaction policy you can write down

Put this in a document your security team can sign off on, because "we redact some stuff in a processor" is not auditable. A workable policy has four columns: field, classification, action, retention.

field                          class      action        retain
gen_ai.prompt                  free-text  regex+NER     30d
gen_ai.completion              free-text  regex+NER     30d
gen_ai.tool.call.arguments     mixed      regex+NER     30d
user.email / enduser.id        direct-id  HMAC token    90d
session.id                     direct-id  HMAC token    90d
span name / timing / status    none       keep          90d
token counts / model / cost    none       keep          1y

Two rules make the policy hold. First, default-deny on new attributes: an attribute not on the list is dropped, not kept, so a new user.phone someone adds next quarter does not silently leak until the next audit. Second, the redaction step, whether it runs as the processor or as an exporter wrapper, is the only path to the backend, enforced in code review, so there is no second route by which raw text reaches it.
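The default-deny rule is easiest to audit when the policy table is data and not branching logic. A minimal sketch under a few assumptions: redact stands in for the regex+NER pass and handles only emails, the demo key is a placeholder, and the three gen_ai.request/usage keys are illustrative entries for the "token counts / model" row.

```python
import hashlib
import hmac
import re
from typing import Any, Callable, Dict

DEMO_KEY = b"demo-key-not-for-production"  # placeholder; use a managed secret
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(value: Any) -> Any:
    # Stand-in for the regex+NER pass; only emails are handled here.
    return EMAIL.sub("[EMAIL]", value) if isinstance(value, str) else value


def tokenize(value: Any) -> str:
    digest = hmac.new(DEMO_KEY, str(value).encode(), hashlib.sha256).hexdigest()
    return f"u_{digest[:12]}"


def keep(value: Any) -> Any:
    return value


# The written policy as data: attribute key -> action.
POLICY: Dict[str, Callable[[Any], Any]] = {
    "gen_ai.prompt": redact,
    "gen_ai.completion": redact,
    "gen_ai.tool.call.arguments": redact,
    "user.email": tokenize,
    "enduser.id": tokenize,
    "session.id": tokenize,
    "gen_ai.request.model": keep,
    "gen_ai.usage.input_tokens": keep,
    "gen_ai.usage.output_tokens": keep,
}


def apply_policy(attrs: Dict[str, Any]) -> Dict[str, Any]:
    # Default-deny: keys missing from POLICY are dropped, not passed through.
    return {key: POLICY[key](value) for key, value in attrs.items() if key in POLICY}
```

Adding an attribute to the trace now means adding a row to POLICY, which gives the security review a concrete diff to look at.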

Redaction at the edge of the process, pseudonyms for correlation, structure kept for debugging, and a written policy with default-deny. That is the whole approach, and it survives both the incident and the audit.

If this was useful

Tracing an LLM app and keeping it compliant pull in opposite directions, and the span processor is where you reconcile them. Observability for LLM Applications works through the OpenTelemetry GenAI conventions, what belongs on a span, and how to instrument prompts, tools, and RAG retrievals without turning your trace store into a breach waiting to happen.