Redact API Keys and Tokens From LLM Payloads Before They Leave Your Network

A user pasted their environment variables into the chat. ANTHROPIC_API_KEY=sk-ant-api03-.... Your agent echoes it in a tool call. The tool call is logged. The log is shipped to your log aggregation service. Your API key is now in a third-party system.

Or the model suggests a code snippet that includes a hardcoded JWT from earlier in the conversation. The code snippet reaches your output formatter. The JWT is in the response body.

llm-redact-secrets detects and redacts API keys, tokens, JWTs, and credential-like strings from text and structured data.

The Shape of the Fix

from llm_redact_secrets import SecretsRedact

redact = SecretsRedact()

# Before sending to any external system (logs, APIs, third parties)
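# (fake sample key; the Anthropic pattern expects 40+ characters after the prefix)
user_input = "Here's my key: sk-ant-api03-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcd"
clean = redact.redact(user_input)
# "Here's my key: [ANTHROPIC_KEY]"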

# Redact from message dicts
message = {
    "role": "user",
    # Fake sample token; each of the three segments must be 10+ characters to match
    "content": "Use token: eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyMTIzIn0.SIGNATURE_PLACEHOLDER",
}
clean_message = redact.redact_message(message)
# content becomes: "Use token: [JWT]"

Secrets are replaced with descriptive placeholder tokens so the context of the message is preserved but the credential is removed.

What It Does NOT Do

llm-redact-secrets does not detect every credential type. It covers common patterns: Anthropic API keys (sk-ant-api03-), OpenAI API keys (sk-), JWTs (three base64url segments separated by dots), GitHub tokens (ghp_, gho_, ghu_), AWS access key IDs (AKIA...), and bearer tokens. Generic high-entropy strings and domain-specific credential formats are not covered yet; entropy-based detection and custom patterns are both listed under What's Next.

It does not stop the model from seeing secrets unless you apply it before the API call. Redacting only the logged copy protects your logs and external systems; if the original message still goes to the LLM, the LLM sees the secret. To keep a secret away from the model, redact the messages before sending them.

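It does not parse structure for you. redact() works on strings: if a tool result arrives as a serialized JSON string, calling redact() on that string finds an API key inside it. If you hold the parsed JSON object instead, use redact_dict() so that nested string values are visited. Note also that redact_message(), as shown in the next section, only rewrites string content and blocks of type "text"; other block types, such as tool results, pass through unchanged.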
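
A minimal sketch of walking a parsed payload, assuming you already have some string redactor to hand. The names redact_nested and demo_redact are hypothetical and exist only for illustration; this is not the library's redact_dict().

```python
import re
from typing import Any, Callable

def redact_nested(value: Any, redact_fn: Callable[[str], str]) -> Any:
    """Return a copy of value with redact_fn applied to every string inside it."""
    if isinstance(value, str):
        return redact_fn(value)
    if isinstance(value, dict):
        # Keys are left untouched; only values are redacted.
        return {k: redact_nested(v, redact_fn) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_nested(v, redact_fn) for v in value]
    if isinstance(value, tuple):
        return tuple(redact_nested(v, redact_fn) for v in value)
    return value  # numbers, booleans, None pass through unchanged

# Stand-in redactor for the demo (assumption); in practice you would
# pass something like SecretsRedact().redact instead.
_AWS_KEY = re.compile(r'\bAKIA[0-9A-Z]{16}\b')

def demo_redact(text: str) -> str:
    return _AWS_KEY.sub('[AWS_ACCESS_KEY]', text)

payload = {
    "tool": "deploy",
    "args": {"env": ["AWS_ACCESS_KEY_ID=AKIAABCDEFGHIJKLMNOP", "REGION=us-east-1"]},
    "attempt": 2,
}
clean = redact_nested(payload, demo_redact)
# clean["args"]["env"][0] == "AWS_ACCESS_KEY_ID=[AWS_ACCESS_KEY]"
```

The walker builds new containers rather than mutating the input, so the original payload can still be passed to the code that legitimately needs the credential.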

Inside the Library

The detection patterns cover the most common API credential shapes:

import re
import base64

SECRET_PATTERNS = [
    # Anthropic API keys
    (re.compile(r'\bsk-ant-api03-[A-Za-z0-9_-]{40,}\b'), '[ANTHROPIC_KEY]'),

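    # OpenAI API keys (legacy format: "sk-" plus exactly 48 alphanumerics;
    # newer keys with an "sk-proj-" prefix are not matched by this pattern)
    (re.compile(r'\bsk-[A-Za-z0-9]{48}\b'), '[OPENAI_KEY]'),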

    # GitHub tokens
    (re.compile(r'\bghp_[A-Za-z0-9]{36}\b'), '[GITHUB_PAT]'),
    (re.compile(r'\bgho_[A-Za-z0-9]{36}\b'), '[GITHUB_OAUTH]'),
    (re.compile(r'\bghu_[A-Za-z0-9]{36}\b'), '[GITHUB_USER]'),

    # AWS access keys
    (re.compile(r'\bAKIA[0-9A-Z]{16}\b'), '[AWS_ACCESS_KEY]'),

    # Generic bearer tokens
    (re.compile(r'\bBearer\s+[A-Za-z0-9_.~+/=-]{20,}\b', re.IGNORECASE), '[BEARER_TOKEN]'),
]

def _is_jwt(token: str) -> bool:
    """Check if a string looks like a JWT (three base64url parts separated by dots)."""
    parts = token.split('.')
    if len(parts) != 3:
        return False
    try:
        for part in parts[:2]:  # Only check header and payload
            # Pad to a multiple of 4; add nothing when the length is already aligned
            padded = part + '=' * (-len(part) % 4)
            decoded = base64.urlsafe_b64decode(padded)
            if len(decoded) < 2:
                return False
        return True
    except Exception:
        return False

JWT_PATTERN = re.compile(r'\b[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\b')

class SecretsRedact:
    def redact(self, text: str) -> str:
        result = text

        # Apply named patterns
        for pattern, replacement in SECRET_PATTERNS:
            result = pattern.sub(replacement, result)

        # JWT detection with structure validation
        result = JWT_PATTERN.sub(
            lambda m: '[JWT]' if _is_jwt(m.group()) else m.group(),
            result,
        )

        return result

    def redact_message(self, message: dict) -> dict:
        content = message.get("content", "")
        if isinstance(content, str):
            return {**message, "content": self.redact(content)}
        elif isinstance(content, list):
            clean_blocks = []
            for block in content:
                if isinstance(block, dict) and block.get("type") == "text":
                    clean_blocks.append({**block, "text": self.redact(block["text"])})
                else:
                    clean_blocks.append(block)
            return {**message, "content": clean_blocks}
        return message

    def redact_messages(self, messages: list[dict]) -> list[dict]:
        return [self.redact_message(m) for m in messages]
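
The _is_jwt check above only confirms that the first two segments decode as base64url. A dotted identifier such as api_internal.example_host.company_net satisfies both the regex and a decodability check, so it can be flagged as a JWT. The following is a sketch of a stricter test, not the library's implementation: the hypothetical looks_like_jwt additionally requires the header to be a JSON object with an "alg" field, which real JWT headers carry.

```python
import base64
import json

def looks_like_jwt(token: str) -> bool:
    """Stricter check: the header must decode to a JSON object with an "alg" field."""
    parts = token.split('.')
    if len(parts) != 3:
        return False
    header = parts[0]
    try:
        padded = header + '=' * (-len(header) % 4)
        decoded = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # binascii.Error, json.JSONDecodeError and UnicodeDecodeError
        # are all subclasses of ValueError.
        return False
    return isinstance(decoded, dict) and 'alg' in decoded

is_token = looks_like_jwt("eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyMTIzIn0.SIGNATURE_PLACEHOLDER")
is_hostname = looks_like_jwt("api_internal.example_host.company_net")
```

The trade-off is an extra JSON parse per candidate in exchange for fewer false positives on dotted names, version strings, and file paths.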

When to Use It

Use it before writing to any log file, log aggregation service, or telemetry system. Users routinely paste credentials when asking for help. Those credentials must not persist in your logs.

Use it before sending data to any third-party service. If your agent calls a webhook or sends data to an external analytics platform, redact first.

Use it as a pre-send filter on model responses. If the model echoes a credential that appeared earlier in the context, redact it before the response reaches the user's browser.

Do not rely on it as your only defense for credential handling. Redaction is a safety net. The primary defense is to keep credentials away from the model in the first place. If users are sharing credentials with your agent, tell them in the interface not to.

Install

pip install git+https://github.com/MukundaKatta/llm-redact-secrets

# Or from PyPI
pip install llm-redact-secrets

import requests

from agent_step_log import StepLog
from llm_redact_secrets import SecretsRedact

WEBHOOK_URL = "https://example.com/webhook"  # placeholder; replace with your endpoint

redact = SecretsRedact()

# Redact messages before they are written to the step log
def log_conversation_safely(messages: list[dict], run_id: str, log: StepLog) -> None:
    clean_messages = redact.redact_messages(messages)
    log.record_messages(clean_messages, run_id=run_id)

# Filter outgoing webhooks
def send_to_webhook(payload: dict) -> None:
    # Redact top-level string values before sending to the external system.
    # Nested dicts and lists pass through unchanged here.
    clean_payload = {
        k: redact.redact(v) if isinstance(v, str) else v
        for k, v in payload.items()
    }
    requests.post(WEBHOOK_URL, json=clean_payload, timeout=10)

Sibling Libraries

Library	What it solves
`tool-secret-scrubber`	Strip secrets from tool call inputs/outputs specifically
`llm-pii-redact`	Redact user PII (email, phone, SSN, credit cards)
`agent-step-log`	Step log where redacted content should be written
`prompt-shield`	Detect prompt injection attempts (different threat)
`agent-guard-rails`	Composable output filters including secret detection

The data protection stack: llm-redact-secrets for credentials, llm-pii-redact for user PII, tool-secret-scrubber for tool call logs, agent-guard-rails for composable output filtering.

What's Next

High-entropy string detection: beyond pattern matching, flag strings with entropy above a threshold (e.g., base64-encoded random bytes that look like generic API tokens). This catches credential formats not covered by named patterns.
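
One possible shape for this, as a rough sketch rather than the planned implementation: compute Shannon entropy per character over long token-like runs and flag those above a threshold. The names shannon_entropy and flag_high_entropy, the 24-character minimum, and the 4.0-bit threshold are all illustrative assumptions.

```python
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from the character frequencies in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

# Long runs of characters that commonly appear in tokens (assumed alphabet).
CANDIDATE = re.compile(r'[A-Za-z0-9+/_-]{24,}')

def flag_high_entropy(text: str, threshold: float = 4.0) -> list[str]:
    """Return candidate runs whose per-character entropy meets the threshold."""
    return [
        m.group()
        for m in CANDIDATE.finditer(text)
        if shannon_entropy(m.group()) >= threshold
    ]

sample = "path aaaaaaaaaaaaaaaaaaaaaaaaaaaa token q7Zp2Lx9Vb4Nc8Rt1Yw6Hs3Kd5Mf0Gj"
flagged = flag_high_entropy(sample)
```

Entropy per character is bounded by log2 of the string length, so a threshold that works for 40-character tokens may never fire on 24-character ones. Any real threshold would need tuning against your own logs, since hashes, UUIDs, and minified identifiers also score high.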

Custom pattern registry: redact.add_pattern(regex, replacement) for domain-specific secrets (internal service tokens, proprietary key formats). The library covers common public API credentials; user organizations have their own formats.
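
Until such a registry exists, the same idea can be approximated alongside the library. The sketch below is a hypothetical standalone class, not the library's API; PatternRegistry and the acme_svc_ token format are invented for the example.

```python
import re
from typing import Union

class PatternRegistry:
    """Hypothetical registry of extra (regex, placeholder) pairs."""

    def __init__(self) -> None:
        self._patterns: list[tuple[re.Pattern, str]] = []

    def add_pattern(self, regex: Union[str, re.Pattern], replacement: str) -> None:
        compiled = re.compile(regex) if isinstance(regex, str) else regex
        self._patterns.append((compiled, replacement))

    def redact(self, text: str) -> str:
        for pattern, replacement in self._patterns:
            # A callable replacement inserts the placeholder literally,
            # with no backslash or group-reference expansion.
            text = pattern.sub(lambda _m, r=replacement: r, text)
        return text

registry = PatternRegistry()
# Invented internal token format: "acme_svc_" followed by 32 hex characters.
registry.add_pattern(r'\bacme_svc_[a-f0-9]{32}\b', '[ACME_SERVICE_TOKEN]')

cleaned = registry.redact("token acme_svc_" + "0123456789abcdef" * 2)
```

Running a registry like this after the built-in redact() keeps organization-specific formats out of the shared pattern list.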

Audit log: redact.audit(text) that returns what was found and replaced without modifying the text. Useful for reviewing what the redactor catches before applying it to production, and for generating a redaction report for compliance documentation.
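
A sketch of what an audit pass could return, under the assumption that findings are reported as a label plus character offsets. Finding, AUDIT_PATTERNS, and this audit function are hypothetical; only two of the patterns from earlier are reused to keep the example short.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    label: str   # placeholder that would replace the match
    start: int   # offset of the first matched character
    end: int     # offset one past the last matched character

AUDIT_PATTERNS = [
    (re.compile(r'\bghp_[A-Za-z0-9]{36}\b'), '[GITHUB_PAT]'),
    (re.compile(r'\bAKIA[0-9A-Z]{16}\b'), '[AWS_ACCESS_KEY]'),
]

def audit(text: str) -> list[Finding]:
    """Report what would be redacted, in document order, without changing text."""
    findings = [
        Finding(label, m.start(), m.end())
        for pattern, label in AUDIT_PATTERNS
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)

sample = "key AKIAABCDEFGHIJKLMNOP and ghp_" + "a" * 36
report = audit(sample)
```

Each finding stores offsets rather than the matched string, so the report itself can be stored or shared without containing the secret.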

Built as part of the agent-stack family: composable Python primitives for production LLM agents.