The conversation you saved to disk still has the SSN in it

#hermeschallenge #ai #python #agents

How the audit finding landed

The support team built an AI agent to handle billing questions. The agent was good. Customers liked it. The team saved full conversation histories to disk so they could replay failures and improve the prompts.

A security audit came back six weeks later.

Every JSONL file on the server contained raw customer data. SSNs. Credit card numbers. Account PINs spoken in natural language. The conversations were just plain text in a file, one JSON object per line, no redaction, no encryption. The agent had been faithfully recording exactly what customers said.

Nobody had thought about it. The code that wrote the conversations to disk called json.dumps in a loop. It had no hook to inspect or transform the messages before writing. Adding one meant touching every place in the codebase that persisted conversations, and there were more of those places than anyone expected.

conversation-codec is a library that handles one narrow job: serialize and deserialize LLM message lists to JSONL, with a built-in slot for write-time redaction and optional encryption at rest.

The shape of the fix

from conversation_codec import ConversationCodec
from llm_pii_redact import PiiRedactor

redactor = PiiRedactor()

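def redact_message(msg: dict) -> dict:
    # Only user messages are redacted here; other roles pass through unchanged.
    if msg.get("role") == "user":
        msg = dict(msg)  # shallow copy so the caller's message is not mutated
        msg["content"] = redactor.redact(msg["content"])
    return msg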

codec = ConversationCodec(
    path="conversations/session-1234.jsonl",
    redact=redact_message,
)

messages = [
    {"role": "user", "content": "My SSN is 523-45-6789, please help with my account."},
    {"role": "assistant", "content": "I can help with that. Can you verify your account number?"},
    {"role": "user", "content": "Sure, it is 00112233."},
]

codec.write(messages)

loaded = codec.read()
print(loaded[0]["content"])
# "My SSN is [SSN], please help with my account."

The SSN is not on disk. It was never written: the redact callable ran before the file was touched.

If you need encryption, you can add it too

from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store this safely, not on disk next to the data

codec = ConversationCodec(
    path="conversations/session-1234.jsonl",
    redact=redact_message,
    fernet_key=key,
)

codec.write(messages)
# the file on disk is encrypted ciphertext

loaded = codec.read()
# transparent decryption on read

Redaction and encryption compose. The redact callable runs first, so the encrypted file contains the masked version, not the original.

Install

pip install conversation-codec

The core library has no dependencies outside the standard library. Fernet encryption requires the cryptography package, which is an optional install.

pip install conversation-codec cryptography

What it does NOT do

It does not decide what counts as PII. That is your job or llm-pii-redact's job.
It does not manage conversation state in memory. It reads and writes. You own the list between calls.
It does not do streaming or partial writes. One write call serializes the full message list.
It does not provide key management for Fernet. You supply the key and you are responsible for keeping it safe.
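
Since key handling is left to the caller, one common pattern is to read the key from an environment variable and check its shape before handing it to the codec. The sketch below is an illustration, not part of conversation-codec: the helper name load_fernet_key and the variable name CONVERSATION_FERNET_KEY are assumptions. It relies only on the fact that a Fernet key is 32 bytes, url-safe base64 encoded.

```python
import base64
import binascii
import os


def load_fernet_key(env_var: str = "CONVERSATION_FERNET_KEY") -> bytes:
    """Read a Fernet key from the environment and check its shape.

    The variable name is an assumption made for this sketch.
    """
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set")
    try:
        decoded = base64.urlsafe_b64decode(raw.encode("ascii"))
    except (binascii.Error, UnicodeEncodeError) as exc:
        raise RuntimeError(f"{env_var} is not valid url-safe base64") from exc
    if len(decoded) != 32:
        raise RuntimeError(f"{env_var} must decode to 32 bytes, got {len(decoded)}")
    return raw.encode("ascii")  # still base64 encoded, as Fernet expects
```

The result can then be passed as fernet_key=load_fernet_key() in the constructor call shown earlier, which keeps the key out of the source tree and away from the data directory.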

Inside the lib: write-time redaction

The design choice that matters here is when redaction runs.

Redaction is often approached as a read-time concern: store the raw data and apply a filter when you retrieve it. This is convenient because you can change the filter later without re-processing files.

Write-time redaction is the opposite choice. The unredacted data never touches the file. The filter runs before serialization. You lose the ability to re-derive the original from the stored copy, and that is the whole point.

If your redaction function replaces a user's email address with [EMAIL], the stored file contains [EMAIL]. A breach of the stored file does not expose the real address. A breach of the unredacted in-memory conversation is a different threat model and requires different controls.
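
To make the ordering concrete, here is a minimal standard-library sketch of write-time redaction. It is not the library's implementation; write_redacted and mask_ssn are hypothetical names, and the SSN regex is a stand-in for real PII detection.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Callable

# Stand-in pattern for this sketch only; real PII detection needs far more.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssn(msg: dict) -> dict:
    """Return a copy of the message with SSN-shaped strings masked."""
    masked = dict(msg)
    masked["content"] = SSN_PATTERN.sub("[SSN]", msg["content"])
    return masked


def write_redacted(path: Path, messages: list[dict],
                   redact: Callable[[dict], dict]) -> None:
    """Apply redact to every message, then write one JSON object per line."""
    lines = [json.dumps(redact(msg)) for msg in messages]  # filter runs first
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


messages = [{"role": "user", "content": "My SSN is 523-45-6789."}]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "session.jsonl"
    write_redacted(target, messages, mask_ssn)
    on_disk = target.read_text(encoding="utf-8")

print("523-45-6789" in on_disk)  # False: the raw value was never written
print(messages[0]["content"])    # My SSN is 523-45-6789.
```

The in-memory list still holds the original, which is the separate threat model mentioned above. Only the serialized copy is masked.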

The trade-off is irreversibility. Once written, the original is gone from that file. If your redaction logic had a bug and missed a field, you can fix the logic so that future conversations are handled correctly, but files already written stay as they are.
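
Because a fix to the redaction function does not touch files that already exist, it can help to scan them for whatever the buggy version let through. The sketch below assumes unencrypted JSONL files with a string content field; find_leaks and the pattern are hypothetical and only illustrate the idea.

```python
import json
import re
import tempfile
from pathlib import Path

# Hypothetical pattern: whatever the buggy redactor turned out to miss.
MISSED_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_leaks(directory: Path, pattern: re.Pattern) -> list[tuple[str, int]]:
    """Return (file name, 1-based line number) for each stored message that matches."""
    hits = []
    for path in sorted(directory.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if not line.strip():
                    continue  # skip blank lines
                content = json.loads(line).get("content", "")
                if isinstance(content, str) and pattern.search(content):
                    hits.append((path.name, lineno))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jsonl").write_text(
        json.dumps({"role": "user", "content": "My SSN is [SSN]."}) + "\n"
        + json.dumps({"role": "user", "content": "Spouse SSN 523-45-6789."}) + "\n",
        encoding="utf-8",
    )
    leaks = find_leaks(root, MISSED_PATTERN)

print(leaks)  # [('a.jsonl', 2)]
```

What to do with a hit (rewrite the line, quarantine the file, delete it) is a policy decision the scan does not make for you.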

For compliance use cases, irreversibility is often the feature, not the limitation. HIPAA, PCI-DSS, and GDPR generally care about what is on disk and what is in your logs. "We never stored the raw value" is a cleaner answer than "we stored it but filtered it at query time."

When this is useful

Support agents and chat assistants that log conversations for replay or audit. Any agent that takes free-form user input, which may contain PII by accident, and needs a durable record.

Fine-tuning data pipelines where you want masked versions for training and cannot include real customer data in training sets.

Debugging sessions where you replay a conversation from a file. If the file was written with redaction, replaying it gives you the same masked view a reviewer would see, not the raw user input.

Multi-turn agents that checkpoint conversation state to disk. If the agent crashes and restarts, it reads back the masked conversation and continues. The checkpoint does not become a liability.

When NOT to use it

If you need the original content to be recoverable from disk, write-time redaction is the wrong tool. You would need to store a separately keyed copy of the original alongside the masked version, which is a different system entirely.

If your redaction requirements change frequently and you need to re-apply updated rules to historical data, storing raw content with a read-time filter gives you more flexibility. That choice comes with the audit risk described above.
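
For contrast, a read-time filter might look like the sketch below. The helper read_filtered and both rules are hypothetical. Changing the rule needs no file rewrite, but the raw value is still sitting in the file.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Callable

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # stand-in rule for the sketch


def read_filtered(path: Path, redact: Callable[[str], str]) -> list[dict]:
    """Load raw JSONL and apply the current redaction rule to each message."""
    messages = []
    for line in path.read_text(encoding="utf-8").splitlines():
        msg = json.loads(line)
        msg["content"] = redact(msg["content"])
        messages.append(msg)
    return messages


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "raw.jsonl"
    raw = {"role": "user", "content": "SSN 523-45-6789, PIN 4821"}
    path.write_text(json.dumps(raw) + "\n", encoding="utf-8")  # raw value hits disk

    v1 = read_filtered(path, lambda s: SSN_PATTERN.sub("[SSN]", s))
    # A later rule also masks PINs; the stored file is not rewritten.
    v2 = read_filtered(
        path, lambda s: re.sub(r"PIN \d{4}", "PIN [PIN]", SSN_PATTERN.sub("[SSN]", s))
    )
    still_on_disk = "523-45-6789" in path.read_text(encoding="utf-8")

print(v1[0]["content"])  # SSN [SSN], PIN 4821
print(v2[0]["content"])  # SSN [SSN], PIN [PIN]
print(still_on_disk)     # True
```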

If you are building a system where the LLM itself needs to see the original PII to do its job (for example, a medical records tool where the model needs the actual diagnosis codes), the decision that matters is what reaches the model at inference time, not what gets persisted. conversation-codec only acts at the persistence boundary, so it does not answer that question.

Siblings

Lib	Boundary	Repo
`llm-pii-redact`	the redact callable you pass to conversation-codec	MukundaKatta/llm-pii-redact
`agent-message-window`	trim the conversation before persisting	MukundaKatta/agent-message-window
`agent-resume`	persist checkpoints alongside conversations	MukundaKatta/agent-resume
`agenttap`	captures raw wire-level prompts, different from message history	MukundaKatta/agenttap

llm-pii-redact and conversation-codec are designed to be used together. llm-pii-redact handles the detection and replacement logic. conversation-codec handles the file I/O and calls your redact function at the right moment. Neither knows about the other. You wire them up in the constructor call.

agent-message-window is a different concern. It trims old messages to keep the conversation within token limits. If you trim and then persist, you get a shorter file. If you persist and then trim for inference, the full history is on disk but the model only sees a window. These are different choices about what goes to the model versus what gets stored.
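
The two orderings can be sketched with a toy trimming rule. The function last_n below is a hypothetical stand-in and does not reflect agent-message-window's actual API; to_jsonl is likewise just a plain serializer for the illustration.

```python
import json


def last_n(messages: list[dict], n: int) -> list[dict]:
    """Hypothetical trimming rule: keep only the most recent n messages."""
    return messages[-n:] if n > 0 else []


def to_jsonl(messages: list[dict]) -> str:
    """Serialize a message list as one JSON object per line."""
    return "".join(json.dumps(m) + "\n" for m in messages)


history = [{"role": "user", "content": f"turn {i}"} for i in range(6)]

# Option 1: trim, then persist. Disk holds only the window.
stored_trimmed = to_jsonl(last_n(history, 2))

# Option 2: persist, then trim for inference. Disk holds everything.
stored_full = to_jsonl(history)
model_input = last_n(history, 2)

print(len(stored_trimmed.splitlines()))  # 2
print(len(stored_full.splitlines()))     # 6
print(len(model_input))                  # 2
```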

agent-resume checkpoints job progress, not message content. It is useful when an agent processes a large batch and needs to survive a restart without reprocessing completed items. If the job involves LLM conversations, you might use both: agent-resume for the item-level checkpoint, conversation-codec for the per-item conversation history.

agenttap plugs into the HTTP transport layer and captures the exact JSON sent to the provider. That includes system prompts, tool schemas, and the full message array as it appears on the wire. conversation-codec captures the message list as your application code assembled it, before it reaches the SDK. They are different observation points on the same data flow.

What's next

The current redact callable signature is (dict) -> dict, one message at a time. A future version could accept a batch callable (list[dict]) -> list[dict] for cases where cross-message context matters for redaction (for example, a name introduced in message 1 that appears again in message 7 without the surrounding context that makes it obviously PII).
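
As a rough idea of what a batch callable could do, the sketch below makes two passes: it first collects names that some message introduces explicitly, then masks those names in every message. This is not a planned API; redact_batch and the "my name is" rule are assumptions made purely for illustration.

```python
import re

# Illustrative rule: a name is "obviously PII" when introduced like this.
NAME_INTRO = re.compile(r"\bmy name is ([A-Z][a-z]+(?: [A-Z][a-z]+)?)")


def redact_batch(messages: list[dict]) -> list[dict]:
    """Mask every occurrence of names that any message introduced explicitly."""
    # Pass 1: collect names from the messages where context makes them obvious.
    names = set()
    for msg in messages:
        names.update(NAME_INTRO.findall(msg["content"]))
    if not names:
        return [dict(msg) for msg in messages]

    # Pass 2: mask those names everywhere, longest first to avoid partial hits.
    alternation = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b")
    return [{**msg, "content": pattern.sub("[NAME]", msg["content"])} for msg in messages]


conversation = [
    {"role": "user", "content": "Hi, my name is Dana Whitfield."},
    {"role": "assistant", "content": "Thanks. What can I help with?"},
    {"role": "user", "content": "Please mail the refund to Dana Whitfield."},
]

for msg in redact_batch(conversation):
    print(msg["content"])
# Hi, my name is [NAME].
# Thanks. What can I help with?
# Please mail the refund to [NAME].
```

A per-message callable would mask the first message at best and leave the third untouched, because nothing in the third message marks the name as PII.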

Append-mode writes would help long-running agents that want to flush each turn to disk incrementally rather than rewriting the full history. Right now, write is a full overwrite.
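
An append-mode write is straightforward for unencrypted JSONL, since each message is already its own line. The sketch below is a standard-library illustration with a hypothetical append_turn helper, not a preview of the library's API. A redact callable would be applied to msg before the call.

```python
import json
import tempfile
from pathlib import Path


def append_turn(path: Path, msg: dict) -> None:
    """Append one message as a single JSONL line without rewriting the file."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(msg) + "\n")
        fh.flush()  # push the line out of Python's buffer after each turn


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "session.jsonl"
    append_turn(log, {"role": "user", "content": "Where is my invoice?"})
    append_turn(log, {"role": "assistant", "content": "Sending it now."})
    stored = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines()]

print(len(stored))        # 2
print(stored[1]["role"])  # assistant
```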

Rotation support, where files roll over after a size or message-count threshold, would be useful for agents that run indefinitely and accumulate large histories.
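
Rotation by message count could be layered on top of appends roughly as follows. RotatingWriter, its parameters, and the file naming scheme are all hypothetical; a size-based threshold would check the file size instead of a counter.

```python
import json
import tempfile
from pathlib import Path


class RotatingWriter:
    """Append messages to numbered JSONL files, rolling over at max_messages."""

    def __init__(self, directory: Path, stem: str, max_messages: int) -> None:
        self.directory = directory
        self.stem = stem
        self.max_messages = max_messages
        self.index = 0  # suffix of the file currently being written
        self.count = 0  # messages already written to that file

    def _current_path(self) -> Path:
        return self.directory / f"{self.stem}-{self.index:04d}.jsonl"

    def append(self, msg: dict) -> None:
        if self.count >= self.max_messages:
            self.index += 1  # threshold reached: start the next file
            self.count = 0
        with self._current_path().open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(msg) + "\n")
        self.count += 1


with tempfile.TemporaryDirectory() as tmp:
    writer = RotatingWriter(Path(tmp), "session-1234", max_messages=2)
    for i in range(5):
        writer.append({"role": "user", "content": f"turn {i}"})
    files = sorted(p.name for p in Path(tmp).glob("*.jsonl"))

print(files)
# ['session-1234-0000.jsonl', 'session-1234-0001.jsonl', 'session-1234-0002.jsonl']
```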

GitHub: MukundaKatta/conversation-codec