Mukunda Rao Katta

Posted on May 25

Your Hermes agent's audit log is leaking customer emails. Here's a 100-line lib that fixes that.

#devchallenge #hermesagentchallenge #agents #security

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge.

I built a Hermes agent last week that takes a customer support email, decides whether it needs a refund, and either issues one or escalates to a human. Standard stuff. The agent worked. The problem started the moment I turned on audit logging.

Every run wrote a JSONL row to disk. Every row contained the full inbound message, the tool calls, the tool outputs, and the final reply. Within an hour the log had:

41 customer email addresses
7 partial credit card numbers (people paste them into support tickets, then apologize)
1 JWT from a webhook payload my agent decoded
1 leaked Stripe test key from a vendor reply
12 phone numbers
3 internal ticket IDs that should not have left the system

I was about to ship that log to S3 for run-history search. The log was also being mirrored to Sentry on error, and to a Slack channel on escalation. Three places to leak from. Zero scrubbing.

I went looking for a small lib that would clean a string before I wrote it. The options were either a paid API, a heavy NER-based PII detector, or a hand-rolled regex I would have to maintain myself. None of those fit a 200-line agent script.

So I built one. It is called agent-redact. The whole thing is around 130 lines, zero runtime dependencies, and pip-installs as agent-redact. Repo: MukundaKatta/agent-redact.

What it looks like

from agent_redact import redact

audit_line = (
    "tool=charge_customer args={'email': 'jane.doe@acme.com', "
    "'card': '4111 1111 1111 1111'} key=sk-" + "Z" * 40
)
print(redact(audit_line))

Output:

tool=charge_customer args={'email': '<email>', 'card': '<credit-card>'} key=<openai-key>

That is the default mode. One function call, one line of output. The pattern set covers email, US SSN, phone numbers, credit cards (with optional Luhn), and the common provider keys (OpenAI, Anthropic, AWS, GitHub, Google, Stripe, Slack), plus JWTs. No model call, no network, no config file.

Hash mode, when you need to keep rows distinguishable

The bigger pain with naive redaction is that you lose all join keys. If jane.doe@acme.com shows up in 30 audit rows, replacing every one with <email> means you can no longer ask "how many runs did this user trigger today" without going back to raw logs.

agent-redact ships a hash mode for exactly that case:

redact("user jane.doe@acme.com retried 3 times", mode="hash", salt="rotate-monthly")
# -> "user <email:7c3a91> retried 3 times"

Same email, same salt, same six-char tag every time. Different user, different tag. You can group, count, and filter on those tags without ever seeing the underlying address. Rotate the salt monthly and the tags rotate too.

Where this fits in the rest of the stack

This is the seventh small Python lib I have shipped in the same "boring middleware for agents" family. The others compose with it directly:

agenttrace writes per-run JSONL with token counts and latency. Pipe that through agent-redact before storage. There is a 30-line example in examples/integrate_with_agenttrace.py that walks rows recursively and scrubs every string node.
agentleash writes an audit log proving an agent stayed under a USD cap. Same scrubber, same hash mode, and now the proof you keep around does not double as a PII spill.
birddog is a scraping middleware. If the scraped page is going to a downstream LLM, run redact() on the page body first so the model never sees the raw payload.

Three integration points, one function, no extra deps.

Design notes worth calling out

Two things are worth flagging if you read the source.

First, overlap resolution. The Anthropic sk-ant- prefix is a strict superset of the OpenAI sk- prefix. Naive iteration over patterns would wrap a key twice, or wrap the wrong label. The fix: collect all matches across all patterns, sort by (earliest start, longest length), then walk forward and skip anything that starts before the previous match ended. Provider keys are listed before generic shapes so the tie-break goes the right way.

Second, the phone pattern needed a separator. The first version matched any 13-digit run, which meant a 16-digit credit card got partially eaten by the phone rule before the card rule ran. Requiring at least one space, dash, or + prefix in the phone pattern fixed that without losing real phone hits.

Try it

pip install agent-redact

from agent_redact import redact
print(redact("contact me at jane@example.com"))

Repo with all the patterns and tests: github.com/MukundaKatta/agent-redact.

If you are building a Hermes agent that touches user input, take 10 minutes this weekend and wrap your audit writer. Future-you, the one explaining the S3 bucket to your security team, will thank you.

Top comments (2)

nujovich • May 25

I realized I built a cron job that’s leaking early adopters emails for my services 🫣. Great contribution !

Mukunda Rao Katta • May 29

Ha, that's exactly the moment that made me write it. Easy to ship a cron job that quietly logs more than you meant to. Drop the redaction in front of the log writer and you're covered.