Alex Spinov

Posted on Jun 23 • Originally published at blog.spinov.online

Your AI Agent Logged Its Own API Key. I Wrote the 40-Line Redactor.

#ai #agents #security #llm

The model never said your key out loud. Your own tracing did. The agent made a tool call, the framework logged the call with its arguments, and one of those arguments was the API key it used to authenticate. That line went to stdout, then to your log shipper, then to whatever third party stores your observability data. In plaintext. Nobody saw the model leak anything, because the model didn't. The plumbing did.

Agents log their tool calls, arguments included. One of those arguments is your API key, your bearer token, or a database password in a connection string. Your log pipeline forwards it to a vendor in plaintext. A redaction layer at the log boundary catches it before it leaves the process: regex for known key shapes, a Shannon-entropy net for the unknown ones. Below is the whole thing in ~40 lines of stdlib Python, with a deterministic run and an honest list of what it misses.

This isn't an attack. It's your own logging.

Most agent-security writing is about the adversary. A scraped page that hides a command (the page told it what to do). A tool that quietly changed its schema. An SSRF probe at fetch time (blocked at the web-fetch boundary). In every one of those, something outside is trying to get in.

This is the opposite direction. Nothing malicious happened. The input was fine. The agent did its job. And a credential walked out the side door, into your telemetry, because the most boring line of code in the stack, the one that writes a debug log, copied the tool-call arguments verbatim. The leak isn't inbound. It's sideways, and you handed it to a vendor yourself.

That's the part that took me a while to take seriously. A leak you can see in your own dashboard feels like a leak. A key sitting in a log line you forwarded to a SaaS feels like nothing, because the dashboard is green and the request succeeded. But the key is now in someone else's storage, indexed, retained for whatever their policy says, readable by whoever has access to that account. CWE has an entry for it: CWE-532, insertion of sensitive information into a log file. It's old. It predates agents by twenty years. Agents just made it routine, because agents log everything, and one of the things they pass around constantly is auth.

Why agents make this worse than a normal app

A normal service authenticates once at the edge and never logs the token again. An agent is different. It's a loop that calls tools, and a lot of those tools need their own credentials: the LLM API key, a search key, a database DSN, a GitHub token to clone something, a Slack token to post a result. Every call is a candidate log line. Every log line is a candidate leak.

Here's the concrete version from my own setup. I run scrapers in production: 2,190 runs across 32 published actors, the busiest being a Trustpilot review scraper at 962 runs, plus an email extractor at 138 and a Reddit scraper at 92. Every single one of those runs writes its invocation (the actor, the input, the call) to a log. I'm not claiming a key leaked in mine; my inputs don't carry one, and I'd tell you if they did. The point is structural. That's thousands of log records, each one written by code that does not know or care whether an argument happens to be a secret. The day one of those arguments is a secret, the logger ships it without blinking. The volume is what makes "it'll probably be fine" a bad bet. At scale, "probably" is a leak you've already shipped.

And the framework defaults are against you. Most agent and tracing libraries log tool inputs by default, because that's what you want for debugging. The same feature that lets you replay a failed run is the feature that writes your key to disk. Nobody designed it to leak. It leaks as a side effect of being useful.

The fix: redact at the boundary, not at the source

You could try to scrub secrets at the source: never pass the key as a tool argument, pull it from the environment inside the tool, keep it out of the trace. Do that where you can. But you won't catch all of them, because you don't control every tool, every library, every future code path that logs something new. So you put a second line of defense exactly where the data leaves your control: the log boundary. One function, every line passes through it, before the line is written or shipped.
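For the scrub-at-source half, here is a minimal sketch of what "pull it from the environment inside the tool" can look like. The `traced` decorator, the `TRACE` list, and the `SEARCH_API_KEY` variable name are all hypothetical stand-ins, not any framework's real API; the point is only that a credential resolved inside the tool body never shows up in the recorded arguments.

```python
import os

TRACE = []  # hypothetical stand-in for whatever your framework records per call


def traced(fn):
    """Hypothetical tracing decorator: records the tool name and its arguments."""
    def wrapper(*args, **kwargs):
        TRACE.append({"tool": fn.__name__, "args": args, "kwargs": kwargs})
        return fn(*args, **kwargs)
    return wrapper


@traced
def search(query):
    # The credential is looked up inside the tool, so the tracer never sees it.
    api_key = os.environ.get("SEARCH_API_KEY", "")
    return {"query": query, "authenticated": bool(api_key)}


os.environ["SEARCH_API_KEY"] = "not-a-real-key"  # demo value only
result = search("trustpilot reviews")
print(TRACE)   # the trace holds the query, not the key
print(result)
```

This only covers tools you wrote yourself, which is why the boundary redactor still has a job to do.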

The redactor does two passes.

Pass one: known shapes. Most high-value secrets have a published prefix. OpenAI keys start with sk-. AWS access key IDs start with AKIA. GitHub classic personal access tokens start with ghp_. GitHub's own docs say "GitHub issues tokens that begin with a prefix to indicate the token's type." Slack bot tokens start with xoxb-. JWTs start with eyJ and have three dot-separated parts. Basic-auth credentials sit between :// and @ in a connection string. Each of those is a tight regex, and each gets a typed mask so you can still tell from the log what kind of secret was there without seeing it.

Pass two: the entropy net. Regex only catches shapes you anticipated. Your vendor's session token, your internal service key, the next API that ships with a brand-new prefix: none of those are in your list. So for anything that looks like a long token, you compute its Shannon entropy in bits per character. Random-looking strings score high. English words and slugs score low. If a token clears the threshold, mask it. This is not my invention: the Yelp/detect-secrets library ships exactly this idea, with Base64HighEntropyString and HexHighEntropyString plugins whose default limits are 4.5 and 3.0 bits per character respectively. It's a recognized industry heuristic, not a clever trick I made up.

That's the whole design. Known shapes for precision, entropy for recall on the unknown. Now the code.

The demo

stdlib only: re and math. No network, no randomness, no clock, no os.environ, no subprocess. The fixtures are ten synthetic log lines, hardcoded. Six carry a known-format secret, one carries an unknown-format token, and three are innocent decoys I put there on purpose to see what the redactor wrongly grabs. These are not a real log dump and not real credentials; the sk- / AKIA / ghp_ strings are invented to match the published prefix shapes. Because there's no randomness or clock, the output is byte-identical every run, which is what lets me pin an MD5 on it. It runs locally. The redact function lifts straight into a real logging filter.

"""
secret_redactor.py — a deterministic stdlib-only secret redactor for log lines.

What this is: a ~40-line redaction layer you call at the log boundary, BEFORE a
line is written or shipped to a third-party observability vendor. Two passes per
line. (1) Regex for known key shapes (OpenAI sk-, AWS AKIA, GitHub ghp_, Slack
xoxb-, a JWT, basic-auth creds in a URL). (2) A Shannon-entropy net for the
unknown ones: a long token whose bits/char is high enough to look random gets
masked as [REDACTED:high-entropy]. Known formats get a typed label.

What this is NOT: a DLP engine, and not a guarantee. The entropy net has a
threshold, so it is a FLOOR, not a ceiling. Named failure modes, all shown live
in the output below:
  (a) low-entropy real secrets (a short or dictionary password) are caught ONLY
      if a regex matches their shape; entropy will not flag "hunter2", and the
      repeated 'deadbeef...' decoy proves the net stays quiet on low entropy.
  (b) high-entropy legit identifiers FALSE-POSITIVE: the slug
      "trustpilot-review-scraper-v2-prod" scores just over the threshold and gets
      masked even though it is not a secret. That is in the output on purpose.
  (c) a secret split across fields or encoded oddly slips past both passes.
The entropy threshold is the open question; it depends on your key formats.

Fixtures (the log lines) are SYNTHETIC and hardcoded. These are not a real log
dump and not real credentials — the sk-/AKIA/ghp_ strings are invented to match
the published prefix shapes. stdlib only (re, math). No network, no randomness,
no clock, no subprocess, no os.environ. Deterministic: same stdout every run,
so MD5(stdout) is stable for an integrity gate.

Run:  python3 -I secret_redactor.py
"""

import re
import math

# --- The redactor: known-shape regex + an entropy net. The whole thing. ---

# Pass 1: known key shapes. Anchored enough to be specific, one typed label each.
KNOWN = [
    ("openai_key",  re.compile(r"sk-[A-Za-z0-9]{20,}")),
    ("aws_key_id",  re.compile(r"AKIA[0-9A-Z]{16}")),
    ("github_pat",  re.compile(r"ghp_[A-Za-z0-9]{36}")),
    ("slack_token", re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}")),
    ("jwt",         re.compile(r"eyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}")),
    ("basic_auth",  re.compile(r"(?<=://)[^/\s:@]+:[^/\s:@]+(?=@)")),
]

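# Pass 2: entropy net. A token is a long run of base64/hex-ish characters.
# Note: "=" is in the class for base64 padding, so an unquoted key=value run
# such as tool_call=slack_post is scored as one candidate token.
TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")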
ENTROPY_BITS = 3.6   # bits/char threshold. THE open question — tune per corpus.


def shannon_bits_per_char(s):
    """Shannon entropy of a string in bits per character. Pure function."""
    if not s:
        return 0.0
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def redact(line):
    """Mask known-format keys, then high-entropy tokens. Returns (line, hits)."""
    hits = []
    for label, rx in KNOWN:
        if rx.search(line):
            hits.append(label)
            line = rx.sub(f"[REDACTED:{label}]", line)

    def _net(m):
        tok = m.group(0)
        if shannon_bits_per_char(tok) >= ENTROPY_BITS:
            hits.append("high-entropy")
            return "[REDACTED:high-entropy]"
        return tok

    line = TOKEN.sub(_net, line)
    return line, hits


# --- SYNTHETIC fixtures: tool-call log lines an agent would write ---
# Each is a line your tracing layer would forward to an obs vendor verbatim.
LOG_LINES = [
    # secrets of known shape (regex pass)
    'tool_call=openai args={"model":"gpt-4o","api_key":"sk-Hd83kfJ20alsKDi39fKDoeQ1xZ77bQp"}',
    'tool_call=s3_put env={"AWS_ACCESS_KEY_ID":"AKIA7Q2KX9PLMNOP4RST"}',
    'tool_call=gh_clone token=ghp_a1B2c3D4e5F6g7H8i9J0k1L2m3N4o5P6q7R8 header set',
    'tool_call=slack_post auth=xoxb-2148-99dummytokendummytoken99-abcDEF channel=ops',
    'tool_call=fetch auth_bearer=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJVabc',
    'tool_call=db_connect dsn=postgres://admin:S3cr3tPass9@db.internal:5432/app',
    # secret of UNKNOWN shape, only entropy can catch it
    'tool_call=vendor sess="f4Kd0Lm9Qz7Rb2Vt6Xp1Hn3Jw8Yc5Ae0Sg4Uo"',
    # innocent line that stays clean — short words, low entropy
    'tool_call=log msg="retry budget exhausted after 5 steps, stopping" runs=2190',
    # FALSE-POSITIVE decoy: a legit slug that scores just over the threshold
    'tool_call=scrape actor="trustpilot-review-scraper-v2-prod" status=ok',
    # low-entropy decoy: repeated hex stays untouched (and a real low-entropy
    # secret would slip the SAME way — that's the floor, shown honestly)
    'tool_call=parse digest="deadbeefdeadbeefdeadbeefdeadbeef" pages=962',
]


def naive_logger(lines):
    print("NAIVE LOGGER  (writes tool-call args as-is, ships to the vendor)")
    leaked = 0
    for ln in lines:
        _, hits = redact(ln)
        if any(h != "high-entropy" for h in hits):
            leaked += 1
        print("  WROTE:", ln)
    print(f"  -> {leaked} of {len(lines)} lines carried a known-format secret "
          f"to egress (leaked: {leaked})")
    return leaked


def redacted_logger(lines):
    print("REDACTED LOGGER  (regex known formats + Shannon-entropy net, "
          f"threshold {ENTROPY_BITS} bits/char, synthetic fixtures)")
    by_regex, by_entropy, untouched = 0, 0, 0
    for ln in lines:
        out, hits = redact(ln)
        by_regex += sum(1 for h in hits if h != "high-entropy")
        by_entropy += sum(1 for h in hits if h == "high-entropy")
        if not hits:
            untouched += 1
        tag = "(masked)" if hits else "(clean) "
        print(f"  WROTE {tag}:", out)
    print(f"  -> known-format leaked: 0 | masked by regex: {by_regex} | "
          f"masked by entropy net: {by_entropy} | lines left clean: {untouched}")
    return {"by_regex": by_regex, "by_entropy": by_entropy, "untouched": untouched}


def main():
    naive_leaked = naive_logger(LOG_LINES)
    print()
    res = redacted_logger(LOG_LINES)
    print()
    slug_ent = shannon_bits_per_char("trustpilot-review-scraper-v2-prod")
    dead_ent = shannon_bits_per_char("deadbeef" * 4)
    print("WHAT THE REDACTOR DID (the good, then the floor):")
    print("  caught by shape: openai sk-, aws AKIA, github ghp_, slack xoxb-,")
    print("          a JWT, and basic-auth creds inside a postgres:// dsn.")
    print("  caught by entropy: a vendor session token of UNKNOWN format that no")
    print("          regex knew, masked because it looks random.")
    print("  FALSE POSITIVE: the legit slug 'trustpilot-review-scraper-v2-prod'")
    print(f"          scores {slug_ent:.2f} bits/char (>= {ENTROPY_BITS}) and got masked. it is")
    print("          NOT a secret. lower the threshold and more of these slip in.")
    print("  FLOOR MISS: the repeated 'deadbeef...' digest is low entropy")
    print(f"          ({dead_ent:.2f} bits/char) so the net leaves it alone — correct here,")
    print("          but a low-entropy REAL secret would slip the exact same way.")
    print("  not a DLP engine. tune ENTROPY_BITS per corpus. floor, not ceiling.")
    print()
    print("SUMMARY:", {"naive_known_format_leaked": naive_leaked,
                       "redacted_known_format_leaked": 0,
                       "masked_by_regex": res["by_regex"],
                       "masked_by_entropy": res["by_entropy"],
                       "false_positive_on_slug": 1})


if __name__ == "__main__":
    main()

Run it with python3 -I secret_redactor.py and you get this, byte for byte:

NAIVE LOGGER  (writes tool-call args as-is, ships to the vendor)
  WROTE: tool_call=openai args={"model":"gpt-4o","api_key":"sk-Hd83kfJ20alsKDi39fKDoeQ1xZ77bQp"}
  WROTE: tool_call=s3_put env={"AWS_ACCESS_KEY_ID":"AKIA7Q2KX9PLMNOP4RST"}
  WROTE: tool_call=gh_clone token=ghp_a1B2c3D4e5F6g7H8i9J0k1L2m3N4o5P6q7R8 header set
  WROTE: tool_call=slack_post auth=xoxb-2148-99dummytokendummytoken99-abcDEF channel=ops
  WROTE: tool_call=fetch auth_bearer=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJVabc
  WROTE: tool_call=db_connect dsn=postgres://admin:S3cr3tPass9@db.internal:5432/app
  WROTE: tool_call=vendor sess="f4Kd0Lm9Qz7Rb2Vt6Xp1Hn3Jw8Yc5Ae0Sg4Uo"
  WROTE: tool_call=log msg="retry budget exhausted after 5 steps, stopping" runs=2190
  WROTE: tool_call=scrape actor="trustpilot-review-scraper-v2-prod" status=ok
  WROTE: tool_call=parse digest="deadbeefdeadbeefdeadbeefdeadbeef" pages=962
  -> 6 of 10 lines carried a known-format secret to egress (leaked: 6)

REDACTED LOGGER  (regex known formats + Shannon-entropy net, threshold 3.6 bits/char, synthetic fixtures)
  WROTE (masked): tool_call=openai args={"model":"gpt-4o","api_key":"[REDACTED:openai_key]"}
  WROTE (masked): tool_call=s3_put env={"AWS_ACCESS_KEY_ID":"[REDACTED:aws_key_id]"}
  WROTE (masked): tool_call=gh_clone token=[REDACTED:github_pat] header set
  WROTE (masked): tool_call=slack_post auth=[REDACTED:slack_token] channel=ops
  WROTE (masked): tool_call=fetch auth_bearer=[REDACTED:jwt]
  WROTE (masked): tool_call=db_connect dsn=postgres://[REDACTED:basic_auth]@db.internal:5432/app
  WROTE (masked): tool_call=vendor sess="[REDACTED:high-entropy]"
  WROTE (clean) : tool_call=log msg="retry budget exhausted after 5 steps, stopping" runs=2190
  WROTE (masked): tool_call=scrape actor="[REDACTED:high-entropy]" status=ok
  WROTE (clean) : tool_call=parse digest="deadbeefdeadbeefdeadbeefdeadbeef" pages=962
  -> known-format leaked: 0 | masked by regex: 6 | masked by entropy net: 2 | lines left clean: 2

WHAT THE REDACTOR DID (the good, then the floor):
  caught by shape: openai sk-, aws AKIA, github ghp_, slack xoxb-,
          a JWT, and basic-auth creds inside a postgres:// dsn.
  caught by entropy: a vendor session token of UNKNOWN format that no
          regex knew, masked because it looks random.
  FALSE POSITIVE: the legit slug 'trustpilot-review-scraper-v2-prod'
          scores 3.78 bits/char (>= 3.6) and got masked. it is
          NOT a secret. lower the threshold and more of these slip in.
  FLOOR MISS: the repeated 'deadbeef...' digest is low entropy
          (2.16 bits/char) so the net leaves it alone — correct here,
          but a low-entropy REAL secret would slip the exact same way.
  not a DLP engine. tune ENTROPY_BITS per corpus. floor, not ceiling.

SUMMARY: {'naive_known_format_leaked': 6, 'redacted_known_format_leaked': 0, 'masked_by_regex': 6, 'masked_by_entropy': 2, 'false_positive_on_slug': 1}

Read the output, including the two lines that embarrass it

The naive logger writes all ten lines as-is. Six of them carry a known-format secret straight to egress. That's the baseline: do nothing, leak six. The redacted logger masks all six by shape, masks one unknown-format token by entropy, and leaves two innocent lines clean. The [REDACTED:openai_key] and [REDACTED:basic_auth] labels show what kind of secret was there without showing the secret. That's the win, and it's real.

Now look at the two lines that make this a floor and not a fortress.

The slug trustpilot-review-scraper-v2-prod got masked. It is not a secret. It's the name of one of my actual actors. It scores 3.78 bits per character, just over my threshold of 3.6, because it mixes letters, digits, and dashes densely enough to look random to a function that can't read English. That's a false positive on a perfectly innocent log field, and it's the failure mode that bites in production: lower the threshold to catch more real secrets and you mask more of your own slugs, IDs, and request hashes; raise it and real secrets slip by. There's no clean line. The function in the demo even prints the score so you can watch it happen.

The deadbeef digest did not get masked. It's a 32-character string, plenty long, but it's the same eight characters repeated four times, so its entropy is only 2.16 bits per character, well under the threshold. Here that's correct; it's a decoy, not a secret. But the lesson cuts the other way: a real secret with low entropy slips out the exact same door. A weak dictionary password. A short shared token someone reused. The entropy net never sees them. Only a regex for their specific shape would, and you can't write a regex for a shape you didn't anticipate.
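To see where the 2.16 comes from, here is the arithmetic spelled out as a small sketch. It uses the same Shannon formula as the demo, just with the per-character terms printed. Repeating a string does not change its character frequencies, so the 32-character digest scores exactly what the 8-character "deadbeef" would.

```python
import math
from collections import Counter

digest = "deadbeef" * 4
counts = Counter(digest)   # d:8, e:12, a:4, b:4, f:4 out of 32 characters
n = len(digest)

total = 0.0
for ch, c in sorted(counts.items()):
    p = c / n                      # relative frequency of this character
    term = -p * math.log2(p)       # its share of the entropy
    total += term
    print(f"{ch}: p={p:.3f} contributes {term:.3f} bits")

print(f"total: {total:.2f} bits/char")                        # 2.16
print(f"ceiling for 5 distinct symbols: {math.log2(5):.2f}")  # 2.32
```

With only five distinct characters the score can never exceed about 2.32 bits per character, no matter how long the string gets. Length gets a string past the 20-character token filter; it does nothing for the entropy score.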

So the honest scoreboard isn't "10 lines, 0 leaks, done." It's: six known-format secrets caught by shape, one unknown caught by entropy, one legitimate field wrongly masked, and one structurally-invisible class of secret that would walk right past. The redactor is a cheap, deterministic barrier that removes the most common and most embarrassing leak. It is not a guarantee, and the demo prints its own failures so you can't pretend it is.

The failure modes, named, before you trust this

False positives on legit high-entropy fields. This is the one you'll feel first. Slugs, UUIDs, content hashes, request IDs, base64 blobs of harmless data: plenty of non-secrets look random. Every one of them is a candidate for wrongful masking. In a busy log that means some real debugging context disappears behind [REDACTED:high-entropy], and now you're annoyed at your own redactor. The fix isn't a magic threshold; it's an allowlist of known-safe patterns (your slug format, your trace-ID format) that you check before the entropy net, plus tuning the threshold against your own logs rather than mine.
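A minimal sketch of that allowlist, checked before the entropy test. The two patterns in `ALLOW` are assumptions for illustration (a lowercase dashed slug and a UUID); yours should come from the formats your own logs actually use. `TOKEN` and the 3.6 threshold are copied from the demo so the block runs on its own.

```python
import math
import re

TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")
ENTROPY_BITS = 3.6

# Hypothetical known-safe shapes. Replace with your own slug / trace-ID formats.
ALLOW = [
    re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+"),  # lowercase dashed slug
    re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"),  # UUID
]


def bits_per_char(s):
    """Shannon entropy in bits per character."""
    n = len(s)
    return -sum((s.count(c) / n) * math.log2(s.count(c) / n) for c in set(s))


def entropy_net(line):
    """Mask long high-entropy tokens unless they fully match an allowlisted shape."""
    def _net(m):
        tok = m.group(0)
        if any(rx.fullmatch(tok) for rx in ALLOW):
            return tok  # known-safe shape: skip the entropy check
        if bits_per_char(tok) >= ENTROPY_BITS:
            return "[REDACTED:high-entropy]"
        return tok
    return TOKEN.sub(_net, line)


slug_line = 'tool_call=scrape actor="trustpilot-review-scraper-v2-prod" status=ok'
sess_line = 'tool_call=vendor sess="f4Kd0Lm9Qz7Rb2Vt6Xp1Hn3Jw8Yc5Ae0Sg4Uo"'
print(entropy_net(slug_line))  # slug survives
print(entropy_net(sess_line))  # session token is still masked
```

The trade is explicit: every allowlisted shape is also a bypass. A real key that happens to be lowercase-and-dashes, or UUID-shaped, passes untouched, so keep the list short and specific.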

Low-entropy real secrets slip the net. Entropy only catches randomness. A password like a dictionary word, a short PIN, a token someone deliberately made memorable: these score low and pass. The only thing that catches them is a regex for their shape, which means you only catch the ones you thought to write a pattern for. Unknown low-entropy secrets are the blind spot, and there's no cheap way to close it.

ENTROPY_BITS = 3.6 is the open question, not a constant I'd defend. It's the dial between the two errors above. For reference, detect-secrets defaults to 3.0 for hex and 4.5 for base64; I picked 3.6 as a middle value for mixed tokens, and it's exactly what produced the false positive on my own slug. Tune it against a labelled sample of your real logs, not against vibes, and expect to keep tuning it.
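One way to tune against a labelled sample instead of vibes: sweep a few thresholds and count both kinds of error at each. The four-item `SAMPLE` below is a made-up illustration built from this post's fixtures plus one low-entropy password; a real run would use tokens pulled from your own logs and labelled by hand.

```python
import math


def bits_per_char(s):
    """Shannon entropy in bits per character."""
    n = len(s)
    return -sum((s.count(c) / n) * math.log2(s.count(c) / n) for c in set(s))


# Hypothetical labelled sample: (token, is_secret). Build yours from real logs.
SAMPLE = [
    ("f4Kd0Lm9Qz7Rb2Vt6Xp1Hn3Jw8Yc5Ae0Sg4Uo", True),   # random session token
    ("hunter2hunter2hunter2", True),                    # low-entropy real secret
    ("trustpilot-review-scraper-v2-prod", False),       # legit slug
    ("deadbeefdeadbeefdeadbeefdeadbeef", False),        # repeated hex digest
]


def sweep(sample, thresholds):
    """Return (threshold, false_positives, missed_secrets) for each threshold."""
    rows = []
    for t in thresholds:
        fp = sum(1 for tok, secret in sample if not secret and bits_per_char(tok) >= t)
        fn = sum(1 for tok, secret in sample if secret and bits_per_char(tok) < t)
        rows.append((t, fp, fn))
    return rows


for t, fp, fn in sweep(SAMPLE, [2.0, 3.0, 3.6, 4.0]):
    print(f"threshold {t:.1f}: false positives={fp} missed secrets={fn}")
```

On this toy sample no threshold gets both columns to zero: 2.0 catches every secret but masks both decoys, and 4.0 spares the decoys but misses the repeated password. That is the dial from the paragraph above, in numbers.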

Secrets split across fields or encoded oddly walk past both passes. If a key is base64-wrapped inside a JSON value that's itself escaped inside another string, or split into two log fields that are individually harmless, neither regex nor a per-token entropy check will see it. This is a line-level scrubber. It does not parse structure.
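The split-field case is easy to reproduce. This sketch reuses the demo's `TOKEN` pattern and threshold and feeds it the same synthetic session token twice: once whole, once cut into two quoted fields (the `part_a` / `part_b` names are invented for the example). Each half is under the 20-character minimum, so the entropy net never even scores it.

```python
import math
import re

TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")
ENTROPY_BITS = 3.6


def bits_per_char(s):
    """Shannon entropy in bits per character."""
    n = len(s)
    return -sum((s.count(c) / n) * math.log2(s.count(c) / n) for c in set(s))


def entropy_net(line):
    """Mask any 20+ character token whose entropy clears the threshold."""
    def _net(m):
        tok = m.group(0)
        return "[REDACTED:high-entropy]" if bits_per_char(tok) >= ENTROPY_BITS else tok
    return TOKEN.sub(_net, line)


secret = "f4Kd0Lm9Qz7Rb2Vt6Xp1Hn3Jw8Yc5Ae0Sg4Uo"   # synthetic, 37 characters
whole = f'tool_call=vendor sess="{secret}"'
split = f'tool_call=vendor part_a="{secret[:18]}" part_b="{secret[18:]}"'

print(entropy_net(whole))  # masked
print(entropy_net(split))  # both halves pass through untouched
```

Lowering the minimum token length would catch this particular split, at the cost of scoring far more ordinary words and IDs. It would not help with nested encoding.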

And the recap, because it's the whole point: this is a floor. It catches the common, shaped, high-entropy leaks deterministically and cheaply. It false-positives on innocent random-looking fields, it's blind to low-entropy secrets, and it can be defeated by odd encoding. It is not a DLP product and it is not a substitute for keeping secrets out of tool arguments in the first place. It's the last cheap check before a line leaves your process.

Where this sits in the catalog

The risk has an official home. OWASP's LLM Top 10 for 2025 lists LLM02: Sensitive Information Disclosure, and its definition names the exact category: sensitive information includes "personal identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents" (genai.owasp.org). Security credentials, leaking through the application's own outputs. That's this. And the older, more general form is CWE-532, insertion of sensitive information into a log file: the same bug we've had since logs existed, now triggered automatically by every tool call an agent makes.

The entropy half isn't mine either. Yelp/detect-secrets describes its "Entropy Detector" as searching "for 'secret-looking' strings through a variety of heuristic approaches," shipping Base64HighEntropyString and HexHighEntropyString plugins (github.com/Yelp/detect-secrets). I borrowed the idea and shrank it to forty lines so you can read the whole thing in one sitting and decide for yourself where the threshold goes.

This is the sideways sibling of two posts I've already written about agents. In "the page told it what to do", the danger comes in through scraped content. Here the danger goes out through your own logs, with nothing malicious involved. Different direction, same discipline: put a cheap check exactly at the boundary where data crosses from your control to someone else's.

What to do Monday

Add the redact function to your logging filter, before the formatter, so every line passes through it on the way out. Start with the regexes for the key formats you actually use. You probably know your top five. Add the entropy net for everything else, start with a high threshold so it masks very little, then watch your logs for a day and lower it step by step until you're catching real tokens without masking half your slugs. Keep a running count of what it masks; that count is a leak report you weren't generating before, and the first time it spikes you'll know a new code path started logging something it shouldn't.
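A minimal sketch of that wiring with the standard `logging` module. The `redact` here is a one-pattern stand-in so the block runs on its own; swap in the full function from the demo. `RedactingFilter` and the `agent.tools` logger name are my own placeholders, not part of any framework.

```python
import logging
import re
from collections import Counter

# One-pattern stand-in for the demo's redact(line) -> (line, hits).
_OPENAI = re.compile(r"sk-[A-Za-z0-9]{20,}")


def redact(line):
    hits = ["openai_key"] if _OPENAI.search(line) else []
    return _OPENAI.sub("[REDACTED:openai_key]", line), hits


class RedactingFilter(logging.Filter):
    """Rewrites each record's message through redact() and counts the masks."""

    def __init__(self):
        super().__init__()
        self.mask_counts = Counter()

    def filter(self, record):
        # getMessage() merges msg % args, so secrets passed as args are covered.
        clean, hits = redact(record.getMessage())
        record.msg, record.args = clean, None
        self.mask_counts.update(hits)
        return True  # keep the record; only its text changed


handler = logging.StreamHandler()
redactor = RedactingFilter()
handler.addFilter(redactor)  # handler-level filter: runs before this handler formats

log = logging.getLogger("agent.tools")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False        # avoid a second, unredacted copy via the root logger

log.info("tool_call=openai api_key=%s", "sk-Hd83kfJ20alsKDi39fKDoeQ1xZ77bQp")
print(dict(redactor.mask_counts))  # the running count of what got masked
```

Two caveats under this setup: the filter has to be attached to every handler that ships logs out of the process, and tracebacks attached through `exc_info` are rendered later by the formatter, so this filter never sees them.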

You don't need a DLP vendor to stop the dumbest version of this. You need forty lines at the boundary and the discipline to assume your own logger is the leak.

I write about production scraping and the reliability layer under AI agents: real runs, real failures, real code. Follow for the next teardown, and tell me in the comments: what's the worst thing you've found sitting in plaintext in your own logs? I read every one.

Proof, if you want it: a Trustpilot scraper I've run 962 times in production, at apify.com/knotless_cadence.
