Mukunda Rao Katta

Posted on May 25

tool-secret-scrubber-rs: Strip Secrets from LLM Tool Logs in Rust

#hermeschallenge #ai #rust #agents

The log that should not have shipped

The agent was doing fine. It called a tool with a set of parameters. The tool returned results. The entire exchange was logged to the observability platform for debugging purposes.

Three days later someone noticed the logs. The tool call arguments included an api_key field. The value was a real credential. It had been sitting in the logging platform, searchable, for 72 hours. Anyone with read access to logs had read access to that key.

This is not a hypothetical. Some version of this scenario shows up again and again in security incident reviews: credentials in logs, tokens in traces, secrets in debug output that was never supposed to reach a persistent store. The engineer who added the log line was not being careless. They wanted visibility. The credential ending up there was an emergent outcome of passing structured data directly into a logger.

Agent tool calls make this worse because the entire purpose of a tool call is to pass data. The model sends structured arguments. The tool receives them. The logging middleware captures the whole thing. Your API key for the downstream weather service lives inside the same JSON blob as the latitude and longitude the user asked about. There is no obvious place in the stack where someone thought to strip it.

The fix is to scrub before logging. tool-secret-scrubber-rs does this for serde_json::Value payloads.

Shape of the fix

The crate gives you a Scrubber that takes a serde_json::Value and returns a cleaned version with secrets replaced by [REDACTED].

use serde_json::json;
use tool_secret_scrubber_rs::Scrubber;

fn main() {
    // Build a scrubber with the default key-name list and token patterns.
    let scrubber = Scrubber::default();

    let args = json!({
        "query": "sales report for Q2",
        "api_key": "PLACEHOLDER_API_KEY_VALUE",
        "database_url": "postgres://user:PLACEHOLDER_PASS@host/db",
        "output_format": "csv"
    });

    let clean = scrubber.scrub(args);

    // clean["query"] == "sales report for Q2"   (unchanged)
    // clean["api_key"] == "[REDACTED]"           (key name match)
    // clean["database_url"] == "[REDACTED]"      (pattern match)
    // clean["output_format"] == "csv"            (unchanged)
    println!("{clean}");
}

The scrubber runs two passes. First it checks field names. If a key contains secret, token, password, api_key, or similar terms, the value is redacted regardless of its content. Second, it runs regex matches against string values for the nine known token patterns.
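To make the ordering concrete, here is a minimal std-only sketch of the two passes applied to a single string field. It assumes a shortened key list and uses plain prefix checks as a stand-in for the crate's regexes; `scrub_field` and the constants are hypothetical names, not part of tool-secret-scrubber-rs.

```rust
// Hypothetical, std-only sketch of the two-pass decision for one string field.
// The short key list and prefix checks are simplified stand-ins, not the crate's code.
const SENSITIVE_KEYS: &[&str] = &["secret", "token", "password", "api_key"];
const TOKEN_PREFIXES: &[&str] = &["ghp_", "github_pat_", "xoxb-", "xoxp-"];

/// Pass 1: case-insensitive substring match on the field name.
fn key_is_sensitive(key: &str) -> bool {
    let lower = key.to_lowercase();
    SENSITIVE_KEYS.iter().any(|s| lower.contains(*s))
}

/// Pass 2: does the value itself start like a known token format?
fn value_looks_like_token(value: &str) -> bool {
    TOKEN_PREFIXES.iter().any(|p| value.starts_with(*p))
}

/// Redacts if either pass fires; the key check runs first and ignores the value.
fn scrub_field(key: &str, value: &str) -> String {
    if key_is_sensitive(key) || value_looks_like_token(value) {
        "[REDACTED]".to_string()
    } else {
        value.to_string()
    }
}

fn main() {
    println!("{}", scrub_field("api_key", "hello")); // [REDACTED] (key name)
    println!("{}", scrub_field("note", "xoxb-PLACEHOLDER")); // [REDACTED] (value shape)
    println!("{}", scrub_field("output_format", "csv")); // csv
}
```

The first call shows why the key pass is useful on its own: "hello" looks harmless, but a field named api_key is redacted regardless of its content.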

For nested structures, the scrubber recurses:

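use serde_json::json;
use tool_secret_scrubber_rs::Scrubber;

fn main() {
    let scrubber = Scrubber::default();

    let nested = json!({
        "request": {
            "headers": {
                "authorization": "Bearer PLACEHOLDER_BEARER_TOKEN",
                "content-type": "application/json"
            },
            "body": "hello"
        }
    });

    // Nested objects are walked recursively. "authorization" contains "auth",
    // so it is redacted by key name (its value also has the bearer-token shape).
    let clean = scrubber.scrub(nested);
    // clean["request"]["headers"]["authorization"] == "[REDACTED]"
    // clean["request"]["headers"]["content-type"] == "application/json"
    println!("{clean}");
}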

What it does not do

The scrubber does not understand context. It applies the same rules everywhere in the JSON tree. If your tool legitimately uses a field named token to mean a pagination cursor, that cursor will be redacted. You can customize the key-name list when constructing a non-default Scrubber, but the default list errs on the side of redacting too much.

Detection of secrets embedded inside longer strings is also limited. If the model puts a credential in the middle of a prose explanation, the regex patterns catch it only if its format is one of the nine known types. A custom credential format used by your organization will not be caught. For PII beyond credentials, see llm-pii-redact.
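The pagination-cursor problem is easy to reproduce in a small model of key matching. This sketch assumes a matcher that takes both a substring list and an exact-name exemption list; `KeyMatcher` and its exemption parameter are hypothetical and are not the crate's constructor API.

```rust
// Hypothetical sketch: a key-name matcher built from caller-supplied lists.
// `KeyMatcher` and `exempt` are illustrative names, not tool-secret-scrubber-rs API.
struct KeyMatcher {
    substrings: Vec<String>, // lowercase substrings that mark a key as sensitive
    exempt: Vec<String>,     // exact lowercase field names never redacted by name
}

impl KeyMatcher {
    fn new(substrings: &[&str], exempt: &[&str]) -> Self {
        KeyMatcher {
            substrings: substrings.iter().map(|s| s.to_lowercase()).collect(),
            exempt: exempt.iter().map(|s| s.to_lowercase()).collect(),
        }
    }

    fn is_sensitive(&self, key: &str) -> bool {
        let lower = key.to_lowercase();
        if self.exempt.iter().any(|e| *e == lower) {
            return false;
        }
        self.substrings.iter().any(|s| lower.contains(s.as_str()))
    }
}

fn main() {
    let strict = KeyMatcher::new(&["secret", "token", "password"], &[]);
    let tuned = KeyMatcher::new(&["secret", "token", "password"], &["next_page_token"]);

    // With no exemptions, the pagination cursor is treated as a secret.
    println!("{}", strict.is_sensitive("next_page_token")); // true
    // The exemption keeps the cursor visible...
    println!("{}", tuned.is_sensitive("next_page_token")); // false
    // ...while other token-bearing keys are still matched.
    println!("{}", tuned.is_sensitive("access_token")); // true
}
```

An exact-name exemption is narrower than dropping "token" from the substring list, which would also stop access_token from matching by name.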

Inside the lib

The nine token patterns cover the formats with recognizable structure:

AWS access key IDs (20-character uppercase alphanumeric starting with AKIA)
AWS secret access keys (40-character base64-ish)
GitHub personal access tokens (classic ghp_ prefix)
GitHub fine-grained tokens (github_pat_ prefix)
Slack bot tokens (xoxb- prefix)
Slack user tokens (xoxp- prefix)
Google service account keys (JSON with "type": "service_account")
Generic JWTs (three dot-separated base64url segments)
Generic bearer token patterns in authorization header values
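For a feel of what "recognizable structure" means, here are std-only checks for a few of the shapes above. The crate compiles regexes; these hand-written functions are simplified illustrations under that assumption, not its actual patterns, and they skip the AWS secret key and Google service account cases.

```rust
// Std-only stand-ins for a few of the listed shapes. The crate compiles regexes;
// these hand-written checks are simplified illustrations, not its actual patterns.

/// AWS access key ID: "AKIA" plus 16 uppercase letters or digits, 20 chars total.
fn is_aws_access_key_id(s: &str) -> bool {
    s.len() == 20
        && s.starts_with("AKIA")
        && s[4..].chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
}

/// Prefix-identified tokens: GitHub classic and fine-grained, Slack bot and user.
fn has_known_prefix(s: &str) -> bool {
    ["ghp_", "github_pat_", "xoxb-", "xoxp-"]
        .iter()
        .any(|p| s.starts_with(*p))
}

/// JWT shape: three non-empty base64url segments joined by dots, with a header
/// starting "eyJ" (how a base64url-encoded `{"alg"...` header begins).
fn looks_like_jwt(s: &str) -> bool {
    let parts: Vec<&str> = s.split('.').collect();
    parts.len() == 3
        && parts[0].starts_with("eyJ")
        && parts.iter().all(|p| {
            !p.is_empty()
                && p.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
        })
}

/// Authorization header value of the form "Bearer <token>".
fn looks_like_bearer(s: &str) -> bool {
    s.strip_prefix("Bearer ")
        .map_or(false, |rest| !rest.trim().is_empty())
}

fn matches_known_pattern(s: &str) -> bool {
    is_aws_access_key_id(s) || has_known_prefix(s) || looks_like_jwt(s) || looks_like_bearer(s)
}

fn main() {
    let samples = [
        "AKIAPLACEHOLDER00000",
        "ghp_PLACEHOLDER",
        "eyJPLACEHOLDER.eyJPLACEHOLDER.PLACEHOLDER_SIG",
        "Bearer PLACEHOLDER_BEARER_TOKEN",
        "sales.report.q2", // three dot-separated parts, but no "eyJ" header
    ];
    for s in samples {
        println!("{s} -> {}", matches_known_pattern(s));
    }
}
```

The "eyJ" requirement is one common way to keep ordinary dotted strings like sales.report.q2 from being flagged as JWTs; the crate's real JWT regex may draw that line differently.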

Each pattern is compiled once at Scrubber::default() construction and reused across calls. The compiled regex set is wrapped in an Arc so cloning a Scrubber is cheap and you can share one instance across threads.
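The compile-once, clone-cheaply design can be sketched with nothing but the standard library. Here a vector of prefixes stands in for the compiled regex set, and `SharedScrubber` and `CompiledPatterns` are hypothetical names chosen for illustration.

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for the compiled pattern set (the crate holds compiled regexes here).
struct CompiledPatterns {
    prefixes: Vec<&'static str>,
}

// Hypothetical sketch: the pattern set is built once and shared behind an Arc,
// so cloning the scrubber only increments a reference count.
#[derive(Clone)]
struct SharedScrubber {
    patterns: Arc<CompiledPatterns>,
}

impl SharedScrubber {
    fn new() -> Self {
        SharedScrubber {
            patterns: Arc::new(CompiledPatterns {
                prefixes: vec!["ghp_", "github_pat_", "xoxb-", "xoxp-"],
            }),
        }
    }

    fn scrub_str(&self, s: &str) -> String {
        if self.patterns.prefixes.iter().any(|p| s.starts_with(*p)) {
            "[REDACTED]".to_string()
        } else {
            s.to_string()
        }
    }
}

fn main() {
    let scrubber = SharedScrubber::new();
    let copy = scrubber.clone();
    // Both handles point at the same pattern set; nothing was rebuilt.
    println!("shared: {}", Arc::ptr_eq(&scrubber.patterns, &copy.patterns)); // shared: true

    // Each worker thread gets its own cheap clone.
    let handles: Vec<_> = vec!["ghp_PLACEHOLDER", "csv"]
        .into_iter()
        .map(|input| {
            let s = scrubber.clone();
            thread::spawn(move || s.scrub_str(input))
        })
        .collect();
    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```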

The key-name match is case-insensitive and checks substrings. A field named DatabasePassword matches because the lowercase form contains password. A field named retry_count does not match any of the sensitive substrings. The substring list is: secret, token, password, api_key, apikey, access_key, private_key, auth, credential.
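The key-name rule above fits in a few lines. This sketch copies the substring list from the text; the function name `key_is_sensitive` is ours, and the crate's internal implementation may be organized differently.

```rust
// Key-name check as described above: lowercase the field name, then look for any
// sensitive substring. The list is copied from the text; the function name is ours.
const SENSITIVE_SUBSTRINGS: &[&str] = &[
    "secret", "token", "password", "api_key", "apikey",
    "access_key", "private_key", "auth", "credential",
];

fn key_is_sensitive(key: &str) -> bool {
    let lower = key.to_lowercase();
    SENSITIVE_SUBSTRINGS.iter().any(|s| lower.contains(*s))
}

fn main() {
    for key in ["DatabasePassword", "retry_count", "Authorization", "author_name", "apiKey"] {
        // Prints true for every key except retry_count.
        println!("{key} {}", key_is_sensitive(key));
    }
}
```

Note the author_name line: because auth is on the list, any key containing those four letters is redacted by name. That is the same context-blindness described earlier, and it is the kind of case a customized key list is for.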

Recursion handles arrays too. If a tool argument is an array of objects, each object in the array is scrubbed individually. This covers cases where agents batch multiple requests in a single tool call.
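The recursion over objects and arrays can be shown with a tiny stand-in for serde_json::Value so the example has no dependencies. Only the key-name pass is modeled here, and the `Json` enum and `scrub` function are illustrative names, not the crate's types.

```rust
// Tiny stand-in for serde_json::Value so the sketch needs no dependencies.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>), // Vec keeps field order stable for printing
}

fn key_is_sensitive(key: &str) -> bool {
    let lower = key.to_lowercase();
    ["secret", "token", "password", "auth"]
        .iter()
        .any(|s| lower.contains(*s))
}

/// Builds a scrubbed copy: sensitive object keys are redacted, every other value is
/// visited recursively, and array elements are scrubbed one by one.
/// (The string-pattern pass is omitted to keep the sketch short.)
fn scrub(value: &Json) -> Json {
    match value {
        Json::Obj(fields) => Json::Obj(
            fields
                .iter()
                .map(|(k, v)| {
                    let v = if key_is_sensitive(k) {
                        Json::Str("[REDACTED]".to_string())
                    } else {
                        scrub(v)
                    };
                    (k.clone(), v)
                })
                .collect(),
        ),
        Json::Arr(items) => Json::Arr(items.iter().map(scrub).collect()),
        scalar => scalar.clone(),
    }
}

fn main() {
    // A batched tool call: one array holding two request objects.
    let batch = Json::Arr(vec![
        Json::Obj(vec![
            ("city".into(), Json::Str("Oslo".into())),
            ("api_token".into(), Json::Str("PLACEHOLDER_TOKEN_1".into())),
        ]),
        Json::Obj(vec![
            ("city".into(), Json::Str("Lima".into())),
            ("api_token".into(), Json::Str("PLACEHOLDER_TOKEN_2".into())),
            ("retries".into(), Json::Num(3.0)),
        ]),
    ]);
    println!("{:?}", scrub(&batch));
}
```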

scrub returns a new serde_json::Value rather than mutating its input in place, so the original value is left unchanged. That matters in logging pipelines where the clean version goes to the log and the original goes into the actual tool call. The cost is an allocation, but tool call payloads are usually small enough that this does not matter in practice.
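One way to get the "log the clean copy, call the tool with the original" split in Rust is to borrow the input and return a fresh value. This sketch models flat arguments as a `BTreeMap<String, String>`; `scrub_copy`, `call_weather_tool`, and `log_tool_call` are hypothetical, and the crate's actual `scrub` signature may differ.

```rust
use std::collections::BTreeMap;

type Args = BTreeMap<String, String>;

// Takes `&Args` and builds a new map, so the caller keeps the original intact.
fn scrub_copy(args: &Args) -> Args {
    args.iter()
        .map(|(k, v)| {
            let lower = k.to_lowercase();
            let sensitive = ["secret", "token", "password", "api_key"]
                .iter()
                .any(|s| lower.contains(*s));
            let value = if sensitive { "[REDACTED]".to_string() } else { v.clone() };
            (k.clone(), value)
        })
        .collect()
}

// Hypothetical tool and logger standing in for the real call site and log sink.
fn call_weather_tool(args: &Args) -> String {
    format!("called with a key of length {}", args["api_key"].len())
}

fn log_tool_call(args: &Args) {
    println!("log: {args:?}");
}

fn main() {
    let mut args = Args::new();
    args.insert("lat".into(), "59.91".into());
    args.insert("api_key".into(), "PLACEHOLDER_API_KEY_VALUE".into());

    log_tool_call(&scrub_copy(&args)); // the log sees [REDACTED]
    println!("{}", call_weather_tool(&args)); // the tool still gets the real value
}
```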

A note on test fixtures: the test suite uses obviously synthetic PLACEHOLDER shapes for all token-like strings. Realistic token formats in test fixtures trigger GitHub Push Protection even when they are not real credentials. If you contribute tests, keep fixture values in the PLACEHOLDER_... style rather than realistic-looking strings.

When useful

Any agent that logs tool call arguments to an observability platform, data warehouse, or log aggregator.
Pipelines where tool call payloads are stored in databases for debugging or replay purposes.
Development environments where tool call traces are written to files that might be committed or shared.
Audit trails where you need to prove the tool was called but cannot store the credential it carried.
Multi-tenant systems where tool logs from different customers share a single store and you cannot afford cross-tenant credential exposure.

When not useful

Tool calls where the argument values are already encrypted or are opaque references rather than plaintext credentials. Scrubbing is redundant if the credential is already a secret reference.
High-throughput pipelines where the per-call allocation for producing a clean copy is a measured bottleneck. In that case, consider scrubbing at write time in a separate thread.
Applications that need to preserve exact argument values for replay and cannot use a separate unredacted store for that purpose.
Cases where your logging platform already has a credential scanning layer and you would rather centralize that logic there.
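For the high-throughput case, "scrubbing at write time in a separate thread" could look roughly like this: the hot path only sends raw records over a channel, and a writer thread scrubs each one before it reaches the sink. This is a std-only sketch; `spawn_log_writer` and the flat (key, value) record shape are assumptions for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sketch: callers send raw (key, value) records; a dedicated writer
// thread scrubs them before anything is written.
fn scrub_value(key: &str, value: &str) -> String {
    let lower = key.to_lowercase();
    if ["secret", "token", "password", "api_key"].iter().any(|s| lower.contains(*s)) {
        "[REDACTED]".to_string()
    } else {
        value.to_string()
    }
}

fn spawn_log_writer() -> (mpsc::Sender<(String, String)>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel::<(String, String)>();
    let handle = thread::spawn(move || {
        let mut written = Vec::new();
        // The loop ends once every Sender has been dropped.
        for (k, v) in rx {
            let line = format!("{k}={}", scrub_value(&k, &v));
            println!("{line}"); // stand-in for the real log sink
            written.push(line);
        }
        written
    });
    (tx, handle)
}

fn main() {
    let (tx, writer) = spawn_log_writer();
    tx.send(("query".into(), "sales report".into())).unwrap();
    tx.send(("api_key".into(), "PLACEHOLDER_API_KEY_VALUE".into())).unwrap();
    drop(tx); // close the channel so the writer thread can finish
    let lines = writer.join().unwrap();
    assert_eq!(lines.len(), 2);
}
```

This moves the cost off the request path rather than removing it, and the raw values still sit in process memory until the writer handles them; the guarantee is only that nothing unscrubbed reaches the sink.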

Install

[dependencies]
tool-secret-scrubber-rs = "0.1"
serde_json = "1"

Or add both from the command line:

cargo add tool-secret-scrubber-rs serde_json

Siblings

Crate / Package	Language	What it does
tool-secret-scrubber	Python	Same regex-based scrubbing for Python agent pipelines
llm-pii-redact	Python	Broader PII redaction including phone, email, SSN, CC
agenttap	Python	Wire-level prompt introspection and capture
tool-error-classify	Python	Classify tool errors before logging
agentguard-rs	Rust	Egress allowlist to prevent credential exfiltration

What is next

The next version adds a custom pattern API so you can register your own regex patterns for organization-specific token formats. After that, I want to add a ScrubReport return type that tells you what was redacted and why, so your logging pipeline can emit a count of redacted fields per call without having to diff the before and after manually. A WASM build for browser-side scrubbing is also on the list for teams that process tool logs in frontend observability dashboards.
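To illustrate the idea behind a report type, here is a hypothetical shape in which each redaction records the field and the reason, so a caller can log a count per call without diffing payloads. `ScrubReport`, `Redaction`, and `Reason` are illustrative names only, not the crate's current or planned API.

```rust
// Hypothetical sketch of a scrub-with-report shape. ScrubReport, Redaction and Reason
// are illustrative names only; they are not the crate's current or planned API.
#[derive(Debug, PartialEq)]
enum Reason {
    KeyName,
    Pattern,
}

#[derive(Debug)]
struct Redaction {
    field: String,
    reason: Reason,
}

#[derive(Debug, Default)]
struct ScrubReport {
    redactions: Vec<Redaction>,
}

impl ScrubReport {
    fn count(&self) -> usize {
        self.redactions.len()
    }
}

/// Scrubs flat (key, value) pairs and records why each field was redacted.
fn scrub_with_report(fields: &[(&str, &str)]) -> (Vec<(String, String)>, ScrubReport) {
    let mut report = ScrubReport::default();
    let mut out = Vec::new();
    for &(key, value) in fields {
        let lower = key.to_lowercase();
        let reason = if ["secret", "token", "password", "api_key"]
            .iter()
            .any(|s| lower.contains(*s))
        {
            Some(Reason::KeyName)
        } else if value.starts_with("ghp_") || value.starts_with("xoxb-") {
            Some(Reason::Pattern)
        } else {
            None
        };
        match reason {
            Some(reason) => {
                report.redactions.push(Redaction { field: key.to_string(), reason });
                out.push((key.to_string(), "[REDACTED]".to_string()));
            }
            None => out.push((key.to_string(), value.to_string())),
        }
    }
    (out, report)
}

fn main() {
    let call = [
        ("query", "sales report for Q2"),
        ("api_key", "PLACEHOLDER_API_KEY_VALUE"),
        ("note", "ghp_PLACEHOLDER"),
    ];
    let (_clean, report) = scrub_with_report(&call);
    // A logging pipeline could emit this count per call instead of diffing payloads.
    println!("redacted fields: {}", report.count()); // redacted fields: 2
    for r in &report.redactions {
        println!("{} ({:?})", r.field, r.reason);
    }
}
```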

Source is at github.com/MukundaKatta/tool-secret-scrubber-rs. The crate publish is queued and will appear at crates.io/crates/tool-secret-scrubber-rs once the rate limit clears.

DEV Community