Why your PII redaction tool is useless for AI agents (and what to do about it)
I watched my agent try to email a production API key. Here is the post-mortem.
If you are building AI agents, you are likely sleeping on a massive security hole.
We’ve all added "PII Redaction" to our stacks. It’s standard procedure now. You spin up a middleware, scan the prompt for emails or SSNs, and redact them.
Job done, right?
Wrong.
I learned this the hard way last week.
The "Oh Sh*t" Moment
I was testing a "Jira Summarizer" agent. The premise was simple: Read a ticket, summarize it, and email the summary to the team.
I fed it a test ticket that contained a dummy AWS key (AKIA...) inside the description.
My PII filter scanned the incoming prompt: "Summarize ticket ID-123."
Result: Clean. No PII found.
The agent read the ticket (via a tool call), processed the text, and decided to act.
It called the send_email tool.
I checked the logs. My stomach dropped.
{
  "tool": "send_email",
  "arguments": {
    "to": "team@company.com",
    "body": "Here is the summary. The user provided the key: AKIAIOSFODNN7EXAMPLE..."
  }
}
My security layer had completely missed it.
The Blind Spot: Tool Call Arguments
The problem isn't that PII filters don't work. It's that they are looking in the wrong place.
Most security tools focus on the Prompt (what the human types).
But Agents operate in the Arguments (what the AI decides to do).
Agents don't just "talk." They execute.
Read: Agent fetches data from a database or ticket.
Think: Agent decides that data is "relevant."
Act: Agent injects that data into a tool (Email, HTTP Request, SQL Query).
Your PII filter checks step 1. It ignores step 3.
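The blind spot is easy to reproduce. A minimal sketch (the regex is a simplified stand-in for a real PII/secret scanner, and `prompt_filter` is a hypothetical name): scanning only the human's prompt finds nothing, while the same scan over the tool call the model emits catches the leak immediately.

```python
import json
import re

# Simplified AWS access key pattern; real scanners use many more rules.
SECRET_RE = re.compile(r"AKIA[0-9A-Z]{16}")

def prompt_filter(text: str) -> bool:
    """Step-1 scanning: checks only the text it is handed."""
    return bool(SECRET_RE.search(text))

# What the human typed (step 1):
user_prompt = "Summarize ticket ID-123."

# What the model actually emitted after reading the ticket (step 3):
tool_call = {
    "tool": "send_email",
    "arguments": {
        "to": "team@company.com",
        "body": "Here is the summary. The user provided the key: AKIAIOSFODNN7EXAMPLE",
    },
}

print(prompt_filter(user_prompt))            # False: the prompt looks clean
print(prompt_filter(json.dumps(tool_call)))  # True: the secret is in the arguments
```

Same scanner, same secret; the only difference is which payload you point it at.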
The Fix: "Actionable Security"
I realized I needed a security layer that understood the agent's execution loop. I needed something that didn't just scan text, but scanned intent.
I ended up building QuiGuard to solve this.
It’s a proxy that sits between your agent and the LLM provider, but instead of just checking prompts, it recursively inspects tool_calls.
How it works:
Intercept: It captures the API request before it leaves your network.
Parse: It identifies tool_calls in the JSON body.
Scrub: It recursively scans every argument for PII/Secrets.
Restore: It replaces secrets with placeholder tokens, lets the AI work, and swaps the real values back into the response.
This "Round-Trip Restoration" means the AI can still reason over the data ("send an email to this address") without ever seeing the real address.
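The scrub/restore round trip can be sketched in a few lines. This is a hypothetical illustration of the idea, not QuiGuard's actual implementation: walk the argument tree, stash each secret in a vault keyed by a placeholder token, and substitute the real values back when the response comes home.

```python
import re

# Simplified secret pattern for the demo; a real gateway layers many detectors.
SECRET_RE = re.compile(r"AKIA[0-9A-Z]{16}")

def scrub(value, vault):
    """Recursively replace secrets in tool-call arguments with placeholders."""
    if isinstance(value, dict):
        return {k: scrub(v, vault) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v, vault) for v in value]
    if isinstance(value, str):
        def stash(match):
            token = f"<SECRET_{len(vault)}>"
            vault[token] = match.group(0)  # remember the real value
            return token
        return SECRET_RE.sub(stash, value)
    return value

def restore(text, vault):
    """Swap the real values back into the provider's response."""
    for token, secret in vault.items():
        text = text.replace(token, secret)
    return text

vault = {}
args = {"to": "team@company.com", "body": "Key: AKIAIOSFODNN7EXAMPLE"}
safe = scrub(args, vault)
print(safe["body"])                           # Key: <SECRET_0>
print(restore("Emailing <SECRET_0>", vault))  # Emailing AKIAIOSFODNN7EXAMPLE
```

The LLM only ever sees `<SECRET_0>`; the real key exists solely inside your network, in the vault.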
The Future of Agent Security
We are moving from "Chatbots" (passive) to "Agents" (active).
Our security models must evolve.
If you are deploying agents into production:
Stop trusting prompt filters alone.
Inspect your tool-call arguments, not just your prompts.
Implement "Action Gates" (block high-risk actions like DELETE or external emails).
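An "Action Gate" is just a policy check that runs before a tool call executes. A minimal sketch, with hypothetical tool names and a made-up `gate` function; the point is that the check operates on the action the agent chose, not on the prompt:

```python
# Tools that can cause irreversible or external effects.
HIGH_RISK = {"send_email", "http_request", "sql_query"}
INTERNAL_DOMAIN = "@company.com"

def gate(tool: str, arguments: dict) -> bool:
    """Return True if the call may proceed, False to block it."""
    if tool not in HIGH_RISK:
        return True
    # Block destructive SQL outright.
    if tool == "sql_query" and "DELETE" in arguments.get("query", "").upper():
        return False
    # Block email leaving the company domain.
    if tool == "send_email" and not arguments.get("to", "").endswith(INTERNAL_DOMAIN):
        return False
    return True

print(gate("send_email", {"to": "team@company.com"}))     # True
print(gate("send_email", {"to": "attacker@evil.com"}))    # False
print(gate("sql_query", {"query": "DELETE FROM users"}))  # False
```

In practice you would default-deny and require an explicit allowlist per tool, but even this crude gate would have stopped my agent's email.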
I open-sourced the fix I built. It’s a self-hosted Docker container that plugs into any OpenAI-compatible endpoint.
GitHub: https://github.com/somegg90-blip/quiguard-gateway
Website: https://quiguardweb.vercel.app/
If you are building agents, stay safe. The leaks aren't coming from the users anymore. They are coming from the agents themselves.
Top comments (1)
This hits a very similar issue to what we ran into — just from a different angle.
We were generating compliance training content, and everything looked fine until you actually tried to rely on it.
The problem wasn’t the initial prompt, it was what happened after — consistency, drift, and things breaking across the system.
Feels like the common theme is:
we’re validating inputs, but not what the system actually does with them.
Your point about inspecting tool calls instead of just prompts is spot on.