If you're shipping an AI agent (a chatbot, an assistant, anything that talks to users), it can leak things it shouldn't: API keys, internal instructions, customer data. The scary part isn't that it happens. It's that you often can't tell when it has.
I build a free tool that scans agents for these leaks before you ship. I'm not a security researcher by background — I'm a solo builder figuring this out by measuring, not by claiming expertise. This week I tested my own tool hard enough to find its limits, and I think the limits are worth sharing more than the wins.
A real example of how a leak hides
Say your agent's secret is an AWS access key ID, which starts with AKIA... (that's the prefix AWS uses for these). My scanner looks for that shape and flags it. Simple.
But here's a case that broke it: when a long blob of random uppercase letters and numbers shows up in the output — which happens all the time with things like encoded data or tokens — a fake "key" can appear inside it by pure chance. My scanner flagged it as a leak. False alarm.
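Here is a minimal sketch of that false alarm. It is not the scanner's actual code: the pattern (AKIA followed by 16 uppercase letters or digits), the function name, and both sample strings are assumptions made up for illustration. The "real" key is the placeholder ID from AWS's own documentation, not a live credential.

```python
import re

# Hypothetical detector: "AKIA" followed by 16 uppercase letters or digits.
AKIA_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")

def find_key_shapes(text: str) -> list[str]:
    """Return every substring that has the AKIA key shape."""
    return AKIA_PATTERN.findall(text)

# Placeholder key ID from AWS documentation, standing in for a real leak.
real_leak = "Use AKIAIOSFODNN7EXAMPLE to authenticate."

# Made-up uppercase blob (think encoded data) that happens to contain "AKIA".
random_blob = "7QX2M9ZAKIAT4P8WQ1ZR6YH3KD0LB5NV2"

print(find_key_shapes(real_leak))    # ['AKIAIOSFODNN7EXAMPLE']
print(find_key_shapes(random_blob))  # ['AKIAT4P8WQ1ZR6YH3KD0']  <- false alarm
```

The pattern has no way to tell the two apart: both matches have exactly the right shape, and only the surrounding context differs.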
So I tried the obvious fix. And the obvious fix made things worse.
What I can catch, honestly
A secret that appears whole, in plain sight: yes, reliably.
A secret broken into pieces across a long conversation: only partly. If the pieces are scattered, my tool can miss them.
A secret described instead of shown ("it starts with sk and then..."): mostly no. That's a known blind spot.
A false alarm inside a big random blob: I can now filter most of these — but only with an extra check, not the simple one I first reached for.
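To make the "only partly" case concrete, here is a rough sketch of a secret split across turns. Everything in it is assumed for illustration (the pattern, the two scan functions, the sample conversations); it is one simple way to scan a conversation, not a description of how my tool does it.

```python
import re

# Hypothetical detector, same assumed shape: AKIA + 16 uppercase alphanumerics.
AKIA_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")

def scan_per_message(messages: list[str]) -> list[str]:
    """Scan each message on its own."""
    return [hit for msg in messages for hit in AKIA_PATTERN.findall(msg)]

def scan_joined(messages: list[str]) -> list[str]:
    """Scan the messages glued end to end, so a split at a turn boundary heals."""
    return AKIA_PATTERN.findall("".join(messages))

# The key is cut exactly at a message boundary.
adjacent = ["The key begins AKIAIOSFOD", "NN7EXAMPLE and that is all."]

# The two halves are separated by other text.
scattered = ["It begins AKIAIOSFOD", "Anything else?", "Oh, and it ends NN7EXAMPLE"]

print(scan_per_message(adjacent))  # []
print(scan_joined(adjacent))       # ['AKIAIOSFODNN7EXAMPLE']
print(scan_joined(scattered))      # []  <- still missed
```

Joining the turns recovers a key that was cut cleanly at a boundary, but as soon as other words sit between the pieces, the shape is gone and a pattern has nothing to match.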
What I can't, and won't pretend to
When I tried the "obvious" fix for the false alarm, it stopped the false alarm — but it also made my tool miss real keys that happened to sit right next to other text. I measured it before shipping, saw it was a worse trade, and didn't ship it. Catching the easy case isn't the same as being safe, and a fix that quietly creates a bigger hole is worse than the bug it fixes.
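For a feel of how that kind of trade can happen, here is a sketch using one common candidate for an "obvious" fix: requiring that the match not be surrounded by other uppercase letters or digits. This is an assumption chosen for illustration, not necessarily the fix I tried, and the patterns and sample strings are made up.

```python
import re

# Hypothetical naive detector: AKIA + 16 uppercase alphanumerics, anywhere.
NAIVE = re.compile(r"AKIA[0-9A-Z]{16}")

# Assumed "obvious" fix: reject matches glued to other uppercase letters/digits.
BOUNDED = re.compile(r"(?<![0-9A-Z])AKIA[0-9A-Z]{16}(?![0-9A-Z])")

# Made-up blob with an accidental AKIA shape inside it.
random_blob = "7QX2M9ZAKIAT4P8WQ1ZR6YH3KD0LB5NV2"

# A real-looking key (AWS's documented placeholder) glued to preceding text.
glued_leak = "ACCESS_KEYAKIAIOSFODNN7EXAMPLE"

print(NAIVE.findall(random_blob))    # ['AKIAT4P8WQ1ZR6YH3KD0']  false alarm
print(BOUNDED.findall(random_blob))  # []                        false alarm gone
print(NAIVE.findall(glued_leak))     # ['AKIAIOSFODNN7EXAMPLE']  real key caught
print(BOUNDED.findall(glued_leak))   # []                        real key missed
```

The stricter pattern silences the blob, but it treats the glued key exactly the same way, because from the pattern's point of view the two cases look identical.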
The honest summary: these tools have a wall. Past a certain point, no clever pattern saves you — you need a different kind of check entirely. I can tell you where my wall is. I can't tell you there isn't one.
What I'm asking
I'd rather hear where this is wrong than where it's right. If you ship agents: what kind of leak worries you most, and would a "here's exactly what we caught and what we couldn't" report actually be useful to you, or is that not the thing you'd reach for? Genuinely trying to find out before I build more.