I Let AI Agents Attack My Permission Gateway for a Week. Here's What Broke.

sriram prakhya — Sat, 06 Jun 2026 08:38:16 +0000

When I started building Agent_Sudo, I thought the hard part would be approvals and policy enforcement.

I was wrong.

The hard part was discovering all the ways real agents behave once they start interacting with real tools, real runtimes, and real users.

Over the last week I dogfooded Agent_Sudo against actual agent workflows and found four surprising problems:

1. Agents Can Bypass Governance If The Runtime Gives Them Native Tools

One agent successfully changed Agent_Sudo's workspace configuration using a host-native shell that Agent_Sudo never saw. Agent_Sudo behaved correctly: the call never reached it, so there was nothing for it to decide or audit. The runtime boundary was what failed. This reinforced a lesson that now sits at the center of the project: Agent_Sudo governs routed actions, not arbitrary runtime capabilities.
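To make that boundary concrete, here is a toy model, not Agent_Sudo's code. A hypothetical `Gateway` decides and audits only the calls routed through it, while a hypothetical `native_shell_write` stands in for a tool the runtime hands the agent directly. The class, the function, the `workspace_config` key and the deny rule are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Toy gateway (hypothetical): it can only govern calls routed through it."""
    denied_keys: set[str]
    audit: list[dict] = field(default_factory=list)

    def write(self, store: dict[str, str], key: str, value: str) -> bool:
        decision = "deny" if key in self.denied_keys else "allow"
        # Every routed call yields a decision and an audit record.
        self.audit.append({"action": "write", "key": key, "decision": decision})
        if decision == "allow":
            store[key] = value
        return decision == "allow"


def native_shell_write(store: dict[str, str], key: str, value: str) -> None:
    """Hypothetical host-native tool: mutates state directly, bypassing the gateway."""
    store[key] = value


store = {"workspace_config": '{"mode": "strict"}'}
gw = Gateway(denied_keys={"workspace_config"})

routed_ok = gw.write(store, "workspace_config", '{"mode": "open"}')  # denied, audited
native_shell_write(store, "workspace_config", '{"mode": "open"}')     # never seen

print(routed_ok)                   # False
print(len(gw.audit))               # 1: only the routed attempt was recorded
print(store["workspace_config"])   # {"mode": "open"}: the native write still landed
```

Nothing inside `Gateway` can close this gap. The lever is at the runtime layer, for example by not exposing the native tool to the agent at all.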

2. Workspace Changes Were Invisible

A write that was previously denied later became allowed. The audit log showed the decision change. What it didn't show was why. The cause was a workspace configuration change that wasn't being audited. Fix: PR #83 added workspace_changed audit events.
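Why an audited config change matters can be shown with a small sketch, assuming a toy workspace whose write decision depends on a `writable` list. The `workspace_changed` event name comes from the post; the event fields, `Workspace` and `explain_flip` are hypothetical and not the shape PR #83 actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Toy workspace (hypothetical): config changes are audited like decisions."""
    config: dict
    audit: list[dict] = field(default_factory=list)

    def set_config(self, key: str, value) -> None:
        old = self.config.get(key)
        self.config[key] = value
        # Record the change so a later decision flip has an explanation.
        self.audit.append({"event": "workspace_changed", "key": key, "old": old, "new": value})

    def decide_write(self, path: str) -> str:
        decision = "allow" if path in self.config.get("writable", []) else "deny"
        self.audit.append({"event": "decision", "path": path, "decision": decision})
        return decision


def explain_flip(audit: list[dict], path: str) -> list[dict]:
    """Return workspace_changed events between the last two decisions for `path`."""
    idx = [i for i, e in enumerate(audit) if e["event"] == "decision" and e["path"] == path]
    if len(idx) < 2:
        return []
    start, end = idx[-2], idx[-1]
    return [e for e in audit[start:end] if e["event"] == "workspace_changed"]


ws = Workspace(config={"writable": []})
first = ws.decide_write("notes.md")        # deny
ws.set_config("writable", ["notes.md"])
second = ws.decide_write("notes.md")       # allow
print(first, second)
print(explain_flip(ws.audit, "notes.md"))
```

Without the `set_config` audit line, `explain_flip` would return an empty list: the log would show the decision changing but not why, which is the gap described above.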

3. Broad Delegations Can Hide Problems

A wildcard delegation was allowing writes that should have required approval. Later, when that delegation expired, Agent_Sudo denied everything instead. The authorization engine was working correctly. The visibility wasn't. Fix: PR #86 added delegation status and broad-scope visibility.
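A minimal sketch of what "delegation status and broad-scope visibility" could look like, assuming a delegation is a path glob plus an expiry timestamp. `Delegation`, `covers`, the `CATCH_ALL` set and the status wording are hypothetical; this is not the output PR #86 produces.

```python
import fnmatch
from dataclasses import dataclass

# Hypothetical set of patterns treated as "matches everything".
CATCH_ALL = {"*", "**", "**/*"}


@dataclass
class Delegation:
    """Hypothetical delegation: matching writes skip approval until it expires."""
    pattern: str        # glob over paths, e.g. "docs/*.md" or "*"
    expires_at: float   # Unix timestamp

    def is_active(self, now: float) -> bool:
        return now < self.expires_at

    def is_broad(self) -> bool:
        # Toy check: only recognizes a few common catch-all patterns.
        return self.pattern in CATCH_ALL

    def status(self, now: float) -> str:
        state = "active" if self.is_active(now) else "expired"
        scope = "BROAD" if self.is_broad() else "narrow"
        return f"{self.pattern!r}: {state}, {scope} scope"


def covers(d: Delegation, path: str, now: float) -> bool:
    """True if the delegation is live and its glob matches the path."""
    return d.is_active(now) and fnmatch.fnmatch(path, d.pattern)


now = 1_000.0
d = Delegation(pattern="*", expires_at=now + 60)
print(d.status(now))                    # '*': active, BROAD scope
print(covers(d, "secrets/.env", now))   # True: the wildcard quietly covers this too
print(d.status(now + 120))              # '*': expired, BROAD scope
```

Both failure modes from the story show up in one status line: while active, the wildcard silently covers paths nobody meant to delegate; once expired, the same line explains why everything it used to cover changed behavior.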

4. Approval Wait Time Didn't Mean What I Thought It Meant

I configured Agent_Sudo to wait 300 seconds for approvals. The requests still expired after 120 seconds.

The reason: approval TTL and wait time were separate controls, and the 120-second TTL expired each request before the 300-second wait ran out.

Fix: PR #89 now warns when wait exceeds TTL and explains the effective limit.
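The arithmetic behind the fix is just a minimum. Here is a sketch, assuming an approval request simply expires after its TTL; `effective_wait` and the warning text are hypothetical and not Agent_Sudo's actual API or message.

```python
from typing import Optional, Tuple


def effective_wait(wait_s: float, ttl_s: float) -> Tuple[float, Optional[str]]:
    """Hypothetical helper: the wait that can actually take effect, plus a warning.

    Models an approval request that expires after ttl_s, so any wait beyond
    the TTL is dead time: the request is gone before the wait runs out.
    """
    if wait_s <= 0 or ttl_s <= 0:
        raise ValueError("wait and TTL must both be positive")
    effective = min(wait_s, ttl_s)
    warning = None
    if wait_s > ttl_s:
        warning = (
            f"wait ({wait_s:g}s) exceeds approval TTL ({ttl_s:g}s); "
            f"requests will expire after {effective:g}s"
        )
    return effective, warning


effective, warning = effective_wait(wait_s=300, ttl_s=120)
print(effective)  # 120
print(warning)
```

With the numbers from this story, the effective limit is min(300, 120) = 120 seconds, which matches what I observed.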

What Surprised Me Most

None of these issues came from architecture reviews. None came from design documents. All came from running real agents against the system. The lesson wasn't "build more features."

It was:

Dogfood your assumptions.

Want To Try It?

The fastest path is:

pipx install agent-sudo-mcp
agent-sudo eval

It runs the complete flow:

blocked → delegated → allowed once → denied → audit verified

If you try it, tell me one thing:

Did you reach audit verified, or where did you stop?

My repo is here if you want to check it out: https://github.com/Kisyntra/Agent_Sudo

What Building Agent_Sudo Taught Me About AI Agent Security (Before I Found Any Users)

sriram prakhya — Sun, 31 May 2026 01:20:59 +0000

I shipped a real thing. Agent_Sudo is a local permission gateway for AI agents: it sits in front of an agent's tool calls and decides allow / deny / require-approval based on policy and where the request originated, and it writes a tamper-evident, hash-chained audit log you can verify. Python, zero runtime dependencies, ~190 passing tests, an MCP server, working examples for LangGraph and PydanticAI, published to PyPI as v0.4.0.
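"Hash-chained" carries a lot of weight in that sentence. Below is a minimal sketch of the general technique, assuming SHA-256 over canonical JSON with each entry committing to the previous entry's hash. It is not Agent_Sudo's log format; `append`, `verify`, the `GENESIS` value and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so equal records hash equally.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], record: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": _digest(prev, record)})


def verify(log: list[dict]) -> bool:
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"tool": "fs.write", "path": "notes.md", "decision": "allow"})
append(log, {"tool": "http.post", "url": "https://example.com", "decision": "deny"})
print(verify(log))  # True

log[1]["record"]["decision"] = "allow"  # tamper with a past entry
print(verify(log))  # False
```

A chain like this catches edits, reordering, and deletions from the middle. It does not catch someone dropping the newest entries unless the latest hash is also kept somewhere the agent can't write.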

It's solid. I'm proud of the engineering. And the most useful things I've learned so far have had almost nothing to do with the code.

I'm in the middle of figuring out whether anyone actually needs this. Here's what that process is teaching me, honestly, while it's still in progress.

Engineering quality and demand are completely different variables

For weeks I measured the project by the things engineers measure: tests green, modules clean, no dependencies, careful abstractions. All real, all satisfying, and none of it tells you whether a single person wants the tool.

I caught myself using code quality as a proxy for progress. It isn't. A beautifully built thing that no one needs is still a thing no one needs. Realizing that these are two separate axes ("is it good?" vs. "does anyone want it?") has been the single most clarifying shift, and I clearly optimized the first while assuming the second.

I may have built a vitamin while telling myself it was a painkiller

The pitch sounds urgent: stop prompt-injection, stop exfiltration, audit everything. But step back. Most developers already get permission prompts from their tools, and a gateway only helps if you actually route every call through it. For a solo dev, that reads as a nice-to-have for a risk you haven't been bitten by yet.

There's a more serious buyer: teams that need real authorization policy and a verifiable audit trail across many agents. That's a painkiller for them. But I haven't validated that buyer yet. So here's an honest open question I'm now carrying: am I building for a pain people feel, or a pain I find interesting?

My demo proves the wrong thing (and I built it)

I made a clean 60-second demo: an agent reads a poisoned web page, tries to exfiltrate secrets, and the gateway blocks it. It looks great.

Then I read my own code. The requests were hand-authored. The "attack" was hard-coded. Enforcement ran in dry-run. It faithfully demonstrates the decision logic, but it stages the genuinely hard part: intercepting a real agent and attributing where an instruction actually came from (the user vs. the model vs. fetched content). That attribution is the core technical claim, and the demo asserts it instead of proving it.

A demo that narrates instead of proves is, if anything, worse than no demo, because a skeptical reader spots the gap in about a minute, and then they don't trust the rest either. Building the version that actually intercepts and attributes is the real work, and it's still ahead of me.
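One way to see the gap: in a toy model, the decision step takes a few lines once an origin label is supplied. Everything hard happens before `decide` is ever called, in deriving that label from a live agent's context. The `Origin` values, the tool names in `SENSITIVE`, and the policy itself are hypothetical, chosen only to mirror the user / model / fetched content split described above.

```python
from enum import Enum


class Origin(Enum):
    USER = "user"
    MODEL = "model"
    FETCHED_CONTENT = "fetched_content"


SENSITIVE = {"send_http", "read_secret"}  # hypothetical tool names


def decide(tool: str, origin: Origin) -> str:
    """Toy policy: trivial once the origin is known."""
    if tool not in SENSITIVE:
        return "allow"
    if origin is Origin.USER:
        return "allow"
    if origin is Origin.MODEL:
        return "require-approval"
    return "deny"  # the instruction traces back to fetched content


# A demo that hand-writes the origin only ever exercises this function:
print(decide("send_http", Origin.FETCHED_CONTENT))  # deny
```

A demo that passes `Origin.FETCHED_CONTENT` by hand proves only that these branches work. The claim worth proving is that the gateway can produce that argument itself from a real agent run.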

Distribution turned out to be much harder than building

I assumed the build was the hard part. The build was the easy part.

A few concrete discoveries from trying to get it in front of people:

I posted to a relevant subreddit. It was removed instantly, not by moderators but by Reddit's spam filter, because my account had 1 karma. The account is five years old; it didn't matter. No reputation, no post.
I looked at the official protocol community's Discord. Its rules: no self-promotion; soliciting is a bannable offense. It's a contributor/spec space, not a place to show a product and rightly so.

The pattern clicked: these gates aren't judging my project. They're judging whether I have any standing in the community, which I don't yet. You can't broadcast your way out of a cold start. The channels that reach developers are gated by exactly the reputation a brand-new builder hasn't had time to earn, and that reputation is built by participating for weeks before you have anything to pitch, not on launch day.

What evidence I still don't have

This is the part I find genuinely interesting, because it's a list I can go get answers to:

Pull: not one person has said "I need this" unprompted. Zero is data.
A validated buyer: I have a hypothesis about who'd pay or adopt, but I haven't tested it with a single real conversation.
Proof of the core claim: a working integration where Agent_Sudo intercepts a live agent and derives provenance itself, with no dry-run and no hand-built requests.
Distribution standing: any community presence at all that isn't a cold, reputation-less account.

Notice that none of those are about the code. They're about demand, evidence, and trust: the variables I under-invested in while over-investing in architecture.

What I'm doing about it

The lesson isn't "good code doesn't matter." It's "good code is necessary and nowhere near sufficient, and I had the order backwards." So I'm flipping it. Instead of polishing the engine, I'm going after the missing evidence directly: a real integration demo, conversations with the teams who'd actually feel this pain, and showing up in the right communities as a participant first.

If you've shipped something technically sound that no one showed up for, or you work on agents and have an opinion on where provenance attribution breaks, I'd genuinely like to compare notes in the comments. The repo's here if you want to poke at it: github.com/Kisyntra/Agent_Sudo.

I'm spending the next 30 days answering a simple question:

Does anyone actually need this enough to adopt it?

That's a much harder question than whether I can build it, and it's the one that matters now.

DEV Community: sriram prakhya