sriram prakhya

Posted on May 31

What Building Agent_Sudo Taught Me About AI Agent Security (Before I Found Any Users)

#ai #webdev #opensource #career

I shipped a real thing. Agent_Sudo is a local permission gateway for AI agents: it sits in front of an agent's tool calls and decides allow / deny / require-approval based on policy and where the request originated, and it writes a tamper-evident, hash-chained audit log you can verify. Python, zero runtime dependencies, ~190 passing tests, an MCP server, working examples for LangGraph and PydanticAI, published to PyPI as v0.4.0.
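
To make "tamper-evident, hash-chained" concrete, here is a minimal sketch of the general technique. It is not Agent_Sudo's actual implementation: the function names, the record fields, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"record": record, "prev": prev_hash, "hash": entry_hash(prev_hash, record)}
    )


def verify(log: list) -> bool:
    """Recompute every link; an edited, reordered, or mid-chain-deleted entry fails."""
    prev_hash = GENESIS
    for entry in log:
        expected = entry_hash(prev_hash, entry["record"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical records, for illustration only.
log = []
append(log, {"tool": "read_file", "origin": "user", "decision": "allow"})
append(log, {"tool": "http_post", "origin": "fetched_content", "decision": "deny"})
print(verify(log))  # True

log[0]["record"]["decision"] = "deny"  # rewrite history after the fact
print(verify(log))  # False
```

A chain like this only shows that earlier entries were not altered relative to the latest hash. Detecting truncation of the tail, or a full rewrite of the log, needs the latest hash to be stored somewhere the writer cannot change.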

It's solid. I'm proud of the engineering. And the most useful things I've learned so far have had almost nothing to do with the code.

I'm in the middle of figuring out whether anyone actually needs this. Here's what that's teaching me, honestly, while it's still in progress.

Engineering quality and demand are completely different variables

For weeks I measured the project by the things engineers measure: tests green, modules clean, no dependencies, careful abstractions. All real, all satisfying, and none of it tells you whether a single person wants the tool.

I caught myself using code quality as a proxy for progress. It isn't. A beautifully built thing that no one needs is still a thing no one needs. Realizing that those are two separate axes ("is it good?" vs. "does anyone want it?") has been the single most clarifying shift, and I clearly optimized the first while assuming the second.

I may have built a vitamin while telling myself it was a painkiller

The pitch sounds urgent: stop prompt-injection, stop exfiltration, audit everything. But step back. Most developers already get permission prompts from their tools, and a gateway only helps if you actually route every call through it. For a solo dev, that reads as a nice-to-have for a risk you haven't been bitten by yet.

There's a more serious buyer: teams that need real authorization policy and a verifiable audit trail across many agents. That's a painkiller for them. But I haven't validated that buyer yet. So an honest open question I'm now carrying: am I building for a pain people feel, or a pain I find interesting?

My demo proves the wrong thing (and I built it)

I made a clean 60-second demo: an agent reads a poisoned web page, tries to exfiltrate secrets, and the gateway blocks it. It looks great.

Then I read my own code. The requests were hand-authored. The "attack" was hard-coded. Enforcement ran in dry-run. It faithfully demonstrates the decision logic, but it stages the genuinely hard part: intercepting a real agent and attributing where an instruction actually came from (the user vs. the model vs. fetched content). That attribution is the core technical claim, and the demo asserts it instead of proving it.
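
The gap is easy to see in code. Below is a minimal sketch of an origin-aware allow / deny / require-approval decision. Every name in it (`Origin`, `ToolRequest`, `POLICY`, `decide`) is hypothetical and not Agent_Sudo's API. The caller sets the `origin` field by hand, which is the staged part: the decision logic is a table lookup, while deriving that field from a live agent run is the hard, unproven step.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    USER = "user"
    MODEL = "model"
    FETCHED_CONTENT = "fetched_content"


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    origin: Origin  # asserted by the caller here, not derived from a real agent


# Hypothetical policy table: (tool, origin) -> decision.
POLICY = {
    ("read_file", Origin.USER): Decision.ALLOW,
    ("read_file", Origin.MODEL): Decision.REQUIRE_APPROVAL,
    ("read_file", Origin.FETCHED_CONTENT): Decision.DENY,
}


def decide(request: ToolRequest) -> Decision:
    """Look up the decision; anything not explicitly listed is denied."""
    return POLICY.get((request.tool, request.origin), Decision.DENY)


# The same tool call gets a different answer depending on who "asked" for it.
for origin in Origin:
    print(origin.value, "->", decide(ToolRequest("read_file", origin)).value)
```

Defaulting unlisted combinations to deny is a design choice in this sketch, so that a tool or origin the policy has never seen is not silently allowed.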

A demo that narrates instead of proves is, if anything, worse than no demo — because a skeptical reader spots the gap in about a minute, and now they don't trust the rest either. Building the version that actually intercepts and attributes is the real work, and it's still ahead of me.

Distribution turned out to be much harder than building

I assumed the build was the hard part. The build was the easy part.

A few concrete discoveries from trying to get it in front of people:

I posted to a relevant subreddit. It was removed instantly, not by moderators but by Reddit's spam filter, because my account had 1 karma. The account is five years old; it didn't matter. No reputation, no post.
I looked at the official protocol community's Discord. Its rules: no self-promotion; soliciting is a bannable offense. It's a contributor/spec space, not a place to show a product, and rightly so.

The pattern clicked: these gates aren't judging my project. They're judging whether I have any standing in the community, which I don't yet. You can't broadcast your way out of a cold start. The channels that reach developers are gated by exactly the reputation a brand-new builder hasn't had time to earn, and that reputation is built by participating for weeks before you have anything to pitch, not on launch day.

What evidence I still don't have

This is the part I find genuinely interesting, because it's a list I can go get answers to:

Pull: not one person has said "I need this" unprompted. Zero is data.
A validated buyer: I have a hypothesis about who'd pay or adopt — I haven't tested it with a single real conversation.
Proof of the core claim: a working integration where Agent_Sudo intercepts a live agent and derives provenance itself, with no dry-run and no hand-built requests.
Distribution standing: any community presence at all that isn't a cold, reputation-less account.

Notice that none of those are about the code. They're about demand, evidence, and trust: the variables I under-invested in while over-investing in architecture.

What I'm doing about it

The lesson isn't "good code doesn't matter." It's "good code is necessary and nowhere near sufficient, and I had the order backwards." So I'm flipping it: instead of polishing the engine, I'm going after the missing evidence directly: a real integration demo, conversations with the teams who'd actually feel this pain, and showing up in the right communities as a participant first.

If you've shipped something technically sound that no one showed up for, or you work on agents and have an opinion on where provenance attribution breaks, I'd genuinely like to compare notes in the comments. The repo's here if you want to poke at it: github.com/Kisyntra/Agent_Sudo.

I'm spending the next 30 days answering a simple question:

Does anyone actually need this enough to adopt it?

That's a much harder question than whether I can build it, and it's the one that matters now.

Top comments (4)

Harjot Singh • May 31

"Before I found any users" is a refreshingly honest subtitle, and AI agent security is genuinely the right thing to obsess over early because it's the category where shipping-then-fixing can mean a breach, not a bug. The core hard problem with an agent that has sudo-level access: the agent's instructions and its untrusted inputs share the same channel, so prompt injection isn't an edge case, it's the whole threat model. An agent that can run commands is one cleverly-crafted input away from running the wrong ones.

The principle I keep landing on: agents should operate under least-privilege with deterministic gates on dangerous actions, never "trust the model to be careful." Capabilities scoped tight, irreversible actions require explicit confirmation, untrusted input never flows straight to a privileged call. Same propose-then-gate discipline I build into Moonshift (a multi-agent pipeline shipping a prompt to a real SaaS). Genuinely valuable to think security-first pre-users - what's the threat that worried you most building Agent_Sudo? Injection-to-command-execution is the one that keeps me up.

sriram prakhya • Jun 1

That's a great way to frame it.
The thing that worried me most wasn't prompt injection by itself; it was attribution.
With traditional systems, you usually know who initiated an action. With agents, the same shell command or API call can originate from a direct user request, model-generated reasoning, fetched content, tool output, or some combination of all of them.
The security decision often depends on that origin. Reading a file because the user asked is very different from reading the same file because a webpage told the agent to do it.
Prompt injection is one path into the problem, but the deeper question for me became: how do we know where an instruction came from, and how do we make authorization decisions differently based on that provenance?
That's what pushed Agent_Sudo toward provenance tracking, approvals, and auditability rather than trying to solve everything with prompt filtering alone.

Harjot Singh • May 31

Thinking about agent security before users is the rare right order; most teams bolt it on after an incident. "agent_sudo" is a telling name: the core problem is that agents need elevated privileges to be useful, but elevated privileges on a non-deterministic actor are a genuinely new kind of risk. The principles that hold: least-privilege by default, scoped and expiring grants, every action attributable, and a human gate on anything irreversible. An agent should request capability like sudo asks for a password, not hold root permanently. That permissioned-execution model is exactly how I architect Moonshift. What was the biggest security surprise building it: the privilege scoping, or prompt injection as a path to those privileges?

sriram prakhya • Jun 1

The biggest surprise was how quickly privilege scoping and prompt injection become the same problem.
Initially I thought of them separately: prompt injection was an input problem, and privilege management was an authorization problem.
But once an agent has access to tools, injection mostly matters because it can influence how those privileges get used. An injected instruction with no capabilities behind it isn't very interesting. An injected instruction that can read files, send messages, execute commands, or access credentials is where things get dangerous.
That realization pushed me toward a least-privilege model with approvals, scoped permissions, delegation limits, and audit trails. I became less interested in trying to perfectly detect every malicious prompt and more interested in making sure the agent couldn't do high-impact actions without crossing an explicit boundary.
I'm still learning, but that shift in thinking was probably the biggest surprise while building it.