DEV Community

Cover image for What Building Agent_Sudo Taught Me About AI Agent Security (Before I Found Any Users)
sriram prakhya
sriram prakhya

Posted on

What Building Agent_Sudo Taught Me About AI Agent Security (Before I Found Any Users)

I shipped a real thing. Agent_Sudo is a local permission gateway for AI agents: it sits in front of an agent's tool calls and decides allow / deny / require-approval based on policy and where the request originated, and it writes a tamper-evident, hash-chained audit log you can verify. Python, zero runtime dependencies, ~190 passing tests, an MCP server, working examples for LangGraph and PydanticAI, published to PyPI as v0.4.0.

It's solid. I'm proud of the engineering. And the most useful things I've learned so far have had almost nothing to do with the code.

I'm in the middle of figuring out whether anyone actually needs this. Here's what that's teaching me honestly, while it's still in progress.

Engineering quality and demand are completely different variables

For weeks I measured the project by the things engineers measure: tests green, modules clean, no dependencies, careful abstractions. All real, all satisfying and none of it tells you whether a single person wants the tool.

I caught myself using code quality as a proxy for progress. It isn't. A beautifully built thing that no one needs is still a thing no one needs. Realizing those are two separate axes, is it good* vs. does anyone want it has been the single most clarifying shift, and I clearly optimized the first while assuming the second.

I may have built a vitamin while telling myself it was a painkiller

The pitch sounds urgent: stop prompt-injection, stop exfiltration, audit everything. But step back. Most developers already get permission prompts from their tools, and a gateway only helps if you actually route every call through it. For a solo dev, that reads as a nice-to-have for a risk you haven't been bitten by yet.

There's a more serious buyer teams that need real authorization policy and a verifiable audit trail across many agents. That's a painkiller for them. But I haven't validated that buyer yet. So an honest open question I'm now carrying: am I building for a pain people feel, or a pain I find interesting?

My demo proves the wrong thing (and I built it)

I made a clean 60-second demo: an agent reads a poisoned web page, tries to exfiltrate secrets, and the gateway blocks it. It looks great.

Then I read my own code. The requests were hand-authored. The "attack" was hard-coded. Enforcement ran in dry-run. It faithfully demonstrates the decision logic but it stages the genuinely hard part: intercepting a real agent and attributing where an instruction actually came from (the user vs. the model vs. fetched content). That attribution is the core technical claim, and the demo asserts it instead of proving it.

A demo that narrates instead of proves is, if anything, worse than no demo — because a skeptical reader spots the gap in about a minute, and now they don't trust the rest either. Building the version that actually intercepts and attributes is the real work, and it's still ahead of me.

Distribution turned out to be much harder than building

I assumed the build was the hard part. The build was the easy part.

A few concrete discoveries from trying to get it in front of people:

  • I posted to a relevant subreddit. It was removed instantly not by moderators, but by Reddit's spam filter, because my account had 1 karma. The account is five years old; it didn't matter. No reputation, no post.
  • I looked at the official protocol community's Discord. Its rules: no self-promotion; soliciting is a bannable offense. It's a contributor/spec space, not a place to show a product and rightly so.

The pattern clicked: these gates aren't judging my project. They're judging whether I have any standing in the community, which I don't yet. You can't broadcast your way out of a cold start. The channels that reach developers are gated by exactly the reputation a brand-new builder hasn't had time to earn and that reputation is built by participating for weeks before you have anything to pitch, not on launch day.

What evidence I still don't have

This is the part I find genuinely interesting, because it's a list I can go get answers to:

  • Pull: not one person has said "I need this" unprompted. Zero is data.
  • A validated buyer: I have a hypothesis about who'd pay or adopt — I haven't tested it with a single real conversation.
  • Proof of the core claim: a working integration where Agent_Sudo intercepts a live agent and derives provenance itself, with no dry-run and no hand-built requests.
  • Distribution standing: any community presence at all that isn't a cold, reputation-less account.

Notice none of those are about the code. They're about demand, evidence, and trust the variables I under-invested in while over-investing in architecture.

What I'm doing about it

The lesson isn't "good code doesn't matter." It's "good code is necessary and nowhere near sufficient, and I had the order backwards." So I'm flipping it: instead of polishing the engine, I'm going after the missing evidence directly real integration demo, conversations with the teams who'd actually feel this pain, and showing up in the right communities as a participant first.

If you've shipped something technically sound that no one showed up for or you work on agents and have an opinion on where provenance attribution breaks I'd genuinely like to compare notes in the comments. The repo's here if you want to poke at it: github.com/Kisyntra/Agent_Sudo.

I'm spending the next 30 days answering a simple question:

Does anyone actually need this enough to adopt it?

That's a much harder question than whether I can build it, and it's the one that matters now.

Top comments (1)

Collapse
 
harjjotsinghh profile image
Harjot Singh

"Before I found any users" is a refreshingly honest subtitle, and AI agent security is genuinely the right thing to obsess over early because it's the category where shipping-then-fixing can mean a breach, not a bug. The core hard problem with an agent that has sudo-level access: the agent's instructions and its untrusted inputs share the same channel, so prompt injection isn't an edge case, it's the whole threat model. An agent that can run commands is one cleverly-crafted input away from running the wrong ones.

The principle I keep landing on: agents should operate under least-privilege with deterministic gates on dangerous actions, never "trust the model to be careful." Capabilities scoped tight, irreversible actions require explicit confirmation, untrusted input never flows straight to a privileged call. Same propose-then-gate discipline I build into Moonshift (a multi-agent pipeline shipping a prompt to a real SaaS). Genuinely valuable to think security-first pre-users - what's the threat that worried you most building Agent_Sudo? Injection-to-command-execution is the one that keeps me up.