DEV Community: VamsiSudhakaran1

The State of Agent Code Safety: what we scanned, and what we refused to flag

VamsiSudhakaran1 — Tue, 07 Jul 2026 04:21:43 +0000

The one-sentence problem

A static analyzer sees eval(x) and asks "is x tainted by SQL or an HTTP parameter?" It has no concept of "x is the model's reply."

That blind spot — model output reaching a code, shell, or deserialization sink — is the entire agent layer, and it's invisible to the tools most teams already run.

I scanned 30+ of the most-starred open-source AI-agent frameworks for exactly this, then hand-verified every high-severity finding against the source. Here's what's real — and, deliberately, what I decided not to report.

What's actually out there

~1 in 5 frameworks let model output flow into eval/exec/new Function — mostly code-writing agents by design, where the only thing between a prompt injection and RCE is a sandbox the scanner can't verify.
One is unambiguous and public: SuperAGI's eval() on the assistant's own reply, CVE-2025-51472.

73% assemble LLM request params in a dict, spread them with **kwargs, and set no max_tokens anywhere in the path (27 call sites).

Cost hygiene, not a vuln, on its own — but near-universal, and passed indirectly where most scanners can't see it.

The part nobody publishes: what I refused to flag

A scanner that cries wolf gets demoted to advisory and ignored. The hardest — and most valuable — engineering here is not flagging the things that look scary but aren't. Every one of these is a real pattern in a real, well-engineered framework that a lazy scanner reports as critical:

The rule that produces that discipline: assert HIGH only when you can see the dangerous input's source — a value assigned from an LLM call in scope, or an unambiguous name like request.body.

When the danger is only inferred from a variable's name, say so — MEDIUM, "confirm the source" — not "you have an RCE." A judge rules on the evidence in front of it.

What I'm not claiming

No precision/recall number yet — that needs a labeled benchmark with true negatives, which I'm building. This is a methodology plus hand-checked findings, plus a public account of the false positives I eliminated.

Honesty about the second is the point. (One reported finding — an unbounded tool-call loop in LightAgent — already got a maintainer fix. That's the bar.)

Read the full report / scan your own agent

Full write-up: https://release-gate.com/research.html

pip install release-gate

release-gate audit . # two honest scores + a PROMOTE/HOLD/BLOCK verdict

It's open source — the analyzer and the entire false-positive test suite are on GitHub.(https://github.com/VamsiSudhakaran1/release-gate)

Agent security has a survivorship-bias problem — we're armoring the wrong part of the plane

VamsiSudhakaran1 — Wed, 01 Jul 2026 11:53:22 +0000

In WWII, the military studied bombers that came back and wanted to armor the spots with the most bullet holes. Abraham Wald pointed out the mistake: those holes mark where a plane can get hit and still fly home. The armor belongs where the returning planes have no holes — because the planes hit there never came back to be studied.

I've spent the last few weeks building a static + behavioral scanner for LLM agents, and ran it across 60+ open-source agent repos — AutoGPT, CrewAI, LangGraph, mem0, and a pile of newer frameworks.

Two things stood out.

One: the findings cluster around the same handful of issues everyone already talks about — eval(model_output) (yes, real, CVE-2025-51472 in SuperAGI), prompt-injection surfaces, LLM calls with no token ceiling (the "$4k overnight bill" stories). These are real. They're also the bullet holes on the planes that came home. Visible, patchable, survivable — and every SAST tool and guardrail vendor is racing to cover them. In 18 months they're table stakes.

Two, and this is the part that nags me: most well-maintained repos scan clean. And a clean scan is not proof of safety — it's the survivorship bias. The agents that failed catastrophically didn't leave a grep-able fingerprint. They left an incident: money wired to the wrong account, prod data deleted, a defamatory email sent, secrets exfiltrated through a poisoned tool. Those planes went down where our scanners have no data — which is exactly why the scans look clean.

So where were the downed planes actually hit? Not in code patterns. In two boundaries static analysis can't read:

Output → consequential action.

The agent's decision was plausible but wrong, and it triggered something irreversible. Every line of code was fine. The failure was in what it did, not what it is. Does anyone check whether irreversible tool actions (pay, delete, deploy, send) are gated behind a confirmation, a dry-run, a human?

Trust boundaries — MCP tools, agent-to-agent handoffs, persistent memory. The agent trusted a poisoned input and acted on it. No eval, no injection string, nothing to grep. Does anyone verify that agent A should trust agent B's output before acting on it? That an MCP tool's description isn't itself an injection?

These are invisible to SAST (no pattern), to guardrails (they filter one input, not the action), to evaluators (they score the text, not the consequence). Nobody is asking the question that actually keeps people up at night: "what can this agent DO, and is the dangerous part gated?"

I don't have this fully solved — I'm building toward it (release-gate, open source, if you're curious). But I'm posting because I think the framing matters more than any tool right now, and I want to be wrong in public.

So, genuinely: if you run an agent anywhere near production — what's the fatal boundary you're most afraid of that nothing you have today would catch? The irreversible action? The tool you can't fully trust? The model quietly drifting under you?

I'd rather learn what the missing planes look like from people who've flown the mission than keep guessing.

I built an open-source governance gate for AI agent deployments

VamsiSudhakaran1 — Tue, 17 Mar 2026 12:39:09 +0000

The $50K deploy that shouldn't have happened

Imagine this: your team ships an autonomous AI agent to production. It works great in staging. But in production, a retry loop fires endlessly, burning through tokens. By the time someone notices, the bill is $50K and climbing.

No kill switch.
No cost cap.
No rate limit.

That's the problem I built release-gate to solve.

What is release-gate?
It's an open-source tool that sits at one specific point in your CI/CD pipeline — between test and deploy. It reads a release-gate.yaml file in your repo and runs governance checks against it.

The result is binary: PASS or FAIL.
No partial deploys.
No "warnings you can ignore."

What it checks (v0.2.0)

1. INPUT_CONTRACT — Schema Validation

Does your agent validate incoming requests? release-gate checks that your JSON schema is syntactically valid, that sample inputs pass, and that bad inputs fail.

yamlinput_contract:
schema:
type: object
required: [prompt]
properties:
prompt:
type: string
maxLength: 1000

2. FALLBACK_DECLARED — Operational Safeguards

Can you kill this agent in under 5 seconds? Who gets paged? Where's the runbook?

yamlfallback_declared:
kill_switch:
type: feature_flag
name: disable_agent
ownership:
team: platform-eng
oncall: oncall@yourco.com
runbook_url: https://wiki/runbook

3. IDENTITY_BOUNDARY — Access Control

Is auth required? Are there rate limits? Can one customer see another's data?

yamlidentity_boundary:
authentication: required
rate_limit: 100
data_isolation:
- user_owned_only
- no_cross_access

4. ACTION_BUDGET — Cost & Resource Controls

What's the max spend? How many retries? How many concurrent requests?

yamlaction_budget:
max_tokens_per_req: 5000
max_retries: 3
max_daily_cost: 1000
max_concurrent: 10

Why YAML?
Because governance should live in the repo, next to the code, reviewed in PRs, and versioned in git. Not in a dashboard someone forgets to update.

What's on the roadmap?
v0.3 (Q2 2026): Approval workflows, dashboard UI, audit reports, compliance evidence generation
v1.0 (Q4 2026): Runtime policy enforcement, multi-tenant support, enterprise integrations

Links:

Website: release-gate.com
GitHub: github.com/VamsiSudhakaran1/release-gate

I'd love your feedback
This is early (v0.2.0) and I'm actively building.
What governance checks would matter most to your team? What's missing? Drop a comment or open an issue on GitHub.
Thanks for taking your time in reading this.