Between May 4 and May 6, 2026, NVD published four CVEs against AI/agent projects. Four teams, four codebases, four review processes — and one defect class.
The shape: an LLM produces output, the application drops that output into a privileged execution sink — SQL engine, Python interpreter, shell, browser DOM — without re-validation, and the sink runs it.
This is OWASP's LLM05: Improper Output Handling. It is distinct from LLM01: Prompt Injection precisely because the failure mode is downstream of the model. The user's prompt isn't malicious. The user asks for a SQL query and gets a SQL query. The model didn't go rogue. The application failed to treat the model's output as untrusted.
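To make the shape concrete, here is a minimal sketch of the vulnerable pattern in Python. The llm.generate call and the database are hypothetical stand-ins for whatever completion API and schema the application uses; the point is what's missing between the model call and the sink:

```python
import sqlite3

def answer_question(llm, user_question: str) -> list:
    # llm.generate is a hypothetical stand-in for any completion API.
    sql = llm.generate(f"Write one SQL query answering: {user_question}")
    conn = sqlite3.connect("app.db")
    try:
        # The defect: model output flows straight into a privileged sink.
        # Nothing between generate() and execute() treats the SQL as
        # untrusted input, so a DROP TABLE runs just as readily as a SELECT.
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```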
Why this matters for how you allocate AI security spend
A guardrail product tuned to detect malicious prompts produces zero alerts on these four CVEs. There is no jailbreak. There is no prompt injection. The user's request is syntactically reasonable, the model's response is syntactically reasonable, and the bug lands at the layer below — at the seam where LLM output enters a privileged sink.
That layer is where your control belongs, and that's not where the AI security tooling market has been concentrated. Most products in this space — prompt firewalls, input classifiers, jailbreak detectors — sit in front of the model. They're useful for what they do. They are not the right tool for the bug class that produced the four CVEs above.
The honest framing is: you need both. Prompt-side detection for LLM01-style abuse, plus sink-side validation at every seam where LLM output crosses into something that executes. The second category has been under-built and under-bought, and the CVE batch is the receipt.
What "right placement" looks like, by sink
This is not novel application security. It is the AppSec playbook from before LLMs existed, applied at the seam where LLM output meets a privileged sink (a minimal sketch for each sink follows the list):
SQL — parameterized queries plus a statement-type allowlist. COPY FROM PROGRAM (Postgres's route to shell execution) shouldn't be reachable from any LLM-driven path; that's a database-role decision, not prompt engineering.
Python eval() — there is no safe eval() of untrusted input. Use ast.literal_eval plus a type allowlist; if real code execution is required, sandbox it behind a separate process boundary with no network or filesystem access.
Shell — execFile/spawn with arg arrays, never exec/execSync with concatenated strings. This is a 1990s-era bug; the LLM is just a new way to inject the metacharacter.
DOM — sanitize any LLM-influenced content server-side; serve it under a CSP that disallows inline scripts. SVG goes through a dedicated sanitizer or doesn't render at all.
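For the SQL seam, a sketch using Python's stdlib sqlite3 for concreteness; the same two controls map onto a Postgres role with no COPY FROM PROGRAM rights. The allowlist contents and filename are assumptions about a read-only reporting path:

```python
import sqlite3

ALLOWED_STATEMENTS = {"SELECT", "WITH"}  # assumption: this path only reads

def run_llm_sql(llm_sql: str, params: tuple = ()) -> list:
    # Coarse statement-type allowlist. Defense in depth only; the
    # read-only connection below is the control that actually holds.
    tokens = llm_sql.split()
    if not tokens or tokens[0].upper() not in ALLOWED_STATEMENTS:
        raise ValueError("statement type not allowed")
    # mode=ro: the engine itself refuses DDL/DML, whatever the model wrote.
    # sqlite3 also rejects stacked statements in a single execute() call.
    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)
    try:
        return conn.execute(llm_sql, params).fetchall()
    finally:
        conn.close()
```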
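For the eval() sink, a sketch of the literal_eval-plus-allowlist pattern the item above describes. The allowed types are an assumption about what the application expects back from the model:

```python
import ast

ALLOWED_TYPES = (dict, list, str, int, float, bool)  # assumption: JSON-like values

def parse_llm_value(text: str):
    # literal_eval parses Python literals only and never executes code,
    # so __import__("os").system(...) fails to parse instead of running.
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a plain literal: {exc}") from exc
    if not isinstance(value, ALLOWED_TYPES):
        raise ValueError(f"unexpected type {type(value).__name__}")
    return value
```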
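The shell item names Node's child_process API; the same argv-versus-string split exists in Python's subprocess, which this sketch uses. The command and paths are illustrative, not from the affected projects:

```python
import subprocess

def search_logs(llm_pattern: str) -> str:
    # argv list + shell=False is the execFile/spawn equivalent: the
    # model's output arrives as one inert argument, never re-parsed by
    # a shell, so `; rm -rf /` is just a strange search pattern. The
    # `--` stops a leading hyphen from being read as a grep flag.
    result = subprocess.run(
        ["grep", "-r", "--", llm_pattern, "/var/log/app"],
        shell=False, capture_output=True, text=True, timeout=10,
    )
    return result.stdout
```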
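And for the DOM sink, a stdlib-only sketch: escape by default, paired with a CSP header so that even a bypass can't execute inline script. The header value and render helper are assumptions, not a drop-in for any particular framework:

```python
import html

CSP = "default-src 'self'; script-src 'self'"  # no inline script, no eval

def render_llm_html(llm_text: str) -> tuple[str, dict]:
    # html.escape turns <script> into inert text. If the product needs
    # rich formatting, swap in an allowlist sanitizer instead of escaping;
    # either way, SVG gets its own dedicated sanitizer or is dropped.
    body = f'<div class="llm-output">{html.escape(llm_text)}</div>'
    return body, {"Content-Security-Policy": CSP}
```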
None of this requires AI-specific tooling. It requires recognizing that the LLM's output is input to another program — and treating it the way mature engineering teams have treated untrusted input for a quarter century.
What this means for CTO / CISO / Head of AI Platform
Three concrete shifts.
Audit your agent's output surface, not just its input surface. Most AI security audits I see in 2026 cover prompt injection and refusal training. They rarely enumerate the sinks that LLM output reaches — the SQL engine, the code interpreter, the browser tool, the report renderer, the email sender. List those. Treat each as a privilege boundary. Ask, for each one: what validates the LLM's output before this sink consumes it? If the answer is "the model's training," that is the bug.
Reallocate at least one budget line from "AI guardrails" to "AppSec at the seam." This isn't headcount growth — it is a portfolio shift. The teams who already understand SQL injection, command injection, eval-of-untrusted, and stored XSS in the classical sense are the ones equipped to fix the LLM05 class. They need access to your agent code paths and priority on the backlog. Most AppSec teams I've talked to have not been brought into AI agent projects yet because the framing has been "AI is a different category." It isn't, for this defect class.
Run a structured red-team exercise focused on the output-to-sink seam, not on the model. Existing scanners for SQL injection, command injection, and eval-of-untrusted catch the same bugs in this code path — when they're pointed at it. The blocker is framing, not tooling. A two-week internal exercise asking "what does our agent's LLM output get fed into, and what validates it before it lands?" will surface the same defect class in any production agent with more than one tool sink.
What I'm not saying
I'm not saying prompt-side detection is wrong or wasteful. LLM01 is a real attack class. The argument is about proportional spend, not categorical replacement.
I'm not saying frontier-model refusal training fixes this. It doesn't. The model has nothing to refuse when it produces a DROP TABLE; it is responding to a syntactically reasonable request. The harm enters the system at the sink, not the model.
I'm not saying the four affected projects are uniquely sloppy. They are open source, ship under public review, and were patched within hours — faster than most closed agent systems would manage. The point of citing them is structural similarity, not blame.
The shift is not which AI security vendor you use. It is which boundary in your system you treat as the one that matters.
Newsletter outro (paste below the article, replaces the LinkedIn-style "if this resonated" line)
If this sharpened a question for you — or pushed back on something you believe — replies hit my inbox directly and I read every one. The full reproducibility version, with the probe YAML and a 15-probe sample run against Claude Sonnet 4.6, lives at https://www.at-helper.com/blog/four-cves-in-a-week-all-the-same-shape-when-agents-execute-llm-generated-code.
Next regular issue (a thought-leadership piece, not interim) lands in one week. If a colleague who's making 2026 AI security spend decisions would find this useful, the share button below is the move.
— Yang
