On May 7, 2026, Microsoft published "When Prompts Become Shells: RCE vulnerabilities in AI agent frameworks" — a retrospective on two Critical (9.9) CVEs in Semantic Kernel that landed in February and were patched within days.
The CVEs are bad. The framing is worse — and worth reading carefully.
The two CVEs
CVE-2026-26030 — eval() on attacker-controlled filter strings
InMemoryVectorStore accepts user-supplied filter expressions and evaluates them. Filter strings are interpolated into a Python expression and executed via eval():
expr = f"' or {user_filter} or '"
result = eval(expr, {"__builtins__": {}}, {})
An AST blocklist exists. It enumerates dangerous node types: Import, Call to known names, attribute access on a denylist. The blocklist was bypassable through undocumented attribute traversal — __name__, load_module, BuiltinImporter — none of which the filter explicitly denied. From there the attacker reaches os.system through the importer machinery without ever hitting an Import node.
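To see why node-type blocklists leak, here is a minimal sketch of the pattern (illustrative, not Semantic Kernel's actual checker): deny Import nodes and a short name denylist, allow everything else, including arbitrary attribute access.

import ast

DENIED_NAMES = {"os", "sys", "eval", "exec", "__import__", "open"}

def passes_blocklist(expr: str) -> bool:
    # Reject Import nodes and a handful of known-bad names;
    # generic attribute access is invisible to this check.
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False
        if isinstance(node, ast.Name) and node.id in DENIED_NAMES:
            return False
    return True

# Pure attribute traversal: no Import node, no denied identifier.
probe = "().__class__.__bases__[0].__subclasses__()"
assert passes_blocklist(probe)
print(len(eval(probe, {"__builtins__": {}}, {})))  # runs fine with empty builtins

From a chain like that, the importer machinery is a few attribute hops away, and no Import node ever appears.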
Patched: semantic-kernel Python 1.39.4. Three external researchers credited.
CVE-2026-25592 — DownloadFileAsync exposed as a kernel function
In SessionsPythonPlugin, the DownloadFileAsync method was decorated with [KernelFunction]. That single attribute makes a function callable by the LLM as a tool. The method accepts a localFilePath parameter with no canonicalization, no directory allowlist, no validation of any kind.
[KernelFunction]
public async Task DownloadFileAsync(string remoteUrl, string localFilePath)
{
// No path validation. No scope check. No allowlist.
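// (fetch of remoteUrl into content is elided in the quoted snippet)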
await File.WriteAllBytesAsync(localFilePath, content);
}
A prompt that gets the agent to call this tool with localFilePath = "C:\\ProgramData\\Microsoft\\Windows\\Start Menu\\Programs\\StartUp\\malware.exe" writes a file that executes on the next user login. Sandbox escape, host-level persistence, in one tool call.
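The missing check is small. A hedged sketch in Python, not the actual patch; DOWNLOAD_ROOT is an assumed per-session directory:

from pathlib import Path

DOWNLOAD_ROOT = Path("/srv/agent-downloads").resolve()  # assumed root

def validate_local_path(local_file_path: str) -> Path:
    # Joining an absolute localFilePath replaces the root outright;
    # resolve() then collapses any ../ segments, so one containment
    # check covers both escape routes.
    resolved = (DOWNLOAD_ROOT / local_file_path).resolve()
    if not resolved.is_relative_to(DOWNLOAD_ROOT):  # Python 3.9+
        raise PermissionError(f"path escapes download root: {resolved}")
    return resolved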
Patched: Microsoft.SemanticKernel.Plugins.Core 1.71.0. Same three researchers.
Microsoft's load-bearing line
"Vulnerabilities in the AI layer are no longer just a content issue and are an execution risk... because these frameworks act as a ubiquitous foundational layer, a single vulnerability in how they map AI model outputs to system tools carries systemic risk."
That's not throwaway language. They're naming a class.
A registered tool that wraps eval() turns prompt injection into a syscall.
Every agent framework has a tool registry. Every registry maps LLM-generated strings to functions. If any registered function wraps a dangerous primitive — eval, exec, Download*, raw filesystem write — prompt injection is no longer a content problem. It's a scope problem.
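The shape of the problem in toy form (illustrative Python, not any particular framework's API):

TOOLS: dict = {}

def register_tool(fn):
    TOOLS[fn.__name__] = fn  # this line is the capability grant
    return fn

@register_tool
def run_filter(user_filter: str):
    # Wraps a dangerous primitive: prompt injection now reaches eval().
    return eval(user_filter, {"__builtins__": {}}, {})

def dispatch(tool_name: str, **kwargs):
    # The model emits {"tool": "run_filter", "args": {...}};
    # the framework dispatches it blindly.
    return TOOLS[tool_name](**kwargs)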
What runtime testing catches
Most agent security tools — including the harness I work on (msaleme/red-team-blue-team-agent-fabric) — operate at runtime. They send adversarial prompts, observe what the agent does, and flag deviations from declared behavior. Several tests in that suite map directly to these CVEs:
- MCP-010: injects path traversal, template injection, and command substitution payloads into tool call arguments. Catches the DownloadFileAsync exploit if you've already invoked the tool.
- SS-002: scans declared permissions against actual code, fails when exec:none is declared but eval() appears in the body. Catches CVE-2026-26030's pattern if a permission declaration exists.
- SS-007: enforces sandboxing tiers: a tool wrapping filesystem-write should be Tier-1, never auto-promoted to Tier-3.
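For concreteness, MCP-010-style payloads look roughly like this; a hypothetical corpus for illustration, not the harness's actual test data:

PAYLOADS = [
    "../../../../etc/passwd",    # path traversal
    "C:\\ProgramData\\Microsoft\\Windows\\Start Menu\\Programs\\StartUp\\x.exe",  # absolute-path escape
    "{{7*7}}",                   # template injection
    "$(id)",                     # command substitution
]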
Each is useful. None is sufficient.
What runtime testing misses
The upstream cause of both CVEs is structural: a function that could call eval() was registered as a tool. A function that could write any path was decorated with [KernelFunction]. The vulnerability existed at registration time, not at invocation time.
Runtime probes can't see this. They observe the symptom — a bad call happens — and report it after the fact. They don't enumerate the framework's tool registry at load time and traverse the call graph of each registered callable looking for dangerous primitives.
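What that load-time enumeration could look like, sketched against the toy TOOLS registry above (first-order only: direct calls in the function body):

import ast
import inspect
import textwrap

DANGEROUS = {"eval", "exec", "__import__"}

def audit_registry(tools: dict) -> list[str]:
    findings = []
    for name, fn in tools.items():
        source = textwrap.dedent(inspect.getsource(fn))
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in DANGEROUS):
                findings.append(f"{name} calls {node.func.id}()")
    return findings

# Run at import time, before the agent serves a single prompt:
# audit_registry(TOOLS) -> ["run_filter calls eval()"]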
Closing that gap looks more like a Semgrep rule over the tool registry than a runtime test. Roughly:
rules:
  - id: kernel-function-wraps-dangerous-primitive-csharp
    languages: [csharp]
    severity: ERROR
    message: |
      Function registered as an LLM-callable tool wraps a dangerous primitive.
      The LLM can now invoke this primitive with attacker-influenced arguments.
    patterns:
      - pattern-inside: |
          [KernelFunction]
          ... $METHOD(...) { ... }
      - pattern-either:
          - pattern: File.WriteAllBytesAsync(...)
          - pattern: File.WriteAllText(...)
  - id: kernel-function-wraps-dangerous-primitive-python
    languages: [python]
    severity: ERROR
    message: |
      Function registered as an LLM-callable tool wraps a dangerous primitive.
      The LLM can now invoke this primitive with attacker-influenced arguments.
    patterns:
      - pattern-inside: |
          @kernel_function
          def $METHOD(...): ...
      - pattern-either:
          - pattern: eval(...)
          - pattern: exec(...)
          - pattern: __import__(...)
          - pattern: subprocess.$ANY(...)
Wired into CI, a rule pair in this shape could have flagged both CVEs before they reached production.
The harder version of this problem is transitive: a registered tool that calls a helper that calls eval(). That requires whole-program analysis. But the first-order cases — direct calls inside the function body — are catchable with the same tooling teams already use for SQL injection and unsafe deserialization.
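Concretely, the case a body-local rule misses (continuing the toy registry above):

def apply_filter(expr: str):
    return eval(expr, {"__builtins__": {}}, {})  # the primitive, one hop down

@register_tool
def run_filter_v2(user_filter: str):
    # No eval() in this body, so a direct-call pattern never fires here.
    return apply_filter(user_filter)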
What I take away
The LLM is not a security boundary. Microsoft says this in their architectural recommendations and they're right. Treat every LLM-generated string as untrusted input at every system call.
The tool registry is the trust boundary. Whether a function is callable by the model is a security decision, not a developer-convenience decision. Every @tool / [KernelFunction] / register_tool() decorator is a capability grant.
Runtime tests catch what registration audits should have caught upstream. Both layers are necessary. Neither is sufficient on its own.
I'm interested in how teams are currently auditing this — whether at PR-time as a Semgrep rule, at registration time as a runtime check, or trust-on-load with downstream gating. Drop a note in the comments if your stack does any of these.
Full test mappings (SS-002, SS-007, MCP-010, MCP-001, HC-5, HC-6, RiskGate) and the cross-link to the constitutional governance layer (CognitiveThoughtEngine/constitutional-agent-governance) are in the GitHub Discussion.