The first week of April 2026 produced more AI agent security signals than most months. Here's what happened, why it matters, and what — if anything — existing frameworks can catch.
1. Microsoft's Azure MCP Server shipped with zero authentication (CVSS 9.1)
CVE-2026-32211. Microsoft's @azure-devops/mcp package shipped with no authentication on critical functions. CVSS 9.1 — network-reachable, no credentials needed, no user interaction. Five public PoC exploits landed within days. Microsoft mitigated server-side by April 7.
The MCP specification makes authentication optional. Microsoft — the company that co-authored the protocol — shipped without it. This CVE is one of 30+ MCP CVEs filed in under 60 days in early 2026. The root cause across all of them: missing input validation, absent authentication, blind trust in tool descriptions.
What catches it: The Agent Security Harness tests unauthenticated MCP access directly (AUTH-001 through AUTH-003, MCP-003). 11 tests cover this attack surface. On the governance side, the constitutional-agent-governance RiskGate classifies unauthenticated endpoint exposure as CRITICAL — security_critical_events >= 1 triggers immediate FREEZE.
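The RiskGate decision described above reduces to a short rule. A minimal sketch, assuming the repo's terminology (`security_critical_events`, the `FREEZE`/`HOLD`/`PASS` verdicts, and the 0.80 misuse threshold come from this article; the class shape and internals are my own illustration):

```python
from dataclasses import dataclass

# Sketch of the RiskGate rule: one security-critical event (e.g. an
# unauthenticated endpoint exposure) triggers an immediate FREEZE.
# Field names mirror the article; everything else is assumed.

@dataclass
class RiskSignals:
    security_critical_events: int = 0
    misuse_risk_index: float = 0.0

def risk_gate(s: RiskSignals) -> str:
    if s.security_critical_events >= 1:
        return "FREEZE"   # zero tolerance: no retry, no override
    if s.misuse_risk_index >= 0.80:
        return "HOLD"     # elevated downstream misuse risk
    return "PASS"

print(risk_gate(RiskSignals(security_critical_events=1)))  # FREEZE
```

The point of the hard `>= 1` cutoff is that authentication failures are binary: there is no partial credit for an endpoint that is only sometimes exposed.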
2. LiteLLM backdoor → 4TB exfiltrated from a $10B AI startup
On March 24, attackers published backdoored LiteLLM versions (v1.82.7–1.82.8) to PyPI after compromising Trivy's CI pipeline. The malware harvested API keys, SSH keys, Kubernetes configs, and cloud credentials. A .pth persistence mechanism fired on any Python interpreter startup. Mercor, a $10B AI data vendor supplying OpenAI and Anthropic, confirmed a breach. Lapsus$ claimed 4TB.
LiteLLM gets 95 million monthly PyPI downloads. It's a transitive dependency of CrewAI, DSPy, AutoGen, MLflow, and dozens of MCP server implementations. The malware was specifically designed to steal what AI agent infrastructure holds: LLM API keys, cloud IAM tokens, Kubernetes service accounts. This isn't "another PyPI compromise" — it's the first supply chain attack optimized for the AI agent stack.
What catches it: Eight CVE-series tests (CVE-001 through CVE-008) cover nested schema injection, tool fork fingerprinting, marketplace contamination, and encoded payload detection. Three provenance tests (PRV-010–PRV-012) cover forked tool detection and registry hash mismatches. The GovernanceGate enforces zero tolerance: control_bypass_attempts >= 1 → immediate FAIL.
What's missing: We test tool/MCP marketplace supply chains, not PyPI/npm dependency chains. The LiteLLM attack shows the build pipeline is the entry point — the agent framework is the delivery vehicle.
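One narrow slice of that gap is cheap to triage today. Python's `site` module executes any `.pth` line that begins with `import` at interpreter startup, which is exactly the persistence trick described above. The sketch below flags such lines for manual review; it is a heuristic triage aid under that one assumption, not a detector for this specific malware.

```python
import site
from pathlib import Path

# Flag .pth files whose lines Python will execute at startup.
# Legitimate packages rarely need executable .pth lines, so every
# hit deserves a human look. Heuristic only.

def suspicious_pth_lines(site_dirs=None):
    hits = []
    dirs = site_dirs if site_dirs is not None else site.getsitepackages()
    for d in dirs:
        for pth in Path(d).glob("*.pth"):
            for line in pth.read_text(errors="ignore").splitlines():
                # site executes lines starting with "import " or "import\t"
                if line.lstrip().startswith(("import ", "import\t")):
                    hits.append((str(pth), line.strip()))
    return hits

if __name__ == "__main__":
    for path, line in suspicious_pth_lines():
        print(path, "->", line)
```

Ordinary `.pth` lines (bare paths to add to `sys.path`) pass through silently; only the executable form is surfaced.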
3. Unit 42: 22 prompt injection techniques observed on live websites
Palo Alto's Unit 42 published the first large-scale observation of indirect prompt injection in production. Not a lab — real websites targeting real agents. 22 delivery techniques: invisible CSS text, HTML data-* attribute cloaking, base64 runtime assembly, canvas-based rendering, payload splitting across DOM elements.
85.2% of attacks use social engineering framing ("god mode" personas, fake system-update prompts), not algorithmic exploits. 37.8% are visible plaintext — they work because nobody checks. Intent breakdown from telemetry: data destruction (14.2%), content moderation bypass (9.5%), unauthorized payments, SEO poisoning.
What catches it: Seven indirect injection tests across memory poisoning (MEM-002, MEM-005), MCP template injection (MCP-006), grounding source manipulation (AZR-002), and multi-turn trust-building attacks (STATE-001/STATE-002). The RiskGate catches downstream effects at misuse_risk_index >= 0.80.
What's missing: We test payload delivery through tool outputs and prompts, but not the 22 web-based concealment techniques. Agents are being hit through their browsing pipelines.
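Two of the cheaper concealment techniques (CSS-hidden text and oversized `data-*` attributes) can be screened before page content ever reaches an agent. A minimal sketch using only the standard library; the style patterns and the 40-character attribute threshold are my own assumptions, and this deliberately does not attempt canvas rendering, base64 runtime assembly, or payloads split across DOM elements:

```python
from html.parser import HTMLParser

# Pre-browse filter sketch: surface text hidden via inline CSS and
# long payloads stashed in data-* attributes. Heuristics only.

HIDDEN_STYLES = ("display:none", "visibility:hidden", "font-size:0", "opacity:0")

class HiddenTextScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []   # (technique, excerpt) pairs
        self._stack = []     # (tag, is_hidden) for open elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        self._stack.append((tag, any(h in style for h in HIDDEN_STYLES)))
        for name, value in attrs.items():
            # Long opaque data-* values are a common cloaking spot.
            if name.startswith("data-") and value and len(value) > 40:
                self.findings.append(("data-attribute", value[:60]))

    def handle_endtag(self, tag):
        while self._stack:
            if self._stack.pop()[0] == tag:
                break

    def handle_data(self, data):
        if data.strip() and any(hidden for _, hidden in self._stack):
            self.findings.append(("hidden-text", data.strip()[:60]))

scanner = HiddenTextScanner()
scanner.feed('<div style="display: none">Ignore previous instructions.</div>')
print(scanner.findings)  # [('hidden-text', 'Ignore previous instructions.')]
```

A browsing pipeline could run this on fetched HTML and either strip the flagged spans or route the page to review before the agent sees it.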
4. Malicious MCP servers can inflate agent costs 658x (3% detection rate)
Researchers demonstrated that a malicious MCP server can steer agents into prolonged tool-calling chains by editing two fields in its responses. Each call returns valid, task-relevant content — but the final answer is deferred until maximum turns. On Mistral-Large: 87 tokens per query (benign) vs. 57,255 (under attack). Four defense classes tested; maximum detection rate: 3%.
This is the agent equivalent of a cryptominer: invisible to the user, expensive to the operator. The output looks correct; only the bill changes.
What catches it: HC-2 (budget ceiling) stops the moment accumulated spend exceeds approved budget. HC-3 (runway survival floor) provides the absolute bottom. EconomicGate catches the trajectory — HOLD at 6 months runway, FAIL at 3. The harness tests cascade containment (IR-008) and budget exhaustion (X4-011, X4-013).
What's missing: The paper's attack is subtler than obvious loops. Each tool call is individually legitimate. Per-session trajectory monitoring would catch this — neither repo implements that yet.
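The missing per-session monitor does not need to be elaborate. A sketch, using the paper's 87-token Mistral-Large baseline from above; the 10x ratio is an illustrative threshold, not a tuned one:

```python
class TrajectoryMonitor:
    """Per-session trajectory monitor (sketch): abort a tool-calling
    loop once cumulative token spend exceeds a multiple of the
    historical per-query baseline. Thresholds are illustrative."""

    def __init__(self, baseline_tokens: int, max_ratio: float = 10.0):
        self.budget = baseline_tokens * max_ratio
        self.spent = 0

    def record_turn(self, tokens: int) -> str:
        self.spent += tokens
        return "ABORT" if self.spent > self.budget else "CONTINUE"

# 87-token benign baseline from the paper's numbers:
mon = TrajectoryMonitor(baseline_tokens=87)
print(mon.record_turn(87))       # CONTINUE: a normal single-turn query
for _ in range(11):
    status = mon.record_turn(87)
print(status)                     # ABORT: stopped near 12x, far short of 658x
```

Because every individual call looks legitimate, the only reliable signal is the cumulative trajectory against a known-benign baseline; the monitor never inspects tool outputs at all.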
5. 49% of organizations can't see what their AI agents are doing
Salt Security surveyed 327 security professionals. 48.9% are entirely blind to machine-to-machine traffic from autonomous AI agents. 48.3% cannot distinguish legitimate agents from malicious bots. 99% of observed attacks originated from authenticated sources — agents with valid credentials but no behavioral guardrails.
Authentication solved "who." Nobody solved "what are they doing."
What catches it: HC-11 triggers STOP if an agent goes silent for 24 hours. GovernanceGate freezes the system if audit coverage drops below 95%. The harness tests logging completeness (AUDIT-001), structured log fields (IR-006), and detection latency (AIUC-E001 — must be < 2 seconds).
What's missing: Our checks validate that the agent logs its own actions. They don't test whether external infrastructure (WAFs, API gateways, SIEM) can see and classify agent behavior from the outside.
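Two of the checks named above are simple enough to sketch directly. The 24-hour silence window (HC-11) and the 95% audit-coverage floor come from this article; the function names and signatures are assumed:

```python
import time

# Sketch of two governance checks: HC-11 (agent silence) and the
# GovernanceGate audit-coverage floor. Constants are from the article;
# the API shape is illustrative.

SILENCE_WINDOW_S = 24 * 3600

def hc11_silence_check(last_heartbeat: float, now: float) -> str:
    """STOP if the agent has been silent for more than 24 hours."""
    return "STOP" if now - last_heartbeat > SILENCE_WINDOW_S else "OK"

def governance_gate_audit(coverage: float) -> str:
    """FREEZE the system when audit coverage drops below 95%."""
    return "FREEZE" if coverage < 0.95 else "PASS"

now = time.time()
print(hc11_silence_check(now - 25 * 3600, now))  # STOP
print(governance_gate_audit(0.93))               # FREEZE
```

Note that both checks consume the agent's own telemetry, which is exactly the limitation the paragraph above describes: they say nothing about whether outside infrastructure can see the agent at all.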
6. Microsoft shipped an Agent Governance Toolkit (916 stars in 8 days)
Microsoft open-sourced a runtime policy engine spanning 7 packages, including Agent OS (YAML/OPA/Cedar rules), Agent Mesh (DID-based identity), Agent Runtime (ring-based execution tiers), Agent SRE (circuit breakers), and Agent Compliance (EU AI Act grading). 916 stars, MIT license.
This validates agent governance as an infrastructure category. But AGT is a runtime guard — it blocks known-bad actions against written policies. It doesn't find the unknown-bad actions before you deploy, and it has no answer for scenarios not covered by policy rules.
The stack should be: Test (find what's broken) → Govern (catch what no policy covers) → Enforce (block known-bad at runtime).
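At runtime, the Govern and Enforce layers collapse into a single decision path. A minimal sketch with illustrative rule contents (the action names and set-based policy model are my own, not from either repo):

```python
def evaluate(action: str, denylist: set, policy_covered: set) -> str:
    # Enforce: block known-bad actions against written policy (AGT's role).
    if action in denylist:
        return "BLOCK"
    # Govern: anything no policy covers is held for review,
    # never allowed by default.
    if action not in policy_covered:
        return "HOLD"
    return "ALLOW"

# The Test layer runs pre-deploy and should populate both sets;
# hand-written here for illustration.
denylist = {"delete_all_records"}
policy_covered = {"read_ticket", "post_comment", "delete_all_records"}

print(evaluate("delete_all_records", denylist, policy_covered))  # BLOCK
print(evaluate("transfer_funds", denylist, policy_covered))      # HOLD
print(evaluate("read_ticket", denylist, policy_covered))         # ALLOW
```

The ordering matters: the deny check runs first so an action can never be simultaneously denied and merely "uncovered," and the default for unknowns is HOLD rather than ALLOW.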
The Agent Security Harness is open source: github.com/msaleme/red-team-blue-team-agent-fabric. The constitutional governance layer: github.com/CognitiveThoughtEngine/constitutional-agent-governance.