We Tested Agentic AI Against 525 Real Attacks. Here's What We Found.
We ran the numbers. The threat is real.
For the past several months, we've been building and validating Cerberus — an open-source runtime security harness for agentic AI systems. We designed it around a specific threat model we call the Lethal Trifecta: the simultaneous convergence, within a single AI execution turn, of privileged data access, untrusted content injection, and an outbound exfiltration path.
We just finished our first formal validation run. N=525 attack trials across three major AI providers. Here is what the data shows.
Attack Success Rates (full injection compliance — agent fully redirected to attacker's address):
• GPT-4o-mini: 90.3% [95% CI: 84.8%–93.9%] — Causation Score: 0.811
• Gemini 2.5 Flash: 82.4% [95% CI: 75.9%–87.5%] — Causation Score: 0.702
• Claude Sonnet: 6.7% [95% CI: 3.8%–11.5%] — Causation Score: 0.207
Control group: 0/30 exfiltrations across all providers (clean baseline). Fisher's exact test: OpenAI p<0.0001, Google p<0.0001 — both statistically significant.
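Under an assumed even split of the 525 trials (175 per provider), the reported rates can be cross-checked against the 0/30 controls with SciPy. The counts below are reconstructions from the published percentages, not the raw data:

```python
# Sketch: re-deriving the Fisher's exact tests from the reported rates.
# Counts are ASSUMPTIONS: 525 trials split evenly across three providers
# (175 each), each compared against its 30-run clean control.
from scipy.stats import fisher_exact

providers = {
    "OpenAI (GPT-4o-mini)": 158,      # ~90.3% of 175 attack trials
    "Google (Gemini 2.5 Flash)": 144, # ~82.4% of 175 attack trials
}

for name, successes in providers.items():
    # 2x2 table: [successes, failures] for attack trials vs. control runs
    table = [[successes, 175 - successes], [0, 30]]
    _, p = fisher_exact(table)
    print(f"{name}: p = {p:.2e}")
```

With a 0/30 control row against success rates above 80%, the p-values fall far below 0.0001, matching the reported significance.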
"This is not a theoretical vulnerability. At a 90% success rate, the Lethal Trifecta is a reliable attack primitive against current production AI systems."
What is the Lethal Trifecta — and why does it matter in supply chain and finance?
The attack chain requires three conditions to align within a single execution turn:
• Privileged data access — the agent can see sensitive operational or financial data
• Untrusted content injection — the agent is processing external input: a vendor document, an invoice, a client email, a compliance filing
• An outbound exfiltration path — the agent has the authority to take downstream action
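As an illustration, the turn-level check reduces to a three-way conjunction. The names below are hypothetical and are not the actual Cerberus API:

```python
# Minimal sketch of the trifecta check (illustrative names, not the real
# Cerberus interface): flag a turn only when all three conditions align.
from dataclasses import dataclass

@dataclass
class TurnContext:
    touched_privileged_data: bool     # saw sensitive operational/financial data
    ingested_untrusted_content: bool  # processed a vendor doc, invoice, email, filing
    has_outbound_path: bool           # can take downstream action

def trifecta_active(turn: TurnContext) -> bool:
    """All three conditions in the same execution turn => Lethal Trifecta."""
    return (turn.touched_privileged_data
            and turn.ingested_untrusted_content
            and turn.has_outbound_path)

# A procurement turn that reads inventory, parses a vendor PDF, and can
# issue a purchase order trips the check:
print(trifecta_active(TurnContext(True, True, True)))  # True
```

Removing any one leg (for example, stripping the outbound path) breaks the chain, which is why the check is per-turn rather than per-deployment.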
In supply chain environments, AI agents orchestrating procurement workflows read vendor submissions, cross-reference inventory systems, and trigger purchase orders. In financial services, agents summarize client portfolios, process compliance documents, and triage transactions. Both architectures deliver all three trifecta conditions by design.
This is not a niche edge case. It is the default configuration of agentic AI in both industries — and our data shows that two of the three major model providers are vulnerable to it at rates above 80%.
What Cerberus detected — and where the gaps are
Cerberus operates in three detection layers. The validation ran in observe-only mode — agents behaved exactly as they normally would, and Cerberus recorded what it saw without intervening.
Detection Layer Results (N=525; bracketed ranges throughout are 95% CIs):
• L1 — Data Source Monitoring: 100% across all providers [97.9%, 100%]
• L2 — Token Provenance Tracking: 100% across all providers [97.9%, 100%]
• L3 — Outbound Intent Detection: 13.7% (OpenAI) / 1.1% (Anthropic) / 65.7% (Google)
Overall detection rate: 28.5% [24.7%, 32.6%]. False positive rate: 0.0% [0.0%, 11.4%] — zero false triggers across 30 clean control runs.
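A minimal sketch of that observe-only, three-layer flow, with placeholder detectors standing in for the real ones (the layer names match the article; everything else is an assumption):

```python
# Hypothetical observe-only pipeline: every layer records a verdict,
# nothing intervenes. The detector bodies are placeholder assumptions.
from typing import Callable

def l1_data_source(event: dict) -> bool:
    # L1: did this turn touch a monitored sensitive data source?
    return event.get("data_source") in {"inventory_db", "client_portfolio"}

def l2_token_provenance(event: dict) -> bool:
    # L2: do any tokens in the context trace back to untrusted input?
    return event.get("untrusted_tokens", 0) > 0

def l3_outbound_intent(event: dict) -> bool:
    # L3: does the planned action route data to an unexpected destination?
    return event.get("destination") not in event.get("allowed_destinations", [])

LAYERS: list[tuple[str, Callable[[dict], bool]]] = [
    ("L1", l1_data_source),
    ("L2", l2_token_provenance),
    ("L3", l3_outbound_intent),
]

def observe(event: dict) -> dict[str, bool]:
    # Observe-only mode: log all verdicts, never block the agent.
    return {name: check(event) for name, check in LAYERS}

verdicts = observe({
    "data_source": "inventory_db",
    "untrusted_tokens": 42,
    "destination": "attacker.example",
    "allowed_destinations": ["erp.internal"],
})
print(verdicts)  # {'L1': True, 'L2': True, 'L3': True}
```

The structure also makes the reported gap legible: L1 and L2 are deterministic lookups over what the agent touched, while L3 has to infer intent from the planned action, which is where detection rates drop.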
Per-Category Detection (all providers combined):
• Direct Injection: 37.8% [28.5%, 48.1%]
• Encoded/Obfuscated: 37.5% [27.2%, 49.0%]
• Multi-turn: 33.3% [22.9%, 45.6%]
• Multilingual: 33.3% [22.9%, 45.6%]
• Advanced Technique: 20.0% [14.1%, 27.5%]
• Social Engineering: 15.3% [8.8%, 25.3%]
The L3 detection gap is a known limitation and the active development focus. L1 and L2 coverage is production-ready. L3 is where the adversarial arms race is happening.
Near-zero performance overhead
• p50: 52μs per session
• p99: 0.23ms per session
• Overhead: ~0.01% of typical LLM latency (~2s)
Even at p99, 0.23ms is noise against a ~2-second LLM response. There is no meaningful performance argument against deploying it.
What this means if you're running AI in supply chain or financial services
If your agentic AI deployment uses GPT-4o-mini or Gemini and processes external documents — vendor submissions, invoices, client communications, compliance filings — the Lethal Trifecta succeeds against it at a rate above 80%.
The question is not whether this attack is theoretically possible. The question is whether you have a runtime layer that can detect when all three trifecta conditions are active in a single execution turn. Most deployments today do not.
Cerberus is open source. L1 and L2 detection are production-ready. L3 is under active development with full transparency on where the gaps are. That's the honest state of the tooling — and it's already more runtime visibility than any comparable open-source option provides today.
🔗 github.com/odinforge/cerberus
📦 npm: @cerberus-ai/core (signed provenance)
🧪 demo.cerberus.sixsenseenterprise.com
🌐 sixsenseenterprise.com