I built a live interactive attack demo — watch real prompt injection happen and get blocked in real time

#ai #opensource #security #showdev

If you've been following Cerberus, the open-source agentic AI security layer I've been building, here's something new: a live interactive demo running on a real server with real Grafana metrics.

→ demo.cerberus.sixsenseenterprise.com

What it does

Pick a scenario. Hit Run. Watch step cards populate as the attack executes. Watch the Grafana panel spike. Everything is real — real Cerberus guard() middleware, real OpenTelemetry spans, real Prometheus scraping, real Grafana rendering.

The scenarios

| Scenario | Steps | Expected outcome |
| --- | --- | --- |
| Clean Run (Control) | 2 | Passes (score stays 1) |
| Data Exfiltration | 2 | Logged (score 2) |
| Prompt Injection | 1 | Logged (score 2) |
| Full Lethal Trifecta | 3 | BLOCKED (score 4) |
| Encoded Injection (Base64) | 3 | BLOCKED (score 4) |
| Social Engineering | 3 | BLOCKED (score 4) |
| Enterprise APT Simulation | 19 | BLOCKED at step 19 |
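
To make the outcome column concrete, here is a minimal sketch of how a score might map to an outcome. It is not Cerberus's actual scoring code: `outcomeFor` and the `LOG_FROM` cutoff are hypothetical names, inferred only from the table (score 1 passes, score 2 is logged, score 4 is blocked) and from the `threshold: 3` setting in the demo config.

```typescript
// Hypothetical sketch: maps a risk score to the outcomes listed in the table.
// Not the real Cerberus implementation; the cutoffs are assumptions.
type Outcome = 'pass' | 'logged' | 'blocked';

// Assumed: scores from 2 upward are recorded even when nothing is blocked.
const LOG_FROM = 2;

function outcomeFor(score: number, threshold: number): Outcome {
  if (score >= threshold) return 'blocked'; // at or above threshold: interrupt the call
  if (score >= LOG_FROM) return 'logged';   // suspicious, but below the threshold
  return 'pass';                            // clean run
}

const threshold = 3; // same value as the demo config
for (const score of [1, 2, 4]) {
  console.log(score, outcomeFor(score, threshold));
}
```

Under these assumptions, raising the threshold to 5 would turn the score-4 scenarios into logged events rather than blocks.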

The Enterprise APT scenario is the interesting one

19 steps. Twelve legitimate internal reads (HR, finance, CRM, payroll, contracts, audit logs, secrets vault). One clean external fetch (vendor portal). One injection delivery disguised as a "GDPR regulatory update" from compliance-verify.net. Two authorized sends to acme.com — these pass. One attempted exfiltration to data-audit@compliance-verify.net — blocked.

The authorizedDestinations config is key. Cerberus tracks what's authorized in context. Legitimate sends don't get blocked. Only the attacker's destination does.


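```typescript
const guarded = guard(executors, {
  threshold: 3,            // in the demo, score-2 scenarios are logged and score-4 scenarios are blocked
  alertMode: 'interrupt',
  opentelemetry: true,     // emit the OpenTelemetry spans that feed the Grafana panel
  authorizedDestinations: ['acme.com', 'deloitte.com'], // sends to these domains pass
  // ...
}, outboundTools);
```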
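
To illustrate the idea behind `authorizedDestinations`, here is a minimal sketch of a destination check. It assumes matching is done on the recipient's domain, with subdomains of an authorized domain also allowed. The source does not say how Cerberus matches destinations, so `recipientDomain`, `isAuthorizedDestination`, and the `reports@acme.com` address are hypothetical.

```typescript
// Hypothetical sketch of a destination allow-list check.
// Not the real Cerberus implementation; the matching rules are assumptions.

// Extract the domain from an email address, or treat the input as a bare host.
function recipientDomain(recipient: string): string {
  const at = recipient.lastIndexOf('@');
  const host = at >= 0 ? recipient.slice(at + 1) : recipient;
  return host.trim().toLowerCase();
}

// A destination is authorized if its domain equals an allowed domain
// or is a subdomain of one (e.g. mail.acme.com under acme.com).
function isAuthorizedDestination(
  recipient: string,
  authorized: readonly string[],
): boolean {
  const domain = recipientDomain(recipient);
  return authorized.some((allowed) => {
    const a = allowed.toLowerCase();
    return domain === a || domain.endsWith('.' + a);
  });
}

const authorizedDestinations = ['acme.com', 'deloitte.com'];

console.log(isAuthorizedDestination('reports@acme.com', authorizedDestinations)); // true
console.log(
  isAuthorizedDestination('data-audit@compliance-verify.net', authorizedDestinations),
); // false
```

The suffix check includes the leading dot, so a lookalike host such as `acme.com.evil.net` does not match `acme.com`.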