Fenix

Posted on Jun 13

Why the Pentagon blocks Fable 5, and how I built a <1ms guard for local agents

#ai #security #opensource #agents

Why the Pentagon blocks Fable 5, and how I built a <1ms guard for local agents

The Pentagon just told Anthropic: "You're not releasing Fable 5 to the world."

Why? Because it has autonomous penetration capabilities — it can hack systems by itself, without a human pressing buttons. Governments are terrified. Big Tech is scrambling. Papers are being written this week about "Sovereign Assurance Boundaries" and "certificate-bound admission layers."

Meanwhile, the rest of us already have everything we need.

The uncomfortable truth

You don't need a trillion-parameter closed model to break 90% of web infrastructure. The fragility is already there — unpatched systems, misconfigured APIs, classic SQL injection, weak auth. The exploits aren't new. What's new is automation at superhuman speed.

Run Hermes or a quantized Gemma/Mistral model locally via Ollama. Give it access to tools. Let it chain exploits autonomously. You'll compromise more systems in an afternoon than a team of pentesters in a month.

The threat was never the model size. It's the unmonitored tool access.

The academic answer: too heavy

This week's research papers (He & Yu, Zhou et al.) propose elaborate solutions. Airlock-broker architectures. Certificate-bound execution contracts. PKI infrastructure for AI agents.

It's secure. It's also slow, rigid, and bureaucratic. By the time you deploy it, the agents are already running in production.

My answer: Agent Fixer Stage

I built something different. While the papers debate theory, I wrote code.

Agent Fixer Stage is a lightweight, plug-and-play output guard for multi-agent workflows. ~850 lines of Python. Zero heavy dependencies. Sub-millisecond overhead.

from agent_fixer import AgentFixer

fixer = AgentFixer(scope="Deploy the microservice", action="clean")
result = fixer.check(agent_output)

if result.status == "rejected":
    alert_security_team(result.reason, result.score)

How it works: 3 cortocalable layers

Input → [Normalize] → [Pattern Score] → [Embeddings] → Output
         (5ms)         (20ms)            (5ms)

Happy path (clean output): Only layers 0+1 run. 0.04ms.

Suspicious output: Layer 2 kicks in. Semantic similarity check against known attack patterns.

Confirmed malicious: Rejected. Score, matched pattern, and reasoning logged.

What it catches

Attack type	Detection
Direct injection (curl, wget, os.system)	~95%
Leetspeak / homoglyph obfuscation	~90%
Cross-line fragmentation	~85%
Semantic exfiltration	~75%
Global	~85-90%

42 tests passing. Benchmarks verified. No hype, just code.

Anti-evasion included

Unicode NFKC + zero-width char stripping
Cyrillic homoglyph → ASCII mapping
Leetspeak normalization (1gn0r3 → ignore)
Cross-line fragmentation detection
TF-IDF embeddings for semantic variants

What it doesn't catch

100% detection is impossible. Sophisticated APTs, zero-day prompt injection, and novel obfuscation techniques will slip through. This is one layer in a defense strategy, not a silver bullet.

The pair: MCP Core Defense + Agent Fixer Stage

MCP Core Defense → Audits TOOLS before registration (static)
Agent Fixer Stage → Audits OUTPUTS during execution (runtime)

Together they cover the full lifecycle: what the agent can do, and what it actually did.

No PKI infrastructure. No bureaucratic airlock-brokers. Just Python that runs in <1ms and catches 9 out of 10 attacks.

Top comments (1)

Mehmet Can Farsak • Jun 13

Interesting angle on lightweight guards vs heavy PKI architectures. You're tackling the output side of agent safety — I've been looking at the input/pre-action side. Built Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) which acts as a PreToolUse guardrail that prevents agents from executing tools when they should be brainstorming. Three modes (divergent, actionable, academic) enforce the right "thinking vs doing" state. Complements output guards nicely — catch drift before tools even get called.

DEV Community

Why the Pentagon blocks Fable 5, and how I built a <1ms guard for local agents

Why the Pentagon blocks Fable 5, and how I built a <1ms guard for local agents

The uncomfortable truth

The academic answer: too heavy

My answer: Agent Fixer Stage

How it works: 3 cortocalable layers

What it catches

Anti-evasion included

What it doesn't catch

The pair: MCP Core Defense + Agent Fixer Stage

Links

Top comments (1)