Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts.
TL;DR
AgentArmor is a new open-source security framework that wraps any AI agent with eight independent enforcement layers — ingestion, storage, context, planning, execution, output, inter-agent, and identity. It's built specifically against the OWASP Top 10 for Agentic Applications (2026) and ships as a Python library, a FastAPI proxy, and a native MCP server you can plug into Claude Code or OpenClaw with five lines of JSON.
After three weeks of agent-security tooling launches — most of them point solutions (a prompt-injection scanner here, a PII redactor there) — AgentArmor is the first I've seen that takes the boring-but-correct approach: every data flow point in an agent's lifecycle is a separate enforcement layer with its own threat model. Highlights from the v0.5.0 release:
- 8 independent layers mapped 1-to-1 to the OWASP ASI Top 10 risk catalog
- 127+ adversarial test cases validating the four hardened layers (L3–L6) end-to-end
- AES-256-GCM at rest for stored memory, HMAC-SHA256 mutual auth for inter-agent traffic
- Native MCP server (agentarmor-mcp) — six tools any MCP-compatible agent can call
- Apache 2.0, pip install agentarmor-core, integrations for LangChain, OpenAI Agents SDK, and MCP servers
- Show HN traction in early May with hands-on demos blocking real attacks against a local Ollama agent
This review walks through what AgentArmor actually does at each layer, the code you'd write to use it, what the 127 adversarial tests cover, and where the framework still has hard edges.
What is AgentArmor?
AgentArmor (GitHub: Agastya910/agentarmor) is a defense-in-depth security framework that sits around your agent runtime, not inside it. You don't rewrite your LangChain or OpenAI Agents code; you wrap tool calls, LLM responses, and memory writes in armor.intercept(...) and let eight layers each do their job.
The framing the author articulated on Show HN: most "agent security" tooling today is a point solution. You bolt on a prompt-injection scanner. Then a PII redactor. Then a permissions wrapper. Each works in isolation. An attacker who slips past the first scanner has a clean shot at the tool runtime, because nothing downstream is looking for the second-stage chain.
AgentArmor's pitch is that an agent has eight distinct data-flow surfaces — ingestion, storage, context, planning, execution, output, inter-agent, identity — and each needs its own enforcement engine. The README's ASCII diagram makes it concrete:
MCP Agents (Claude Code, OpenClaw, Cursor, etc.)
│ stdio (agentarmor-mcp)
▼
┌─────────────────────────────┐
│ AgentArmor Pipeline │
│ L8: Identity & IAM │
│ L1: Data Ingestion │
│ L2: Memory/Storage │
│ L3: Context Assembly │
│ L4: Plan Validation │
│ L5: Action Execution │
│ L7: Inter-Agent Security │
│ ────────────────────────── │
│ L6: Output Filter (post) │
│ Audit Logger (cross-cut) │
│ Policy Engine (cross-cut) │
└─────────────────────────────┘
│
▼
External Tools / APIs / LLMs
It's Apache 2.0, pure Python, lives at agentarmor-core on PyPI, and the v0.5.0 release explicitly upgrades four of the eight layers from "basic" to "production-grade, adversarially-tested" — unusually honest framing for a v0.x project.
Why It's Trending NOW
The Show HN went up in early May 2026 with a hands-on demo: a local Ollama agent (qwen2:7b) running tool calls, and AgentArmor blocking a database.delete at L8 (permission check), redacting PII from file content at L6, and killing a prompt injection at L1 before it reached the model.
Three structural reasons it's getting attention right now:
- OWASP ASI Top 10 just stabilized. The Agentic Security & Integrity Top 10 left draft status in December 2025 and is now the de-facto checklist enterprise security teams point at. AgentArmor is the first open-source framework that maps cleanly to all ten risks.
- MCP server sprawl is creating real incidents. Teams adding three or four MCP servers to a coding agent have effectively granted that agent network, filesystem, and database access with no boundary between them. AgentArmor's armor_scan_mcp_server is one of the few utilities that audits MCP servers for rug-pull risk and missing TLS/OAuth.
- The "agent ran amok" stories landed. From the Slack indirect-prompt-injection PromptArmor disclosure to production incidents where coding agents wiped repos or leaked credentials, founders aren't arguing about whether agent security matters anymore — they're shopping for tooling.
The Eight Layers (and what each one actually does)
The whole framework is organized around this table from the README:
| Layer | Name | What It Protects |
|---|---|---|
| L1 | Ingestion | Input scanning, prompt-injection detection, source verification |
| L2 | Storage | AES-256-GCM encryption at rest, HMAC integrity, tamper detection |
| L3 | Context | GoalLock anchoring, multi-canary injection, template injection stripping |
| L4 | Planning | Action chain tracking, semantic risk scoring, multi-step attack detection |
| L5 | Execution | DNS rebinding protection, rate limiting, circuit breakers, resource budgets |
| L6 | Output | Credential redaction, PII scanning, harmful content blocking, exfiltration detection |
| L7 | Inter-Agent | Mutual auth (HMAC), trust scoring with time decay, delegation depth control |
| L8 | Identity | Agent identity, JIT permissions, credential rotation |
A few need more than a one-liner.
L3 (Context) is the most interesting one. It introduces GoalLock — an anchor block placed at the start of every conversation that the model is contractually told to honor. Combined with CanaryVault (multiple unique canary tokens per session), L3 doesn't just detect goal hijacking; it makes hijacks physically visible by checking whether canaries leak in output. Validated against 48 adversarial test cases.
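The README excerpt doesn't show CanaryVault's actual API, but the mechanism is easy to sketch: mint an unguessable token per session, plant it in the hidden context, and flag any output that echoes it back. A toy version (hypothetical names, not AgentArmor's code):

```python
import secrets


class CanaryVault:
    """Toy sketch of canary-token leak detection; illustrative only."""

    def __init__(self) -> None:
        self._canaries: set[str] = set()

    def mint(self) -> str:
        # Unique, unguessable token embedded in the hidden context block.
        token = f"canary-{secrets.token_hex(8)}"
        self._canaries.add(token)
        return token

    def leaked(self, output: str) -> bool:
        # If any minted canary appears in model output, the hidden
        # context was exfiltrated — a strong hijack/leak signal.
        return any(token in output for token in self._canaries)


vault = CanaryVault()
token = vault.mint()
hidden_context = f"[SYSTEM ANCHOR {token}] Stay on the user's original goal."
```

The point of multiple canaries per session is that an attacker can't learn and strip one token and be done; any surviving canary in the output still trips the check.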
L4 (Planning) goes beyond what most "guardrail" libraries attempt. The ActionChainTracker watches the sequence of actions an agent proposes and scores them as a chain, not in isolation. Reading a config file is fine. Reading a config file, then making an outbound HTTP call to a brand-new domain, then writing to /etc — that's a recon → escalation → exfiltration chain, and L4 catches the pattern.
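A minimal illustration of chain-level scoring, using a toy categorizer rather than AgentArmor's semantic risk model: each action is benign on its own, but the recon, then egress, then mutate subsequence trips the detector.

```python
def categorize(action: str) -> str:
    # Toy categorizer for the sketch; AgentArmor scores semantically.
    prefix = action.split(".", 1)[0]
    return {"read": "recon", "http": "egress", "write": "mutate"}.get(prefix, "other")


def is_escalation_chain(actions: list[str]) -> bool:
    """True when recon -> egress -> mutate appears as a subsequence
    of the proposed action chain, in that order."""
    pattern = iter(["recon", "egress", "mutate"])
    needle = next(pattern)
    for category in map(categorize, actions):
        if category == needle:
            needle = next(pattern, None)
            if needle is None:
                return True
    return False
```

Per-action allowlists would pass every step of `["read.config", "http.post", "write.system_file"]`; only looking at the chain catches the pattern.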
L5 (Execution) is five sub-domains: Network Policy (DNS rebinding + SSRF protection), Rate Limiting (token bucket + circuit breaker), Resource Budget (timeout + size limits), Output Sanitizer (UTF-8 + binary strip), and Side-Effect Auditor (immutable execution records). DNS rebinding protection is rare in agent stacks — that's the attack where an allowlisted domain resolves to your cloud metadata IP after the first lookup.
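For context on what a DNS-rebinding defense looks like, here's a bare-bones resolve-once-and-pin sketch with private-range rejection (illustrative, not AgentArmor's L5 code):

```python
import ipaddress
import socket


def resolve_and_pin(hostname: str) -> str:
    """Resolve a hostname ONCE and reject private/loopback/link-local ranges.

    Illustrative sketch only. Callers should then connect to the returned
    IP directly (with Host/SNI set to the hostname) so a second, malicious
    DNS answer can't redirect the request mid-session.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addr = ipaddress.ip_address(infos[0][4][0])
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        raise PermissionError(f"{hostname} resolves to blocked address {addr}")
    return str(addr)
```

The pinning half is what most stacks miss: checking the IP at allowlist time but re-resolving at connect time is exactly the window a rebinding attack needs to land on 169.254.169.254.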
L7 (Inter-Agent) is for multi-agent systems: HMAC-SHA256 mutual auth, trust scoring that decays over time, delegation-depth limits, and timestamp-bound replay prevention. If you're running CrewAI or AutoGen in production, L7 alone may justify the dependency.
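The HMAC-plus-timestamp pattern L7 describes is standard crypto plumbing; a self-contained sketch (the envelope format here is my assumption, not AgentArmor's wire format):

```python
import hashlib
import hmac
import json
import time


def sign_message(payload: dict, shared_key: bytes) -> dict:
    """Attach a timestamp-bound HMAC-SHA256 tag to an inter-agent message."""
    envelope = {"payload": payload, "ts": int(time.time())}
    body = json.dumps(envelope, sort_keys=True).encode()
    envelope["tag"] = hmac.new(shared_key, body, hashlib.sha256).hexdigest()
    return envelope


def verify_message(envelope: dict, shared_key: bytes, max_age_s: int = 30) -> bool:
    """Reject tampered payloads (bad tag) and stale replays (old timestamp)."""
    body = json.dumps(
        {"payload": envelope["payload"], "ts": envelope["ts"]}, sort_keys=True
    ).encode()
    expected = hmac.new(shared_key, body, hashlib.sha256).hexdigest()
    fresh = abs(time.time() - envelope["ts"]) <= max_age_s
    return hmac.compare_digest(envelope.get("tag", ""), expected) and fresh
```

Note the constant-time `hmac.compare_digest` and the freshness window: the tag binds the timestamp into the signed body, so replaying a captured message outside the window fails even with a valid tag.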
L8 (Identity) gives every agent a native identity with JIT permissions and short-lived credentials — the same pattern modern human IAM uses.
Getting Started (the actual code)
Install with uv (recommended) or pip:
uv add agentarmor-core # core
uv add "agentarmor-core[mcp]" # + MCP server (Claude Code, OpenClaw)
uv add "agentarmor-core[pii]" # + Presidio PII detection
uv add "agentarmor-core[all]" # everything
Minimum-viable usage:
import asyncio

from agentarmor import AgentArmor

async def main():
    armor = AgentArmor()

    # Register your agent with an explicit permission set
    identity, token = armor.l8_identity.register_agent(
        agent_id="my-agent",
        permissions={"read.*", "search.*"},
    )

    # Intercept a tool call through all 8 layers
    result = await armor.intercept(
        action="read.file",
        params={"path": "/data/notes.txt"},
        agent_id="my-agent",
        input_data="Read the file please",
    )

    print(f"Safe: {result.is_safe}")
    print(f"Verdict: {result.final_verdict.value}")

asyncio.run(main())
A more realistic pattern wraps tool functions with the @armor.shield decorator:
@armor.shield(action="database.query")
async def query_database(sql: str) -> dict:
    return db.execute(sql)
Now every call to query_database flows through L1 → L4 → L5 → L8, with the action name pre-bound for risk scoring.
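If you're curious what a shield-style decorator does under the hood, here's a stripped-down sketch (hypothetical, not the library's implementation): pre-screen every call through an interceptor bound to the action name, and raise instead of executing when the verdict is unsafe.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


def shield(action: str, intercept: Callable[..., Awaitable[dict]]):
    """Hypothetical shield-style decorator: run the interceptor first,
    then the tool, so every call is screened under the bound action name."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any):
            verdict = await intercept(action=action, params=kwargs)
            if not verdict.get("is_safe", False):
                raise PermissionError(f"blocked by policy: {action}")
            return await fn(*args, **kwargs)
        return wrapper
    return decorator


# Stub interceptor for the demo: deny anything under "database.*"
async def stub_intercept(action: str, params: dict) -> dict:
    return {"is_safe": not action.startswith("database.")}


@shield("read.file", stub_intercept)
async def read_file(path: str) -> str:
    return f"contents of {path}"
```

Binding the action name at decoration time is the important design choice: the risk scorer always knows *what kind* of operation it's judging, even when the tool function itself is generic.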
For framework-agnostic deployment, AgentArmor also runs as a FastAPI proxy (agentarmor serve --config agentarmor.yaml --port 8400) and as a native MCP server you can plug into Claude Code or OpenClaw via ~/.claude/claude_desktop_config.json:
{
  "mcpServers": {
    "agentarmor": {
      "command": "uv",
      "args": ["run", "agentarmor-mcp"],
      "cwd": "/path/to/your/project"
    }
  }
}
The MCP server exposes six tools — armor_register_agent, armor_scan_input, armor_intercept, armor_scan_output, armor_scan_mcp_server, and armor_get_status. The MCP scanner is the one to bookmark first: full TLS + OAuth 2.1 + rug-pull check on any MCP server before your coding agent connects to it.
The Policy Engine
Layers do default-safe enforcement, but every team has its own redlines. AgentArmor's policy engine is YAML-based:
# policies/my_agent.yaml
version: "1.0"
name: "database_agent"
agent_type: "database"
risk_level: "high"

global_denied_actions:
  - "database.drop"
  - "database.truncate"

require_human_approval_for:
  - "database.delete"

rules:
  - name: "limit_transfer_amount"
    action_pattern: "transfer.*"
    conditions:
      - field: "params.amount"
        operator: ">"
        value: "1000"
    verdict: "escalate"
    priority: 100
This is the kind of policy you can actually hand to a security team. It reads like an IAM policy, supports priority-based rule resolution, and the verdict vocabulary (allow / deny / escalate) maps to real workflows — including human-in-the-loop approval gates.
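If you're wondering how priority-based resolution works mechanically, here's a toy evaluator (illustrative only; the rule fields are simplified from the YAML above, and AgentArmor's engine supports more operators and condition types):

```python
import fnmatch


def evaluate(action: str, params: dict, rules: list[dict]) -> str:
    """Return the verdict of the highest-priority matching rule.

    Toy sketch: only the ">" operator, default verdict "allow".
    """
    for rule in sorted(rules, key=lambda r: -r["priority"]):
        if not fnmatch.fnmatch(action, rule["action_pattern"]):
            continue
        value = params.get(rule["field"])
        if rule["operator"] == ">" and value is not None and value > rule["value"]:
            return rule["verdict"]
    return "allow"


rules = [{
    "name": "limit_transfer_amount",
    "action_pattern": "transfer.*",
    "field": "amount",
    "operator": ">",
    "value": 1000,
    "priority": 100,
    "verdict": "escalate",
}]
```

Sorting by descending priority before matching is what makes rule conflicts deterministic: a priority-200 deny always beats a priority-100 escalate for the same pattern.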
OWASP ASI Top 10 Coverage
The README ships a mapping table that's worth quoting because it shows the framework actually has a threat model, not just features:
| OWASP ASI Risk | AgentArmor Layer(s) |
|---|---|
| ASI01: Goal Hijacking | L1 (injection), L3 (GoalLock + canary tokens) |
| ASI02: Tool Misuse | L4 (chain tracking), L5 (execution gates), Policy Engine |
| ASI03: Identity Abuse | L8 (identity), L5 (JIT perms) |
| ASI04: Supply Chain | L1 (source verify), MCP Scanner |
| ASI05: Code Execution | L5 (5-domain enforcement), L4 (risk scoring) |
| ASI06: Memory Poisoning | L2 (AES-256-GCM + MAC integrity), L3 (canary tokens) |
| ASI07: Inter-Agent | L7 (mutual auth, trust scoring with decay) |
| ASI08: Cascading Failures | L4 (chain depth + circuit breaker), L5 (rate limits) |
| ASI09: Human Trust | L6 (5-scanner pipeline), Audit Logger |
| ASI10: Rogue Agents | L8 (credential rotation), L7 (trust decay) |
Every cell is a concrete code path you can read. That's rare in this category — most "compliance-aware" projects ship a mapping table that turns out to be marketing.
Community Reactions
The Show HN thread leaned constructive: practitioners flagged edge cases (PII regex false positives on S3 ARNs, JIT permissions that are hard to scope without breaking tool calls), and the author engaged seriously with each. Reddit cybersecurity threads (r/cybersecurity, r/AskNetsec) reflect the broader consensus AgentArmor is built on: prompt injection is the top OWASP risk, point solutions don't work, defense-in-depth is the answer.
Worth flagging: there's a separate academic project also called "AgentArmor" on arXiv from September 2025 (program analysis on runtime traces, 3% ASR on AgentDojo). Different project. This review covers Agastya910/agentarmor — the open-source production framework on GitHub. Naming collision is becoming a real problem in this category.
Honest Limitations
This is a v0.5.0 framework. Even with 127+ adversarial tests, there are real edges:
- PII detection's recall is bounded by Microsoft Presidio. Good but not perfect, especially for non-English content and bespoke identifiers. Confidence gating helps; custom recognizers are often needed.
- L4 chain tracking needs tuning per agent. A benign workflow that reads → deletes → writes (e.g., log-rotation) will trip the multi-step heuristic without policy tweaks.
- Python-only in-process. Go or TypeScript runtimes need the FastAPI proxy form.
- Performance overhead is non-trivial. Tens of milliseconds per intercept, dominated by Presidio. For most agents fine; for high-throughput RAG loops, bypass L6 on internal flows.
- MCP scanner can't catch every rug-pull. It checks TLS, OAuth, and known patterns — but a motivated upstream can still ship a malicious update.
Who Should Use AgentArmor
Strong fit:
- Teams running production agents that touch databases, file systems, or outbound APIs
- Anyone using MCP with multiple servers (the armor_scan_mcp_server tool alone is worth installing)
- Multi-agent systems (CrewAI, AutoGen, custom) — L7 is the cleanest open-source inter-agent auth I've seen
- Anyone with a compliance team that's started asking about OWASP ASI Top 10
Probably overkill (for now):
- Single-user, single-machine agents with no external network access
- Pure RAG-only chat assistants with no tool calls
- Experiments where you'd rather see the agent fail loudly
Comparison with Alternatives
| Tool | Approach | Coverage | License |
|---|---|---|---|
| AgentArmor | 8-layer defense-in-depth, Python lib + proxy + MCP | All 10 OWASP ASI risks | Apache 2.0 |
| PromptArmor | LLM-based prompt-injection detection | Ingestion only | Commercial |
| Llama Guard | Content moderation classifier | Output safety | Custom |
| Rebuff | Multi-stage prompt-injection detection | Ingestion + heuristics | Apache 2.0 |
| Guardrails AI | Output validation framework | Output + schema | Apache 2.0 |
| NeMo Guardrails (NVIDIA) | Programmable guardrails (Colang) | Conversation flow | Apache 2.0 |
Most alternatives are point solutions. AgentArmor's differentiation is breadth — it's the first open-source project that genuinely covers every layer of the agent data flow, not just inputs or outputs.
FAQ
Q: Does AgentArmor work with Claude Code and OpenClaw?
Yes — it ships a native MCP server (agentarmor-mcp) that any MCP-compatible coding agent can call directly. The setup is five lines of JSON in your MCP config. The MCP server exposes six tools including armor_scan_mcp_server, which is one of the few utilities that audits the other MCP servers you've connected for TLS, OAuth, and rug-pull risk.
Q: How does AgentArmor compare to OpenAI's built-in safety features?
OpenAI's safety layers run inside the model and protect against content-policy violations. AgentArmor runs around your agent and protects the agent's data flow — tool calls, memory, identity, inter-agent traffic. Complementary, not competitive.
Q: Can I run only some of the eight layers?
Yes. The ArmorConfig lets you enable or disable individual layers, and each can be instantiated standalone. For incremental adoption, start with L1 + L6 + L8 and add the rest as you mature.
Q: What's the difference between this AgentArmor and the arXiv paper from September 2025?
Different projects sharing a name. The arXiv "AgentArmor" is academic research on runtime-trace program analysis (3% ASR on AgentDojo). This review covers Agastya910/agentarmor on GitHub — an open-source production framework. Verify which one you're installing.
Q: Is it ready for production?
The hardened layers (L3, L4, L5, L6) have 127+ adversarial tests and are explicitly tagged production-grade. L1, L2, L7, L8 work but haven't had the same red-team treatment yet. Reasonable to run in production behind a feature flag with audit logging on, watching v0.6.x for the remaining hardening.
Bottom Line
AgentArmor is the most architecturally honest open-source agent security project I've reviewed this year. It refuses the "one magic regex" framing, names eight distinct enforcement surfaces, maps every one to a public threat model (OWASP ASI Top 10), and ships actual adversarial tests instead of marketing benchmarks. The v0.5.0 hardening release is exactly the kind of work you want to see from a security project — the author found four layers that were too soft, rebuilt them with adversarial validation, and shipped the test suite alongside the code.
If you're running any kind of production AI agent in 2026 — coding agent, RAG with tool calls, multi-agent system — pip install agentarmor-core should be on your evaluation list this week. Even if you don't adopt the full framework, the MCP scanner alone is a free defense against the next rug-pull incident.
GitHub: github.com/Agastya910/agentarmor · PyPI: agentarmor-core · License: Apache 2.0