Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts.
TL;DR
AgentArmor is a new open-source security framework that wraps any AI agent with eight independent enforcement layers — ingestion, storage, context, planning, execution, output, inter-agent, and identity. It's built specifically against the OWASP Top 10 for Agentic Applications (2026) and ships as a Python library, a FastAPI proxy, and a native MCP server you can plug into Claude Code or OpenClaw with five lines of JSON.
After three weeks of agent-security tooling launches — most of them point solutions (a prompt-injection scanner here, a PII redactor there) — AgentArmor is the first I've seen that takes the boring-but-correct approach: every data flow point in an agent's lifecycle is a separate enforcement layer with its own threat model. Highlights from the v0.5.0 release:
- 8 independent layers mapped 1-to-1 to the OWASP ASI Top 10 risk catalog
- 127+ adversarial test cases validating the four hardened layers (L3–L6) end-to-end
- AES-256-GCM at rest for stored memory, HMAC-SHA256 mutual auth for inter-agent traffic
- Native MCP server (agentarmor-mcp) — six tools any MCP-compatible agent can call
- Apache 2.0, pip install agentarmor-core, integrations for LangChain, OpenAI Agents SDK, and MCP servers
- Show HN traction in early May with hands-on demos blocking real attacks against a local Ollama agent
This review walks through what AgentArmor actually does at each layer, the code you'd write to use it, what the 127 adversarial tests cover, and where the framework still has hard edges.
What is AgentArmor?
AgentArmor (GitHub: Agastya910/agentarmor) is a defense-in-depth security framework that sits around your agent runtime, not inside it. You don't rewrite your LangChain or OpenAI Agents code; you wrap tool calls, LLM responses, and memory writes in armor.intercept(...) and let eight layers each do their job.
The framing the author articulated on Show HN: most "agent security" tooling today is a point solution. You bolt on a prompt-injection scanner. Then a PII redactor. Then a permissions wrapper. Each works in isolation. An attacker who slips past the first scanner has a clean shot at the tool runtime, because nothing downstream is looking for the second-stage chain.
AgentArmor's pitch is that an agent has eight distinct data-flow surfaces — ingestion, storage, context, planning, execution, output, inter-agent, identity — and each needs its own enforcement engine. The README's ASCII diagram makes it concrete:
MCP Agents (Claude Code, OpenClaw, Cursor, etc.)
│ stdio (agentarmor-mcp)
▼
┌─────────────────────────────┐
│ AgentArmor Pipeline │
│ L8: Identity & IAM │
│ L1: Data Ingestion │
│ L2: Memory/Storage │
│ L3: Context Assembly │
│ L4: Plan Validation │
│ L5: Action Execution │
│ L7: Inter-Agent Security │
│ ────────────────────────── │
│ L6: Output Filter (post) │
│ Audit Logger (cross-cut) │
│ Policy Engine (cross-cut) │
└─────────────────────────────┘
│
▼
External Tools / APIs / LLMs
It's Apache 2.0, pure Python, lives at agentarmor-core on PyPI, and the v0.5.0 release explicitly upgrades four of the eight layers from "basic" to "production-grade, adversarially-tested" — unusually honest framing for a v0.x project.
Why It's Trending NOW
The Show HN went up in early May 2026 with a hands-on demo: a local Ollama agent (qwen2:7b) running tool calls, and AgentArmor blocking a database.delete at L8 (permission check), redacting PII from file content at L6, and killing a prompt injection at L1 before it reached the model.
Three structural reasons it's getting attention right now:
- OWASP ASI Top 10 just stabilized. The Agentic Security & Integrity Top 10 left draft status in December 2025 and is now the de-facto checklist enterprise security teams point at. AgentArmor is the first open-source framework that maps cleanly to all ten risks.
- MCP server sprawl is creating real incidents. Teams adding three or four MCP servers to a coding agent have effectively granted that agent network, filesystem, and database access with no boundary between them. AgentArmor's armor_scan_mcp_server is one of the few utilities that audits MCP servers for rug-pull risk and missing TLS/OAuth.
- The "agent ran amok" stories landed. From the Slack indirect-prompt-injection PromptArmor disclosure to production incidents where coding agents wiped repos or leaked credentials, founders aren't arguing about whether agent security matters anymore — they're shopping for tooling.
The Eight Layers (and what each one actually does)
The whole framework is organized around this table from the README:
| Layer | Name | What It Protects |
|---|---|---|
| L1 | Ingestion | Input scanning, prompt-injection detection, source verification |
| L2 | Storage | AES-256-GCM encryption at rest, HMAC integrity, tamper detection |
| L3 | Context | GoalLock anchoring, multi-canary injection, template injection stripping |
| L4 | Planning | Action chain tracking, semantic risk scoring, multi-step attack detection |
| L5 | Execution | DNS rebinding protection, rate limiting, circuit breakers, resource budgets |
| L6 | Output | Credential redaction, PII scanning, harmful content blocking, exfiltration detection |
| L7 | Inter-Agent | Mutual auth (HMAC), trust scoring with time decay, delegation depth control |
| L8 | Identity | Agent identity, JIT permissions, credential rotation |
A few need more than a one-liner.
L3 (Context) is the most interesting one. It introduces GoalLock — an anchor block placed at the start of every conversation that the model is contractually told to honor. Combined with CanaryVault (multiple unique canary tokens per session), L3 doesn't just detect goal hijacking; it makes hijacks physically visible by checking whether canaries leak in output. Validated against 48 adversarial test cases.
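The README excerpt doesn't show CanaryVault's actual API, but the mechanism is easy to sketch: mint an unguessable token per session, plant it in the hidden context, and flag any output that echoes it back. A toy version (hypothetical names, not AgentArmor's code):

```python
import secrets


class CanaryVault:
    """Toy sketch of canary-token leak detection; illustrative only."""

    def __init__(self) -> None:
        self._canaries: set[str] = set()

    def mint(self) -> str:
        # Unique, unguessable token embedded in the hidden context block.
        token = f"canary-{secrets.token_hex(8)}"
        self._canaries.add(token)
        return token

    def leaked(self, output: str) -> bool:
        # If any minted canary appears in model output, the hidden
        # context was exfiltrated — a strong hijack/leak signal.
        return any(token in output for token in self._canaries)


vault = CanaryVault()
token = vault.mint()
hidden_context = f"[SYSTEM ANCHOR {token}] Stay on the user's original goal."
```

The point of multiple canaries per session is that an attacker can't learn and strip one token and be done; any surviving canary in the output still trips the check.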
L4 (Planning) goes beyond what most "guardrail" libraries attempt. The ActionChainTracker watches the sequence of actions an agent proposes and scores them as a chain, not in isolation. Reading a config file is fine. Reading a config file, then making an outbound HTTP call to a brand-new domain, then writing to /etc — that's a recon → escalation → exfiltration chain, and L4 catches the pattern.
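A minimal illustration of chain-level scoring, using a toy categorizer rather than AgentArmor's semantic risk model: each action is benign on its own, but the recon, then egress, then mutate subsequence trips the detector.

```python
def categorize(action: str) -> str:
    # Toy categorizer for the sketch; AgentArmor scores semantically.
    prefix = action.split(".", 1)[0]
    return {"read": "recon", "http": "egress", "write": "mutate"}.get(prefix, "other")


def is_escalation_chain(actions: list[str]) -> bool:
    """True when recon -> egress -> mutate appears as a subsequence
    of the proposed action chain, in that order."""
    pattern = iter(["recon", "egress", "mutate"])
    needle = next(pattern)
    for category in map(categorize, actions):
        if category == needle:
            needle = next(pattern, None)
            if needle is None:
                return True
    return False
```

Per-action allowlists would pass every step of `["read.config", "http.post", "write.system_file"]`; only looking at the chain catches the pattern.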
L5 (Execution) is five sub-domains: Network Policy (DNS rebinding + SSRF protection), Rate Limiting (token bucket + circuit breaker), Resource Budget (timeout + size limits), Output Sanitizer (UTF-8 + binary strip), and Side-Effect Auditor (immutable execution records). DNS rebinding protection is rare in agent stacks — that's the attack where an allowlisted domain resolves to your cloud metadata IP after the first lookup.
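For context on what a DNS-rebinding defense looks like, here's a bare-bones resolve-once-and-pin sketch with private-range rejection (illustrative, not AgentArmor's L5 code):

```python
import ipaddress
import socket


def resolve_and_pin(hostname: str) -> str:
    """Resolve a hostname ONCE and reject private/loopback/link-local ranges.

    Illustrative sketch only. Callers should then connect to the returned
    IP directly (with Host/SNI set to the hostname) so a second, malicious
    DNS answer can't redirect the request mid-session.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addr = ipaddress.ip_address(infos[0][4][0])
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        raise PermissionError(f"{hostname} resolves to blocked address {addr}")
    return str(addr)
```

The pinning half is what most stacks miss: checking the IP at allowlist time but re-resolving at connect time is exactly the window a rebinding attack needs to land on 169.254.169.254.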
L7 (Inter-Agent) is for multi-agent systems: HMAC-SHA256 mutual auth, trust scoring that decays over time, delegation-depth limits, and timestamp-bound replay prevention. If you're running CrewAI or AutoGen in production, L7 alone may justify the dependency.
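The HMAC-plus-timestamp pattern L7 describes is standard crypto plumbing; a self-contained sketch (the envelope format here is my assumption, not AgentArmor's wire format):

```python
import hashlib
import hmac
import json
import time


def sign_message(payload: dict, shared_key: bytes) -> dict:
    """Attach a timestamp-bound HMAC-SHA256 tag to an inter-agent message."""
    envelope = {"payload": payload, "ts": int(time.time())}
    body = json.dumps(envelope, sort_keys=True).encode()
    envelope["tag"] = hmac.new(shared_key, body, hashlib.sha256).hexdigest()
    return envelope


def verify_message(envelope: dict, shared_key: bytes, max_age_s: int = 30) -> bool:
    """Reject tampered payloads (bad tag) and stale replays (old timestamp)."""
    body = json.dumps(
        {"payload": envelope["payload"], "ts": envelope["ts"]}, sort_keys=True
    ).encode()
    expected = hmac.new(shared_key, body, hashlib.sha256).hexdigest()
    fresh = abs(time.time() - envelope["ts"]) <= max_age_s
    return hmac.compare_digest(envelope.get("tag", ""), expected) and fresh
```

Note the constant-time `hmac.compare_digest` and the freshness window: the tag binds the timestamp into the signed body, so replaying a captured message outside the window fails even with a valid tag.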
L8 (Identity) gives every agent a native identity with JIT permissions and short-lived credentials — the same pattern modern human IAM uses.
Getting Started (the actual code)
Install with uv (recommended) or pip:
uv add agentarmor-core # core
uv add "agentarmor-core[mcp]" # + MCP server (Claude Code, OpenClaw)
uv add "agentarmor-core[pii]" # + Presidio PII detection
uv add "agentarmor-core[all]" # everything
Minimum-viable usage:
import asyncio

from agentarmor import AgentArmor

async def main():
    armor = AgentArmor()

    # Register your agent with an explicit permission set
    identity, token = armor.l8_identity.register_agent(
        agent_id="my-agent",
        permissions={"read.*", "search.*"},
    )

    # Intercept a tool call through all 8 layers
    result = await armor.intercept(
        action="read.file",
        params={"path": "/data/notes.txt"},
        agent_id="my-agent",
        input_data="Read the file please",
    )

    print(f"Safe: {result.is_safe}")
    print(f"Verdict: {result.final_verdict.value}")

asyncio.run(main())
A more realistic pattern wraps tool functions with the @armor.shield decorator:
@armor.shield(action="database.query")
async def query_database(sql: str) -> dict:
    return db.execute(sql)
Now every call to query_database flows through L1 → L4 → L5 → L8, with the action name pre-bound for risk scoring.
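If you're curious what a shield-style decorator does under the hood, here's a stripped-down sketch (hypothetical, not the library's implementation): pre-screen every call through an interceptor bound to the action name, and raise instead of executing when the verdict is unsafe.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


def shield(action: str, intercept: Callable[..., Awaitable[dict]]):
    """Hypothetical shield-style decorator: run the interceptor first,
    then the tool, so every call is screened under the bound action name."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any):
            verdict = await intercept(action=action, params=kwargs)
            if not verdict.get("is_safe", False):
                raise PermissionError(f"blocked by policy: {action}")
            return await fn(*args, **kwargs)
        return wrapper
    return decorator


# Stub interceptor for the demo: deny anything under "database.*"
async def stub_intercept(action: str, params: dict) -> dict:
    return {"is_safe": not action.startswith("database.")}


@shield("read.file", stub_intercept)
async def read_file(path: str) -> str:
    return f"contents of {path}"
```

Binding the action name at decoration time is the important design choice: the risk scorer always knows *what kind* of operation it's judging, even when the tool function itself is generic.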
For framework-agnostic deployment, AgentArmor also runs as a FastAPI proxy (agentarmor serve --config agentarmor.yaml --port 8400) and as a native MCP server you can plug into Claude Code or OpenClaw via ~/.claude/claude_desktop_config.json:
{
  "mcpServers": {
    "agentarmor": {
      "command": "uv",
      "args": ["run", "agentarmor-mcp"],
      "cwd": "/path/to/your/project"
    }
  }
}
The MCP server exposes six tools — armor_register_agent, armor_scan_input, armor_intercept, armor_scan_output, armor_scan_mcp_server, and armor_get_status. The MCP scanner is the one to bookmark first: full TLS + OAuth 2.1 + rug-pull check on any MCP server before your coding agent connects to it.
The Policy Engine
Layers do default-safe enforcement, but every team has its own redlines. AgentArmor's policy engine is YAML-based:
# policies/my_agent.yaml
version: "1.0"
name: "database_agent"
agent_type: "database"
risk_level: "high"

global_denied_actions:
  - "database.drop"
  - "database.truncate"

require_human_approval_for:
  - "database.delete"

rules:
  - name: "limit_transfer_amount"
    action_pattern: "transfer.*"
    conditions:
      - field: "params.amount"
        operator: ">"
        value: "1000"
    verdict: "escalate"
    priority: 100
This is the kind of policy you can actually hand to a security team. It reads like an IAM policy, supports priority-based rule resolution, and the verdict vocabulary (allow / deny / escalate) maps to real workflows — including human-in-the-loop approval gates.
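If you're wondering how priority-based resolution works mechanically, here's a toy evaluator (illustrative only; the rule fields are simplified from the YAML above, and AgentArmor's engine supports more operators and condition types):

```python
import fnmatch


def evaluate(action: str, params: dict, rules: list[dict]) -> str:
    """Return the verdict of the highest-priority matching rule.

    Toy sketch: only the ">" operator, default verdict "allow".
    """
    for rule in sorted(rules, key=lambda r: -r["priority"]):
        if not fnmatch.fnmatch(action, rule["action_pattern"]):
            continue
        value = params.get(rule["field"])
        if rule["operator"] == ">" and value is not None and value > rule["value"]:
            return rule["verdict"]
    return "allow"


rules = [{
    "name": "limit_transfer_amount",
    "action_pattern": "transfer.*",
    "field": "amount",
    "operator": ">",
    "value": 1000,
    "priority": 100,
    "verdict": "escalate",
}]
```

Sorting by descending priority before matching is what makes rule conflicts deterministic: a priority-200 deny always beats a priority-100 escalate for the same pattern.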
OWASP ASI Top 10 Coverage
The README ships a mapping table that's worth quoting because it shows the framework actually has a threat model, not just features:
| OWASP ASI Risk | AgentArmor Layer(s) |
|---|---|
| ASI01: Goal Hijacking | L1 (injection), L3 (GoalLock + canary tokens) |
| ASI02: Tool Misuse | L4 (chain tracking), L5 (execution gates), Policy Engine |
| ASI03: Identity Abuse | L8 (identity), L5 (JIT perms) |
| ASI04: Supply Chain | L1 (source verify), MCP Scanner |
| ASI05: Code Execution | L5 (5-domain enforcement), L4 (risk scoring) |
| ASI06: Memory Poisoning | L2 (AES-256-GCM + MAC integrity), L3 (canary tokens) |
| ASI07: Inter-Agent | L7 (mutual auth, trust scoring with decay) |
| ASI08: Cascading Failures | L4 (chain depth + circuit breaker), L5 (rate limits) |
| ASI09: Human Trust | L6 (5-scanner pipeline), Audit Logger |
| ASI10: Rogue Agents | L8 (credential rotation), L7 (trust decay) |
Every cell is a concrete code path you can read. That's rare in this category — most "compliance-aware" projects ship a mapping table that turns out to be marketing.
Community Reactions
The Show HN thread leaned constructive: practitioners flagged edge cases (PII regex false positives on S3 ARNs, JIT permissions that are hard to scope without breaking tool calls), and the author engaged seriously with each. Reddit cybersecurity threads (r/cybersecurity, r/AskNetsec) reflect the broader consensus AgentArmor is built on: prompt injection is the top OWASP risk, point solutions don't work, defense-in-depth is the answer.
Worth flagging: there's a separate academic project also called "AgentArmor" on arXiv from September 2025 (program analysis on runtime traces, 3% ASR on AgentDojo). Different project. This review covers Agastya910/agentarmor — the open-source production framework on GitHub. Naming collision is becoming a real problem in this category.
Honest Limitations
This is a v0.5.0 framework. Even with 127+ adversarial tests, there are real edges:
- PII detection's recall is bounded by Microsoft Presidio. Good but not perfect, especially for non-English content and bespoke identifiers. Confidence gating helps; custom recognizers are often needed.
- L4 chain tracking needs tuning per agent. A benign workflow that reads → deletes → writes (e.g., log-rotation) will trip the multi-step heuristic without policy tweaks.
- Python-only in-process. Go or TypeScript runtimes need the FastAPI proxy form.
- Performance overhead is non-trivial. Tens of milliseconds per intercept, dominated by Presidio. For most agents fine; for high-throughput RAG loops, bypass L6 on internal flows.
- MCP scanner can't catch every rug-pull. It checks TLS, OAuth, and known patterns — but a motivated upstream can still ship a malicious update.
Who Should Use AgentArmor
Strong fit:
- Teams running production agents that touch databases, file systems, or outbound APIs
- Anyone using MCP with multiple servers (the armor_scan_mcp_server tool alone is worth installing)
- Multi-agent systems (CrewAI, AutoGen, custom) — L7 is the cleanest open-source inter-agent auth I've seen
- Anyone with a compliance team that's started asking about OWASP ASI Top 10
Probably overkill (for now):
- Single-user, single-machine agents with no external network access
- Pure RAG-only chat assistants with no tool calls
- Experiments where you'd rather see the agent fail loudly
Comparison with Alternatives
| Tool | Approach | Coverage | License |
|---|---|---|---|
| AgentArmor | 8-layer defense-in-depth, Python lib + proxy + MCP | All 10 OWASP ASI risks | Apache 2.0 |
| PromptArmor | LLM-based prompt-injection detection | Ingestion only | Commercial |
| Llama Guard | Content moderation classifier | Output safety | Custom |
| Rebuff | Multi-stage prompt-injection detection | Ingestion + heuristics | Apache 2.0 |
| Guardrails AI | Output validation framework | Output + schema | Apache 2.0 |
| NeMo Guardrails (NVIDIA) | Programmable guardrails (Colang) | Conversation flow | Apache 2.0 |
Most alternatives are point solutions. AgentArmor's differentiation is breadth — it's the first open-source project that genuinely covers every layer of the agent data flow, not just inputs or outputs.
FAQ
Q: Does AgentArmor work with Claude Code and OpenClaw?
Yes — it ships a native MCP server (agentarmor-mcp) that any MCP-compatible coding agent can call directly. The setup is five lines of JSON in your MCP config. The MCP server exposes six tools including armor_scan_mcp_server, which is one of the few utilities that audits the other MCP servers you've connected for TLS, OAuth, and rug-pull risk.
Q: How does AgentArmor compare to OpenAI's built-in safety features?
OpenAI's safety layers run inside the model and protect against content-policy violations. AgentArmor runs around your agent and protects the agent's data flow — tool calls, memory, identity, inter-agent traffic. Complementary, not competitive.
Q: Can I run only some of the eight layers?
Yes. The ArmorConfig lets you enable or disable individual layers, and each can be instantiated standalone. For incremental adoption, start with L1 + L6 + L8 and add the rest as you mature.
Q: What's the difference between this AgentArmor and the arXiv paper from September 2025?
Different projects sharing a name. The arXiv "AgentArmor" is academic research on runtime-trace program analysis (3% ASR on AgentDojo). This review covers Agastya910/agentarmor on GitHub — an open-source production framework. Verify which one you're installing.
Q: Is it ready for production?
The hardened layers (L3, L4, L5, L6) have 127+ adversarial tests and are explicitly tagged production-grade. L1, L2, L7, L8 work but haven't had the same red-team treatment yet. Reasonable to run in production behind a feature flag with audit logging on, watching v0.6.x for the remaining hardening.
Bottom Line
AgentArmor is the most architecturally honest open-source agent security project I've reviewed this year. It refuses the "one magic regex" framing, names eight distinct enforcement surfaces, maps every one to a public threat model (OWASP ASI Top 10), and ships actual adversarial tests instead of marketing benchmarks. The v0.5.0 hardening release is exactly the kind of work you want to see from a security project — the author found four layers that were too soft, rebuilt them with adversarial validation, and shipped the test suite alongside the code.
If you're running any kind of production AI agent in 2026 — coding agent, RAG with tool calls, multi-agent system — pip install agentarmor-core should be on your evaluation list this week. Even if you don't adopt the full framework, the MCP scanner alone is a free defense against the next rug-pull incident.
GitHub: github.com/Agastya910/agentarmor · PyPI: agentarmor-core · License: Apache 2.0