Justin Yuan

Posted on • Originally published at github.com
I built an open-source firewall for AI agents — it blocks dangerous tool calls before they execute

The problem nobody talks about

Every AI agent framework and SDK — LangChain, CrewAI, the Anthropic and OpenAI clients — gives the LLM full control over which tools to call and with what arguments.

The model says "run this SQL query: DROP TABLE users" and your code just... executes it. No confirmation. No policy check. No audit trail.

Existing observability tools (LangFuse, Helicone, Arize) log what happened. That's useful for debugging. But the database is already gone.

What I built


AEGIS is an open-source, self-hosted firewall that sits between your AI agent and its tools.

It doesn't just observe — it intercepts and blocks before execution.

How it works

Agent calls a tool → the AEGIS SDK intercepts it → the Gateway classifies the call (SQL? file? shell?) → the policy engine evaluates it (injection? traversal? exfiltration?) → decision: allow / block / pending (a human reviews) → the trace is Ed25519-signed, SHA-256 hash-chained, and stored in the dashboard.
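In code, the classify-then-evaluate step might look roughly like this. This is a minimal hypothetical sketch: `classify`, `evaluate`, and the regex are illustrative names and rules, not the actual AEGIS internals.

```python
import re

# Illustrative policy: flag destructive SQL keywords (sketch only).
DANGEROUS_SQL = re.compile(r"\b(DROP|DELETE|TRUNCATE)\b", re.IGNORECASE)

def classify(tool_name: str) -> str:
    """Bucket a tool call by the kind of side effect it can have."""
    if "sql" in tool_name or "query" in tool_name:
        return "sql"
    if "file" in tool_name or "read" in tool_name:
        return "file"
    return "other"

def evaluate(category: str, args: dict) -> str:
    """Return allow / block / pending for a classified call."""
    if category == "sql" and DANGEROUS_SQL.search(args.get("query", "")):
        return "block"
    if category == "file" and ".." in args.get("path", ""):
        return "pending"  # escalate suspected path traversal to a human
    return "allow"

print(evaluate(classify("query_db"), {"query": "DROP TABLE users"}))   # block
print(evaluate(classify("read_file"), {"path": "../../etc/passwd"}))   # pending
```

The key design point is that the decision happens before the tool function ever runs, which is what separates a firewall from an observability trace.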

One line to integrate

import anthropic
import agentguard

agentguard.auto("http://localhost:8080")

# Your existing agent code — completely unchanged
client = anthropic.Anthropic()
response = client.messages.create(model="claude-sonnet-4-20250514", max_tokens=1024, messages=[...], tools=[...])

What it catches out of the box

  • SQL injection — DROP, DELETE, TRUNCATE in database tools
  • Path traversal — ../../etc/passwd, sensitive directories
  • Command injection — rm -rf, curl | sh, shell metacharacters
  • Prompt injection — "ignore previous instructions" patterns
  • Data exfiltration — large payloads to external endpoints
  • PII leakage — SSN, email, phone, credit card, API keys (auto-redacted)
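As a taste of the last bullet, a regex-based redactor for a couple of PII types can be sketched in a few lines. This is a toy: `PII_PATTERNS` and `redact` are hypothetical names, the patterns are simplistic, and the real detector covers far more formats.

```python
import re

# Toy patterns for two PII types; real detectors use stricter rules.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII match with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789"))
# → Contact [REDACTED:EMAIL], SSN [REDACTED:SSN]
```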

Human-in-the-loop

For high-risk actions, the agent pauses. You open the Compliance Cockpit, see the exact tool name and arguments, and click Allow or Block. The agent resumes in under a second.

agentguard.auto(
    "http://localhost:8080",
    blocking_mode=True,
    human_approval_timeout_s=300,
)

The dashboard

The Compliance Cockpit gives you:

  • Real-time trace stream with risk badges
  • Pending approvals queue
  • Token cost tracking (40+ models)
  • Session grouping
  • Anomaly detection
  • PII auto-redaction
  • Alert rules (Slack, PagerDuty, webhook)
  • Kill switch (auto-revoke after N violations)
  • Forensic export (PDF + CSV)
  • Agent behavior baseline (7-day profile)

SDK support

Python (9 frameworks, all auto-patched): Anthropic, OpenAI, LangChain/LangGraph, CrewAI, Google Gemini, AWS Bedrock, Mistral, LlamaIndex, smolagents

JavaScript/TypeScript:

import agentguard from '@justinnn/agentguard'
agentguard.auto('http://localhost:8080', { agentId: 'my-agent' })

Go (zero dependencies, stdlib only):

guard := agentguard.Auto()
result, err := guard.Wrap("query_db", args, queryFn)

Cryptographic audit trail

Every trace is Ed25519-signed and SHA-256 hash-chained. Modifying any record breaks the chain. This isn't logging — it's tamper-evident, cryptographically verifiable proof.
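The hash-chaining half of that claim is easy to sketch with the standard library. Ed25519 signing is omitted here for brevity, and `chain`/`verify` are illustrative names, not the AEGIS API: the point is only that each record's hash covers the previous hash, so editing any record invalidates every later one.

```python
import hashlib
import json

def chain(records):
    """Build a hash chain: each hash covers the record plus the previous hash."""
    prev = "0" * 64  # genesis value
    out = []
    for rec in records:
        payload = json.dumps(rec, sort_keys=True) + prev
        prev = hashlib.sha256(payload.encode()).hexdigest()
        out.append({"record": rec, "hash": prev})
    return out

def verify(chained):
    """Recompute every hash; any tampered record breaks the chain."""
    prev = "0" * 64
    for entry in chained:
        payload = json.dumps(entry["record"], sort_keys=True) + prev
        if hashlib.sha256(payload.encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = chain([{"tool": "query_db"}, {"tool": "read_file"}])
print(verify(log))   # True
log[0]["record"]["tool"] = "shell"
print(verify(log))   # False, tampering breaks the chain
```

Signing each link with Ed25519 on top of this adds non-repudiation: an attacker who can rewrite the whole chain still can't forge the signatures without the private key.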

Deploy in 30 seconds

git clone https://github.com/Justin0504/Aegis
cd Aegis
docker compose up -d

Dashboard at localhost:3000. Gateway at localhost:8080.

Self-hosted. MIT licensed. No telemetry. No data leaves your infrastructure.

Try it

GitHub: https://github.com/Justin0504/Aegis

There's also a live demo agent (Claude-powered research assistant with its own chat UI) that walks through every feature: tracing, SQL injection blocking, PII detection, and human approval flow.

I'd love to hear what policies you'd want built in. Issues and PRs welcome.
