Justin Yuan

Posted on • Originally published at github.com
I built an open-source firewall for AI agents — it blocks dangerous tool calls before they execute

The problem nobody talks about

Every AI agent framework and SDK — LangChain, CrewAI, the Anthropic and OpenAI clients — gives the LLM full control over which tools to call and with what arguments.

The model says "run this SQL query: DROP TABLE users" and your code just... executes it. No confirmation. No policy check. No audit trail.

Existing observability tools (LangFuse, Helicone, Arize) log what happened. That's useful for debugging. But the database is already gone.

What I built


AEGIS is an open-source, self-hosted firewall that sits between your AI agent and its tools.

It doesn't just observe — it intercepts and blocks before execution.

How it works

Agent calls a tool → the AEGIS SDK intercepts it → the Gateway classifies the call (SQL? file? shell?) → the policy engine evaluates it (injection? traversal? exfiltration?) → decision: allow / block / pending (a human reviews) → the trace is Ed25519-signed, SHA-256 hash-chained, and stored in the dashboard.
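In code, the classify-then-evaluate step might look roughly like this. This is a minimal hypothetical sketch: `classify`, `evaluate`, and the regex are illustrative names and rules, not the actual AEGIS internals.

```python
import re

# Illustrative policy: flag destructive SQL keywords (sketch only).
DANGEROUS_SQL = re.compile(r"\b(DROP|DELETE|TRUNCATE)\b", re.IGNORECASE)

def classify(tool_name: str) -> str:
    """Bucket a tool call by the kind of side effect it can have."""
    if "sql" in tool_name or "query" in tool_name:
        return "sql"
    if "file" in tool_name or "read" in tool_name:
        return "file"
    return "other"

def evaluate(category: str, args: dict) -> str:
    """Return allow / block / pending for a classified call."""
    if category == "sql" and DANGEROUS_SQL.search(args.get("query", "")):
        return "block"
    if category == "file" and ".." in args.get("path", ""):
        return "pending"  # escalate suspected path traversal to a human
    return "allow"

print(evaluate(classify("query_db"), {"query": "DROP TABLE users"}))   # block
print(evaluate(classify("read_file"), {"path": "../../etc/passwd"}))   # pending
```

The key design point is that the decision happens before the tool function ever runs, which is what separates a firewall from an observability trace.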

One line to integrate

import anthropic
import agentguard

agentguard.auto("http://localhost:8080")

# Your existing agent code — completely unchanged
client = anthropic.Anthropic()
response = client.messages.create(model="claude-sonnet-4-20250514", max_tokens=1024, messages=[...], tools=[...])

What it catches out of the box

  • SQL injection — DROP, DELETE, TRUNCATE in database tools
  • Path traversal — ../../etc/passwd, sensitive directories
  • Command injection — rm -rf, curl | sh, shell metacharacters
  • Prompt injection — "ignore previous instructions" patterns
  • Data exfiltration — large payloads to external endpoints
  • PII leakage — SSN, email, phone, credit card, API keys (auto-redacted)
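As a taste of the last bullet, a regex-based redactor for a couple of PII types can be sketched in a few lines. This is a toy: `PII_PATTERNS` and `redact` are hypothetical names, the patterns are simplistic, and the real detector covers far more formats.

```python
import re

# Toy patterns for two PII types; real detectors use stricter rules.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII match with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789"))
# → Contact [REDACTED:EMAIL], SSN [REDACTED:SSN]
```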

Human-in-the-loop

For high-risk actions, the agent pauses. You open the Compliance Cockpit, see the exact tool name and arguments, and click Allow or Block. The agent resumes in under a second.

agentguard.auto(
    "http://localhost:8080",
    blocking_mode=True,
    human_approval_timeout_s=300,
)

The dashboard

The Compliance Cockpit gives you:

  • Real-time trace stream with risk badges
  • Pending approvals queue
  • Token cost tracking (40+ models)
  • Session grouping
  • Anomaly detection
  • PII auto-redaction
  • Alert rules (Slack, PagerDuty, webhook)
  • Kill switch (auto-revoke after N violations)
  • Forensic export (PDF + CSV)
  • Agent behavior baseline (7-day profile)

SDK support

Python (9 frameworks, all auto-patched): Anthropic, OpenAI, LangChain/LangGraph, CrewAI, Google Gemini, AWS Bedrock, Mistral, LlamaIndex, smolagents

JavaScript/TypeScript:

import agentguard from '@justinnn/agentguard'
agentguard.auto('http://localhost:8080', { agentId: 'my-agent' })

Go (zero dependencies, stdlib only):

guard := agentguard.Auto()
result, err := guard.Wrap("query_db", args, queryFn)

Cryptographic audit trail

Every trace is Ed25519-signed and SHA-256 hash-chained. Modifying any record breaks the chain. This isn't logging — it's tamper-evident, cryptographically verifiable proof.
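The hash-chaining half of that claim is easy to sketch with the standard library. Ed25519 signing is omitted here for brevity, and `chain`/`verify` are illustrative names, not the AEGIS API: the point is only that each record's hash covers the previous hash, so editing any record invalidates every later one.

```python
import hashlib
import json

def chain(records):
    """Build a hash chain: each hash covers the record plus the previous hash."""
    prev = "0" * 64  # genesis value
    out = []
    for rec in records:
        payload = json.dumps(rec, sort_keys=True) + prev
        prev = hashlib.sha256(payload.encode()).hexdigest()
        out.append({"record": rec, "hash": prev})
    return out

def verify(chained):
    """Recompute every hash; any tampered record breaks the chain."""
    prev = "0" * 64
    for entry in chained:
        payload = json.dumps(entry["record"], sort_keys=True) + prev
        if hashlib.sha256(payload.encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = chain([{"tool": "query_db"}, {"tool": "read_file"}])
print(verify(log))   # True
log[0]["record"]["tool"] = "shell"
print(verify(log))   # False, tampering breaks the chain
```

Signing each link with Ed25519 on top of this adds non-repudiation: an attacker who can rewrite the whole chain still can't forge the signatures without the private key.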

Deploy in 30 seconds

git clone https://github.com/Justin0504/Aegis
cd Aegis
docker compose up -d

Dashboard at localhost:3000. Gateway at localhost:8080.

Self-hosted. MIT licensed. No telemetry. No data leaves your infrastructure.

Try it

GitHub: https://github.com/Justin0504/Aegis

There's also a live demo agent (Claude-powered research assistant with its own chat UI) that walks through every feature: tracing, SQL injection blocking, PII detection, and human approval flow.

I'd love to hear what policies you'd want built in. Issues and PRs welcome.
