Your LangChain agent just ran `rm -rf /`. It was supposed to list files.
This isn't a hypothetical. AI agents call tools — shell commands, database queries, payment APIs, file operations. Every tool call is a potential security incident. And right now, most agents have zero runtime enforcement.
I built ShadowAudit to fix this. It's a deterministic, offline-first governance layer that sits between your agent and its tools. If a call exceeds your risk threshold, it's blocked. No LLM calls. No cloud dependencies. No API keys.
```shell
pip install shadowaudit
```
The Problem: Agents Are Unguarded
When you build an AI agent, you give it tools. A shell tool. A database tool. A payment API tool. The agent decides which tool to call and with what parameters. That's the whole point — autonomy.
But autonomy without guardrails is negligence.
| What the agent should do | What the agent might do |
|---|---|
| `ls -la /var/log` | `rm -rf /var/log` |
| `SELECT * FROM users WHERE id=123` | `DROP TABLE users` |
| `transfer $10 to vendor` | `transfer $10,000 to unknown_account` |
Current solutions fall short:
- Prompt engineering — "Please don't do anything dangerous." Agents ignore this.
- LLM-based guardrails — Probabilistic, slow, expensive, requires API calls.
- Human-in-the-loop — Doesn't scale. You can't review 10,000 agent decisions per hour.
What you need is deterministic, runtime enforcement that works offline and blocks dangerous calls before they execute.
What ShadowAudit Does
```
Agent → ShadowAudit Gate → Tool     (allowed)
                         → Blocked  (AgentActionBlocked raised)
```
ShadowAudit evaluates every tool call against a risk taxonomy. If the risk score exceeds the threshold, the call is blocked. The decision is logged. The agent's behavioral state is updated.
5 Lines of Code
```python
from langchain.tools import ShellTool
from shadowaudit.framework.langchain import ShadowAuditTool

safe_shell = ShadowAuditTool(
    tool=ShellTool(),
    agent_id="ops-agent-1",
    risk_category="command_execution",
)

safe_shell.run("ls -la")    # ✅ Allowed
safe_shell.run("rm -rf /")  # ❌ AgentActionBlocked raised
```
Same interface as the original tool. Drop-in replacement. Zero behavior change for safe calls.
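To make the drop-in idea concrete, here is a self-contained sketch of the wrapping pattern (not ShadowAudit's actual implementation — the `EchoTool`, `GatedTool`, and keyword list below are invented for illustration): the wrapper exposes the same `run()` interface and raises before delegating when the payload looks risky.

```python
class AgentActionBlocked(Exception):
    """Stand-in for the exception raised when a call is denied."""

class EchoTool:
    """Stand-in for a real tool such as LangChain's ShellTool."""
    name = "echo"
    description = "Echoes its input."

    def run(self, payload: str) -> str:
        return f"ran: {payload}"

class GatedTool:
    """Wraps any tool with a run() method: same interface, gated calls."""

    BLOCKED_KEYWORDS = ("rm -rf", "DROP TABLE")  # toy risk list

    def __init__(self, tool):
        self.tool = tool

    def run(self, payload: str) -> str:
        # Check the payload before the wrapped tool ever sees it.
        if any(kw in payload for kw in self.BLOCKED_KEYWORDS):
            raise AgentActionBlocked(f"blocked: {payload!r}")
        return self.tool.run(payload)

safe = GatedTool(EchoTool())
print(safe.run("ls -la"))  # safe call delegates to the wrapped tool
try:
    safe.run("rm -rf /")
except AgentActionBlocked as exc:
    print(exc)             # dangerous call never reaches the tool
```

Because the wrapper and the wrapped tool share an interface, the agent code that calls `run()` does not change at all.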
CLI for CI/CD
```shell
# Scan your codebase for ungated agent tools
shadowaudit check ./src

# Block deployments if high-risk tools are ungated
shadowaudit check ./src --fail-on-ungated

# Generate a professional HTML assessment report
shadowaudit assess ./src --taxonomy financial --compliance

# Replay agent traces through the safety gate
shadowaudit simulate --trace-file agent_trace.jsonl --compare
```
Drop `shadowaudit check --fail-on-ungated` into your CI pipeline. If someone commits an ungated shell tool, the build fails.
Architecture: Deterministic, Not Probabilistic
Most AI safety tools today use an LLM to evaluate risk. That's slow, expensive, and non-deterministic — the same input can produce different outputs.
ShadowAudit uses keyword-based scoring with pluggable strategies:
- Taxonomy lookup — finds risk category config (keywords, threshold delta, severity)
- Scoring — pluggable scorer computes risk score from payload content
- Threshold comparison — score vs. taxonomy delta determines pass/fail
- FSM transition — fail-closed state machine: anything not an explicit pass is a block
- Audit log — decision recorded with timestamp, agent ID, payload hash, and reason
- State update — K (trust) and V (velocity) metrics updated for adaptive scoring
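The first five steps can be sketched in a few lines. This is an illustrative toy, not ShadowAudit's actual code — the taxonomy fields, keyword weights, and scoring function here are invented — but it shows why the pipeline is deterministic: the same payload always produces the same score, decision, and audit record.

```python
import hashlib
import time

# Toy taxonomy: invented keywords, weights, and threshold.
TAXONOMY = {
    "command_execution": {
        "keywords": {"rm": 0.9, "curl": 0.4, "chmod": 0.5},
        "threshold": 0.8,
    },
}

def evaluate(agent_id: str, category: str, payload: str) -> dict:
    config = TAXONOMY[category]                       # 1. taxonomy lookup
    score = max(                                      # 2. keyword-based scoring
        (w for kw, w in config["keywords"].items() if kw in payload),
        default=0.0,
    )
    passed = score < config["threshold"]              # 3. threshold comparison
    decision = "pass" if passed else "block"          # 4. fail-closed: only an
                                                      #    explicit pass allows
    return {                                          # 5. audit record
        "timestamp": time.time(),
        "agent_id": agent_id,
        "payload_hash": hashlib.sha256(payload.encode()).hexdigest(),
        "score": score,
        "decision": decision,
    }

print(evaluate("ops-agent-1", "command_execution", "ls -la")["decision"])    # pass
print(evaluate("ops-agent-1", "command_execution", "rm -rf /")["decision"])  # block
```

No model call anywhere in the path: the decision is a pure function of the payload and the taxonomy, which is what makes it reproducible under audit.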
This is auditable. Reproducible. Explainable. The kind of thing compliance auditors actually accept.
Why Offline-First Matters
ShadowAudit works fully offline. SQLite-backed state. No Redis. No cloud. No API keys.
This matters because:
- Banks run agents inside air-gapped VPCs. They can't call external APIs.
- Healthcare has HIPAA constraints. Agent data can't leave the network.
- Defense contractors work in classified environments. Zero external connectivity.
- Legal teams block any tool that sends data to third parties.
If your governance tool requires an internet connection, you've already lost these customers.
Pre-Built Taxonomies
ShadowAudit ships with three starter taxonomies:
| Taxonomy | Risk Categories | Example Keywords |
|---|---|---|
| General | shell execution, file operations, network calls | `rm`, `curl`, `chmod`, `wget` |
| Financial | payments, withdrawals, PII access, account modifications | `transfer`, `withdraw`, `ssn`, `account_number` |
| Legal | privilege waiver, regulatory filings, client data access | `waive`, `settle`, `attorney_client`, `file_motion` |
Each taxonomy has tuned thresholds. You can build custom ones interactively:
```shell
shadowaudit build-taxonomy
```
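A custom taxonomy ties together the pieces the gate needs per category: keywords, a severity label, and a threshold delta. The shape below is hypothetical — the field names and on-disk format actually produced by `shadowaudit build-taxonomy` may differ — but it shows the kind of structure the architecture section describes.

```python
# Hypothetical taxonomy entry; field names are illustrative only.
custom_taxonomy = {
    "name": "healthcare",
    "categories": {
        "phi_access": {
            "keywords": ["patient_id", "diagnosis", "mrn"],
            "severity": "high",
            "threshold_delta": 0.2,  # lower tolerance for this category
        },
    },
}
```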
Framework Support
| Framework | Status |
|---|---|
| LangChain | ✅ First-class adapter |
| CrewAI | ✅ First-class adapter |
| AutoGen | 🔜 Next |
| OpenAI Agents SDK | 🔜 Planned |
Both adapters use duck typing — they work with any tool that has `name`, `description`, and `run()`. You don't need the framework installed for the adapter to work.
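That duck-typing contract can be expressed as a structural type: any object with `name`, `description`, and `run()` qualifies, with no framework base class or import required. The `Protocol` below is my own illustration of the contract, not ShadowAudit's actual type definitions.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ToolLike(Protocol):
    """Structural contract: three members, no inheritance required."""
    name: str
    description: str

    def run(self, payload: str) -> str: ...

class HomegrownTool:
    """No framework base class — just the three required members."""
    name = "homegrown"
    description = "A framework-free tool."

    def run(self, payload: str) -> str:
        return payload.upper()

print(isinstance(HomegrownTool(), ToolLike))  # True — structural match
```

This is why neither LangChain nor CrewAI needs to be installed: the adapter only cares that the three members exist, not where the class came from.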
The Numbers
- 133 tests, 100% pass rate
- Zero flaky tests — deterministic by design
- ruff + mypy clean — strict linting from day one
- MIT licensed — use it, modify it, build on it
- Python 3.10+ — modern Python with no legacy baggage
What's Next
ShadowAudit is in alpha (v0.3.2). The core gate, CLI, framework adapters, and assessment tools are functional and tested. Here's the roadmap:
- 🔜 AutoGen adapter
- 🔜 Behavioral anomaly detection — pattern detection across sessions
- 🔜 Pro dashboard — team-level visibility, compliance reports, alerting
- 🔜 More taxonomies — healthcare, defense, e-commerce
Try It
```shell
pip install shadowaudit
```
AI agents are the next attack surface. Don't wait for an incident to start governing them.
Built by Anshuman Kumar. MIT licensed. Works offline.