Agent Sentry: a 400-line local sidecar that watches what your AI coding agent is about to do

#security #python #opensource #ai

The problem nobody is solving cleanly

If you've handed a real shell to Claude Code, Codex, Cursor, or any other agentic IDE, you've watched it do something that made you flinch.

For me, it was the day my agent decided that the right way to "install the linter" was:

curl -fsSL https://example.dev/install.sh | bash

It wasn't a prompt-injection attack. It wasn't a malicious package. It was just a confident model interpreting a vague instruction. The result is the same whether the cause is adversarial or accidental: arbitrary remote code ran on my machine before I could read the next token.

The standard answers are unsatisfying:

"Just review every command." Fine for a 3-step task. Useless for a 500-step refactor.
"Use a sandbox." I do, sometimes. But the agent I want to monitor is the one operating on my real repo, with my real keys.
"Trust the model provider." They are not in the loop. Their hooks fire, but the policy layer is mine to write.

I wanted something dumber and closer to the metal: a local process that sees every tool call my agent is about to make, decides what it thinks should happen, logs the decision, and stays out of the way until I tell it to bite.

So I built one.

Agent Sentry

agent-sentry is a ~400-line HTTP server that runs on localhost:7749. You point your agent's PreToolUse / PostToolUse hooks at it. For every tool call, it:

Normalizes the hook payload into a stable event schema.
Extracts structured features (command pattern flags, paths touched, tool name, etc.).
Runs a Layer-1 pattern policy — pipe-to-shell, shell-profile-write, process substitution feeding a shell, SSH config rewrites.
Returns a decision: would-block, would-ask, or shadow-only.
Persists the event and the decision to a local SQLite database.

By default, it is in shadow mode: it tells you what it would have done, without actually blocking. That sounds weak, and on day one it is. But shadow mode is what lets you tune rules against your real workflow without breaking it.

Design choices that mattered

Pure stdlib. No FastAPI, no SQLAlchemy, no Pydantic. The whole runtime is http.server + sqlite3 + re. A security tool that drags in 40 transitive dependencies is itself an attack surface. The sidecar is small enough to read end-to-end on a coffee break.
Local only. The server binds 127.0.0.1 and there is no network egress. There is no telemetry. If you want to share what your agent did, you do it explicitly through the export tool.
Redacted export, not raw export. The export_redacted module walks the SQLite store and strips home paths (/Users/<name> → /Users/<u>), API key shapes (sk-..., AKIA..., GitHub ghp_...), emails, and IPs before writing JSONL. The counts are printed to stderr so you can verify that something actually got redacted, not silently dropped:

$ python -m agent_sentry.sidecar.export_redacted --db ./sentry.db --out ./events.jsonl redactions: {'home_path': 412, 'api_key': 3, 'email': 17, 'ip': 0}

Decision vocabulary is explicit. would-block vs would-ask vs shadow-only is a deliberate split. "Block" means the rule is confident enough that I'd happily enforce. "Ask" means the rule wants me in the loop. "Shadow" means no opinion. Mixing these collapses the signal.

A look at the policy layer

The whole Layer-1 policy is currently this:

`def evaluate_layer1_policy(event):
features = event.get("structured_features", {}) or {}
patterns = set(features.get("command_patterns", []))

if "pipe-to-shell" in patterns:
    return {"decision": "block", "decision_type": "would-block",
            "reason": "Pipe remote content to shell (curl|bash / wget|sh)"}
if "shell-profile-write" in patterns:
    return {"decision": "block", "decision_type": "would-block",
            "reason": "Write to shell profile or SSH config file"}

return {"decision": "allow", "decision_type": "shadow-only", "reason": ""}`

Two rules. That is the entire policy. I don't want a Big Rule Engine; I want a clear list of patterns I'm willing to defend in writing. New rules land as new pattern regexes in events.py plus a clause in policy.py.

The pattern detection is regex over the raw command string, hardened against the obvious bypasses:

curl ... | bash

# bash <(curl ...) re.compile(r"\b(?:ba|z|da)?sh\b\s*<\(\s*(?:curl|wget)\b", re.I)

This is not bulletproof. Regex over shell strings never is. The point is that the bar to bypass is now visible code that a future me, or a future reviewer, can argue with — instead of "the model felt good about it."

Wiring it to Claude Code

Add a PreToolUse hook that POSTs the raw event JSON to http://127.0.0.1:7749/score. The sidecar returns a small JSON decision the forwarder can act on (or ignore, in shadow mode). A PostToolUse hook on the same endpoint closes the loop with the actual command output, so the stored event has both the intended action and what happened.

There is no Claude-specific code in the sidecar. The schema is the hook schema — the same idea works for any agent runtime that can fire a webhook.

What's next

The repo today is the sidecar plus 67 tests. It is deliberately small. Things on the list:

A second policy layer that scores by intent ("this is a privilege escalation step in a longer chain") rather than by surface pattern.
A tiny TUI to scrub the SQLite store and approve / deny in batch.
Hook recipes for Codex and Cursor.

If you build with AI agents on your own machine, I'd love issues and PRs — especially new pattern rules that caught your agent doing something weird.

Repo: https://github.com/MaxHagl/agent-sentry
MIT licensed.