Sahil Kathpal

Posted on • Originally published at codeongrass.com

Catch Agent Mistakes Before They Execute: Agent Verifier + Conduct

By the time a manual code review catches a hardcoded API key or a retry loop with no exit condition, an AI coding agent has already written it to disk — and possibly already run it. Two freshly shipped open-source tools — Agent Verifier and Conduct — close this window by adding automated pre-execution checks that run before your agent touches anything: before files are written, before commands execute, before the damage is done. This tutorial walks through setting up both tools, the four error classes they catch, and how to combine them into a two-stage review layer alongside Claude Code or any other coding agent.


TL;DR: Agent Verifier runs static checks on your agent's pending actions and flags hardcoded secrets, unbounded loops, hallucinated tool references, and context-blowing prompts before a session runs. Conduct intercepts each action in real time with a separate reviewer agent that evaluates session context, the pending action, and the current file state before passing or blocking. Together they form a pre-execution review layer you can add to any agent workflow in under an hour — without replacing your existing approval gates.


Why Approval Gates Alone Don't Catch Agent Mistakes

The standard advice for keeping AI coding agents safe is to use approval gates: review each tool call, approve or deny, stay in the loop. That's the right instinct — but approval gates have a structural problem. They ask a human to evaluate raw tool inputs in real time, without content analysis, at the speed the agent is working.

As discussed in r/codex, developers face a binary choice: either approve every action without reading it, or interrupt flow so frequently that the agent becomes more friction than value. The result is that most developers either rubber-stamp approvals or disable permission checks entirely — neither of which is safe.

Manual approval gates detect presence (a tool call is happening) but not quality (whether the tool call is correct or dangerous). An agent about to write an API key into a config file will trigger an approval modal — but the human reviewing it needs to already know to look for that pattern and catch it in the few seconds before clicking through. That's not a reliable control at any non-trivial throughput.

Pre-execution review (automated analysis of agent actions before they execute) fills the gap. Instead of asking a human to detect issues in real time, it runs structured checks or a reviewer agent that evaluates context, compares against known bad patterns, and surfaces specific findings — before the action runs. As our breakdown of the permission layer shows, presence detection is only about 2% of what a real agent control system needs to do. The other 98% is content evaluation, context management, and escalation logic — exactly what these tools provide.


What Is Pre-Execution Review?

Pre-execution review is an automated check that evaluates an AI agent's planned action against a set of criteria before the action executes. It sits between the agent's decision to call a tool and the tool actually running — giving the system a chance to evaluate, flag, or block the action before any state changes.

This is distinct from post-hoc review (reading the diff after the agent finishes) and from presence-based approval gates (clicking "approve" without evaluating content). Pre-execution review is content-aware and runs at the right moment: after the agent has decided what to do, but before it does it.
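
In control-flow terms, the intercept point looks something like this. A conceptual sketch in TypeScript with placeholder names and types, not the API of Agent Verifier, Conduct, or Claude Code:

// Conceptual control flow for pre-execution review (placeholder names and
// types only; not any tool's actual API).
type Action = { tool: string; input: unknown };
type Verdict = { pass: boolean; rationale: string };

declare function nextAction(): Action;       // the agent decides what to do
declare function review(a: Action): Verdict; // content-aware check runs first
declare function execute(a: Action): void;   // the only place state changes
declare function observe(reason: string): void; // feedback to the agent

const action = nextAction();
const verdict = review(action);
if (verdict.pass) {
  execute(action);            // only now does anything touch disk
} else {
  observe(verdict.rationale); // the agent can self-correct before retrying
}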


The Four Error Classes Agents Consistently Skip

Agent Verifier is built around four specific error categories that AI coding agents reliably miss — patterns a careful human reviewer would catch immediately but that agents skip because they're optimizing for task completion, not safety hygiene.

1. Hardcoded Secrets

Agents write API keys, tokens, and credentials directly into source files when that's the path of least resistance for completing a task. The agent isn't being careless — it's solving the problem it was given, and putting a secret in a config file is a valid way to make code run. But it's easy to miss in a real-time approval review.

Example of what Agent Verifier catches:

❌ Hardcoded credential detected in tool input
   Tool: Write
   File: src/config.ts
   Match: ANTHROPIC_API_KEY = "sk-ant-..."
   Fix: Use environment variable or secrets manager
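The suggested fix is a one-line change at the write site. A minimal sketch, assuming a Node-style TypeScript config file like the src/config.ts above:

// src/config.ts: read the credential from the environment instead of a literal
const apiKey = process.env.ANTHROPIC_API_KEY;
if (!apiKey) {
  // fail fast rather than falling back to a hardcoded value
  throw new Error("ANTHROPIC_API_KEY is not set");
}
export const config = { apiKey };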

2. Unbounded Retry Loops

Agents building retry logic frequently omit termination conditions. A retry loop that runs until success — with no maximum attempt count, no exponential backoff, no circuit breaker — can spin indefinitely, consuming API quota and hitting rate limits.

Example finding:

⚠️ Retry loop with no termination condition
   Tool: Bash
   Command: while ! curl -s $API_URL; do sleep 1; done
   Fix: Add maximum retry count or timeout
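For contrast, here is what a bounded version of that retry looks like. A sketch in TypeScript (not Agent Verifier's prescribed fix) showing the missing safeguards: a maximum attempt count, backoff between attempts, and a hard failure path:

// Retry with an exit condition: bounded attempts plus exponential backoff
async function fetchWithRetry(url: string, maxAttempts = 5): Promise<Response> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url);
      if (res.ok) return res; // success: stop retrying
    } catch {
      // network error: fall through to backoff and try again
    }
    // exponential backoff between attempts: 2s, 4s, 8s, ...
    await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
  }
  // hard exit condition instead of spinning forever
  throw new Error(`Gave up on ${url} after ${maxAttempts} attempts`);
}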

3. Hallucinated Tool References

When agents work with MCP (Model Context Protocol) integrations, they sometimes reference tools that don't exist in the current session — tools seen in training data or prior sessions but not registered in the current environment. These calls fail silently or with cryptic errors that are hard to debug after the fact.

Example:

❌ Reference to unregistered tool
   Tool call: use_mcp_tool("github", "create_pr")
   Available MCP tools: ["github.list_repos", "github.get_file"]
   "create_pr" is not a registered tool in this session
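The check behind this finding is essentially set membership against the session's tool registry. A sketch of the idea (function and argument names are illustrative, not Agent Verifier's internals):

// Flag any tool call whose name is not registered in the current session
function findUnregisteredTools(
  calledTools: string[],
  registeredTools: string[],
): string[] {
  const registered = new Set(registeredTools);
  return calledTools.filter((name) => !registered.has(name));
}

// Mirrors the finding above
findUnregisteredTools(
  ["github.create_pr"],
  ["github.list_repos", "github.get_file"],
); // => ["github.create_pr"]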

4. Massive System Prompts

As agent sessions grow, accumulated context can exceed the effective reasoning window. An 80k-token system prompt fed to an agent on a task requiring precise instruction-following produces degraded output — but the agent won't surface that. It attempts the task and returns something plausible-looking that doesn't honor constraints in the parts of the prompt it stopped attending to. This connects directly to why Claude agents ignore rules past ~15 tool calls — context overload is a structural failure mode, not an occasional edge case.
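
You can approximate this check yourself before a handoff. A rough sketch using the common four-characters-per-token heuristic; the 64,000 threshold mirrors the Agent Verifier report shown later, and a real tokenizer count will differ:

// Rough token estimate: ~4 characters per token for English text.
// A heuristic only; a real tokenizer gives a different count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const CONTEXT_WARN_THRESHOLD = 64_000;

function warnIfOverloaded(systemPrompt: string): void {
  const tokens = estimateTokens(systemPrompt);
  if (tokens > CONTEXT_WARN_THRESHOLD) {
    console.warn(
      `System prompt ~${tokens} tokens exceeds ${CONTEXT_WARN_THRESHOLD}; ` +
        `expect degraded instruction-following`,
    );
  }
}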


Prerequisites

  • A working Claude Code, Codex, or OpenCode agent setup
  • Node.js 18+ (for Agent Verifier)
  • Python 3.10+ (for Conduct)
  • Git access to both tool repositories
  • Recommended (not required): Grass for persistent cloud VM and mobile approval forwarding — see the Grass section below

Step 1: Set Up Agent Verifier

Agent Verifier is an open-source CLI tool that runs a structured checklist against your agent's pending session state. It integrates with Claude Code's skill system — you trigger it from within a chat session, giving you a clean pre-run gate before handing off a long autonomous task.

Install Agent Verifier:

git clone https://github.com/aurite-ai/agent-verifier
cd agent-verifier
npm install
npm run build
npm install -g .

Trigger a verification pass from your Claude Code session:

verify agent

Agent Verifier reads the current session context — the agent's recent tool calls, files staged to write, and queued commands — and produces a structured report:

Agent Verifier — Pre-Execution Report
─────────────────────────────────────
✅ 8 checks passed
⚠️ 3 warnings
❌ 2 issues

Issues (require resolution before proceeding):
  ❌ [secrets]   Hardcoded credential in Write input: src/api-client.ts
  ❌ [tool-ref]  Unregistered MCP tool referenced: "notion.create_database"

Warnings (review recommended):
  ⚠️ [loop]     Retry loop without termination: scripts/deploy.sh:42
  ⚠️ [context]  System prompt length: 78,400 tokens (threshold: 64,000)
  ⚠️ [loop]     Nested loop depth > 3: src/sync.ts:118

The workflow: run verify agent before any long autonomous session. Fix the issues — these are blockers. Review ⚠️ warnings — these are risks you're choosing to accept. Clean output means you can hand off the task with confidence. This maps cleanly onto the CORE agentic workflow's plan-review checkpoint — Agent Verifier is the tool that makes that checkpoint substantive rather than a rubber stamp.


Step 2: Set Up Conduct for Continuous Action Interception

Conduct takes a different approach. Rather than a one-time pre-run checklist, it sits in the execution path and intercepts each agent action in real time. For every action, a separate reviewer agent evaluates three inputs simultaneously:

  1. Session context — what the agent is trying to accomplish, its recent history, and current state
  2. Pending action — the specific tool call about to execute (tool name, inputs, target files)
  3. Current file state — the actual content of the file being modified, if applicable

The reviewer produces a pass or block decision with structured rationale. This is a meaningful upgrade over static pattern matching: it can evaluate whether an action makes sense given what the agent is actually trying to accomplish, not just whether it matches a dangerous pattern in the abstract.
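
Conceptually, each review call bundles those three inputs and returns the decision shown below. A hypothetical TypeScript shape (field names are illustrative, not Conduct's actual schema):

// Hypothetical shapes for the reviewer's input and output
interface ReviewRequest {
  sessionContext: {
    goal: string;            // what the agent is trying to accomplish
    recentActions: string[]; // recent history and current state
  };
  pendingAction: {
    tool: string;                   // e.g. "Write" or "Bash"
    input: Record<string, unknown>; // tool inputs, including target files
  };
  currentFileState?: string; // content of the file being modified, if applicable
}

interface ReviewDecision {
  decision: "pass" | "block";
  confidence: number;
  rationale: string;
  suggested_fix?: string;
}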

Install Conduct:

git clone https://github.com/nizos/conduct
cd conduct
pip install -e .

Configure the intercept hook in your Claude Code settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "conduct review --tool $TOOL_NAME --input '$TOOL_INPUT' --session $SESSION_ID"
          }
        ]
      }
    ]
  }
}

When the hook fires, Conduct spins up a lightweight reviewer agent with the session context loaded. The reviewer evaluates the pending action and returns a structured response:

{
  "decision": "block",
  "confidence": 0.94,
  "rationale": "Tool input contains OPENAI_API_KEY literal. Session context shows user requested environment-based config. Action contradicts stated requirements.",
  "suggested_fix": "Replace literal with process.env.OPENAI_API_KEY"
}

A block decision causes the PreToolUse hook to return a non-zero exit code. Claude Code interprets this as a denial — the tool call stops before execution, and the rationale surfaces to the agent as context for its next reasoning step.
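
If your install surfaces the decision as JSON rather than an exit code, a thin wrapper can bridge the two. A sketch, assuming conduct review prints its decision JSON to stdout and exits 0 when run this way (the wrapper itself is hypothetical, not part of either tool):

// review-wrapper.ts: translate Conduct's decision JSON into a hook exit code.
// Assumes `conduct review` prints JSON to stdout; adjust if your install
// already signals blocks via its own exit code.
import { execFileSync } from "node:child_process";

const output = execFileSync("conduct", ["review", ...process.argv.slice(2)], {
  encoding: "utf8",
});
const decision = JSON.parse(output) as { decision: string; rationale: string };

if (decision.decision === "block") {
  console.error(decision.rationale); // surfaced to the agent as context
  process.exit(2);                   // non-zero: Claude Code denies the call
}
process.exit(0);                     // pass: the tool call proceeds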

One critical configuration note: as our analysis of PreToolUse hooks shows, hooks configured on specific tool names can be circumvented when an agent constructs calls in unexpected ways. Use a "*" matcher to intercept all tools — don't try to enumerate specific tool names.


Step 3: Combine Both Tools Into a Two-Stage Gate

Agent Verifier and Conduct solve different scopes of the same problem:

|                   | Agent Verifier                         | Conduct                                         |
| ----------------- | -------------------------------------- | ----------------------------------------------- |
| When it runs      | On-demand, before a long session       | Per-action, continuously during the run         |
| What it evaluates | Full session state and queued actions  | Each individual action in session context       |
| Reviewer type     | Static pattern matching                | Live LLM-based reviewer agent                   |
| Best for          | Pre-run sanity check before handoff    | Catching emergent issues during autonomous runs |
| Performance cost  | Single CLI pass                        | Per-action LLM call (Haiku 4.5 by default)      |

The recommended combined workflow:

  1. Before handing off a long run: verify agent → fix issues → review ⚠️ warnings
  2. During the run: Conduct intercepts each action; blocks surface for your review
  3. After the run: Standard diff review for anything that passed through

This layered approach is consistent with building effective human-in-the-loop approval gates — automated checks reduce cognitive load, directing human attention to exceptions rather than every action. The CISO's AI agent production approval checklist from ARMO frames this as "autonomous quality gates with human escalation paths": the same pattern, applied to agent workflows rather than CI/CD pipelines.


How Do You Verify the Setup Is Working?

After installing both tools, test with a deliberately bad prompt:

Write a script that fetches data from the GitHub API.
Use the key GITHUB_TOKEN = "ghp_testABC123" for now; we'll move it to env later.

With both tools active, you should observe:

  1. Conduct blocks the Write tool call before the file is created, with rationale citing the hardcoded credential.
  2. Agent Verifier (if run before the session) flags the same issue under ❌ [secrets].

If the Write call executes and the file is created with the literal token, the integration is not wired correctly. Check that:

  • The PreToolUse hook path in settings.json resolves to the installed conduct binary
  • Conduct has read access to the session directory (typically ~/.claude/projects/)
  • The matcher is "*", not a specific tool name

As Augment Code's autonomous quality gate framework recommends, treat the test cases you run during setup as your regression suite — run them again whenever you update either tool or change your agent configuration.
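
One way to keep that regression suite honest is to store the bad inputs as fixtures and replay them on every update. A sketch (fixture contents are illustrative, and it assumes conduct review prints its decision JSON and exits 0 when run directly):

// regression.ts: replay known-bad tool inputs and assert each one is blocked
import { execFileSync } from "node:child_process";

const fixtures = [
  {
    tool: "Write",
    input: '{"path":"src/config.ts","content":"const KEY = \\"ghp_testABC123\\";"}',
  },
  {
    tool: "Bash",
    input: '{"command":"while ! curl -s $API_URL; do sleep 1; done"}',
  },
];

for (const fixture of fixtures) {
  const out = execFileSync(
    "conduct",
    ["review", "--tool", fixture.tool, "--input", fixture.input, "--session", "./test-session.json"],
    { encoding: "utf8" },
  );
  const decision = JSON.parse(out) as { decision: string };
  if (decision.decision !== "block") {
    throw new Error(`Regression: ${fixture.tool} fixture was not blocked`);
  }
}
console.log(`${fixtures.length} regression fixtures still blocked`);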


Troubleshooting Common Issues

Conduct is blocking too aggressively (false positives)

The reviewer agent's default confidence threshold for blocks is 0.7. Excessive false positives usually indicate stale or incomplete session context. Verify that $SESSION_ID resolves correctly to an active session file. Test Conduct in isolation before wiring it into hooks:

conduct review \
  --tool Write \
  --input '{"path": "test.ts", "content": "const x = 1;"}' \
  --session ./test-session.json

Agent Verifier reports "no agent state found"

Agent Verifier reads Claude Code's session transcript from ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl. For non-standard session paths or other agents, pass the session file explicitly:

verify agent --session ./path/to/session.jsonl

Per-action Conduct calls are adding significant latency

Conduct defaults to claude-haiku-4-5 for speed. If latency is still a problem, add a skip list for low-risk read-only tools:

{
  "conduct": {
    "skip_tools": ["Read", "Glob", "ListDirectory"],
    "review_tools": ["Write", "Edit", "Bash", "WebFetch"]
  }
}

Conduct's block rationale doesn't give the agent enough context to self-correct

The rationale string from Conduct is surfaced directly to the agent as PreToolUse hook output. If the agent is looping on the same blocked action, the rationale is too vague. Increase the --detail level in the Conduct hook command — this produces longer rationale strings that give the agent more specific corrective direction. Good workflow approval design ensures the rejection message is as actionable as the approval path.


How Grass Makes This Workflow Better

The pre-execution review layer described above works on any machine running a coding agent. But running it locally creates a structural problem: your laptop sleeps, disconnects, and gets repurposed — and when it does, the agent stops, Conduct stops, and any in-progress review state is lost.

There's a more meaningful issue: Conduct's per-action reviews are generating structured decisions with full rationale. That's valuable signal — information you should be able to inspect and act on from wherever you are, not just when you're sitting at your laptop.

Grass is a machine built for AI coding agents. The always-on cloud VM gives Agent Verifier and Conduct a persistent execution environment where the reviewers stay running between your working sessions. When Conduct blocks an action at 2am during an overnight run, the block doesn't disappear — it queues in the Grass mobile app and you see it the next morning with full rationale, ready to approve, deny, or give the agent corrective context from your phone.

Setting this up on Grass:

Provision your Grass VM at codeongrass.com. Agent Verifier and Conduct run in the same VM environment as your Claude Code agent — install them once, they persist across all sessions in that workspace. No reinstall on reconnect. No state loss when you close your laptop.

In the Grass mobile app, Conduct blocks surface as permission request modals — the same interface used for standard tool approvals. You see the tool name, the input preview, and Conduct's block rationale in a formatted card. One tap to override and allow, one tap to deny and let the agent reason about the rejection.

For long autonomous runs — the multi-hour sessions where pre-execution review actually matters — you fire off the task from your phone, let the session run overnight, and wake up to a Conduct review log showing exactly which actions passed, which were flagged, and which were blocked, with full rationale for each. That's the operational layer that makes automated review practical rather than theoretical. Monitoring overnight sessions becomes significantly more actionable when blocks surface as mobile notifications rather than silent terminal output you find the next morning.

Grass is free for 10 hours with no credit card — enough to set up and validate the full Agent Verifier + Conduct stack on a real project.


Frequently Asked Questions

How is pre-execution review different from a manual approval gate?

An approval gate asks a human to evaluate each tool call in real time — presence detection with no content analysis. Pre-execution review automates content evaluation (static pattern matching, LLM-based contextual analysis) before the human sees anything, surfacing only the actions that fail specific criteria. The human's attention is directed to exceptions rather than every action. You still have an approval gate; you now have substantive analysis feeding it.

Does Conduct add significant latency to each agent action?

Yes, but it's bounded. Each Conduct review is a separate LLM call using a fast model (Haiku 4.5 by default). On a typical Write or Bash action, expect 1–3 seconds of added latency. For operations where you'd otherwise be evaluating a manual approval modal, this is a net improvement in decision quality. For read-only operations (Read, Glob, ListDirectory), skip Conduct entirely via the skip_tools config.

Can Agent Verifier and Conduct work with agents other than Claude Code?

Agent Verifier reads Claude Code's .jsonl session format natively. For other agents, pass session context explicitly as a JSON file via the --session flag. Conduct's hook integration is Claude Code-specific via PreToolUse, but the reviewer agent call can be wrapped as middleware for other agents that support pre-execution hooks — the intercept model is agent-agnostic.

What happens when Conduct blocks an action and the agent doesn't know why?

Claude Code receives the PreToolUse hook's non-zero exit and the rationale string as context for the next reasoning step. The agent then reformulates — typically removing the hardcoded secret, adding a retry limit, or surfacing the issue to the user for clarification. Conduct's structured rationale format ("Action contradicts stated requirement because...") gives the agent enough context to self-correct in most cases without needing user intervention.

Should I run both tools or just one?

They're complementary, not redundant. Agent Verifier is better as a pre-run gate before you hand off a long autonomous task — it evaluates the full session state at once and catches issues before the run starts. Conduct is better for continuous oversight during the run — catching emergent issues that weren't predictable at handoff. Running both gives you a two-stage gate that addresses both known and emergent risks.


Next Steps

Start with Agent Verifier — it's the lower-friction entry point. Run verify agent on your next Claude Code session before you step away from the keyboard. Fix the issues, note the ⚠️ warnings, and observe how many issues surface that you wouldn't have caught in a manual review. Then layer in Conduct for sessions long enough to warrant continuous oversight.

For the full picture, pair this with the CORE agentic workflow — pre-execution review fits naturally at the plan checkpoint, before you hand off from plan to execute. The automated checks don't replace that human checkpoint; they make it worth something.

For persistent execution, mobile review, and the ability to handle Conduct blocks wherever you are: Grass gives Agent Verifier and Conduct the always-on environment they need to be more than a local-only safeguard.


Originally published at codeongrass.com
