Building an AI coding agent is not primarily about choosing the right model. It's about building the infrastructure around the model that keeps it safe, bounded, and trustworthy. A production agent harness contains only about 1–2% actual AI logic; the remaining 98–99% is permission infrastructure, safety layers, context management, and blast-radius controls. This guide maps all five architectural pillars, shows where each one fails with concrete examples, and gives you the mental model you need to design a harness that actually holds.
TL;DR: A production agent permission layer has five components: approval modes (what the agent can do without asking), hook composition (where inline gates live), sandboxing (what the agent can touch), context management (what the agent knows), and subagent delegation (what spawned agents inherit). Hooks are necessary but not sufficient — they can be bypassed. The only enforcement that the model cannot circumvent is a layer running outside the agent process.
Why the Model Is the Easy Part
If you've spent an afternoon with Claude Code or Codex, you know that getting the model to write code is not the bottleneck. The bottleneck is everything else: what does the agent have permission to touch, how do you handle a destructive bash command at 2 AM, how do you prevent a credential leak when the agent is exploring your filesystem?
A thread on r/openclaw put it precisely: only ~1–2% of the code in a production agent harness is actual AI logic, and the rest is infra around it. That framing holds across every production agent deployment, and what can go wrong with agents in production is a long and specific list. The failure modes are structural, not model-dependent.
This guide gives you a mental model for the five real engineering challenges.
Prerequisites
Before implementing a permission layer, you need:
- An agent that exposes a hook or permission API (Claude Code, Codex, OpenCode)
- A clear policy for what the agent is allowed to do by default (see Pillar 1)
- A threat model: are you protecting against accidental damage, credential leaks, or both?
- Node.js 18+ if you're writing custom hook scripts
Definition: An agent permission layer is the set of mechanisms that control what an AI coding agent can read, write, execute, or communicate — and who can grant or deny those capabilities at runtime.
Pillar 1: Approval Modes — What Can the Agent Do Without Asking?
Every agent harness has an approval mode: an implicit or explicit policy governing how tool invocations are handled before the agent executes them. Claude Code exposes this directly. There are three practical positions:
Full trust (--dangerously-skip-permissions): All tool calls execute without prompting. Useful for tightly scoped CI pipelines where the blast radius is already contained by the execution environment. Notably, a community thread exploring this flag found that the agent actually plans differently when it knows it has full permission — more aggressively, with fewer natural check-ins. The mode affects agent behavior, not just safety posture.
Interactive approval (default): The agent pauses before destructive tool use and waits for explicit confirmation. This is the baseline. An agent approval gate is the point at which the agent stops and waits for a human decision before continuing.
Structured deny-by-default: The harness ships a deny-all policy and explicitly allowlists specific operations. The hardest to maintain but the only position that yields a genuine security posture.
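As a rough sketch, a deny-by-default posture in Claude Code lives in settings.json: a deny list for sensitive operations plus an explicit allowlist, with everything else falling back to interactive prompts. The rule patterns below are illustrative only — check Claude Code's permissions documentation for the exact syntax your version supports:

```json
{
  "permissions": {
    "deny": ["Read(./.env)", "WebFetch", "Bash(curl:*)"],
    "allow": ["Read(src/**)", "Edit(src/**)", "Bash(npm test:*)"]
  }
}
```

Anything not matched by either list still hits the interactive approval gate, which is what makes this sustainable: you allowlist the operations you see repeatedly, and the prompt volume drops over time.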
The design decision isn't which mode feels right — it's which mode you can operationally sustain. If interactive approval creates so much friction that you default to skipping it, you've already made your security decision implicitly. The full range of options for handling Claude Code's approval behavior is worth reading before you commit to a default.
Pillar 2: Hook Composition — Inline Gates and Their Limits
Claude Code's PreToolUse hooks are the primary inline gate mechanism. They fire before a tool invocation executes, receive the tool name and input, and can block or modify the call. Here's a minimal hook blocking writes to .env files:
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "bash /path/to/env-guard.sh"
          }
        ]
      }
    ]
  }
}
```
```bash
#!/bin/bash
# env-guard.sh — block tool inputs that reference .env files
input=$(cat)
if echo "$input" | grep -q '\.env'; then
  echo '{"decision": "block", "reason": "Direct writes to .env are not permitted."}'
  exit 0
fi
# No output with exit 0 lets the tool call proceed
exit 0
```
This looks correct. It isn't sufficient.
A documented bypass proof-of-concept demonstrated that comprehensive PreToolUse hooks still left .env contents accessible. The bypass vectors include: reading the file rather than writing it, calling a subprocess that reads it, using an MCP tool that the hook matcher doesn't cover, or constructing a multi-step sequence where no single tool call looks dangerous in isolation.
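The matcher gap is easy to see in isolation. The helper below is hypothetical — a simplified model of how a tool-name matcher selects which calls a hook even gets to inspect — but the failure it demonstrates is the real one:

```javascript
// Simplified model of a PreToolUse matcher: the hook only runs when the
// tool name matches the pattern. hookApplies is a hypothetical helper.
function hookApplies(matcher, toolName) {
  return new RegExp(`^(${matcher})$`).test(toolName);
}

const matcher = "Write|Edit|MultiEdit";

// The direct write is gated by env-guard.sh...
console.log(hookApplies(matcher, "Write")); // true — hook fires
// ...but the same file read through a subprocess never reaches the hook:
console.log(hookApplies(matcher, "Bash"));  // false — `cat .env` runs unguarded
```

No amount of pattern-tightening inside the matcher closes this: the hook fires on tool names, and the sensitive access arrives under a tool name the matcher was never written for.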
One community-built response to this limitation is the meta-cognition gate: a filesystem hook that forces structured reasoning before any high-impact mutation. Before the agent can touch core files, it must emit a structured object mapping the full blast radius:
```json
{
  "blast_radius": {
    "files_affected": ["src/auth/middleware.ts"],
    "state_changes": ["session validation logic"],
    "rollback_path": "git reset HEAD~1"
  }
}
```
This doesn't prevent bypasses, but it raises the cost of accidental destruction by forcing the model to surface its reasoning before executing.
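The gate's enforcement side can be a small validator that refuses the mutation unless every blast-radius field is declared and non-empty. This is a sketch of one possible shape, keyed to the example object above, not any particular tool's implementation:

```javascript
// Hypothetical gate: block the mutation unless the agent has emitted a
// complete blast_radius object. Returns a decision in the same style as
// the hook output format used earlier in this guide.
function validateBlastRadius(payload) {
  const br = payload?.blast_radius;
  if (!br) return { decision: "block", reason: "no blast_radius declared" };
  for (const field of ["files_affected", "state_changes", "rollback_path"]) {
    if (!br[field] || br[field].length === 0) {
      return { decision: "block", reason: `missing ${field}` };
    }
  }
  return { decision: "allow" };
}

// An empty declaration is blocked; a complete one passes.
console.log(validateBlastRadius({}).decision); // "block"
```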
The key insight: hooks are good at preventing accidental harm from straightforward tool calls. They are not good at preventing systematic harm from a model that has decided it needs access to something.
Pillar 3: Sandboxing — Containing Blast Radius
Sandboxing is the layer that hooks cannot replace: physical isolation of the execution environment from sensitive resources.
The strongest pattern is the opaque token broker, demonstrated by devcontainer-mcp, a container-based isolation tool built specifically because agents were "installing random crap on the host." The design: the agent never receives actual credentials. It gets opaque handles — references that the broker resolves at execution time.
```
Agent  → requests handle "db-prod"
Broker → resolves to actual connection, executes operation
Agent  → receives result, never sees the credential string
```
The agent can use a database connection but cannot print the connection string. It can push to a git remote but cannot read the OAuth token. This is the architecture that AgentRFC's security design principles identify as essential for production deployments: agents receive capabilities, not credentials.
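A minimal sketch of the broker pattern, assuming an in-process design rather than devcontainer-mcp's actual API: the agent holds only an opaque handle string, and resolution to the real secret happens inside the broker.

```javascript
// Sketch of an opaque token broker (assumed design, not a real library's API).
class CredentialBroker {
  #vault = new Map(); // private field: handles never expose the secret

  register(handle, secret) {
    this.#vault.set(handle, secret);
  }

  // The agent passes a handle and an operation; the broker injects the real
  // credential, runs the operation, and returns only the result.
  exec(handle, operation) {
    const secret = this.#vault.get(handle);
    if (!secret) throw new Error(`unknown handle: ${handle}`);
    return operation(secret);
  }
}

const broker = new CredentialBroker();
broker.register("db-prod", "postgres://user:s3cret@db:5432/app");

// Agent-side code sees only the handle and the result:
const result = broker.exec("db-prod", (conn) => `queried ${conn.split("@")[1]}`);
console.log(result); // "queried db:5432/app" — the password never crosses the boundary
```

In a real deployment the broker runs in a separate process or container, so the vault is not even in the agent's address space; the in-process version above only illustrates the interface.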
Beyond credential isolation, filesystem sandboxing defines traversal scope. A well-implemented harness validates that all path arguments stay inside the registered project root, enforces file size caps on reads (5 MB is a reasonable default), and rejects any path that resolves outside the sandbox after symlink expansion.
Network isolation is harder. Container-based sandboxes can restrict outbound connections to an allowlist, but the agent's own API calls legitimately need outbound access, which creates an unavoidable hole unless you're proxying agent API traffic through your own endpoint.
Pillar 4: Context Management — What the Agent Knows
Context management is the least-discussed pillar and one of the most consequential. An agent operating on a stale or overflowed context makes mistakes with high confidence.
Context window overflow: Long sessions accumulate tokens. When the context window fills, older tool results and state get dropped. The agent may proceed as if it still has information it no longer has — particularly dangerous when earlier messages established scope or safety constraints. Use /compact (Claude Code) before overflow happens, not after.
State staleness: The agent's model of the filesystem diverges from reality. It writes a file, another process modifies it, the agent reads from a stale mental model. Multi-agent setups amplify this — a community thread on parallel agents documented agents continuously asking "did you know this happened?" because neither knew what the other had modified.
Scope drift: Without explicit re-anchoring, agents expand their interpretation of scope across turns. "Fix the auth bug" becomes "refactor the entire auth module" by turn 10. A structured reasoning gate at context boundaries — similar to the meta-cognition pattern — forces the agent to re-state its current understanding of scope before continuing a long session.
Pillar 5: Subagent Delegation — Authority Inheritance and the Handoff Problem
When an agent spawns a subagent, a critical question arises: what does the subagent inherit? In most current implementations, the answer is: everything. A subagent runs with the same permission mode, the same credential access, and the same filesystem scope as the parent. This is wrong by default.
A subagent delegated to "write unit tests for this module" should not inherit permission to modify core application files or make network calls. The right architecture defines an explicit authority contract at delegation time:
```json
{
  "scope": "test/**",
  "allowed_tools": ["Read", "Write"],
  "disallowed_tools": ["Bash", "WebFetch"],
  "max_turns": 20,
  "parent_session_id": "abc123"
}
```
Most current frameworks don't enforce this contract natively. You implement it by wrapping subagent invocations in a harness that applies a tighter settings.json before launch.
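One way to build that wrapper is to derive the subagent's settings from the parent's, intersecting the tool lists and taking the tighter of each limit, so the contract can only narrow authority, never widen it. The field names follow the contract example above; the function itself is a hypothetical sketch:

```javascript
// Tighten a parent's settings with a delegation contract. The subagent may
// only use tools the parent allows AND the contract allows, minus anything
// explicitly disallowed; numeric limits take the smaller value.
function deriveSubagentSettings(parent, contract) {
  return {
    ...parent,
    allowed_tools: parent.allowed_tools.filter(
      (t) =>
        contract.allowed_tools.includes(t) &&
        !contract.disallowed_tools.includes(t)
    ),
    scope: contract.scope, // always the contract's narrower scope
    max_turns: Math.min(parent.max_turns ?? Infinity, contract.max_turns),
  };
}

const parent = { allowed_tools: ["Read", "Write", "Bash", "WebFetch"], max_turns: 100 };
const contract = {
  scope: "test/**",
  allowed_tools: ["Read", "Write"],
  disallowed_tools: ["Bash", "WebFetch"],
  max_turns: 20,
};

console.log(deriveSubagentSettings(parent, contract));
// subagent gets: Read/Write only, scope "test/**", max_turns 20
```

Because the derivation is an intersection, a malformed contract that lists a tool the parent never had simply has no effect — the subagent cannot escalate past its parent.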
The emerging pattern, from tools like Loopi and Lazyagent, is to enforce stage gates across agent boundaries: Plan → Implement → Review, where each stage uses a different model or CLI so that no single agent self-approves its own output. Loopi explicitly chains different CLIs to force agents to critique each other rather than rubber-stamp their own work.
Where Each Layer Fails: A Failure Mode Map
| Layer | What It Protects | Where It Fails |
|---|---|---|
| Approval modes | Default execution policy | --dangerously-skip-permissions removes all gates; mode affects agent behavior too |
| Hooks (PreToolUse) | Accidental destructive calls | Bypassed by indirect access, subprocess chains, MCP tools not covered by matcher |
| Sandboxing | Credential and filesystem isolation | Network egress for agent API calls creates unavoidable outbound access |
| Context management | Scope drift and stale state | Silent — context overflow has no runtime error; state staleness is invisible |
| Subagent delegation | Authority inheritance | Implicit inheritance in most frameworks; no native enforcement of scoped contracts |
The pattern across all five layers: controls that run inside the agent process can be navigated by the model. Controls that run outside the process — a remote approval surface, a container enforcing filesystem limits, a credential broker the agent never sees — are the ones that hold under pressure.
Practical patterns for agentic AI architectures from AWS re:Invent 2025 identified the same principle: the most robust controls are the ones that don't require the model's cooperation to be effective.
How to Verify Your Permission Layer Is Working
Test bypass paths, not just the happy path. Write a test case that attempts to access a protected resource indirectly — via a subprocess, a multi-step file chain, or an MCP tool. If your hook blocks Write .env but doesn't block Bash cat .env, you have a gap.
Audit post-run tool logs. Claude Code logs every tool call to ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl. Parse these after a session to confirm the agent didn't drift outside its assigned scope.
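A post-run scope audit can be a short script over that JSONL file. The event field names below (`toolUseInput.file_path`) are an assumption for illustration — inspect your own session log's schema before relying on them:

```javascript
// Scan a JSONL session log for tool inputs whose file path falls outside
// the assigned scope prefix. Field names are assumed; adjust to your log.
function auditScope(jsonlText, scopePrefix) {
  const violations = [];
  for (const line of jsonlText.split("\n").filter(Boolean)) {
    const event = JSON.parse(line);
    const filePath = event?.toolUseInput?.file_path;
    if (filePath && !filePath.startsWith(scopePrefix)) violations.push(filePath);
  }
  return violations;
}

// Example with two synthetic log lines:
const log = [
  '{"toolUseInput":{"file_path":"test/auth.test.ts"}}',
  '{"toolUseInput":{"file_path":"src/auth/middleware.ts"}}',
].join("\n");

console.log(auditScope(log, "test/")); // → ["src/auth/middleware.ts"]
```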
Watch for context size warnings. Treat these as operational signals, not UI noise. A session approaching context capacity is a session whose constraints may already be degraded.
Run a credential probe. Grant the agent a fake credential with a recognizable string. Run a session that doesn't obviously require it. Verify the string doesn't appear in any tool input or output in the session log.
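The probe check itself is a plain substring scan over the raw log text. The canary value below is a made-up example of the recognizable string you would plant:

```javascript
// Scan raw session-log text for a planted canary credential. Any hit is a
// leak: the fake secret reached a tool input or output it never should have.
const CANARY = "GRASS_CANARY_7f3a9b"; // hypothetical planted value

function findCanaryLeaks(logText) {
  return logText
    .split("\n")
    .map((line, i) => ({ line: i + 1, leaked: line.includes(CANARY) }))
    .filter((r) => r.leaked)
    .map((r) => r.line);
}

const cleanLog = '{"tool":"Read","input":"src/index.ts"}';
const leakyLog = '{"tool":"Bash","input":"echo GRASS_CANARY_7f3a9b"}';

console.log(findCanaryLeaks(cleanLog)); // → []
console.log(findCanaryLeaks(leakyLog)); // → [1]
```

A clean probe run is weak evidence, not proof — it only shows the agent didn't leak this credential in this session — but a single hit is conclusive and tells you exactly which log line to investigate.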
Troubleshooting Common Failures
"The agent keeps asking permission for basic commands."
Your hook matcher is too broad. A Bash matcher with a catch-all pattern gates every subprocess call. Tighten the matcher to the specific command patterns you actually want to gate (rm, git push, destructive filesystem operations) and allowlist the rest.
"Hooks aren't firing at all."
Verify the hook config is in the right scope: ~/.claude/settings.json for global, .claude/settings.json for project-local. Confirm the command path is absolute. Hook invocation failures are silent by default — add logging to your hook script.
"The agent completed the task but touched files it shouldn't have."
This is scope drift, not a permission failure. Add an explicit scope declaration to the system prompt and a meta-cognition gate requiring the agent to re-state its scope before each write to core files.
"My .env values appeared in a tool call despite a hook protecting the file."
This is the documented bypass pattern. The hook protects writes, not reads, subprocess access, or MCP tool calls. The fix is not a better hook — it's an opaque credential broker so the agent never receives the actual secret value in the first place.
How Grass Completes the Permission Layer
The five pillars above describe what you need to build. Grass provides the layer that sits above all of them: a human-approval surface that the model itself cannot bypass, accessible from anywhere.
The fundamental limit of in-process permission enforcement is that it depends on the agent process respecting its own constraints. A remote approval surface operates out-of-band: when Grass forwards a permission request to your phone, the agent is blocked at the server level until a human responds. There is no bypass vector because the gate is not inside the model's execution context — it's downstream of all hook processing, enforced at the transport layer before the response returns to the agent.
Handling permission requests from your phone in Grass works like this: when the agent hits a tool invocation that requires approval, the Grass server intercepts the permission_request event, sends a push notification to the mobile app, displays the tool name and a syntax-highlighted preview of the exact input, and waits. You tap Allow or Deny. The decision is forwarded back through the SSE stream. The agent continues or stops.
This matters in three specific cases where the in-process layers fail:
Late-night destructive operations. Your agent is running an overnight task and hits a bash command that would delete a directory. A hook might catch it — or might not, depending on matcher coverage. Grass catches it regardless, because it's enforced outside the agent process at the server boundary. You see the request on your phone, evaluate context, and decide.
Unexpected credential-adjacent access. Even with an opaque token broker in place, unexpected tool calls that shouldn't require credential access should trigger a human review. Grass surfaces these in real time rather than leaving them to be discovered in post-run logs.
Multi-agent handoff approvals. Grass's /permissions/events SSE endpoint provides a global view of all pending permissions across every active session simultaneously — useful for building a dashboard that shows every agent awaiting approval without requiring you to poll individual sessions. For teams running parallel agents, this is the operational layer described in how to manage multiple coding agents from a single mobile interface.
Setup takes under five minutes: npm install -g @grass-ai/ide, then grass start in your project directory. Scan the QR code. Every permission request from Claude Code or OpenCode flows to your phone for the lifetime of the session — no cloud relay, direct WiFi connection, sessions survive disconnects.
For long-running or overnight agent tasks where you want the full always-on setup — agent keeps running even when your laptop sleeps — Grass's cloud VM product at codeongrass.com extends the same permission forwarding to a persistent Daytona-backed environment.
FAQ
What is an agent permission layer?
An agent permission layer is the set of mechanisms that control what an AI coding agent can read, write, execute, or communicate — and who grants or denies those capabilities at runtime. It has five architectural components: approval modes (default policy), hooks (inline gates on tool calls), sandboxing (physical isolation of sensitive resources), context management (what the agent knows and when), and subagent delegation (what spawned agents inherit from the parent).
Why do PreToolUse hooks fail to protect .env files?
PreToolUse hooks fire on specific tool names. A hook blocking Write .env will not block a Bash call running cat .env, an MCP tool reading environment variables, or a multi-step sequence where no single call looks dangerous in isolation. The documented bypass PoC showed this is reproducible even with comprehensive hook coverage. The correct fix is to combine hooks with credential isolation (opaque token brokers) so the agent never receives actual secret values, not to add more hook patterns.
What does "blast radius" mean in the context of AI coding agents?
Blast radius refers to the scope of harm if an agent's action goes wrong — how many files it touches, whether it modifies shared infrastructure, whether it exposes credentials. Mapping blast radius before destructive operations (the meta-cognition gate pattern) forces the agent to emit an explicit account of impact scope before executing, making silent scope expansion visible.
What is the difference between --dangerously-skip-permissions and default mode?
In default mode, Claude Code pauses before destructive tool use and waits for human confirmation. --dangerously-skip-permissions removes all approval gates — every tool call executes without prompting. Beyond the security difference, community findings suggest the agent also behaves more aggressively in full-trust mode, making the risk asymmetric: you lose the gate and get a more expansive agent.
How do I prevent a coding agent from accessing credentials it shouldn't have?
The strongest pattern is the opaque token broker: the agent receives capability handles, not actual credential strings. A broker resolves the handle to the real credential at execution time, runs the operation, and returns only the result. The agent never has the underlying token. Combined with container-level filesystem isolation (as in devcontainer-mcp), this removes the credential exfiltration surface that hook-based controls leave open.
Next steps: Start with Pillar 1 — define your approval policy explicitly before writing any hooks. If you're running Claude Code today, Getting Started with Grass in 5 Minutes gets you the remote approval surface that makes interactive mode operationally sustainable — including for long sessions where you're not at your desk.
Originally published at codeongrass.com