DEV Community: Brian Hall

How to stop an AI agent from burning $47,000 in a loop nobody noticed.

Brian Hall — Tue, 23 Jun 2026 15:21:56 +0000

A multi-agent research system sat in production for eleven days doing exactly what it was built to do. Four agents, LangChain-style, coordinating over A2A to pull market data and summarize it. Every health check passed the whole time. No crash, no 500, no timeout... from the outside the system was perfectly healthy.

Two of the four agents had quietly locked into a recursive loop, passing clarification requests and verification instructions back and forth, thousands of times, around the clock. Nobody noticed because nothing was technically wrong. The thing that finally caught it was a person opening the invoice and asking why the number was so high. The number was $47,000.

That story has been making the rounds because it's so relatable, but it isn't a one-off. Uber said it burned through its entire 2026 AI coding budget in four months. One company reportedly ran up a $500M Claude bill after rolling out access with no usage caps. The FinOps Foundation said that around April, the conversation across the industry flipped from "go fast" to "we need guardrails, how do we control this." This is the dominant operational failure mode in production agents right now, and it's barely about the model at all.

Why this keeps happening

The reason isn't that people are careless. It's that the spend cap, when it exists at all, lives in the wrong place.

Most teams "control" cost with monitoring. A billing alert. A dashboard. Maybe a daily spend report. But every one of those is a postmortem. By the time the alert fires, the money is already gone. The $47K loop didn't trip anything because the team was watching user metrics (signups, queries completed, response quality), not per-agent spend in real time. The bill was a monthly line item, not a guardrail.

And the other common fix, putting a budget check inside the agent, has its own problem: the agent is the thing that's misbehaving. A loop that's lost the plot isn't going to cleanly evaluate its own "am I allowed to keep going" check. You're asking the runaway process to stop itself. Sometimes it does. The eleven-day ones are when it doesn't.

There's also the part nobody likes to admit: nothing here was broken. The agents were "working." That's exactly why it ran for eleven days. A failure that looks like normal operation won't get caught by anything watching for failures.

The fix is a hard ceiling that lives outside the agent

The cap has to sit somewhere the loop can't reach, and it has to fire before the call, not after the invoice. That means a budget that's enforced on the tool call itself, not advice in a prompt and not an alert after the fact.

This is the part I work on, so here's how we do it in Faramesh. Your whole policy lives in one file, and a budget is a few lines:

agent "research-crew/analysis" {
  default deny

  rules {
    permit market_data_lookup
    permit summarize
  }

  rate_limit "market_data_lookup": 30 per minute

  budget daily {
    max       $50
    on_exceed defer
  }
}

Two things are doing the work. The rate_limit caps how fast a single tool can be hit, so an agent stuck in a tight loop can't fire the same call hundreds of times a minute. The budget block puts a hard daily ceiling on spend, and on_exceed defer means that when the ceiling is hit, the next call doesn't run, it pauses and waits for a human. The loop stops at the dollar, not at the invoice.
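If you want to see the rate-limit half in plain code, here's a minimal sliding-window limiter. It's a sketch of the general technique, not how Faramesh implements rate_limit internally; the class name SlidingWindowLimiter and its parameters are made up for illustration.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window_s` seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._stamps = deque()  # timestamps of calls that were allowed

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # over the cap: the call must not run
        self._stamps.append(now)
        return True
```

With limit=30 and window_s=60.0, the 31st call inside a minute is refused no matter how insistent the loop is, and calls are allowed again once older ones age out.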

The important part is where this runs. Faramesh sits between the agent and its tools as a local daemon, and every tool call goes through it before it executes. The decision is deterministic, there's no model in that path, so the same spend under the same policy always gets the same answer. The runaway agent can't talk its way past it, because the check isn't inside the agent. It's a wall in front of the tool.

On the $47K incident specifically: a daily cap would have turned an eleven-day, $47,000 silent loop into a one-day, $50 pause and a notification. Same loop. Same bug. Wildly different outcome, because the ceiling didn't depend on anyone noticing.
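Here's the same idea as a toy simulation. It's a sketch under assumptions: BudgetGate and run_loop are hypothetical names, and the 25 cents per call is a number I picked for the example, not a figure from the incident. Amounts are integer cents to keep the arithmetic exact.

```python
class BudgetGate:
    """Hard daily spend ceiling, checked before each call runs."""

    def __init__(self, max_cents: int) -> None:
        self.max_cents = max_cents
        self.spent_cents = 0

    def check(self, cost_cents: int) -> str:
        # Decide BEFORE the call executes; no model is consulted.
        if self.spent_cents + cost_cents > self.max_cents:
            return "defer"  # pause and wait for a human
        self.spent_cents += cost_cents
        return "permit"


def run_loop(gate: BudgetGate, cost_cents: int, attempts: int) -> int:
    """Simulate a stuck agent; return how many calls actually ran."""
    ran = 0
    for _ in range(attempts):
        if gate.check(cost_cents) != "permit":
            break
        ran += 1
    return ran
```

With a $50 ceiling (5000 cents) and the assumed 25 cents per call, the loop gets 200 calls and then stops, whether the agent tries a thousand more or a million.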

The takeaway

Runaway cost isn't really a model problem and it isn't bad luck. It's an architecture problem. The spend limit is usually in a place the spend can route around, inside the agent, or after the fact in a billing alert. Move it outside the agent and in front of the action, and the worst case stops being "we found out when the invoice came" and starts being "it paused and pinged us."

Faramesh is open source. If you want to put a hard ceiling in front of your agents, the repo's at github.com/faramesh/faramesh-core. If you wire it up and something's off, tell me, that's all useful feedback right now.

Put a hard stop in front of your CrewAI crew's tool calls

Brian Hall — Mon, 22 Jun 2026 14:31:24 +0000

CrewAI makes it easy to stand up a crew. You give a few agents roles, hand them tools, let them delegate work to each other, and the thing mostly runs itself. That autonomy is the appeal. It's also the problem. Once a crew is moving, every agent in it can reach for a tool, and there's nothing between the model deciding to call something and the call actually happening.

The usual fix is a careful prompt and crossed fingers. Or a second LLM that "reviews" the action, which is hoping with extra latency. I wanted a check that doesn't depend on a model being in a good mood: something deterministic that runs before the tool call fires and gives a real answer. Allow it, hold it for a human, or block it.

That's what Faramesh does. It's open source, it works with CrewAI through a one-line wrapper, and this is the actual end-to-end setup, every command and every policy snippet pulled straight from how the tool really works.

The idea

A tool call is the moment an agent stops reasoning and starts doing. Reading a doc is one thing. Spending money, sending mail to a customer, or hitting a production API is another. Those are the moments worth putting a rule in front of.

Faramesh runs as a local daemon. Your whole policy lives in one file, governance.fms, and the daemon checks every tool call against it before the call runs. No LLM sits in that decision path, so the same action under the same policy always gets the same verdict. You get one of three:

permit the call runs
defer the call pauses and waits for a human to approve or reject
deny the call is blocked before it happens

The point is that it's deterministic. You can read the policy, reason about it, and know what it'll do. That's the whole difference from asking a second model to babysit the first one.

Install

Install the CLI:

curl -fsSL https://raw.githubusercontent.com/faramesh/faramesh-core/main/install.sh | bash
faramesh --version

Then add the SDK to your CrewAI project:

pip install faramesh-sdk crewai

Generate the policy

From the root of your project:

faramesh init

Faramesh inspects the repo, finds your framework, discovers your tools, and writes a starter governance.fms. The important part of the default: every discovered tool starts at defer. Nothing runs until you've reviewed it. That's the safe direction to fail.

Wire the crew

This is the only step that's CrewAI-specific, and it's small. You wrap each agent's tools in a GovernedToolSet and give that set an identity. Here's a crew before:

from crewai import Agent, Crew, Task
from crewai_tools import SerperDevTool, BraveSearchTool

researcher = Agent(
    role="researcher",
    tools=[SerperDevTool(), BraveSearchTool()],
)
writer = Agent(role="writer", tools=[])

crew = Crew(agents=[researcher, writer], tasks=[...])

And after:

from faramesh import GovernedToolSet
from crewai import Agent, Crew, Task
from crewai_tools import SerperDevTool, BraveSearchTool

researcher_tools = GovernedToolSet(
    [SerperDevTool(), BraveSearchTool()],
    agent_id="research-crew/researcher",
)

researcher = Agent(role="researcher", tools=researcher_tools)
writer     = Agent(role="writer",     tools=[])

crew = Crew(agents=[researcher, writer], tasks=[...])

That's the whole integration. Use one GovernedToolSet per agent so each crew member gets its own identity in the policy. That's what lets you give the researcher and the writer different rules, which matters more than it sounds like, since in a crew the agents have genuinely different jobs and should have genuinely different permissions.

Write the rules, per role

Open governance.fms. Because each agent has its own id, you write a policy block per role. Here the researcher can search but nothing else, and the writer can't touch tools at all:

import "github.com/faramesh/faramesh-registry/frameworks/crewai@1.0.0"

agent "research-crew/researcher" {
  default deny

  rules {
    permit serper_search
    permit brave_search
  }

  rate_limit "*_search": 30 per minute

  budget daily {
    max       $20
    on_exceed defer
  }
}

agent "research-crew/writer" {
  default deny
  rules { }
}

A few things worth reading off that:

default deny means anything not explicitly allowed gets blocked. You opt tools in, you don't opt them out. Rules are checked top to bottom and the first match wins.

The rate_limit line caps both search tools at 30 calls a minute, so a confused agent can't hammer an API in a loop. The budget block puts a daily ceiling on spend and, when it's hit, defers instead of denying: the work pauses for a human rather than just dying. The writer's empty rule block plus default deny means it has no tool access at all, which is exactly right for an agent whose job is to write, not act.
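To make "default deny, first match wins" concrete, here's a small evaluator in plain Python. It's a sketch of the semantics described above, not Faramesh's engine: the RULES list and decide function are hypothetical, and I'm assuming glob-style patterns (via fnmatch) only because the rate_limit line uses one.

```python
from fnmatch import fnmatchcase

# (verdict, tool-name pattern) pairs, in file order. Hypothetical structure.
RULES = [
    ("permit", "serper_search"),
    ("permit", "brave_search"),
]


def decide(tool: str, rules=RULES, default: str = "deny") -> str:
    """Return the verdict of the first matching rule, else the default."""
    for verdict, pattern in rules:
        if fnmatchcase(tool, pattern):
            return verdict  # first match wins; later rules are ignored
    return default
```

An empty rule list, like the writer's, falls straight through to the default, so every tool comes back deny.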

Validate before you ship anything:

faramesh check
faramesh plan

check parses and type-checks the file. plan prints the exact decision diff, so you can see what changes before it's live.

Apply and run

Turn on enforcement and run the crew normally:

faramesh apply
python my_crew.py

A permit returns the tool result like nothing's there. A defer returns a structured response telling the agent its action is pending approval. The crew doesn't crash; the call just doesn't go through yet. You watch and clear the queue from another terminal:

faramesh approvals list
faramesh approvals approve apr-9001

Once approved, the agent's next attempt goes through. Or, if you've decided it should always be allowed, promote the rule to permit in the file and run faramesh apply again. One thing to know: apply is the only way to change the running policy. There's no quiet hot-reload where editing a file changes what your crew can do mid-run. You edit, you apply. That's deliberate.

Crews delegate, so the policy understands delegation

The thing that makes CrewAI CrewAI is agents handing work to each other. Faramesh models that directly. If your researcher delegates to your writer, you can bound what that delegation is allowed to carry:

agent "research-crew/researcher" {
  delegate {
    target_agent = "research-crew/writer"
    scope        = "read-only"
    ttl          = "5m"
  }
}

The daemon validates delegation against the crew's actual structure at runtime, so one agent can't quietly hand another a capability it wasn't granted. That's a failure mode specific to multi-agent setups, and it's nice to have it covered in the same file as everything else.
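Here's roughly what a bounded delegation check looks like in code. This is a sketch under assumptions, not the daemon's logic: the Delegation class is hypothetical, and I'm assuming "read-only" simply means no write-type actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    target_agent: str
    scope: str          # e.g. "read-only"
    ttl_s: float        # lifetime in seconds ("5m" would be 300.0)
    granted_at: float   # timestamp when the delegation was issued

    def allows(self, agent: str, is_write: bool, now: float) -> bool:
        if agent != self.target_agent:
            return False                      # wrong recipient
        if now - self.granted_at > self.ttl_s:
            return False                      # delegation expired
        if self.scope == "read-only" and is_write:
            return False                      # outside the granted scope
        return True
```

The three checks map onto the three fields in the delegate block: who receives the work, what it may carry, and for how long.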

Why bother

Every decision the daemon makes also lands in a tamper-evident log you can verify offline with faramesh audit verify. That matters the day someone asks what your crew actually did and "I think the prompt told it not to" isn't a good enough answer.
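If "tamper-evident" sounds abstract, the usual building block is a hash chain: each entry's hash covers the previous entry's hash, so editing any record breaks everything after it. The sketch below shows that general technique with the standard library. It is not Faramesh's log format, and append_entry, verify, and the entry fields are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_entry(log: list, decision: dict) -> None:
    """Append a decision whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(decision, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
    log.append({"prev": prev, "decision": decision, "hash": digest})


def verify(log: list) -> bool:
    """Recompute every hash offline; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        payload = json.dumps(entry["decision"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

A bare chain only proves internal consistency. To stop someone rewriting the whole thing, the latest hash also has to be signed or stored somewhere the writer can't reach.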

None of this makes your agents smarter. It means the moments that carry real risk go through a deterministic rule you wrote and can read, instead of through luck. For a single agent that's useful. For a crew, where several agents are acting and delegating at once, it's the difference between a demo and something you'd leave running.

Faramesh is open source. The repo is at github.com/faramesh/faramesh-core if you want to poke around or break it. If you wire it into a crew and something's off, tell me. That's all super helpful feedback at this point.

Don't use an LLM to decide what your AI agent is allowed to do

Brian Hall — Sun, 21 Jun 2026 16:15:31 +0000

I'm in a group called AARM. It's a bunch of people trying to work out how you actually secure what an AI agent can do once it's running, and the basic idea is that the control has to sit right at the action. You check a tool call before it runs, and the agent can't wriggle around the check. So everyone in there already agrees that telling an agent "please don't" isn't a security model.

What gets me is that even in that room, I keep seeing people reach for an LLM to be the thing that makes the call. The agent goes to do something, you take that action and hand it to a second model, ask it whether it's fine, and whatever it answers is what happens. A model watching the model. I don't really get it, and I want to walk through why, because I think people lean on this without sitting with what it actually buys them.

What you're actually defending against

Go back to why you want a guard on the agent in the first place. It's there because the agent can be talked into things. Some prompt injection sitting in a page it reads, a tool result that quietly hands it a new instruction, a user who words a request just so. The agent is a thing you can reason with, and the worry is that the wrong person reasons with it.

Now look at what the LLM-judge setup does about that. It puts a second thing you can reason with in front of the first one. That's the part I get stuck on, because it's the same weakness wearing a different hat. If somebody can craft input that bends the agent, there's a real chance the same sort of input bends the judge too, since under the hood it's the same kind of system responding to the same kind of pressure.

Maybe it holds. Maybe you've prompted the judge more carefully and it's tougher to push around. But "harder to talk into it" is a strange thing to be resting on when not getting talked into it is the entire job you hired this layer for.

Same question, different answer

There's a second problem, and in day-to-day terms it's the one that actually bugs me. You can ask a model the same question twice and get two different answers. That isn't a bug you patch out, it's just what the thing is. It's sampling. It isn't a function that hands back the same output every time you give it the same input.

For most of what we build, that's completely fine, and honestly it's part of why models are useful. But once the question is something like whether the agent gets to drop the production database, that property turns into a real liability. The same action can get waved through on Tuesday and stopped on Wednesday, and there's no reason you can actually point at, because there isn't one. There's just a different roll of the dice. Good luck writing that up for an auditor, or explaining it to yourself at two in the morning when you're trying to figure out how something got through that shouldn't have.

A rule doesn't behave that way. deny delete on production means the production database does not get deleted, every single time, no exceptions. You can read the rule, you can test it, you can pull up the log six months later and see exactly what got asked and what came back. The decision is something you can actually stand behind, which is the whole reason it can be the part you trust.

This isn't an argument against LLMs

I want to be careful here, because it's easy to take this too far, and the version where models have no place anywhere near security is also wrong.

Models are great at a lot of this. Looking at an action and noticing something's off about it. Telling you a piece of text is sensitive. Putting a rough score on how risky something seems. Picking up on a pattern across a string of calls that no fixed rule was ever going to catch. That's all real, and for a lot of it a model is the best tool you've got. The issue was never an LLM being near the security boundary. It's the LLM being the boundary, the thing that says the final yes or no.

So where I land is layered. Let the model do the soft work it's genuinely good at, watching for the weird thing, flagging it, telling you to go take a look. Just don't let it be what opens the gate. The actual call on whether a real action runs has to sit on something that gives the same answer every time and can show its work afterward. The model can feed into that all it wants. It just can't be the thing that decides.
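One way to wire that layering, as a sketch: the model's risk score is an input that can only add caution, and fixed rules make the final call. Everything here is hypothetical (the tool names, the 0.7 threshold, the function name); it just shows the shape.

```python
HARD_DENY = frozenset({"db/drop_production"})
PERMITTED = frozenset({"crm/read", "http/get"})


def final_verdict(tool: str, risk_score: float, review_at: float = 0.7) -> str:
    """Deterministic gate; the model's score can only add caution."""
    if tool in HARD_DENY or tool not in PERMITTED:
        return "deny"      # no score, however low, can reopen this
    if risk_score >= review_at:
        return "defer"     # the model flagged it, so a human looks
    return "permit"
```

The property that matters is the direction of influence: a high score can turn a permit into a defer, but no score turns a deny into anything else.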

Where it actually bites

The closer your agent gets to anything that matters, money, prod, customer data, the less theoretical any of this is. If the worst it can do is write a bad paragraph, then fine, none of this is worth losing sleep over and you should go do something more useful with your afternoon. But the moment it can move money or drop a table, what's allowed to run can't come down to a coin flip, and it really can't live inside the same kind of system you were trying to protect yourself from to begin with.

Put the smart, context-aware stuff where it's strong, which is noticing when something's wrong. Put the hard line somewhere the agent can't talk its way past.

That last part is the thinking behind Faramesh, the open source thing I've been building. The permit/deny/defer decision is deterministic, no model sitting in that path, and every call lands in a signed log. But the tool is kind of beside the point. Even if you go build your own version of this, keep the final decision off the model. That piece should be boring on purpose.

Your coding agent will route around your rules. Here's how to actually stop it.

Brian Hall — Thu, 18 Jun 2026 21:09:16 +0000

Here's a thing that happened to a developer I was talking to recently, and I think anyone who's used a coding agent is going to recognize it.

He set up a rule to block rm in his Claude Code workspace, which is a pretty reasonable thing to do. Then he asked it to clean up some stale files, and it tried rm, hit the block, and then just went "since rm is blocked, I'll use Python instead" and deleted them with python3 -c "import os; os.remove(...)". Task complete. The rule was technically still there, and the files were gone anyway.

The thing is, the agent wasn't being malicious or sneaky. It was being helpful. You told it to delete the files and you didn't actually take away the goal, so it found the next tool in the box and got it done. This is basically the whole problem with trying to keep coding agents in line. A rule that lives inside the agent's context is a suggestion, and the agent can always reason its way around a suggestion.

Why blocking commands doesn't work

The natural instinct is to block the specific scary thing. No rm, no git push --force, no curl to some host you don't recognize. But an agent that can actually reason has more than one way to get anywhere. You block rm, it reaches for Python. You block the obvious shell call, it writes a little script that does the same thing. You end up playing whack-a-mole against something that's much better at finding paths than you are at blocking them, because finding the path is the whole thing it's good at.

The deeper issue is where the rule lives. If it's in the prompt or a config the agent can see, it's part of the agent's reasoning, and anything the agent reasons about, it can reason around. What you actually want is a check that sits outside the agent entirely, somewhere it can't see or skip, that every tool call has to physically pass through before it runs.

How I set this up with Faramesh

Faramesh is the open source thing I've been building for exactly this. The key idea for Claude Code is that you don't modify the agent at all. Claude Code talks to its tools over MCP, so Faramesh runs an MCP proxy: a local port that speaks the same protocol, sits between Claude Code and the real MCP server, and evaluates every tool call against your policy before forwarding it. Permit, deny, or defer to a human, decided by a deterministic engine with no LLM in the path.

The reason this matters: because it's a proxy the agent connects through, not a rule the agent reads, it isn't something Claude Code can route around. The call physically has to go through the daemon to reach the tool. That's the difference between asking the agent not to do something and actually being in the path when it tries.

Here's the whole setup.

Install

curl -fsSL https://install.faramesh.dev/install.sh | bash
faramesh --version

Declare the policy and the proxy port

In your project, your governance.fms looks roughly like this. You import the MCP framework profile, set a proxy port, and write your rules:

import "github.com/faramesh/faramesh-registry/frameworks/mcp@1.0.0"

runtime {
  mode           = "enforce"
  mcp_proxy_port = 8081
}

agent "coding-agent" {
  default deny

  rules {
    permit fs_read          # reading files is fine
    permit search_codebase  # searching the repo is fine
    permit run_tests

    defer  fs_write         # writing/editing files -> ask me first
    deny   shell_exec       # raw shell stays off
  }
}

A couple of things worth knowing. default deny means anything you didn't explicitly allow is blocked, so a tool you forgot about can't quietly slip through. And the tool names (fs_read, fs_write, shell_exec, etc.) are whatever your MCP server actually exposes, you reference them exactly as the server names them. Swap these for the tools your setup actually has.

Start Faramesh

faramesh apply

This compiles your policy and starts the daemon. The proxy binds on http://localhost:8081/mcp.

Point Claude Code at the proxy

In your Claude Code MCP config, route your tool server through Faramesh instead of connecting to it directly:

{
  "mcpServers": {
    "my-tools": {
      "command": "/path/to/real-mcp-server",
      "args": [],
      "proxy": "http://localhost:8081"
    }
  }
}

That's the whole integration. No code changes, no wrapping tools by hand. Every tool Claude Code calls now passes through Faramesh first.

How the workaround dies

Now go back to the rm -> python3 story. With this in place, the agent doesn't get a free pass to the filesystem just because it found a different command. Everything routes through the proxy, and default deny means the only things that run without asking are the ones you explicitly permitted (reads, search, tests). The moment it reaches for a write or a shell call, that lands on a defer or a deny, so it stops and waits for you instead of quietly running. The agent can't reason its way around a network hop it doesn't control.
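Here's the difference in miniature. This is a toy dispatcher, not the Faramesh proxy: POLICY mirrors the example rules above, and gate, dispatch, and the backends dict are hypothetical. The point is that the check keys on which tool is being called, not on what the command string looks like.

```python
POLICY = {"fs_read": "permit", "search_codebase": "permit",
          "run_tests": "permit", "fs_write": "defer", "shell_exec": "deny"}


def gate(tool: str) -> str:
    """Default deny: tools missing from the policy are blocked."""
    return POLICY.get(tool, "deny")


def dispatch(tool: str, args: dict, backends: dict) -> dict:
    """Forward a call to the real tool only when the verdict is permit."""
    verdict = gate(tool)
    if verdict != "permit":
        return {"status": verdict, "tool": tool}   # the tool never executes
    return {"status": "permit", "result": backends[tool](**args)}
```

Swapping rm for a python3 one-liner doesn't help the agent here, because both arrive as shell_exec and both get the same deny before anything runs.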

When something defers, you'll see it in the approvals queue:

faramesh approvals list
faramesh approvals approve <id>   # or: faramesh approvals deny <id>

Approve and the call goes through. Deny and it never happens. Either way the call, the decision, and the reason all land in an audit log you can read back later with faramesh explain <action-id>.

Start in shadow mode if you want to ease in

Flipping straight to enforce on your daily driver can feel aggressive, so you don't have to. Set the runtime mode to shadow and Faramesh logs what it would have blocked or deferred without actually stopping anything. Run Claude Code normally for a few days, look at what it flagged with faramesh approvals list, tune the rules against how you actually work, then switch to enforce. Way less guessing.
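The shadow/enforce split is simple enough to show in a few lines. This is a sketch of the behavior described above, not Faramesh internals; the enforce function and the audit list are hypothetical.

```python
def enforce(verdict: str, mode: str, audit: list) -> bool:
    """Return True when the call may run under the given runtime mode."""
    audit.append({"verdict": verdict, "mode": mode})  # always recorded
    if mode == "shadow":
        return True              # log only, never block
    return verdict == "permit"   # enforce: defer and deny both stop the call
```

Same policy, same verdicts, same log. The only thing the mode changes is whether a non-permit verdict actually stops the call.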

The one thing worth taking from this even if you never touch Faramesh

Forget the tool for a sec. The thing I actually want to get across is that a prompt instruction, or a single blocked command, just isn't a real control for a coding agent. The agent isn't bound by it, it's nudged by it, and nudged stops being enough the moment it can touch your filesystem, your shell, or your credentials.

If you want real control it has to live outside the agent, somewhere it can't see or skip, and every action has to pass through it. Build that yourself or grab something off the shelf, doesn't matter, but that's the bar. The agent doesn't get to be the thing that decides what the agent is allowed to do.

Repo's here if you want to mess with it: github.com/faramesh/faramesh-core. It works with a bunch of other agents and frameworks too (LangGraph, LangChain, CrewAI, Cursor, others), Claude Code's just the one most people have actually felt this with. If you try it and something's rough or confusing, please yell at me. I would love to hear about it!

How to add policy enforcement to a LangGraph agent (before it does something dumb)

Brian Hall — Wed, 17 Jun 2026 15:20:14 +0000

If you've built anything with LangGraph past the demo stage, you've probably had the same uneasy moment I did. The agent works, it's calling tools, it's doing real things, and then you realize the only thing stopping it from doing the wrong real thing is a line in the prompt that says "please don't."

A prompt isn't a control. The agent can be talked into ignoring it, some upstream input can steer it somewhere you didn't expect, and either way the tool call just runs. Once that tool call can move money, hit prod, or touch customer data, "the model seemed confident" isn't where you want your safety to live.

So here's how to put a real check in front of the tool call instead. I'll use Faramesh, the open source thing I've been building for exactly this. It's a local daemon that sits in front of your agent's tool calls and returns permit / deny / defer based on a policy you write. No LLM in the decision path, so the same call always gets the same answer.

The whole thing takes about 10 minutes. Every command below is copy-pasteable. I'll be clear about the one or two spots where you swap in your own stuff.

How it works in one picture

Your agent tries to run a tool. Before it actually runs, the call hits Faramesh, which checks it against your policy:

permit -> runs normally
deny -> blocked, the agent never gets to run it
defer -> paused and sent to a human to approve or reject

You write that policy in a single file called governance.fms. That file is the heart of Faramesh. It's the one place that defines what your agents are allowed to do, you commit it to your repo like any other code, and the daemon enforces whatever's in it.

Step 1: install

curl -fsSL https://install.faramesh.dev/install.sh | bash
faramesh --version

If faramesh --version prints a version number, you're good.

Step 2: let it generate your governance.fms

From the root of your agent project, run:

faramesh init

This detects your framework and the tools your agent uses, and writes a starter governance.fms for you. You don't have to write it from scratch. Open it up and it'll look something like this:

runtime {
  mode    = "enforce"
  wal_dir = "./wal"
}

agent "langgraph-agent" {
  default deny

  rules {
    permit http/get
    permit crm/read
    defer  payment/refund   reason: "refund needs a human"
    deny   billing/delete_account
  }
}

Here's how to read that, because this is the part that actually matters:

mode = "enforce" means decisions are live. (There's also a shadow mode if you just want to watch what would happen first, more on that at the end.)
default deny means anything you don't explicitly allow is blocked. So a tool you forgot about can't quietly slip through. This is the safe default and I'd leave it.
Each line under rules is a decision. permit lets it run, deny blocks it outright, defer pauses it for a human.

This is the file you edit. The tool names (http/get, payment/refund, etc.) match the names of your actual tools, so swap these for whatever your agent actually does. The rule of thumb: permit the safe reads, defer anything risky or irreversible (payments, deletes, external emails), deny the stuff that should never happen automatically.

Step 3: name your tools to match

Faramesh checks tool calls by name, so your LangGraph tools just need names that line up with your policy scopes. In LangChain/LangGraph that's the first argument to @tool:

from langchain_core.tools import tool

@tool("http/get")
def http_get(url: str) -> str:
    return fetch(url)

@tool("payment/refund")
def payment_refund(amount: int) -> str:
    return issue_refund(amount)

So @tool("payment/refund") is the thing the defer payment/refund line in your policy is talking about. Keep the names consistent and you're done here.
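Since everything hangs on the names matching, a quick drift check can save you some head-scratching: under default deny, a typo'd tool name just gets blocked. This helper is my own illustration, not part of Faramesh, and it assumes you've already collected both sets of names by hand or with your own script.

```python
def unmatched_names(tool_names: set, policy_names: set) -> dict:
    """Report names present on one side only, so typos surface early."""
    return {
        "tools_without_rule": sorted(tool_names - policy_names),
        "rules_without_tool": sorted(policy_names - tool_names),
    }
```

Anything under tools_without_rule will fall through to default deny, and anything under rules_without_tool is a rule that never fires.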

Step 4: turn on interception

Add these two lines near the top of your agent script, before you build your graph:

from faramesh.adapters.langchain import install_langchain_interceptor

install_langchain_interceptor(include_langgraph=True, fail_open=False)

That's the whole integration. You don't rewrite your ToolNode and you don't wrap every tool by hand, it patches LangGraph's execution path so every tool call gets checked.

One flag to understand: fail_open=False. It means if the daemon ever errors or can't reach a decision, the call is denied, not waved through. You want enforcement to fail closed, if something breaks, the safe move is to not run the action.
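In code, fail-closed comes down to what you return when the check itself breaks. Here's a sketch of the pattern. It is not the interceptor's implementation: checked_verdict and ask_daemon are hypothetical stand-ins for whatever actually talks to the daemon.

```python
def checked_verdict(ask_daemon, tool: str, fail_open: bool = False) -> str:
    """Ask the policy daemon; on any failure fall back to a fixed verdict."""
    fallback = "permit" if fail_open else "deny"
    try:
        verdict = ask_daemon(tool)
    except Exception:
        return fallback          # daemon unreachable or crashed
    if verdict not in {"permit", "deny", "defer"}:
        return fallback          # an unrecognized answer counts as a failure
    return verdict
```

With fail_open=False, an outage makes the agent less capable for a while. With fail_open=True, the same outage silently turns enforcement off, which is the exact moment you'd least want that.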

Step 5: run it

Run your agent under governance with dev, which enforces your policy locally while you're still testing:

faramesh dev

Then run your agent as you normally would (in another terminal or however you launch it). Every tool call now routes through Faramesh.

When you're happy with how it behaves and want full enforcement, switch on:

faramesh apply

faramesh apply compiles your governance.fms and starts the daemon in full enforce mode.

Now watch what happens. When your agent calls http/get, it just runs. When it calls payment/refund, it doesn't, it pauses and waits, because you set that to defer. You'll get a pending approval. List and resolve it like this:

faramesh approvals list
faramesh approvals approve <id>   # or: faramesh approvals deny <id>

Approve, and the original call resolves and runs. Deny, and it never happens. Either way the call, the decision, and the reason all land in an audit log you can read back later with faramesh explain <action-id>.

If you want to test before you enforce

Flipping straight to enforce on a live agent is nerve-wracking, so you don't have to. Set the runtime to shadow mode (or run faramesh dev) and Faramesh will log what it would have blocked or deferred without actually stopping anything. You watch the decisions against real traffic, tune your rules until they're right, then switch to enforce. Way less scary than guessing.

Why deterministic, instead of "ask another LLM"

There's a popular pattern where you put a second LLM in front of the first one to judge whether an action is safe. I think that's the wrong bet for enforcement. The thing you're worried about is your agent getting manipulated into a bad action. If your guard is also an LLM, it can be manipulated too. You're using a promptable thing to protect a promptable thing.

A rule engine doesn't have that problem. deny billing/delete_account means the account does not get deleted. Same input, same answer, every time, and you can hand the log to an auditor without shrugging. The agent doesn't get the final say on what it's allowed to do, which, once it's touching real systems, is sort of the entire point.

Repo's here if you want to try it or dig into how it works: github.com/faramesh/faramesh-core. It works with a bunch of other frameworks too (LangChain, CrewAI, AutoGen, MCP, others), LangGraph's just what I used here. If you try it and something's confusing or broken, I'd love to hear it, that feedback is what's making it better right now :)

Your agent's guardrails are suggestions, not enforcement

Brian Hall — Wed, 01 Apr 2026 21:16:22 +0000

Yesterday, Anthropic's Claude Code source code leaked. The entire safety system for dangerous cybersecurity work turned out to be a single text file with one instruction: "Be careful not to introduce security vulnerabilities."

That is the safety layer at one of the most powerful AI companies in the world. Just a prompt asking the model nicely to behave.

This is not a shot at Anthropic. It is a symptom of something the whole industry is dealing with right now. We have confused guidance with enforcement, and as agents move into production, that distinction is starting to matter a lot.

Why prompt guardrails feel like they work

When you are building an agent in development, prompt-based guardrails seem totally reasonable. You write something like "never delete production data," the model follows it, and you ship it. It works.

The problem is that prompts are probabilistic. The model does not follow your instructions because it is forced to. It follows them because that response is statistically likely given your system prompt, and that is a fundamentally different thing.

That gap is small in a controlled demo, but it widens under a few conditions that come up all the time in production.

Prompt injection happens when an attacker embeds instructions inside content your agent reads, whether that is a document, an email, or a database record. The injected instruction competes with your system prompt, and researchers have shown attack success rates exceeding 90% against production guardrail systems.

Multi-step reasoning is another problem. A prompt check happens at the input boundary, but agents do not operate at the input boundary. They reason across multiple steps, call tools, read results, and reason again. A message that looks completely clean at the first step can trigger a dangerous tool call three steps later that no classifier ever saw.

Model updates create a third issue. Your guardrail was tuned against one version of the model, and when the model updates, the probability distribution shifts. The guardrail that worked last month might not work the same way next month.

None of this is theoretical. The OWASP Agentic Top 10, published in late 2025, documents ten agent-specific attack categories that did not exist in the original LLM threat model, and most of them happen entirely outside the layer that prompt guardrails watch.

Where the gap actually lives

Here is what happens when a LangGraph agent calls a tool:

# The agent decides to call a tool
tool_call = {
    "name": "stripe/refund",
    "arguments": {"amount": 800, "customer_id": "cust_123"}
}

# The tool executes
result = stripe_refund(amount=800, customer_id="cust_123")

There is a moment between the agent deciding to call that tool and the tool actually running, and that moment is where enforcement has to happen. Not before the prompt, not after the response, but right there between intent and action.

Prompt guardrails do not live in that moment. They live before it, in the system prompt, where the model reads them and decides whether to follow them. If the model has been manipulated, or is just statistically unlikely to comply, nothing fires.

A runtime enforcement layer lives in that moment. It intercepts the tool call before it executes, checks it against policies defined in code, and makes a deterministic decision: permit, deny, or defer for human approval. The model does not get a vote.
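As a minimal sketch of what sitting between intent and action can look like, the decorator below runs a policy function before the wrapped tool is allowed to execute. This is an illustration of the pattern, not how Faramesh is implemented; enforce, Denied, Deferred, and refund_policy are all hypothetical names, and the threshold is borrowed from the refund example.

```python
from functools import wraps
from typing import Any, Callable


class Denied(Exception):
    """Raised when policy blocks a tool call outright."""


class Deferred(Exception):
    """Raised when a tool call must wait for human approval."""


def enforce(tool_name: str, policy: Callable[[str, dict], str]):
    """Wrap a tool so the policy runs between intent and action."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)
        def wrapper(**arguments: Any) -> Any:
            decision = policy(tool_name, arguments)
            if decision == "permit":
                return fn(**arguments)
            if decision == "defer":
                raise Deferred(f"{tool_name} awaits human approval")
            # "deny" and any unrecognized decision both fail closed.
            raise Denied(f"{tool_name} blocked by policy")
        return wrapper
    return decorator


# Hypothetical policy: plain code, no model involved.
def refund_policy(tool: str, args: dict) -> str:
    if tool == "stripe/refund":
        return "defer" if args["amount"] > 500 else "permit"
    return "deny"


@enforce("stripe/refund", refund_policy)
def stripe_refund(amount: int, customer_id: str) -> dict:
    # Stand-in for the real Stripe call.
    return {"refunded": amount, "customer_id": customer_id}
```

Calling stripe_refund(amount=800, customer_id="cust_123") raises Deferred before the function body runs, regardless of what the model was prompted with.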

What this looks like in practice

With Faramesh, you add one command to run your agent:

faramesh run agent.py

No SDK changes, no changes to your agent code. Faramesh wraps the execution layer and checks every tool call against your policies before anything runs.

Those policies are written in FPL, the Faramesh Policy Language. It is a domain-specific language built specifically for agent governance, with agent-native concepts as first-class primitives: sessions, delegation chains, budget limits, and human approval flows. Unlike YAML or OPA Rego, FPL is readable by engineers and non-engineers alike, and the same policy that takes 60+ lines of YAML takes around 50 lines of FPL with stronger guarantees.

A policy for a payment agent looks like this:

agent payment-bot {
  default deny
  model "gpt-4o"
  framework "langgraph"

  rules {
    deny! shell/* reason: "never shell"

    defer stripe/refund
      when amount > 500
      notify: "finance"
      reason: "high value refund"

    permit stripe/*
      when amount <= 500
  }
}

This is deterministic. It does not matter what the model was told or what was injected into the context. A refund over $500 gets deferred to a human every time, and a shell command gets blocked every time. That is what enforcement actually means.
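To show why the outcome cannot vary with the prompt, here is a rough Python model of the policy above. It is a sketch under assumptions: it assumes rules are tried top to bottom with the first match winning, and that a missing argument means the condition is not met. The real FPL evaluation order may differ, and Rule, RULES, and decide are hypothetical names.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    effect: str                                   # "deny!", "defer", or "permit"
    pattern: str                                  # glob over tool names
    when: Optional[Callable[[dict], bool]] = None


# Mirrors the payment-bot policy; the data layout is an assumption.
RULES = [
    Rule("deny!", "shell/*"),
    Rule("defer", "stripe/refund", lambda a: a["amount"] > 500),
    Rule("permit", "stripe/*", lambda a: a["amount"] <= 500),
]


def _matches(rule: Rule, tool: str, args: dict) -> bool:
    if not fnmatchcase(tool, rule.pattern):
        return False
    if rule.when is None:
        return True
    try:
        return bool(rule.when(args))
    except (KeyError, TypeError):
        return False  # missing or malformed argument: condition not met


def decide(tool: str, args: dict, rules=RULES, default: str = "deny") -> str:
    # Assumption: first matching rule wins; nothing matching means default deny.
    for rule in rules:
        if _matches(rule, tool, args):
            return rule.effect
    return default
```

Nothing in decide reads the prompt, the conversation, or the model's output. Its only inputs are the tool name and the arguments, which is why the same call always gets the same answer.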

The deny! effect is worth pointing out specifically. It is a compile-time guarantee, meaning no subsequent permit rule can override it, no child policy can override it, and the compiler verifies this structurally. It is not a runtime convention. It is a guarantee baked into the language itself.
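One way a compiler could make that kind of guarantee structural is to hoist every deny! rule ahead of all other rules when parent and child policies are merged. The sketch below illustrates that idea only; it is not a description of the FPL compiler, and compile_policy and decide are hypothetical names. Rules here are (effect, pattern) pairs with conditions left out for brevity.

```python
from fnmatch import fnmatchcase


def compile_policy(parent: list, child: list) -> list:
    """Merge parent and child rules so every deny! is evaluated first."""
    merged = child + parent                        # child rules normally take priority
    hard = [r for r in merged if r[0] == "deny!"]
    soft = [r for r in merged if r[0] != "deny!"]
    return hard + soft                             # but never over a deny!


def decide(tool: str, compiled: list, default: str = "deny") -> str:
    # First match wins; because deny! rules were hoisted, they always match first.
    for effect, pattern in compiled:
        if fnmatchcase(tool, pattern):
            return "deny" if effect == "deny!" else effect
    return default
```

With a parent rule ("deny!", "shell/*") and a child rule ("permit", "shell/ls"), the compiled list still puts the deny first, so the child's permit is unreachable for any shell tool.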

Why this matters beyond security

This is not only a security problem. It is a deployment problem.

Right now, agents can assist with payments, infrastructure changes, customer operations, and credential management, but most teams will not deploy them autonomously into those workflows because there is no layer they can trust to keep things in bounds. Prompt guardrails create the appearance of control; a runtime enforcement layer creates actual control. That distinction is what unlocks the use cases that make agents genuinely valuable, not just as assistants but as workers that can operate the systems that matter.

The Claude Code leak is a good reminder that even the companies building the most sophisticated AI in the world are still relying on text files to enforce safety boundaries. That is just where the industry is right now, and the enforcement layer that should exist is what we are building at Faramesh.

The core repo is open source at github.com/faramesh/faramesh-core. Would love to hear from anyone building agents in production.