<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Cauã Ferraz</title>
    <description>The latest articles on DEV Community by Cauã Ferraz (@cauaferraz).</description>
    <link>https://dev.to/cauaferraz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3853793%2F1bf108fe-cf1b-4cd4-a9ee-91879d87fd17.jpeg</url>
      <title>DEV Community: Cauã Ferraz</title>
      <link>https://dev.to/cauaferraz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cauaferraz"/>
    <language>en</language>
    <item>
      <title>Why AI agent teams are just hoping their agents behave</title>
      <dc:creator>Cauã Ferraz</dc:creator>
      <pubDate>Tue, 31 Mar 2026 15:45:43 +0000</pubDate>
      <link>https://dev.to/cauaferraz/why-ai-agent-teams-are-just-hoping-their-agents-behave-2loa</link>
      <guid>https://dev.to/cauaferraz/why-ai-agent-teams-are-just-hoping-their-agents-behave-2loa</guid>
      <description>&lt;p&gt;I'm 19, studying computer engineering in Brazil. A few weeks ago I was testing an AI agent with no restrictions. Just to see what it would do.&lt;/p&gt;

&lt;p&gt;It was destructive.&lt;/p&gt;

&lt;p&gt;Nothing permanent, I caught it. But it was the kind of moment where you sit back and think: what if I hadn't been watching? What if this was running in production? What if someone else's agent is doing this right now and nobody is watching?&lt;/p&gt;

&lt;p&gt;That's when I realized the problem. Everyone is racing to give agents more tools, more autonomy, more access. But nobody is building the layer that controls what they can actually do with it. The assumption is that a good prompt is enough. It isn't.&lt;/p&gt;

&lt;h2&gt;The gap nobody is talking about&lt;/h2&gt;

&lt;p&gt;The AI agent space has exploded. LangChain, CrewAI, browser-use, the OpenAI Agents SDK: the tooling for building agents has never been better. You can have an agent browsing the web, writing code, calling APIs, and moving files in an afternoon.&lt;/p&gt;

&lt;p&gt;But here's what I couldn't find: a serious answer to "how do I control what my agent can actually do at runtime?"&lt;/p&gt;

&lt;p&gt;The common answers I got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Write a good system prompt"&lt;/li&gt;
&lt;li&gt;"Add some input validation"&lt;/li&gt;
&lt;li&gt;"Just don't give it dangerous tools"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not answers. These are hopes dressed up as engineering.&lt;/p&gt;

&lt;p&gt;A good system prompt doesn't stop an agent from being manipulated through prompt injection. Input validation doesn't catch an agent that decides &lt;code&gt;rm -rf ./old_stuff&lt;/code&gt; is a reasonable interpretation of "clean up." And "don't give it dangerous tools" directly contradicts the reason you're using agents in the first place.&lt;/p&gt;

&lt;h2&gt;What actually needs to exist&lt;/h2&gt;

&lt;p&gt;The thing missing is embarrassingly simple: a policy layer that sits between your agent and the world.&lt;/p&gt;

&lt;p&gt;Not prompt engineering. Not vibes. An actual enforcement layer that says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; This agent can read from &lt;code&gt;./workspace&lt;/code&gt; but cannot delete anything&lt;/li&gt;
&lt;li&gt; This agent can call the OpenAI API but not your production database&lt;/li&gt;
&lt;li&gt; This command requires a human to approve it before it executes&lt;/li&gt;
&lt;li&gt; Everything gets logged, always&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal isn't to babysit every action manually; that defeats the purpose of automation. The goal is to define the boundaries once, enforce them automatically, and only surface the genuinely ambiguous decisions to a human.&lt;/p&gt;
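&lt;p&gt;To make that concrete, here is what such a rule set might look like as a declarative policy file. This is only a sketch of the idea; the schema and every field name below are hypothetical, not AgentGuard's actual format.&lt;/p&gt;

```yaml
# Hypothetical policy sketch -- schema and field names are illustrative,
# not AgentGuard's real format.
policies:
  - resource: "fs:./workspace/**"
    actions: [read]
    effect: allow
  - resource: "fs:**"
    actions: [delete]
    effect: deny
  - resource: "https://api.openai.com/**"
    actions: [call]
    effect: allow
  - resource: "shell:*"
    actions: [exec]
    effect: require_approval   # a human signs off before execution
audit:
  log_all_actions: true        # everything gets logged, always
```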

&lt;p&gt;This is what firewalls do for networks. This is what WAFs do for web apps. Agents need the same thing, and almost nobody is building it.&lt;/p&gt;

&lt;h2&gt;So I built it&lt;/h2&gt;

&lt;p&gt;I built AgentGuard, an open source runtime firewall for AI agents.&lt;/p&gt;

&lt;p&gt;It's a Go proxy that sits between your agent and its tools. You define policies in YAML, and the proxy enforces them in real time: blocking actions, holding them for human approval, and logging everything. It has adapters for LangChain, CrewAI, browser-use, and MCP. There's a dashboard that shows you live what your agents are doing and lets you approve or deny actions with one click.&lt;/p&gt;
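&lt;p&gt;The enforcement core of such a proxy can be sketched in a few lines of Go: an ordered rule list evaluated per action, failing closed when nothing matches. This is my own minimal illustration of the idea, not AgentGuard's internals; the rule shape and action strings are invented for the example.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// Decision is what the policy engine returns for a proposed action.
type Decision string

const (
	Allow   Decision = "allow"
	Deny    Decision = "deny"
	Approve Decision = "hold_for_approval"
)

// Rule matches an action string by prefix and assigns a decision.
// The shape here is illustrative, not AgentGuard's real schema.
type Rule struct {
	Prefix   string
	Decision Decision
}

// Evaluate walks the rules in order and returns the first match.
// Unmatched actions are denied by default (fail closed).
func Evaluate(rules []Rule, action string) Decision {
	for _, r := range rules {
		if strings.HasPrefix(action, r.Prefix) {
			return r.Decision
		}
	}
	return Deny
}

func main() {
	rules := []Rule{
		{Prefix: "fs.read:./workspace", Decision: Allow},
		{Prefix: "fs.delete:", Decision: Deny},
		{Prefix: "shell.exec:", Decision: Approve},
	}
	fmt.Println(Evaluate(rules, "fs.read:./workspace/notes.txt")) // allow
	fmt.Println(Evaluate(rules, "fs.delete:./workspace/old"))     // deny
	fmt.Println(Evaluate(rules, "shell.exec:rm -rf ./old_stuff")) // hold_for_approval
	fmt.Println(Evaluate(rules, "http.get:https://prod-db"))      // deny (no rule matched)
}
```

&lt;p&gt;The important property is the default: anything the policy doesn't explicitly allow is blocked or escalated, which is the opposite of what a system prompt gives you.&lt;/p&gt;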

&lt;p&gt;It's not finished. The SQLite audit backend isn't done. Some adapters are still rough. But the core works, and I think the core is the right idea.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Caua-ferraz" rel="noopener noreferrer"&gt;
        Caua-ferraz
      &lt;/a&gt; / &lt;a href="https://github.com/Caua-ferraz/AgentGuard" rel="noopener noreferrer"&gt;
        AgentGuard
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
AgentGuard is a firewall for AI agents that keeps unwanted surprises from slipping past supervision
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;
  &lt;a rel="noopener noreferrer" href="https://github.com/Caua-ferraz/AgentGuard/docs/assets/banner.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FCaua-ferraz%2FAgentGuard%2FHEAD%2Fdocs%2Fassets%2Fbanner.svg" alt="AgentGuard" width="720"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
  &lt;strong&gt;The firewall for AI agents.&lt;/strong&gt;&lt;br&gt;
  Policy enforcement, real-time oversight, and full audit logging for autonomous AI systems
&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://github.com/Caua-ferraz/AgentGuard#quickstart" rel="noopener noreferrer"&gt;Quickstart&lt;/a&gt; •
  &lt;a href="https://github.com/Caua-ferraz/AgentGuard#why-agentguard" rel="noopener noreferrer"&gt;Why AgentGuard&lt;/a&gt; •
  &lt;a href="https://github.com/Caua-ferraz/AgentGuard#architecture" rel="noopener noreferrer"&gt;Architecture&lt;/a&gt; •
  &lt;a href="https://github.com/Caua-ferraz/AgentGuard#policy-engine" rel="noopener noreferrer"&gt;Policy Engine&lt;/a&gt; •
  &lt;a href="https://github.com/Caua-ferraz/AgentGuard#dashboard" rel="noopener noreferrer"&gt;Dashboard&lt;/a&gt; •
  &lt;a href="https://github.com/Caua-ferraz/AgentGuard#adapters" rel="noopener noreferrer"&gt;Adapters&lt;/a&gt; •
  &lt;a href="https://github.com/Caua-ferraz/AgentGuard/docs/SETUP.md" rel="noopener noreferrer"&gt;Setup Guide&lt;/a&gt; •
  &lt;a href="https://github.com/Caua-ferraz/AgentGuard/docs/CONTRIBUTING.md" rel="noopener noreferrer"&gt;Contributing&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The Problem&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;Every trending AI project is giving agents more autonomy — running shell commands, browsing the web, calling APIs, moving money, even performing penetration tests. But &lt;strong&gt;nobody is building the guardrails.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Right now, most teams deploying AI agents are just... hoping they behave.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AgentGuard&lt;/strong&gt; fixes that.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why AgentGuard&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;br&gt;
&lt;thead&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;th&gt;Without AgentGuard&lt;/th&gt;
&lt;br&gt;
&lt;th&gt;With AgentGuard&lt;/th&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/thead&gt;
&lt;br&gt;
&lt;tbody&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;Agent runs &lt;code&gt;rm -rf /&lt;/code&gt; — you find out later&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Policy blocks destructive commands before execution&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;Agent calls production API with no oversight&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Action paused, you get a Slack/webhook notification to approve&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;No record of what the agent did or why&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Full audit trail with timestamps, reasoning, and decisions&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;"It worked on my machine" debugging&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Query any agent session from the audit&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/tbody&gt;
&lt;br&gt;
&lt;/table&gt;&lt;/div&gt;…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Caua-ferraz/AgentGuard" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;In five days it has been cloned by 165 unique developers, with almost no active promotion. I think that says something about how real this problem is.&lt;/p&gt;

&lt;h2&gt;The thing I keep thinking about&lt;/h2&gt;

&lt;p&gt;Only 14.4% of organizations send AI agents to production with full security approval. 88% reported confirmed or suspected AI agent security incidents last year.&lt;/p&gt;

&lt;p&gt;Everyone is moving fast. Nobody is building the guardrails.&lt;/p&gt;

&lt;p&gt;I don't know if AgentGuard is the right answer. But I'm pretty confident "hope" isn't.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
