Every week there is another story about an AI agent doing something it should not have done. A customer service bot that agreed to a refund policy that did not exist. An internal tool that deleted the wrong records. A coding assistant that pushed breaking changes to production.
The post-mortems almost always blame the model. The model hallucinated. The model misunderstood. The model got confused.
That is the wrong diagnosis.
The Real Failure Is Scope, Not Intelligence
When an AI agent takes a harmful action, the question is not why the model got it wrong. Models get things wrong. That is a known, expected property of the technology.
The question is: why did the agent have permission to execute that action without a review step?
Think about how you would onboard a new human employee. You would not give them root access to production on day one regardless of how smart they are. You would not let them send customer-facing communications without review for the first month. You would not authorize them to change pricing or delete records without a second sign-off.
That constraint is not about doubting their intelligence. It is about matching their permission scope to the maturity of your trust in their judgment.
AI agents need the same framework.
Two Categories, Not One
There is a fundamental distinction that most teams skip:
AI-Assisted: The model recommends. A human reviews. A human acts.
AI-Executed: The model acts. A human audits afterward.
Both are valid. Both have use cases. But they carry very different risk profiles, and the mistake most teams make is letting an assisted workflow drift into an executed one without ever explicitly deciding to do so.
For AI-Assisted, a mistake surfaces before it causes damage. A human reviews the recommendation and catches the hallucination, the misunderstood context, the edge case the model did not handle.
For AI-Executed, a mistake propagates in real time. By the time the audit happens, the action is done.
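The distinction can be made concrete in code. Here is a minimal Python sketch of the two modes; the `Agent`, `ActionRequest`, and `propose` names are illustrative, not a real framework:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable

class Mode(Enum):
    ASSISTED = "assisted"   # model recommends, human reviews, human acts
    EXECUTED = "executed"   # model acts, human audits afterward

@dataclass
class ActionRequest:
    name: str
    payload: dict

@dataclass
class Agent:
    mode: Mode
    pending_review: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def propose(self, action: ActionRequest,
                execute: Callable[[ActionRequest], None]) -> str:
        if self.mode is Mode.ASSISTED:
            # A mistake surfaces here, before any side effect occurs.
            self.pending_review.append(action)
            return "queued for human review"
        # EXECUTED: the side effect happens now; the audit is after the fact.
        execute(action)
        self.audit_log.append(action)
        return "executed"
```

The point of the sketch: in assisted mode the `execute` callback is never invoked, so the blast radius of a bad recommendation is zero until a human approves it.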
Matching Scope to Maturity
The question is not whether to use AI-Executed workflows. It is when you have earned the right to deploy them.
A useful framework: before any agent gets execution rights over a system, it should have demonstrated consistent, correct behavior in assisted mode for a meaningful sample size.
For a workflow that sends a customer email, maybe 200 assisted reviews with zero critical errors before you remove the human in the loop.
For a workflow that modifies database records or makes financial transactions, the bar should be higher, and you should probably never fully remove the human review layer.
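As a sketch, that promotion rule might look like the function below. The threshold of 200 comes from the email example above; the function name and the `allow_full_autonomy` flag are illustrative:

```python
def eligible_for_execution(
    assisted_reviews: int,
    critical_errors: int,
    required_reviews: int = 200,
    allow_full_autonomy: bool = True,
) -> bool:
    """Has this workflow earned execution rights?

    A workflow qualifies only after a meaningful sample of assisted
    reviews with zero critical errors. For workflows that modify
    database records or move money, pass allow_full_autonomy=False
    to keep the human review layer permanently.
    """
    if not allow_full_autonomy:
        return False
    return assisted_reviews >= required_reviews and critical_errors == 0
```

A single critical error resets eligibility to zero in this sketch; whether you want a reset or a rolling window is a policy choice, not a technical one.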
What This Looks Like in Practice
When we build AI systems for clients, this is the deployment pattern we use:
Phase 1 (Weeks 1-4): Full assisted mode. Every recommendation is logged, reviewed by a human, and acted on manually if approved. The agent has zero execution rights.
Phase 2 (Weeks 5-8): Selective execution on low-stakes, reversible actions only. High-stakes actions stay in assisted mode. Audit everything.
Phase 3 (Ongoing): Expand execution scope incrementally based on error rate data from earlier phases.
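One way to encode those phases is a policy table that the execution layer consults before acting. Everything here, phase keys, action-class names, the `may_execute` helper, is an illustrative sketch, not a prescribed schema:

```python
# Phase policy table mirroring the three-phase rollout described above.
PHASES = {
    1: {"weeks": (1, 4),
        "execute": set(),  # zero execution rights; everything reviewed
        "note": "full assisted mode"},
    2: {"weeks": (5, 8),
        "execute": {"low_stakes_reversible"},
        "note": "selective execution; high-stakes stays assisted"},
    3: {"weeks": (9, None),  # ongoing
        "execute": {"low_stakes_reversible", "earned_scope"},
        "note": "expand incrementally based on error-rate data"},
}

def may_execute(phase: int, action_class: str) -> bool:
    """Return True only if this action class has execution rights
    in the given phase; anything else falls back to assisted mode."""
    return action_class in PHASES[phase]["execute"]
```

The useful property of a table like this is that widening scope becomes a deliberate, reviewable diff rather than an implicit prompt change.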
The Asymmetry
The cost of being too conservative is measured in inconvenience and slower deployment. The cost of being too permissive is measured in incidents, eroded trust, and compliance failures.
The businesses using AI agents successfully in production almost universally went through a conservative early phase. They started narrow, built evidence, and expanded scope deliberately.
The incidents you hear about almost always trace back to someone skipping that phase.
AI agents need the same thing every powerful system needs: earned permission scope, matched to demonstrated reliability.
Othex Corp builds AI agents and automation for mid-market businesses. othexcorp.com
Top comments (1)
The new-employee onboarding analogy is the best framing I've seen for this. You wouldn't give a new hire root access on day one regardless of their credentials — so why do we give AI agents unbounded action spaces from the start?
This is especially timely. Just this week Meta had an AI agent post on an internal forum without the engineer's permission, exposing sensitive data (Sev 1 incident). And Summer Yue — a safety director at Meta — had her own agent delete her entire inbox despite explicitly telling it to confirm before acting. The instruction-following gap between "understood the constraint" and "reliably enforced the constraint across all execution paths" is exactly the permission scope problem you're describing.
Your AI-Assisted vs AI-Executed distinction is critical. I'd add a third category: AI-Contained — where the agent operates freely but within a structurally limited action space. Think capability-based security: the agent can do anything within its sandbox, but the sandbox itself is enforced at the infrastructure level, not the prompt level.
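A minimal sketch of what I mean by AI-Contained, assuming a tool-calling agent (all names here are illustrative): the allowlist lives in code the agent cannot edit, so a missed instruction can never widen its action space.

```python
class CapabilityError(PermissionError):
    """Raised when the agent requests a capability it was never granted."""

class Sandbox:
    """The agent may call anything *inside* this object. The allowlist
    is enforced at the infrastructure level, not the prompt level."""

    def __init__(self, granted: set, tools: dict):
        self._granted = granted  # capabilities this agent actually holds
        self._tools = tools      # capability name -> callable

    def call(self, capability: str, *args, **kwargs):
        if capability not in self._granted:
            raise CapabilityError(f"capability not granted: {capability}")
        return self._tools[capability](*args, **kwargs)
```

An inbox-deleting tool can be wired into `tools` but simply never granted, and then no amount of model confusion can reach it.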