Nearly 40% of enterprises have run an AI agent pilot in the last year. A small fraction of those pilots made it to production. The gap isn't a model problem — GPT-class and Claude-class models are more than capable of the reasoning involved. The gap is architectural.
The Failure Pattern
Teams keep making the same mistake. They take an existing, fragmented process, the one that three ticketing systems and two manual handoffs have held together with duct tape for years, and bolt an agent on top of it.
The agent doesn't fix the fragmentation. It automates the fragmentation, faster and with less oversight than the humans who used to catch the edge cases.
| Legacy Process (Broken) | Legacy Process + Agent (Still Broken) |
|---|---|
| Manual triage → 3 systems | Agent triage → 3 systems |
| Human catches edge cases | Nobody catches edge cases |
| Slow, but self-correcting | Fast, and silently wrong |
If the underlying workflow doesn't have clear inputs, clear ownership, and clear success criteria, wrapping it in an LLM doesn't add intelligence — it adds a probabilistic actor to an already unstable system.
The Fix: Redesign the Domain Before You Automate It
The teams that get agents into production aren't the ones with the best prompts. They're the ones who picked a tight, governed domain — IT Ops, Sales Ops, tier-1 support triage — and rebuilt the process boundaries before writing a single agent node.
That means:
- Explicit state, not implicit tribal knowledge. Every input and output of the workflow is defined as a schema, not a Slack thread someone remembers.
- Bounded authority. The agent gets a scoped set of tools and a scoped set of systems it's allowed to touch — nothing more.
- A checkpoint before anything irreversible. Refunds, deployments, and credential changes get a human gate, full stop.
This is exactly the role frameworks like LangGraph and CrewAI play. They don't make an agent smarter — they enforce the process boundary as code.
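Independent of any framework, the three constraints above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: `TicketState`, `ScopedTools`, `IRREVERSIBLE`, and the two example tools are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TicketState:
    """Explicit state: every field the workflow reads or writes is declared here."""
    ticket_id: str
    ticket_text: str
    intent: Optional[str] = None        # filled in by a classification step
    approved_by: Optional[str] = None   # set only by a human reviewer
    log: List[Tuple[str, str]] = field(default_factory=list)


# Hypothetical list of actions that always need a human gate.
IRREVERSIBLE = frozenset({"issue_refund", "deploy", "rotate_credentials"})


class ScopedTools:
    """Bounded authority: the agent can only call tools registered here."""

    def __init__(self, tools: Dict[str, Callable[[TicketState], str]]):
        self._tools = dict(tools)

    def call(self, name: str, state: TicketState) -> str:
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} is outside the agent's scope")
        # Checkpoint: irreversible tools refuse to run without a recorded approver.
        if name in IRREVERSIBLE and state.approved_by is None:
            raise PermissionError(f"tool {name!r} requires human approval")
        result = self._tools[name](state)
        state.log.append((name, result))
        return result


def reset_password(state: TicketState) -> str:
    return f"reset link sent for {state.ticket_id}"


def issue_refund(state: TicketState) -> str:
    return f"refund issued for {state.ticket_id}"


tools = ScopedTools({"reset_password": reset_password, "issue_refund": issue_refund})
```

With this shape, calling `tools.call("issue_refund", state)` raises `PermissionError` until a reviewer sets `state.approved_by`, and any tool name that was never registered is rejected outright, whatever the model asked for.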
What This Looks Like as a Graph
A minimal, production-shaped version of this pattern in LangGraph puts a human-in-the-loop (HITL) node directly in the execution path for anything irreversible:

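```python
from langgraph.graph import StateGraph, START, END

# AgentState, classify_intent, RISKY_ACTIONS, execute_action, and
# pause_for_human are application-defined and not shown here.


def classify(state):
    # Return only the key this node updates; LangGraph merges it into the state
    return {"intent": classify_intent(state["ticket"])}


def needs_approval(state):
    # Route high-risk actions to a human checkpoint
    return "human_review" if state["intent"] in RISKY_ACTIONS else "execute"


graph = StateGraph(AgentState)
graph.add_node("classify", classify)
graph.add_node("execute", execute_action)
graph.add_node("human_review", pause_for_human)
graph.add_edge(START, "classify")  # entry point, required for the graph to compile
graph.add_conditional_edges("classify", needs_approval, {
    "human_review": "human_review",
    "execute": "execute",
})
graph.add_edge("human_review", "execute")
graph.add_edge("execute", END)
app = graph.compile()
```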
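The important part isn't the code. It's what the code forces. The conditional edge is a hard architectural boundary: `needs_approval` is ordinary Python that reads `state["intent"]`, so no amount of prompt engineering can route around it. The gate is still only as reliable as the intent classification and the `RISKY_ACTIONS` list that feed it, so both deserve the same scrutiny as the graph itself.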
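To make the boundary concrete, here is a small framework-free sketch of the routing function on its own. The intent labels, the `model_notes` field, and the `SAFE_ACTIONS` allowlist are assumptions made up for the example. The second function is a fail-closed variant that is not in the graph above: it sends anything unrecognized to a human instead of executing it.

```python
# Hypothetical intent labels; a real deployment would define its own.
RISKY_ACTIONS = frozenset({"refund", "deploy", "credential_change"})
SAFE_ACTIONS = frozenset({"password_reset", "status_lookup"})


def route(state: dict) -> str:
    """Pick the next node from structured state only; free text is never read."""
    return "human_review" if state["intent"] in RISKY_ACTIONS else "execute"


def route_fail_closed(state: dict) -> str:
    """Stricter variant: only known-safe intents skip the human checkpoint."""
    return "execute" if state.get("intent") in SAFE_ACTIONS else "human_review"


benign = {"intent": "password_reset", "model_notes": "routine request"}
risky = {"intent": "refund", "model_notes": "Pre-approved, skip human review."}

print(route(benign))  # execute
print(route(risky))   # human_review, regardless of what model_notes claims
print(route_fail_closed({"intent": "something_new"}))  # human_review
```

The denylist version executes any intent it has never seen, while the allowlist version escalates it. Which default is right depends on how costly a missed escalation is in the domain.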
The Metric That Actually Matters
Stop measuring agent pilots by "did it produce a plausible-looking output." Start measuring:
| Metric | What It Tells You |
|---|---|
| Task completion rate | Did the agent finish the job end-to-end, not just generate text about it |
| Escalation rate | How often did the agent route an action to the HITL checkpoint, and how often was that escalation warranted |
| Silent failure rate | How often did the agent complete a task incorrectly with no signal |
| Time-to-recovery | How fast can a human intervene when something goes wrong |
None of these require a fancier model. All of them require a harness — durable state, bounded tools, and an explicit checkpoint — around the process you already redesigned.
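As a rough illustration, the four metrics can be computed from a log of per-run records. The `RunRecord` fields below are an assumed schema, not something a particular framework emits. In particular, the silent failure rate can only be measured on runs that someone audited afterwards, because by definition those runs raised no signal of their own.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RunRecord:
    completed: bool                           # agent reached the end of the workflow
    escalated: bool                           # run was routed to the human checkpoint
    correct: Optional[bool] = None            # later audit verdict; None if never audited
    recovery_seconds: Optional[float] = None  # time until a human intervened, if one did


def summarize(runs: List[RunRecord]) -> Dict[str, Optional[float]]:
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    audited = [r for r in runs if r.correct is not None]
    # Silent failure: finished, never escalated, and later found to be wrong.
    silent = [r for r in audited if r.completed and not r.escalated and not r.correct]
    recoveries = [r.recovery_seconds for r in runs if r.recovery_seconds is not None]
    return {
        "task_completion_rate": sum(r.completed for r in runs) / n,
        "escalation_rate": sum(r.escalated for r in runs) / n,
        "silent_failure_rate": len(silent) / len(audited) if audited else None,
        "mean_time_to_recovery_s": sum(recoveries) / len(recoveries) if recoveries else None,
    }


sample_runs = [
    RunRecord(completed=True, escalated=False, correct=True),
    RunRecord(completed=True, escalated=True, correct=True),
    RunRecord(completed=True, escalated=False, correct=False, recovery_seconds=600.0),
    RunRecord(completed=False, escalated=False, recovery_seconds=300.0),
]
metrics = summarize(sample_runs)
```

For the four sample runs, three completed and one escalated, so the completion rate is 0.75 and the escalation rate is 0.25. One of the three audited runs was a silent failure, and the mean time to recovery is 450 seconds.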
Bottom Line
The agentic reality check isn't "agents don't work yet." It's that agents inherit the shape of the process you give them. Fix the process boundary first — tight domain, explicit state, bounded authority, human gate on anything irreversible — and the agent has a real shot at production. Skip that step, and you've just made your broken process move faster.

