Last quarter, three different companies I know personally—not "a source told me," literally people I've grabbed coffee with—quietly shelved their AI agent projects. Not paused. Shelved. One was an 18-month effort with a team of six. They're calling it a "strategic pivot." It's a failure.
Nobody's writing about this yet, which is exactly why I am.
Agents were the story of 2025. Every conference talk, every VC deck, every breathless announcement: autonomous AI agents are going to transform workflows, replace entire job functions, run your business while you sleep. The demos were stunning. The real-world deployments? A lot quieter.
Here's what I keep seeing: agents work beautifully in controlled environments. You build a demo where the agent books a meeting, writes a summary, updates the CRM — and it works, and everyone in the room loses their mind. Then you try to deploy it against your actual messy production data, with edge cases and ambiguous inputs and APIs that return unexpected things, and the wheels come off.
Not sometimes. Constantly.
The problem isn't that the models are bad. The models are incredible. The problem is that agents are non-deterministic systems operating in deterministic environments that weren't built to handle non-determinism gracefully.
Think about what "tool use" actually means in practice. Your agent calls an API. The API returns a 429 because of rate limits. Does the agent retry intelligently? Does it degrade gracefully? Does it tell the user what happened, or does it silently hallucinate a response as if the call succeeded? In a demo, you control every variable. In production, you don't control anything.
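The retry-and-surface behavior described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API; `RateLimitError` and `call_with_backoff` are hypothetical names:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 from a tool API."""

def call_with_backoff(tool_call, max_retries=4, base_delay=1.0):
    """Retry a zero-arg tool call on rate limits.

    On final failure, raise instead of returning a made-up result --
    the caller (and the user) should see that the call did not succeed.
    """
    for attempt in range(max_retries):
        try:
            return tool_call()
        except RateLimitError:
            if attempt == max_retries - 1:
                # Degrade gracefully: surface the failure rather than
                # letting the agent hallucinate a successful response.
                raise
            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus noise.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

The point isn't the backoff math; it's the last branch. An agent runtime that can't represent "this call failed, tell the user" will fill the gap with fiction.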
I watched a friend's "autonomous" customer support agent confidently tell a user their order had shipped when it hadn't. The agent had called the order API, gotten back a response it misinterpreted, and constructed a confident lie. The user waited four days for a package that wasn't coming.
The dirty secret of the agent space right now is that most "agent" deployments are actually just chains. Glorified, well-structured chains with a bit of routing logic — but chains. There's a human in the loop somewhere, or the "agent" is only allowed to operate within such a narrow domain that it can't really fail in interesting ways.
That's not nothing. Chains are useful. But it's not the autonomous future we were sold.
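To make the chain-not-agent distinction concrete, here's a toy version of the pattern: a fixed pipeline, a bit of routing logic, and a human gate before anything leaves the system. All names are illustrative, and the "LLM" step is a stub:

```python
def summarize(ticket: str) -> str:
    # Placeholder for an LLM call; deterministic here for illustration.
    return f"summary: {ticket.lower()}"

def route(summary: str) -> str:
    # "A bit of routing logic" -- a fixed, inspectable decision,
    # not open-ended autonomy.
    return "refund_queue" if "refund" in summary else "general_queue"

def run_chain(ticket: str, approve) -> dict:
    """Run the fixed pipeline; `approve` is the human-in-the-loop gate."""
    summary = summarize(ticket)
    result = {"summary": summary, "queue": route(summary)}
    # Nothing customer-facing happens without a human sign-off.
    result["approved"] = bool(approve(result))
    return result
```

Every step is enumerable and testable, which is exactly why this shape survives contact with production while open-ended loops don't.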
Real autonomy requires reliable judgment under uncertainty. And while models have gotten dramatically better at reasoning — the jump from GPT-4 to where we are now is genuinely wild — they still struggle with knowing what they don't know. An agent that confidently pursues the wrong goal is worse than no agent at all. At least a human stops and asks when something feels off.
The research community has known this for a while. There's a growing body of work on "agent failure modes" that most people building products haven't read. Compounding errors are a killer: small mistakes early in a multi-step workflow cascade into catastrophic outputs by step seven. The longer the chain, the worse the problem.
So where does that leave us?
Not where the hype cycle suggests. But also not nowhere.
The use cases that are actually working are narrower and more boring than the pitch decks promised. Code generation? Legitimately transformative, full stop. I'm shipping things in hours that would've taken days. Document processing pipelines with human review? Solid. Retrieval-augmented generation for internal knowledge bases? Works well when you invest in the retrieval part.
What isn't working is "set it and forget it" autonomy for anything involving external state, money, or customer-facing communication. That's not a model limitation. That's a systems architecture problem that hasn't been solved yet.
My honest read: we're about 18-24 months away from agents being reliably deployable in genuinely autonomous roles for complex tasks. Not because the models need to get smarter — they're already quite smart. Because the infrastructure, the error recovery patterns, the observability tooling, and frankly the institutional knowledge of how to build reliable agentic systems are still being figured out in real time.
The companies that will win are the ones quietly building that infrastructure now instead of chasing demos.
There's also a trust problem that doesn't get discussed enough.
Even if you build an agent that works correctly 99% of the time, that 1% can be catastrophic enough that humans won't trust it. And trust, once broken, is hard to rebuild. Several early adopters rushed agents into production, had visible failures, and now have leadership that won't greenlight anything "agentic" for the next two years. That's not irrational. That's learned caution.
The field needs a few years of boring, incremental reliability improvements before mainstream enterprise adoption happens for real. That's not pessimism — that's just how software adoption works. Databases had decades of this. Web services had it. The hype peaks before the infrastructure catches up. Always.
I'm not saying agents are dead or overhyped in the "this will never work" sense. I'm saying the timeline was off, the complexity was undersold, and the industry is now in the uncomfortable middle phase where the demo magic wears off and the real engineering begins.
The teams I respect are the ones that already know this. They're building carefully, failing quietly, learning fast, and not announcing anything until they have something that actually holds up in production.
The teams I worry about are the ones still running the 2025 playbook in 2026, wondering why their beautiful agent keeps doing weird things.
The bubble isn't popping. It's just... leaking. Slowly finding its actual shape. And the shape is smaller than promised — but honestly, still pretty interesting.