Why Most "AI Agents" Aren't Really Agents — And What We Built Instead
A 31-agent team on top of Claude Code, engineered with the rigor of a production distributed system.
The Problem With Today's "Agent" Frameworks
Every week another "AI agent" framework ships with breathless claims about autonomous reasoning. Most share the same shape underneath:
- One LLM
- A system prompt
- A tool-calling loop
- Marketing that uses the word "agentic" four times per page
The deeper you go, the more you notice what's missing:
- No specialization — one agent pretending to be five
- No cross-verification — findings go unchallenged
- No trust calibration — the system treats every output as equally credible
- No self-improvement — prompts stay static until a human rewrites them
- No team — just a lone reasoner pretending to be one
What We Built
Over the past several months, I've been building something different: a 31-agent team on top of Claude Code. It's now open-sourced.
Here's what I learned trying to engineer it with the rigor you'd apply to any production distributed system.
Four Practices That Moved the Needle
1. Enforce Protocol Below the Model, Not Inside It
Most agent frameworks enforce discipline inside the loop, which is exactly where a sufficiently clever LLM can talk its way around any rule. Enforce it below the model instead: when an agent skips its required closing sections, a runtime hook exits with code 2 and the dispatch dies. Agents cannot talk their way past an exit code.
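Here is a minimal sketch of that kind of hook, assuming a Claude Code-style hook that reads the event as JSON on stdin and treats exit code 2 as a hard block. The section names and the "output" field are hypothetical stand-ins for the real protocol.

```python
#!/usr/bin/env python3
"""Hook sketch: block any dispatch whose output skips required closing sections."""
import json
import sys

# Hypothetical protocol: every teammate must end with these sections.
REQUIRED_SECTIONS = ("## Findings", "## Confidence", "## Handoff")

def main() -> None:
    event = json.load(sys.stdin)        # the hook receives the event as JSON
    output = event.get("output", "")    # the agent's final message (assumed field)

    missing = [s for s in REQUIRED_SECTIONS if s not in output]
    if missing:
        # stderr explains the violation; exit code 2 kills the dispatch.
        print(f"Protocol violation: missing {missing}", file=sys.stderr)
        sys.exit(2)

    sys.exit(0)  # protocol satisfied: let the dispatch through

if __name__ == "__main__":
    main()
```

The model never sees a rule it could argue with; it only sees a dead dispatch.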
2. Contract-Test Your Prompts on Every Commit
I run 11 structural invariants × 31 agents = 341 assertions on every commit. A new agent can't join the team without passing the same shape tests as incumbents.
This single discipline prevents the agent-framework sprawl where every prompt drifts in a slightly different direction.
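As a sketch of the contract-test shape, assuming each agent prompt lives as a markdown file under agents/: the three invariants below are illustrative stand-ins for the real 11, but the parametrized cross-product is the point.

```python
"""Contract-test sketch: every agent prompt must pass every structural invariant.

Assumes prompts live as markdown files under agents/; the invariants here
are illustrative, not the actual 11.
"""
from pathlib import Path

import pytest

AGENT_PROMPTS = sorted(Path("agents").glob("*.md"))

INVARIANTS = {
    "has_role_header": lambda text: text.startswith("# Role"),
    "declares_outputs": lambda text: "## Outputs" in text,
    "no_privileged_verbs": lambda text: "spawn agent" not in text.lower(),
}

@pytest.mark.parametrize("prompt_path", AGENT_PROMPTS, ids=lambda p: p.stem)
@pytest.mark.parametrize("invariant", INVARIANTS.items(), ids=list(INVARIANTS))
def test_prompt_invariant(prompt_path: Path, invariant) -> None:
    # invariants x agents: each combination passes or fails independently in CI
    name, check = invariant
    text = prompt_path.read_text()
    assert check(text), f"{prompt_path.name} violates invariant {name!r}"
```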
3. Separate Privileged Operations From Regular Agent Work
Teammates can't spawn agents, install MCP servers, or ask the user questions — those are privileged ops.
They emit structured [NEXUS:*] requests to the main thread, which acts as the kernel. Every privileged call is logged and auditable.
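A sketch of that kernel boundary, under assumed conventions: the [NEXUS:*] tag grammar, the op allowlist, and the audit-log path below are all hypothetical. The shape is what matters: teammates only emit text, while the main thread holds the privileges and the log.

```python
"""Kernel sketch: parse, authorize, and log privileged [NEXUS:*] requests.

The tag grammar and allowlist are hypothetical; teammates emit plain text,
and only the main thread executes privileged operations.
"""
import json
import re
from datetime import datetime, timezone

# Hypothetical grammar: [NEXUS:SPAWN_AGENT] {"role": "...", ...}
NEXUS_RE = re.compile(r"\[NEXUS:(?P<op>[A-Z_]+)\]\s*(?P<payload>\{.*\})")
ALLOWED_OPS = {"SPAWN_AGENT", "INSTALL_MCP", "ASK_USER"}
AUDIT_LOG = "nexus_audit.jsonl"

def handle(line: str, requester: str) -> dict | None:
    """Return a parsed privileged request, or None if the line is plain output."""
    match = NEXUS_RE.search(line)
    if not match:
        return None

    op = match.group("op")
    if op not in ALLOWED_OPS:
        raise PermissionError(f"{requester} requested unknown privileged op {op}")

    request = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "requester": requester,
        "op": op,
        "payload": json.loads(match.group("payload")),
    }
    with open(AUDIT_LOG, "a") as log:  # every privileged call is auditable
        log.write(json.dumps(request) + "\n")
    return request

# Usage: handle('[NEXUS:ASK_USER] {"question": "Deploy to prod?"}', "qa-agent")
```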
4. Calibrate Trust From Outcomes, Not From Prompt Claims
A Bayesian per-agent trust ledger updates on every evidence-validator verdict. The CTO agent weights conflicting findings by trust during synthesis.
Agents earn credibility the way engineers do: by being right about things that were hard to be right about.
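One minimal way to implement such a ledger is a Beta-Bernoulli update per agent. This is a sketch under assumptions: the uniform prior and the posterior-mean weighting rule are illustrative choices, not necessarily the repo's.

```python
"""Trust-ledger sketch: Beta-Bernoulli posterior per agent.

Each evidence-validator verdict is treated as a Bernoulli trial; the
posterior mean becomes the agent's weight during synthesis. The prior
and the weighting rule are illustrative assumptions.
"""
from dataclasses import dataclass

@dataclass
class TrustLedger:
    alpha: float = 1.0  # validated verdicts + 1 (uniform Beta(1, 1) prior)
    beta: float = 1.0   # rejected verdicts + 1

    def record_verdict(self, validated: bool) -> None:
        """Update the posterior after an evidence-validator verdict."""
        if validated:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def trust(self) -> float:
        """Posterior mean of the agent's hit rate."""
        return self.alpha / (self.alpha + self.beta)

# Synthesis: weight conflicting findings by each agent's posterior trust.
ledger = {"security-auditor": TrustLedger(), "perf-analyst": TrustLedger()}
ledger["security-auditor"].record_verdict(True)
ledger["perf-analyst"].record_verdict(False)
weights = {name: l.trust for name, l in ledger.items()}
print(weights)  # {'security-auditor': 0.666..., 'perf-analyst': 0.333...}
```

The Beta prior also keeps new hires from being trusted or distrusted too quickly: a single verdict moves a fresh agent's weight far less than it moves a veteran's record.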
Honest About What's Not Proven
- N=1 battle testing
- Opus-heavy runtime cost
- Experimental Claude Code feature dependencies
- A first-ever dynamic hire whose probation verdicts are still incoming
Try It Yourself
Full version (for complex engineering):
github.com/asiflow/claude-nexus-hyper-agent-team
Light version (for cost optimization):
github.com/asiflow/claude-nexus-hyper-agent-team-light
📖 Full write-up on dev.to:
We Built a 31-Agent AI Team That Hires Itself, Critiques Itself, and Dreams
Feedback Welcome
Happy to hear where you think the model breaks down — especially from people who've tried to scale agent teams and hit walls I haven't.
#AIAgents · #EngineeringLeadership · #MultiAgentSystems · #LLMEngineering · #OpenSource