Why Most "AI Agents" Aren't Really Agents — And What We Built Instead
A 31-agent team on top of Claude Code, engineered with the rigor of a production distributed system.
The Problem With Today's "Agent" Frameworks
Every week another "AI agent" framework ships with breathless claims about autonomous reasoning. Most share the same shape underneath:
- One LLM
- A system prompt
- A tool-calling loop
- Marketing that uses the word "agentic" four times per page
The deeper you go, the more you notice what's missing:
- No specialization — one agent pretending to be five
- No cross-verification — findings go unchallenged
- No trust calibration — the system treats every output as equally credible
- No self-improvement — prompts stay static until a human rewrites them
- No team — just a lone reasoner pretending to be one
What We Built
Over the past several months, I've been building something different: a 31-agent team on top of Claude Code. It's now open-sourced.
Here's what I learned trying to engineer it with the rigor you'd apply to any production distributed system.
Four Practices That Moved the Needle
1. Enforce Protocol Below the Model, Not Inside It
Most agent frameworks enforce discipline inside the loop, which is exactly where a sufficiently clever LLM can talk its way around any rule. Enforce it below the model instead: when an agent skips its required closing sections, a runtime hook exits with code 2 and the dispatch dies. Agents cannot talk their way past an exit code.
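Here is a minimal sketch of that kind of hook, assuming a Claude Code-style hook that reads the event as JSON on stdin and treats exit code 2 as a hard block. The section names and the "output" field are hypothetical stand-ins for the real protocol.

```python
#!/usr/bin/env python3
"""Hook sketch: block any dispatch whose output skips required closing sections."""
import json
import sys

# Hypothetical protocol: every teammate must end with these sections.
REQUIRED_SECTIONS = ("## Findings", "## Confidence", "## Handoff")

def main() -> None:
    event = json.load(sys.stdin)        # the hook receives the event as JSON
    output = event.get("output", "")    # the agent's final message (assumed field)

    missing = [s for s in REQUIRED_SECTIONS if s not in output]
    if missing:
        # stderr explains the violation; exit code 2 kills the dispatch.
        print(f"Protocol violation: missing {missing}", file=sys.stderr)
        sys.exit(2)

    sys.exit(0)  # protocol satisfied: let the dispatch through

if __name__ == "__main__":
    main()
```

The model never sees a rule it could argue with; it only sees a dead dispatch.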
2. Contract-Test Your Prompts on Every Commit
I run 11 structural invariants × 31 agents = 341 assertions on every commit. A new agent can't join the team without passing the same shape tests as incumbents.
This single discipline prevents the agent-framework sprawl where every prompt drifts in a slightly different direction.
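As a sketch of the contract-test shape, assuming each agent prompt lives as a markdown file under agents/: the three invariants below are illustrative stand-ins for the real 11, but the parametrized cross-product is the point.

```python
"""Contract-test sketch: every agent prompt must pass every structural invariant.

Assumes prompts live as markdown files under agents/; the invariants here
are illustrative, not the actual 11.
"""
from pathlib import Path

import pytest

AGENT_PROMPTS = sorted(Path("agents").glob("*.md"))

INVARIANTS = {
    "has_role_header": lambda text: text.startswith("# Role"),
    "declares_outputs": lambda text: "## Outputs" in text,
    "no_privileged_verbs": lambda text: "spawn agent" not in text.lower(),
}

@pytest.mark.parametrize("prompt_path", AGENT_PROMPTS, ids=lambda p: p.stem)
@pytest.mark.parametrize("invariant", INVARIANTS.items(), ids=list(INVARIANTS))
def test_prompt_invariant(prompt_path: Path, invariant) -> None:
    # invariants x agents: each combination passes or fails independently in CI
    name, check = invariant
    text = prompt_path.read_text()
    assert check(text), f"{prompt_path.name} violates invariant {name!r}"
```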
3. Separate Privileged Operations From Regular Agent Work
Teammates can't spawn agents, install MCP servers, or ask the user questions — those are privileged ops.
They emit structured [NEXUS:*] requests to the main thread, which acts as the kernel. Every privileged call is logged and auditable.
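A sketch of that kernel boundary, under assumed conventions: the [NEXUS:*] tag grammar, the op allowlist, and the audit-log path below are all hypothetical. The shape is what matters: teammates only emit text, while the main thread holds the privileges and the log.

```python
"""Kernel sketch: parse, authorize, and log privileged [NEXUS:*] requests.

The tag grammar and allowlist are hypothetical; teammates emit plain text,
and only the main thread executes privileged operations.
"""
import json
import re
from datetime import datetime, timezone

# Hypothetical grammar: [NEXUS:SPAWN_AGENT] {"role": "...", ...}
NEXUS_RE = re.compile(r"\[NEXUS:(?P<op>[A-Z_]+)\]\s*(?P<payload>\{.*\})")
ALLOWED_OPS = {"SPAWN_AGENT", "INSTALL_MCP", "ASK_USER"}
AUDIT_LOG = "nexus_audit.jsonl"

def handle(line: str, requester: str) -> dict | None:
    """Return a parsed privileged request, or None if the line is plain output."""
    match = NEXUS_RE.search(line)
    if not match:
        return None

    op = match.group("op")
    if op not in ALLOWED_OPS:
        raise PermissionError(f"{requester} requested unknown privileged op {op}")

    request = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "requester": requester,
        "op": op,
        "payload": json.loads(match.group("payload")),
    }
    with open(AUDIT_LOG, "a") as log:  # every privileged call is auditable
        log.write(json.dumps(request) + "\n")
    return request

# Usage: handle('[NEXUS:ASK_USER] {"question": "Deploy to prod?"}', "qa-agent")
```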
4. Calibrate Trust From Outcomes, Not From Prompt Claims
A Bayesian per-agent trust ledger updates on every evidence-validator verdict. The CTO agent weights conflicting findings by trust during synthesis.
Agents earn credibility the way engineers do: by being right about things that were hard to be right about.
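One minimal way to implement such a ledger is a Beta-Bernoulli update per agent. This is a sketch under assumptions: the uniform prior and the posterior-mean weighting rule are illustrative choices, not necessarily the repo's.

```python
"""Trust-ledger sketch: Beta-Bernoulli posterior per agent.

Each evidence-validator verdict is treated as a Bernoulli trial; the
posterior mean becomes the agent's weight during synthesis. The prior
and the weighting rule are illustrative assumptions.
"""
from dataclasses import dataclass

@dataclass
class TrustLedger:
    alpha: float = 1.0  # validated verdicts + 1 (uniform Beta(1, 1) prior)
    beta: float = 1.0   # rejected verdicts + 1

    def record_verdict(self, validated: bool) -> None:
        """Update the posterior after an evidence-validator verdict."""
        if validated:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def trust(self) -> float:
        """Posterior mean of the agent's hit rate."""
        return self.alpha / (self.alpha + self.beta)

# Synthesis: weight conflicting findings by each agent's posterior trust.
ledger = {"security-auditor": TrustLedger(), "perf-analyst": TrustLedger()}
ledger["security-auditor"].record_verdict(True)
ledger["perf-analyst"].record_verdict(False)
weights = {name: l.trust for name, l in ledger.items()}
print(weights)  # {'security-auditor': 0.666..., 'perf-analyst': 0.333...}
```

The Beta prior also keeps new hires from being trusted or distrusted too quickly: a single verdict moves a fresh agent's weight far less than it moves a veteran's record.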
Honest About What's Not Proven
- N=1 battle testing
- Opus-heavy runtime cost
- Experimental Claude Code feature dependencies
- A first-ever dynamic hire whose probation verdicts are still incoming
Try It Yourself
Full version (for complex engineering):
github.com/asiflow/claude-nexus-hyper-agent-team
Light version (for cost optimization):
github.com/asiflow/claude-nexus-hyper-agent-team-light
📖 Full write-up on dev.to:
We Built a 31-Agent AI Team That Hires Itself, Critiques Itself, and Dreams
Feedback Welcome
Happy to hear where you think the model breaks down — especially from people who've tried to scale agent teams and hit walls I haven't.
#AIAgents · #EngineeringLeadership · #MultiAgentSystems · #LLMEngineering · #OpenSource