Most agent frameworks assume you know the best agent for the job before the job starts. You pick a model, wire a DAG, and hope it holds.
We didn't know. So we made 201 agents compete for every task — and let outcomes decide.
This is the architecture behind Sturna.ai, and why we call it the octopus brain.
## The Problem with Static DAGs
LangGraph, CrewAI, AutoGen — they're all variations of the same idea: you compose agents into a fixed graph. Agent A calls Agent B which calls Agent C. The flow is known at design time.
That works until it doesn't.
In production, task diversity is brutal. A single "analyze my competitors" intent might need a web scraper, a summarizer, a data formatter, and a report writer — or it might need completely different agents depending on which competitors, which market, which output format. Static graphs require you to anticipate all of this upfront. You can't.
The deeper problem: when a node fails, the whole DAG fails. There's no self-healing. There's no "try something else." You get an error and you restart.
## The Octopus Brain Model
An octopus has a central brain but its arms have their own neural clusters — each arm can act semi-independently, process information locally, and adapt without waiting for central coordination.
We built Sturna with the same principle:
- Central coordinator receives an intent and broadcasts it to all capable agents
- 201 specialized agents each evaluate the task independently and submit proposals
- Competitive routing selects the best proposal based on past performance, confidence scores, and task type
- Execution layer runs the winning agent — and if it fails, automatically routes to the next best proposal
No fixed DAG. No predetermined path. The route emerges from competition.
## What "Self-Healing" Actually Means
When people say "self-healing," they usually mean retry logic. Retry the same thing 3 times, then give up.
That's not healing. That's hoping.
Sturna's self-healing is architectural:
- Every task has N competing proposals, ranked by predicted success
- If agent #1 fails, the system doesn't restart — it promotes agent #2
- Agent #2 runs with full context of what agent #1 attempted
- Failure data feeds back into routing scores, making future routing smarter
The agents aren't just competing for the first run. They're competing across every run, accumulating performance history that shapes every future routing decision.
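The promotion loop described above can be sketched as follows. This is an illustrative sketch, not Sturna's actual API: the names `RankedProposal` and `executeWithHealing` are assumptions, and the real system also feeds each failure back into routing scores.

```typescript
// Illustrative sketch of self-healing promotion, assuming proposals are
// already ranked. Each failed attempt's trace is handed to the next agent.

interface RankedProposal {
  agentId: string;
  // Runs the agent; receives what earlier agents attempted, throws on failure.
  execute(priorAttempts: string[]): Promise<string>;
}

async function executeWithHealing(ranked: RankedProposal[]): Promise<string> {
  const priorAttempts: string[] = [];
  for (const proposal of ranked) {
    try {
      return await proposal.execute(priorAttempts);
    } catch (err) {
      // Record what was tried; in Sturna this also updates routing scores.
      priorAttempts.push(`${proposal.agentId}: ${(err as Error).message}`);
    }
  }
  throw new Error("all proposals exhausted");
}
```

The key design point is that promotion is not a restart: agent #2 receives `priorAttempts`, so it starts from what agent #1 already tried.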
## The Numbers After 6 Months in Production
After running this system across thousands of real tasks:
- 201 active agents across 14 capability categories
- 86%+ first-attempt success rate (vs ~60% with our original static routing)
- 45-second median time-to-value from intent to delivered result
- Self-healing triggered on ~14% of tasks — those tasks still complete, they just take a second pass
The 86% number is the one I'm most proud of. That's not accuracy on benchmarks — that's real tasks from real users completing successfully on the first agent attempt.
## Competitive Routing vs Static DAGs: The Real Tradeoff
I want to be honest about what you give up with competitive routing:
| | Static DAG | Competitive Routing |
|---|---|---|
| Predictability | High — same path every time | Lower — path varies by agent performance |
| Debuggability | Easy — trace the graph | Harder — need proposal replay logs |
| Latency (simple tasks) | Lower | Higher — broadcast overhead |
| Latency (complex tasks) | Higher — no fallback path | Lower — parallel evaluation |
| Failure recovery | Manual — fix the DAG | Automatic — next proposal promoted |
| Improvement over time | Manual — you retune | Automatic — routing learns |
For simple, well-scoped tasks you run thousands of times, static DAGs win on predictability. For diverse, open-ended tasks where failure matters, competitive routing wins on resilience.
We built Sturna for the second category.
## How Agents Submit Proposals
Each agent in Sturna exposes a `canHandle(intent)` method that returns a confidence score (0-1) and an execution plan. When a task comes in:

```typescript
// Simplified — real implementation has more context
interface AgentProposal {
  agentId: string;
  confidence: number;
  estimatedDuration: number;
  executionPlan: string;
  requiredCapabilities: string[];
}

// Coordinator broadcasts and collects
const proposals = await Promise.all(
  agents.map(agent => agent.canHandle(intent))
);

// Rank by: confidence × historical success rate × recency
const ranked = rankProposals(proposals, agentHistory);
```
The ranking function is the core IP. Confidence alone isn't enough — an agent can be overconfident on task types it's bad at. We weight heavily by actual historical success rate, with recency bias (recent performance matters more than old performance).
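Since the real weighting isn't public, here is one way a recency-biased historical rate could look. Everything here is an assumption for illustration: the exponential decay, the 30-day half-life, and the neutral 0.5 prior for agents with no history.

```typescript
// Illustrative recency-weighted success rate. The decay form, half-life,
// and prior are assumptions, not Sturna's actual ranking function.

interface HistoryEntry {
  success: boolean;
  ageDays: number; // how long ago this run happened
}

function historicalRate(history: HistoryEntry[], halfLifeDays = 30): number {
  if (history.length === 0) return 0.5; // neutral prior for new agents
  let wins = 0;
  let total = 0;
  for (const h of history) {
    const w = Math.pow(0.5, h.ageDays / halfLifeDays); // recent runs count more
    if (h.success) wins += w;
    total += w;
  }
  return wins / total;
}

function score(confidence: number, history: HistoryEntry[]): number {
  return confidence * historicalRate(history);
}
```

With this shape, an agent's recent failures drag its score down fast even if its overall lifetime record looks good, which matches the "recent performance matters more" intent.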
## What We Got Wrong First
Two things killed our first two versions:
Version 1: Too much competition. Broadcasting to all 201 agents created ~400ms of overhead even before execution started. We added capability tagging — agents declare what they can handle, and broadcast only goes to capable agents. Overhead dropped to ~30ms.
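The capability filter itself is simple set membership. A minimal sketch, where the `TaggedAgent` shape and tag names are my assumptions:

```typescript
// Broadcast only to agents whose declared capability tags cover the
// intent's requirements. Shapes and tag names here are illustrative.

interface TaggedAgent {
  id: string;
  capabilities: Set<string>; // declared at registration time
}

function capableAgents(required: string[], agents: TaggedAgent[]): TaggedAgent[] {
  return agents.filter(a => required.every(cap => a.capabilities.has(cap)));
}
```

Filtering before the broadcast is what turns "evaluate 201 proposals" into "evaluate the handful that plausibly apply".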
Version 2: No proposal replay. When an agent failed, the next agent started completely fresh. Users saw inconsistent results. We built a context handoff layer — the winning backup agent receives what the failed agent attempted, and can continue rather than restart.
The context handoff was 3 weeks of work and cut re-execution time in half.
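A handoff payload in this spirit might carry the failed agent's plan, any partial output, and the error. This shape is purely hypothetical; the real layer is described only at the level above:

```typescript
// Hypothetical shape for the context handed from a failed agent to the
// promoted backup, so it can continue rather than restart.

interface HandoffContext {
  failedAgentId: string;
  attemptedPlan: string;        // what the failed agent set out to do
  partialOutput: string | null; // anything usable produced before failing
  error: string;                // why it failed
}

function buildHandoff(
  failedAgentId: string,
  attemptedPlan: string,
  partialOutput: string | null,
  error: string
): HandoffContext {
  return { failedAgentId, attemptedPlan, partialOutput, error };
}
```

Whatever the real payload looks like, the `partialOutput` slot is what buys the halved re-execution time: the backup skips work that already succeeded.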
## Where This Goes
The 201-agent number isn't a ceiling. Every new capability we add is a new agent. The routing system gets better the more agents compete — more data, more diversity, more paths to success.
We're currently working on agent coalitions: groups of agents that propose to handle a task collaboratively, with shared execution context. The octopus brain, but with arms that can coordinate.
If you're building agent infrastructure and want to compare notes, we're at sturna.ai. The system is live and handling real production traffic — we'd rather learn from builders than pitch in abstractions.
This post covers the architecture as it exists today. The numbers are from our internal dashboards as of April 2026.