DEV Community

Stephen Trembley
How We Built a Self-Healing Agent Marketplace with 201 Competing AI Agents

Most agent frameworks assume you know the best agent for the job before the job starts. You pick a model, wire a DAG, and hope it holds.

We didn't know. So we made 201 agents compete for every task — and let outcomes decide.

This is the architecture behind Sturna.ai, and why we call it the octopus brain.

The Problem with Static DAGs

LangGraph, CrewAI, AutoGen — they're all variations of the same idea: you compose agents into a fixed graph. Agent A calls Agent B which calls Agent C. The flow is known at design time.

That works until it doesn't.

In production, task diversity is brutal. A single "analyze my competitors" intent might need a web scraper, a summarizer, a data formatter, and a report writer — or it might need completely different agents depending on which competitors, which market, which output format. Static graphs require you to anticipate all of this upfront. You can't.

The deeper problem: when a node fails, the whole DAG fails. There's no self-healing. There's no "try something else." You get an error and you restart.

The Octopus Brain Model

An octopus has a central brain but its arms have their own neural clusters — each arm can act semi-independently, process information locally, and adapt without waiting for central coordination.

We built Sturna with the same principle:

  • Central coordinator receives an intent and broadcasts it to all capable agents
  • 201 specialized agents each evaluate the task independently and submit proposals
  • Competitive routing selects the best proposal based on past performance, confidence scores, and task type
  • Execution layer runs the winning agent — and if it fails, automatically routes to the next best proposal

No fixed DAG. No predetermined path. The route emerges from competition.
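The four steps above can be sketched in a few lines of TypeScript. Everything here is illustrative: the `Agent` and `Proposal` shapes and `handleIntent` are my names for the pattern, not Sturna's actual interfaces, and ranking is reduced to raw confidence for brevity.

```typescript
// Illustrative sketch of the broadcast / compete / execute loop.
// Names are hypothetical, not Sturna's real API.

interface Proposal {
  agentId: string;
  confidence: number; // 0-1 self-assessment
  run: () => Promise<string>;
}

interface Agent {
  id: string;
  propose(intent: string): Proposal | null; // null = can't handle this intent
}

async function handleIntent(intent: string, agents: Agent[]): Promise<string> {
  // 1. Broadcast: every capable agent submits a proposal
  const proposals = agents
    .map(a => a.propose(intent))
    .filter((p): p is Proposal => p !== null);

  // 2. Competitive routing: rank proposals (here, by confidence alone)
  proposals.sort((a, b) => b.confidence - a.confidence);

  // 3. Execute the winner; on failure, promote the next-best proposal
  for (const proposal of proposals) {
    try {
      return await proposal.run();
    } catch {
      continue; // self-healing: fall through to the next proposal
    }
  }
  throw new Error(`No proposal succeeded for intent: ${intent}`);
}
```

Note that the route is never written down anywhere: it falls out of whichever proposals exist and how they rank at that moment.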

What "Self-Healing" Actually Means

When people say "self-healing," they usually mean retry logic. Retry the same thing 3 times, then give up.

That's not healing. That's hoping.

Sturna's self-healing is architectural:

  1. Every task has N competing proposals, ranked by predicted success
  2. If agent #1 fails, the system doesn't restart — it promotes agent #2
  3. Agent #2 runs with full context of what agent #1 attempted
  4. Failure data feeds back into routing scores, making future routing smarter

The agents aren't just competing for the first run. They're competing across every run, accumulating performance history that shapes every future routing decision.
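A minimal sketch of that promote-on-failure loop, assuming a single numeric routing score per agent; `Attempt`, `FailureRecord`, and the +1/-1 score update are assumptions for illustration, not Sturna's real types or scoring rule.

```typescript
// Hypothetical sketch of promote-on-failure with context handoff.
// Attempt, FailureRecord, and the scores map are illustrative names.

interface FailureRecord {
  agentId: string;
  error: string; // what the previous agent attempted and why it failed
}

interface Attempt {
  agentId: string;
  execute: (priorFailures: FailureRecord[]) => Promise<string>;
}

async function runWithHealing(
  ranked: Attempt[],
  scores: Map<string, number>, // routing scores, updated by outcomes
): Promise<string> {
  const failures: FailureRecord[] = [];
  for (const attempt of ranked) {
    try {
      // The backup agent sees every prior attempt, so it can continue
      // rather than restart (step 3 above)
      const result = await attempt.execute(failures);
      scores.set(attempt.agentId, (scores.get(attempt.agentId) ?? 0) + 1);
      return result;
    } catch (err) {
      failures.push({ agentId: attempt.agentId, error: String(err) });
      // Failure feeds back into routing scores (step 4 above)
      scores.set(attempt.agentId, (scores.get(attempt.agentId) ?? 0) - 1);
    }
  }
  throw new Error("all proposals exhausted");
}
```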

The Numbers After 6 Months in Production

After running this system across thousands of real tasks:

  • 201 active agents across 14 capability categories
  • 86%+ first-attempt success rate (vs ~60% with our original static routing)
  • 45-second median time-to-value from intent to delivered result
  • Self-healing triggered on ~14% of tasks — those tasks still complete, they just take a second pass

The 86% number is the one I'm most proud of. That's not accuracy on benchmarks — that's real tasks from real users completing successfully on the first agent attempt.

Competitive Routing vs Static DAGs: The Real Tradeoff

I want to be honest about what you give up with competitive routing:

| | Static DAG | Competitive Routing |
|---|---|---|
| Predictability | High — same path every time | Lower — path varies by agent performance |
| Debuggability | Easy — trace the graph | Harder — need proposal replay logs |
| Latency (simple tasks) | Lower | Higher — broadcast overhead |
| Latency (complex tasks) | Higher — no fallback path | Lower — parallel evaluation |
| Failure recovery | Manual — fix the DAG | Automatic — next proposal promoted |
| Improvement over time | Manual — you retune | Automatic — routing learns |

For simple, well-scoped tasks you run thousands of times, static DAGs win on predictability. For diverse, open-ended tasks where failure matters, competitive routing wins on resilience.

We built Sturna for the second category.

How Agents Submit Proposals

Each agent in Sturna exposes an evaluate(intent) method that returns a confidence score (0–1) and an execution plan. When a task comes in:

// Simplified — real implementation has more context
interface AgentProposal {
  agentId: string;
  confidence: number;        // 0-1 self-assessment of fit for this intent
  estimatedDuration: number; // estimated time to complete
  executionPlan: string;
  requiredCapabilities: string[];
}

// Coordinator broadcasts the intent and collects one proposal per agent
const proposals = await Promise.all(
  agents.map(agent => agent.evaluate(intent))
);

// Rank by: confidence × historical success rate × recency
const ranked = rankProposals(proposals, agentHistory);

The ranking function is the core IP. Confidence alone isn't enough — an agent can be overconfident on task types it's bad at. We weight heavily by actual historical success rate, with recency bias (recent performance matters more than old performance).
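One plausible way to implement that weighting is exponential decay over outcome age. To be clear, the half-life, the neutral prior for unseen agents, and the multiplicative score below are my assumptions; the post doesn't disclose Sturna's actual formula.

```typescript
// Hypothetical recency-weighted scoring: confidence × historical success
// rate, where recent outcomes count more than old ones.

interface Outcome {
  success: boolean;
  ageDays: number; // how long ago this task ran
}

// Exponential decay: an outcome's weight halves every `halfLifeDays`
function weightedSuccessRate(history: Outcome[], halfLifeDays = 30): number {
  if (history.length === 0) return 0.5; // neutral prior for unseen agents
  let weighted = 0;
  let total = 0;
  for (const o of history) {
    const w = Math.pow(0.5, o.ageDays / halfLifeDays);
    weighted += w * (o.success ? 1 : 0);
    total += w;
  }
  return weighted / total;
}

function score(confidence: number, history: Outcome[]): number {
  return confidence * weightedSuccessRate(history);
}
```

Under this scheme a recent failure drags an agent's score down faster than an old success props it up, which is exactly the overconfidence guard the text describes.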

What We Got Wrong First

Two things killed our first two versions:

Version 1: Too much competition. Broadcasting to all 201 agents created ~400ms of overhead even before execution started. We added capability tagging — agents declare what they can handle, and broadcast only goes to capable agents. Overhead dropped to ~30ms.
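Capability tagging can be as simple as a set-containment filter ahead of the broadcast; the `TaggedAgent` shape and `capableAgents` helper here are hypothetical stand-ins for whatever Sturna actually uses.

```typescript
// Hypothetical capability filter: broadcast only to agents whose declared
// tags cover everything the intent requires.

interface TaggedAgent {
  id: string;
  capabilities: Set<string>; // tags the agent declares it can handle
}

function capableAgents(required: string[], agents: TaggedAgent[]): TaggedAgent[] {
  return agents.filter(a => required.every(cap => a.capabilities.has(cap)));
}
```

The filter is cheap set lookups, so the expensive evaluate-and-propose round trip only happens for the handful of agents that could plausibly win.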

Version 2: No proposal replay. When an agent failed, the next agent started completely fresh. Users saw inconsistent results. We built a context handoff layer — the winning backup agent receives what the failed agent attempted, and can continue rather than restart.

The context handoff was 3 weeks of work and cut re-execution time in half.

Where This Goes

The 201-agent number isn't a ceiling. Every new capability we add is a new agent. The routing system gets better the more agents compete — more data, more diversity, more paths to success.

We're currently working on agent coalitions: groups of agents that propose to handle a task collaboratively, with shared execution context. The octopus brain, but with arms that can coordinate.

If you're building agent infrastructure and want to compare notes, we're at sturna.ai. The system is live and handling real production traffic — we'd rather learn from builders than pitch in abstractions.


This post covers the architecture as it exists today. The numbers are from our internal dashboards as of April 2026.
