Most AI agent systems come in two flavors: a single autonomous agent
looping until it declares victory (AutoGPT), or multiple agents
dividing labor on a task (CrewAI). I wanted a third flavor — agents
that actually disagree with each other, and a system that gets smarter
from those disagreements.
This is the story of building Agora, what I borrowed from existing
projects, and the one thing that's genuinely new.
The problem with "one AI thinks for you"
When engineers plan a feature, they don't just have one brain compute
the answer. They research prior art. They debate. Someone pokes holes.
Then they act. Each role catches different blind spots.
Single-LLM systems skip all of that. You get one answer, often with
confidence that isn't earned.
Multi-agent systems like CrewAI get you partway there — agents divide
labor — but they rarely argue. The research agent hands off to the
write agent, who hands off to the edit agent in a pipeline. No one's
job is to push back.
What Agora does differently
Agora has a council: Scout (research), Architect (design), Critic
(critique), Synthesizer (summary), plus an optional Sentinel
(security review). They run in parallel on the same input. Their
outputs go to the Synthesizer, who notes where the roles agreed and
where they disagreed, then turns the result into action items you
approve before anything executes.
Then the part I think is genuinely new: after the session, Agora runs
two skill extractors independently:
- Execution skill extractor — "what worked for this task type" (learned from the Executor's tool-calling trace)
- Discussion skill extractor — "how did the roles disagree, how was it resolved" (learned from the council transcript)
The second one has a dedicated prompt (_DISCUSS_PROMPT in the code)
that explicitly asks "what did each perspective contribute" and "how
were disagreements resolved". It's structurally impossible for a
single-agent system to produce this signal — there's no one to
disagree with.
Honest positioning
I stand on two projects' work:
- DeerFlow (ByteDance) gave me the sandbox execution model and the memory design
- Hermes Agent (Nous Research) gave me the learn-skills-from-execution pattern
Agora's original contribution is the council discussion layer AND
the discussion-skill extraction. Both originals are credited in
the README.
Architecture overview
User input
↓
Moderator (routing) → QUICK / DISCUSS / EXECUTE / CLARIFY
↓
DISCUSS branch (parallel):
Scout (web search) ║ Architect (design) ║ Critic (challenge)
↓
Synthesizer (merges to action items)
↓
User approves
↓
Executor (tool-calling loop: read / write / patch / shell)
↓
SkillExtractor (discussion skill ‖ execution skill, independent)
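The Moderator's four routes from the diagram can be sketched as a small enum. The route names are from the diagram above; the keyword heuristic below is purely illustrative — a real moderator would classify with the LLM.

```python
# Route names are from the architecture diagram; the classifier logic
# is an assumed toy heuristic, not Agora's actual moderator.
from enum import Enum

class Route(Enum):
    QUICK = "quick"      # simple question: answer directly
    DISCUSS = "discuss"  # open-ended: convene the council
    EXECUTE = "execute"  # concrete task: go straight to the Executor
    CLARIFY = "clarify"  # ambiguous: ask the user a follow-up

def route(user_input: str) -> Route:
    text = user_input.lower()
    if text.endswith("?") and len(text.split()) < 8:
        return Route.QUICK
    if any(w in text for w in ("run", "fix", "patch", "install")):
        return Route.EXECUTE
    if len(text.split()) < 3:
        return Route.CLARIFY
    return Route.DISCUSS
```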
Implementation details worth highlighting
Three-tier skill matching
# backend/agora/skills/store.py
match_embedding(query) # Primary: semantic similarity
match_llm(query) # Fallback: LLM relevance check
match_keyword(query) # Last resort: keyword overlap
Works in environments without embedding providers. Each tier has
different cost/quality tradeoffs — the system degrades gracefully
instead of hard-failing.
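The graceful degradation can be expressed as a fallback chain. The tier order is from the snippet above; the `match_skill` wrapper and `ProviderUnavailable` exception are names I made up for this sketch.

```python
# Sketch of the three-tier fallback; tier order is from the post,
# the wrapper function and exception name are illustrative assumptions.
class ProviderUnavailable(Exception):
    """Raised by a tier whose backing service is not configured."""

def match_skill(query: str, store) -> list:
    """Try each tier in cost/quality order; skip tiers that cannot run."""
    tiers = (store.match_embedding, store.match_llm, store.match_keyword)
    for tier in tiers:
        try:
            hits = tier(query)
        except ProviderUnavailable:
            continue  # e.g. no embedding endpoint in this environment
        if hits:
            return hits
    return []  # no tier matched: proceed without a prior skill
```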
Parallel council execution
tasks = [scout.run(), architect.run(), critic.run()]
results = await asyncio.gather(*tasks)
summary = await synthesizer.run(results)
Council wall-clock time stays close to single-agent response time
instead of scaling with role count. This is what makes the
"council debate" pattern actually usable in a product.
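A self-contained toy makes the wall-clock claim checkable — the sleeps stand in for LLM latency, and `gather` finishes in roughly the time of the slowest role, not the sum:

```python
# Toy demo: three "roles" of 0.1s each complete in ~0.1s total, not ~0.3s.
import asyncio
import time

async def role(name: str, latency: float) -> str:
    await asyncio.sleep(latency)  # stand-in for an LLM call
    return f"{name}: done"

async def council() -> tuple[list[str], float]:
    start = time.monotonic()
    results = await asyncio.gather(
        role("scout", 0.1), role("architect", 0.1), role("critic", 0.1)
    )
    return results, time.monotonic() - start

results, elapsed = asyncio.run(council())
# elapsed is ~0.1s (the max), not ~0.3s (the sum)
```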
SSE streaming across agents
The web UI shows all four agents streaming tokens simultaneously.
Without this, users wait 30 seconds for a blob of text. With it,
the "council discussing" metaphor feels alive. Worth every line of
the frontend work.
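For reference, SSE framing is simple enough to show inline. The `agent`/`token` field names below are my assumption about the payload shape, not Agora's actual wire format — the fixed `event:`/`data:` framing and blank-line terminator are the SSE standard.

```python
# Minimal SSE framing sketch; payload field names are assumptions.
import json

def sse_event(agent: str, token: str) -> str:
    """Frame one token from one agent as a Server-Sent Event."""
    payload = json.dumps({"agent": agent, "token": token})
    return f"event: token\ndata: {payload}\n\n"
```

The frontend can then demultiplex by `agent` and append each token to that agent's panel as it arrives.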
Testing AI systems
Two-tier strategy:
- Unit tests with mocks (188) — verify control flow, prompt structure, data shape
- Integration tests with LLM-as-Judge (15) — verify actual output quality
Unit tests catch regressions fast; integration tests catch quality
drift when you change prompts. Both matter.
What's next
- Web UI polish (discussion visualization still needs work)
- MCP server support (external tool integrations beyond the built-in set)
- Skill versioning (current skills are immutable; roll-forward should be possible)
- Dynamic sub-agent generation (on-demand specialist roles)
Try it
git clone https://github.com/wilbur-labs/Agora.git
cd Agora
cp .env.example .env # add CLAUDE_API_KEY
docker compose up -d
open http://localhost:8000
MIT license. FastAPI + Next.js. Works with Claude, Azure OpenAI,
OpenAI, or any OpenAI-compatible endpoint (including local Ollama
/ vLLM).
Ask
If you've seen prior art for extracting skills from multi-agent
disagreement specifically, please tell me. I've done due diligence
on the usual suspects but it's a small world and I could easily
have missed something. GitHub issues are open.