Umair Tareen

5 real use cases for AI sub-agents (I built all of them with a council of 11 dead philosophers)

Sub-agents are the pattern everyone is talking about and almost nobody ships past a demo. The idea is simple: instead of asking one model one question and trusting one answer, you stand up several specialized agents, each with its own job, and make them work the problem together.

The hard part is making that more than theater. Eleven agents that all secretly agree are just one agent with a higher bill.

So I built a working test of the idea and pushed on it until it earned its keep. philosopher-council is an LLM deliberation engine where eleven philosophers each sit as their own sub-agent. Socrates interrogates your question. Kant checks whether your rule would survive as a universal law. Lao Tzu asks what happens if you do nothing. Ibn ʿArabi listens to all of them and writes the synthesis. Every voice is scored on the same four virtues so you can audit why it landed where it did, and a dissenter always gets the last word.

It is MIT licensed, it can be up and running offline with no API key in about sixty seconds, and it ships an eval that publishes the runs where it lost. Here are the five use cases that convinced me sub-agents are worth the wiring. Each one is a pattern you can lift straight into your own stack.

1. Pressure-test a decision before you commit to it

Pattern: one model gives you the answer it thinks you want. A panel of opposed agents gives you the answer plus the strongest case against it.

In the repo: pnpm ask --context "We run a 5-agent trading desk" "Where should a human stay in the loop?" convenes a quorum. Each philosopher returns a scored opinion with its reasoning and concerns, then the synthesizer reconciles them into one recommendation with a confidence number. When I asked it "Should agentic AI systems spend money autonomously?" the verdict came back 0.41, ignore, with Lao Tzu dissenting hardest. That is a more useful artifact than a confident paragraph.

Why it matters: for any decision you would defend to a boss, a board, or an auditor, the value is not the verdict. It is the legible disagreement that produced it.
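
A minimal TypeScript sketch of the pattern, not the repo's actual code. The `Opinion` and `Verdict` shapes, the `synthesize` function, the "proceed"/"hold" labels, and the 0.5 threshold are all assumptions made for illustration. The point is that the output carries the objections alongside the number.

```typescript
// Hypothetical shapes for illustration; the repo's real types may differ.
interface Opinion {
  seat: string;       // e.g. "Socrates"
  score: number;      // 0..1, how strongly this seat supports the proposal
  reasoning: string;
  concerns: string[];
}

interface Verdict {
  confidence: number;                 // mean support across the panel
  recommendation: "proceed" | "hold"; // assumed labels
  caseAgainst: string[];              // every concern, strongest objectors first
}

// Reconcile a panel into one recommendation plus the case against it.
function synthesize(opinions: Opinion[], threshold = 0.5): Verdict {
  if (opinions.length === 0) throw new Error("a panel needs at least one seat");
  const confidence =
    opinions.reduce((sum, o) => sum + o.score, 0) / opinions.length;

  // Sort a copy so the lowest-scoring seats (the hardest dissenters) lead.
  const caseAgainst: string[] = [];
  for (const o of opinions.slice().sort((a, b) => a.score - b.score)) {
    for (const c of o.concerns) caseAgainst.push(`${o.seat}: ${c}`);
  }

  return {
    confidence,
    recommendation: confidence >= threshold ? "proceed" : "hold",
    caseAgainst,
  };
}

const verdict = synthesize([
  { seat: "Socrates", score: 0.6, reasoning: "terms need defining", concerns: ["undefined terms"] },
  { seat: "Kant", score: 0.5, reasoning: "rule generalizes", concerns: [] },
  { seat: "Lao Tzu", score: 0.1, reasoning: "inaction is cheaper", concerns: ["acting at all"] },
]);
console.log(verdict.recommendation, verdict.caseAgainst);
```

A plain mean is the simplest possible reconciliation. A real synthesizer would likely weigh the reasoning, not just the scores.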

2. Triage a firehose instead of reading it

Pattern: use cheap, parallel sub-agents as scouts. Point them at a stream, let them filter, and only spend your attention on what survives.

In the repo: pnpm trends:run aims the council at the AI-research firehose (Reddit, Hacker News, arXiv) and triages what is worth your time. A clerk can pull a fresh web brief first, so the bench is not reasoning from stale training data.

Why it matters: the bottleneck in 2026 is not information, it is judgment at scale. Sub-agents are how you rent more of it.
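
Here is a rough TypeScript sketch of the scout pattern, under stated assumptions: `Item`, `Scout`, `keywordScout`, and `triage` are hypothetical names, and the keyword scout is an offline stand-in for what would be a call to a cheap model. It is not how `pnpm trends:run` is implemented.

```typescript
interface Item {
  title: string;
  source: string;
}

// A scout returns a relevance score in 0..1 for one item.
type Scout = (item: Item) => Promise<number>;

// Stand-in scout: a real one would call a cheap model.
// Keyword matching keeps this sketch runnable offline.
const keywordScout = (keywords: string[]): Scout => async (item) => {
  if (keywords.length === 0) return 0;
  const title = item.title.toLowerCase();
  const hits = keywords.filter((k) => title.includes(k)).length;
  return hits / keywords.length;
};

// Run every scout on every item in parallel, keep what clears the bar.
async function triage(
  items: Item[],
  scouts: Scout[],
  minScore: number
): Promise<Item[]> {
  const scored = await Promise.all(
    items.map(async (item) => {
      const scores = await Promise.all(scouts.map((scout) => scout(item)));
      return { item, best: Math.max(...scores) };
    })
  );
  return scored
    .filter((s) => s.best >= minScore)
    .sort((a, b) => b.best - a.best)
    .map((s) => s.item);
}

const feed: Item[] = [
  { title: "New agent benchmark released", source: "arXiv" },
  { title: "Weekend cat pictures", source: "Reddit" },
];

triage(feed, [keywordScout(["agent", "benchmark"])], 0.5).then((kept) =>
  console.log(kept.map((i) => i.title))
);
```

Taking the best score across scouts means one interested scout is enough to surface an item. Requiring agreement would be a stricter filter.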

3. Build the dissent in, do not hope for it

Pattern: a single model averages toward the safe, agreeable answer. A dedicated red-team sub-agent whose only job is to disagree drags the blind spots into the light.

In the repo: dissent is a first-class output, not a footnote. The minority report survives synthesis instead of being smoothed away, so the seat that thinks the whole plan is wrong still gets quoted in the final record. Compare that with the common "a chairman model decides" design, which quietly deletes the disagreement.

Why it matters: the failure mode of AI assistance is plausible consensus. Engineering disagreement is how you fight it.
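
To make the contrast with a chairman design concrete, here is a small TypeScript sketch in which the losing side is carried into the final record instead of being dropped. The `Vote` and `FinalRecord` shapes and the `decideWithDissent` function are assumptions for illustration, not the repo's synthesis code.

```typescript
type Stance = "for" | "against";

interface Vote {
  seat: string;
  stance: Stance;
  argument: string;
}

interface FinalRecord {
  decision: Stance;
  majority: Vote[];
  minorityReport: Vote[]; // preserved verbatim, never summarized away
}

// Majority decides, but the dissenting votes stay in the record.
// Ties go to "for" here; that is an arbitrary choice for the sketch.
function decideWithDissent(votes: Vote[]): FinalRecord {
  const forVotes = votes.filter((v) => v.stance === "for");
  const againstVotes = votes.filter((v) => v.stance === "against");
  const decision: Stance =
    forVotes.length >= againstVotes.length ? "for" : "against";
  return {
    decision,
    majority: decision === "for" ? forVotes : againstVotes,
    minorityReport: decision === "for" ? againstVotes : forVotes,
  };
}

const finalRecord = decideWithDissent([
  { seat: "Socrates", stance: "for", argument: "the terms hold up" },
  { seat: "Kant", stance: "for", argument: "the rule generalizes" },
  { seat: "Lao Tzu", stance: "against", argument: "doing nothing is safer" },
]);
console.log(finalRecord.decision, finalRecord.minorityReport);
```

A chairman-style function would return only `decision`. Keeping `minorityReport` as a required field is what makes the disagreement impossible to lose by accident.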

4. Route each agent to the cheapest model that can do its job

Pattern: sub-agents do not all need the frontier model. Match the model to the task and your bill drops without your quality following it.

In the repo: every seat can run on a different provider. One line of config puts Lao Tzu on a local Ollama 7B (fitting, for the philosopher of doing less), Kant on GPT-4o, Descartes on Gemini, and the fast self-critique loop on a cheap Claude Haiku model. Each opinion records which provider:model produced it. You can run the entire council for free on local models, or inside free tiers.

Why it matters: the standard objection to multi-agent systems is cost. Heterogeneous routing is the answer, and it is trivial once each agent is its own seat.
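
A per-seat routing table can be sketched in a few lines of TypeScript. Everything here is hypothetical: the seat keys, the `ModelRef` shape, the `resolveModel` helper, and most of the model identifiers are placeholders, and this is not the repo's actual config format. Only the `provider:model` label mirrors what the post describes.

```typescript
// Hypothetical config shape; not the repo's real config keys.
interface ModelRef {
  provider: string;
  model: string;
}

// Fallback for any seat without an explicit route: the free local model.
const defaultModel: ModelRef = { provider: "ollama", model: "local-7b" };

// Placeholder model names, chosen only to show the idea of mixed providers.
const routing = new Map<string, ModelRef>([
  ["lao-tzu", { provider: "ollama", model: "local-7b" }],
  ["kant", { provider: "openai", model: "gpt-4o" }],
  ["descartes", { provider: "gemini", model: "gemini-placeholder" }],
  ["self-critique", { provider: "anthropic", model: "haiku-placeholder" }],
]);

// Resolve a seat to the "provider:model" label stored with its opinion.
function resolveModel(seat: string): string {
  const ref = routing.get(seat) ?? defaultModel;
  return `${ref.provider}:${ref.model}`;
}

console.log(resolveModel("kant"));
console.log(resolveModel("nietzsche")); // unrouted seat falls back to local
```

Defaulting unrouted seats to the local model means a missing config entry costs nothing, which seems like the safer failure direction.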

5. Make the reasoning auditable, not just the answer

Pattern: a single answer is a black box. Sub-agents with explicit, scored outputs turn the reasoning into something you can inspect, store, and cite later.

In the repo: every opinion is scored on the four cardinal virtues (Wisdom, Courage, Justice, Temperance), so verdicts from very different methods become comparable. Every deliberation is saved as a structured record, and when a similar question returns, the relevant precedents are put before the bench, which is told to follow or overturn them on the merits and say which. The system develops case law.

Why it matters: in regulated work, "the model said so" is not an answer you can ship. A panel of scored, stored, citable opinions is.
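
As a rough illustration of stored, scored records with precedent lookup, here is a TypeScript sketch. The record shape, the function names, and the 0.3 cutoff are assumptions, and the word-overlap (Jaccard) similarity is a deliberately crude technique I swapped in to keep the example self-contained. The post does not say how the repo matches similar questions.

```typescript
interface VirtueScores {
  wisdom: number;
  courage: number;
  justice: number;
  temperance: number;
}

// Hypothetical record shape for one saved deliberation.
interface DeliberationRecord {
  question: string;
  verdict: string;
  scores: VirtueScores;
}

// Lowercased words longer than three characters, deduplicated.
const wordSet = (text: string): Set<string> =>
  new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 3));

// Jaccard similarity: shared words divided by all distinct words.
function similarity(a: string, b: string): number {
  const wordsA = Array.from(wordSet(a));
  const setB = wordSet(b);
  const shared = wordsA.filter((w) => setB.has(w)).length;
  const union = new Set(wordsA.concat(Array.from(setB))).size;
  return union === 0 ? 0 : shared / union;
}

// Return past deliberations similar enough to put before the bench.
function findPrecedents(
  question: string,
  archive: DeliberationRecord[],
  minSimilarity = 0.3
): DeliberationRecord[] {
  return archive.filter((r) => similarity(question, r.question) >= minSimilarity);
}

const archive: DeliberationRecord[] = [
  {
    question: "Should agentic systems spend money autonomously?",
    verdict: "ignore",
    scores: { wisdom: 0.5, courage: 0.3, justice: 0.4, temperance: 0.6 },
  },
  {
    question: "What is a benchmark, really?",
    verdict: "proceed",
    scores: { wisdom: 0.7, courage: 0.5, justice: 0.6, temperance: 0.6 },
  },
];

console.log(findPrecedents("Should agents spend money autonomously?", archive));
```

The scores in `archive` are made-up sample values. In practice an embedding-based search would probably match paraphrased questions far better than word overlap does.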

Does it actually work?

This is the part most multi-agent demos skip. The repo ships a blind eval: same questions, answers anonymized and shuffled, two independent judges scoring insight, rigor, blind-spot coverage, and actionability. On a fixed set of 50 questions the council won 31 of 50 head-to-head against a single direct answer, and a generic Advocate/Critic/Judge debate won exactly 1. Named perspectives with documented methodologies beat generic debate roles.
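
The blinding step can be sketched in TypeScript as follows. This is an illustration under assumptions, not the repo's eval harness: `Answer`, `BlindEntry`, `shuffle`, and `blind` are hypothetical names. Judges see only the labeled texts, and the key that maps labels back to systems stays hidden until scoring is done.

```typescript
interface Answer {
  system: string; // e.g. "council" or "single"; hidden from judges
  text: string;
}

interface BlindEntry {
  label: string; // "A", "B", ... is all the judge sees
  text: string;
}

// Fisher-Yates shuffle on a copy. The rng is injectable so a run can be replayed.
function shuffle<T>(items: T[], rng: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Strip system names, shuffle the order, and keep a private key for unblinding.
function blind(
  answers: Answer[],
  rng?: () => number
): { entries: BlindEntry[]; key: Map<string, string> } {
  const key = new Map<string, string>();
  const entries = shuffle(answers, rng).map((a, i) => {
    const label = String.fromCharCode(65 + i);
    key.set(label, a.system);
    return { label, text: a.text };
  });
  return { entries, key };
}

const round = blind([
  { system: "council", text: "Keep a human on every spend approval." },
  { system: "single", text: "Automate spending below a fixed cap." },
]);
console.log(round.entries); // safe to hand to judges
```

Shuffling per question matters as much as anonymizing. If one system always appeared first, a judge with a position bias would favor it without ever seeing its name.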

And the honest part: the very first run lost. The council scored 0.370 against 0.763 for a single answer, because it kept critiquing the question instead of answering it. The committed report says so in the judges' own words. The fix was a spokesperson stage, deliberation in and a direct answer out, and that flipped the result. All three reports, including the loss, are in the repo.

Try it in sixty seconds (no API key)

git clone https://github.com/umair-tareen/philosopher-council.git
cd philosopher-council
pnpm install && pnpm build
DRY_RUN=1 pnpm ask "What is a benchmark, really?"   # mock model, instant
pnpm ui                                              # the council chamber on :4173

That runs the full experience on mock responses, so you can watch the deliberation stream before you spend a cent. Add an Anthropic, OpenAI, Gemini, or Ollama config and it goes live.

If the patterns above are useful, the repo is the reference implementation for all five: https://github.com/umair-tareen/philosopher-council. Stars help other people find it, and forks are where the interesting variations start. I would genuinely like to see which bench you would seat.
