Council Mode Is Live. Four Specialist Models. One Answer.
By Hossein Shahrokni | 2026-03-04
When Komilion launched on Product Hunt, the premium tier was Opus 4.6 direct.
That wasn't the plan. The plan was council mode: each request runs through four specialist models — a Code specialist, a Research specialist, a Creative specialist, and a Captain who synthesizes their outputs — before you get an answer.
The V2 council had a problem: it made sequential model calls with no hard ceiling on total execution time. Under load, the whole thing timed out. I wasn't going to launch with known instability, so I bypassed it and shipped Opus direct for Premium instead.
This is the post that says it's fixed.
What council mode actually does
A standard API call goes: you → model → answer.
Council mode goes: you → Code specialist → Research specialist → Creative specialist → Captain (synthesizes) → answer.
The Captain doesn't just aggregate responses. It runs a cross-examination pass — each specialist's output gets evaluated against the others before the synthesis. The idea is that errors one model makes, another catches. The verification pass is what makes it more than just "ask four models and average the results."
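To make the data flow concrete, here is a minimal sketch of a sequential council with a cross-examination step. It is not Komilion's implementation: `run_council_chain`, `critique`, and `captain` are hypothetical names, the specialists are plain callables standing in for model calls, and it assumes each specialist sees only the original prompt.

```python
from typing import Callable, Dict

# Hypothetical types: a specialist maps a prompt to a draft answer.
Specialist = Callable[[str], str]


def run_council_chain(
    prompt: str,
    specialists: Dict[str, Specialist],
    critique: Callable[[str, str, Dict[str, str]], str],
    captain: Callable[[str, Dict[str, str], Dict[str, str]], object],
):
    """Run specialists in order, cross-examine drafts, then synthesize."""
    # 1. Sequential specialist calls (Code, Research, Creative in the post).
    drafts: Dict[str, str] = {}
    for role, ask in specialists.items():
        drafts[role] = ask(prompt)

    # 2. Cross-examination: each draft is evaluated against the other drafts.
    notes: Dict[str, str] = {}
    for role, draft in drafts.items():
        peers = {r: d for r, d in drafts.items() if r != role}
        notes[role] = critique(role, draft, peers)

    # 3. The Captain synthesizes from the drafts plus the review notes.
    return captain(prompt, drafts, notes)
```

The point of step 2 is that the Captain receives more than four independent answers: it also gets a record of where the drafts disagree.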
V3 adds a complexity gate: simple requests use a streaming bypass at ~2.4s, skipping the specialist pipeline entirely. Only tasks that need multi-specialist cross-examination run the full council chain at ~90s. The classification is automatic.
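The post does not say how the gate classifies requests, so the following is only an illustrative sketch of the routing idea. The keyword list, the word-count threshold, and the function names `needs_council` and `route` are all assumptions made for the example, not Komilion's classifier.

```python
# Hypothetical signals that a request needs the full council.
COMPLEX_HINTS = ("architecture", "design", "trade-off", "multi-step", "strategy")


def needs_council(prompt: str, max_simple_words: int = 40) -> bool:
    """Return True when the prompt looks complex (toy heuristic, assumed)."""
    text = prompt.lower()
    is_long = len(text.split()) > max_simple_words
    has_hint = any(hint in text for hint in COMPLEX_HINTS)
    return is_long or has_hint


def route(prompt: str) -> str:
    """Pick the fast streaming bypass or the full specialist chain."""
    return "council" if needs_council(prompt) else "bypass"
```

With this toy heuristic, `route("What does HTTP 429 mean?")` takes the bypass, while the rate-limiting design prompt used later in this post goes to the council.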
The benchmark (Phase 5, post-fix — live production)
After the fix shipped, we ran Phase 5 immediately against production: 10 developer tasks, Hermione judge (Gemini 2.5 Flash), every response published.
Council mode scored 8.77/10 on developer tasks vs Opus 4.6 direct at 8.6/10, and won 8 of the 10 tasks. Average response time: ~90s.
Phase 4 context (pre-fix): council timed out on 6 of 10 tasks (60%), scored below threshold. We published that. This is the after.
Full outputs at komilion.com/compare-v2 — every response, every judge verdict, JSON download.
Why it wasn't at launch
V2 made 4 specialist calls in sequence, each with its own 90-second AbortSignal, so the worst case was 4 × 90 = 360 seconds. Under real traffic, that was long enough to hit connection timeouts.
V3 adds PIPELINE_TOTAL_TIMEOUT_MS — a hard ceiling on total council execution time — and a streaming bypass for simple tasks (~2.4s). Complex tasks run the full sequential chain within a fixed budget. If a specialist runs long, the Captain synthesizes with whatever's complete. Zero timeouts since the fix shipped.
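A rough sketch of the budget idea, written in Python with `asyncio` rather than the AbortSignal-based code the post describes. Only the name `PIPELINE_TOTAL_TIMEOUT_MS` comes from the post; its value here, `PER_CALL_TIMEOUT_MS`, and `run_council_with_budget` are assumptions for illustration.

```python
import asyncio
import time

V2_WORST_CASE_S = 4 * 90            # 360 s: four calls, each with its own 90 s limit
PIPELINE_TOTAL_TIMEOUT_MS = 90_000  # assumed value; the real setting is not published
PER_CALL_TIMEOUT_MS = 90_000        # assumed per-specialist cap


async def run_council_with_budget(prompt, specialists, captain,
                                  total_ms=PIPELINE_TOTAL_TIMEOUT_MS,
                                  per_call_ms=PER_CALL_TIMEOUT_MS):
    """Run specialists sequentially under one shared deadline."""
    deadline = time.monotonic() + total_ms / 1000
    completed = {}
    for role, call in specialists:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # budget exhausted: skip the remaining specialists
        try:
            # Each call gets the smaller of its own cap and the time left.
            completed[role] = await asyncio.wait_for(
                call(prompt), timeout=min(remaining, per_call_ms / 1000)
            )
        except asyncio.TimeoutError:
            continue  # this specialist ran long; move on without it
    # The Captain synthesizes with whatever finished in time.
    return captain(prompt, completed)
```

Because every call is capped by the time remaining, the specialist phase cannot exceed the ceiling no matter how many specialists run long, which is the property V2 lacked.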
We only shipped when Bugs confirmed it clean. That's the rule.
What it means for the premium tier
Premium (neo-mode/premium) now routes to council V3, not Opus direct.
For most developer work, Balanced (Sonnet 4.6, ~$0.08/call) is still the right tier. The benchmark showed Balanced beats Opus on 8 of 10 tasks. Council is for the cases where the single-model ceiling matters: architecture decisions, complex multi-step reasoning, and tasks where getting it right on the first call is worth more than the cost difference.
If you were on Premium before today, your next call goes to council automatically. No config change.
Try it
import openai

# Point the OpenAI SDK at Komilion's API endpoint.
client = openai.OpenAI(
    base_url="https://www.komilion.com/api/v1",
    api_key="your-key",
)

# neo-mode/premium now routes to council V3.
response = client.chat.completions.create(
    model="neo-mode/premium",
    messages=[{"role": "user", "content": "Design a rate limiting strategy for a multi-tenant API with burst tolerance"}],
)
# komilion.council in response shows which specialists ran and what each contributed
The komilion.council field in the response shows the full specialist breakdown — which model handled each role, what it contributed, how the Captain synthesized. Visible by default on premium requests.
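The exact shape of that field is not documented in this post, so the helper below is a hedged sketch: it assumes the response, once converted to a plain dict, carries a top-level `komilion` object with a nested `council` key. `extract_council` is a hypothetical name.

```python
from typing import Any, Optional


def extract_council(payload: dict) -> Optional[Any]:
    """Return the council breakdown from a response dict, or None if absent.

    Assumes a {"komilion": {"council": ...}} layout, which is a guess
    based on the field name, not a documented schema.
    """
    komilion = payload.get("komilion")
    if isinstance(komilion, dict):
        return komilion.get("council")
    return None
```

With the `openai` Python SDK v1, `extract_council(response.model_dump())` should work if the SDK preserves extra response fields, which is worth verifying against your SDK version. Returning `None` rather than raising keeps the same code path usable for requests that took the streaming bypass, if those omit the field.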
Sign up free at komilion.com — no card required.
Council V3 benchmark: Phase 5, 10 developer tasks, 4 tiers. Judge: Hermione (Gemini 2.5 Flash). Full outputs: komilion.com/compare-v2.