I gave myself an AI advisory board — three models argue, I decide

#ai #llm #productivity #tooling

Most "use AI for your work" advice stops at ask one model, take the answer. That's a single perspective delivered in a confident voice — and confidence is exactly what you shouldn't trust on a decision that matters. So I stopped asking one model anything important. I built a small advisory board instead: three models, three roles, run in parallel, and I'm the one who decides at the end.

Here's the setup and what it actually buys you.

The harness

I drive everything from Claude Code as the orchestrator. A single skill — I call it triad — takes a brief and fans it out to three workers, each with a fixed role:

The business analyst (Gemini, via the gemini-cli bridge) — market, demand, positioning, "would anyone care."
The architect (Opus) — structure, trade-offs, failure modes, "what breaks at scale."
The builder (Codex, in the background) — the concrete implementation pass, "what would this actually cost to ship."

They don't see each other's output. That's the whole point: I want three independent reads, not a consensus that formed because one model anchored the others. The orchestrator collects all three, then does a synthesis pass that lays their reasoning side by side and flags where they disagree.

Why parallel and blind matters

The value isn't in the agreement. When all three land in the same place, fine — low-information. The signal is in the disagreement: the architect says "this couples two systems you'll regret," the analyst says "nobody asked for this," the builder says "three days, not three hours." Three different failure modes I'd never have surfaced by iterating with one model in a single thread, where each follow-up just reinforces the first frame.

A board meeting that saved me six months

A real one, lightly abstracted: an automated energy-arbitrage bot I was genuinely excited about — the kind of project you can already picture running. I briefed the panel half-expecting a green light. Instead they took it apart from three sides: the analyst found the demand wasn't really there, the architect showed the edge evaporated once you priced in the moving parts, and the builder put a number on the maintenance that would have quietly eaten my weekends for a year. None of them forbade anything — they just laid the decision out so clearly it made itself. I shelved it, and never spent the six months of development I'd otherwise have sunk into learning that the hard way.

What lands on my desk

The synthesis isn't a verdict — it's the three reads stacked up with the seams showing:

Project X — synthesis

Analyst: market is thinner than I assumed — maybe a few hundred users, not thousands.

Architect: turns a stateless tool into a service; you now own uptime.

Builder: ~10 days, not the weekend I'd pictured.

Where they split: analyst sees low value, builder says it's cheap to try — so it's a small experiment, not a bet.

I read that and decide. The disagreement line is the part I'm actually paying for.

Gotchas I hit

False consensus. Models trained on overlapping data agree for the wrong reasons. If you only count votes, you'll mistake correlation for confirmation. Read the reasoning, not the verdict.
Sycophancy leaks in through the brief. If your prompt smuggles the answer you want, all three will hand it back. Keep the brief neutral; let them fight.
It's expensive and slow. Minutes and three model calls, not a one-second answer. This is for decisions, not lookups — I don't run a triad to rename a variable.
Synthesis is where you can cheat yourself. The merge step is tempting to skim. The disagreements are the product; if you collapse them too early you've just paid three times for one opinion.

The one rule that makes it safe

The board advises. It researches, stress-tests, and prepares the ground from several angles. It does not decide. The decision — and the accountability for it — is always mine, a human's. That's not a disclaimer; it's the operating principle. An LLM ensemble is a fantastic way to see a problem from more sides than you could alone. It is a terrible thing to outsource judgment to. Keep the human as the synthesizer and the decider, and the whole pattern gets stronger, not weaker.

I've been running this for about six months now, and often — it's become my default for anything that would cost real time to get wrong.

If you take one thing: don't ask one model and trust the confident voice. Make a few of them argue, read where they break apart, and then you decide.