Harrison Guo

Posted on • Originally published at harrisonsec.com

Don't Pick One AI. Run Three Against Each Other.

The Problem Nobody Talks About

AI can write code, generate content, analyze data, design systems, and manage projects. It's getting better every month. The natural question: what's left for humans?

The wrong answer: "AI will replace us."
The other wrong answer: "AI is just a tool, nothing changes."

The right answer is uncomfortable: stop picking the best AI. Run multiple AIs in competition, and become the judge.

The Tournament Model

Three rules, learned the hard way:

  • Multiple advisors, competing opinions. Don't bind to one AI — its bias becomes yours. Three models running the same task surface blind spots no single model catches.
  • You decide. After the AIs argue, you make the call. Not the smartest model — you. The one with context they don't have.
  • Results judge everyone. Did the call work? Keep it. Did it fail? Learn and move on. Never blame the AI — you chose to follow that advice.

That's the operating system for the AI age.

In Practice: Three AIs in One Window

Theory is cheap. The reason most people don't run multiple AIs is friction — opening three terminals, signing in to three CLIs, and pasting the same prompt three times kills the loop after one day.

So I wrote two small scripts and one tmux config to remove the friction. They live in this repo: harrison001/ai-tournament.

script      | what it does
----------- | ------------------------------------------------------------
prj         | one command opens a tmux window with codex, claude, and gemini running side by side, plus a shell pane
tmx         | fzf picker to switch between tournament sessions
tmux.conf   | binds <prefix> b to broadcast — type once, all three AIs receive it
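The repo's actual `prj` script isn't reproduced here, but a minimal sketch of the idea looks like this. The `codex`, `claude`, and `gemini` commands and the pane layout are assumptions; each tmux command is echoed so the sketch is safe to run anywhere (drop the leading `echo` to launch for real):

```shell
#!/usr/bin/env sh
# Hypothetical sketch of a prj-style launcher; the real script in
# harrison001/ai-tournament may differ. Echoes each tmux command instead
# of running it, so this is safe to dry-run without tmux installed.
set -eu

prj() {
  session="${1:-ai-tournament}"
  echo tmux new-session -d -s "$session" codex      # pane 1: codex
  echo tmux split-window -h -t "$session" claude    # pane 2: claude
  echo tmux split-window -v -t "$session" gemini    # pane 3: gemini
  echo tmux split-window -v -t "$session"           # pane 4: plain shell
  echo tmux select-layout -t "$session" tiled       # even 2x2 grid
  echo tmux attach -t "$session"
}

prj "${1:-ai-tournament}"
```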

The workflow:

$ prj ai-tournament       # opens tmux: codex | claude | gemini | shell
$ <prefix> b              # broadcast ON — status bar turns red
$ <type the question>     # all three receive the same prompt
$ <prefix> b              # broadcast OFF, read each answer, pick the winner
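The broadcast toggle is just tmux's built-in synchronize-panes option. A binding along these lines does the job (a sketch, not necessarily the repo's exact tmux.conf; the red status indicator uses tmux's `pane_synchronized` format variable):

```
# ~/.tmux.conf (sketch): <prefix> b toggles keystroke broadcast to every
# pane in the current window, and reports the new state
bind b setw synchronize-panes \; display-message "broadcast #{?pane_synchronized,ON,OFF}"
```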

A real run

To test the loop I asked all three the same dumb question I had asked myself five minutes earlier:

"I want to wash my car. The car wash is 30 meters from my house. Should I drive or walk?"

Here is what came back, unedited:

[Screenshot: three AIs answering the same question in tiled tmux panes; codex and claude both say walk, gemini says drive]

  • codex: "Walk. For 30 meters, driving adds unnecessary startup, parking, and maneuvering, and it is barely faster unless you need to move supplies or the car itself into a wash bay."
  • claude: "Walk. At 30 meters, driving takes longer once you factor in starting the car, and you'd just be sitting in a freshly washed car driving it home through whatever's on the road. Walk over, drive back clean."
  • gemini: "You should drive. Although 30 meters is a very short distance to walk, your car needs to be physically present at the facility to be cleaned."

Two out of three confidently gave a wrong answer. They optimized for the surface question — short distance, walking is fine — and missed the function of a car wash. If I had asked only the most popular one, I would have walked over to wait in line for a service that requires a car.

Only gemini caught the obvious thing: the car has to be there.

This is what the tournament model is for. It is not "three AIs are smarter than one." Two of them were less smart than one. The point is the divergence becomes visible, and the human is the one who picks. With a single AI, you never see the disagreement — you just inherit whichever bias that model happened to have.

The car wash is a toy example. Replace it with "should we go gRPC, NATS, or HTTP for service-to-service?" and the same pattern holds — except the cost of picking the confident-but-wrong answer is no longer a wasted afternoon.

The Five Principles

1. Use Multiple AIs — Don't Bind to One

Claude, Gemini, GPT, Codex — they're all advisors. Each has strengths. Each has blind spots. Using only one AI is like having only one advisor: you inherit all their biases.

One AI:     The model's bias becomes your bias
Three AIs:  Biases cancel out, blind spots get covered

I write content using three AI models simultaneously. Same task, three outputs. I don't ask them to divide the work — I ask them to compete. The best output wins. The others get discarded.

This is not "AI-assisted writing." This is a tournament where AI models compete and the human judges.

2. Compete, Don't Divide

Most people who use multiple AIs assign each one a role: "Claude for writing, GPT for coding, Gemini for research." That's division of labor. It's a planned economy.

The tournament model is a market economy: same task to all, let results determine who's best.

Why competition beats division:

  • Division relies on your judgment of which AI is better at what — and that judgment is constantly wrong as models update
  • Competition is self-correcting — if GPT suddenly gets better at writing, it starts winning writing tasks. No reconfiguration needed
  • You don't need to solve the impossible problem of "which AI is best" — let them prove it through results
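In script form, "same task to all" is just a loop. A minimal sketch, assuming `codex`, `claude`, and `gemini` CLIs that read a prompt on stdin and print an answer (the I/O conventions are assumptions, and missing CLIs are skipped rather than failing):

```shell
#!/usr/bin/env sh
# Minimal tournament runner (sketch). Assumes codex/claude/gemini CLIs
# that read a prompt on stdin and print a reply; real flags may differ.
set -eu

prompt="Should we use gRPC, NATS, or HTTP for service-to-service?"

for model in codex claude gemini; do
  if command -v "$model" >/dev/null 2>&1; then
    # Same prompt to every contestant; save each answer for human judging
    printf '%s\n' "$prompt" | "$model" > "answer-$model.txt"
  else
    echo "($model CLI not installed; skipped)" > "answer-$model.txt"
  fi
done

# The human reads all three answers and picks the winner
ls answer-*.txt
```

The point of keeping the outputs side by side in files is the same as the tmux layout: the disagreement stays visible instead of being averaged away.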

3. The Human Decides — Judgment Is Not Outsourceable

AI can analyze. AI can generate options. AI can evaluate tradeoffs. What AI cannot do: decide which tradeoff matters in this specific context for this specific person with these specific constraints.

Three capabilities make human judgment irreplaceable:

Insight — Knowing what question to ask. AI can answer any question, but it can't know which question matters right now. Insight comes from understanding the problem deeply enough to ask the question that unlocks everything else.

Critical Thinking — Knowing when AI is wrong. AI gives confident, articulate answers regardless of accuracy. The human must evaluate: does this make sense? Is this consistent with what I know? Is there a blind spot?

Result Evaluation — Knowing if the outcome is good enough. AI can generate a technically correct solution that's wrong for your context. Only the human who understands the full picture — users, business constraints, team dynamics, market timing — can judge whether the output actually serves the goal.

These three form a loop:

Insight → Ask the right question
  ↓
AI gives analysis
  ↓
Critical Thinking → Is this analysis trustworthy?
  ↓
Choose and execute
  ↓
Result Evaluation → Did it work?
  ↓
Insight → Why did it work / not work? → Better questions next time

4. No Blind Faith, No Emotions — Results Are the Only Standard

Two temptations:

  • AI agrees with you → "See, I was right." (Confirmation bias)
  • AI disagrees with you → "AI doesn't understand my situation." (Emotional rejection)

The tournament model rejects both:

AI agrees with me    → Good, but does the result confirm it?
AI disagrees with me → Interesting. Let me verify before judging.
Made a choice        → Own the outcome. Right? Improve. Wrong? Learn. Never blame the AI.

Practice as the sole test of truth. Not who said it. Not how confident it sounded. Did it work?

5. Human Drives AI, Not the Other Way Around

AI is an amplifier. The question is: amplifying what?

No insight + good AI tools = efficiently producing mediocrity
Good insight + no AI tools = good ideas, slow execution
Good insight + tournament model = insight amplified 10x

The human provides:

  • Direction — what to work on (insight)
  • Quality standard — what "good" looks like (evaluation)
  • Context — the constraints AI doesn't see (judgment)
  • Accountability — willingness to own the outcome (leadership)

AI provides:

  • Speed — generate options fast
  • Breadth — consider more possibilities than a human can
  • Consistency — apply the same standard across large volumes
  • Knowledge — access more information than any person can hold

The human's role isn't to do AI's job slowly. It's to do the job AI can't do at all.

Applied to Real Work

Content Creation

The temptation: let AI generate content and publish automatically. Maximum output, minimum effort.

The result: a flood of mediocre, AI-flavored content. No differentiation. No personal perspective. Platforms and audiences both learn to ignore it.

The tournament approach:

  1. Three AI models generate competing drafts on the same topic
  2. The human evaluates: which captured the insight? Which missed the point?
  3. The winning draft gets refined — the human adds what AI can't: personal experience, controversial opinion, industry context
  4. Publication decision: is this good enough to attach my name to?

The output isn't "AI content." It's human content, produced at AI speed.

Technical Decisions

The temptation: ask one AI "should I use vector databases for agent memory?" and follow its recommendation.

The result: you inherit that model's training bias. Claude might favor simplicity (it was trained by Anthropic, who chose Markdown files). GPT might favor complexity (it's aligned with enterprise patterns).

The tournament approach:

  1. Ask all three: "What are the tradeoffs between Markdown files, SQLite + vectors, and self-evolving skills for agent memory?"
  2. Each gives a different analysis weighted by its own biases
  3. The human evaluates against the actual constraints: deployment model, team size, user count, latency requirements
  4. The decision accounts for context that no AI has — your specific situation

Career Strategy

The temptation: "AI will replace developers, I need to switch careers."

The reality: AI replaces tasks, not roles. The question is which tasks become your competitive advantage.

For employees:  Agent engineering skills (the 90% problem) — because companies 
                have data and scenarios, but need people who can build reliable agents

For founders:   Data + scenario moats — because agent engineering can be hired,
                but proprietary data and deep domain knowledge can't

In both cases, the competitive advantage is insight — understanding what matters in your specific domain well enough to direct AI effectively.

The Anti-Patterns

Anti-Pattern                | Problem                             | Tournament Alternative
--------------------------- | ----------------------------------- | ---------------------------------
Only use one AI             | Single advisor's bias = your bias   | Multiple AIs competing
Follow AI blindly           | Lose judgment over time             | AI advises, human decides
Reject AI when it disagrees | Miss good ideas out of ego          | No emotions, evaluate by results
Automate everything         | No quality control, garbage output  | Human at quality gates
Treat AI as just a tool     | Waste AI's analytical capability    | Treat AIs as competing advisors

The Test

Here's how to know if you're using AI well:

Bad sign: You can't explain why you chose AI's suggestion over the alternatives.
Good sign: You can articulate the tradeoff — what you gained and what you gave up.

Bad sign: You use the same AI for everything.
Good sign: You use different AIs for the same task and pick the best output.

Bad sign: You haven't disagreed with AI in the past week.
Good sign: You regularly override AI when your insight says otherwise — and you're right more than you're wrong.

Bad sign: You can't tell the difference between AI output and human output.
Good sign: You use AI for speed and breadth, then add what only you can: context, judgment, and accountability.

One Sentence

In the AI age, run AIs like a tournament: many compete, you decide, results judge everyone. Your insight is the one thing that scales with AI instead of being replaced by it.


Part of the AI Agent Architecture series. For the technical deep dive behind these ideas: The 90% Problem and Claude Code Deep Dive.
