YuhaoLin2005

Posted on Jun 27 • Edited on Jul 10

I Built a Dual-Pool Adversarial Review System for AI Agents — And It Actually Works

#opensource #ai #claude #codereview

AI code review has a problem: abstract roles produce generic feedback. "Saboteur" says "add error handling." "New Hire" says "this is confusing." Useful? Sometimes. Specific? Rarely.

I built something different: a review system that uses real engineers with searchable principles instead of abstract roles. Linus Torvalds doesn't say "consider error handling" — he says "eliminate the special case entirely." That's not a wording difference. That's a completely different action.

The Core Idea: Two Pools, Cross-Orchestrated

Fixed Pool: matched to the user's digital twin, stable, deep. Random Pool: web-searched fresh each session for surprise coverage. The two are cross-orchestrated, so explore meets exploit.

Fixed Pool

9 workers + 2 managers, curated to match the user's expertise and goals. Patty McCord (Netflix's former Chief Talent Officer) and Ed Catmull (Pixar's Braintrust creator) serve as managers who recruit teams per task.

Random Pool

Fresh personas via web search each session. The manager defines search keywords based on what the task needs. This is where surprises come from.
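
To make the two pools concrete, here is a minimal sketch of how they could be modeled. It is an illustration, not the skill's actual implementation: `Persona`, `build_random_pool`, the keyword strings, and the injected `search` callable are all hypothetical, and `stub_search` stands in for a real web search so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Persona:
    name: str
    role: str  # "manager" or "worker" (hypothetical labels)
    pool: str  # "fixed" or "random"


# Fixed pool: curated once and reused across sessions.
# Only the two managers are named in the post, so workers are left out here.
FIXED_MANAGERS = [
    Persona("Patty McCord", "manager", "fixed"),
    Persona("Ed Catmull", "manager", "fixed"),
]


def build_random_pool(
    keywords: List[str],
    search: Callable[[str], List[str]],
) -> List[Persona]:
    """Build a fresh random pool from manager-chosen search keywords."""
    seen = set()
    pool = []
    for keyword in keywords:
        for name in search(keyword):
            if name not in seen:  # never recruit the same person twice
                seen.add(name)
                pool.append(Persona(name, "worker", "random"))
    return pool


def stub_search(keyword: str) -> List[str]:
    # Stand-in for a real web search: canned names per (made-up) keyword.
    canned = {
        "docs positioning": ["Spolsky", "DuVander"],
        "first impression": ["DuVander"],
    }
    return canned.get(keyword, [])


random_pool = build_random_pool(["docs positioning", "first impression"], stub_search)
print([p.name for p in random_pool])  # ['Spolsky', 'DuVander']
```

Injecting `search` keeps the pool logic testable without network access; a real session would pass in whatever search tool the agent has.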

How One Round Works

Manager drawn from the pool
Manager analyzes task → decides depth + required roles
Manager recruits 2 engineers + 1 product/designer
Team reviews — each person searches their own principles, extracts quotes FIRST, then reviews through ONLY those quotes
Output: findings mapped to cited quotes, cross-persona concurrences promoted
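
The steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the skill's real code: `run_round`, `promoted`, the stub reviewers, and the "two or more personas" promotion threshold are all hypothetical, and the quotes are placeholders rather than real citations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    reviewer: str
    issue: str  # short identifier for the problem
    quote: str  # the pre-extracted quote the finding is mapped to


def run_round(
    team: List[str],
    extract_quotes: Callable[[str], List[str]],
    review: Callable[[str, List[str]], List[Finding]],
) -> Dict[str, List[Finding]]:
    """One round: extract quotes first, then review through only those quotes."""
    by_issue: Dict[str, List[Finding]] = defaultdict(list)
    for reviewer in team:
        quotes = extract_quotes(reviewer)  # quotes come BEFORE any reviewing
        for finding in review(reviewer, quotes):
            if finding.quote not in quotes:
                continue  # reject findings citing a quote not extracted up front
            by_issue[finding.issue].append(finding)
    return dict(by_issue)


def promoted(by_issue: Dict[str, List[Finding]]) -> List[str]:
    """Issues raised by two or more distinct personas (assumed threshold)."""
    return sorted(
        issue
        for issue, findings in by_issue.items()
        if len({f.reviewer for f in findings}) >= 2
    )


# Stub team: 2 engineers + 1 product/designer, with placeholder quotes.
QUOTES = {
    "engineer_a": ["placeholder quote A"],
    "engineer_b": ["placeholder quote B"],
    "designer_c": ["placeholder quote C"],
}


def stub_extract(reviewer: str) -> List[str]:
    return QUOTES[reviewer]


def stub_review(reviewer: str, quotes: List[str]) -> List[Finding]:
    if reviewer == "designer_c":
        return [Finding(reviewer, "unclear-install-steps", quotes[0])]
    return [Finding(reviewer, "missing-file-reference", quotes[0])]


result = run_round(list(QUOTES), stub_extract, stub_review)
print(promoted(result))  # ['missing-file-reference']
```

Passing the extracted quotes into `review` and rejecting anything cited from outside that list is one way to enforce the quote-first ordering mechanically instead of relying on the prompt alone.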

Key rule: findings must cite specific quotes. Zero findings requires 3+ quotes the code successfully satisfies. This symmetric burden prevents both fake findings AND lazy "everything looks fine."
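
As a rough illustration, the symmetric burden can be written as a small validator. The thresholds come from the post and my reply in the comments (1+ quote per finding, 3+ satisfied quotes for a clean review); the `Review` and `Finding` classes and `check_burden` are hypothetical names for this sketch, not part of the published skill.

```python
from dataclasses import dataclass, field
from typing import List

MIN_QUOTES_PER_FINDING = 1  # every finding must cite at least one quote
MIN_QUOTES_FOR_CLEAN = 3    # "no findings" must cite at least three satisfied quotes


@dataclass
class Finding:
    description: str
    cited_quotes: List[str] = field(default_factory=list)


@dataclass
class Review:
    reviewer: str
    findings: List[Finding] = field(default_factory=list)
    satisfied_quotes: List[str] = field(default_factory=list)


def check_burden(review: Review) -> List[str]:
    """Return rule violations; an empty list means the review is acceptable."""
    problems = []
    if not review.findings and len(review.satisfied_quotes) < MIN_QUOTES_FOR_CLEAN:
        problems.append(
            f"{review.reviewer}: zero findings needs "
            f"{MIN_QUOTES_FOR_CLEAN}+ satisfied quotes"
        )
    for finding in review.findings:
        if len(finding.cited_quotes) < MIN_QUOTES_PER_FINDING:
            problems.append(
                f"{review.reviewer}: finding '{finding.description}' cites no quote"
            )
    return problems


lazy_clean = Review("reviewer_1")  # "everything looks fine", no evidence
earned_clean = Review("reviewer_2", satisfied_quotes=["q1", "q2", "q3"])
unsupported = Review("reviewer_3", findings=[Finding("add error handling")])

for review in (lazy_clean, earned_clean, unsupported):
    print(review.reviewer, len(check_burden(review)))
# reviewer_1 1
# reviewer_2 0
# reviewer_3 1
```

Both the lazy pass and the uncited finding fail, which is the point: neither side of the verdict is free.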

The System Reviewed Itself — And Found 16 Issues

After receiving community feedback (thanks Nazar Boyko!), I updated the skill and ran it through its own adversarial review. The result: 16 issues in my own skill file, 1 Critical and 6 High severity among them, including:

"Credible-only findings" was actually a loophole, not a guardrail
The quote citation rule incentivized retrofitting quotes to pre-formed opinions
The skill referenced a non-existent file — structurally broken
"Intercom PM" wasn't a named person — broke the skill's own premise
Step 0's "read twice top-to-bottom" reinforced the author's mental model instead of breaking it

All 16 issues are fixed in the live PR. The review system reviewing itself and finding structural flaws in its own design is the strongest validation I could ask for.

Real Validation Data

Tested on my PR to alirezarezvani/claude-skills (18.7K stars):

Round 1 (Fixed/McCord): 10 findings — structure, format, adoption gaps
Round 2 (Fixed/Catmull): 8 findings — clarity, edge cases, UX
Round 3 (Random/Spolsky+DuVander): 3 findings — positioning, first impression

The random pool found things both fixed-pool rounds completely missed. Fixed pool reviewers — who know me — were blind to how an outsider would perceive the skill.

Key Innovations vs Existing Systems

Dimension	adversarial-reviewer	adversarial-ai-review	This System
Reviewers	Abstract roles	Domain agents	Real people + cited principles
Team formation	Fixed 3-template	22 agent pairs	Manager-curated per task
Cross-round	Rotate roles	Same agent set	Swap pool + manager + workers
Personalization	None	None	Digital twin matching
Evolution	Static	Static	Promote/demote/audit cycle

Risk vs Reward

Risk: Web search per persona costs tokens. Quote extraction takes time. Not worth it for single-line typo fixes.

Reward: For multi-file PRs, architecture changes, or anything security-critical — the 3-round review catches issues that abstract roles and single reviewers miss. The random pool is the highest-leverage component: outsiders see what insiders are blind to.

Mitigation: Triage system routes small changes to 1 round, large changes to 2-3. Don't use a sledgehammer on a nail.
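
A triage rule like this could look roughly as follows. The post only says small changes get 1 round and large ones get 2-3; the specific cutoffs below (file and line counts) and the `rounds_for_change` function are assumptions made up for illustration, so tune them to your own repos.

```python
def rounds_for_change(
    files_changed: int,
    lines_changed: int,
    security_critical: bool = False,
) -> int:
    """Pick how many review rounds a change gets (illustrative thresholds)."""
    if security_critical:
        return 3  # anything security-critical gets the full review
    if files_changed <= 1 and lines_changed <= 20:
        return 1  # small, single-file change: one round is enough
    if files_changed <= 5 and lines_changed <= 300:
        return 2  # medium change
    return 3      # multi-file PR or architecture change


print(rounds_for_change(1, 1))          # 1
print(rounds_for_change(3, 120))        # 2
print(rounds_for_change(12, 900))       # 3
print(rounds_for_change(1, 5, True))    # 3
```

Checking the security flag first means a tiny diff to an auth path still gets all three rounds, regardless of its size.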

Open Source (MIT)

github.com/YuhaoLin2005/dual-pool-review
PR #866 — Installable skill (updated with all 16 fixes)

What I Learned

Real principles > abstract roles. "What would Torvalds say?" produces different code than "be more defensive."
Managers matter more than workers. McCord replacing one designer with another was the highest-leverage decision.
Random pools catch what fixed pools can't. Outsiders see blind spots.
Quote-first review is essential. Extracting quotes before reviewing prevents confirmation bias.
A system must review itself. The system finding 16 issues in its own design is the proof.

Related: I later applied this same "vet before trust" pattern to installing AI agent skills — the same architecture, different domain.

What's your ratio of fixed-pool to random-pool reviewers? I found 60/40 works best. If you've experimented with adversarial review — even manually — drop your setup in the comments.

Chinese version: Juejin/YuhaoLin2005yhl · Code on GitHub

🤖 Fact-checked 2026-07-10: GitHub PR status verified against API.

Top comments (2)

Nazar Boyko • Jun 27

The "each person must find at least one issue" rule is the bit I'd worry about. On genuinely clean code that quota forces every reviewer to invent something, and invented findings are exactly the generic noise ("add error handling") you set out to escape. How do you tell a finding that's actually real from one that only exists because the persona was required to produce it? The two-pool idea is clever, I just think forcing everyone to find something works against the specificity that's the whole point.

YuhaoLin2005 • Jun 28

Thanks, you were right. "Must find at least one" forces invented findings, which is exactly the noise the system tries to escape.

Your comment directly led to two changes:

Symmetric burden: Findings need 1+ cited quote. Zero findings needs 3+ quotes the code successfully satisfies. Non-findings are now equally expensive to claim: no lazy "everything looks fine."
Quote-first review: Quotes extracted BEFORE reviewing. Review through ONLY pre-extracted quotes. No retrofitting quotes to pre-formed opinions.

Then I ran the system on itself. It found 16 issues in the skill file, including that "credible-only findings" was still a loophole. Fixed those too.

The irony: your critique made the review system review its own design, which found structural flaws in how it was handling exactly the problem you pointed out. I'm still new to this, and feedback like yours is exactly how I learn. Really appreciate it.