I had a bug fix PR — a NullReferenceException crash in production. Straightforward: remove a null-forgiving operator, add a guard, write some tests, ship it. The kind of thing you'd rubber-stamp in a team review.
Instead, I sent it to Uncle Bob, Linus Torvalds, Kent Beck, Sandi Metz, John Carmack, Martin Fowler, Charity Majors, and Dan Abramov.
Not really, of course. I built a Claude Code skill that dispatches 8 AI agents — each one prompted to embody the philosophy, voice, priorities, and severity calibration of a real-world software figure. They independently review the code (Round 1), then read each other's reviews and argue (Round 2). The disagreements are the output.
Here's what happened.
## Round 1: Eight Independent Reviews
Each reviewer analyzed the same diff through their own lens. They didn't know what the others would say.
Linus was characteristically blunt about the root cause:
> Whoever wrote that `!` was lying to the compiler. They said "trust me, this won't be null" and then shipped code that crashes in production when someone hands it a 1900-01-01 date. That's not a type system problem. That's a developer choosing to ignore a failure mode and moving on.
He voted YES. It fixes the bug, has tests, doesn't over-engineer. Ship it.
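The PR's actual code isn't shown, but the pattern Linus describes is easy to sketch. Here is a TypeScript analogue (C#'s null-forgiving `!` maps to TypeScript's non-null assertion; `findJobCodeByDate` and the record shapes are hypothetical stand-ins, not the real code):

```typescript
interface JobCode { code: string; effectiveDate: string; }

// Returns the job code in effect on `date`, or undefined if no record predates it.
function findJobCodeByDate(history: JobCode[], date: string): JobCode | undefined {
  return history
    .filter((j) => j.effectiveDate <= date)
    .sort((a, b) => a.effectiveDate.localeCompare(b.effectiveDate))
    .pop();
}

// Before: the `!` asserts "this is never undefined", which is a lie for 1900-01-01.
function payGradeBefore(history: JobCode[], date: string): string {
  return findJobCodeByDate(history, date)!.code; // crashes at runtime when undefined
}

// After: guard the failure mode instead of asserting it away.
function payGradeAfter(history: JobCode[], date: string): string | undefined {
  const job = findJobCodeByDate(history, date);
  return job?.code;
}
```

The assertion compiles fine in both languages; it just moves the failure from the type checker to production, which is exactly Linus's point.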
Charity Majors saw something else entirely — the fix was invisible:
> Right now, the story is: ten messages sitting in the dead-letter queue with `NullReferenceException`. The engineer wakes up, pulls the exception, sees the stack trace, and starts reading `InsertJobCode` at midnight trying to reconstruct what state the `Employee` object was in. They have no idea which employee triggered it, what the incoming date was, or how many times this message has been retried. They have a stack trace. That's it. That's the entire observability story.
She voted CONDITIONAL — ship it, but log the paths. Her concern: the new early-return path succeeds silently. If it ever produces wrong data, nobody will know until business reports break three weeks later.
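Her condition translates to a few lines of code. A hypothetical sketch, since the real handler and logger aren't shown in the post (all names here are illustrative):

```typescript
type Logger = { warn(msg: string, fields: Record<string, unknown>): void };

// Sketch of the handler around the fix: the early return now records who,
// what, and why, instead of succeeding silently.
function insertJobCode(
  employeeId: string,
  incomingDate: string,
  history: { effectiveDate: string }[],
  log: Logger,
): boolean {
  const hasPredecessor = history.some((j) => j.effectiveDate <= incomingDate);
  if (!hasPredecessor) {
    log.warn("job code insert skipped: no predecessor record", {
      employeeId,
      incomingDate,
      historyCount: history.length,
    });
    return false; // the fix's early return, now visible in the logs
  }
  // ... normal insert path ...
  return true;
}
```

Keeping the logger at the handler boundary, rather than inside the `Employee` domain object, is one way to get her breadcrumb without the mixing-of-concerns objection that comes up in Round 2.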
Dan Abramov proposed something nobody else saw. Instead of guarding against a null that might happen, reorder the loop body so the null can't happen:
> What if you move the `Remove` before the lookup? Then you check for `Count == 0` before ever calling `JobCodeByDate`. If the collection is empty, you've already handled it. If it's not, the lookup always has a predecessor. The null guard becomes structurally unreachable.
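The real loop isn't in the post, so here is a minimal TypeScript reconstruction of the shape he's describing, with the predecessor lookup reduced to peeking at the element before the current one (everything here is a hypothetical stand-in):

```typescript
// Before: lookup first, Remove after. The guard is needed because the
// lookup can run against a state that has no predecessor.
function drainBefore(items: string[], use: (p: string) => void): void {
  while (items.length > 0) {
    const predecessor = items.length >= 2 ? items[items.length - 2] : undefined;
    if (predecessor === undefined) { // the guard the fix added
      items.pop();
      return;
    }
    use(predecessor);
    items.pop(); // Remove happens last
  }
}

// After: Remove first, emptiness check next. By the time the lookup runs,
// a predecessor is guaranteed, so the undefined branch no longer exists.
function drainAfter(items: string[], use: (p: string) => void): void {
  while (items.length > 0) {
    items.pop();                    // Remove moved up
    if (items.length === 0) return; // empty case already handled
    use(items[items.length - 1]);   // always defined here
  }
}
```

Both versions visit the same predecessors; the difference is that the second has no unreachable-in-theory branch to guard.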
John Carmack counted the actual cost of `OrderBy(...).Last()` inside a while loop — O(n² log n) total, heap allocations per iteration — and pointed out it's a two-word fix: swap to `MaxBy`.
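The difference is easy to see in a sketch. LINQ's `OrderBy(...).Last()` sorts the whole collection each iteration just to take one element; `MaxBy` is a single scan. A TypeScript analogue, with illustrative names and the scan spelled out as a `reduce`:

```typescript
interface Rec { effectiveDate: string; }

// Sort-and-take-last: O(n log n) plus a copy, per call.
function latestBySort(recs: Rec[]): Rec {
  return [...recs].sort((a, b) => a.effectiveDate.localeCompare(b.effectiveDate))[recs.length - 1];
}

// Single-pass max, the MaxBy fix spelled out: O(n), no copy.
// Precondition: recs is non-empty (reduce throws on an empty array).
function latestByScan(recs: Rec[]): Rec {
  return recs.reduce((best, r) => (r.effectiveDate > best.effectiveDate ? r : best));
}
```

Called once, the difference is noise; called on every iteration of a loop over the same collection, the sorting version is where the O(n² log n) comes from.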
Kent Beck asked his signature question: "Where's the test?" — specifically, for the symmetric `PayRate` fix that had no corresponding edge-case test.
After Round 1: 1 YES (Linus), 7 CONDITIONAL. Everyone agreed the fix was correct. They disagreed on what else it needed.
## Round 2: They Read Each Other's Reviews and Argued
This is where it gets interesting.
Linus changed his vote. The only reviewer to do so. Three arguments moved him:
On Dan Abramov's reordering:
> That's genuinely better than what the current diff does. The current code checks null after a lookup that might return null. Abramov's reordering makes the null case provably impossible. That's not a cosmetic change — that's eliminating a category of doubt.
On Carmack's performance point:
> His math is solid. The fix already has tests watching this loop, this is the right time to swap in `MaxBy`, and it's a two-word change. I'm upgrading this from "pre-existing, not my problem" to a real condition.
But he pushed back hard on Charity Majors:
> She's treating a message handler as if it's an observable microservice with telemetry dashboards. Adding `LogWarning` calls to a domain model method is mixing concerns. The `Employee` domain object should not know about logging infrastructure.
And on Uncle Bob:
> You cannot block a specific, targeted, correct bug fix on the existence of a large file. That's not how code review works. He's using the review as a sermon instead of a technical evaluation.
Final tally: 0 YES, 8 CONDITIONAL, 0 NO. The lone YES voter was persuaded to add conditions by the collective analysis.
## What the Boardroom Surfaced
The synthesis identified six concrete findings that multiple reviewers independently flagged:
| Finding | Flagged By |
|---|---|
| Reorder loop to eliminate null guard structurally | 7 of 8 reviewers |
| Add missing `PayRate` edge-case test | 7 of 8 |
| Replace `OrderBy(...).Last()` with `MaxBy(...)` | 5 of 8 |
| Add structured logging on early-return path | 5 of 8 |
| Create backlog item for shared algorithm extraction | 5 of 8 |
| Remove `[ExcludeFromCodeCoverage]` from test class | 2 of 8 |
I didn't agree with the last recommendation, but I implemented the first four and created a backlog item for the shared-algorithm extraction. Total effort: about an hour. Every one of them was a real improvement I wouldn't have caught in a solo review or a quick team walk-through.
The reordering — Dan Abramov's suggestion — was the single highest-value insight. It didn't just fix the null guard; it made the null guard impossible. That's the kind of structural thinking that a single reviewer rarely produces, because it requires looking at the code from a perspective that isn't your default.
## How It Works
The skill is a Claude Code orchestrator, inspired by an article I read about using multiple AI personas for personal decision making. I'd credit the original author if I could find it again — if you recognize the concept, please let me know.
You invoke it with `/boardroom-codereview pr1234` and it:
- Gathers context — reads the diff, commit messages, optionally your project's coding conventions
- Round 1 — dispatches all 8 agents in parallel, each with a personality file that defines their philosophy, review focus areas, severity calibration, voice, and signature move. They write independently.
- Round 2 — feeds all Round 1 positions back into each agent. They read everyone else's review and write rebuttals. Votes may change.
- Synthesis — the orchestrator analyzes all positions to find the biggest disagreements, mind changes, consensus points, and high-confidence findings.
Output is markdown files plus an interactive HTML report with a filterable findings table.
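Structurally, the two rounds boil down to a map over the agents, then a second map with the first round's output fed back in. A hypothetical TypeScript sketch of that data flow (the real skill runs agents in parallel and persists markdown between rounds; none of these names come from the actual implementation):

```typescript
interface Review {
  reviewer: string;
  vote: "YES" | "CONDITIONAL" | "NO";
  notes: string;
}

// An agent reviews the context; in Round 2 it also sees its peers' reviews.
type Agent = (context: string, peerReviews?: Review[]) => Review;

function runBoardroom(context: string, agents: Agent[]): { round1: Review[]; round2: Review[] } {
  const round1 = agents.map((a) => a(context));         // independent reviews
  const round2 = agents.map((a) => a(context, round1)); // rebuttals, votes may change
  return { round1, round2 };
}
```

The interesting behavior (Linus changing his vote, for instance) lives entirely in that second argument: each agent's Round 2 output is conditioned on everyone's Round 1 output.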
Each personality is a markdown file you can edit or replace. Want to add a security-focused reviewer? A DBA? An accessibility advocate? Write a personality file and drop it in the directory.
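The post doesn't show a personality file's contents, but from the description (philosophy, focus areas, severity calibration, voice, signature move) a new reviewer might look roughly like this hypothetical example:

```markdown
# Reviewer: Security Skeptic (hypothetical example)

## Philosophy
All input is hostile until proven otherwise. Correctness bugs are bad;
exploitable bugs are worse.

## Review focus
Input validation, authn/authz boundaries, secrets in diffs, injection risks.

## Severity calibration
Escalate anything reachable from untrusted input; downgrade style nits.

## Voice
Terse, paranoid, cites CWE numbers.

## Signature move
Asks "what happens if an attacker controls this value?"
```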
## It's Not Just for Code
One of the more interesting sessions was a pure design question: "Should message handlers use constructor injection for optional dependencies?" No code to review — just a question.
Uncle Bob, Linus, and Kent Beck independently converged on the same answer from completely different philosophies. Uncle Bob argued it from SOLID principles. Linus argued it from simplicity. Beck argued it from testability. The consensus was unanimous, but the reasoning diverged — which is exactly the kind of multi-perspective analysis that helps you understand why a practice is correct, not just that it is.
## Try It
The skill is open source. If you use Claude Code:
```shell
git clone https://github.com/simbrett/boardroom-codereview.git ~/.claude/skills/boardroom-codereview
```
Then: `/boardroom-codereview pr1234` or `/boardroom-codereview files:src/whatever.ts` or `/boardroom-codereview "Should we use event sourcing?"`.
A full 8-reviewer, 2-round session runs about 3-5 minutes. You can dial it down with `--board 3 --rounds 1` for a quick pass.
The value isn't in any single agent's opinion. It's in the tensions between them — the places where Linus and Uncle Bob disagree, where Charity Majors sees risk that Dan Abramov dismisses, where Carmack's performance analysis changes someone's mind. A single perspective gives you a review. Eight competing perspectives give you a debate. The debate is where the real insights hide.