<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Baessi</title>
    <description>The latest articles on DEV Community by Baessi (@seunghunbae3svs).</description>
    <link>https://dev.to/seunghunbae3svs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3923607%2F94ef6d00-25b8-4de1-aa8e-7c325ebf6dc3.png</url>
      <title>DEV Community: Baessi</title>
      <link>https://dev.to/seunghunbae3svs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/seunghunbae3svs"/>
    <language>en</language>
    <item>
      <title>6 months solo on a multi-agent PR reviewer. 10.93 vs 3.80 blockers/PR (claude alone) on my benchmark — please test on real PRs and tell me where it's wrong</title>
      <dc:creator>Baessi</dc:creator>
      <pubDate>Sun, 10 May 2026 17:23:08 +0000</pubDate>
      <link>https://dev.to/seunghunbae3svs/6-months-solo-on-a-multi-agent-pr-reviewer-1093-vs-380-blockerspr-claude-alone-on-my-20d0</link>
      <guid>https://dev.to/seunghunbae3svs/6-months-solo-on-a-multi-agent-pr-reviewer-1093-vs-380-blockerspr-claude-alone-on-my-20d0</guid>
      <description>&lt;p&gt;TL;DR: I built a 3-LLM code reviewer in which Claude, GPT-5, and Gemini deliberate. On my synthetic-bug benchmark it reports roughly 3× as many blockers per PR as Claude alone (10.93 vs 3.80) at the same catch rate. But 15 synthetic PRs is not enough. I need YOUR PRs to validate or kill the hypothesis.&lt;/p&gt;

&lt;p&gt;Background:&lt;br&gt;
  Six months ago, Claude's solo reviews kept labeling things I considered blockers as "minor". I tried adding more models in parallel, plus a deliberation step. Results on my private corpus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude alone: 3.80 blockers/PR&lt;/li&gt;
&lt;li&gt;3-agent council: 10.93 blockers/PR&lt;/li&gt;
&lt;li&gt;Both: 100% catch rate on the synthetic bugs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern I found after debugging the gap: one model skips a missing test that another catches, and an issue Claude rates "minor" gets rated a blocker by Gemini. A single agent has no second perspective.&lt;/p&gt;
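
&lt;p&gt;To make that pattern concrete, here is a minimal sketch of how a union of findings with "highest severity wins" raises the blocker count. It is a simplified stand-in for deliberation, not Conclave's actual algorithm, and every name in it (Finding, mergeFindings, countBlockers, the issue keys) is hypothetical.&lt;/p&gt;

```typescript
// Hypothetical types: none of these names come from the conclave-ai codebase.
type Severity = "nit" | "minor" | "blocker";

interface Finding {
  agent: string;      // which reviewer reported the issue
  key: string;        // stable id for the issue, e.g. "file:rule"
  severity: Severity;
}

interface MergedFinding {
  key: string;
  severity: Severity;
  agents: string[];   // every reviewer that reported this issue
}

// Numeric rank per severity, and the reverse lookup by rank.
const RANK: { [s in Severity]: number } = { nit: 0, minor: 1, blocker: 2 };
const BY_RANK: Severity[] = ["nit", "minor", "blocker"];

// Assumption: issues are unioned across agents and the highest severity wins.
function mergeFindings(findings: Finding[]): MergedFinding[] {
  const merged: { [key: string]: MergedFinding } = {};
  for (const f of findings) {
    const existing = merged[f.key];
    if (existing === undefined) {
      merged[f.key] = { key: f.key, severity: f.severity, agents: [f.agent] };
    } else {
      const top = Math.max(RANK[existing.severity], RANK[f.severity]);
      existing.severity = BY_RANK[top];
      existing.agents.push(f.agent);
    }
  }
  return Object.values(merged);
}

// Count how many merged issues ended up as blockers.
function countBlockers(merged: MergedFinding[]): number {
  let n = 0;
  for (const m of merged) {
    if (m.severity === "blocker") n += 1;
  }
  return n;
}

// Usage: Claude alone would report zero blockers here; the merged view reports one.
const example = mergeFindings([
  { agent: "claude", key: "auth.ts:missing-test", severity: "minor" },
  { agent: "gemini", key: "auth.ts:missing-test", severity: "blocker" },
  { agent: "gpt", key: "routes.ts:unused-import", severity: "nit" },
]);
console.log(countBlockers(example)); // prints 1
```

&lt;p&gt;Under this rule the merged blocker count can never be lower than any single agent's count. That is also why a higher blockers/PR number alone cannot tell real blockers from inflated severity, and why false-positive reports matter.&lt;/p&gt;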

&lt;p&gt;The bigger feature is PRD-aware review: with a spec in .conclave/prd.md, the agents flag deviations from it as first-class blockers, such as scope creep, route mismatches, and forgotten acceptance criteria.&lt;/p&gt;

&lt;p&gt;What I need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run it on a real PR and tell me where it's wrong&lt;/li&gt;
&lt;li&gt;Compare it against your usual reviewer (Claude / Cursor / human)&lt;/li&gt;
&lt;li&gt;Send me false positives; I'll incorporate them into the federated failure catalog&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Demo (3 free/day): &lt;a href="https://conclave-ai.dev/#try" rel="noopener noreferrer"&gt;https://conclave-ai.dev/#try&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub App + BYO key = unlimited free&lt;/li&gt;
&lt;li&gt;CLI: npm i -g @conclave-ai/cli&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source-available (FSL-1.1-Apache-2.0): &lt;a href="https://github.com/seunghunbae-3svs/conclave-ai" rel="noopener noreferrer"&gt;https://github.com/seunghunbae-3svs/conclave-ai&lt;/a&gt;&lt;br&gt;
  Stack: TS / Node 20 / Cloudflare Workers + Containers + D1 / Mastra. 26 packages, 2691 tests.&lt;/p&gt;

&lt;p&gt;Limitations I know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Beta, things break&lt;/li&gt;
&lt;li&gt;Cost scaling on large diffs untested&lt;/li&gt;
&lt;li&gt;Spec-mismatch only useful if you maintain a PRD&lt;/li&gt;
&lt;li&gt;I'm one person + Claude pair-programming — bus factor 1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the numbers don't survive contact with real codebases, I want to know. Poke holes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/seunghunbae-3svs/conclave-ai" rel="noopener noreferrer"&gt;https://github.com/seunghunbae-3svs/conclave-ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
