<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sho_Ikeda</title>
    <description>The latest articles on DEV Community by Sho_Ikeda (@sho_ikeda).</description>
    <link>https://dev.to/sho_ikeda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860985%2F52fbd2d1-8997-456f-87a8-015f6689c0f8.png</url>
      <title>DEV Community: Sho_Ikeda</title>
      <link>https://dev.to/sho_ikeda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sho_ikeda"/>
    <language>en</language>
    <item>
      <title>Building Lysis: A Review Engine Where AI Models Collaborate and Evolve</title>
      <dc:creator>Sho_Ikeda</dc:creator>
      <pubDate>Sat, 04 Apr 2026 13:04:42 +0000</pubDate>
      <link>https://dev.to/sho_ikeda/building-lysis-a-review-engine-where-ai-models-collaborate-and-evolve-41b6</link>
      <guid>https://dev.to/sho_ikeda/building-lysis-a-review-engine-where-ai-models-collaborate-and-evolve-41b6</guid>
      <description>&lt;p&gt;AI reviews have a memory problem.&lt;/p&gt;

&lt;p&gt;They can catch a bug, flag a weak plan, or point out a vague call to action. But in the next run, the system often starts from zero again. The same issue gets rediscovered instead of becoming part of a stronger review process.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;Lysis&lt;/strong&gt; to close that loop.&lt;/p&gt;

&lt;p&gt;Lysis is an open-source review engine for AI-generated work. It reviews not only code, but also plans, marketing copy, and strategy documents. The core idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a different AI model should review the work&lt;/li&gt;
&lt;li&gt;repeated findings should become reusable checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you a review loop that does more than evaluate one output. It gets better over time.&lt;/p&gt;

&lt;h2&gt;The Problem: AI Reviews Have Amnesia&lt;/h2&gt;

&lt;p&gt;A lot of current AI review workflows are useful, but stateless.&lt;/p&gt;

&lt;p&gt;You generate something.&lt;br&gt;
You review it.&lt;br&gt;
You fix it.&lt;br&gt;
Then the next review starts fresh.&lt;/p&gt;

&lt;p&gt;That means the same class of issue can appear again and again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL injection patterns in generated code&lt;/li&gt;
&lt;li&gt;missing rollback plans in implementation proposals&lt;/li&gt;
&lt;li&gt;vague calls to action in marketing copy&lt;/li&gt;
&lt;li&gt;strategy documents with no exit criteria&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A human reviewer usually develops pattern recognition. After seeing the same issue a few times, they begin to catch it earlier and more reliably.&lt;/p&gt;

&lt;p&gt;Most AI review workflows do not.&lt;/p&gt;

&lt;p&gt;I wanted a system where repeated review findings would accumulate and harden into the process itself.&lt;/p&gt;
&lt;h2&gt;The Two Ideas Behind Lysis&lt;/h2&gt;

&lt;p&gt;Lysis is built on two ideas: &lt;strong&gt;collaboration&lt;/strong&gt; and &lt;strong&gt;evolution&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;1. Collaboration: Different Models, Different Blind Spots&lt;/h3&gt;

&lt;p&gt;If the same model writes and reviews the work, it often misses the same thing twice.&lt;/p&gt;

&lt;p&gt;Lysis works better when one model creates and another reviews. In the current setup, a common pairing is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Creator:&lt;/strong&gt; Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviewer:&lt;/strong&gt; Codex CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not because one model is universally better than the other. It is because they have different strengths and different blind spots.&lt;/p&gt;

&lt;p&gt;A separate reviewer gives the work a more independent pass.&lt;/p&gt;

&lt;p&gt;This applies beyond code. The same idea is useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;architecture and implementation plans&lt;/li&gt;
&lt;li&gt;marketing copy&lt;/li&gt;
&lt;li&gt;business proposals&lt;/li&gt;
&lt;li&gt;strategy documents&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Evolution: Every Finding Can Become a Reusable Check&lt;/h3&gt;

&lt;p&gt;Cross-model review is useful on its own, but it is still not enough if every run forgets the last one.&lt;/p&gt;

&lt;p&gt;So Lysis keeps track of findings using fingerprints such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;security::sql_injection&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;planning::missing_rollback&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;marketing::vague_cta&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the same pattern appears repeatedly, it can be promoted into a permanent check.&lt;/p&gt;

&lt;p&gt;The simplified flow looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review 1: security::sql_injection  -&amp;gt; logged
Review 2: security::sql_injection  -&amp;gt; logged
Review 3: security::sql_injection  -&amp;gt; logged -&amp;gt; threshold reached
Review 4+: similar issue -&amp;gt; caught immediately by permanent check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
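&lt;p&gt;As a minimal sketch, that promotion logic amounts to a counter with a threshold. The class name and default threshold below are illustrative assumptions, not Lysis's actual implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

class FindingStore:
    """Tracks finding fingerprints and promotes repeated ones to permanent checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold      # occurrences before promotion
        self.counts = Counter()         # fingerprint to times-seen count
        self.permanent_checks = set()   # fingerprints promoted to permanent checks

    def record(self, fingerprint):
        """Log one finding; promote its fingerprint once the threshold is reached."""
        self.counts[fingerprint] += 1
        if self.counts[fingerprint] == self.threshold:
            self.permanent_checks.add(fingerprint)

store = FindingStore(threshold=3)
for _ in range(3):
    store.record("security::sql_injection")
print("security::sql_injection" in store.permanent_checks)  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;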



&lt;p&gt;That is the part I care about most.&lt;/p&gt;

&lt;p&gt;I do not want review to be a one-shot judgment.&lt;br&gt;
I want review to become a system that learns from repeated mistakes.&lt;/p&gt;
&lt;h2&gt;What Lysis Reviews&lt;/h2&gt;

&lt;p&gt;Lysis is not limited to code review.&lt;/p&gt;

&lt;p&gt;It currently supports review flows for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Code implementation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Plans and architecture&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Marketing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strategy&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example commands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/lysis impl
/lysis planning
/lysis planning+marketing
/lysis planning+strategy
/lysis impl+ux src/app.tsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
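&lt;p&gt;The &lt;code&gt;+&lt;/code&gt; syntax combines review scopes, and an optional trailing path narrows the target. A hedged sketch of how such a scope string could be split; the scope set and function are illustrative, not the repo's actual parser:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;KNOWN_SCOPES = {"impl", "planning", "marketing", "strategy", "ux"}  # illustrative set

def parse_scope_spec(args):
    """Split a spec like 'impl+ux src/app.tsx' into scopes and an optional target."""
    parts = args.split()
    scopes = parts[0].split("+")
    target = parts[1] if len(parts) == 2 else None
    unknown = [s for s in scopes if s not in KNOWN_SCOPES]
    if unknown:
        raise ValueError("unknown scope(s): " + ", ".join(unknown))
    return scopes, target

print(parse_scope_spec("impl+ux src/app.tsx"))  # (['impl', 'ux'], 'src/app.tsx')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;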



&lt;p&gt;The idea is that AI-generated work in any of these areas can benefit from a loop of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;create&lt;/li&gt;
&lt;li&gt;review&lt;/li&gt;
&lt;li&gt;fix or escalate&lt;/li&gt;
&lt;li&gt;learn&lt;/li&gt;
&lt;/ol&gt;
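&lt;p&gt;That cycle can be sketched as a single driver function. The callables here are stand-ins for whatever creator, reviewer, and learning store are wired in; this illustrates the loop's shape, not Lysis's real API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def review_loop(create, review, apply_fix, learn, max_rounds=3):
    """Run the create/review/fix/learn cycle until the reviewer passes the work."""
    work = create()
    for _ in range(max_rounds):
        findings = review(work)
        learn(findings)              # feed fingerprints into the learning store
        if not findings:
            return work              # reviewer passed the work
        work = apply_fix(work, findings)
    return work                      # unresolved after max_rounds: escalate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;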

&lt;h2&gt;Architecture: Core + Adapter&lt;/h2&gt;

&lt;p&gt;I wanted the system to be flexible enough to support different environments and different reviewer backends.&lt;/p&gt;

&lt;p&gt;So Lysis is split into two layers:&lt;/p&gt;

&lt;h3&gt;Core&lt;/h3&gt;

&lt;p&gt;The core contains the review logic and review data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;configuration&lt;/li&gt;
&lt;li&gt;rubrics&lt;/li&gt;
&lt;li&gt;learning pipeline&lt;/li&gt;
&lt;li&gt;operational rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layer is tool-agnostic.&lt;/p&gt;

&lt;h3&gt;Adapter&lt;/h3&gt;

&lt;p&gt;The first shipping adapter is for &lt;strong&gt;Claude Code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That adapter exposes Lysis as a slash command workflow. It wires the review engine into a CLI environment people can actually use today.&lt;/p&gt;
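&lt;p&gt;One way to picture the core/adapter boundary is an interface the core never looks past. This is a hedged sketch of the idea; the class and method names are assumptions, not the repo's actual interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from abc import ABC, abstractmethod

class Adapter(ABC):
    """Environment-specific wrapper around the tool-agnostic core."""

    @abstractmethod
    def read_target(self, spec):
        """Load the work to review from this environment (files, buffers, etc.)."""

    @abstractmethod
    def present(self, findings):
        """Render review findings in this environment (slash-command output, etc.)."""

class SlashCommandAdapter(Adapter):
    """Toy adapter standing in for a CLI integration like the Claude Code one."""

    def read_target(self, spec):
        return "contents of " + spec

    def present(self, findings):
        return "\n".join("- " + f for f in findings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;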

&lt;h3&gt;Reviewer Contract&lt;/h3&gt;

&lt;p&gt;The reviewer side is intentionally simple.&lt;/p&gt;

&lt;p&gt;Any CLI-based model can theoretically be plugged in if it can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accept input&lt;/li&gt;
&lt;li&gt;run a review&lt;/li&gt;
&lt;li&gt;return a verdict&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now, the repo ships with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; for cross-model review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;self-review fallback&lt;/strong&gt; when Codex is unavailable&lt;/li&gt;
&lt;/ul&gt;
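&lt;p&gt;A sketch of what satisfying that contract could look like, including the self-review fallback when no reviewer binary is found. The &lt;code&gt;review&lt;/code&gt; subcommand and the JSON verdict shape are assumptions for illustration, not Codex CLI's real interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import shutil
import subprocess

def run_review(content, reviewer_cmd="codex"):
    """Pipe content to an external CLI reviewer and return its verdict."""
    if shutil.which(reviewer_cmd) is None:
        # No cross-model reviewer available: fall back to self-review
        # with stricter checklist application.
        return {"mode": "self-review", "verdict": None}
    result = subprocess.run(
        [reviewer_cmd, "review"],   # hypothetical subcommand
        input=content, capture_output=True, text=True, check=True,
    )
    return {"mode": "cross-model", "verdict": json.loads(result.stdout)}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;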

&lt;h2&gt;Baseline Results&lt;/h2&gt;

&lt;p&gt;I wanted some directional evidence that the system was not just conceptually neat.&lt;/p&gt;

&lt;p&gt;So I ran two small benchmark sets.&lt;/p&gt;

&lt;h3&gt;OWASP Security Benchmark&lt;/h3&gt;

&lt;p&gt;Lysis was tested against 5 OWASP-style vulnerability categories using 10 samples total: 5 vulnerable samples and 5 clean ones.&lt;/p&gt;

&lt;p&gt;Baseline result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;5/5 categories detected&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;14 total findings&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The categories included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL Injection&lt;/li&gt;
&lt;li&gt;XSS&lt;/li&gt;
&lt;li&gt;Broken Authentication&lt;/li&gt;
&lt;li&gt;Security Misconfiguration&lt;/li&gt;
&lt;li&gt;Sensitive Data Exposure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Business Review Benchmark&lt;/h3&gt;

&lt;p&gt;I also tested the system on non-code review targets: plans, marketing, and strategy documents.&lt;/p&gt;

&lt;p&gt;This benchmark covered 5 business-document quality categories using 10 samples total.&lt;/p&gt;

&lt;p&gt;Baseline result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;5/5 categories detected&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;39 total findings&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The categories included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;plan completeness&lt;/li&gt;
&lt;li&gt;exit criteria&lt;/li&gt;
&lt;li&gt;alternatives considered&lt;/li&gt;
&lt;li&gt;CTA clarity&lt;/li&gt;
&lt;li&gt;factual accuracy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are &lt;strong&gt;small-scale directional tests&lt;/strong&gt;, not comprehensive benchmark claims. But they were enough to show that the same review-and-learning pattern can work across both code and business documents.&lt;/p&gt;

&lt;h2&gt;What I Think Is Interesting About This&lt;/h2&gt;

&lt;p&gt;There are plenty of AI tools that generate.&lt;br&gt;
There are plenty that review.&lt;/p&gt;

&lt;p&gt;What I think is still underexplored is the loop between them.&lt;/p&gt;

&lt;p&gt;The useful question is not only:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Did the model catch a problem this time?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is also:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Does the review process become stronger after seeing the same problem repeatedly?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is where I think systems like this get interesting.&lt;/p&gt;

&lt;p&gt;Not because they replace judgment, but because they make repeated judgment more structured and reusable.&lt;/p&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;

&lt;p&gt;Lysis is open source and available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Blastrum/Lysis" rel="noopener noreferrer"&gt;https://github.com/Blastrum/Lysis&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quick start:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Blastrum/Lysis.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Lysis
bash adapters/claude-code/install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Windows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clone&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://github.com/Blastrum/Lysis.git&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Lysis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;powershell&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-File&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;adapters\claude-code\install.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Codex CLI is available, you can enable cross-model review.&lt;br&gt;
If not, Lysis falls back to self-review with stricter checklist application.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;The current roadmap includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;team-shared learning&lt;/li&gt;
&lt;li&gt;more reviewer backends&lt;/li&gt;
&lt;li&gt;CI/CD integration&lt;/li&gt;
&lt;li&gt;editor integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I am especially interested in how reusable review memory could work across teams rather than only within one local setup.&lt;/p&gt;

&lt;h2&gt;Closing&lt;/h2&gt;

&lt;p&gt;I built Lysis because I wanted AI review to behave less like a one-off check and more like a process that accumulates judgment.&lt;/p&gt;

&lt;p&gt;If the same class of mistake keeps appearing, the review system should not have to rediscover it forever.&lt;/p&gt;

&lt;p&gt;It should learn.&lt;/p&gt;

&lt;p&gt;GitHub:&lt;br&gt;
&lt;a href="https://github.com/Blastrum/Lysis" rel="noopener noreferrer"&gt;https://github.com/Blastrum/Lysis&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Disclosure&lt;/h2&gt;

&lt;p&gt;This article was drafted with AI assistance and manually reviewed, edited, and fact-checked by the author before publication.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>opensource</category>
      <category>codereview</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
