DEV Community

Sho_Ikeda

Building Lysis: A Review Engine Where AI Models Collaborate and Evolve

AI reviews have a memory problem.

They can catch a bug, flag a weak plan, or point out a vague call to action. But in the next run, the system often starts from zero again. The same issue gets rediscovered instead of becoming part of a stronger review process.

I built Lysis to close that loop.

Lysis is an open-source review engine for AI-generated work. It reviews not only code, but also plans, marketing copy, and strategy documents. The core idea is simple:

  • a different AI model should review the work
  • repeated findings should become reusable checks

That gives you a review loop that does more than evaluate one output. It gets better over time.

The Problem: AI Reviews Have Amnesia

A lot of current AI review workflows are useful, but stateless.

You generate something.
You review it.
You fix it.
Then the next review starts fresh.

That means the same class of issue can appear again and again:

  • SQL injection patterns in generated code
  • missing rollback plans in implementation proposals
  • vague calls to action in marketing copy
  • strategy documents with no exit criteria

A human reviewer usually develops pattern recognition. After seeing the same issue a few times, they begin to catch it earlier and more reliably.

Most AI review workflows do not.

I wanted a system where repeated review findings would accumulate and harden into the process itself.

The Two Ideas Behind Lysis

Lysis is built on two ideas: collaboration and evolution.

1. Collaboration: Different Models, Different Blind Spots

If the same model writes and reviews the work, the blind spots that produced an issue are often the same blind spots that fail to catch it.

Lysis works better when one model creates and another reviews. In the current setup, a common pairing is:

  • Creator: Claude Code
  • Reviewer: Codex CLI

That is not because one model is universally better than the other. It is because they have different strengths and different blind spots.

A separate reviewer gives the work a more independent pass.

This applies beyond code. The same idea is useful for:

  • architecture and implementation plans
  • marketing copy
  • business proposals
  • strategy documents

2. Evolution: Every Finding Can Become a Reusable Check

Cross-model review is useful by itself, but it still is not enough if every run forgets the last one.

So Lysis keeps track of findings using fingerprints such as:

  • security::sql_injection
  • planning::missing_rollback
  • marketing::vague_cta

When the same pattern appears repeatedly, it can be promoted into a permanent check.

The simplified flow looks like this:

Review 1: security::sql_injection  -> logged
Review 2: security::sql_injection  -> logged
Review 3: security::sql_injection  -> logged -> threshold reached
Review 4+: similar issue -> caught immediately by permanent check
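The promotion flow above can be sketched in a few lines. This is an illustrative Python sketch, not the actual Lysis learning pipeline: the `FindingTracker` class, its method names, and the threshold of three sightings are all assumptions for the example.

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # assumption: promote after three sightings


class FindingTracker:
    """Counts review findings by fingerprint and promotes repeated
    ones into permanent checks. Hypothetical sketch, not Lysis code."""

    def __init__(self, threshold: int = PROMOTION_THRESHOLD):
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.permanent_checks: set = set()

    def log(self, fingerprint: str) -> str:
        # A finding already covered by a permanent check is caught immediately.
        if fingerprint in self.permanent_checks:
            return "caught by permanent check"
        self.counts[fingerprint] += 1
        if self.counts[fingerprint] >= self.threshold:
            self.permanent_checks.add(fingerprint)
            return "logged; threshold reached, promoted"
        return "logged"


tracker = FindingTracker()
tracker.log("security::sql_injection")         # logged
tracker.log("security::sql_injection")         # logged
tracker.log("security::sql_injection")         # logged; promoted
print(tracker.log("security::sql_injection"))  # caught by permanent check
```

The key design point is that the fingerprint, not the raw finding text, is the unit of memory: two differently worded reports of the same issue class count toward the same promotion.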

That is the part I care about most.

I do not want review to be a one-shot judgment.
I want review to become a system that learns from repeated mistakes.

What Lysis Reviews

Lysis is not limited to code review.

It currently supports review flows for:

  • Code implementation
  • Plans and architecture
  • Marketing
  • Strategy

Example commands:

/lysis impl
/lysis planning
/lysis planning+marketing
/lysis planning+strategy
/lysis impl+ux src/app.tsx

The idea is that AI-generated work in any of these areas can benefit from a loop of:

  1. create
  2. review
  3. fix or escalate
  4. learn
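The four-step loop can be expressed as a small driver. This is a minimal sketch under stated assumptions: `create_fn`, `review_fn`, `fix_fn`, and `learn_fn` are hypothetical stand-ins, not Lysis APIs, and the three-round escalation limit is invented for the example.

```python
def review_loop(create_fn, review_fn, fix_fn, learn_fn, max_rounds=3):
    """Run the create -> review -> fix/escalate -> learn loop.

    All four callables are hypothetical placeholders for whatever
    creator and reviewer backends are actually wired in."""
    work = create_fn()                      # 1. create
    for _ in range(max_rounds):
        findings = review_fn(work)          # 2. review
        if not findings:
            return work, "approved"
        for finding in findings:
            learn_fn(finding)               # 4. learn from every finding
        work = fix_fn(work, findings)       # 3. fix
    return work, "escalated"                # unresolved after max_rounds
```

Note that learning happens on every round, whether or not the fix succeeds: even an escalated review leaves the system with more fingerprint data than it started with.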

Architecture: Core + Adapter

I wanted the system to be flexible enough to support different environments and different reviewer backends.

So Lysis is split into two layers:

Core

The core contains the review logic and review data:

  • configuration
  • rubrics
  • learning pipeline
  • operational rules

This layer is tool-agnostic.

Adapter

The first shipping adapter is for Claude Code.

That adapter exposes Lysis as a slash command workflow. It wires the review engine into a CLI environment people can actually use today.

Reviewer Contract

The reviewer side is intentionally simple.

In principle, any CLI-based model can be plugged in, as long as it can:

  • accept input
  • run a review
  • return a verdict

Right now, the repo ships with:

  • Codex CLI for cross-model review
  • self-review fallback when Codex is unavailable
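A minimal sketch of that contract, including the self-review fallback, might look like the following. The `codex` command name and the fallback wording are illustrative assumptions; the only part the contract actually requires is accept input, run a review, return a verdict.

```python
import shutil
import subprocess


def run_review(prompt: str, reviewer_cli: str = "codex") -> str:
    """Send a review prompt to a CLI reviewer and return its verdict.

    Hypothetical adapter sketch: the command name and fallback text
    are assumptions, not the exact Lysis implementation."""
    if shutil.which(reviewer_cli) is None:
        # Cross-model reviewer unavailable: fall back to self-review
        # with stricter checklist application.
        return f"self-review (strict checklist): {prompt}"
    result = subprocess.run(
        [reviewer_cli],
        input=prompt,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Because the contract is just stdin-in, verdict-out, swapping in another reviewer backend means changing one command name rather than rewriting the engine.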

Baseline Results

I wanted some directional evidence that the system was not just conceptually neat.

So I ran two small benchmark sets.

OWASP Security Benchmark

Lysis was tested against 5 OWASP-style vulnerability categories using 10 samples total: 5 vulnerable samples and 5 clean ones.

Baseline result:

  • 5/5 categories detected
  • 14 total findings

The categories included:

  • SQL Injection
  • XSS
  • Broken Authentication
  • Security Misconfiguration
  • Sensitive Data Exposure

Business Review Benchmark

I also tested the system on non-code review targets: plans, marketing, and strategy documents.

This benchmark covered 5 business-document quality categories using 10 samples total.

Baseline result:

  • 5/5 categories detected
  • 39 total findings

The categories included:

  • plan completeness
  • exit criteria
  • alternatives considered
  • CTA clarity
  • factual accuracy

These are small-scale directional tests, not comprehensive benchmark claims. But they were enough to show that the same review-and-learning pattern can work across both code and business documents.

What I Think Is Interesting About This

There are a lot of AI tools that generate.
There are many tools that review.

What I think is still underexplored is the loop between them.

The useful question is not only:

"Did the model catch a problem this time?"

It is also:

"Does the review process become stronger after seeing the same problem repeatedly?"

That is where I think systems like this get interesting.

Not because they replace judgment, but because they make repeated judgment more structured and reusable.

Getting Started

Lysis is open source and available here:

https://github.com/Blastrum/Lysis

Quick start:

git clone https://github.com/Blastrum/Lysis.git
cd Lysis
bash adapters/claude-code/install.sh

On Windows:

git clone https://github.com/Blastrum/Lysis.git
cd Lysis
powershell -File adapters\claude-code\install.ps1

If Codex CLI is available, you can enable cross-model review.
If not, Lysis falls back to self-review with stricter checklist application.

What's Next

The current roadmap includes:

  • team-shared learning
  • more reviewer backends
  • CI/CD integration
  • editor integrations

I am especially interested in how reusable review memory could work across teams rather than only within one local setup.

Closing

I built Lysis because I wanted AI review to behave less like a one-off check and more like a process that accumulates judgment.

If the same class of mistake keeps appearing, the review system should not have to rediscover it forever.

It should learn.

GitHub:
https://github.com/Blastrum/Lysis

Disclosure: this article was drafted with AI assistance and manually reviewed, edited, and fact-checked by the author before publication.

