<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sho_Ikeda</title>
    <description>The latest articles on DEV Community by Sho_Ikeda (@sho_ikeda).</description>
    <link>https://dev.to/sho_ikeda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860985%2F52fbd2d1-8997-456f-87a8-015f6689c0f8.png</url>
      <title>DEV Community: Sho_Ikeda</title>
      <link>https://dev.to/sho_ikeda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sho_ikeda"/>
    <language>en</language>
    <item>
      <title>Building Lysis: A Review Engine Where AI Models Collaborate and Evolve</title>
      <dc:creator>Sho_Ikeda</dc:creator>
      <pubDate>Sat, 04 Apr 2026 13:04:42 +0000</pubDate>
      <link>https://dev.to/sho_ikeda/building-lysis-a-review-engine-where-ai-models-collaborate-and-evolve-41b6</link>
      <guid>https://dev.to/sho_ikeda/building-lysis-a-review-engine-where-ai-models-collaborate-and-evolve-41b6</guid>
      <description>&lt;p&gt;AI reviews have a memory problem.&lt;/p&gt;

&lt;p&gt;They can catch a bug, flag a weak plan, or point out a vague call to action. But in the next run, the system often starts from zero again. The same issue gets rediscovered instead of becoming part of a stronger review process.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;Lysis&lt;/strong&gt; to close that loop.&lt;/p&gt;

&lt;p&gt;Lysis is an open-source review engine for AI-generated work. It reviews not only code, but also plans, marketing copy, and strategy documents. The core idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a different AI model should review the work&lt;/li&gt;
&lt;li&gt;repeated findings should become reusable checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you a review loop that does more than evaluate one output. It gets better over time.&lt;/p&gt;

&lt;h2&gt;The Problem: AI Reviews Have Amnesia&lt;/h2&gt;

&lt;p&gt;A lot of current AI review workflows are useful, but stateless.&lt;/p&gt;

&lt;p&gt;You generate something.&lt;br&gt;
You review it.&lt;br&gt;
You fix it.&lt;br&gt;
Then the next review starts fresh.&lt;/p&gt;

&lt;p&gt;That means the same class of issue can appear again and again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL injection patterns in generated code&lt;/li&gt;
&lt;li&gt;missing rollback plans in implementation proposals&lt;/li&gt;
&lt;li&gt;vague calls to action in marketing copy&lt;/li&gt;
&lt;li&gt;strategy documents with no exit criteria&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A human reviewer usually develops pattern recognition. After seeing the same issue a few times, they begin to catch it earlier and more reliably.&lt;/p&gt;

&lt;p&gt;Most AI review workflows do not.&lt;/p&gt;

&lt;p&gt;I wanted a system where repeated review findings would accumulate and harden into the process itself.&lt;/p&gt;
&lt;h2&gt;The Two Ideas Behind Lysis&lt;/h2&gt;

&lt;p&gt;Lysis is built on two ideas: &lt;strong&gt;collaboration&lt;/strong&gt; and &lt;strong&gt;evolution&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;1. Collaboration: Different Models, Different Blind Spots&lt;/h3&gt;

&lt;p&gt;If the same model writes and reviews the work, it often misses the same thing twice.&lt;/p&gt;

&lt;p&gt;Lysis works better when one model creates and another reviews. In the current setup, a common pairing is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Creator:&lt;/strong&gt; Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviewer:&lt;/strong&gt; Codex CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not because one model is universally better than the other. It is because they have different strengths and different blind spots.&lt;/p&gt;

&lt;p&gt;A separate reviewer gives the work a more independent pass.&lt;/p&gt;

&lt;p&gt;This applies beyond code. The same idea is useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;architecture and implementation plans&lt;/li&gt;
&lt;li&gt;marketing copy&lt;/li&gt;
&lt;li&gt;business proposals&lt;/li&gt;
&lt;li&gt;strategy documents&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Evolution: Every Finding Can Become a Reusable Check&lt;/h3&gt;

&lt;p&gt;Cross-model review is useful on its own, but it is still not enough if every run forgets the last one.&lt;/p&gt;

&lt;p&gt;So Lysis keeps track of findings using fingerprints such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;security::sql_injection&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;planning::missing_rollback&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;marketing::vague_cta&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the same pattern appears repeatedly, it can be promoted into a permanent check.&lt;/p&gt;

&lt;p&gt;The simplified flow looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review 1: security::sql_injection  -&amp;gt; logged
Review 2: security::sql_injection  -&amp;gt; logged
Review 3: security::sql_injection  -&amp;gt; logged -&amp;gt; threshold reached
Review 4+: similar issue -&amp;gt; caught immediately by permanent check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
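&lt;p&gt;As a minimal sketch, that promotion logic amounts to a counter with a threshold. The class name and default threshold below are illustrative assumptions, not Lysis's actual implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

class FindingStore:
    """Tracks finding fingerprints and promotes repeated ones to permanent checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold      # occurrences before promotion
        self.counts = Counter()         # fingerprint to times-seen count
        self.permanent_checks = set()   # fingerprints promoted to permanent checks

    def record(self, fingerprint):
        """Log one finding; promote its fingerprint once the threshold is reached."""
        self.counts[fingerprint] += 1
        if self.counts[fingerprint] == self.threshold:
            self.permanent_checks.add(fingerprint)

store = FindingStore(threshold=3)
for _ in range(3):
    store.record("security::sql_injection")
print("security::sql_injection" in store.permanent_checks)  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;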



&lt;p&gt;That is the part I care about most.&lt;/p&gt;

&lt;p&gt;I do not want review to be a one-shot judgment.&lt;br&gt;
I want review to become a system that learns from repeated mistakes.&lt;/p&gt;
&lt;h2&gt;What Lysis Reviews&lt;/h2&gt;

&lt;p&gt;Lysis is not limited to code review.&lt;/p&gt;

&lt;p&gt;It currently supports review flows for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Code implementation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Plans and architecture&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Marketing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strategy&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example commands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/lysis impl
/lysis planning
/lysis planning+marketing
/lysis planning+strategy
/lysis impl+ux src/app.tsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
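&lt;p&gt;The &lt;code&gt;+&lt;/code&gt; syntax combines review scopes, and an optional trailing path narrows the target. A hedged sketch of how such a scope string could be split; the scope set and function are illustrative, not the repo's actual parser:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;KNOWN_SCOPES = {"impl", "planning", "marketing", "strategy", "ux"}  # illustrative set

def parse_scope_spec(args):
    """Split a spec like 'impl+ux src/app.tsx' into scopes and an optional target."""
    parts = args.split()
    scopes = parts[0].split("+")
    target = parts[1] if len(parts) == 2 else None
    unknown = [s for s in scopes if s not in KNOWN_SCOPES]
    if unknown:
        raise ValueError("unknown scope(s): " + ", ".join(unknown))
    return scopes, target

print(parse_scope_spec("impl+ux src/app.tsx"))  # (['impl', 'ux'], 'src/app.tsx')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;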



&lt;p&gt;The idea is that AI-generated work in any of these areas can benefit from a loop of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;create&lt;/li&gt;
&lt;li&gt;review&lt;/li&gt;
&lt;li&gt;fix or escalate&lt;/li&gt;
&lt;li&gt;learn&lt;/li&gt;
&lt;/ol&gt;
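&lt;p&gt;That cycle can be sketched as a single driver function. The callables here are stand-ins for whatever creator, reviewer, and learning store are wired in; this illustrates the loop's shape, not Lysis's real API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def review_loop(create, review, apply_fix, learn, max_rounds=3):
    """Run the create/review/fix/learn cycle until the reviewer passes the work."""
    work = create()
    for _ in range(max_rounds):
        findings = review(work)
        learn(findings)              # feed fingerprints into the learning store
        if not findings:
            return work              # reviewer passed the work
        work = apply_fix(work, findings)
    return work                      # unresolved after max_rounds: escalate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;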

&lt;h2&gt;Architecture: Core + Adapter&lt;/h2&gt;

&lt;p&gt;I wanted the system to be flexible enough to support different environments and different reviewer backends.&lt;/p&gt;

&lt;p&gt;So Lysis is split into two layers:&lt;/p&gt;

&lt;h3&gt;Core&lt;/h3&gt;

&lt;p&gt;The core contains the review logic and review data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;configuration&lt;/li&gt;
&lt;li&gt;rubrics&lt;/li&gt;
&lt;li&gt;learning pipeline&lt;/li&gt;
&lt;li&gt;operational rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layer is tool-agnostic.&lt;/p&gt;

&lt;h3&gt;Adapter&lt;/h3&gt;

&lt;p&gt;The first shipping adapter is for &lt;strong&gt;Claude Code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That adapter exposes Lysis as a slash command workflow. It wires the review engine into a CLI environment people can actually use today.&lt;/p&gt;
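&lt;p&gt;One way to picture the core/adapter boundary is an interface the core never looks past. This is a hedged sketch of the idea; the class and method names are assumptions, not the repo's actual interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from abc import ABC, abstractmethod

class Adapter(ABC):
    """Environment-specific wrapper around the tool-agnostic core."""

    @abstractmethod
    def read_target(self, spec):
        """Load the work to review from this environment (files, buffers, etc.)."""

    @abstractmethod
    def present(self, findings):
        """Render review findings in this environment (slash-command output, etc.)."""

class SlashCommandAdapter(Adapter):
    """Toy adapter standing in for a CLI integration like the Claude Code one."""

    def read_target(self, spec):
        return "contents of " + spec

    def present(self, findings):
        return "\n".join("- " + f for f in findings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;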

&lt;h3&gt;Reviewer Contract&lt;/h3&gt;

&lt;p&gt;The reviewer side is intentionally simple.&lt;/p&gt;

&lt;p&gt;Any CLI-based model can theoretically be plugged in if it can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accept input&lt;/li&gt;
&lt;li&gt;run a review&lt;/li&gt;
&lt;li&gt;return a verdict&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now, the repo ships with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; for cross-model review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;self-review fallback&lt;/strong&gt; when Codex is unavailable&lt;/li&gt;
&lt;/ul&gt;
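&lt;p&gt;A sketch of what satisfying that contract could look like, including the self-review fallback when no reviewer binary is found. The &lt;code&gt;review&lt;/code&gt; subcommand and the JSON verdict shape are assumptions for illustration, not Codex CLI's real interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import shutil
import subprocess

def run_review(content, reviewer_cmd="codex"):
    """Pipe content to an external CLI reviewer and return its verdict."""
    if shutil.which(reviewer_cmd) is None:
        # No cross-model reviewer available: fall back to self-review
        # with stricter checklist application.
        return {"mode": "self-review", "verdict": None}
    result = subprocess.run(
        [reviewer_cmd, "review"],   # hypothetical subcommand
        input=content, capture_output=True, text=True, check=True,
    )
    return {"mode": "cross-model", "verdict": json.loads(result.stdout)}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;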

&lt;h2&gt;Baseline Results&lt;/h2&gt;

&lt;p&gt;I wanted some directional evidence that the system was not just conceptually neat.&lt;/p&gt;

&lt;p&gt;So I ran two small benchmark sets.&lt;/p&gt;

&lt;h3&gt;OWASP Security Benchmark&lt;/h3&gt;

&lt;p&gt;Lysis was tested against 5 OWASP-style vulnerability categories using 10 samples total: 5 vulnerable samples and 5 clean ones.&lt;/p&gt;

&lt;p&gt;Baseline result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;5/5 categories detected&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;14 total findings&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The categories included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL Injection&lt;/li&gt;
&lt;li&gt;XSS&lt;/li&gt;
&lt;li&gt;Broken Authentication&lt;/li&gt;
&lt;li&gt;Security Misconfiguration&lt;/li&gt;
&lt;li&gt;Sensitive Data Exposure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Business Review Benchmark&lt;/h3&gt;

&lt;p&gt;I also tested the system on non-code review targets: plans, marketing, and strategy documents.&lt;/p&gt;

&lt;p&gt;This benchmark covered 5 business-document quality categories using 10 samples total.&lt;/p&gt;

&lt;p&gt;Baseline result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;5/5 categories detected&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;39 total findings&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The categories included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;plan completeness&lt;/li&gt;
&lt;li&gt;exit criteria&lt;/li&gt;
&lt;li&gt;alternatives considered&lt;/li&gt;
&lt;li&gt;CTA clarity&lt;/li&gt;
&lt;li&gt;factual accuracy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are &lt;strong&gt;small-scale directional tests&lt;/strong&gt;, not comprehensive benchmark claims. But they were enough to show that the same review-and-learning pattern can work across both code and business documents.&lt;/p&gt;

&lt;h2&gt;What I Think Is Interesting About This&lt;/h2&gt;

&lt;p&gt;There are plenty of AI tools that generate.&lt;br&gt;
There are plenty that review.&lt;/p&gt;

&lt;p&gt;What I think is still underexplored is the loop between them.&lt;/p&gt;

&lt;p&gt;The useful question is not only:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Did the model catch a problem this time?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is also:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Does the review process become stronger after seeing the same problem repeatedly?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is where I think systems like this get interesting.&lt;/p&gt;

&lt;p&gt;Not because they replace judgment, but because they make repeated judgment more structured and reusable.&lt;/p&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;

&lt;p&gt;Lysis is open source and available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Blastrum/Lysis" rel="noopener noreferrer"&gt;https://github.com/Blastrum/Lysis&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quick start:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Blastrum/Lysis.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Lysis
bash adapters/claude-code/install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Windows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clone&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://github.com/Blastrum/Lysis.git&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Lysis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;powershell&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-File&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;adapters\claude-code\install.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Codex CLI is available, you can enable cross-model review.&lt;br&gt;
If not, Lysis falls back to self-review with stricter checklist application.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;The current roadmap includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;team-shared learning&lt;/li&gt;
&lt;li&gt;more reviewer backends&lt;/li&gt;
&lt;li&gt;CI/CD integration&lt;/li&gt;
&lt;li&gt;editor integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I am especially interested in how reusable review memory could work across teams rather than only within one local setup.&lt;/p&gt;

&lt;h2&gt;Closing&lt;/h2&gt;

&lt;p&gt;I built Lysis because I wanted AI review to behave less like a one-off check and more like a process that accumulates judgment.&lt;/p&gt;

&lt;p&gt;If the same class of mistake keeps appearing, the review system should not have to rediscover it forever.&lt;/p&gt;

&lt;p&gt;It should learn.&lt;/p&gt;

&lt;p&gt;GitHub:&lt;br&gt;
&lt;a href="https://github.com/Blastrum/Lysis" rel="noopener noreferrer"&gt;https://github.com/Blastrum/Lysis&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Disclosure&lt;/h2&gt;

&lt;p&gt;This article was drafted with AI assistance and manually reviewed, edited, and fact-checked by the author before publication.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>opensource</category>
      <category>codereview</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
