DEV Community

Adam Schulte

Building a Five-Layer Quality Gate for Agent-Written Code

When velocity is the priority, quality needs a system.

I'm a solo developer building Grimoire, a tabletop RPG session management platform. I've been leaning heavily into AI coding agents — Claude Code sessions running on a VPS, working in parallel across git worktrees, each with their own branch and PR. My goal is to move fast, and that means I'm spending more time describing what I want and reviewing outputs than reading every line of code.

That trade-off works surprisingly well for shipping features. But it creates a real risk: code that looks correct on the surface but breaks in ways you'd only catch with a careful, adversarial read. The kind of read that's hard to do consistently when you're moving fast.

So I built a gate. This is the story of how I assembled a five-layer automated quality gate for agent-written code, why each layer exists, and what I learned about the gap between "looks correct" and "won't break in production."

The Problem: Self-Review Is Broken

When an AI agent writes code and then reviews it, it shares the same mental model, assumptions, and blind spots as the author. It produced the code — of course it thinks it looks correct. This is the self-review monoculture, and it produces rubber-stamp approvals on code that a fresh reviewer would flag immediately.

I experienced this firsthand. An agent wrote a placement function for NPCs on a combat grid. It wrote tests. It reviewed its own code. Everything passed. Then CodeRabbit (an external AI reviewer) looked at the PR and found five issues the agent missed:

  • Grid coordinates accepted NaN, Infinity, and negative numbers
  • A shared helper silently swallowed database errors
  • Concurrent requests could create duplicate combatants
  • Error messages told users what went wrong but not what to do about it
  • Tests asserted error messages but not HTTP status codes

None of these were caught by the agent's own review. The code was structurally clean, well-tested, and completely correct — for the happy path. The adversarial cases were invisible because the reviewer shared the author's assumptions.
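The first finding on that list is the kind of thing a guard clause handles in a few lines. Here's a minimal sketch of what the fix looks like; the function names are my own illustration, not Grimoire's actual code:

```typescript
// Hypothetical guard for the grid-coordinate issue above: reject NaN,
// Infinity, non-integers, and negatives before any placement logic runs.
// Number.isInteger is false for NaN and Infinity, so one check covers all three.
export function isValidGridCoord(value: number): boolean {
  return Number.isInteger(value) && value >= 0;
}

export function assertValidPlacement(x: number, y: number): void {
  if (!isValidGridCoord(x) || !isValidGridCoord(y)) {
    // Tell the user what to do, not just what went wrong
    throw new Error(
      `Invalid grid position (${x}, ${y}): coordinates must be non-negative integers. ` +
        `Check that the client sends numeric cell indices.`
    );
  }
}
```

An author-reviewer pair sharing one mental model never thinks to send `NaN` here; an adversarial reviewer tries it first.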

The Five Layers

I now run five automated review steps before any code leaves the local branch. Each one asks a fundamentally different question.

Layer 1: Tests

Question: "Does this code do what it claims?"

The test suite runs first — 1,800+ unit tests and 179 property tests. This is the floor, not the ceiling. Tests only catch what they assert, and agent-written tests tend to test the happy path thoroughly while missing edge cases.

Recent improvement: I upgraded to Vitest 4.1's --changed flag, which uses the module dependency graph to run only tests affected by the agent's changes. This cut iteration feedback from 137 test files to as few as 1, saving significant time and context tokens during the edit-test loop. The full suite still runs as a safety net before push.
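The invocation is a small CLI fragment. This is a sketch assuming Vitest's documented `--changed` flag and the repo's pnpm workspace layout:

```shell
# Run only the test files whose module graph touches uncommitted changes
pnpm --filter server exec vitest run --changed

# Or compare against a branch instead of the working tree
pnpm --filter server exec vitest run --changed origin/main
```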

Layer 2: Structural Review (/review)

Question: "Is this clean and reusable?"

The simplify skill launches three parallel review agents: a code reuse reviewer (checks for duplicated logic), a code quality reviewer (structural and readability issues), and an efficiency reviewer (performance concerns).

This catches the "correct but messy" category: functions that reinvent existing helpers, unnecessary complexity, code that works but could be half the length.

Layer 3: Test Coverage Review (/review-tests)

Question: "Are there tests for this?"

Verifies that new code has corresponding tests and that the tests cover the actual behavior, not just implementation details. This layer exists because agents sometimes write code and forget to test the new paths, or write tests that pass trivially.

Layer 4: Security Review (/security-review)

Question: "Is this safe?"

A Grimoire-specific security review across six categories: authentication and authorization, input validation, SQL injection and query safety, Row Level Security compliance, XSS and output encoding, and secrets management. This is tailored to the stack (Supabase + Express + Vue) rather than being generic OWASP boilerplate.
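To make one of those categories concrete, here's the kind of check the XSS and output-encoding pass looks for. This is an illustrative helper of my own, not Grimoire's code (Vue templates escape by default, but anything built with raw HTML still needs it):

```typescript
// Encode user-supplied text before it reaches raw HTML output.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

export function escapeHtml(untrusted: string): string {
  // Replace each dangerous character with its HTML entity
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}
```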

Layer 5: Adversarial Review (/adversarial-review)

Question: "How does this break?"

This is the newest layer and the one that addresses the self-review monoculture directly. Three hostile personas each review the code and each MUST find at least one issue:

  • The Saboteur asks: "What's the worst input I could send? What if this runs twice? Concurrently? What if the external call fails?"
  • The New Hire asks: "Can I understand this code in six months with zero context from the author? What implicit knowledge is baked in?"
  • The Security Auditor asks: "Where are the trust boundaries? What can an authenticated user escalate to?"

The mandatory-findings rule is critical. It eliminates the "LGTM" escape hatch that makes self-review useless. If a persona finds nothing wrong, it hasn't looked hard enough — the instruction is to go back and look again.

Issues caught by multiple personas get promoted one severity level. The output is a structured verdict: BLOCK (critical findings, don't merge), CONCERNS (warnings, merge at your own risk), or CLEAN (only notes, safe to merge).
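The promotion and verdict rules described above can be sketched as a small function. The types and names here are my own, assumed from the description rather than taken from the actual skill:

```typescript
type Severity = "note" | "warning" | "critical";
type Persona = "saboteur" | "new-hire" | "security-auditor";

interface Finding {
  description: string;
  severity: Severity;
  foundBy: Persona[];
}

// One step up the severity ladder; critical stays critical
const ESCALATE: Record<Severity, Severity> = {
  note: "warning",
  warning: "critical",
  critical: "critical",
};

export function verdict(findings: Finding[]): "BLOCK" | "CONCERNS" | "CLEAN" {
  // Findings flagged by more than one persona get promoted one level
  const effective = findings.map((f) =>
    f.foundBy.length > 1 ? ESCALATE[f.severity] : f.severity
  );
  if (effective.includes("critical")) return "BLOCK"; // don't merge
  if (effective.includes("warning")) return "CONCERNS"; // merge at your own risk
  return "CLEAN"; // only notes, safe to merge
}
```

Note how a warning seen by two personas becomes a critical, turning the PR verdict from CONCERNS into BLOCK.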

The External Safety Net: CodeRabbit

After all five local layers pass and the code is pushed, CodeRabbit ($30/mo) runs an independent review on the PR. This is the "different model" layer — it uses its own model with different training, different blind spots, and different priorities.

CodeRabbit's value isn't catching things the local reviews should have caught. It's catching things a fundamentally different perspective notices. In practice, after the adversarial review was added, CodeRabbit findings dropped significantly — most remaining flags are minor style suggestions.

The feedback loop is tight: CodeRabbit flags issues → I dispatch an agent with the exact fix descriptions CodeRabbit wrote → agent fixes → CodeRabbit re-reviews. My involvement is one sentence: "dispatch an agent."

What I Learned

Structural reviews and adversarial reviews are different questions. My first four layers ask "is this correct?" The adversarial layer asks "how does this fail?" Those are fundamentally different cognitive modes, and an agent can't do both in the same pass. Separating them into distinct steps with distinct instructions produces better results than asking one reviewer to do everything.

Mandatory findings beat optional thoroughness. The single most effective design decision was "each persona MUST find at least one issue." Without this constraint, every layer tends toward approval. With it, the reviewer is forced to think adversarially even when the code looks clean.

The cost is time, not money. Five review steps add 3-5 minutes to every push. For a solo developer moving fast, that's the cost of maintaining quality at velocity. The alternative — finding bugs in the playtest with real users — is much more expensive.

Measure the pipeline, not just the code. I parse the JSONL session logs from every agent session to track tool usage patterns. This tells me whether agents are using the right tools, whether the review skills are catching real issues, and whether the infrastructure investments are paying off. The metrics aren't about counting tool calls — they're about telling stories: "agents were reading a 3,000-line file because they didn't know a one-call tool existed. We fixed that."
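A minimal sketch of that log-mining step, assuming one JSON object per line with a `tool` field (an assumption for illustration, not Claude Code's actual log schema):

```typescript
// Count tool invocations across a JSONL session log.
// Malformed lines are skipped rather than failing the whole report.
export function countToolCalls(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    try {
      const entry = JSON.parse(line);
      if (typeof entry.tool === "string") {
        counts.set(entry.tool, (counts.get(entry.tool) ?? 0) + 1);
      }
    } catch {
      // Not valid JSON: skip and keep going
    }
  }
  return counts;
}
```

Aggregating these counts per session is what surfaces stories like the 3,000-line-file read above.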

The Full Chain

Here's what /finish runs today, in order:

```
1. pnpm --filter server test           # Full test suite
2. /review                             # Structural quality (3 parallel agents)
3. /review-tests                       # Test coverage verification
4. /security-review                    # Stack-specific security audit
5. /adversarial-review                 # Three hostile personas, mandatory findings
6. git push                            # Only if no BLOCK verdict
7. gh pr create                        # Open PR with review results in body
→ CodeRabbit reviews on PR             # External safety net (different model)
```

Every step is automated. Every step runs without human intervention. The only human decision is whether to merge the PR after CodeRabbit's final pass — and increasingly, that decision is straightforward because the local reviews have already caught the substantive issues.

This isn't the final form. The adversarial review is new and unproven. The metrics pipeline is just starting to produce data. But the architecture — multiple independent review perspectives, each asking a different question, with mandatory findings and structured verdicts — feels right for a world where the code author and the code reviewer are both AI.
