DontaRuffin

Posted on May 27

I got tired of paying $240/month for a code reviewer that ignored half our standards

#webdev #typescript #opensource #aitools

For a 10-person team, CodeRabbit runs around $240/month ($24/user/month, billed annually). Greptile is around $300/month ($30/user). Cursor BugBot is around $400/month ($40/user).

For that price, you'd expect the tool to know your team's specific rules. It doesn't. It knows its own rules: generic best practices that apply to every codebase equally, which means none of them are specific to yours.

We had a no-any TypeScript rule that we'd explained to every new hire for two years. We had an auth check pattern every route had to follow. We had a list of things the previous team had learned the hard way.

None of that made it into the code reviewer. It flagged missing semicolons and suggested refactors nobody asked for.

The real problem isn't review quality. It's review relevance.

The tools in this space are competing on who catches the most bugs. That's the wrong race for most teams.

A team of 10 shipping production software doesn't need an AI to find every edge case in every PR. They need the AI to enforce their standards — the ones written down nowhere, living only in the heads of whoever's been there longest.

That problem got worse when we started shipping AI-generated code. Cursor and Claude Code are fast. The output is mostly correct. But it has consistent failure patterns: 'any' types wherever the typing gets hard, unhandled promises, hallucinated imports from libraries the model half-remembered, and error handling that covers only the happy path.

Generic review tools don't know to look for those things specifically. They treat AI-generated code the same as hand-written code and produce the same boilerplate feedback.

What I built

Solon AI reviews every PR against a playbook — a JSON file that describes your team's specific rules. Flat $29/month, no per-seat math.

Today I'm open-sourcing the playbook library: github.com/Solon-Dev/solon-playbooks

Four playbooks to start:

Next.js + TypeScript — 12 rules for App Router codebases. Covers the mistakes that don't show up in linters: useEffect for data fetching, unvalidated route handler input, client components where server components would work, raw <img> tags, missing HTTP status codes.

Security — 12 rules based on OWASP Top 10 for JavaScript/TypeScript. Hardcoded secrets, SQL injection, missing auth checks, IDOR vulnerabilities, localStorage for auth tokens, eval with user input. The things that cause breaches, not the things that cause code review comments.

Accessibility (WCAG 2.2) — 11 rules for Level AA compliance. Focus on the gaps automated tools miss: focus management, keyboard patterns, ARIA correctness, live regions for dynamic content. Automated tools like axe and Lighthouse miss a significant portion of real accessibility issues — semantic structure, keyboard patterns, and focus management require human judgment that automated tools don't have. A playbook encodes that judgment.

Vibe Coder — 12 rules built specifically for AI-generated code. This one took the longest to write because it required cataloging what Cursor and Claude Code get wrong most often: type escape hatches, floating promises, wrong third-party API signatures (the model knows the library but not the version you're running), unnecessary useEffect for derived state, empty catch blocks.

The Vibe Coder playbook is the one I'd start with if your team is shipping any meaningful volume of AI-generated code. The failure modes are consistent enough that they're worth encoding explicitly.
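
To make one of those failure modes concrete, here is a small sketch of the empty-catch pattern next to a version that surfaces the failure. The function names and the settings-parsing scenario are made up for illustration; they are not taken from the playbook itself.

```typescript
// Hypothetical example: parsing a settings string.

// Bad: the catch block is empty, so malformed input silently becomes {}.
function loadSettingsSilently(raw: string): Record<string, unknown> {
  let settings: Record<string, unknown> = {};
  try {
    settings = JSON.parse(raw) as Record<string, unknown>;
  } catch {
    // nothing here: the caller never learns that parsing failed
  }
  return settings;
}

// Good: the failure is part of the return type, so callers must handle it.
type ParseResult =
  | { ok: true; settings: Record<string, unknown> }
  | { ok: false; error: string };

function loadSettings(raw: string): ParseResult {
  try {
    const parsed: unknown = JSON.parse(raw);
    // JSON.parse can return arrays, numbers, null, etc.; accept plain objects only.
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return { ok: false, error: "settings must be a JSON object" };
    }
    return { ok: true, settings: parsed as Record<string, unknown> };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: "invalid JSON: " + message };
  }
}
```

Both versions compile and neither trips a syntax-level linter by default, which is why a rule like this is worth writing down explicitly.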

How the playbook format works

Each rule has an ID, a severity (blocking, warning, or info), a description, and a bad/good example:

{
  "id": "no-any-escape-hatches",
  "title": "No 'any' type escape hatches",
  "severity": "blocking",
  "description": "AI models default to 'any' when they're unsure about a type...",
  "examples": {
    "bad": "function transform(data: any): any { return data.map((item: any) => item.value); }",
    "good": "interface DataItem { value: string; } function transform(data: DataItem[]): string[] { return data.map(item => item.value); }"
  }
}
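
If you are consuming rules from TypeScript, the fields above map onto a small type. This is a sketch based only on the fields shown in this post; the schema.json in the repo is the authority, and the names PlaybookRule, validateRule, and isPlaybookRule are my own for illustration.

```typescript
// Hypothetical types mirroring the rule fields shown above.
type Severity = "blocking" | "warning" | "info";

interface PlaybookRule {
  id: string;
  title: string;
  severity: Severity;
  description: string;
  examples: { bad: string; good: string };
}

const SEVERITIES: readonly string[] = ["blocking", "warning", "info"];

// Returns a list of problems; an empty list means the shape looks right.
function validateRule(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["rule must be an object"];
  }
  const rule = input as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of ["id", "title", "description"]) {
    if (typeof rule[key] !== "string" || rule[key] === "") {
      problems.push(key + " must be a non-empty string");
    }
  }
  const severity = rule["severity"];
  if (typeof severity !== "string" || SEVERITIES.indexOf(severity) === -1) {
    problems.push("severity must be blocking, warning, or info");
  }
  const examples = rule["examples"] as Record<string, unknown> | undefined;
  if (
    typeof examples !== "object" || examples === null ||
    typeof examples["bad"] !== "string" || typeof examples["good"] !== "string"
  ) {
    problems.push("examples must contain string fields bad and good");
  }
  return problems;
}

// Type guard built on the validator, so callers get a typed rule after checking.
function isPlaybookRule(input: unknown): input is PlaybookRule {
  return validateRule(input).length === 0;
}
```

Returning a list of problems instead of throwing on the first one makes it easier to report everything wrong with a contributed rule in a single pass.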

Solon reads the playbook, builds a review prompt from your rules, runs the diff through Claude Haiku, and posts the result as a PR comment. Blocking violations require a human decision before the PR merges.
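
The "builds a review prompt from your rules" step can be pictured roughly like this. To be clear, this is not Solon's actual prompt or code; buildReviewPrompt and the wording of the instructions are assumptions for illustration, and the model call and PR comment are left out entirely.

```typescript
// Hypothetical rule shape, limited to the fields shown in this post.
interface Rule {
  id: string;
  title: string;
  severity: "blocking" | "warning" | "info";
  description: string;
  examples: { bad: string; good: string };
}

// Turns a list of rules plus a diff into one plain-text prompt.
function buildReviewPrompt(rules: Rule[], diff: string): string {
  const ruleText = rules
    .map((rule, index) =>
      [
        (index + 1) + ". [" + rule.severity + "] " + rule.id + ": " + rule.title,
        "   " + rule.description,
        "   Bad:  " + rule.examples.bad,
        "   Good: " + rule.examples.good,
      ].join("\n")
    )
    .join("\n\n");

  return [
    "Review the diff below against these team rules only.",
    "Report each violation with the rule id and the offending line.",
    "",
    ruleText,
    "",
    "Diff:",
    diff,
  ].join("\n");
}
```

Putting the severity and ID on the first line of each rule is a deliberate choice in this sketch: it gives the model a stable label to echo back, which makes the response easier to map to blocking versus non-blocking findings.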

You can stack multiple playbooks:

{
  "playbooks": ["security", "nextjs-typescript", "vibe-coder"],
  "severity": {
    "blocking": true,
    "warning": true,
    "info": false
  }
}
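
Here is one way stacking plus the severity toggles could resolve into a single rule list. This is a sketch, not Solon's implementation: selectRules, the in-memory library map, and the first-playbook-wins handling of duplicate rule IDs are all my assumptions, since the post does not specify how conflicts are handled.

```typescript
type Severity = "blocking" | "warning" | "info";

// Only the fields this sketch needs.
interface Rule {
  id: string;
  severity: Severity;
}

// Mirrors the config shown above.
interface SolonConfig {
  playbooks: string[];
  severity: Record<Severity, boolean>;
}

// library maps a playbook name to its rules (hypothetical; loading files is omitted).
function selectRules(config: SolonConfig, library: Record<string, Rule[]>): Rule[] {
  const selected: Rule[] = [];
  for (const name of config.playbooks) {
    const rules = library[name];
    if (rules === undefined) {
      throw new Error("unknown playbook: " + name);
    }
    for (const rule of rules) {
      // Skip severities that are switched off in the config.
      if (!config.severity[rule.severity]) continue;
      // Assumption: if two playbooks share a rule id, the earlier playbook wins.
      if (selected.some((existing) => existing.id === rule.id)) continue;
      selected.push(rule);
    }
  }
  return selected;
}
```

With the config above, info-level rules are dropped and the remaining rules keep the order of the playbooks array.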

Why open source the playbooks

Two reasons.

First, these are more useful if the community improves them. The Vibe Coder playbook in particular is going to need updates as the models evolve. A fintech team is going to have security rules I haven't thought of. A team running a design system is going to have accessibility rules I missed.

Second, the playbooks are not the product. The enforcement is the product. A JSON file sitting in a GitHub repo doesn't help you — it helps you when something is reading it on every PR.

What's in the repo

solon-playbooks/
├── README.md
├── schema.json
├── nextjs-typescript/playbook.json
├── security/playbook.json
├── accessibility/playbook.json
└── vibe-coder/playbook.json

Full schema is included if you want to write your own or validate contributions. MIT license — use these in any tool.

Contributions welcome. If your team has a playbook that's earned its way into your process, open a PR.

github.com/Solon-Dev/solon-playbooks

If you want to see them enforced automatically on your PRs: solonreview.dev — free tier is 25 reviews/month, no card required.

Top comments (2)

Harjot Singh • May 31

The "ignored half our standards" part is the real complaint, not the $240 - an AI reviewer that doesn't enforce YOUR conventions is just generic linting with extra steps. The gap is almost always that these tools review against general best practices but have no durable, enforced memory of your team's specific rules (naming, architecture patterns, the three things your senior dev always flags). Generic review is cheap to provide and low-value; your-standards review is the hard, valuable version.

The fix that actually works is making the standards explicit and machine-checkable - a rules file the reviewer must apply, plus deterministic gates for the non-negotiables so they can't be "ignored" by a model having an off day. Verification you can't talk past beats a reviewer that's persuadable. That's the principle behind how I handle quality in Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - encode the standards as gates, not suggestions, so they're enforced every run. Relatable rant. Did you end up rolling your own reviewer with your standards baked in, or just go back to human review? The "encode our rules as hard checks" path is usually where this frustration resolves.

DontaRuffin • Jun 8

This is exactly the framing: gates, not suggestions. The playbook format in Solon is built on that principle, so blocking violations require a human override before merge. Curious how you're handling the standards definition problem at Moonshift - are the gates hand-coded or is there a config layer?