I gave Claude the same code review twice. It missed a SQL injection the second time. So I encoded 12 engineering books into it.

"The bearing of a child takes nine months, no matter how many women are assigned."
— Frederick Brooks, The Mythical Man-Month (1975)

Fifty years later, Brooks was still right. And so was the rest of my shelf.


I ran the same code review with Claude twice.

First time: caught the SQL injection, flagged separation of concerns. Solid review.
Second time: focused on naming conventions. Missed the injection entirely.

Same code. Same model. Completely different results.

That's not a Claude problem. That's a consistency problem. And it's fixable — if you give the model a framework to work from.

So I encoded 12 classic engineering books into a Claude Code skill. Here's what happened.

The Problem

Most code quality tools count lines and measure cyclomatic complexity. That's useful, but it misses the deeper problems that slow teams down for months before anyone notices: architectural drift, knowledge silos, domain model distortion.

Meanwhile, the software engineering classics have had answers to these problems for decades. Brooks, Fowler, Martin, McConnell, Evans, Ousterhout — twelve books, fifty years of hard-won wisdom. The insights haven't changed. We just stopped encoding them consistently.

What I Built

brooks-lint is a Claude Code skill (it also works with Gemini CLI and Codex CLI) that diagnoses code against 12 decay-risk dimensions synthesized from 12 classic engineering books. Every run produces structured findings with book citations, severity labels, and concrete remedies.

The 12 Books

| Book | Author |
| --- | --- |
| The Mythical Man-Month | Frederick Brooks |
| Code Complete | Steve McConnell |
| Refactoring | Martin Fowler |
| Clean Architecture | Robert C. Martin |
| The Pragmatic Programmer | Hunt & Thomas |
| Domain-Driven Design | Eric Evans |
| A Philosophy of Software Design | John Ousterhout |
| Software Engineering at Google | Winters, Manshreck & Wright |
| xUnit Test Patterns | Gerard Meszaros |
| The Art of Unit Testing | Roy Osherove |
| How Google Tests Software | Whittaker, Arbon & Carollo |
| Working Effectively with Legacy Code | Michael Feathers |

The Six Production Code Decay Risks

| Risk | Diagnostic Question |
| --- | --- |
| 🧠 Cognitive Overload | How much mental effort to understand this? |
| 🔗 Change Propagation | How many unrelated things break on one change? |
| 📋 Knowledge Duplication | Is the same decision expressed in multiple places? |
| 🌀 Accidental Complexity | Is the code more complex than the problem? |
| 🏗️ Dependency Disorder | Do dependencies flow in a consistent direction? |
| 🗺️ Domain Model Distortion | Does the code faithfully represent the domain? |

Every finding follows the same chain: Symptom → Source (book + chapter) → Consequence → Remedy
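
To make that chain concrete as data, here is a rough sketch of how a single finding could be modeled. The class and field names are my own illustration (brooks-lint does not expose this structure); they simply mirror the four parts of the chain:

```python
from dataclasses import dataclass, field

# Illustrative only: the fields mirror the Symptom -> Source -> Consequence -> Remedy chain.
@dataclass
class Finding:
    severity: str          # "critical" | "warning" | "suggestion"
    risk: str              # e.g. "Change Propagation"
    symptom: str           # what the code does today
    sources: list[str] = field(default_factory=list)  # book + chapter citations
    consequence: str = ""  # why it will hurt later
    remedy: str = ""       # the concrete fix to apply

example = Finding(
    severity="critical",
    risk="Change Propagation",
    symptom="update_profile mixes profile updates, notifications, and loyalty points",
    sources=["Fowler, Refactoring: Divergent Change"],
    consequence="A loyalty formula change can break email notifications",
    remedy="Extract NotificationService and LoyaltyService",
)
```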

The Six Test-Suite Decay Risks (New in v0.5)

brooks-lint now also audits your test suite against six test-suite decay risks sourced from xUnit Test Patterns, The Art of Unit Testing, How Google Tests Software, and Working Effectively with Legacy Code:

| Risk | Diagnostic Question |
| --- | --- |
| 🔍 Test Obscurity | Can you understand what this test verifies at a glance? |
| 🧱 Test Brittleness | Does this test break when unrelated implementation details change? |
| 📋 Test Duplication | Are the same scenarios covered in multiple places? |
| 🎭 Mock Abuse | Are mocks hiding real design problems? |
| 📊 Coverage Illusion | Does high coverage give false confidence? |
| 🏗️ Architecture Mismatch | Do tests reflect the production architecture? |
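
To make a couple of these concrete, here is a deliberately trivial bad test. It is my own illustration, not brooks-lint output, and it reuses the same magic-number loyalty formula that appears in the UserService example below. It passes and raises coverage, but trips both Test Obscurity and Coverage Illusion:

```python
def loyalty_points(login_count: int) -> int:
    # Same magic-number formula used inside update_profile further down.
    return login_count * 10 + 500

def test_1():  # Test Obscurity: the name reveals nothing about intent
    assert loyalty_points(3) == 530   # restates 10 and 500 verbatim
    assert loyalty_points(0) == 500   # Coverage Illusion: every line is covered,
                                      # but nothing pins down what 10 or 500 mean
                                      # or which inputs are even valid
```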

What It Looks Like

Given this code:

```python
class UserService:
    def update_profile(self, user_id, name, email, avatar_url):
        user = self.db.query(f"SELECT * FROM users WHERE id = {user_id}")
        user['email'] = email
        if user['email'] != email:  # always False — silent bug
            self.smtp.send(...)
        points = user['login_count'] * 10 + 500
        self.db.execute(f"UPDATE loyalty SET points={points} WHERE user_id={user_id}")
```

brooks-lint produces:

```
Health Score: 28/100

🔴 Change Propagation — Single Method Changes for Four Unrelated Business Reasons
Symptom: update_profile performs profile updates, email notifications, loyalty
         points recalculation, and cache invalidation all in one method body.
Source:  Fowler — Refactoring — Divergent Change
         Hunt & Thomas — The Pragmatic Programmer — Orthogonality
Consequence: Any change to the loyalty formula risks breaking email notifications.
Remedy: Extract NotificationService, LoyaltyService, and UserCacheInvalidator.

🔴 Domain Model Distortion — Silent Logic Bug: Email Notification Never Fires
Symptom: user['email'] = email overwrites the old value before the comparison —
         the condition is always False. The notification is dead code.
Source:  McConnell — Code Complete — Ch. 17: Unusual Control Structures
Consequence: Users are never notified when their email address changes.
Remedy: Capture old_email = user['email'] before any mutation.

(+ 6 more findings including SQL injection, dependency disorder, magic numbers)
```
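
For contrast, here is a minimal sketch of what the first two remedies could look like once applied. The parameterized query and the old_email capture come straight from the findings above; the collaborator names (self.notifications, self.loyalty) and the DB-API-style placeholder are my own assumptions, not brooks-lint output:

```python
class UserService:
    def update_profile(self, user_id, name, email, avatar_url):
        # Parameterized query instead of f-string interpolation (SQL injection remedy).
        user = self.db.query("SELECT * FROM users WHERE id = %s", (user_id,))

        # Capture the old value before mutating, so the change check is meaningful.
        old_email = user['email']
        user['email'] = email
        if old_email != email:
            self.notifications.notify_email_change(user_id, old_email, email)

        # Loyalty recalculation delegated to its own service (Divergent Change remedy).
        self.loyalty.recalculate(user_id, user['login_count'])
```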

Architecture Audit with Dependency Graph (v0.6)

In Mode 2 (/brooks-audit), brooks-lint generates a Mermaid dependency graph color-coded by severity: red = Critical, yellow = Warning, green = clean. It renders natively in GitHub, VS Code, and Notion.
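
For a feel for the output, the generated graph is plain Mermaid source along these lines. The module names here are hypothetical and the specific findings are invented; only the red/yellow/green convention follows the description above:

```mermaid
graph TD
    OrderService --> PaymentGateway
    OrderService --> InventoryRepo
    PaymentGateway --> OrderService

    %% Red = Critical (e.g. a module caught in a dependency cycle),
    %% yellow = Warning, green = clean.
    style OrderService fill:#f87171
    style PaymentGateway fill:#facc15
    style InventoryRepo fill:#4ade80
```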

Four Modes

| Command | Short Form | Action |
| --- | --- | --- |
| /brooks-lint:brooks-review | /brooks-review | PR-level code review |
| /brooks-lint:brooks-audit | /brooks-audit | Architecture audit with Mermaid dependency graph |
| /brooks-lint:brooks-debt | /brooks-debt | Tech debt assessment with prioritized roadmap |
| /brooks-lint:brooks-test | /brooks-test | Test suite health review |

Benchmark Results

Tested across 3 real-world scenarios (PR review, architecture audit, tech debt):

| Criterion | brooks-lint | Plain Claude |
| --- | --- | --- |
| Structured findings | ✅ 100% | ❌ 0% |
| Book citations | ✅ 100% | ❌ 0% |
| Severity labels | ✅ 100% | ❌ 0% |
| Health Score (0–100) | ✅ 100% | ❌ 0% |
| Overall pass rate | 94% | 16% |

The gap isn't what Claude can find — it's what it consistently finds, with traceable evidence and actionable remedies every time.

How It Compares

Where ESLint/Pylint, GitHub Copilot, and plain Claude each cover at most a slice of this, brooks-lint combines a structured diagnosis chain, findings traced to the classic books, architecture-level insights, domain model analysis, zero config with no plugins, and support for any language.

brooks-lint doesn't replace your linter. It catches what linters can't.

Installation

Claude Code (Recommended)

```
/plugin marketplace add hyhmrright/brooks-lint
```

Gemini CLI

```
/extensions install https://github.com/hyhmrright/brooks-lint
```

Codex CLI

```
$brooks-review  # skills trigger automatically on code quality discussions
```

Manual Install

```
cp commands/*.md ~/.claude/commands/
cp -r skills/ ~/.claude/skills/brooks-lint
```

Configuration (v0.7)

Place a .brooks-lint.yaml in your project root to customize behavior:

```yaml
version: 1
disable:
  - T5  # skip coverage metrics check
severity:
  R1: suggestion  # downgrade Cognitive Overload for this domain
ignore:
  - "**/*.generated.*"
  - "**/vendor/**"
```

GitHub: https://github.com/hyhmrright/brooks-lint — MIT licensed, free to use.


AI can help you write code faster, but it can't tell you whether you're building a cathedral or a tar pit. brooks-lint bridges that gap.


If you've used AI for code reviews, I'm curious: what's your biggest frustration with consistency? Drop it in the comments — I'd love to hear what decay risks you're seeing most.

If this was useful, a ❤️ or unicorn helps others find it.
