The review layer in Code Genie works. The problem is it's hardwired — the criteria, the persona, the output format are all written specifically for that project. Every new project that wants the same quality gate has to rebuild it from scratch. That's not a system, that's a copy-paste habit. Here's how I extracted it into a reusable Skill.
## The Problem With Bespoke Logic
Here's what the Code Genie reviewer looks like in its original form:
```python
reviewer = Agent(
    role="Senior Code Reviewer",
    goal="""Review code critically and objectively. Identify bugs, edge cases,
    convention violations, and anything that would fail in production.
    Do not give the benefit of the doubt.""",
    backstory="""You are a senior engineer with high standards and low tolerance
    for sloppy code. You've seen too many production incidents caused by code
    that looked fine in review. You are thorough, specific, and direct.""",
    llm=llm,
    verbose=True,
)

review_task = Task(
    description="""Review the following Python code for a crewAI agent workflow.
    Check for: correct use of crewAI Agent and Task APIs, proper error handling
    for LLM failures, retry logic that won't loop infinitely, and clear separation
    of agent responsibilities. Return PASS with brief justification, or FAIL with
    specific actionable feedback.""",
    expected_output="PASS or FAIL with specific feedback",
    agent=reviewer,
)
```
Look at the task description. It's reviewing Python code for a crewAI agent workflow. It's checking for correct use of crewAI APIs. It knows the retry logic is there and checks it specifically. Every line of that task is written for Code Genie and Code Genie only.
Drop this into a Next.js project and it'll review TypeScript like it's looking for crewAI API misuse. Drop it into a data pipeline and it'll flag missing retry logic that was never supposed to be there. The reviewer works — inside the project it was written for.
That's the cost of bespoke logic. It's not wrong. It's just not portable.
## What "Reusable" Actually Means
Before extracting anything, it helps to define what you're extracting toward. A reusable Skill needs to do three things:
**Work without project-specific assumptions baked in.** The Skill can't know in advance what language you're using, what framework you're on, or what your conventions look like. That information has to come from outside — from CLAUDE.md, from the task prompt, from context passed in at runtime.
**Accept context from whatever project it's dropped into.** The Skill should have explicit slots for project-specific information. Not hardcoded defaults — actual inputs it expects to receive and knows how to use.
**Produce output in a consistent format.** Whatever calls the Skill — a crewAI agent, a GitHub Actions workflow, a human reading the result — should be able to rely on the output looking the same every time. PASS or FAIL, specific feedback, actionable items. No surprises.
That's the design spec. If the extracted Skill satisfies all three, it's genuinely reusable. If it satisfies two out of three, it's still bespoke with better formatting.
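The third requirement is concrete enough to check mechanically. As a rough sketch (the function name and shape are mine, not part of the Skill), a caller could validate a review result before acting on it:

```python
def parse_review(raw: str) -> tuple[str, str]:
    """Split a review into (verdict, feedback), rejecting anything
    that doesn't match the PASS/FAIL contract."""
    text = raw.strip()
    if not text:
        raise ValueError("empty review output")
    verdict, _, feedback = text.partition("\n")
    verdict = verdict.strip().upper()
    if verdict not in ("PASS", "FAIL"):
        raise ValueError(f"expected PASS or FAIL, got: {verdict!r}")
    return verdict, feedback.strip()

verdict, feedback = parse_review("FAIL\nThe retry loop has no exit condition.")
# verdict == "FAIL"; feedback holds the actionable items
```

Anything that doesn't start with a clean verdict line gets rejected instead of silently passed downstream — that's the "no surprises" part made enforceable.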
## Identifying What's Generic vs What's Project-Specific
The extraction exercise is simple in theory: go through the original reviewer and put everything into one of two buckets.
**Generic** — applies to any code review, regardless of project:
- The reviewer persona (skeptical, specific, direct)
- The output format (PASS or FAIL with actionable feedback)
- The core review criteria (correctness, edge cases, readability)
- The instruction not to give the benefit of the doubt
**Project-specific** — belongs in CLAUDE.md or the task prompt, not the Skill:
- The language and framework being reviewed
- The specific APIs being checked
- The architectural patterns to enforce
- The conventions that count as violations
Here's the same reviewer split into those two buckets:
```python
# GENERIC — goes into the Skill
role = "Senior Code Reviewer"
goal = "Review code critically. Identify bugs, edge cases, and anything that would fail in production."
backstory = "You are a senior engineer with high standards. You are thorough, specific, and direct."
output_format = "PASS with brief justification, or FAIL with specific actionable feedback"

# PROJECT-SPECIFIC — goes into the task prompt, sourced from CLAUDE.md
language = "Python"
framework = "crewAI"
specific_checks = ["crewAI Agent and Task API usage", "retry logic", "agent separation"]
conventions = "[from CLAUDE.md]"
```
Once it's split this cleanly, the Skill writes itself.
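One way to read "writes itself": the project-specific bucket becomes plain data, and the task description becomes a template over it. A minimal sketch — the helper and its field names are illustrative, not code from the project:

```python
def build_review_description(context: dict) -> str:
    """Compose a task description from project-specific context,
    leaving the generic persona and criteria to the Skill."""
    bullets = "\n".join(f"- {line}" for line in context["conventions"])
    return (
        f"Using the code-review Skill, review the submitted "
        f"{context['language']} code.\n"
        f"Project context (from CLAUDE.md):\n"
        f"- Language: {context['language']}\n"
        f"- Framework: {context['framework']}\n"
        f"{bullets}\n"
        f"Apply both the universal criteria from the Skill and the "
        f"project context above.\n"
        f"Return PASS or FAIL with specific feedback."
    )

code_genie = {
    "language": "Python 3.11",
    "framework": "crewAI",
    "conventions": [
        "Conventions: agents have single responsibilities",
        "Anti-patterns: no infinite retry loops",
    ],
}
print(build_review_description(code_genie))
```

Swap the dict and the same function produces the prompt for any other project; nothing generic ever gets re-typed.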
## Writing the Skill
Here's the full SKILL_code_review.md, built from the generic bucket:
```markdown
---
name: code-review
description: Review code critically and return a structured PASS or FAIL with
  specific, actionable feedback. Use this skill whenever an agent or workflow needs
  a quality gate on generated or submitted code. Trigger when a task involves
  reviewing, evaluating, or checking code before it proceeds to the next step.
---

# Code Review Skill

A reusable code review layer for agent workflows and manual review tasks.

## Reviewer Persona

You are a senior software engineer with high standards and low tolerance for
code that merely looks correct. You have seen too many production incidents
caused by plausible-looking code that was never properly scrutinized. You are
thorough, specific, and direct. You do not give the benefit of the doubt.

## What You Always Check

Regardless of language or framework, every review covers:

1. **Correctness** — Does the code do what it's supposed to do?
   Does it handle the happy path? Does it handle failure?
2. **Edge cases** — What happens at the boundaries? Empty input,
   null values, unexpected types, off-by-one conditions.
3. **Readability** — Can another developer understand this code
   without asking questions?
4. **Side effects** — Does this code do anything it shouldn't?
   Modify state it doesn't own? Make calls it wasn't asked to make?

## What You Check From Project Context

The following criteria come from the project's CLAUDE.md and task prompt.
Apply them in addition to the universal criteria above:

- Language and framework conventions
- Architectural patterns to enforce or avoid
- Specific APIs or libraries in use
- Anti-patterns explicitly called out in CLAUDE.md

## Output Format

Return exactly one of the following:

**PASS**
Brief justification (1-2 sentences). What makes this code acceptable.

**FAIL**
Specific feedback only. For each issue:

- What the problem is
- Where it is (function name, line reference, or code snippet)
- What the fix should be

Do not return vague feedback. "This could be improved" is not actionable.
"The retry loop on line 24 has no exit condition — add a max_retries check
before the recursive call" is actionable.
```
That's the complete Skill. Generic enough to work anywhere, structured enough to produce consistent output, with explicit slots for project context to plug into.
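The frontmatter at the top isn't decoration: the description field is what lets a runtime decide when to pull the Skill in. The exact discovery mechanism depends on your agent runtime, but reading the metadata is straightforward — a rough sketch of a minimal parser for that leading `---` block:

```python
def read_frontmatter(skill_text: str) -> dict:
    """Pull key: value pairs out of a SKILL file's leading --- block,
    joining indented continuation lines onto the previous value."""
    lines = skill_text.strip().splitlines()
    if not lines or lines[0] != "---":
        return {}
    meta, key = {}, None
    for line in lines[1:]:
        if line == "---":          # end of the frontmatter block
            break
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
        elif key:                  # continuation of a multi-line value
            meta[key] += " " + line.strip()
    return meta

skill = """---
name: code-review
description: Review code critically and return a structured PASS or FAIL with
  specific, actionable feedback.
---
# Code Review Skill
"""
meta = read_frontmatter(skill)
# meta["name"] == "code-review"
```

A workflow could scan a skills directory with this, then match task text against each `description` to pick the right Skill at runtime.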
## Plugging It Back Into Code Genie
With the Skill written, the original Code Genie reviewer slims down considerably. The agent definition drops to its essentials and delegates to the Skill. The project-specific criteria move into the task prompt where they belong:
```python
reviewer = Agent(
    role="Senior Code Reviewer",
    goal="Apply the code-review Skill to evaluate the submitted code.",
    backstory="You follow the code-review Skill process precisely.",
    llm=llm,
    verbose=True,
)

review_task = Task(
    description="""Using the code-review Skill, review the submitted Python code.
    Project context (from CLAUDE.md):
    - Language: Python 3.11
    - Framework: crewAI
    - Conventions: agents have single responsibilities, tasks return structured output
    - Anti-patterns: no raw exception swallowing, no infinite retry loops
    Apply both the universal criteria from the Skill and the project context above.
    Return PASS or FAIL with specific feedback.""",
    expected_output="PASS or FAIL with specific feedback per the code-review Skill format",
    agent=reviewer,
)
```
The loop still works exactly as before. The output format is identical. But the review layer is now a dependency rather than a hardcoded implementation — and that distinction matters the next time you need a quality gate somewhere else.
## Plugging It Into a Different Project
Here's the same Skill wired into a completely different context — a Next.js TypeScript project with no connection to crewAI:
```python
review_task = Task(
    description="""Using the code-review Skill, review the submitted TypeScript code.
    Project context (from CLAUDE.md):
    - Language: TypeScript strict
    - Framework: Next.js 16, React 19
    - Conventions: named exports, Result pattern for error handling,
      no raw throws in business logic
    - Anti-patterns: no default exports, no useEffect for data fetching,
      do not mutate props
    Apply both the universal criteria from the Skill and the project context above.
    Return PASS or FAIL with specific feedback.""",
    expected_output="PASS or FAIL with specific feedback per the code-review Skill format",
    agent=reviewer,
)
```
Same Skill. Different language, different framework, different conventions. The Skill adapts because the project-specific context is coming from the task prompt — sourced from CLAUDE.md — not baked into the Skill itself. The reviewer persona, the universal criteria, the output format are all unchanged.
That's portability in practice.
## When to Extract vs When to Leave It Bespoke
Not everything should be a Skill. The extraction is worth the effort when all three of the following are true:
**The logic is genuinely reusable.** If you can't imagine using it in at least two different projects, it probably belongs where it is. Extraction for its own sake adds maintenance overhead without payoff.
**The project-specific parts can be cleanly separated.** If you spend more time untangling what's generic from what's specific than you would just rewriting it for the next project, leave it bespoke. The extraction exercise should feel like sorting, not surgery.
**You'll actually use it again.** This one sounds obvious, but it's easy to miss. Building a reusable Skill you never reach for is just premature abstraction with extra steps.
If all three are true, extract it. If one or two are true, extract the concept but not necessarily the implementation — document what worked and use it as a reference when the next similar problem comes up.
The code review Skill cleared all three bars easily. The reviewer persona and output format are the same everywhere. The project-specific criteria separated cleanly into the task prompt. And a quality gate on generated code is useful in every project that generates code.
## What's Next
The code review Skill is now in the [ai-workflow-toolkit](https://github.com/southwestmogrown/ai-workflow-toolkit) repo alongside the CLAUDE.md generator and the README writer. Three skills, one repo, all portable.
The pattern behind all three is the same: build it bespoke first, extract what's generic, make it portable. Resist the urge to abstract before you've built something real — you can't know what's generic until you've seen the specific version work.
Next up: how I'm using these building blocks to wire up a full agent workflow on GitHub issues — from writing the issue to merging the PR, with Claude handling every step in between.
Tags: python, ai, crewai, webdev
Series: Building With AI Agents — Article 3 of 12