I built an automated code reviewer using Claude. It explains every decision. #openclawchallenge
Most AI code reviewers just flag problems. Mine explains why something is a problem — and shows its reasoning before giving the verdict.
This is my entry for the #openclawchallenge — building AI that's transparent about what it says and why.
## The problem with opaque AI reviewers
I was using a popular AI code review tool and it flagged a function as "inefficient." No explanation. No alternative. Just: inefficient.
I pushed back. Asked why. Got: "Consider optimizing this code."
Useless. I need to understand the reasoning to trust the verdict.
So I built my own — one that shows its work.
## The system prompt that makes Claude think out loud
````python
import anthropic
import re

# Reads ANTHROPIC_API_KEY from the environment (set in the "Run it yourself" step).
client = anthropic.Anthropic()  # simplylouie.com/developers — $2/month

SYSTEM_PROMPT = """
You are a senior code reviewer. For every issue you identify, you MUST:

1. Show your reasoning inside <think> tags BEFORE giving your verdict
2. In your reasoning, ask yourself:
   - Is this a real problem or a style preference?
   - What's the actual impact (performance, security, maintainability)?
   - Would a senior engineer at Google flag this in a PR review?
   - What's the simplest fix that preserves intent?
3. After reasoning, give a structured verdict:
   - SEVERITY: critical | warning | suggestion
   - ISSUE: one sentence
   - WHY: one sentence explaining the impact
   - FIX: the corrected code or specific change

If you find no real issues, say so explicitly. Do not invent problems to seem thorough.
"""


def review_code(code: str, language: str = "python") -> dict:
    """Review code with full reasoning transparency."""
    prompt = f"""Review this {language} code:

```{language}
{code}
```

Show your reasoning for each potential issue before giving your verdict."""

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2000,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": prompt}],
    )
    raw = response.content[0].text

    # Extract reasoning blocks, then strip them out to leave only the verdict
    think_blocks = re.findall(r'<think>(.*?)</think>', raw, re.DOTALL)
    verdict = re.sub(r'<think>.*?</think>', '', raw, flags=re.DOTALL).strip()

    return {
        "reasoning": [t.strip() for t in think_blocks],
        "verdict": verdict,
        "raw": raw,
    }


def format_review(result: dict) -> str:
    """Pretty-print the review with reasoning visible."""
    output = []
    if result["reasoning"]:
        output.append("## 🧠 Claude's Reasoning Process")
        for i, thought in enumerate(result["reasoning"], 1):
            output.append(f"\n### Thought {i}:")
            output.append(thought)
    output.append("\n## 📋 Review Verdict")
    output.append(result["verdict"])
    return "\n".join(output)
````
## Real test: reviewing a common Python mistake
I fed it this code:
```python
def get_users(db, active=True):
    users = []
    for user in db.query("SELECT * FROM users"):
        if user.active == active:
            users.append(user)
    return users
```
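If you want to reproduce the test, the call is just the two functions from above wired together (assuming everything lives in the same `reviewer.py` file):

```python
# Assumes review_code and format_review from the snippet above are in scope.
code_under_review = '''
def get_users(db, active=True):
    users = []
    for user in db.query("SELECT * FROM users"):
        if user.active == active:
            users.append(user)
    return users
'''

result = review_code(code_under_review, language="python")
print(format_review(result))
```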
Here's what Claude's reasoning looked like:
```
<think>
Let me think about this carefully. The function fetches ALL users from the database
then filters in Python.

Is this a real problem? Yes — at small scale it's fine, but:
- At 1,000 users: minor overhead
- At 100,000 users: loading entire table into memory is a real performance issue
- At 1,000,000 users: this will likely cause OOM errors or extreme latency

Would a senior Google engineer flag this in a PR review? Yes, immediately — this
is a classic N+1 adjacent pattern. The fix is trivial: push the filter to SQL.

Is the fix simple and preserves intent? Yes:
SELECT * FROM users WHERE active = ?
with the active parameter passed in.

Severity: warning (not critical because it works at small scale, but this is
exactly how systems fail under load).
</think>
```
Then the verdict:
```
SEVERITY: warning
ISSUE: Full table scan with Python-side filtering
WHY: Loads all users into memory before filtering — fails under load at 10k+ rows
FIX:

def get_users(db, active=True):
    return db.query("SELECT * FROM users WHERE active = ?", (active,)).fetchall()
```
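The verdict format is rigid on purpose, which makes it easy to post-process. This isn't part of the reviewer above, just a sketch: a hypothetical `parse_verdict` helper could turn the text into structured findings, for example to fail a CI step on anything critical.

```python
import re

def parse_verdict(verdict: str) -> list[dict]:
    """Hypothetical helper (not part of the original reviewer): split the
    SEVERITY/ISSUE/WHY/FIX verdict text into structured findings."""
    pattern = re.compile(
        r"SEVERITY:\s*(?P<severity>critical|warning|suggestion)\s*"
        r"ISSUE:\s*(?P<issue>.*?)\s*"
        r"WHY:\s*(?P<why>.*?)\s*"
        r"FIX:\s*(?P<fix>.*?)(?=SEVERITY:|\Z)",
        re.DOTALL | re.IGNORECASE,
    )
    return [
        {key: value.strip() for key, value in match.groupdict().items()}
        for match in pattern.finditer(verdict)
    ]

# Example: fail the build on critical findings
findings = parse_verdict(result["verdict"])
if any(f["severity"].lower() == "critical" for f in findings):
    raise SystemExit("Critical review findings -- failing the build")
```

Running the verdict above through it yields a single finding with `severity == "warning"`.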
## Why transparency matters for code review
When Claude shows its reasoning, I can:
- Disagree with the reasoning — if the thought process is wrong, I know why the verdict is wrong
- Learn from the reasoning — seeing how to evaluate code quality is more valuable than the verdict itself
- Trust the verdict — when the reasoning is solid, I can ship faster
Opaque AI review = either trust blindly or reject blindly. Neither is good engineering.
## The cost angle
I'm running this on a flat-rate Claude API for $2/month (not usage-based). For a solo developer doing code review on 10-20 PRs a week, usage-based APIs add up fast.
| Tool | Cost | Reasoning shown? |
|---|---|---|
| Claude.ai Pro | $20/month | No |
| Claude API (pay-per-token) | $5-40/month | Yes, if you build it |
| SimplyLouie API | $2/month | Yes — flat rate |
| GitHub Copilot | $10/month | No |
Flat-rate means I run code reviews without watching token counts.
## Run it yourself
```bash
# Install
pip install anthropic

# Set your key (get one at simplylouie.com/developers)
export ANTHROPIC_API_KEY="your-key"

# Run
python reviewer.py
```
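The snippet earlier only defines the functions, so `python reviewer.py` needs a small entry point at the bottom of the file. Here's a minimal sketch; taking the file path as a CLI argument (with a stdin fallback) is my own assumption, not necessarily how the gist lays it out:

```python
# Bottom of reviewer.py -- minimal entry point (assumed layout, see note above).
# Usage: python reviewer.py path/to/file.py   or   python reviewer.py < file.py
import sys
from pathlib import Path

if __name__ == "__main__":
    source = Path(sys.argv[1]).read_text() if len(sys.argv) > 1 else sys.stdin.read()
    print(format_review(review_code(source)))
```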
Full code on the gist linked below.
## The #openclawchallenge connection
The challenge asks: what should AIs say — and what shouldn't they say — about the code they review?
My answer: they should show their reasoning, not hide it. An AI that flags problems without explaining why is making editorial judgments without accountability. The `<think>` blocks make the AI's assumptions visible and challengeable.
That's the model I want reviewing my code: one that argues with itself before telling me I'm wrong.
What's the most useful thing an AI code reviewer could tell you that current tools don't? Drop it in the comments — genuinely curious what's missing from existing tools.
API access: simplylouie.com/developers — $2/month flat rate, no per-token billing