Mudavath Srinivas

I Built an AI Code Reviewer That Uses Any LLM to Review Claude Code Output — Zero Dependencies, 7 Commands, Infinite Engines

TL;DR: I built cc-review — a pure bash Claude Code skill that spins up any external LLM (Gemini, Ollama, DeepSeek, OpenAI) to independently review Claude's own code output. No npm. No pip. Just 7 slash commands, a YAML config, and an uncomfortable truth about trusting AI to review itself. Repo: github.com/mudavathsrinivas/cc-review.


Here's the uncomfortable truth nobody talks about: Claude is reviewing Claude's code.

You vibe-code a multi-phase feature. You run /review. Claude reads its own output and says "looks good!" You ship. Two days later you're debugging a race condition that any second pair of eyes would have caught in 30 seconds.

You didn't get a code review. You got a mirror.

I hit this exact wall while building a multi-phase AI Second Brain project — an agentic system with memory modules, knowledge indexing, scheduled tasks, and a self-learning loop. Each phase produced hundreds of lines of generated code. I was using Claude Code for everything: architecting, implementing, reviewing. The confirmation bias was baked in.

I needed a reviewer with zero loyalty to the original author.

So I built cc-review: an open-source Claude Code skill that outsources your code review to any external LLM engine. Gemini reviews Claude's work. Ollama stays local and private. DeepSeek brings a different training distribution. The engine is pluggable. The bash is pure. The cost, with Gemini's free tier, is zero.

Here's exactly how it works and how you can add it to your own Claude Code setup in under 10 minutes.


What We're Building

cc-review is a Claude Code skill — a bash-powered plugin that extends Claude Code with new slash commands. When you trigger a review, it:

  1. Grabs your recent git diff (staged + unstaged changes)
  2. Routes the diff to an external LLM engine of your choice
  3. Scores the code on four dimensions: Completeness, Correctness, Quality, Security
  4. Returns structured feedback with line-level comments
  5. Optionally runs an adversarial review mode that actively tries to break your assumptions
```
Your Code Changes (git diff)
        │
        ▼
┌───────────────────┐
│   cc-review skill │  ← Pure bash, reads engines.yaml
└────────┬──────────┘
         │
    ┌────▼─────────────────────────────────┐
    │         engines.yaml router          │
    └──┬──────────┬───────────┬────────────┘
       │          │           │
   ┌───▼───┐  ┌───▼───┐  ┌───▼───────┐
   │Gemini │  │Ollama │  │ DeepSeek  │  ← any LLM, zero code changes
   └───────┘  └───────┘  └───────────┘
       │          │           │
       └──────────┼───────────┘
                  │
         ┌────────▼────────┐
         │ Scored Report   │  Completeness / Correctness
         │ (4 dimensions)  │  Quality / Security
         └─────────────────┘
```

Seven commands ship out of the box:

| Command | What It Does |
| --- | --- |
| `/review` | Standard review using your default engine |
| `/review-adversarial` | Skeptic mode — actively challenges your code |
| `/review-result` | Show full output of last review |
| `/review-status` | Check running/recent jobs |
| `/review-setup` | Verify engine auth and readiness |
| `/review-cancel` | Kill a running background job |
| `/review-rescue` | Delegate investigation/fix to external engine |

Prerequisites

  • Claude Code installed and running (>= 0.2.x)
  • git available in your shell
  • At least one of: a Gemini API key (free tier), Ollama running locally, or an OpenAI/DeepSeek key
  • Basic comfort with YAML config files
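If you're going the free-tier route, the only credential setup is an environment variable. The names below match the `api_key_env` entries in the skill's engines.yaml (shown in step 2); the key value itself is a placeholder:

```shell
# Export whichever keys you have. The names match the api_key_env
# fields in engines.yaml (GEMINI_API_KEY, DEEPSEEK_API_KEY, ...).
export GEMINI_API_KEY="your-key-here"   # placeholder: use your real key

# For the local option, Ollama must be serving with a model pulled
# (standard Ollama CLI, nothing cc-review-specific):
#   ollama serve &
#   ollama pull llama3.2
```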

Step-by-Step

1. Install the Skill

Clone cc-review into your Claude Code skills directory:

```bash
git clone https://github.com/mudavathsrinivas/cc-review ~/.claude/skills/cc-review
```

Claude Code auto-discovers skills in ~/.claude/skills/. No import step. No config file edit. The skill system reads the directory on startup.

Verify it loaded:

```
# Inside Claude Code
/review-setup
```

You'll see output like:

```
cc-review v1.0.0
─────────────────────────────────
Engine Check:
  gemini    ✓  (GEMINI_API_KEY set)
  ollama    ✓  (http://localhost:11434 reachable)
  deepseek  ✗  (DEEPSEEK_API_KEY not set)
  openai    ✗  (OPENAI_API_KEY not set)

Default engine: gemini
Ready to review.
```

2. Configure Your Engines

This is the part I'm most proud of. Every engine lives in a single YAML file — engines.yaml. Adding a new LLM requires zero code changes. You just describe it:

```yaml
# ~/.claude/skills/cc-review/engines.yaml

default: gemini

engines:
  gemini:
    provider: google
    model: gemini-2.0-flash
    api_key_env: GEMINI_API_KEY
    endpoint: https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
    max_tokens: 8192
    temperature: 0.3
    enabled: true

  ollama:
    provider: ollama
    model: llama3.2
    endpoint: http://localhost:11434/api/generate
    max_tokens: 4096
    temperature: 0.2
    enabled: true
    private: true   # flag: never send to external APIs

  deepseek:
    provider: deepseek
    model: deepseek-coder
    api_key_env: DEEPSEEK_API_KEY
    endpoint: https://api.deepseek.com/v1/chat/completions
    max_tokens: 8192
    temperature: 0.2
    enabled: false  # flip to true when key is set

  openai:
    provider: openai
    model: gpt-4o
    api_key_env: OPENAI_API_KEY
    endpoint: https://api.openai.com/v1/chat/completions
    max_tokens: 8192
    temperature: 0.2
    enabled: false
```

The router reads this at runtime. Set default to swap your primary reviewer. Set enabled: false to disable an engine without deleting its config. The private: true flag on Ollama is a guardrail — the skill will refuse to send that diff to any external endpoint even if you fat-finger the engine flag.

Want to add a brand new LLM? Add a YAML block. Done.
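To make the "zero code changes" claim concrete, here's a minimal sketch of how a pure-bash router can read that file. This is an illustration of the idea, not the skill's actual source, and it leans on the flat two-space indentation shown above rather than real YAML parsing:

```shell
# Illustrative only, not cc-review's actual code. Relies on the flat
# layout of engines.yaml above; arbitrary YAML needs a real parser.
get_default_engine() {
  # "default: gemini" -> "gemini"
  awk '/^default:/ { print $2; exit }' "$1"
}

engine_enabled() {
  # Scan the "  <engine>:" block for an "enabled: true" line.
  awk -v eng="$2" '
    $0 ~ "^  " eng ":"          { inblock = 1; next }
    inblock && /^  [a-z]/       { inblock = 0 }   # next engine block begins
    inblock && /enabled: *true/ { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}
```

With that in place, `get_default_engine engines.yaml` prints the reviewer to route to, and `engine_enabled engines.yaml deepseek` exits nonzero until you flip the flag.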

3. Run Your First Review

Make some changes in your project, then:

```
/review
```

The skill captures git diff HEAD (staged + unstaged), structures a review prompt, sends it to your default engine, and streams back a scored report. A real output looks like this:

```
cc-review | engine: gemini-2.0-flash | 2026-04-11 09:14 CST
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

SCORES
  Completeness  8/10  — Core logic is present; error branches for
                        empty API response are missing.
  Correctness   7/10  — Line 84: off-by-one in pagination cursor.
                        Will silently drop the last record.
  Quality       9/10  — Clean separation of concerns. Good.
  Security      6/10  — API key interpolated directly into log
                        string at line 112. Rotate and fix.

CRITICAL (fix before merge)
  [security] src/client.ts:112
  → `console.log(`Auth: ${apiKey}`)` logs the raw key.
    Replace with a masked version: apiKey.slice(0,4) + '****'

  [correctness] src/paginator.ts:84
  → Cursor offset is `page * limit` but should be
    `(page - 1) * limit` for 1-indexed pagination.
    Current code skips page 1 entirely.

SUGGESTIONS
  [completeness] src/client.ts:67
  → No handling for HTTP 429 (rate limit). Add exponential
    backoff or surface the error to caller.

  [quality] src/types.ts:23
  → ApiResponse<T> type is wide. Consider discriminated union
    for success/error states.

SUMMARY
  Solid implementation with two ship-blockers. The security
  issue is trivial to fix. The pagination bug would have caused
  silent data loss in production. Review cost: $0.00 (free tier).
```

That pagination bug? 100% something Claude wrote and Claude would have rubber-stamped. Gemini caught it because it has no attachment to the original decision.
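For the curious, the capture step is simple enough to sketch. This is a hypothetical reconstruction (`build_review_prompt` is my name, not the skill's, and the prompt wording is mine too), but the shape of it — diff in, structured prompt out — is the whole trick:

```shell
# Hypothetical sketch of the capture step. Function name and prompt
# wording are mine, not cc-review's.
build_review_prompt() {
  local diff
  diff=$(git diff HEAD)   # staged + unstaged, same capture as the skill
  if [ -z "$diff" ]; then
    echo "nothing to review" >&2
    return 1
  fi
  cat <<EOF
You are an independent code reviewer with no attachment to this code.
Score it on Completeness, Correctness, Quality, and Security (each /10),
then list CRITICAL issues and SUGGESTIONS with file:line references.

DIFF:
$diff
EOF
}
```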

4. Use Adversarial Mode for Critical Phases

Standard review finds bugs. Adversarial mode finds assumptions you didn't know you were making.

```
/review-adversarial
```

The prompt instructs the external engine to play the role of a skeptical senior engineer who actively looks for: race conditions, wrong abstractions, over-engineering, security footguns, and implicit dependencies that will break in production.
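The exact wording ships with the skill; a hypothetical approximation of that instruction block looks something like:

```shell
# Approximation of the adversarial system prompt. The skill's real
# wording may differ, but the posture is the point.
ADVERSARIAL_PROMPT=$(cat <<'EOF'
You are a skeptical senior engineer reviewing a colleague's diff.
Do not summarize or praise. Actively hunt for:
- race conditions and ordering assumptions
- wrong abstractions and over-engineering
- security footguns
- implicit dependencies that will break in production
For each finding, name the assumption being challenged, explain how it
fails, and give the smallest fix. End with a ship/no-ship verdict.
EOF
)
```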

Real example output from my Second Brain project's memory indexer:

```
ADVERSARIAL REVIEW | engine: gemini-2.0-flash
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

CHALLENGED ASSUMPTIONS

  1. "Files are processed sequentially"
     → Your glob pattern returns files in filesystem order.
       On macOS this is usually alphabetical. On Linux ext4
       it's creation order. On network mounts, undefined.
       Your tests will pass locally and break in CI.

  2. "MEMORY.md is always writable"
     → No lock file, no atomic write. Two agents running
       concurrently will corrupt this file. You mentioned
       scheduled tasks — this WILL happen.

  3. "The embedding model is stable"
     → You hardcode 'text-embedding-3-small' but never pin
       the version. OpenAI has silently updated embeddings
       before. Your similarity scores will drift over time
       and you won't know why.

VERDICT
  Ship with fixes for #1 and #2. #3 is acceptable risk
  for a solo project but document the assumption explicitly.
```

None of those were in the standard review output. Adversarial mode thinks differently.
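Finding #2, by the way, has a textbook fix that applies to any agent writing shared state: write to a temp file, then rename. Renaming is atomic on the same filesystem, so a concurrent reader never sees a half-written MEMORY.md. A sketch of the general pattern (not code from my project):

```shell
# Standard atomic-write pattern: mv within one filesystem is atomic,
# so readers see either the old file or the new one, never a torn write.
atomic_write() {
  local target="$1" tmp
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  cat > "$tmp"             # stdin -> temp file in the same directory
  mv -f "$tmp" "$target"   # atomic replace
}

# usage: printf 'new memory state\n' | atomic_write MEMORY.md
```

Note this only prevents torn reads; two writers doing read-modify-write still race each other, so concurrent scheduled agents also want a lock (e.g. flock) around the whole update.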

5. Keep Private Code Private with Ollama

If you're working on proprietary code and can't send diffs to Google or OpenAI, flip to Ollama:

```
/review --engine ollama
```

Or set default: ollama in engines.yaml for all reviews. Everything stays on your machine. The private: true config flag means the router will hard-fail rather than accidentally route to an external API.
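A hypothetical sketch of what that guardrail has to check, again an illustration rather than the skill's source:

```shell
# Hypothetical sketch of the private-engine guardrail: before any
# request, refuse external endpoints for engines marked private.
guard_endpoint() {
  local private="$1" endpoint="$2"
  if [ "$private" = "true" ]; then
    case "$endpoint" in
      http://localhost*|http://127.0.0.1*) return 0 ;;  # local is fine
      *) echo "refusing: private engine, external endpoint" >&2; return 1 ;;
    esac
  fi
  return 0
}
```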

```
# Verify your Ollama setup first
/review-setup --engine ollama

# Output:
# ollama: ✓ (llama3.2 loaded, 8B params, ~6GB RAM used)
# private mode: ON — external routing disabled for this engine
```

Quality is lower than frontier models, but for security-sensitive codebases or pure sanity checks, local Ollama at zero cost is a real option.


The Result

After integrating cc-review into my Second Brain project workflow, here's what changed:

  • Gemini free tier handles ~1,000 reviews/day at $0.00. For a solo developer shipping phases one at a time, this is effectively unlimited.
  • 5 real bugs caught in 3 weeks that I confirmed would have reached production — 2 correctness issues, 2 security issues, 1 missing error handler.
  • Review latency: 8-14 seconds per phase diff using Gemini Flash. Fast enough to run after every significant change without breaking flow.
  • Adversarial mode changed how I think about code I generate. I now proactively consider race conditions and assumption brittleness because I've seen the engine surface them repeatedly.

The workflow became:

```
Implement phase with Claude → /review → fix blockers → /review-adversarial → fix assumptions → commit
```

Key Takeaway

An LLM cannot objectively review its own output. Not because the model is bad — because the training distribution, the context window, and the confirmation bias are all pointing the same direction. Independent review means a genuinely different model, with different training data, reading your code cold.

cc-review is a ten-minute setup that gives you that independence, at zero cost, with full control over which engine reviews which code and whether anything ever leaves your machine.

The irony is that the better your AI coding assistant gets, the more you need this: the faster Claude ships code, the faster bugs accumulate without a second opinion.


Star the repo if this is useful: github.com/mudavathsrinivas/cc-review

Pull requests welcome — especially new engine configs for engines.yaml. If you've got a working block for Mistral, Cohere, or any local model, open a PR and I'll merge it.

Follow me here on Dev.to — I'm documenting the full AI Second Brain build in public, including the infrastructure, the failures, and the moments where the AI confidently wrote something completely wrong.
