I Built an AI Code Reviewer That Uses Any LLM to Review Claude Code Output — Zero Dependencies, 7 Commands, Infinite Engines
TL;DR: I built `cc-review` — a pure bash Claude Code skill that spins up any external LLM (Gemini, Ollama, DeepSeek, OpenAI) to independently review Claude's own code output. No npm. No pip. Just 7 slash commands, a YAML config, and an uncomfortable truth about trusting AI to review itself. Repo here.
Here's the uncomfortable truth nobody talks about: Claude is reviewing Claude's code.
You vibe-code a multi-phase feature. You run /review. Claude reads its own output and says "looks good!" You ship. Two days later you're debugging a race condition that any second pair of eyes would have caught in 30 seconds.
You didn't get a code review. You got a mirror.
I hit this exact wall while building a multi-phase AI Second Brain project — an agentic system with memory modules, knowledge indexing, scheduled tasks, and a self-learning loop. Each phase produced hundreds of lines of generated code. I was using Claude Code for everything: architecting, implementing, reviewing. The confirmation bias was baked in.
I needed a reviewer with zero loyalty to the original author.
So I built cc-review: an open-source Claude Code skill that outsources your code review to any external LLM engine. Gemini reviews Claude's work. Ollama stays local and private. DeepSeek brings a different training distribution. The engine is pluggable. The bash is pure. The cost, with Gemini's free tier, is zero.
Here's exactly how it works and how you can add it to your own Claude Code setup in under 10 minutes.
## What We're Building
cc-review is a Claude Code skill — a bash-powered plugin that extends Claude Code with new slash commands. When you trigger a review, it:
- Grabs your recent `git diff` (staged + unstaged changes)
- Routes the diff to an external LLM engine of your choice
- Scores the code on four dimensions: Completeness, Correctness, Quality, Security
- Returns structured feedback with line-level comments
- Optionally runs an adversarial review mode that actively tries to break your assumptions
```
     Your Code Changes (git diff)
               │
               ▼
     ┌───────────────────┐
     │  cc-review skill  │ ← Pure bash, reads engines.yaml
     └─────────┬─────────┘
               │
  ┌────────────▼─────────────┐
  │    engines.yaml router   │
  └──┬─────────┬──────────┬──┘
     │         │          │
 ┌───▼───┐ ┌───▼───┐ ┌────▼──────┐
 │Gemini │ │Ollama │ │ DeepSeek  │ ← any LLM, zero code changes
 └───┬───┘ └───┬───┘ └────┬──────┘
     │         │          │
     └─────────┼──────────┘
               │
      ┌────────▼────────┐
      │  Scored Report  │ Completeness / Correctness
      │ (4 dimensions)  │ Quality / Security
      └─────────────────┘
```
Seven commands ship out of the box:
| Command | What It Does |
|---|---|
| `/review` | Standard review using your default engine |
| `/review-adversarial` | Skeptic mode — actively challenges your code |
| `/review-result` | Show full output of last review |
| `/review-status` | Check running/recent jobs |
| `/review-setup` | Verify engine auth and readiness |
| `/review-cancel` | Kill a running background job |
| `/review-rescue` | Delegate investigation/fix to external engine |
## Prerequisites
- Claude Code installed and running (>= 0.2.x)
- `git` available in your shell
- At least one of: a Gemini API key (free tier), Ollama running locally, or an OpenAI/DeepSeek key
- Basic comfort with YAML config files
## Step-by-Step
### 1. Install the Skill
Clone cc-review into your Claude Code skills directory:
```bash
git clone https://github.com/mudavathsrinivas/cc-review ~/.claude/skills/cc-review
```
Claude Code auto-discovers skills in ~/.claude/skills/. No import step. No config file edit. The skill system reads the directory on startup.
Verify it loaded:
```bash
# Inside Claude Code
/review-setup
```
You'll see output like:
```
cc-review v1.0.0
─────────────────────────────────
Engine Check:
  gemini    ✓  (GEMINI_API_KEY set)
  ollama    ✓  (http://localhost:11434 reachable)
  deepseek  ✗  (DEEPSEEK_API_KEY not set)
  openai    ✗  (OPENAI_API_KEY not set)

Default engine: gemini
Ready to review.
```
### 2. Configure Your Engines
This is the part I'm most proud of. Every engine lives in a single YAML file — engines.yaml. Adding a new LLM requires zero code changes. You just describe it:
```yaml
# ~/.claude/skills/cc-review/engines.yaml
default: gemini

engines:
  gemini:
    provider: google
    model: gemini-2.0-flash
    api_key_env: GEMINI_API_KEY
    endpoint: https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
    max_tokens: 8192
    temperature: 0.3
    enabled: true

  ollama:
    provider: ollama
    model: llama3.2
    endpoint: http://localhost:11434/api/generate
    max_tokens: 4096
    temperature: 0.2
    enabled: true
    private: true   # flag: never send to external APIs

  deepseek:
    provider: deepseek
    model: deepseek-coder
    api_key_env: DEEPSEEK_API_KEY
    endpoint: https://api.deepseek.com/v1/chat/completions
    max_tokens: 8192
    temperature: 0.2
    enabled: false  # flip to true when key is set

  openai:
    provider: openai
    model: gpt-4o
    api_key_env: OPENAI_API_KEY
    endpoint: https://api.openai.com/v1/chat/completions
    max_tokens: 8192
    temperature: 0.2
    enabled: false
```
The router reads this at runtime. Set `default` to swap your primary reviewer. Set `enabled: false` to disable an engine without deleting its config. The `private: true` flag on Ollama is a guardrail — the skill will refuse to send that diff to any external endpoint even if you fat-finger the engine flag.
Want to add a brand new LLM? Add a YAML block. Done.
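For instance, a Mistral entry might look like the block below. The provider name, model, and endpoint are my guesses based on Mistral's public chat-completions API; verify them against the skill's schema and Mistral's docs before enabling it:

```yaml
  mistral:
    provider: mistral
    model: mistral-large-latest
    api_key_env: MISTRAL_API_KEY
    endpoint: https://api.mistral.ai/v1/chat/completions
    max_tokens: 8192
    temperature: 0.2
    enabled: false  # flip to true once MISTRAL_API_KEY is exported
```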
### 3. Run Your First Review
Make some changes in your project, then:
```bash
/review
```
The skill captures `git diff HEAD` (staged + unstaged), structures a review prompt, sends it to your default engine, and streams back a scored report. A real output looks like this:
```
cc-review | engine: gemini-2.0-flash | 2026-04-11 09:14 CST
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

SCORES
  Completeness  8/10 — Core logic is present; error branches for
                       empty API response are missing.
  Correctness   7/10 — Line 84: off-by-one in pagination cursor.
                       Will silently drop the last record.
  Quality       9/10 — Clean separation of concerns. Good.
  Security      6/10 — API key interpolated directly into log
                       string at line 112. Rotate and fix.

CRITICAL (fix before merge)
  [security] src/client.ts:112
    → `console.log(\`Auth: \${apiKey}\`)` logs the raw key.
      Replace with a masked version: apiKey.slice(0,4) + '****'
  [correctness] src/paginator.ts:84
    → Cursor offset is `page * limit` but should be
      `(page - 1) * limit` for 1-indexed pagination.
      Current code skips page 1 entirely.

SUGGESTIONS
  [completeness] src/client.ts:67
    → No handling for HTTP 429 (rate limit). Add exponential
      backoff or surface the error to caller.
  [quality] src/types.ts:23
    → ApiResponse<T> type is wide. Consider discriminated union
      for success/error states.

SUMMARY
  Solid implementation with two ship-blockers. The security
  issue is trivial to fix. The pagination bug would have caused
  silent data loss in production. Review cost: $0.00 (free tier).
```
That pagination bug? 100% something Claude wrote and Claude would have rubber-stamped. Gemini caught it because it has no attachment to the original decision.
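The cursor math behind that finding, written out in shell arithmetic (a toy illustration, not the project's actual code):

```bash
# Toy illustration of the pagination finding; not the project's code.
# 1-indexed pages: page 1 must start at offset 0.
limit=25
page=1
buggy_offset=$(( page * limit ))         # 25: silently skips the first page
fixed_offset=$(( (page - 1) * limit ))   # 0: first page starts at record 0
echo "buggy=$buggy_offset fixed=$fixed_offset"   # prints: buggy=25 fixed=0
```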
### 4. Use Adversarial Mode for Critical Phases
Standard review finds bugs. Adversarial mode finds assumptions you didn't know you were making.
```bash
/review-adversarial
```
The prompt instructs the external engine to play the role of a skeptical senior engineer who actively looks for: race conditions, wrong abstractions, over-engineering, security footguns, and implicit dependencies that will break in production.
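A preamble along these lines is what changes the engine's behavior (paraphrased from the behavior described above; the skill's actual prompt text lives in the repo):

```bash
# Illustrative adversarial preamble; the skill's real prompt may differ.
# `read -d ''` returns nonzero at end of the heredoc, hence `|| true`.
read -r -d '' ADVERSARIAL_PREAMBLE <<'EOF' || true
You are a skeptical senior engineer. Do not summarize the code.
Actively hunt for: race conditions, wrong abstractions,
over-engineering, security footguns, and implicit dependencies
that will break in production. Challenge every stated assumption.
EOF
```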
Real example output from my Second Brain project's memory indexer:
```
ADVERSARIAL REVIEW | engine: gemini-2.0-flash
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

CHALLENGED ASSUMPTIONS

1. "Files are processed sequentially"
   → Your glob pattern returns files in filesystem order.
     On macOS this is usually alphabetical. On Linux ext4
     it's creation order. On network mounts, undefined.
     Your tests will pass locally and break in CI.

2. "MEMORY.md is always writable"
   → No lock file, no atomic write. Two agents running
     concurrently will corrupt this file. You mentioned
     scheduled tasks — this WILL happen.

3. "The embedding model is stable"
   → You hardcode 'text-embedding-3-small' but never pin
     the version. OpenAI has silently updated embeddings
     before. Your similarity scores will drift over time
     and you won't know why.

VERDICT
  Ship with fixes for #1 and #2. #3 is acceptable risk
  for a solo project but document the assumption explicitly.
```
None of those were in the standard review output. Adversarial mode thinks differently.
### 5. Keep Private Code Private with Ollama
If you're working on proprietary code and can't send diffs to Google or OpenAI, flip to Ollama:
```bash
/review --engine ollama
```
Or set default: ollama in engines.yaml for all reviews. Everything stays on your machine. The private: true config flag means the router will hard-fail rather than accidentally route to an external API.
```bash
# Verify your Ollama setup first
/review-setup --engine ollama

# Output:
# ollama: ✓ (llama3.2 loaded, 8B params, ~6GB RAM used)
# private mode: ON — external routing disabled for this engine
```
Quality is lower than frontier models, but for security-sensitive codebases or pure sanity checks, local Ollama at zero cost is a real option.
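The guardrail can be as simple as checking the engine's block for the flag before any network call. A minimal sketch, assuming the engines.yaml layout shown earlier, with one `private: true` line per engine block (the function name is mine, and the real router may parse the YAML more carefully):

```bash
# Sketch of the private-engine guardrail; not the skill's actual code.
# Assumes engine blocks are indented two spaces under `engines:`.
is_private() {
  local engine="$1" config="$2"
  awk -v e="  ${engine}:" '
    $0 == e                     {inblock=1; next}  # enter the engine block
    inblock && /^  [a-z]/       {inblock=0}        # next engine key ends it
    inblock && /private: *true/ {found=1}
    END {exit found ? 0 : 1}
  ' "$config"
}

config="$HOME/.claude/skills/cc-review/engines.yaml"
if [ -f "$config" ] && is_private ollama "$config"; then
  echo "private engine: refusing any external endpoint" >&2
fi
```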
## The Result
After integrating cc-review into my Second Brain project workflow, here's what changed:
- Gemini free tier handles ~1,000 reviews/day at $0.00. For a solo developer shipping phases one at a time, this is effectively unlimited.
- 5 real bugs caught in 3 weeks that I confirmed would have reached production — 2 correctness issues, 2 security issues, 1 missing error handler.
- Review latency: 8-14 seconds per phase diff using Gemini Flash. Fast enough to run after every significant change without breaking flow.
- Adversarial mode changed how I think about code I generate. I now proactively consider race conditions and assumption brittleness because I've seen the engine surface them repeatedly.
The workflow became:
Implement phase with Claude → /review → fix blockers → /review-adversarial → fix assumptions → commit
## Key Takeaway
An LLM cannot objectively review its own output. Not because the model is bad — because the training distribution, the context window, and the confirmation bias are all pointing the same direction. Independent review means a genuinely different model, with different training data, reading your code cold.
cc-review is a 20-minute setup that gives you that independence, at zero cost, with full control over which engine reviews which code and whether anything ever leaves your machine.
The irony: the better your AI coding assistant gets, the more you need this. The faster Claude ships code, the faster bugs accumulate without a second opinion.
Star the repo if this is useful: github.com/mudavathsrinivas/cc-review
Pull requests welcome — especially new engine configs for engines.yaml. If you've got a working block for Mistral, Cohere, or any local model, open a PR and I'll merge it.
Follow me here on Dev.to — I'm documenting the full AI Second Brain build in public, including the infrastructure, the failures, and the moments where the AI confidently wrote something completely wrong.