Previously: The Complete Dev Cycle
In Part 4 of this series, my AI assistant achieved something remarkable. Running inside a secure Docker container, it could now execute the entire development cycle:
Code → Test → Build → Deploy → Commit
I called it the finale. The trilogy was complete. The AI could write code, run tests, build artifacts, deploy to containers, and commit changes — all while keeping secrets safely hidden.
I was wrong. Something was missing.
The Missing Piece
Look at that cycle again. Now think about how a real development team works.
Code → Test → Build → Deploy → Commit → PR → ...
Where's the review?
In any professional team, code doesn't just flow from writing to deployment. Someone reads it. Someone checks for bugs, security issues, architectural problems. Someone asks "did you consider this edge case?"
My AI could do everything — except check its own work.
The Official Plugin
Claude Code has an official /code-review plugin. When I discovered it, I was impressed by its design:
- Parallel agents: Multiple AI agents analyze code simultaneously from different angles — bug scanning, CLAUDE.md compliance checking
- Confidence scoring: Each finding gets a score, filtering out noise
- Verification step: A separate agent re-checks findings to eliminate false positives
This is serious engineering. Not "ask AI to review code" but a structured, multi-stage pipeline designed to produce high-signal results.
I installed it immediately.
And it didn't work.
Why It Couldn't Reach
The official plugin is designed for a standard GitHub workflow. It expects:
- `gh` CLI — to fetch PR details from GitHub
- A GitHub PR — the review target is a pull request
- A single repository — it operates within one project
My AI Sandbox environment has none of that:
- No `gh` CLI (the container has no GitHub authentication)
- No PR yet (I want review before pushing, not after)
- Multiple independent repositories in one workspace (API, Web, iOS — each with its own Git history)
The plugin couldn't reach my code. Not because it was poorly designed — it's excellent at what it does. But it was built for a different moment in the development cycle: after you push. I needed something before.
Learning From the Design
I couldn't use the plugin directly, but I could learn from it.
The plugins documentation showed me that Claude Code's custom commands are just Markdown files — structured instructions that become slash commands. The official /code-review demonstrated what a well-designed review pipeline looks like: parallel analysis, scoring, verification.
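A custom command is, at its core, just a Markdown file in the project's `.claude/commands/` directory. As a rough sketch of what such a file can look like (the filename, the frontmatter field, and the instruction text here are my own illustration of the convention, not the official plugin's actual contents):

```markdown
---
description: Local multi-project code review (no GitHub access required)
---

Ask the user which project in the workspace to review, then confirm the
target branch, or fall back to selecting files if the project has no Git
history.

Launch parallel review agents: one for CLAUDE.md compliance, one for bug
scanning, one for history analysis, one for comment accuracy.

Score every finding from 0-100, re-verify anything scoring 75+, and report
only confirmed, high-confidence issues.
```

Saved as `.claude/commands/ais-local-review.md` inside a project, a file like this would surface as an `/ais-local-review` slash command.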
So I did what my AI Sandbox was built for. I asked the AI:
> Analyze the code-review plugin and create a custom command that works locally. Allow selecting which project to review. Confirm the target branch with the user. Run the same kind of review, but without GitHub access.
The AI read the official plugin, understood its structure, and produced a local version. No `gh` dependency. Multi-project support. Git and non-Git modes.
It worked.
From One to Nine
Once the local review command was running, the next thought was obvious.
If I can have a general code reviewer, why not a security reviewer? A performance reviewer? An architecture reviewer?
Each review type needs different expertise. A security review looks for injection vulnerabilities, authentication gaps, and data exposure. A performance review looks for N+1 queries, unnecessary allocations, and missing caching. A general review catches bugs and checks CLAUDE.md compliance.
One command became nine:
| Command | Purpose |
|---|---|
| `ais-local-review` | General code review (bugs, CLAUDE.md) |
| `ais-local-security-review` | Security vulnerabilities |
| `ais-local-performance-review` | Performance bottlenecks |
| `ais-local-architecture-review` | Structural concerns |
| `ais-local-test-review` | Test quality assessment |
| `ais-local-doc-review` | Documentation accuracy |
| `ais-local-prompt-review` | AI prompt/command quality |
| `ais-refactor` | Concrete refactoring suggestions |
| `ais-test-gen` | Automated test generation |
All nine share the same pipeline architecture inspired by the official plugin:
Parallel Analysis (4-5 Sonnet agents) → Scoring (Haiku) → Verification (Sonnet) → Report
Each specialized command sends parallel agents with different review perspectives. A scoring agent evaluates confidence. A verification agent eliminates false positives. Only high-confidence, verified findings make it to the final report.
The Pipeline in Action
Here's what happens when you run `/ais-local-review`:
Step 1: Select a project and branch (or files, if no Git)
Step 2: Four Sonnet agents launch in parallel:
- Agent #1: CLAUDE.md compliance — does the code follow project conventions?
- Agent #2: Bug scan — obvious logic errors, edge cases
- Agent #3: History analysis — are we reintroducing a previously fixed bug?
- Agent #4: Comment check — does the code match its own documentation?
Step 3: A Haiku agent scores every finding (0-100)
Step 4: A Sonnet verification agent re-checks anything scoring 75+
Step 5: Only confirmed, high-confidence issues appear in the report
The result is a focused report. Not a wall of nitpicks — a short list of things that actually matter.
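The score-then-verify gate in Steps 3-5 can be sketched in a few lines of Python. This is a toy model, not the actual implementation: the 75-point threshold comes from the pipeline above, while the function names and the simulated verifier are illustrative.

```python
# Toy model of the scoring/verification gate (Steps 3-5 above).
# The 75-point threshold is from the pipeline; agents are simulated
# as plain functions rather than real Haiku/Sonnet calls.

VERIFY_THRESHOLD = 75  # findings scoring below this are dropped as noise

def filter_findings(findings, verify):
    """Keep only findings that score 75+ AND survive re-verification.

    findings: list of dicts with a "score" key (0-100, from the scoring agent)
    verify:   callable standing in for the verification agent; returns True
              when a finding is confirmed (i.e. not a false positive)
    """
    report = []
    for finding in findings:
        if finding["score"] < VERIFY_THRESHOLD:
            continue               # low confidence: filtered out before verification
        if verify(finding):
            report.append(finding) # confirmed, high-confidence issue
    return report

# Three raw findings: one real bug, one low-confidence, one false positive
raw = [
    {"id": 1, "score": 90, "real": True},   # kept: high score, verified
    {"id": 2, "score": 60, "real": True},   # dropped: scored below 75
    {"id": 3, "score": 80, "real": False},  # dropped: verification rejects it
]
report = filter_findings(raw, verify=lambda f: f["real"])
print([f["id"] for f in report])  # -> [1]
```

Only finding #1 survives both gates, which is exactly why the final report stays short.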
Two Reviews, Two Moments
Here's what's interesting: the official plugin and my local commands aren't competing. They serve different moments in the development cycle.
Code → **Review** → Test → Build → Deploy → Commit → PR → **Review**

| `ais-*` commands | Official `/code-review` |
|---|---|
| Before you push | After you push |
| Quality gate | Team review |
| Local, private | GitHub, collaborative |
The official /code-review is for when your code is ready for team eyes. It posts comments on PRs, suggests changes, integrates with GitHub's collaboration features.
My ais-* commands are for before that moment. While you're still developing. Before you've committed, sometimes before you've even finished writing tests. A private quality gate that catches issues early, when they're cheapest to fix.
The Completed Cycle
Remember the development cycle from Part 4?
Code → Test → Build → Deploy → Commit
Here's what it looks like now:
Code → **Review** → Test → Build → Deploy → Commit

*Review: the missing piece.*
The AI can write code, review its own work (from multiple perspectives), run tests, build, deploy, and commit. The quality gate that was missing is now in place.
What I Learned
This project started because the official plugin couldn't reach my code. But that limitation led somewhere unexpected.
The official plugin's design — parallel agents, confidence scoring, false positive elimination — was the blueprint. Open source at its best: you read how something works, understand the principles, and adapt them to your environment.
I didn't just get a code reviewer. I got nine specialized review tools, a refactoring assistant, and an automated test generator. All because the official plugin showed me what a well-designed review pipeline looks like, and my AI Sandbox gave me a place to build one that works locally.
The Series So Far
What started as "my AI can see my API keys" has become something larger:
- Secrets: Hide sensitive files from AI using Docker volume mounts
- Toolbox: AI discovers and uses tools autonomously via SandboxMCP
- Host Access: AI breaks out of its container with controlled host OS access
- Review (this article): AI reviews its own code, completing the dev cycle
The trilogy became a tetralogy. I'll stop promising it's complete.
The AI Sandbox with DockMCP is open source: GitHub repository
If you've built custom review commands for your AI workflow, I'd love to hear about them in the comments.