DEV Community

Mariano Gobea Alcoba

Posted on • Originally published at mgatc.com

Show HN: adamsreview – better multi-agent PR reviews for Claude Code!

Advanced Multi-Agent System for Enhanced Code Review with Claude Code

The proliferation of AI-assisted code review tools has introduced novel paradigms for identifying defects and improving code quality. While existing solutions like Claude Code's built-in /review and /ultrareview commands, alongside third-party offerings such as CodeRabbit and Greptile, provide valuable automation, they often operate under a single-pass, monolithic review model. This approach can limit their ability to perform in-depth analysis, manage complex dependencies, and effectively integrate human feedback. This article details the design and implementation of adamsreview, a Claude Code plugin engineered to address these limitations by leveraging a multi-agent, multi-stage review process.

adamsreview is conceived as a system of interconnected sub-agents, orchestrated to perform distinct analytical tasks. This architecture allows for a more granular and robust review process, moving beyond the capabilities of simpler, single-pass AI reviews. The core philosophy is to decompose the review into manageable stages, each handled by specialized agents, with explicit state management and mechanisms for human intervention and iterative refinement.

System Architecture and Core Components

The adamsreview plugin comprises six distinct Claude Code slash commands, each representing a stage or utility within the review workflow:

  1. /review: Initiates a comprehensive, multi-stage review process.
  2. /codex-review: Integrates with Codex CLI for an ensemble review approach, augmenting Claude's analysis.
  3. /add: Allows for the explicit inclusion of specific files or directories in the review scope.
  4. /promote: Facilitates the promotion of specific findings to higher stages of review or action.
  5. /walkthrough: Uses Claude's AskUserQuestion feature to iteratively present uncertain findings or items requiring human judgment.
  6. /fix: Orchestrates the resolution of identified issues, including group-based agent dispatch and regression testing.

A key architectural tenet is the management of review state. Unlike ephemeral review processes, adamsreview utilizes persistent JSON artifacts stored on disk. This state management is crucial for enabling multi-stage reviews where context can be cleared between stages without losing critical information. Scripts are included to manage the lifecycle of this state, ensuring data integrity and facilitating subsequent review iterations.

Multi-Stage Review Process

The primary /review command is the entry point to the multi-stage process. It initiates a series of parallel sub-agent analyses, followed by a sequential validation pass.

Parallel Sub-Agent Analysis

Upon invocation, /review triggers an array of specialized Claude Code agents to operate in parallel. These agents are tasked with specific aspects of code analysis:

  • Security Agent: Scans for common security vulnerabilities (e.g., SQL injection, XSS, improper authentication).
  • Performance Agent: Identifies potential performance bottlenecks (e.g., inefficient loops, redundant computations, suboptimal data structures).
  • Maintainability Agent: Assesses code readability, complexity, and adherence to design principles (e.g., SOLID, DRY).
  • Bug Detection Agent: Focuses on identifying logical errors, off-by-one errors, null pointer dereferences, and other common programming mistakes.
  • Style Agent: Enforces coding style guidelines and best practices.

Each of these agents operates independently, processing the provided code context. The results are aggregated, and a preliminary report is generated.
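The fan-out-and-aggregate shape of this stage can be sketched in Python. The agent functions below are toy stand-ins (in the real plugin each is a Claude Code sub-agent invocation, not a local function); only the parallel-dispatch and aggregation structure is the point.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the specialized sub-agents.
def security_agent(code):
    if "execute(" in code and "+" in code:
        return [{"agent": "security_agent", "severity": "High",
                 "message": "Possible SQL injection via string concatenation."}]
    return []

def performance_agent(code):
    if "for " in code:
        return [{"agent": "performance_agent", "severity": "Medium",
                 "message": "Loop detected; consider a vectorized operation."}]
    return []

AGENTS = [security_agent, performance_agent]

def run_parallel_review(code):
    """Dispatch every agent concurrently, then flatten their findings."""
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        results = pool.map(lambda agent: agent(code), AGENTS)
    return [finding for agent_findings in results for finding in agent_findings]

findings = run_parallel_review('cursor.execute("SELECT * FROM users WHERE id=" + uid)')
```

Because the agents share no state during this phase, they can run fully in parallel; aggregation happens only once all of them return.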

Sequential Validation Pass

Following the parallel analysis, a sequential validation pass is performed. This stage involves a more holistic evaluation of the aggregated findings. A dedicated "Validator Agent" reviews the output from the parallel sub-agents, looking for:

  • False Positives: Cross-referencing findings to identify redundant or incorrect reports.
  • Interdependencies: Analyzing how findings in one area might impact another.
  • Severity Prioritization: Assigning severity levels (e.g., Critical, High, Medium, Low) to identified issues based on potential impact.

This validation pass aims to refine the raw output from the sub-agents, producing a more coherent and actionable review report.
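A minimal sketch of what the Validator Agent's deduplication and prioritization might look like, assuming findings carry `file`, `line`, and `severity` fields as in the JSON state shown later (the merging heuristic here is illustrative, not the plugin's actual logic):

```python
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def validate(findings):
    """Collapse duplicate reports for the same location, then order by severity."""
    seen = {}
    for f in findings:
        key = (f["file"], f["line"])
        # When two agents flag the same location, keep the most severe report.
        if key not in seen or SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[seen[key]["severity"]]:
            seen[key] = f
    return sorted(seen.values(), key=lambda f: SEVERITY_RANK[f["severity"]])

raw = [
    {"file": "a.py", "line": 10, "severity": "Medium", "message": "duplicate report"},
    {"file": "a.py", "line": 10, "severity": "High", "message": "same spot, worse"},
    {"file": "b.py", "line": 3, "severity": "Low", "message": "style nit"},
]
validated = validate(raw)
```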

State Management and Context Persistence

The persistence of review state through JSON artifacts is a distinguishing feature of adamsreview. This mechanism allows for:

  1. Intermediate State Saving: After each significant stage of the review, the state is serialized to a JSON file. This file typically includes the code diff, the aggregated findings from previous stages, and any user-provided annotations.
  2. Contextual Clarity Between Stages: When a user invokes a subsequent command (e.g., /walkthrough after /review), the system loads the relevant JSON state. This ensures that the AI has access to the historical findings and the current state of the review, even if the intermediate Claude Code session context has been cleared.
  3. Selective Review Scope: The /add command allows users to augment the review scope with specific files or directories. This information is appended to the persistent state, ensuring that future review stages consider the expanded scope.
  4. State Management Scripts: Utility scripts are provided to manage the creation, updating, and clearing of these JSON state files, offering a programmatic interface for controlling the review lifecycle.

The JSON state might adopt a structure similar to this:

```json
{
  "commit_hash": "a1b2c3d4e5f67890",
  "base_branch": "main",
  "review_files": [
    "src/utils.py",
    "src/models.py"
  ],
  "findings": [
    {
      "stage": "initial_analysis",
      "agent": "security_agent",
      "file": "src/models.py",
      "line": 42,
      "message": "Potential SQL injection vulnerability in user_query function.",
      "severity": "High",
      "details": "The user input is directly concatenated into the SQL query string without sanitization."
    },
    {
      "stage": "initial_analysis",
      "agent": "performance_agent",
      "file": "src/utils.py",
      "line": 105,
      "message": "Inefficient loop detected in data_processing function.",
      "severity": "Medium",
      "details": "Consider using a vectorized operation instead of iterating through each element."
    }
  ],
  "user_annotations": [],
  "review_status": "in_progress"
}
```
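The lifecycle scripts around this state could look roughly like the following sketch. The `.adamsreview/state.json` path and the helper names are assumptions for illustration, not the plugin's actual file layout:

```python
import json
from pathlib import Path

STATE_PATH = Path(".adamsreview/state.json")  # hypothetical location

def load_state():
    """Load the persisted review state, or start a fresh one."""
    if STATE_PATH.exists():
        return json.loads(STATE_PATH.read_text())
    return {"review_files": [], "findings": [],
            "user_annotations": [], "review_status": "in_progress"}

def save_state(state):
    """Serialize the state back to disk after each stage."""
    STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
    STATE_PATH.write_text(json.dumps(state, indent=2))

def add_files(state, paths):
    """What an /add-style command might do: widen the review scope."""
    for p in paths:
        if p not in state["review_files"]:
            state["review_files"].append(p)
    return state
```

Because the state lives on disk rather than in the conversation, a later stage can reload exactly what an earlier stage produced even after the session context is cleared.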

Human-AI Collaboration and Iterative Refinement

adamsreview places a strong emphasis on facilitating human-AI collaboration, particularly in handling uncertainty and driving towards resolution.

/walkthrough Command

The /walkthrough command is designed to address findings that are potentially ambiguous or require domain-specific knowledge that the AI might not fully possess. It leverages Claude's AskUserQuestion feature to interactively engage the user:

  1. Presentation of Findings: The command iterates through the aggregated findings from the persistent state.
  2. Interactive Querying: For each finding deemed to require human judgment (e.g., based on confidence scores or pre-defined heuristics), adamsreview uses AskUserQuestion to present the finding to the user.
  3. User Feedback Loop: The user can then provide feedback, ask clarifying questions, or instruct the AI on how to proceed. This interaction is recorded and incorporated back into the persistent state.
  4. Iterative Refinement: This process can be repeated, allowing users to progressively refine the review results and guide the AI's understanding.

This interactive approach transforms the review from a black-box process into a dynamic dialogue.
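The gating-and-feedback loop can be sketched generically. Here a plain callable stands in for Claude's AskUserQuestion feature, and the `confidence` field and 0.7 threshold are assumed heuristics, not documented plugin behavior:

```python
def needs_human_judgment(finding, threshold=0.7):
    """Heuristic gate: route low-confidence findings to the user."""
    return finding.get("confidence", 1.0) < threshold

def walkthrough(findings, ask):
    """Iterate uncertain findings and record each verdict onto the finding.

    `ask` stands in for AskUserQuestion: any callable that takes a
    prompt string and returns the user's answer.
    """
    for f in findings:
        if needs_human_judgment(f):
            prompt = f"{f['file']}:{f['line']} — {f['message']} Is this valid? (yes/no)"
            f["user_verdict"] = ask(prompt)
    return findings

findings = [
    {"file": "a.py", "line": 1, "message": "Possible race condition.", "confidence": 0.4},
    {"file": "b.py", "line": 2, "message": "Unused import.", "confidence": 0.9},
]
reviewed = walkthrough(findings, ask=lambda prompt: "yes")
```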

/promote Command

The /promote command allows users to explicitly elevate the importance of certain findings. This can be useful for:

  • Marking Critical Issues: Users can mark specific findings as "critical" or "must-fix" regardless of the AI's initial severity assessment.
  • Contextualizing Findings: Users can add additional context or justifications to findings, which can then be used by subsequent agents or for reporting.

The promoted findings are updated in the persistent JSON state, influencing subsequent review or fix stages.
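In terms of the JSON state, a promotion is a small metadata update. A sketch, assuming findings carry an `id` field (the field names here are illustrative):

```python
def promote(state, finding_id, priority, comment=None):
    """Elevate one finding's severity and attach the user's justification."""
    for f in state["findings"]:
        if f.get("id") == finding_id:
            f["severity"] = priority
            f["promoted"] = True
            if comment:
                state["user_annotations"].append(
                    {"finding_id": finding_id, "comment": comment})
            return state
    raise KeyError(f"no finding with id {finding_id!r}")

state = {
    "findings": [{"id": "finding_id_123", "severity": "Medium"}],
    "user_annotations": [],
}
state = promote(state, "finding_id_123", "Critical", "This is a major security flaw.")
```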

Ensemble Review with Codex CLI

The /codex-review command introduces an ensemble approach by integrating with the Codex CLI. This offers an alternative or complementary review perspective:

  1. Code Export: The relevant code diff or subset of files is exported in a format compatible with Codex CLI.
  2. Codex CLI Execution: The Codex CLI is invoked with specific prompts designed to elicit code review feedback.
  3. Result Aggregation: The output from Codex CLI is parsed and merged with the findings from Claude's native review.
  4. Cross-Validation: This ensemble approach enables cross-validation of findings. If both Claude and Codex identify a similar issue, the confidence in that finding increases. Discrepancies can highlight areas where one model might be stronger than the other or where an issue is particularly subtle.

This strategy aims to leverage the strengths of different AI models, potentially reducing the false positive rate and increasing the detection of more nuanced bugs.
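The cross-validation step in the merge can be sketched as follows; keying findings by `(file, line)` and the two-level confidence labels are simplifying assumptions:

```python
def merge_ensemble(claude_findings, codex_findings):
    """Cross-validate two reviewers: agreement on a location raises confidence."""
    merged = {}
    for source, findings in (("claude", claude_findings), ("codex", codex_findings)):
        for f in findings:
            key = (f["file"], f["line"])
            if key in merged:
                merged[key]["sources"].append(source)
                merged[key]["confidence"] = "high"  # both models agree
            else:
                merged[key] = {**f, "sources": [source], "confidence": "low"}
    return list(merged.values())

claude = [
    {"file": "a.py", "line": 1, "message": "SQL injection risk."},
    {"file": "b.py", "line": 2, "message": "Inefficient loop."},
]
codex = [{"file": "a.py", "line": 1, "message": "Unsanitized query input."}]
merged = merge_ensemble(claude, codex)
```

Findings flagged by only one model keep a lower confidence, which is exactly the signal the walkthrough stage can use when deciding what to surface to the user.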

Automated Fixing and Regression Prevention

The /fix command is designed to automate the remediation of identified issues, incorporating a robust process for preventing regressions.

Per-Fix-Group Agent Dispatch

Issues are often related. For instance, a security vulnerability might necessitate changes across multiple files, or a refactoring effort might span several related functions. The /fix command groups related findings together. For each identified "fix group":

  1. Specialized Fix Agent: A dedicated "Fix Agent" is dispatched. This agent is tasked with understanding the scope of the fix group and proposing code modifications.
  2. Iterative Fixing: The agent may iterate on its proposed fixes, attempting to resolve all issues within the group.
  3. Commit Planning: Proposed changes are staged for review.
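A minimal sketch of the grouping step. Clustering by the file a finding touches is a deliberately simple stand-in; the real plugin may use richer dependency analysis to decide what belongs in one fix group:

```python
from collections import defaultdict

def group_fixes(findings):
    """Cluster related findings so one Fix Agent handles each group."""
    groups = defaultdict(list)
    for f in findings:
        groups[f["file"]].append(f)
    return list(groups.values())

findings = [
    {"file": "a.py", "line": 10, "message": "Null dereference."},
    {"file": "a.py", "line": 42, "message": "Related guard missing."},
    {"file": "b.py", "line": 3, "message": "Off-by-one in loop bound."},
]
fix_groups = group_fixes(findings)
```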

Re-Review and Regression Testing

After the Fix Agent has proposed modifications, adamsreview performs a crucial re-review and regression check:

  1. Post-Fix Review: The modified code is immediately subjected to a subset of the original review agents (particularly the bug detection and security agents). This "post-fix review" aims to identify any new issues introduced by the attempted fixes (regressions).
  2. Unit Test Execution (Optional but Recommended): If a testing framework is integrated with the development environment, adamsreview can trigger unit tests. This provides a more direct measure of functional correctness.
  3. Survivor Commit: Only changes that pass the post-fix review and all executed tests are committed. Fixes that introduce regressions or new issues are reverted.
  4. Iterative Fix Attempt: If fixes are reverted, the findings associated with those fixes are returned to the persistent state, potentially with updated information from the regression analysis, allowing for further attempts at remediation.

This disciplined approach ensures that automated fixes are safe and do not compromise existing code quality.
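The survivor-gating logic above can be sketched as a simple pipeline. The three callables are placeholders for the real stages (Fix Agent, post-fix review agents, test runner), wired here as fakes so the control flow is visible:

```python
def apply_safe_fixes(fix_groups, apply_fix, re_review, run_tests):
    """Commit only fixes that survive the post-fix review and the test suite."""
    survivors, reverted = [], []
    for group in fix_groups:
        patch = apply_fix(group)
        # A fix survives only if re-review reports no new issues AND tests pass.
        if not re_review(patch) and run_tests(patch):
            survivors.append(patch)
        else:
            reverted.append(group)  # returned to the persistent state for retry
    return survivors, reverted

# Toy stand-ins: one group fixes cleanly, the other introduces a regression.
groups = [[{"file": "a.py", "clean": True}], [{"file": "b.py", "clean": False}]]
apply_fix = lambda g: {"files": [f["file"] for f in g], "clean": g[0]["clean"]}
re_review = lambda patch: [] if patch["clean"] else [{"message": "regression"}]
run_tests = lambda patch: patch["clean"]

survivors, reverted = apply_safe_fixes(groups, apply_fix, re_review, run_tests)
```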

Comparison with Existing Tools

adamsreview distinguishes itself from existing solutions in several key aspects:

  • /review vs. /ultrareview: While /ultrareview in Claude Code offers enhanced capabilities, it draws from the "Extra Usage" pool, incurring direct costs. adamsreview operates on a standard Claude Code subscription (the Max plan is recommended for its higher usage limits), providing a deeper review at no extra per-use cost.
  • Depth of Analysis: By employing a multi-stage, multi-agent approach with parallel sub-analyses and explicit validation, adamsreview aims for a more comprehensive detection rate of bugs and vulnerabilities compared to single-pass tools.
  • State Persistence: The explicit JSON state management enables multi-stage reviews and context continuity, which is not a standard feature in many AI review tools that often operate within a single conversational turn or ephemeral session.
  • Human-AI Collaboration: The /walkthrough command, using AskUserQuestion, provides a structured way for humans to guide and validate AI findings, fostering a more collaborative development process.
  • Ensemble Capabilities: The /codex-review command's integration with Codex CLI offers an ensemble review perspective, potentially improving accuracy and reducing false positives.
  • Automated Fix and Regression Prevention: The /fix command's structured approach to fixing issues, including post-fix re-reviews and regression checks, provides a more robust automated remediation process than simple patch generation.

Implementation Details and Usage

The adamsreview plugin is installed using Claude Code's plugin marketplace:

```
/plugin marketplace add adamjgmiller/adamsreview
/plugin install adamsreview@adamsreview
```

Example Workflow:

  1. Initiate Review:

    /review
    

    This triggers the multi-stage analysis. Findings are stored in a JSON artifact.

  2. Add Specific Files (Optional): If the initial review missed certain critical files, or if the user wants to ensure specific files are considered in subsequent stages:

    /add src/config/settings.py tests/unit/test_api.py
    

    The state is updated to include these files.

  3. Interactive Walkthrough: For findings that require user input:

    /walkthrough
    

    Claude Code prompts the user with questions about specific findings. User responses update the state.

  4. Promote a Finding: If a user identifies a finding as particularly critical:

    /promote finding_id_123 --priority critical --comment "This is a major security flaw."
    

    The finding's metadata is updated in the state.

  5. Ensemble Review (Optional): To augment Claude's analysis with Codex:

    /codex-review
    

    Codex CLI is invoked, and its findings are merged into the state.

  6. Automated Fix Attempt: To fix identified issues:

    /fix
    

    Agents attempt to fix issues, followed by a re-review and regression check. Commits are made only for safe fixes.

  7. Clearing State: To start a fresh review, the JSON state file needs to be removed or managed by the utility scripts.

The recommended plan for using adamsreview effectively is Claude Code's Max plan, which offers higher usage limits. That headroom matters when processing extensive codebases and detailed diffs, which are common in complex PRs, and it maximizes the effectiveness of the multi-agent system.

Future Enhancements and Considerations

  • Customizable Agent Configurations: Allowing users to enable/disable specific sub-agents or tune their parameters.
  • Integration with CI/CD Pipelines: Enabling adamsreview to be triggered automatically as part of a CI/CD workflow.
  • Advanced Regression Detection: Incorporating more sophisticated static analysis tools or fuzzing techniques for regression detection.
  • Learning from User Feedback: Developing mechanisms for the AI to learn from user annotations and correction patterns over time.
  • Broader LLM Integration: Extending the ensemble review to include other large language models.

Conclusion

adamsreview presents a robust and extensible framework for AI-assisted code review, designed to overcome the limitations of simpler, monolithic approaches. By employing a multi-stage, multi-agent architecture with sophisticated state management, human-AI collaboration features, and automated regression prevention, it aims to deliver significantly more accurate and actionable insights than existing tools. The system's modular design allows for continuous improvement and adaptation, paving the way for more intelligent and collaborative code review processes.

For organizations seeking to enhance their code quality and streamline their development workflows through advanced AI-driven code review solutions, consulting services can be invaluable. Visit https://www.mgatc.com to explore how expert guidance can help implement and optimize such sophisticated systems within your development lifecycle.


Originally published in Spanish at www.mgatc.com/blog/adamsreview-better-multi-agent-pr-reviews-for-claude-code/
