Luke Fryer
Posted on • Originally published at aipromptarchitect.co.uk

Building AI-Powered Code Review Workflows with Custom Prompts

        <h2>Why AI Code Review Matters</h2>
        <p>Manual code review is a bottleneck. Senior developers spend <strong>20-30% of their time</strong> reviewing pull requests, yet studies show human reviewers miss approximately 50% of bugs in code under review. AI-assisted code review doesn't replace humans — it augments them by catching mechanical issues so reviewers can focus on architecture and design decisions.</p>
        <p>The key insight: the quality of AI code review is <strong>entirely determined by the prompt</strong>. A generic "review this code" instruction produces generic, surface-level feedback. A well-engineered prompt produces specific, actionable, priority-ranked issues that match your team's standards.</p>

        <h2>The Three-Layer Review Architecture</h2>
        <p>Production AI code review should operate in three distinct layers, each with a specialised prompt:</p>
        <ol>
            <li><strong>Security Layer</strong> — Scans for vulnerabilities: injection attacks, auth bypasses, data exposure, insecure dependencies</li>
            <li><strong>Quality Layer</strong> — Evaluates code quality: logic errors, edge cases, error handling, type safety, test coverage</li>
            <li><strong>Style Layer</strong> — Enforces consistency: naming conventions, documentation, architectural patterns, team standards</li>
        </ol>
        <p>Running these as separate prompts is more effective than a single "review everything" prompt because each layer has different evaluation criteria and severity scales.</p>
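<p>As a sketch, the three layers can run as independent passes over the same code. The <code>reviewWithPrompt</code> helper below is a hypothetical stand-in for whatever model API you actually call:</p>

```typescript
// Hypothetical orchestrator for the three review layers. `reviewWithPrompt`
// is a placeholder for the real model call with the layer-specific prompt.
type Layer = "security" | "quality" | "style";

interface Finding {
  layer: Layer;
  severity: string;
  message: string;
}

async function reviewWithPrompt(layer: Layer, code: string): Promise<Finding[]> {
  // Placeholder: send `code` with the layer-specific prompt to your model here.
  return [];
}

async function threeLayerReview(code: string): Promise<Finding[]> {
  const layers: Layer[] = ["security", "quality", "style"];
  // Each layer runs with its own prompt and its own severity scale, in parallel.
  const results = await Promise.all(layers.map((layer) => reviewWithPrompt(layer, code)));
  return results.flat();
}
```

<p>Keeping the layers as separate calls also means a failure or timeout in one layer doesn't block the others.</p>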

        <h2>Security Review Prompt</h2>
        <pre><code>System: You are a senior application security engineer performing a security-focused code review.

## Context
- Language: {language}
- Framework: {framework}
- This code handles: {description}

## Security Checklist
Evaluate the code against these categories:
1. INJECTION: SQL injection, XSS, command injection, LDAP injection, template injection
2. AUTHENTICATION: Broken auth flows, session management, credential handling
3. AUTHORISATION: Missing access controls, IDOR, privilege escalation
4. DATA EXPOSURE: Sensitive data in logs, hardcoded secrets, PII leakage
5. CRYPTOGRAPHY: Weak algorithms, improper key management, predictable tokens
6. INPUT VALIDATION: Missing sanitisation, type coercion, boundary checks
7. DEPENDENCIES: Known CVEs, outdated packages, supply chain risks

## Output Format
For each finding:
- SEVERITY: CRITICAL | HIGH | MEDIUM | LOW
- CWE: The relevant CWE identifier
- LOCATION: File and line number
- DESCRIPTION: What the vulnerability is
- EXPLOIT: How an attacker could exploit it
- FIX: The specific code change needed

If no security issues are found, state "No security issues identified" and explain what security measures are correctly implemented.
</code></pre>
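<p>The <code>{language}</code>, <code>{framework}</code>, and <code>{description}</code> placeholders can be substituted programmatically before the prompt is sent. A minimal sketch:</p>

```typescript
// Minimal sketch: fill the prompt template's {placeholders} from a map.
// Unknown placeholders are left untouched so gaps stay visible.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) => vars[key] ?? match);
}

const securityContext = fillTemplate(
  "Language: {language}\nFramework: {framework}\nThis code handles: {description}",
  { language: "TypeScript", framework: "Express", description: "user login" }
);
```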

        <h2>Quality Review Prompt</h2>
        <pre><code>System: You are a principal software engineer reviewing code for production readiness.

## Review Criteria
1. CORRECTNESS: Logic errors, off-by-one errors, race conditions, null handling
2. EDGE CASES: Empty inputs, boundary values, concurrent access, network failures
3. ERROR HANDLING: Uncaught exceptions, error propagation, user-facing error messages
4. PERFORMANCE: N+1 queries, unnecessary re-renders, memory leaks, algorithmic complexity
5. TESTABILITY: Tight coupling, hidden dependencies, untestable side effects
6. MAINTAINABILITY: Complex conditionals, deep nesting, duplicate logic, magic numbers

## Constraints
- Focus on substantive issues, not nitpicks
- Every issue must include a concrete fix
- Rate each issue: MUST_FIX | SHOULD_FIX | CONSIDER
- If the code is well-written, say so and explain what makes it good

## Output
Provide your review as a structured list, ordered by severity.
</code></pre>
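<p>Because the ratings form a fixed vocabulary, downstream tooling can order parsed findings before posting them to the pull request. A minimal sketch:</p>

```typescript
// Sketch: order parsed quality findings by the prompt's three-level scale.
type Rating = "MUST_FIX" | "SHOULD_FIX" | "CONSIDER";

interface QualityFinding {
  rating: Rating;
  description: string;
}

// Lower rank means more severe, so MUST_FIX sorts to the top.
const rank: Record<Rating, number> = { MUST_FIX: 0, SHOULD_FIX: 1, CONSIDER: 2 };

function bySeverity(findings: QualityFinding[]): QualityFinding[] {
  return [...findings].sort((a, b) => rank[a.rating] - rank[b.rating]);
}
```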

        <h2>Integrating AI Review into CI/CD</h2>
        <p>The most effective pattern integrates AI review directly into your pull request workflow. Here's a production architecture:</p>
        <pre><code># .github/workflows/ai-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Get changed files
        id: changed
        run: |
          echo "files=$(git diff --name-only origin/main...HEAD | tr '\n' ' ')" >> "$GITHUB_OUTPUT"
      - name: Run AI Security Review
        run: |
          for file in ${{ steps.changed.outputs.files }}; do
            # Build a valid JSON payload with jq, then send each file to your AI review API
            jq -n --rawfile content "$file" '{file: $content, layer: "security"}' \
              | curl -X POST https://your-api/review \
                  -H "Authorization: Bearer ${{ secrets.AI_API_KEY }}" \
                  -H "Content-Type: application/json" \
                  -d @-
          done
</code></pre>

        <h2>Handling False Positives</h2>
<p>AI code reviewers produce false positives. Managing them is critical for developer trust:</p>
<ul>
<li><strong>Calibrate severity thresholds</strong> — Start with CRITICAL and HIGH only; add lower severities once trust is established</li>
<li><strong>Provide context</strong> — Include the project's tech stack, coding standards, and known patterns in the prompt</li>
<li><strong>Use suppress comments</strong> — Allow developers to mark false positives with <code>// ai-review-ignore: reason</code></li>
<li><strong>Track accuracy</strong> — Log accept/reject rates per issue category and use this data to refine your prompts</li>
<li><strong>Feedback loop</strong> — Feed dismissed issues back into the prompt as "do not flag" examples</li>
</ul>
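<p>The suppress-comment convention is straightforward to enforce in the pipeline. A minimal sketch, assuming each finding carries a 1-based line number:</p>

```typescript
// Sketch: drop findings the developer has suppressed with an
// `// ai-review-ignore: reason` comment on the flagged line.
interface Finding {
  line: number; // 1-based line number in the source file
  message: string;
}

const IGNORE = /\/\/\s*ai-review-ignore:\s*(.+)/;

function filterSuppressed(findings: Finding[], sourceLines: string[]): Finding[] {
  return findings.filter((finding) => {
    const line = sourceLines[finding.line - 1] ?? "";
    return !IGNORE.test(line); // keep only findings on non-suppressed lines
  });
}
```

<p>Requiring a reason after the colon keeps suppressions auditable, and logging them feeds the accuracy tracking described above.</p>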
        <h2>Diff-Based vs Full-File Review</h2>
        <p>A common mistake is sending entire files for review. For pull requests, <strong>diff-based review is superior</strong>:</p>
        <ul>
            <li><strong>Token efficiency</strong> — You pay for input tokens. Sending only the diff can reduce costs by 80%+</li>
            <li><strong>Focused feedback</strong> — The model focuses on what changed rather than re-reviewing existing code</li>
            <li><strong>Context window</strong> — Large files may exceed the model's context window</li>
        </ul>
        <p>However, include surrounding context (10-20 lines above and below each change) so the model understands the code's environment. The optimal format:</p>
        <pre><code>## Changed File: src/auth/login.ts
Change Type: Modified

Context (lines 45-85, changed lines marked with +/-)

  async function handleLogin(req: Request) {
    const { email, password } = req.body;

-   const user = await db.query('SELECT * FROM users WHERE email = ' + email);
+   const user = await db.query('SELECT * FROM users WHERE email = $1', [email]);
    if (!user) {
      return res.status(401).json({ error: 'Invalid credentials' });
    }
</code></pre>
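<p>Building that context block is a simple slicing exercise. A minimal sketch, assuming a configurable radius of 10-20 lines per change:</p>

```typescript
// Sketch: slice N lines of surrounding context around a changed line so the
// model sees the code's environment without the whole file. 1-based line numbers.
function contextWindow(lines: string[], changedLine: number, radius = 10): string[] {
  const start = Math.max(0, changedLine - 1 - radius);
  const end = Math.min(lines.length, changedLine + radius);
  return lines.slice(start, end);
}
```

<p>When several changes in one file produce overlapping windows, merge them before sending so the model never sees the same lines twice.</p>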

        <h2>Multi-Model Review Strategy</h2>
        <p>Different models have different strengths for code review:</p>
        <table>
            <tr><th>Model</th><th>Best For</th><th>Weakness</th></tr>
            <tr><td>GPT-4</td><td>Security analysis, complex logic</td><td>Can be verbose; higher cost</td></tr>
            <tr><td>Claude 3.5 Sonnet</td><td>Code quality, refactoring suggestions</td><td>May over-suggest abstractions</td></tr>
            <tr><td>Gemini Pro</td><td>Documentation review, API consistency</td><td>Less reliable on security edge cases</td></tr>
        </table>
        <p>A production system can route different review layers to different models, optimising for both quality and cost.</p>
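<p>In code, the routing can be as simple as a lookup table keyed by layer. A sketch with illustrative model identifiers; substitute whatever names your provider actually exposes:</p>

```typescript
// Hypothetical layer-to-model routing table. The identifiers are
// illustrative placeholders, not exact provider model names.
type Layer = "security" | "quality" | "style";

const modelForLayer: Record<Layer, string> = {
  security: "gpt-4",            // strongest on vulnerability analysis (per the table above)
  quality: "claude-3-5-sonnet", // strongest on quality and refactoring feedback
  style: "gemini-pro",          // adequate for consistency checks at lower cost
};

function pickModel(layer: Layer): string {
  return modelForLayer[layer];
}
```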
    
        <h2>How AI Prompt Architect Helps</h2>
        <p>AI Prompt Architect provides pre-built <strong>code review prompt templates</strong> that are battle-tested across hundreds of repositories. Use the <strong>Generate</strong> workflow with "code review" as your task to get a structured review prompt tailored to your stack. The <strong>Refine</strong> workflow can then customise it with your team's specific coding standards and common pitfalls.</p>
    

This article was originally published with extended interactive STCO schemas on AI Prompt Architect.
