Wilson Xu
Building a Code Review Bot with Claude API and GitHub Actions

Automate your code reviews with AI-powered analysis that catches real bugs, security issues, and style violations — directly in your pull requests.


Every engineering team faces the same bottleneck: code reviews take time, reviewers get fatigued, and important issues slip through. What if an AI assistant could handle the first pass — flagging security vulnerabilities, catching logic errors, and enforcing style conventions — before a human even opens the PR?

In this article, you'll build a production-ready code review bot that integrates Claude's API with GitHub Actions. It will automatically analyze pull request diffs, post structured comments, and help your team ship higher-quality code faster.

What We're Building

By the end of this tutorial, you'll have:

  • A GitHub Action that triggers on every pull request
  • A Node.js script that fetches the PR diff and sends it to Claude for analysis
  • Structured review output posted as GitHub PR comments
  • Rate limiting, error handling, and cost controls built in

The bot will analyze code for:

  • Security vulnerabilities (SQL injection, XSS, hardcoded secrets)
  • Logic bugs and edge cases
  • Performance antipatterns
  • Missing error handling
  • Style and maintainability issues

Prerequisites

  • A GitHub repository
  • An Anthropic API key (get one at console.anthropic.com)
  • Node.js 18+ (for local testing)
  • Basic familiarity with GitHub Actions

Step 1: Setting Up the GitHub Action

Create .github/workflows/ai-code-review.yml in your repository:

name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize, reopened]
    # Only run on PRs targeting main/master
    branches:
      - main
      - master

# Limit concurrent runs to avoid API rate limits
concurrency:
  group: ai-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  ai-review:
    name: Claude Code Review
    runs-on: ubuntu-latest
    # Skip draft PRs
    if: github.event.pull_request.draft == false

    permissions:
      contents: read
      pull-requests: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: |
          npm install @anthropic-ai/sdk@latest
          npm install @octokit/rest@latest

      - name: Run AI Code Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO_OWNER: ${{ github.repository_owner }}
          REPO_NAME: ${{ github.event.repository.name }}
          BASE_SHA: ${{ github.event.pull_request.base.sha }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: node .github/scripts/ai-review.js

      - name: Upload review artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ai-review-${{ github.event.pull_request.number }}
          path: /tmp/ai-review-*.json
          retention-days: 7

This action:

  • Triggers on PR open, update, and reopen events
  • Skips draft PRs to avoid wasting API calls
  • Uses concurrency controls to cancel stale runs
  • Grants write access to post PR comments

Step 2: The Review Script

Create .github/scripts/ai-review.js:

import Anthropic from '@anthropic-ai/sdk';
import { Octokit } from '@octokit/rest';
import { execSync } from 'child_process';
import fs from 'fs';

// Configuration
const MAX_DIFF_LINES = 500;      // Truncate massive diffs
const MAX_FILE_SIZE = 50_000;    // Skip files over 50KB
const REVIEW_COMMENT_TAG = '<!-- AI-CODE-REVIEW -->';

// File extensions to review (skip binary, generated files)
const REVIEWABLE_EXTENSIONS = new Set([
  '.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs',
  '.py', '.rb', '.go', '.rs', '.java', '.kt',
  '.c', '.cpp', '.h', '.hpp', '.cs',
  '.php', '.swift', '.scala',
  '.sh', '.bash', '.zsh',
  '.yml', '.yaml', '.json', '.toml',
  '.sql', '.graphql',
]);

// Files/patterns to skip
const SKIP_PATTERNS = [
  /node_modules/,
  /\.min\.(js|css)$/,
  /package-lock\.json$/,
  /yarn\.lock$/,
  /pnpm-lock\.yaml$/,
  /dist\//,
  /build\//,
  /\.generated\./,
  /__generated__/,
  /migrations\/\d+/,
];

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const octokit = new Octokit({
  auth: process.env.GITHUB_TOKEN,
});

/**
 * Fetch the PR diff using git
 */
async function getPRDiff(baseSha, headSha) {
  try {
    const diff = execSync(
      `git diff ${baseSha}..${headSha} --unified=5 --no-color`,
      { maxBuffer: 10 * 1024 * 1024 } // 10MB buffer
    ).toString();
    return diff;
  } catch (error) {
    console.error('Failed to get diff:', error.message);
    throw error;
  }
}

/**
 * Parse diff into per-file chunks
 */
function parseDiff(rawDiff) {
  const files = [];
  const parts = rawDiff.split(/^diff --git /m).slice(1);

  for (const part of parts) {
    const lines = part.split('\n');
    const headerLine = lines[0]; // "a/path b/path"
    const match = headerLine.match(/a\/(.*?) b\/(.*)/);
    if (!match) continue;

    const filePath = match[2];
    const ext = '.' + filePath.split('.').pop();

    // Skip non-reviewable files
    if (!REVIEWABLE_EXTENSIONS.has(ext)) continue;
    if (SKIP_PATTERNS.some(p => p.test(filePath))) continue;

    const diffContent = 'diff --git ' + part;
    const addedLines = lines.filter(l => l.startsWith('+') && !l.startsWith('+++')).length;
    const removedLines = lines.filter(l => l.startsWith('-') && !l.startsWith('---')).length;

    files.push({
      path: filePath,
      diff: diffContent,
      addedLines,
      removedLines,
      lineCount: lines.length,
    });
  }

  return files;
}

/**
 * Build the review prompt for Claude
 */
function buildReviewPrompt(files) {
  const fileSection = files.map(f => {
    const truncatedDiff = f.diff.split('\n').slice(0, MAX_DIFF_LINES).join('\n');
    const wasTruncated = f.diff.split('\n').length > MAX_DIFF_LINES;

    return `### File: ${f.path}
${wasTruncated ? `*(diff truncated at ${MAX_DIFF_LINES} lines)*\n` : ''}
\`\`\`diff
${truncatedDiff}
\`\`\``;
  }).join('\n\n');

  return `You are an expert code reviewer performing a thorough analysis of a pull request diff.

Review the following code changes and identify:
1. **Security vulnerabilities** — SQL injection, XSS, CSRF, path traversal, hardcoded credentials, insecure dependencies
2. **Logic bugs** — off-by-one errors, null/undefined dereferences, race conditions, incorrect conditionals
3. **Error handling gaps** — unhandled promise rejections, missing try/catch, unchecked return values
4. **Performance issues** — N+1 queries, unnecessary re-renders, missing memoization, memory leaks
5. **Maintainability** — overly complex functions, missing comments on non-obvious logic, magic numbers

For each issue found:
- Specify the **file path** and approximate **line number**
- Give a **severity**: CRITICAL, HIGH, MEDIUM, LOW
- Provide a **clear explanation** of the problem
- Suggest a **concrete fix** with example code when applicable

If the code is well-written with no significant issues, say so clearly.

Format your response as JSON matching this schema:
{
  "summary": "2-3 sentence overall assessment",
  "score": 1-10,
  "issues": [
    {
      "file": "path/to/file.js",
      "line": 42,
      "severity": "HIGH",
      "category": "security|logic|error-handling|performance|style",
      "title": "Short issue title",
      "description": "Detailed explanation",
      "suggestion": "How to fix it",
      "codeExample": "Optional corrected code snippet"
    }
  ],
  "positives": ["List of things done well"],
  "blockers": true/false
}

## Changes to Review

${fileSection}`;
}

/**
 * Call Claude API to review the code
 */
async function reviewWithClaude(prompt) {
  console.log('Sending diff to Claude for review...');

  const message = await anthropic.messages.create({
    model: 'claude-opus-4-5',
    max_tokens: 4096,
    temperature: 0.1, // Low temperature for consistent, precise analysis
    system: `You are a senior software engineer performing code review. You are thorough, precise, and constructive.
You prioritize security and correctness above all else. You always respond with valid JSON.`,
    messages: [
      {
        role: 'user',
        content: prompt,
      },
    ],
  });

  const responseText = message.content[0].text;

  // Extract JSON from response (Claude sometimes wraps in markdown)
  const jsonMatch = responseText.match(/\{[\s\S]*\}/);
  if (!jsonMatch) {
    throw new Error('Claude did not return valid JSON');
  }

  return JSON.parse(jsonMatch[0]);
}

/**
 * Format review results as a markdown comment
 */
function formatReviewComment(review, filesReviewed) {
  const severityEmoji = {
    CRITICAL: '🚨',
    HIGH: '⚠️',
    MEDIUM: '💡',
    LOW: '📝',
  };

  const scoreEmoji = review.score >= 8 ? '🟢' : review.score >= 6 ? '🟡' : '🔴';

  let comment = `${REVIEW_COMMENT_TAG}
## 🤖 AI Code Review

${scoreEmoji} **Score: ${review.score}/10** | **Files reviewed: ${filesReviewed}** | **Blockers: ${review.blockers ? 'YES ⛔' : 'None ✅'}**

### Summary
${review.summary}

`;

  if (review.issues && review.issues.length > 0) {
    // Group by severity
    const bySeverity = { CRITICAL: [], HIGH: [], MEDIUM: [], LOW: [] };
    for (const issue of review.issues) {
      bySeverity[issue.severity]?.push(issue);
    }

    comment += `### Issues Found (${review.issues.length})\n\n`;

    for (const [severity, issues] of Object.entries(bySeverity)) {
      if (issues.length === 0) continue;
      comment += `#### ${severityEmoji[severity]} ${severity} (${issues.length})\n\n`;

      for (const issue of issues) {
        comment += `**${issue.title}** — \`${issue.file}:${issue.line}\`\n`;
        comment += `${issue.description}\n\n`;
        if (issue.suggestion) {
          comment += `> **Fix:** ${issue.suggestion}\n\n`;
        }
        if (issue.codeExample) {
          comment += `\`\`\`\n${issue.codeExample}\n\`\`\`\n\n`;
        }
      }
    }
  } else {
    comment += `### ✅ No Issues Found\n\nThis change looks good! No significant issues detected.\n\n`;
  }

  if (review.positives && review.positives.length > 0) {
    comment += `### 👍 What's Good\n`;
    for (const positive of review.positives) {
      comment += `- ${positive}\n`;
    }
    comment += '\n';
  }

  comment += `---\n*Reviewed by claude-opus-4-5 · [Report an issue](https://github.com/anthropics/anthropic-sdk-python/issues)*`;

  return comment;
}

/**
 * Post or update the review comment on the PR
 */
async function postReviewComment(owner, repo, prNumber, commentBody) {
  // Find existing AI review comment to update
  const { data: comments } = await octokit.rest.issues.listComments({
    owner,
    repo,
    issue_number: prNumber,
  });

  const existingComment = comments.find(c => c.body.includes(REVIEW_COMMENT_TAG));

  if (existingComment) {
    console.log(`Updating existing review comment ${existingComment.id}`);
    await octokit.rest.issues.updateComment({
      owner,
      repo,
      comment_id: existingComment.id,
      body: commentBody,
    });
  } else {
    console.log('Creating new review comment');
    await octokit.rest.issues.createComment({
      owner,
      repo,
      issue_number: prNumber,
      body: commentBody,
    });
  }
}

/**
 * Main execution
 */
async function main() {
  const {
    REPO_OWNER: owner,
    REPO_NAME: repo,
    PR_NUMBER,
    BASE_SHA,
    HEAD_SHA,
  } = process.env;

  const prNumber = parseInt(PR_NUMBER, 10);

  console.log(`\n=== AI Code Review ===`);
  console.log(`Repository: ${owner}/${repo}`);
  console.log(`PR #${prNumber}: ${BASE_SHA}..${HEAD_SHA}\n`);

  try {
    // Step 1: Get the diff
    const rawDiff = await getPRDiff(BASE_SHA, HEAD_SHA);

    if (!rawDiff.trim()) {
      console.log('No diff found — skipping review');
      return;
    }

    // Step 2: Parse into files
    const files = parseDiff(rawDiff);
    console.log(`Found ${files.length} reviewable files`);

    if (files.length === 0) {
      console.log('No reviewable files in this PR (all binary, generated, or lock files)');
      return;
    }

    // Step 3: Build prompt and review
    const prompt = buildReviewPrompt(files);
    const review = await reviewWithClaude(prompt);

    console.log(`Review complete: score=${review.score}, issues=${review.issues?.length ?? 0}`);

    // Step 4: Save artifacts
    const artifactPath = `/tmp/ai-review-${prNumber}.json`;
    fs.writeFileSync(artifactPath, JSON.stringify({ review, files: files.map(f => f.path) }, null, 2));

    // Step 5: Post comment
    const comment = formatReviewComment(review, files.length);
    await postReviewComment(owner, repo, prNumber, comment);

    console.log('Review comment posted successfully');

    // Step 6: Fail the check if there are critical blockers
    if (review.blockers && review.issues?.some(i => i.severity === 'CRITICAL')) {
      console.error('\n❌ Critical issues found — failing the check');
      process.exit(1);
    }

  } catch (error) {
    console.error('Review failed:', error);

    // Post error comment so developers know the bot failed
    try {
      await postReviewComment(owner, repo, prNumber,
        `${REVIEW_COMMENT_TAG}\n## 🤖 AI Code Review\n\n❌ Review failed: ${error.message}\n\nThe AI review bot encountered an error. Please review manually.`
      );
    } catch (commentError) {
      console.error('Failed to post error comment:', commentError.message);
    }

    process.exit(1);
  }
}

main();

Step 3: Adding Secrets to Your Repository

Before the action can run, add your Anthropic API key as a GitHub secret:

# Using GitHub CLI
gh secret set ANTHROPIC_API_KEY --body "sk-ant-..." --repo your-org/your-repo

# Or through the web UI:
# Settings → Secrets and variables → Actions → New repository secret

The GITHUB_TOKEN is automatically available — no setup needed.
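The review script assumes all of the workflow's environment variables are present. A small guard at the top of ai-review.js fails fast with a clear message instead of crashing mid-review; this is a sketch, with the variable list mirroring the workflow's env block:

```javascript
// Fail fast if the workflow didn't provide the configuration the
// script needs (names mirror the env block in the workflow file).
const REQUIRED_ENV = [
  'ANTHROPIC_API_KEY', 'GITHUB_TOKEN', 'PR_NUMBER',
  'REPO_OWNER', 'REPO_NAME', 'BASE_SHA', 'HEAD_SHA',
];

function assertEnv(env = process.env) {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}
```

Call `assertEnv()` as the first line of `main()` so a missing secret surfaces as a one-line error in the Actions log.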

Step 4: Crafting Effective Review Prompts

The quality of your code reviews depends heavily on prompt design. Here are patterns that work well:

Pattern 1: Chain-of-Thought for Security Analysis

const securityPrompt = `
Analyze this code for security vulnerabilities.
Think through each of these attack vectors step by step:

1. Input validation: Is user input sanitized before use?
2. Authentication: Are auth checks applied consistently?
3. Authorization: Can a user access another user's data?
4. Injection: Any SQL, command, or template injection possibilities?
5. Secrets: Any hardcoded credentials or API keys?

For each vector, explain your reasoning before concluding.
`;

Pattern 2: Contextual Reviews with File History

async function getFileContext(filePath, baseSha) {
  // Get recent git log for this file (quote the path so spaces or
  // shell metacharacters in file names can't break the command)
  const log = execSync(
    `git log --oneline -10 ${baseSha} -- "${filePath}"`
  ).toString();

  return `Recent changes to ${filePath}:\n${log}`;
}

// Inject context into prompt
const contextualPrompt = `
${await getFileContext(file.path, BASE_SHA)}

Given this history, review the following change carefully for
regressions and consistency with the existing codebase:

${file.diff}
`;

Pattern 3: Language-Specific Rules

function getLanguageGuidelines(ext) {
  const guidelines = {
    '.ts': `
- Check for any 'any' types that should be more specific
- Verify async/await error handling
- Look for missing return types on exported functions
- Check for proper null/undefined handling with optional chaining
    `,
    '.py': `
- Check for mutable default arguments (def foo(x=[]))
- Verify exception types are specific, not bare 'except:'
- Look for string formatting using % instead of f-strings
- Check for proper resource cleanup (context managers)
    `,
    '.go': `
- Verify all errors are explicitly handled, not ignored with _
- Check for goroutine leaks (channels that are never closed)
- Look for race conditions in concurrent code
- Verify defer statements are used correctly
    `,
  };
  return guidelines[ext] || '';
}
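These guidelines can then be folded into each file's section of the prompt. A minimal sketch, where `promptWithGuidelines` is a hypothetical helper that takes any lookup function shaped like `getLanguageGuidelines` above:

```javascript
// Hypothetical helper: append language-specific review rules to a file's
// diff before it goes into the prompt. `getGuidelines` is any lookup
// function shaped like getLanguageGuidelines above.
function promptWithGuidelines(file, getGuidelines) {
  const ext = '.' + file.path.split('.').pop();
  const rules = getGuidelines(ext);
  return rules
    ? `${file.diff}\n\nAdditional ${ext} review rules:\n${rules}`
    : file.diff;
}
```

Files with no matching guidelines pass through unchanged, so the prompt stays short for languages you haven't written rules for.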

Step 5: Handling Large Pull Requests

Large PRs need special handling to stay within API token limits:

const MAX_TOKENS_PER_REQUEST = 100_000; // Conservative input budget, well under the model's context window
const CHARS_PER_TOKEN_ESTIMATE = 4;
const MAX_CHARS = MAX_TOKENS_PER_REQUEST * CHARS_PER_TOKEN_ESTIMATE;

async function reviewLargePR(files) {
  // Sort by priority: security-sensitive files first
  const prioritized = files.sort((a, b) => {
    const securityFiles = /auth|security|crypto|token|password|secret/i;
    const aIsSecurity = securityFiles.test(a.path);
    const bIsSecurity = securityFiles.test(b.path);
    if (aIsSecurity && !bIsSecurity) return -1;
    if (!aIsSecurity && bIsSecurity) return 1;
    return b.addedLines - a.addedLines; // Then by size
  });

  const batches = [];
  let currentBatch = [];
  let currentSize = 0;

  for (const file of prioritized) {
    const fileSize = file.diff.length;
    if (currentSize + fileSize > MAX_CHARS && currentBatch.length > 0) {
      batches.push(currentBatch);
      currentBatch = [file];
      currentSize = fileSize;
    } else {
      currentBatch.push(file);
      currentSize += fileSize;
    }
  }
  if (currentBatch.length > 0) batches.push(currentBatch);

  console.log(`Splitting ${files.length} files into ${batches.length} review batches`);

  // Review batches sequentially to stay under API rate limits
  const reviews = [];
  for (const [i, batch] of batches.entries()) {
    console.log(`Reviewing batch ${i + 1}/${batches.length} (${batch.length} files)`);
    reviews.push(await reviewWithClaude(buildReviewPrompt(batch)));
  }

  // Merge results
  return mergeReviews(reviews);
}

function mergeReviews(reviews) {
  const allIssues = reviews.flatMap(r => r.issues || []);
  const avgScore = reviews.reduce((sum, r) => sum + r.score, 0) / reviews.length;
  const hasBlockers = reviews.some(r => r.blockers);

  return {
    summary: reviews.map(r => r.summary).join(' '),
    score: Math.round(avgScore),
    issues: allIssues,
    positives: reviews.flatMap(r => r.positives || []),
    blockers: hasBlockers,
  };
}
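A quick sanity check of the merge step (mergeReviews is repeated here so the snippet runs standalone): two batch reviews scoring 8 and 6 should merge to a rounded average of 7 with their issue lists concatenated.

```javascript
// mergeReviews, repeated from the batching code above so this runs standalone
function mergeReviews(reviews) {
  const allIssues = reviews.flatMap(r => r.issues || []);
  const avgScore = reviews.reduce((sum, r) => sum + r.score, 0) / reviews.length;
  const hasBlockers = reviews.some(r => r.blockers);

  return {
    summary: reviews.map(r => r.summary).join(' '),
    score: Math.round(avgScore),
    issues: allIssues,
    positives: reviews.flatMap(r => r.positives || []),
    blockers: hasBlockers,
  };
}

const merged = mergeReviews([
  { summary: 'Batch 1 clean.', score: 8, issues: [], positives: ['good tests'], blockers: false },
  { summary: 'Batch 2 has one issue.', score: 6, issues: [{ severity: 'HIGH' }], positives: [], blockers: false },
]);
console.log(merged.score);         // 7
console.log(merged.issues.length); // 1
```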

Step 6: Inline Review Comments

Instead of (or in addition to) a summary comment, you can post inline comments on specific lines:

async function postInlineComments(owner, repo, prNumber, commitSha, issues) {
  // Create a review with inline comments
  const reviewComments = issues
    .filter(issue => issue.file && issue.line)
    .map(issue => ({
      path: issue.file,
      line: issue.line,
      side: 'RIGHT',
      body: `**${issue.severity}: ${issue.title}**\n\n${issue.description}\n\n${
        issue.suggestion ? `**Fix:** ${issue.suggestion}` : ''
      }`,
    }));

  if (reviewComments.length === 0) return;

  await octokit.rest.pulls.createReview({
    owner,
    repo,
    pull_number: prNumber,
    commit_id: commitSha,
    event: 'COMMENT', // 'APPROVE' or 'REQUEST_CHANGES' for stronger feedback
    comments: reviewComments,
  });
}
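One caveat: the GitHub API rejects inline comments on lines that are not part of the diff, and model-reported line numbers are approximate. A sketch of a pre-filter that parses hunk headers to find the added lines you can safely comment on (`commentableLines` is a hypothetical helper, not part of the script above):

```javascript
// Collect the new-file line numbers of added lines by walking each
// hunk header ("@@ -10,3 +12,4 @@") and counting forward from there.
function commentableLines(fileDiff) {
  const lines = new Set();
  let newLine = 0;
  for (const raw of fileDiff.split('\n')) {
    const hunk = raw.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
    if (hunk) {
      newLine = parseInt(hunk[1], 10); // Reset to the hunk's starting line
      continue;
    }
    if (raw.startsWith('+') && !raw.startsWith('+++')) {
      lines.add(newLine); // Added line: commentable
      newLine++;
    } else if (!raw.startsWith('-') && !raw.startsWith('\\')) {
      newLine++; // Context line: advances the new-file counter
    }
  }
  return lines;
}
```

Filter `issues` through this set before calling `createReview`, and fall back to the summary comment for issues whose line numbers don't land inside a hunk.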

Step 7: Cost Controls

Claude API calls cost money. Here's how to keep costs predictable:

// Estimate tokens before sending
function estimateTokens(text) {
  // Rough estimate: 1 token ≈ 4 characters
  return Math.ceil(text.length / 4);
}

// Example rates per 1M tokens; check Anthropic's current pricing before relying on these
const COST_PER_1M_INPUT = 15.00;
const COST_PER_1M_OUTPUT = 75.00;

function estimateCost(inputTokens, outputTokens = 2000) {
  const inputCost = (inputTokens / 1_000_000) * COST_PER_1M_INPUT;
  const outputCost = (outputTokens / 1_000_000) * COST_PER_1M_OUTPUT;
  return (inputCost + outputCost).toFixed(4);
}

// Skip review if diff is too large (likely auto-generated)
const MAX_REVIEW_COST_USD = 0.50;

async function shouldReview(files) {
  const totalChars = files.reduce((sum, f) => sum + f.diff.length, 0);
  const estimatedTokens = Math.ceil(totalChars / 4);
  const estimatedCost = parseFloat(estimateCost(estimatedTokens));

  if (estimatedCost > MAX_REVIEW_COST_USD) {
    console.warn(`Estimated cost $${estimatedCost} exceeds limit $${MAX_REVIEW_COST_USD}`);
    console.warn('Skipping AI review: diff too large (likely generated or vendored code)');
    return false;
  }

  console.log(`Estimated review cost: $${estimatedCost}`);
  return true;
}
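The intro also promised rate-limit handling. A generic retry wrapper with exponential backoff covers transient 429s and timeouts; `withRetry` is a plain helper sketch, not part of the Anthropic SDK, and you wrap it around calls like `reviewWithClaude`:

```javascript
// Retry an async call with exponential backoff: 1s, 2s, 4s, ...
// Useful for transient failures such as 429 rate-limit responses.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= retries) throw error; // Out of attempts: surface the error
      const delay = baseDelayMs * 2 ** attempt;
      console.warn(`Attempt ${attempt + 1} failed (${error.message}); retrying in ${delay}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Usage: `const review = await withRetry(() => reviewWithClaude(prompt));`. In production you may want to retry only on 429/5xx status codes rather than every error.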

Advanced: Caching Reviews for Unchanged Files

If a PR is updated with only minor changes, avoid re-reviewing files that haven't changed:

import crypto from 'crypto';

function hashDiff(diff) {
  return crypto.createHash('sha256').update(diff).digest('hex').slice(0, 16);
}

// Store in GitHub Actions cache or external store
const reviewCache = new Map();

async function getCachedOrReview(file) {
  const cacheKey = hashDiff(file.diff);

  if (reviewCache.has(cacheKey)) {
    console.log(`Cache hit for ${file.path}`);
    return reviewCache.get(cacheKey);
  }

  const review = await reviewWithClaude(buildReviewPrompt([file]));
  reviewCache.set(cacheKey, review);
  return review;
}
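The in-memory Map above disappears when the job ends. To make it survive between runs (for example via actions/cache), serialize it to disk; the file path here is an assumption, so adjust it to match your workflow:

```javascript
import fs from 'fs';

// Persist the review cache as JSON so it can be saved/restored between
// workflow runs. CACHE_FILE is an assumed path; point it wherever your
// actions/cache step looks.
const CACHE_FILE = '/tmp/ai-review-cache.json';

function loadCache() {
  try {
    return new Map(Object.entries(JSON.parse(fs.readFileSync(CACHE_FILE, 'utf8'))));
  } catch {
    return new Map(); // No cache yet, or unreadable: start fresh
  }
}

function saveCache(cache) {
  fs.writeFileSync(CACHE_FILE, JSON.stringify(Object.fromEntries(cache)));
}
```

Load at startup, save at the end of `main()`, and add a cache step in the workflow keyed on the PR number so updated PRs reuse earlier per-file reviews.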

Putting It All Together: The Complete Workflow

Here's the full picture of what we've built:

PR Opened/Updated
      ↓
GitHub Actions Triggers
      ↓
Checkout Code + Install deps
      ↓
git diff baseSha..headSha
      ↓
Parse diff → Filter files (skip binaries, lock files, generated)
      ↓
Prioritize files (security-sensitive first)
      ↓
Batch if too large → Send to Claude API
      ↓
Parse JSON response
      ↓
Format markdown comment
      ↓
Post/Update PR comment via GitHub API
      ↓
Fail check if CRITICAL blockers found

Real-World Prompt That Catches Real Bugs

After months of refinement, this is the prompt structure that catches the most issues:

const SYSTEM_PROMPT = `You are a senior software engineer with 15 years of experience.
You have deep expertise in security, performance, and system design.
You are thorough but not pedantic — you focus on issues that actually matter.
You always provide actionable feedback with concrete examples.
You respond only with valid JSON, never markdown prose outside the JSON.`;

const diffPrompt = (files) => `
Review the following code changes for a production system serving millions of users.

Critical areas to check:
- Any change to authentication or authorization logic
- Database queries (especially with user-supplied data)
- File system operations
- External API calls and their error handling
- Cryptographic operations
- Race conditions in async code

${buildFileSection(files)}

Remember: A false negative (missing a real bug) is far worse than a false positive.
When in doubt, flag it.
`;

Conclusion

You now have a fully functional AI code review bot that:

  • Automatically reviews every pull request in under 2 minutes
  • Catches security vulnerabilities, logic bugs, and performance issues
  • Posts structured, actionable feedback directly in GitHub
  • Handles large PRs through intelligent batching
  • Controls costs with token estimation and caching

The key to making this bot useful rather than noisy is prompt engineering. Start with broad analysis, measure which issue types are most actionable for your team, and iteratively refine your prompts.

The complete code is available as a GitHub template repository — fork it and have reviews running in minutes.


Wilson Xu is a software engineer specializing in developer tooling and AI integrations. Find him on GitHub at @chengyixu.
