GitHub Actions + Claude Code: Automated PR Review on Every Commit
I spent three weeks setting up a PR review pipeline that runs Claude Code on every pull request. Here's the exact setup, the surprises, and the one prompt that made it actually useful.
Why Bother?
Code review is the bottleneck on small teams. You either wait for a human or merge without review. Neither is good. Claude Code in a GitHub Action gives you a third option: a first-pass review that catches real issues before a human even opens the diff.
This isn't "AI writes your code." It's AI doing the grunt work — spotting missing error handling, flagging SQL injection vectors, catching the type mismatch that slipped past the author. The human reviewer gets a pre-digested summary instead of raw diffs.
The Basic Workflow
```yaml
# .github/workflows/claude-review.yml
name: Claude PR Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get diff
        id: diff
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr.diff
          echo "lines=$(wc -l < /tmp/pr.diff)" >> "$GITHUB_OUTPUT"

      - name: Claude review
        if: steps.diff.outputs.lines < 2000
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          pip install anthropic
          python3 .github/scripts/review.py

      # Fallback so the comment step always has a file to post
      - name: Diff too large
        if: steps.diff.outputs.lines >= 2000
        run: printf '## Atlas Code Review\n\nDiff too large to review automatically.\n' > /tmp/review.md

      - name: Post comment
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs')
            const review = fs.readFileSync('/tmp/review.md', 'utf8')
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: review
            })
```
The Review Script
````python
# .github/scripts/review.py
import anthropic
from pathlib import Path

client = anthropic.Anthropic()
diff = Path("/tmp/pr.diff").read_text()

SYSTEM = """You are a senior engineer doing a first-pass code review.
Focus on: bugs, security issues, missing error handling, performance problems.
Ignore: style preferences, variable naming, minor refactors.
Be specific. Cite line numbers. Skip praise.
Only reference line numbers visible in this diff."""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=SYSTEM,
    messages=[{
        "role": "user",
        "content": f"Review this diff:\n\n```diff\n{diff}\n```",
    }],
)

review = response.content[0].text
Path("/tmp/review.md").write_text(f"## Atlas Code Review\n\n{review}")
````
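One wrinkle the script doesn't handle: with `max_tokens=2048`, a long review can be cut off mid-sentence. A small guard can check the response's `stop_reason` before writing the comment (a sketch; `finalize_review` is a hypothetical helper, not part of the repo):

```python
def finalize_review(text: str, stop_reason: str) -> str:
    """Append a warning when the model stopped because it ran out of tokens.

    `stop_reason` comes from the Messages API response; "max_tokens" means
    the output was truncated mid-review.
    """
    if stop_reason == "max_tokens":
        return text + "\n\n_Note: review truncated; the diff may be too large._"
    return text
```

In the script, the last lines would become `review = finalize_review(response.content[0].text, response.stop_reason)`.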
The Prompt That Made It Work
The first version produced walls of text about formatting. The fix was three words: "Ignore style preferences."
The second problem: Claude would review things that didn't exist in the diff — hallucinating context from training data. Fix: wrap the diff in code fences and add "Only reference line numbers visible in this diff."
The third problem: diff size. A large refactor hits the context window. The lines < 2000 gate in the YAML is not elegant but it works — large PRs get a "diff too large" comment instead of a hallucinated review.
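An alternative to skipping large PRs entirely is to split the diff at file boundaries and review each chunk in a separate request. A sketch, not in the repo (`split_diff` is a hypothetical helper): unified diffs start each file with a `diff --git` header, which makes the split cheap.

```python
def split_diff(diff: str, max_lines: int = 2000) -> list[str]:
    """Split a unified diff into chunks under max_lines, at file boundaries."""
    # First, cut the diff into per-file sections on "diff --git" headers
    files, current = [], []
    for line in diff.splitlines():
        if line.startswith("diff --git") and current:
            files.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        files.append("\n".join(current))

    # Then greedily pack whole files into chunks under the line budget
    chunks, batch, count = [], [], 0
    for f in files:
        n = len(f.splitlines())
        if batch and count + n > max_lines:
            chunks.append("\n".join(batch))
            batch, count = [], 0
        batch.append(f)
        count += n
    if batch:
        chunks.append("\n".join(batch))
    return chunks
```

A file larger than the budget still lands in its own oversized chunk, so you'd keep the hard gate as a backstop. Each chunk then gets its own `messages.create` call and the replies are concatenated into one comment.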
Prompt Caching for Repeated Reviews
If you're reviewing PRs frequently, cache the system prompt:
````python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": SYSTEM,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{
        "role": "user",
        "content": f"Review this diff:\n\n```diff\n{diff}\n```",
    }],
)
````
Anthropic's default cache TTL is 5 minutes. On an active repo with multiple PRs per hour, you'll see consistent cache hits. On a slow repo, you won't — don't bother.
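You can check whether caching is actually paying off: the Messages API response reports `cache_read_input_tokens` and `cache_creation_input_tokens` in its `usage` block. A small helper (a sketch; `cache_hit_ratio` is my name, not an SDK function):

```python
from types import SimpleNamespace  # only for the stand-in usage object below

def cache_hit_ratio(usage) -> float:
    """Fraction of prompt tokens that were served from the cache."""
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    total = usage.input_tokens + read + created
    return read / total if total else 0.0

# Demo with a stand-in object shaped like response.usage
demo = SimpleNamespace(input_tokens=500,
                       cache_read_input_tokens=1500,
                       cache_creation_input_tokens=0)
print(cache_hit_ratio(demo))  # 0.75
```

Log this per run; if it sits at zero for a week, drop the `cache_control` block.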
What It Actually Catches
After 3 weeks on a TypeScript/Next.js codebase:
- Caught: missing `await` on an async function (would've caused silent data loss)
- Caught: SQL query built with string interpolation instead of a parameterized query
- Caught: API key leaked in a test file via `console.log`
- Missed: a race condition that required understanding the full request lifecycle
- False positive: flagged a `setTimeout` as a "potential memory leak" — correct in isolation, not in context
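The SQL interpolation catch is worth a concrete repro. This isn't from the reviewed codebase (which is TypeScript); it's a minimal Python/sqlite3 illustration of the pattern the prompt flags and the fix it suggests:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

name = "alice' OR '1'='1"  # attacker-controlled input

# Flagged pattern: the input is spliced into the SQL string
unsafe = conn.execute(
    f"SELECT * FROM users WHERE name = '{name}'"
).fetchall()

# Suggested fix: a parameterized query treats the input as data, not SQL
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (name,)
).fetchall()

print(len(unsafe), len(safe))  # 2 0: the injection dumps every row
```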
The hit rate is good enough that the human reviewer now skips the items Claude already flagged and focuses on architecture and business logic. That's the win.
Cost
Roughly $0.07 per review with claude-sonnet-4-6 (a diff near the 2,000-line cap in, a couple thousand tokens out). On a team doing 20 PRs/day, that's on the order of $40/month. Less than one hour of engineer time.
One Caveat
Don't let Claude +1 its own work. If your CI pipeline both generates code with Claude and reviews it with Claude, you'll get sycophantic reviews. Keep generation and review as separate concerns with separate prompts — or better, separate models.
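One cheap way to enforce that separation in CI (a sketch; the role configs and `check_separation` are illustrative, and the model IDs are placeholders, not recommendations):

```python
GENERATE = {
    "model": "model-for-generation",  # placeholder; use a real model ID
    "system": "You write code that satisfies the ticket.",
}
REVIEW = {
    "model": "model-for-review",      # placeholder; ideally a different model
    "system": "You are a senior engineer doing a first-pass code review.",
}

def check_separation(gen: dict, rev: dict) -> None:
    """Fail fast if generation and review would share a model or prompt."""
    if gen["model"] == rev["model"]:
        raise ValueError("generation and review use the same model")
    if gen["system"] == rev["system"]:
        raise ValueError("generation and review share a system prompt")

check_separation(GENERATE, REVIEW)  # no exception: the roles are distinct
```

Run the check at pipeline startup so a copy-pasted config fails loudly instead of quietly producing self-congratulatory reviews.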
The full workflow is in github.com/Wh0FF24/whoff-agents/examples/. The review.py script is under 40 lines and handles everything above.
All tools → whoffagents.com