GitHub Actions + Claude Code: Automated PR Review on Every Commit
I spent three weeks setting up a PR review pipeline that runs Claude Code on every pull request. Here's the exact setup, the surprises, and the one prompt that made it actually useful.
Why Bother?
Code review is the bottleneck on small teams. You either wait for a human or merge without review. Neither is good. Claude Code in a GitHub Action gives you a third option: a first-pass review that catches real issues before a human even opens the diff.
This isn't "AI writes your code." It's AI doing the grunt work — spotting missing error handling, flagging SQL injection vectors, catching the type mismatch that slipped past the author. The human reviewer gets a pre-digested summary instead of raw diffs.
The Basic Workflow
```yaml
# .github/workflows/claude-review.yml
name: Claude PR Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get diff
        id: diff
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr.diff
          echo "lines=$(wc -l < /tmp/pr.diff)" >> "$GITHUB_OUTPUT"

      - name: Claude review
        if: steps.diff.outputs.lines < 2000
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          pip install anthropic
          python3 .github/scripts/review.py

      # Fallback so the comment step always has a file to post
      - name: Diff too large
        if: steps.diff.outputs.lines >= 2000
        run: printf '## Atlas Code Review\n\nDiff too large to review automatically.\n' > /tmp/review.md

      - name: Post comment
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs')
            const review = fs.readFileSync('/tmp/review.md', 'utf8')
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: review
            })
```
The Review Script
````python
# .github/scripts/review.py
import anthropic
from pathlib import Path

client = anthropic.Anthropic()
diff = Path("/tmp/pr.diff").read_text()

SYSTEM = """You are a senior engineer doing a first-pass code review.
Focus on: bugs, security issues, missing error handling, performance problems.
Ignore: style preferences, variable naming, minor refactors.
Be specific. Cite line numbers. Skip praise.
Only reference line numbers visible in this diff."""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=SYSTEM,
    messages=[{
        "role": "user",
        "content": f"Review this diff:\n\n```diff\n{diff}\n```",
    }],
)

review = response.content[0].text
Path("/tmp/review.md").write_text(f"## Atlas Code Review\n\n{review}")
````
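One wrinkle the script doesn't handle: with `max_tokens=2048`, a long review can be cut off mid-sentence. A small guard can check the response's `stop_reason` before writing the comment (a sketch; `finalize_review` is a hypothetical helper, not part of the repo):

```python
def finalize_review(text: str, stop_reason: str) -> str:
    """Append a warning when the model stopped because it ran out of tokens.

    `stop_reason` comes from the Messages API response; "max_tokens" means
    the output was truncated mid-review.
    """
    if stop_reason == "max_tokens":
        return text + "\n\n_Note: review truncated; the diff may be too large._"
    return text
```

In the script, the last lines would become `review = finalize_review(response.content[0].text, response.stop_reason)`.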
The Prompt That Made It Work
The first version produced walls of text about formatting. The fix was three words: "Ignore style preferences."
The second problem: Claude would review things that didn't exist in the diff — hallucinating context from training data. Fix: wrap the diff in code fences and add "Only reference line numbers visible in this diff."
The third problem: diff size. A large refactor hits the context window. The lines < 2000 gate in the YAML is not elegant but it works — large PRs get a "diff too large" comment instead of a hallucinated review.
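An alternative to skipping large PRs entirely is to split the diff at file boundaries and review each chunk in a separate request. A sketch, not in the repo (`split_diff` is a hypothetical helper): unified diffs start each file with a `diff --git` header, which makes the split cheap.

```python
def split_diff(diff: str, max_lines: int = 2000) -> list[str]:
    """Split a unified diff into chunks under max_lines, at file boundaries."""
    # First, cut the diff into per-file sections on "diff --git" headers
    files, current = [], []
    for line in diff.splitlines():
        if line.startswith("diff --git") and current:
            files.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        files.append("\n".join(current))

    # Then greedily pack whole files into chunks under the line budget
    chunks, batch, count = [], [], 0
    for f in files:
        n = len(f.splitlines())
        if batch and count + n > max_lines:
            chunks.append("\n".join(batch))
            batch, count = [], 0
        batch.append(f)
        count += n
    if batch:
        chunks.append("\n".join(batch))
    return chunks
```

A file larger than the budget still lands in its own oversized chunk, so you'd keep the hard gate as a backstop. Each chunk then gets its own `messages.create` call and the replies are concatenated into one comment.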
Prompt Caching for Repeated Reviews
If you're reviewing PRs frequently, cache the system prompt:
````python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": SYSTEM,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{
        "role": "user",
        "content": f"Review this diff:\n\n```diff\n{diff}\n```",
    }],
)
````
Anthropic's default cache TTL is 5 minutes. On an active repo with multiple PRs per hour, you'll see consistent cache hits. On a slow repo, you won't — don't bother.
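You can check whether caching is actually paying off: the Messages API response reports `cache_read_input_tokens` and `cache_creation_input_tokens` in its `usage` block. A small helper (a sketch; `cache_hit_ratio` is my name, not an SDK function):

```python
from types import SimpleNamespace  # only for the stand-in usage object below

def cache_hit_ratio(usage) -> float:
    """Fraction of prompt tokens that were served from the cache."""
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    total = usage.input_tokens + read + created
    return read / total if total else 0.0

# Demo with a stand-in object shaped like response.usage
demo = SimpleNamespace(input_tokens=500,
                       cache_read_input_tokens=1500,
                       cache_creation_input_tokens=0)
print(cache_hit_ratio(demo))  # 0.75
```

Log this per run; if it sits at zero for a week, drop the `cache_control` block.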
What It Actually Catches
After 3 weeks on a TypeScript/Next.js codebase:
- Caught: missing `await` on an async function (would've caused silent data loss)
- Caught: SQL query built with string interpolation instead of a parameterized query
- Caught: API key leaked in a test file via `console.log`
- Missed: a race condition that required understanding the full request lifecycle
- False positive: flagged a `setTimeout` as a "potential memory leak" — correct in isolation, not in context
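The SQL interpolation catch is worth a concrete repro. This isn't from the reviewed codebase (which is TypeScript); it's a minimal Python/sqlite3 illustration of the pattern the prompt flags and the fix it suggests:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

name = "alice' OR '1'='1"  # attacker-controlled input

# Flagged pattern: the input is spliced into the SQL string
unsafe = conn.execute(
    f"SELECT * FROM users WHERE name = '{name}'"
).fetchall()

# Suggested fix: a parameterized query treats the input as data, not SQL
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (name,)
).fetchall()

print(len(unsafe), len(safe))  # 2 0: the injection dumps every row
```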
The hit rate is good enough that the human reviewer now skips the items Claude already flagged and focuses on architecture and business logic. That's the win.
Cost
Roughly $0.07 per review with claude-sonnet-4-6 (a diff near the 2,000-line cap in, a couple thousand tokens out). On a team doing 20 PRs/day, that's on the order of $40/month. Less than one hour of engineer time.
One Caveat
Don't let Claude +1 its own work. If your CI pipeline both generates code with Claude and reviews it with Claude, you'll get sycophantic reviews. Keep generation and review as separate concerns with separate prompts — or better, separate models.
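One cheap way to enforce that separation in CI (a sketch; the role configs and `check_separation` are illustrative, and the model IDs are placeholders, not recommendations):

```python
GENERATE = {
    "model": "model-for-generation",  # placeholder; use a real model ID
    "system": "You write code that satisfies the ticket.",
}
REVIEW = {
    "model": "model-for-review",      # placeholder; ideally a different model
    "system": "You are a senior engineer doing a first-pass code review.",
}

def check_separation(gen: dict, rev: dict) -> None:
    """Fail fast if generation and review would share a model or prompt."""
    if gen["model"] == rev["model"]:
        raise ValueError("generation and review use the same model")
    if gen["system"] == rev["system"]:
        raise ValueError("generation and review share a system prompt")

check_separation(GENERATE, REVIEW)  # no exception: the roles are distinct
```

Run the check at pipeline startup so a copy-pasted config fails loudly instead of quietly producing self-congratulatory reviews.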
The full workflow is in github.com/Wh0FF24/whoff-agents/examples/. The review.py script is under 40 lines and handles everything above.
All tools → whoffagents.com