Last month, our team was paying $600/month for an AI code review tool. 10 developers × $60/seat.
The tool was fine. But I started wondering: these tools just call OpenAI's API and post comments on PRs. The API costs ~$0.02 per review. We were paying $60/seat for a glorified API wrapper.
So I built our own. It took 5 minutes. It costs ~$8/month for the entire team.
Here's the exact setup.
## The 30-Line Workflow

Create `.github/workflows/ai-review.yml`:
```yaml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed code
        id: diff
        run: |
          DIFF=$(git diff origin/${{ github.base_ref }}...HEAD -- \
            '*.py' '*.js' '*.ts' '*.go' '*.java' '*.rs' \
            ':!*.min.js' ':!*lock*' ':!*.generated.*')
          # Truncate to avoid token limits
          echo "diff<<EOF" >> "$GITHUB_OUTPUT"
          echo "$DIFF" | head -c 12000 >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"

      - name: AI Review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          DIFF: ${{ steps.diff.outputs.diff }}
        run: |
          # Build the JSON body with jq so quotes/newlines in the diff
          # can't break the payload (or inject into the shell)
          PAYLOAD=$(jq -n --arg diff "$DIFF" '{
            model: "gpt-4o-mini",
            messages: [
              {role: "system", content: "You are a senior code reviewer. Review for: 1) Bugs and logic errors 2) Security vulnerabilities (OWASP Top 10) 3) Performance issues 4) Code style violations. Be specific with line numbers. Use GitHub markdown."},
              {role: "user", content: $diff}
            ],
            max_tokens: 2000
          }')
          REVIEW=$(curl -s https://api.openai.com/v1/chat/completions \
            -H "Authorization: Bearer $OPENAI_API_KEY" \
            -H "Content-Type: application/json" \
            -d "$PAYLOAD" | jq -r '.choices[0].message.content')
          gh pr comment "${{ github.event.pull_request.number }}" --body "## 🤖 AI Code Review

          $REVIEW

          ---
          *Powered by gpt-4o-mini · [How this works](https://1611833802030.gumroad.com/l/ai-code-review-cheat-sheet)*"
```
That's it. Every PR now gets an AI review automatically.
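One quoting subtlety worth calling out: splicing the raw diff into the JSON body by hand breaks the moment the diff contains a double quote or backslash. Building the payload with `jq -n` handles the escaping for you. A quick local sketch (no API call, just the payload construction):

```shell
# A diff containing characters that would break naive string interpolation
DIFF='diff --git a/app.py b/app.py
+query = "SELECT * FROM users WHERE id = " + user_id'

# jq -n builds the payload and escapes the diff correctly
PAYLOAD=$(jq -n --arg diff "$DIFF" '{
  model: "gpt-4o-mini",
  messages: [
    {role: "system", content: "You are a senior code reviewer."},
    {role: "user", content: $diff}
  ],
  max_tokens: 2000
}')

# Valid JSON: the embedded double quotes and newlines survive intact
echo "$PAYLOAD" | jq -r '.messages[1].content'
```

Running this prints the diff back out unchanged, which confirms the round-trip through JSON is lossless.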
## What This Actually Catches
After 3 months and ~400 PRs, here's what our AI reviewer found that humans missed:
| Issue Type | Caught | Example |
|---|---|---|
| SQL injection | 12 | Raw string interpolation in queries |
| Missing null checks | 47 | `user.profile.name` without optional chaining |
| N+1 queries | 8 | Loop inside a loop hitting the database |
| Hardcoded secrets | 3 | API keys in test files |
| Race conditions | 5 | Concurrent map access in Go |
The AI doesn't replace human reviewers. But it catches the "obvious" stuff that humans skim over when they're reviewing 15 PRs before lunch.
## Cost Breakdown: DIY vs. Commercial Tools
| Solution | 10-Dev Team Cost | Per-Review Cost | Setup Time |
|---|---|---|---|
| This workflow | ~$8/mo | $0.02 | 5 minutes |
| CodeRabbit | $600/mo | Included | 10 minutes |
| Bito | $400/mo | Included | 15 minutes |
| Greptile | $500/mo | Included | 20 minutes |
| Sourcery | $480/mo | Included | 10 minutes |
Why is it so cheap? GPT-4o-mini costs $0.15/1M input tokens and $0.60/1M output tokens. A typical PR diff is ~2,000 tokens. That's $0.0003 for the input plus at most ~$0.0012 for the output, even if the review uses the full 2,000-token budget. Well under $0.01 per review.
Even if you review 50 PRs/day, that's $15/month. Not $600.
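The arithmetic is easy to sanity-check in a few lines of shell. The prices below are assumptions carried over from the text, so verify them against OpenAI's current pricing page before relying on them:

```shell
# Back-of-envelope per-review and monthly cost (worst case: full-length reply)
awk -v input_price=0.15  -v output_price=0.60 \
    -v input_tokens=2000 -v output_tokens=2000 \
    -v reviews_per_month=1500 '
  BEGIN {
    per_review = input_tokens/1e6*input_price + output_tokens/1e6*output_price
    printf "per review: $%.4f\n", per_review
    printf "per month:  $%.2f\n", per_review * reviews_per_month
  }'
```

At 50 PRs a day this lands in the low single digits per month even assuming every review burns the full `max_tokens`, comfortably under the $15 ceiling above.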
## 3 Upgrades Worth Adding
### 1. Security-Focused Prompt

Replace the system message for repos that handle user data:

```text
Focus exclusively on security:
- SQL injection (parameterized queries?)
- XSS (output encoding?)
- Authentication bypass
- Hardcoded credentials
- Insecure deserialization
- SSRF potential

Flag severity as 🔴 Critical / 🟡 Warning / 🟢 Info
```
### 2. Skip Bot-Generated Code

Add a path filter to avoid reviewing auto-generated files:

```yaml
      - name: Get changed code
        run: |
          DIFF=$(git diff origin/${{ github.base_ref }}...HEAD -- \
            '*.py' '*.js' '*.ts' \
            ':!*.generated.*' ':!*.min.*' ':!*lock*' \
            ':!*migration*' ':!*__snapshots__*' \
            ':!vendor/*' ':!node_modules/*')
```
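If you're not sure what a pathspec actually excludes, you can check it in a throwaway repo before wiring it into CI. A minimal sketch using a subset of the filters above:

```shell
# Throwaway repo to show which changed files survive the filter
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m base
printf 'print("hi")\n'  > app.py
printf '// generated\n' > api.generated.ts
printf 'var x=1\n'      > app.min.js
git add -A
git -c user.email=ci@example.com -c user.name=ci commit -q -m change

# Same style of pathspec as the workflow step: generated/minified files drop out
FILES=$(git diff --name-only HEAD~1 -- \
  '*.py' '*.js' '*.ts' \
  ':!*.generated.*' ':!*.min.*')
echo "$FILES"   # → app.py
```

Only `app.py` survives: `api.generated.ts` and `app.min.js` match the include globs but are knocked out by the `:!` excludes.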
### 3. Use Claude for Complex Reviews

For architecture reviews or large PRs, Claude Sonnet gives more nuanced feedback:

```bash
# Build the body with jq so quotes and newlines in $DIFF can't break the JSON
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d "$(jq -n --arg diff "$DIFF" '{
    model: "claude-sonnet-4-20250514",
    max_tokens: 2000,
    messages: [{
      role: "user",
      content: ("Review this code diff for architectural concerns, not just bugs:\n\n" + $diff)
    }]
  }')"
```
## Common Pitfalls
"The diff is too big" — Truncate to 12K chars (line 14 above). For bigger PRs, split by file and review each separately. The full guide covers a chunking strategy.
"Reviews are too generic" — Add your team's coding standards to the system prompt. "We use Python 3.12+, prefer dataclasses over dicts, and require type hints on all public functions."
"It comments on every PR including dependabot" — Add a condition:
```yaml
if: github.actor != 'dependabot[bot]'
```
"Rate limiting" — OpenAI's rate limits are generous (500 RPM for gpt-4o-mini). If you hit them, add a retry with exponential backoff.
## Want the Full Playbook?
This article covers the basics. I wrote a 208-page guide that goes deep into:
- 🔧 Auto-fix PRs — AI doesn't just find bugs, it opens fix PRs automatically
- 🔒 Security automation — OWASP Top 10 scanning on every commit
- 🏢 Multi-repo strategies — monorepo and cross-repo review patterns
- 💰 Cost optimization — from $0.02/review down to $0.005/review
- 📊 ROI quantification — prove the value to your manager (real case: 1,018% ROI)
- 👥 Team standards as code — encode your style guide into AI rules
📥 Grab the free cheat sheet first — it's a one-page PDF with the workflow above plus model selection guide and cost calculator.
If you want everything: AI Code Review: The Practical Guide — $9.99