Last month, our 8-person team was paying $288/month for CodeRabbit seats across two repos.
The reviews were good. But $3,456/year for "this function is too long" and "consider adding error handling"? I started wondering what we were actually paying for.
So I built a replacement using GitHub Actions + OpenAI API. Total cost: $4.20 last month. It catches real bugs, not just style nits.
Here's the exact setup.
## The 50-Line Config

Create `.github/workflows/ai-review.yml`:

```yaml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  ai-review:
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: diff
        run: |
          git diff origin/${{ github.event.pull_request.base.ref }}...HEAD \
            --name-only --diff-filter=ACMR \
            -- '*.ts' '*.tsx' '*.js' '*.jsx' '*.py' '*.go' '*.rs' \
            | head -20 > changed_files.txt
          echo "count=$(wc -l < changed_files.txt)" >> "$GITHUB_OUTPUT"

      - name: Get diff content
        if: steps.diff.outputs.count > 0
        run: |
          git diff origin/${{ github.event.pull_request.base.ref }}...HEAD \
            -- $(tr '\n' ' ' < changed_files.txt) > pr_diff.txt

      - name: AI Review
        if: steps.diff.outputs.count > 0
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          PR_TITLE: ${{ github.event.pull_request.title }}
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          DIFF=$(head -c 30000 pr_diff.txt)
          RESPONSE=$(curl -s https://api.openai.com/v1/chat/completions \
            -H "Authorization: Bearer $OPENAI_API_KEY" \
            -H "Content-Type: application/json" \
            -d "$(jq -n \
              --arg diff "$DIFF" \
              --arg title "$PR_TITLE" \
              --arg body "$PR_BODY" \
              '{
                model: "gpt-4o-mini",
                messages: [{
                  role: "system",
                  content: "You are a senior engineer doing code review. Be concise. Focus on: bugs, security issues, performance problems, and logic errors. Skip style/formatting. For each issue, cite the exact file and line. If the code is fine, say so briefly."
                }, {
                  role: "user",
                  content: "PR: \($title)\n\nDescription: \($body)\n\nDiff:\n\($diff)"
                }],
                max_tokens: 1500,
                temperature: 0.1
              }')")
          echo "$RESPONSE" | jq -r '.choices[0].message.content' > review.md

      - name: Post Review Comment
        if: steps.diff.outputs.count > 0
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          REVIEW=$(cat review.md)
          gh pr comment ${{ github.event.pull_request.number }} \
            --body "## 🤖 AI Code Review

          $REVIEW

          ---
          *Powered by GPT-4o-mini · [How this works](https://1611833802030.gumroad.com/l/ai-code-review-guide)*"
```

That's it. Push this file, and every PR gets an AI review.
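One failure mode worth checking before you push: jq escaping. You can dry-run the payload construction locally with stand-in inputs; a minimal sketch (the sample diff and title are made up):

```shell
#!/usr/bin/env bash
# Local dry run of the workflow's payload construction.
# Checks that jq safely escapes quotes and newlines in the diff.
set -euo pipefail

DIFF=$'--- a/app.ts\n+++ b/app.ts\n+const x = "hi";'   # stand-in diff
PR_TITLE='Add greeting'

PAYLOAD=$(jq -n \
  --arg diff "$DIFF" \
  --arg title "$PR_TITLE" \
  '{
    model: "gpt-4o-mini",
    messages: [
      {role: "system", content: "You are a senior engineer doing code review."},
      {role: "user", content: "PR: \($title)\n\nDiff:\n\($diff)"}
    ],
    max_tokens: 1500,
    temperature: 0.1
  }')

# Valid JSON, and the diff survived escaping intact?
echo "$PAYLOAD" | jq -e '.messages[1].content | contains("const x = \"hi\"")' >/dev/null \
  && echo "payload OK"
```

If this prints `payload OK`, the same construction will hold up against real diffs full of quotes and newlines.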
## What This Actually Catches
After 3 weeks and ~120 PRs, here's what surprised me:
Real bugs it caught:

- A `Promise.all` that silently swallowed rejections (would have caused data loss)
- An SQL query built with string interpolation instead of parameterized queries (SQL injection)
- A race condition in a cache invalidation handler
- An off-by-one error in a pagination endpoint (returning 11 items for `?limit=10`)
What it (correctly) ignores:
- Variable naming preferences
- Import ordering
- "Consider using optional chaining" type suggestions
The key is the system prompt. Notice: "Skip style/formatting". This one instruction eliminates 80% of the noise that makes commercial tools annoying.
## Cost Breakdown
Here's our actual usage for March 2026:
| Metric | Value |
|---|---|
| PRs reviewed | 124 |
| Avg tokens/review | ~3,200 input + ~800 output |
| Model | GPT-4o-mini |
| Cost per review | ~$0.034 |
| Monthly total | $4.20 |
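A quick arithmetic check that the table is internally consistent (figures taken from the table above):

```shell
# 124 reviews at ~$0.034 each should land near the $4.20 monthly total.
awk 'BEGIN { printf "%.2f\n", 124 * 0.034 }'   # prints 4.22
```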
For comparison:
- CodeRabbit: $19/seat/month → $152/month (8 devs)
- Bito: $15/seat/month → $120/month
- Greptile: $24/seat/month → $192/month
- This setup: $4.20/month (flat, not per-seat)
That's a 97% reduction.
## 5 Optimizations That Actually Matter
### 1. Filter files aggressively

The `--diff-filter=ACMR` flag already skips deleted files. But also skip files that don't need review:

```shell
-- '*.ts' '*.tsx' '*.js' '*.jsx' '*.py' '*.go' '*.rs'
```

This skips `.md`, `.json`, `.yaml`, config files, and lock files. Saves ~40% on tokens.
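If generated files share those extensions (say, `*.generated.ts` — a hypothetical pattern), git's exclude pathspecs can drop them too. A sketch you can run against a throwaway repo:

```shell
#!/usr/bin/env bash
# Demonstrates git exclude pathspecs for trimming generated files
# out of the reviewed diff. Runs entirely in a temp repo.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=t@t -c user.name=t commit -q --allow-empty -m init

echo 'export const a = 1;' > app.ts
echo '// machine output'   > api.generated.ts
git add -A

# Keep *.ts but exclude anything matching *.generated.ts
git diff --cached --name-only -- '*.ts' ':(exclude)*.generated.ts'
# prints only: app.ts
```

The same `:(exclude)` pathspec can be appended to the workflow's `git diff` commands.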
### 2. Cap the diff size

```shell
DIFF=$(head -c 30000 pr_diff.txt)
```
30KB covers most PRs. For mega-PRs (1000+ lines), you'd need a chunking strategy — but honestly, mega-PRs are a process problem, not a tooling problem.
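If you ever do want a rough chunking pass, GNU `split -C` breaks a file into size-capped pieces at line boundaries; each piece would then get its own review call. A sketch with a synthetic diff:

```shell
#!/usr/bin/env bash
# Split a large diff into ~30KB chunks, cutting only at line
# boundaries (GNU split -C). Each chunk would get its own API call.
set -euo pipefail

cd "$(mktemp -d)"
# Fake a large diff for the demo.
for i in $(seq 1 5000); do echo "+ line $i of a very large change"; done > pr_diff.txt

split -C 30000 -d pr_diff.txt chunk_
ls chunk_*            # chunk_00, chunk_01, ...
wc -c chunk_00        # each chunk is at most 30000 bytes
```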
### 3. Use GPT-4o-mini, not GPT-4o
For code review, gpt-4o-mini catches 90%+ of what gpt-4o catches, at 1/15th the cost. The quality difference is mostly in nuanced architectural suggestions — not in bug detection.
Save gpt-4o for security-critical repos.
### 4. Set temperature to 0.1

```json
"temperature": 0.1
```
Low temperature = consistent, deterministic reviews. You don't want creative writing in code review.
### 5. Skip draft PRs

```yaml
if: github.event.pull_request.draft == false
```
Don't waste API calls on work-in-progress.
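Along the same lines, a `paths-ignore` filter on the trigger skips the whole workflow run (and the Actions minutes, not just the API call) for docs-only PRs. A sketch — the patterns are examples, adjust to your repo:

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened]
    # Skip runs where only docs or lockfiles changed
    paths-ignore:
      - '**.md'
      - '**/package-lock.json'
      - 'docs/**'
```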
## What This Doesn't Do (And Whether You Should Care)
Honest limitations:
| Feature | Commercial Tools | This Setup |
|---|---|---|
| Inline comments on specific lines | ✅ | ❌ (PR-level comment) |
| Auto-fix PRs | Some | ❌ (but achievable*) |
| Learning from your codebase | ✅ | ❌ |
| Dashboard / analytics | ✅ | ❌ |
| SOC 2 / enterprise compliance | ✅ | ❌ |
*Auto-fix is doable — you generate a patch and create a PR. I cover this in the full guide (Chapter 6).
My take: If you're a 5-50 person team, the basic setup above covers 80% of the value. Inline comments are nice but not essential. If you're 100+ engineers with compliance requirements, CodeRabbit is probably worth it.
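That said, inline comments are achievable without a vendor: ask the model to emit findings as `path:line: message`, then post each one through GitHub's pull-request review-comments endpoint. A hedged sketch of the parsing half; the `gh api` call is shown but guarded behind a dry-run flag since it needs a live PR (`$REPO`, `$PR_NUMBER`, and `$COMMIT_SHA` are placeholders you would supply):

```shell
#!/usr/bin/env bash
# Turn "path:line: message" findings into inline-comment API calls.
# The gh invocation targets POST /repos/{owner}/{repo}/pulls/{n}/comments.
# DRY_RUN=1 (default) prints instead of posting.
set -euo pipefail
DRY_RUN=${DRY_RUN:-1}

findings='src/db.ts:42: SQL built via string interpolation; use parameterized queries.
src/cache.ts:17: invalidation races with the write path.'

while IFS= read -r finding; do
  path=${finding%%:*}      # "src/db.ts"
  rest=${finding#*:}       # "42: SQL built via..."
  line=${rest%%:*}         # "42"
  msg=${rest#*: }          # the message itself
  if [ "$DRY_RUN" = 1 ]; then
    echo "would comment on $path line $line: $msg"
  else
    gh api "repos/$REPO/pulls/$PR_NUMBER/comments" \
      -f body="$msg" -f path="$path" -F line="$line" \
      -f side=RIGHT -f commit_id="$COMMIT_SHA"
  fi
done <<< "$findings"
```

Getting the model to emit that format reliably is the harder half; a strict output-format instruction in the system prompt gets you most of the way.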
## Switching to Claude (Alternative)
Prefer Anthropic? Swap the API call:
```shell
RESPONSE=$(curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
    --arg diff "$DIFF" \
    --arg title "$PR_TITLE" \
    '{
      model: "claude-sonnet-4-5",
      max_tokens: 1500,
      system: "You are a senior engineer doing code review. Be concise. Focus on bugs, security issues, performance problems, and logic errors. Skip style/formatting.",
      messages: [{
        role: "user",
        content: "PR: \($title)\n\nDiff:\n\($diff)"
      }]
    }')")
echo "$RESPONSE" | jq -r '.content[0].text' > review.md
```
Claude tends to be more thorough on security issues. GPT-4o-mini is faster and cheaper for general review.
## The Result
3 weeks in:
- $4.20/month vs $288/month (saved $283.80)
- 2 real bugs caught that humans missed (the SQL injection alone justified the effort)
- Review comments dropped from an average of 12 (CodeRabbit, mostly noise) to 3-4 (focused, actionable)
- Setup time: 5 minutes
The biggest win wasn't cost — it was signal-to-noise ratio. When every AI comment is worth reading, developers actually read them.
## Want the Full Playbook?
This article covers the basics. The complete 208-page guide includes:
- Auto-fix PRs — AI generates patches, not just suggestions (Chapter 6)
- Security automation — OWASP Top 10 scanning on every PR (Chapter 5)
- Team standards as code — `.review-rules.yml` that AI enforces (Chapter 7)
- Multi-repo & monorepo strategies (Chapter 8)
- Cost optimization — caching, batching, incremental review (Chapter 9)
- ROI templates for convincing your manager (real case: 1,018% ROI)
📘 AI Code Review: The Practical Guide — $9.99 on Gumroad
New to AI code review? Start with the free cheat sheet — one-page config + prompt templates.