Last month, our 8-person team was paying $288/month for CodeRabbit seats across two repos.
The reviews were good. But $3,456/year for "this function is too long" and "consider adding error handling"? I started wondering what we were actually paying for.
So I built a replacement using GitHub Actions + OpenAI API. Total cost: $4.20 last month. It catches real bugs, not just style nits.
Here's the exact setup.
## The 50-Line Config

Create `.github/workflows/ai-review.yml`:

```yaml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  ai-review:
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: diff
        run: |
          git diff origin/${{ github.event.pull_request.base.ref }}...HEAD \
            --name-only --diff-filter=ACMR \
            -- '*.ts' '*.tsx' '*.js' '*.jsx' '*.py' '*.go' '*.rs' \
            | head -20 > changed_files.txt
          echo "count=$(wc -l < changed_files.txt)" >> "$GITHUB_OUTPUT"

      - name: Get diff content
        if: steps.diff.outputs.count > 0
        run: |
          git diff origin/${{ github.event.pull_request.base.ref }}...HEAD \
            -- $(tr '\n' ' ' < changed_files.txt) > pr_diff.txt

      - name: AI Review
        if: steps.diff.outputs.count > 0
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          PR_TITLE: ${{ github.event.pull_request.title }}
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          DIFF=$(head -c 30000 pr_diff.txt)
          RESPONSE=$(curl -s https://api.openai.com/v1/chat/completions \
            -H "Authorization: Bearer $OPENAI_API_KEY" \
            -H "Content-Type: application/json" \
            -d "$(jq -n \
              --arg diff "$DIFF" \
              --arg title "$PR_TITLE" \
              --arg body "$PR_BODY" \
              '{
                model: "gpt-4o-mini",
                messages: [{
                  role: "system",
                  content: "You are a senior engineer doing code review. Be concise. Focus on: bugs, security issues, performance problems, and logic errors. Skip style/formatting. For each issue, cite the exact file and line. If the code is fine, say so briefly."
                }, {
                  role: "user",
                  content: "PR: \($title)\n\nDescription: \($body)\n\nDiff:\n\($diff)"
                }],
                max_tokens: 1500,
                temperature: 0.1
              }')")
          echo "$RESPONSE" | jq -r '.choices[0].message.content' > review.md

      - name: Post Review Comment
        if: steps.diff.outputs.count > 0
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          REVIEW=$(cat review.md)
          gh pr comment ${{ github.event.pull_request.number }} \
            --body "## 🤖 AI Code Review

          $REVIEW

          ---
          *Powered by GPT-4o-mini · [How this works](https://1611833802030.gumroad.com/l/ai-code-review-guide)*"
```

That's it. Push this file, and every PR gets an AI review.
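One failure mode worth checking before you push: jq escaping. You can dry-run the payload construction locally with stand-in inputs; a minimal sketch (the sample diff and title are made up):

```shell
#!/usr/bin/env bash
# Local dry run of the workflow's payload construction.
# Checks that jq safely escapes quotes and newlines in the diff.
set -euo pipefail

DIFF=$'--- a/app.ts\n+++ b/app.ts\n+const x = "hi";'   # stand-in diff
PR_TITLE='Add greeting'

PAYLOAD=$(jq -n \
  --arg diff "$DIFF" \
  --arg title "$PR_TITLE" \
  '{
    model: "gpt-4o-mini",
    messages: [
      {role: "system", content: "You are a senior engineer doing code review."},
      {role: "user", content: "PR: \($title)\n\nDiff:\n\($diff)"}
    ],
    max_tokens: 1500,
    temperature: 0.1
  }')

# Valid JSON, and the diff survived escaping intact?
echo "$PAYLOAD" | jq -e '.messages[1].content | contains("const x = \"hi\"")' >/dev/null \
  && echo "payload OK"
```

If this prints `payload OK`, the same construction will hold up against real diffs full of quotes and newlines.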
## What This Actually Catches
After 3 weeks and ~120 PRs, here's what surprised me:
Real bugs it caught:

- A `Promise.all` that silently swallowed rejections (would have caused data loss)
- An SQL query built with string interpolation instead of parameterized queries (SQL injection)
- A race condition in a cache invalidation handler
- An off-by-one error in a pagination endpoint (returning 11 items for `?limit=10`)
What it (correctly) ignores:
- Variable naming preferences
- Import ordering
- "Consider using optional chaining" type suggestions
The key is the system prompt. Notice: "Skip style/formatting". This one instruction eliminates 80% of the noise that makes commercial tools annoying.
## Cost Breakdown
Here's our actual usage for March 2026:
| Metric | Value |
|---|---|
| PRs reviewed | 124 |
| Avg tokens/review | ~3,200 input + ~800 output |
| Model | GPT-4o-mini |
| Cost per review | ~$0.034 |
| Monthly total | $4.20 |
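A quick arithmetic check that the table is internally consistent (figures taken from the table above):

```shell
# 124 reviews at ~$0.034 each should land near the $4.20 monthly total.
awk 'BEGIN { printf "%.2f\n", 124 * 0.034 }'   # prints 4.22
```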
For comparison:
- CodeRabbit: $19/seat/month → $152/month (8 devs)
- Bito: $15/seat/month → $120/month
- Greptile: $24/seat/month → $192/month
- This setup: $4.20/month (flat, not per-seat)
That's a 97% reduction.
## 5 Optimizations That Actually Matter
### 1. Filter files aggressively

The `--diff-filter=ACMR` flag already skips deleted files. But also skip files that don't need review:

```shell
-- '*.ts' '*.tsx' '*.js' '*.jsx' '*.py' '*.go' '*.rs'
```

This skips `.md`, `.json`, `.yaml`, config files, and lock files. Saves ~40% on tokens.
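If generated files share those extensions (say, `*.generated.ts` — a hypothetical pattern), git's exclude pathspecs can drop them too. A sketch you can run against a throwaway repo:

```shell
#!/usr/bin/env bash
# Demonstrates git exclude pathspecs for trimming generated files
# out of the reviewed diff. Runs entirely in a temp repo.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=t@t -c user.name=t commit -q --allow-empty -m init

echo 'export const a = 1;' > app.ts
echo '// machine output'   > api.generated.ts
git add -A

# Keep *.ts but exclude anything matching *.generated.ts
git diff --cached --name-only -- '*.ts' ':(exclude)*.generated.ts'
# prints only: app.ts
```

The same `:(exclude)` pathspec can be appended to the workflow's `git diff` commands.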
### 2. Cap the diff size

```shell
DIFF=$(head -c 30000 pr_diff.txt)
```
30KB covers most PRs. For mega-PRs (1000+ lines), you'd need a chunking strategy — but honestly, mega-PRs are a process problem, not a tooling problem.
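If you ever do want a rough chunking pass, GNU `split -C` breaks a file into size-capped pieces at line boundaries; each piece would then get its own review call. A sketch with a synthetic diff:

```shell
#!/usr/bin/env bash
# Split a large diff into ~30KB chunks, cutting only at line
# boundaries (GNU split -C). Each chunk would get its own API call.
set -euo pipefail

cd "$(mktemp -d)"
# Fake a large diff for the demo.
for i in $(seq 1 5000); do echo "+ line $i of a very large change"; done > pr_diff.txt

split -C 30000 -d pr_diff.txt chunk_
ls chunk_*            # chunk_00, chunk_01, ...
wc -c chunk_00        # each chunk is at most 30000 bytes
```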
### 3. Use GPT-4o-mini, not GPT-4o
For code review, gpt-4o-mini catches 90%+ of what gpt-4o catches, at 1/15th the cost. The quality difference is mostly in nuanced architectural suggestions — not in bug detection.
Save gpt-4o for security-critical repos.
### 4. Set temperature to 0.1

```json
"temperature": 0.1
```
Low temperature = consistent, deterministic reviews. You don't want creative writing in code review.
### 5. Skip draft PRs

```yaml
if: github.event.pull_request.draft == false
```
Don't waste API calls on work-in-progress.
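Along the same lines, a `paths-ignore` filter on the trigger skips the whole workflow run (and the Actions minutes, not just the API call) for docs-only PRs. A sketch — the patterns are examples, adjust to your repo:

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened]
    # Skip runs where only docs or lockfiles changed
    paths-ignore:
      - '**.md'
      - '**/package-lock.json'
      - 'docs/**'
```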
## What This Doesn't Do (And Whether You Should Care)
Honest limitations:
| Feature | Commercial Tools | This Setup |
|---|---|---|
| Inline comments on specific lines | ✅ | ❌ (PR-level comment) |
| Auto-fix PRs | Some | ❌ (but achievable*) |
| Learning from your codebase | ✅ | ❌ |
| Dashboard / analytics | ✅ | ❌ |
| SOC 2 / enterprise compliance | ✅ | ❌ |
*Auto-fix is doable — you generate a patch and create a PR. I cover this in the full guide (Chapter 6).
My take: If you're a 5-50 person team, the basic setup above covers 80% of the value. Inline comments are nice but not essential. If you're 100+ engineers with compliance requirements, CodeRabbit is probably worth it.
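That said, inline comments are achievable without a vendor: ask the model to emit findings as `path:line: message`, then post each one through GitHub's pull-request review-comments endpoint. A hedged sketch of the parsing half; the `gh api` call is shown but guarded behind a dry-run flag since it needs a live PR (`$REPO`, `$PR_NUMBER`, and `$COMMIT_SHA` are placeholders you would supply):

```shell
#!/usr/bin/env bash
# Turn "path:line: message" findings into inline-comment API calls.
# The gh invocation targets POST /repos/{owner}/{repo}/pulls/{n}/comments.
# DRY_RUN=1 (default) prints instead of posting.
set -euo pipefail
DRY_RUN=${DRY_RUN:-1}

findings='src/db.ts:42: SQL built via string interpolation; use parameterized queries.
src/cache.ts:17: invalidation races with the write path.'

while IFS= read -r finding; do
  path=${finding%%:*}      # "src/db.ts"
  rest=${finding#*:}       # "42: SQL built via..."
  line=${rest%%:*}         # "42"
  msg=${rest#*: }          # the message itself
  if [ "$DRY_RUN" = 1 ]; then
    echo "would comment on $path line $line: $msg"
  else
    gh api "repos/$REPO/pulls/$PR_NUMBER/comments" \
      -f body="$msg" -f path="$path" -F line="$line" \
      -f side=RIGHT -f commit_id="$COMMIT_SHA"
  fi
done <<< "$findings"
```

Getting the model to emit that format reliably is the harder half; a strict output-format instruction in the system prompt gets you most of the way.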
## Switching to Claude (Alternative)
Prefer Anthropic? Swap the API call:
```shell
RESPONSE=$(curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
    --arg diff "$DIFF" \
    --arg title "$PR_TITLE" \
    '{
      model: "claude-sonnet-4-5",
      max_tokens: 1500,
      system: "You are a senior engineer doing code review. Be concise. Focus on bugs, security issues, performance problems, and logic errors. Skip style/formatting.",
      messages: [{
        role: "user",
        content: "PR: \($title)\n\nDiff:\n\($diff)"
      }]
    }')")
echo "$RESPONSE" | jq -r '.content[0].text' > review.md
```
Claude tends to be more thorough on security issues. GPT-4o-mini is faster and cheaper for general review.
## The Result
3 weeks in:
- $4.20/month vs $288/month (saved $283.80)
- 2 real bugs caught that humans missed (the SQL injection alone justified the effort)
- Review comments dropped from an average of 12 (CodeRabbit, mostly noise) to 3-4 (focused, actionable)
- Setup time: 5 minutes
The biggest win wasn't cost — it was signal-to-noise ratio. When every AI comment is worth reading, developers actually read them.
## Want the Full Playbook?
This article covers the basics. The complete 208-page guide includes:
- Auto-fix PRs — AI generates patches, not just suggestions (Chapter 6)
- Security automation — OWASP Top 10 scanning on every PR (Chapter 5)
- Team standards as code — `.review-rules.yml` that AI enforces (Chapter 7)
- Multi-repo & monorepo strategies (Chapter 8)
- Cost optimization — caching, batching, incremental review (Chapter 9)
- ROI templates for convincing your manager (real case: 1,018% ROI)
📘 AI Code Review: The Practical Guide — $9.99 on Gumroad
New to AI code review? Start with the free cheat sheet — one-page config + prompt templates.