DEV Community

スシロー
スシロー

Posted on

Running Claude in CI: A GitHub Actions + Claude Code SDK Auto-PR-Reviewer That Costs $0.03 per Review

⚠️ この記事はアフィリエイト広告(プロモーション)を含みます。リンク先で発生した収益の一部が運営者に支払われますが、読者の購入価格には一切影響ありません。

By the end of this article you will have a GitHub Actions workflow that, on every pull_request, runs the Claude Code SDK headlessly, reads only the diff, and posts inline review comments via the GitHub API. I'll show the exact YAML and Python that run in my own repos, the token math that keeps each review at roughly $0.03, and the three failures that cost me a weekend before it worked.

Why I stopped piping the full repo into Claude on GitHub Actions

My first version did the obvious thing: clone the repo, concatenate every changed file in full, and ask Claude to "review this PR." It worked on toy PRs and exploded on real ones. A 9-file refactor sent ~48,000 input tokens and the review drifted into commentary about code the PR didn't touch.

The fix that changed the economics: feed Claude the unified diff with 3 lines of context, not the files. A git diff against the merge base is typically 5–15x smaller than the files it touches. On claude-haiku-4-5, a median PR in my projects now costs about $0.028 per review (measured across 60 PRs: 4,100 input tokens + 900 output tokens average). The expensive version was hitting $0.40+ on Sonnet because file context dominated.

The other lesson: the diff alone is not enough context to judge correctness, but it is enough to catch the 80% of review nits that humans waste time on — unhandled errors, missing null checks, off-by-one, leftover debug prints, secrets in code. So I scoped the prompt to exactly that, and told it to stay silent when unsure. Silence is a feature; a reviewer that comments on everything gets muted by the team within a week.

The GitHub Actions workflow YAML that triggers Claude on pull_request

This is the full .github/workflows/claude-review.yml. It runs on every PR, restores a uv-cached venv, and calls a Python entrypoint. Note the permissions block — without pull-requests: write the comment-posting step fails with a 403 that GitHub reports as a generic "Resource not accessible by integration," which sent me down the wrong rabbit hole for an hour.

name: claude-pr-review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write   # required to POST review comments

jobs:
  review:
    runs-on: ubuntu-latest
    # skip drafts and skip Claude's own commits to avoid loops
    if: github.event.pull_request.draft == false
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0        # need full history for the merge-base diff

      - uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true

      - name: Install deps
        run: uv pip install --system anthropic PyGithub

      - name: Run Claude review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
          BASE_SHA: ${{ github.event.pull_request.base.sha }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: python .github/scripts/review.py
Enter fullscreen mode Exit fullscreen mode

Two non-obvious settings earn their place here. fetch-depth: 0 is mandatory: the default shallow checkout has no merge base, so git diff $BASE_SHA...$HEAD_SHA returns nothing and Claude reviews an empty string while reporting success. And the synchronize trigger means every push to the PR re-reviews — I rate-limit that in the script by only reviewing files changed since the last reviewed SHA, but for a first version, reviewing the full diff each push is fine and still cheap.

The Python script that asks Claude for structured JSON, not prose

Here is .github/scripts/review.py. The critical design choice is forcing Claude to return a JSON array of findings keyed to file + line, not a markdown essay. Prose reviews can't be turned into inline comments, and they bury the two real bugs under ten paragraphs of praise. I use tool_choice to force the schema so I never have to regex Claude's output.

import json, os, subprocess
from anthropic import Anthropic
from github import Github

client = Anthropic()  # reads ANTHROPIC_API_KEY

def get_diff() -> str:
    base, head = os.environ["BASE_SHA"], os.environ["HEAD_SHA"]
    # 3 lines of context keeps tokens low; exclude lockfiles/generated
    out = subprocess.run(
        ["git", "diff", "--unified=3", f"{base}...{head}",
         "--", ".", ":(exclude)*.lock", ":(exclude)*.min.js"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out[:60000]  # hard cap; huge PRs get truncated, not dropped

REVIEW_TOOL = {
    "name": "submit_review",
    "description": "Submit code review findings",
    "input_schema": {
        "type": "object",
        "properties": {
            "findings": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "file": {"type": "string"},
                        "line": {"type": "integer",
                                 "description": "line in the NEW file"},
                        "severity": {"type": "string",
                                     "enum": ["bug", "risk", "nit"]},
                        "comment": {"type": "string"},
                    },
                    "required": ["file", "line", "severity", "comment"],
                },
            }
        },
        "required": ["findings"],
    },
}

SYSTEM = (
    "You are a senior reviewer. Review ONLY the diff. "
    "Report bugs, missing error handling, off-by-one, leaked secrets, "
    "and debug leftovers. Do NOT praise. Do NOT comment on style the "
    "linter handles. If you are unsure a line is wrong, omit it. "
    "Prefer zero findings over speculative ones."
)

def review(diff: str) -> list[dict]:
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1500,
        system=SYSTEM,
        tools=[REVIEW_TOOL],
        tool_choice={"type": "tool", "name": "submit_review"},
        messages=[{"role": "user",
                   "content": f"Review this PR diff:\n\n{diff}"}],
    )
    for block in msg.content:
        if block.type == "tool_use":
            return block.input["findings"]
    return []

def post(findings: list[dict]):
    gh = Github(os.environ["GITHUB_TOKEN"])
    pr = gh.get_repo(os.environ["REPO"]).get_pull(int(os.environ["PR_NUMBER"]))
    commit = pr.get_commits().reversed[0]
    posted = 0
    for f in findings:
        body = f"**[{f['severity']}]** {f['comment']}"
        try:
            pr.create_review_comment(
                body=body, commit=commit, path=f["file"], line=f["line"],
            )
            posted += 1
        except Exception as e:
            # line not in diff -> GitHub 422; fall back to a PR comment
            print(f"inline failed for {f['file']}:{f['line']}: {e}")
            pr.create_issue_comment(f"`{f['file']}:{f['line']}` {body}")
    print(f"posted {posted}/{len(findings)} inline")

if __name__ == "__main__":
    diff = get_diff()
    if diff.strip():
        post(review(diff))
    else:
        print("empty diff, nothing to review")
Enter fullscreen mode Exit fullscreen mode

This is ~90 lines and it is the entire reviewer. You can drop in claude-sonnet-4-6 by changing one string when you want deeper reasoning on a critical repo; I keep Haiku for the default path because it catches the nits at 1/8th the cost.

The 422 error that silently dropped half my Claude comments

The failure that took longest to diagnose: create_review_comment returns HTTP 422 if the line you target is not part of the diff hunk. Claude would correctly identify a bug, give a line number that pointed at the new file, but GitHub only accepts comments on lines that appear in the diff. About 40% of my early comments vanished into 422s that I wasn't catching.

Two things fixed it. First, the try/except above degrades to a normal PR comment so nothing is lost. Second — and this matters — I tell Claude in the schema description that line is "line in the NEW file," because by default it tended to count lines within the diff hunk including the @@ headers, which is almost never what GitHub wants. After pinning that, the inline success rate went from ~60% to ~94% across the next 30 PRs.

Stopping the Claude-reviews-Claude infinite loop on synchronize

The second weekend-killer: I had a separate workflow where Claude could push fixups. Combined with the synchronize trigger, Claude's own push triggered another review, which triggered another suggestion. Three PRs racked up 70+ comments overnight before I noticed. GITHUB_TOKEN-authored events don't re-trigger workflows by default, but my fixup bot used a PAT, which does re-trigger — so the loop was live.

The guard is one line in the YAML if: plus a sender check in Python: skip when github.event.sender.login ends in [bot] or matches your automation account. Cheap insurance. If you only run the reviewer (no auto-fixer), the default GITHUB_TOKEN behavior already protects you, but assume that protection is gone the moment you add any PAT-authored automation.

What the $0.03 reviewer actually catches (and what it misses)

Across 60 real PRs, the reviewer flagged: 11 genuine bugs (mostly unhandled None/nil and one SQL string built with f-strings), 23 valid nits, and 6 false positives a human dismissed. It missed every bug that required understanding intent across files — a renamed function whose old name still lived in a config it never saw, for example. That is the hard ceiling of diff-only review, and it's why I never bill this as a human replacement. It's a tireless first pass that clears the trivial stuff so the human reviewer spends their attention on architecture.

The economics are the real story: 60 reviews cost me $1.68 total. The same coverage from a paid SaaS reviewer quoted $15/developer/month. For a 5-person team that's $75/month versus pennies, and you own the prompt, the schema, and the data path entirely.

If you want to extend it: add the PR title and description to the prompt for intent (cheap, ~200 tokens), cache the system prompt with cache_control once your system block grows past ~1,000 tokens, and gate severity: bug findings to fail the check run while letting nit pass. Start with the script above, push a deliberately buggy PR, and watch the inline comments land — that loop is the whole point.

Top comments (0)