AI Code Review Setup with Reviewdog and Local LLM

#codereview #reviewdog #localllm #ollama

This article was originally published on aicoderscope.com

Automated code review that posts inline PR comments sounds like a Copilot feature requiring a $40/month seat. It is not. With Reviewdog, a short Python script, and a self-hosted Ollama instance, you can run the same pattern on every pull request for free, with no diff leaving your infrastructure.

This is not a "someday when the tooling matures" guide. Reviewdog is production-ready, has been used at scale in large open-source projects, and the Ollama API is stable enough to build CI pipelines on. What you are building here is a working PR review system, not a proof of concept.

Honest take upfront: This setup makes strong economic sense for teams of four or more. For solo developers, the operational overhead (keeping a GPU box online, maintaining the Ollama install) probably exceeds the value. The article covers both scenarios so you can make that call for your situation.

What Reviewdog Is

Reviewdog is an open-source tool written in Go that takes the output of any linter, parses it, and posts the results as inline review comments on a GitHub pull request or GitLab merge request. It was built to eliminate the friction of manually checking CI logs and cross-referencing them with your diff.

The key design insight is that Reviewdog does not care what the linter is. You pipe any tool's output into Reviewdog in one of the input formats it understands (errorformat patterns, Checkstyle XML, SARIF, or its own rdjson/rdjsonl), and it handles the GitHub API calls, comment deduplication, and PR annotation. That means the same runner can post comments from golangci-lint, ruff, eslint, or, importantly, a custom script that calls a local LLM.
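To make the input-format idea concrete, here is a minimal sketch that turns the common path:line:col: message output style into rdjsonl, one JSON diagnostic per line. The function name lint_lines_to_rdjsonl and the fixed WARNING severity are illustrative assumptions. In practice Reviewdog parses this particular style natively with its errorformat flag (-efm="%f:%l:%c: %m"), so a converter like this only earns its keep for tools whose output errorformat cannot describe.

```python
import json
import re

# Matches the "path:line:col: message" style many linters print (assumed input shape).
LINE_RE = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<message>.+)$")


def lint_lines_to_rdjsonl(linter_output: str, source_name: str) -> list[str]:
    """Convert matching linter lines to rdjsonl strings (one JSON object each)."""
    out = []
    for raw in linter_output.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue  # skip summaries, blank lines, and anything else unparseable
        out.append(json.dumps({
            "message": m["message"],
            "location": {
                "path": m["path"],
                "range": {"start": {"line": int(m["line"]), "column": int(m["col"])}},
            },
            "severity": "WARNING",  # assumption: no severity in this input style
            "source": {"name": source_name},
        }))
    return out
```

Piping the converted lines into reviewdog -f=rdjsonl -reporter=local is a quick way to check the output before wiring it into CI.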

Reviewdog supports several reporter modes:

github-pr-review: posts inline comments on the diff
github-pr-check: posts a GitHub Checks annotation (cleaner for CI)
local: prints to stdout, useful for local development
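Choosing the reporter is usually the only thing that differs between a laptop and CI. A minimal sketch, assuming you want inline comments in GitHub Actions and stdout everywhere else; the helper name reviewdog_cmd is hypothetical, while the flags shown (-f, -name, -reporter, -diff) are standard Reviewdog options. The github-* reporters also expect a token in the REVIEWDOG_GITHUB_API_TOKEN environment variable.

```python
import os
from typing import Mapping, Optional


def reviewdog_cmd(name: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Build a reviewdog invocation that reads rdjsonl diagnostics from stdin."""
    env = os.environ if env is None else env
    # GitHub Actions sets GITHUB_ACTIONS=true on its runners.
    in_ci = env.get("GITHUB_ACTIONS") == "true"
    cmd = ["reviewdog", "-f=rdjsonl", f"-name={name}"]
    if in_ci:
        # Inline comments on the PR diff.
        cmd.append("-reporter=github-pr-review")
    else:
        # Print to stdout; the local reporter filters results against this diff.
        cmd += ["-reporter=local", "-diff=git diff origin/main...HEAD"]
    return cmd
```

Feed it rdjsonl on stdin, for example subprocess.run(reviewdog_cmd("llm-reviewer"), input=rdjsonl_text, text=True, check=True).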

The project has over 8,000 GitHub stars and integrations with more than 40 linters maintained in the reviewdog/action-* family of GitHub Actions.

Why Pair It with a Local LLM

Static linters are pattern matchers. They catch undefined variables, unused imports, and style violations. What they do not catch:

Logic that is technically valid but wrong for the context ("this condition will never be true given how fetchUser works upstream")
Security smells that require understanding data flow ("this SQL is parameterized but the table name is interpolated")
Naming that is confusing relative to the surrounding codebase
Code that duplicates logic that already exists elsewhere in the repo
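The second bullet is easiest to see in code. The sketch below uses sqlite3 and a hypothetical ALLOWED_TABLES allowlist: it binds the row ID as a parameter but interpolates the table name into the SQL text. Whether a given linter flags the f-string depends on its rule set; judging whether it is actually dangerous means tracing where table comes from, which is the kind of data-flow question a reviewer, human or LLM, is better placed to answer.

```python
import sqlite3

# Hypothetical allowlist of tables this code path may read.
ALLOWED_TABLES = {"users", "orders"}


def fetch_by_id_unsafe(conn: sqlite3.Connection, table: str, row_id: int) -> list:
    # The id is bound as a parameter, but `table` is pasted into the SQL text,
    # so a caller-controlled table name can change the query itself.
    return conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,)).fetchall()


def fetch_by_id(conn: sqlite3.Connection, table: str, row_id: int) -> list:
    # Identifiers cannot be bound as parameters, so validate against an allowlist.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    return conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,)).fetchall()
```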

LLMs can catch all of those, at least some of the time. The challenge with cloud LLMs is cost at team scale. If you run 50 PRs a month with diffs averaging 200 lines, a GPT-4o call per review runs to roughly $15–25/month at current pricing. Not catastrophic, but it compounds across teams and across tools. A local LLM eliminates that per-call cost entirely once the hardware is paid for.
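The cost argument is easy to re-run with your own numbers. This is a back-of-the-envelope sketch, not a pricing tool: tokens per diff line, prompt overhead, and reply length are assumptions, and the per-million-token prices are left as arguments so you can plug in your provider's current list prices.

```python
def monthly_cost_usd(
    prs_per_month: int,
    diff_lines: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
    tokens_per_line: float = 10,  # assumption: rough average for source code
    prompt_overhead: int = 500,   # assumption: instructions plus extra context
    output_tokens: int = 500,     # assumption: length of the model's review
    runs_per_pr: int = 1,         # re-reviews on each push multiply the bill
) -> float:
    """Estimate monthly API spend in USD for LLM diff review."""
    runs = prs_per_month * runs_per_pr
    input_tokens = runs * (diff_lines * tokens_per_line + prompt_overhead)
    output_total = runs * output_tokens
    return (input_tokens * usd_per_m_input + output_total * usd_per_m_output) / 1_000_000
```

The runs_per_pr parameter matters more than it looks: a workflow triggered on every push re-reviews the same PR several times, and the estimate scales linearly with it.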

The other driver is privacy. If you work on proprietary code, every diff you send to OpenAI's or Anthropic's API is subject to that provider's data policies. A local Ollama instance running on your own hardware or a self-hosted VM means the code never leaves your network. That is often a hard requirement for enterprise and regulated-industry teams.

See also: Cline + Local LLM Privacy-First Setup in 2026 for a deeper treatment of the privacy argument and how local inference fits into a dev workflow beyond just code review.

Stack Overview

The complete stack has three layers:

Layer 1 — Static analysis (Reviewdog native)

golangci-lint for Go
ruff for Python
eslint for TypeScript/JavaScript

These run fast (under 30 seconds), catch deterministic issues, and do not require GPU compute. Reviewdog handles posting their output as PR comments natively.
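Layer 1 stays fast partly because you only need the linters for languages that actually appear in the diff. A minimal sketch, assuming you get the changed paths from git diff --name-only origin/main...HEAD; the extension-to-linter mapping and the helper name linters_for are illustrative assumptions that mirror the list above.

```python
from pathlib import PurePosixPath

# Extension -> linter, mirroring the Layer 1 list (this mapping is an assumption).
LINTER_BY_EXT = {
    ".go": "golangci-lint",
    ".py": "ruff",
    ".js": "eslint",
    ".jsx": "eslint",
    ".ts": "eslint",
    ".tsx": "eslint",
}


def linters_for(changed_paths: list[str]) -> list[str]:
    """Return the de-duplicated, sorted linters relevant to the changed files."""
    needed = set()
    for path in changed_paths:
        linter = LINTER_BY_EXT.get(PurePosixPath(path).suffix)
        if linter is not None:
            needed.add(linter)
    return sorted(needed)
```

For example, linters_for(["cmd/main.go", "api/app.py", "README.md"]) returns ["golangci-lint", "ruff"], so the eslint step can be skipped for that PR.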

Layer 2 — LLM diff review (custom script)
A Python script that sends the branch's git diff (capped at a fixed number of lines) to a local Ollama endpoint and returns structured comments in Reviewdog's rdjsonl format. This is the part most setups skip over. Details below.
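One detail matters for Layer 2: Reviewdog needs line numbers in the new version of each file, but a raw diff only carries them implicitly in its @@ hunk headers, so asking a model to report line numbers straight from the diff invites off-by-N comments. A sketch of one mitigation, which is an addition and not part of the review script in this article: prefix every added or context line with its new-file line number before prompting. The helper name number_diff_lines is hypothetical.

```python
import re

# "@@ -old_start[,old_len] +new_start[,new_len] @@" (unified diff hunk header)
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def number_diff_lines(diff: str) -> str:
    """Prefix added and context lines with their line number in the new file."""
    out = []
    new_line = None  # None means "not inside a hunk"
    for raw in diff.splitlines():
        if raw.startswith("diff --git"):
            new_line = None  # next file: wait for its first hunk header
            out.append(raw)
            continue
        m = HUNK_RE.match(raw)
        if m:
            new_line = int(m.group(1))
            out.append(raw)
            continue
        if new_line is None or raw.startswith("\\"):
            # File headers (index, ---, +++) and "\ No newline at end of file"
            out.append(raw)
            continue
        if raw.startswith("-"):
            out.append(f"{'':>5} {raw}")  # removed: no line number in the new file
        else:
            out.append(f"{new_line:>5} {raw}")  # added ('+') or context (' ')
            new_line += 1
    return "\n".join(out)
```

It would slot in as ask_llm(number_diff_lines(get_diff())); removed lines get a blank gutter because they do not exist in the new file.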

Layer 3 — Orchestration
A GitHub Actions workflow that wires layers 1 and 2 together on every PR. The workflow checks out the repo, installs tools, runs both layers, and posts results via the GITHUB_TOKEN.

Model Choice: Why Reasoning Quality Beats Speed Here

The instinct for CI is to pick the fastest model. For code review, that instinct is wrong.

A fast small model (Llama 3.2 3B, Phi-3 mini) produces code review comments that are syntactically plausible but often wrong or generic. "This function could be more efficient" on a function that is already O(1) is worse than no comment — it trains developers to ignore the reviewer.

For code review specifically, you want a model with enough context capacity and reasoning depth to understand what the code is trying to do and identify where it falls short. The two models that hold up at local inference scale in 2026:

Qwen2.5-Coder 32B — 32B parameters, requires ~20 GB VRAM in Q4 quantization. Strong code understanding, low hallucination rate on logic errors. If you have an RTX 4090 or a dual-GPU setup, this is the pick.

DeepSeek-Coder-V2 Lite (16B) — runs in ~10 GB VRAM, slightly more aggressive in flagging issues (occasional false positives). Good tradeoff if you are on a single 16 GB card.

For detailed model-to-VRAM mapping, see: Best Local AI Models by VRAM on runaihome.com.

Models to avoid for this task: anything under 7B. The 3B–7B range produces generic comments that add noise without value. If your hardware cannot run at least a 14B model at Q4, consider whether the LLM review layer adds enough to justify running it — or use the static analysis layer only and save the LLM review for local pre-commit use.
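The VRAM figures above follow a simple rule of thumb: quantized weights take roughly parameters × bits-per-weight / 8 bytes. A sketch that applies it to pick a model, assuming about 4.8 effective bits per weight for Q4_K_M-style quantization (which gives 32 × 4.8 / 8 ≈ 19.2 GB and 16 × 4.8 / 8 ≈ 9.6 GB, close to the ~20 GB and ~10 GB quoted above) and a fixed 1 GB of headroom for the KV cache; real usage grows with context length. The Ollama tags and the helper names are assumptions.

```python
from typing import Optional

# Assumption: approximate effective bits per weight for Q4_K_M-style quantization.
Q4_BITS_PER_WEIGHT = 4.8

# Candidates from this article, largest first (Ollama tags are assumptions).
CANDIDATES = [
    ("qwen2.5-coder:32b", 32),
    ("deepseek-coder-v2:16b", 16),
]


def weights_gb(params_billion: float, bits_per_weight: float = Q4_BITS_PER_WEIGHT) -> float:
    """Rough size of the quantized weights alone, in GB (excludes KV cache)."""
    return params_billion * bits_per_weight / 8


def pick_model(vram_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest candidate that fits in VRAM, or None if none fits."""
    for tag, params in CANDIDATES:
        if weights_gb(params) + headroom_gb <= vram_gb:
            return tag
    return None  # nothing fits: fall back to static analysis only
```

Returning None corresponds to the "static analysis layer only" option described above.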

The LLM Review Script

This script sends the branch's diff against origin/main to a local Ollama endpoint and emits Reviewdog-compatible rdjsonl output. Save it as .github/scripts/llm-review.py.


python
#!/usr/bin/env python3
"""LLM diff reviewer — emits rdjsonl for Reviewdog."""
import json
import subprocess
import sys
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5-coder:32b"
MAX_DIFF_LINES = 400  # keeps context within model limits

def get_diff() -> str:
    result = subprocess.run(
        ["git", "diff", "origin/main...HEAD", "--unified=3"],
        capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return "\n".join(lines[:MAX_DIFF_LINES])

def ask_llm(diff: str) -> list[str]:
    prompt = f"""You are a senior code reviewer. Review the following git diff.
For each issue found, respond with a JSON object on its own line:
{{"path": "<file>", "line": <line_number>, "message": "<issue description>", "severity": "ERROR|WARNING|INFO"}}
Only output JSON lines. No explanations. No markdown.

Diff:
{diff}"""

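    payload = json.dumps({
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        # Ollama's default context window can be smaller than a 400-line diff
        # plus the prompt; raise it so the input is not cut off (size is an assumption).
        "options": {"num_ctx": 8192},
    })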
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read())
    return body.get("response", "").strip().splitlines()

def to_rdjsonl(lines: list[str]) -> None:
    for raw in lines:
        try:
            item = json.loads(raw)
            print(json.dumps({
                "message": item["message"],
                "severity": item.get("severity", "WARNING"),
                "location": {
                    "path": item["path"],
                    "range": {"start": {"line": int(item["line"]), "column": 1}}
                },
                "source": {"name": "llm-reviewer", "url": ""}
            }))
        except (json.JSONDecodeEr