Using Local AI Models for Code Review (Without Vendor Lock-in)

#ai #productivity #coding #tools

You have probably thought about it: using Claude or ChatGPT to review your code before submitting a PR. It actually works. But there is a catch: you are shipping your code to someone else's server.

Not everyone is cool with that, especially if you work in regulated industries or just value privacy. Here is what changed: local AI models got good enough this year. Really good.

What is Actually Viable Now

Six months ago, running Llama 2 locally for code review was painful: token limits, slow inference, and models that hallucinated fixes that broke everything. Now? You have got real options.

Ollama + Llama 3 is the easiest entry point. Download, install, run.

ollama pull llama3
ollama serve
# Now it is running on localhost:11434

That is it. Your model runs entirely on your machine. No API calls, no rate limits, no vendor watching your code.

Actual use case I tested:

POST http://localhost:11434/api/generate
{
  "model": "llama3",
  "prompt": "Review this TypeScript code for bugs and security issues:\n\n[your code here]"
}

Llama 3 caught a SQL injection vulnerability I intentionally buried in test code. Also suggested better variable names. The review was not as sharp as Claude, but it was useful, and it ran in 8 seconds on a MacBook Pro.
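
One detail worth knowing about the request above: it does not set "stream", and Ollama's /api/generate endpoint streams by default. It sends one JSON object per line, each carrying a piece of the text in its response field, with done set to true on the last one. Here is a minimal sketch of stitching those lines back together, assuming Node 18+ for the built-in fetch. The function names are illustrative, not part of Ollama.

```javascript
// Join the newline-delimited JSON chunks that /api/generate streams by default.
// Each non-empty line is expected to look like {"response": "...", "done": false}.
function joinStreamedResponse(ndjson) {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .map((chunk) => chunk.response || "")
    .join("");
}

// Hypothetical helper: send a review prompt and return the full review text.
// Assumes Ollama is listening on its default port and llama3 is already pulled.
async function quickReview(code) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      prompt: "Review this TypeScript code for bugs and security issues:\n\n" + code
    })
  });
  if (!res.ok) throw new Error("Ollama returned HTTP " + res.status);
  // Reading the whole body at once is simple, but it waits for the final chunk.
  return joinStreamedResponse(await res.text());
}
```

If you would rather get a single JSON object back, add "stream": false to the request body and read the response field directly.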

Real Trade-offs

Local wins:

Privacy (code never leaves your machine)
No rate limits (review as much as you want)
Works offline
Cheaper at scale (one-time hardware cost)
No vendor dependency

Cloud wins:

Better analysis (Claude sees patterns across millions of codebases)
Faster (you are not bottlenecked by your GPU)
Less fussiness (just send an API call)
Better at understanding context and architecture

Honest take: if you are doing complex architectural reviews, Claude still wins. If you are doing quick "do I have a memory leak here?" checks, local is fine.

The Setup That Actually Works

Do not try to review code by copy-pasting into a terminal. That is miserable.

Build a wrapper around Ollama:

const http = require("http"); // Ollama serves plain HTTP on localhost, not HTTPS

async function reviewCode(code, language = "javascript") {
  const prompt = `You are a code reviewer. Review this ${language} code for bugs, security issues, and style improvements. Be concise.\n\n${code}`;

  // stream: false makes Ollama return one JSON object instead of a stream of chunks
  const data = JSON.stringify({
    model: "llama3",
    prompt: prompt,
    stream: false
  });

  return new Promise((resolve, reject) => {
    const options = {
      hostname: "localhost",
      port: 11434,
      path: "/api/generate",
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Byte length, not character count, so non-ASCII source is sent whole
        "Content-Length": Buffer.byteLength(data)
      }
    };

    const req = http.request(options, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => body += chunk);
      res.on("end", () => {
        if (res.statusCode !== 200) {
          return reject(new Error(`Ollama returned HTTP ${res.statusCode}: ${body}`));
        }
        try {
          resolve(JSON.parse(body).response);
        } catch (err) {
          reject(err);
        }
      });
    });

    req.on("error", reject);
    req.write(data);
    req.end();
  });
}

// Usage: a deliberately vulnerable snippet (SQL built by string concatenation)
const myCode = `
function getUserData(id) {
  const query = "SELECT * FROM users WHERE id = " + id;
  return db.query(query);
}
`;

reviewCode(myCode, "javascript")
  .then(review => console.log(review))
  .catch(err => console.error("Review failed:", err.message));

Now you have got a local code reviewer you can call from your CI/CD, your editor, or your pre-commit hook. No API keys, no requests leaving your machine, no code handed to a third party.

Where This Gets Real

I am actually using this in production:

Pre-commit hook that runs local Llama on staged changes. Takes ~10 seconds. Catches about 60% of the bugs that would normally need peer review. Not perfect, but genuinely useful as a first pass.

Integration with GitHub Actions: before a PR even gets assigned to humans, a local model reviews it and leaves a comment with suggestions. Cuts down noise on smaller PRs.

Pair-programming mode — running Llama 3 in a tmux window, asking it questions about your code while you work. It is like pair programming with someone who has seen a lot of code but is not judgy.
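
The pre-commit idea above can be sketched roughly like this. It is an outline under assumptions rather than the exact hook described: the 12,000-character cap and the helper names are made up for illustration, and it needs Node 18+ for fetch. It only returns the review text and never blocks the commit.

```javascript
const { execFileSync } = require("child_process");

// Arbitrary cap so a huge diff does not overflow the model's context window
const MAX_DIFF_CHARS = 12000;

// Build the review prompt from a unified diff, truncating oversized input.
function buildDiffPrompt(diff, maxChars = MAX_DIFF_CHARS) {
  const truncated = diff.length > maxChars;
  const body = truncated ? diff.slice(0, maxChars) + "\n[diff truncated]" : diff;
  return "You are a code reviewer. Review this staged git diff for bugs and security issues. Be concise.\n\n" + body;
}

// Read what is about to be committed.
function getStagedDiff() {
  return execFileSync("git", ["diff", "--cached"], {
    encoding: "utf8",
    maxBuffer: 10 * 1024 * 1024
  });
}

// Returns the review text, or null when nothing is staged.
async function reviewStagedChanges() {
  const diff = getStagedDiff();
  if (diff.trim() === "") return null;

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", prompt: buildDiffPrompt(diff), stream: false })
  });
  if (!res.ok) throw new Error("Ollama returned HTTP " + res.status);
  return (await res.json()).response;
}
```

To use it, call reviewStagedChanges() from a script that .git/hooks/pre-commit runs, print the result, and exit with 0 so a slow or offline model never stops you from committing.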

The Real Limitation

These models do not understand your codebase. They do not know why you made a certain architectural decision or what the next 5 features need from the foundation. They are great at spotting typos, obvious bugs, and common patterns. They are not great at "should we refactor this for scalability?"

For that, you still need humans. But you can throw local models at the easy stuff first.

Getting Started This Week

Install Ollama from ollama.com (takes 2 minutes)
Pull a model: ollama pull llama3 (grab a smaller model if storage matters)
Test it locally — just hit the API endpoint with curl
Build a wrapper for your workflow (editor plugin, pre-commit hook, whatever)

If you have got GPU memory (8GB+ is comfortable), this runs fast. If you are on CPU only, it is slower but still viable for ad-hoc reviews.
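
Before wiring anything into your workflow, it helps to confirm the server is up and the model is actually pulled. Here is a small sketch using Ollama's GET /api/tags endpoint, which lists the models available locally. The helper names and the three status strings are my own, and Node 18+ is assumed for fetch.

```javascript
// True if the /api/tags payload contains the model, with or without a tag suffix.
// Ollama reports names like "llama3:latest".
function hasModel(tagsPayload, wanted) {
  const models = (tagsPayload && tagsPayload.models) || [];
  return models.some((m) => m.name === wanted || m.name.split(":")[0] === wanted);
}

// Hypothetical check: resolves to "ready", "model-missing", or "server-down".
async function checkLocalSetup(model = "llama3", baseUrl = "http://localhost:11434") {
  let res;
  try {
    res = await fetch(baseUrl + "/api/tags");
  } catch (err) {
    return "server-down"; // connection failed: is `ollama serve` running?
  }
  if (!res.ok) return "server-down";
  return hasModel(await res.json(), model) ? "ready" : "model-missing";
}
```

Running checkLocalSetup().then(console.log) is a quick sanity check before you blame your wrapper for a failed review.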

One More Thing

If you want to stay on top of AI tools that actually work and do not require selling your code to a corporation, check out LearnAI Weekly — they cover practical AI tools for developers, not just hype.

Honest question: are you already using local models, or does cloud-based review still feel safer? Hit me with your setup in the comments.