Last month I pushed a bug to production that I would have caught if somebody else had looked at my code. But I was the only one working on the project, so I read it myself, forgot about it, and then spent an annoying Tuesday trying to figure out why it broke.
I had been thinking about Ollama. I had heard about it a couple of times but never really needed it for anything. So this weekend I built a GitHub bot that reviews pull requests with a local LLM, without sending any code off my machine. It posts a comment on the PR with what it finds.
Here's how I built it.
What the bot actually does
When you open a pull request, GitHub fires a webhook. The bot receives it, pulls the diff for each changed file using GitHub's REST API, sends that diff to a locally running LLM via Ollama, and posts the review back as a comment on the PR.
The review shows up as a single comment on your PR, with one section per reviewed file.
Project Structure:
self-hosted-ai-code-review-bot-for-github-prs/
├── src/
│ ├── server.js
│ ├── github.js
│ ├── ollama.js
│ ├── diffParser.js
│ └── chunker.js
├── private-key.pem
├── .env
├── package-lock.json
└── package.json
Setting up the GitHub App first
Before any code, you need a GitHub App. A GitHub App has its own identity, is installed on specific repos, and uses short-lived installation tokens instead of long-lived credentials.
Go to GitHub → Settings → Developer Settings → GitHub Apps → New GitHub App.
Permissions you need:
- Pull requests: Read & write
- Issues: Read & write
- Metadata: Read-only
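Under webhook events, subscribe to Pull request. Set the webhook URL to your ngrok address + /webhook (we'll come back to that), and set a webhook secret; the same value goes into WEBHOOK_SECRET in your .env.
Generate and download the private key (.pem file). The bot uses this key to sign the JWTs it exchanges with GitHub for installation tokens.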
Clone the repo and install dependencies:
git clone <your-repo-url>
cd <repo-name>
npm install
Your .env:
PORT=3000
GITHUB_APP_ID=your_app_id
GITHUB_PRIVATE_KEY_PATH=./private-key.pem
WEBHOOK_SECRET=your_webhook_secret
OLLAMA_MODEL=deepseek-coder # or llama3, whichever you're running
Then install the app on your repository via the Install App option.
github.js — auth and API calls
GitHub Apps don't use static tokens. You sign a JWT with your private key, exchange it for a short-lived installation token, and use that to make API calls. @octokit/auth-app handles all of this:
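import fs from "fs";
import { createAppAuth } from "@octokit/auth-app";
// Octokit from @octokit/rest is assumed here; it exposes octokit.pulls and octokit.issues.
import { Octokit } from "@octokit/rest";

// Builds an Octokit client authenticated as one installation of the app.
export async function getOctokit(installationId) {
  // The private key downloaded from the GitHub App settings page.
  const privateKey = fs.readFileSync(
    process.env.GITHUB_PRIVATE_KEY_PATH,
    "utf8"
  );
  const auth = createAppAuth({
    appId: process.env.GITHUB_APP_ID,
    privateKey,
    installationId
  });
  // Signs a JWT and exchanges it for a short-lived installation token.
  const installationAuth = await auth({ type: "installation" });
  return new Octokit({ auth: installationAuth.token });
}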
You pass the installationId from the webhook payload. GitHub includes it in every event so you always know which installation triggered it.
The other two functions in github.js are straightforward: one calls octokit.pulls.listFiles to get the changed files with their diffs, the other calls octokit.issues.createComment to post the final review.
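A minimal sketch of what those two functions might look like. The function names and argument order are taken from the webhook handler later in this post; the per_page value and the lack of pagination are my assumptions, so PRs with more than 100 changed files would need extra handling.

```javascript
// Lists the files changed in a pull request, including each file's patch.
// per_page: 100 is an assumption; larger PRs would need pagination.
async function getPullRequestFiles(octokit, owner, repo, pullNumber) {
  const { data } = await octokit.pulls.listFiles({
    owner,
    repo,
    pull_number: pullNumber,
    per_page: 100,
  });
  return data;
}

// Posts the review as a regular PR comment. Pull requests share the issue
// comment API, so the PR number is passed as issue_number.
async function postReviewComment(octokit, owner, repo, pullNumber, body) {
  await octokit.issues.createComment({
    owner,
    repo,
    issue_number: pullNumber,
    body,
  });
}
```

This is also why the app needs the Issues permission: the comment goes through the issues endpoint, not a pull-request-specific one.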
Filtering and chunking the diff
Not every file in a PR deserves review time. We skip package-lock.json, .vscode/ settings, and minified files. GitHub also sometimes returns files without a patch field (binary files, files too large to diff), and those get skipped too.
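Here is a rough sketch of that filter. The name shouldReviewFile matches the call in the webhook handler, but the pattern list is hypothetical; treat it as a starting point and adjust it to your repository.

```javascript
// Hypothetical skip list; extend it for your own repository.
const SKIPPED_PATTERNS = [
  /(^|\/)package-lock\.json$/, // lockfiles, at any depth
  /(^|\/)\.vscode\//,          // editor settings
  /\.min\.(js|css)$/,          // minified bundles
];

// Returns true when a file from the pull request files API is worth reviewing.
function shouldReviewFile(file) {
  // Binary or oversized files come back without a patch field.
  if (!file.patch) return false;
  return !SKIPPED_PATTERNS.some((pattern) => pattern.test(file.filename));
}
```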
For files that do make it through, large diffs need to be split before they go to the LLM. Context windows are limited, and model quality tends to drop toward the end of long inputs:
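export function chunkDiff(diff, maxSize = 1500) {
  const chunks = [];
  if (!diff) return chunks;
  let current = "";
  const lines = diff.split("\n");
  for (const line of lines) {
    // Start a new chunk when adding this line would exceed the limit.
    // The `current` check avoids pushing an empty chunk when a single
    // line is longer than maxSize; such a line becomes its own chunk.
    if (current && (current + line).length > maxSize) {
      chunks.push(current);
      current = "";
    }
    current += line + "\n";
  }
  if (current) chunks.push(current);
  return chunks;
}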
In my testing, 1500 characters per chunk worked reliably with both Llama 3 and DeepSeek Coder. Each chunk gets its own Ollama call, and the results are concatenated per file.
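One possible refinement, which is not part of the bot as described here: a character-based split can cut a chunk in the middle of a hunk, so the model sees changed lines without their "@@" header. The sketch below splits a patch at hunk headers first. The helper name splitIntoHunks is hypothetical.

```javascript
// Splits a unified diff patch into hunks; each hunk starts at its "@@" header.
function splitIntoHunks(patch) {
  const hunks = [];
  if (!patch) return hunks;
  let current = [];
  for (const line of patch.split("\n")) {
    // Content lines start with " ", "+", or "-", so a leading "@@" is a header.
    if (line.startsWith("@@") && current.length > 0) {
      hunks.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) hunks.push(current.join("\n"));
  return hunks;
}
```

Any hunk that is still too large could then go through a size-based splitter, so the character limit stays as a fallback.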
The prompt is important
This is where I spent the most time, and where the whole thing either works or doesn't.
My first version was something vague like "review this diff and point out issues." The output was not good. The model would write paragraphs explaining what the file does, invent security vulnerabilities that weren't in the code, reference issue numbers that don't exist, and end with generic advice like "make sure to add proper error handling throughout your application."
The problem is that without constraints, the model tries to be helpful in every direction at once. It doesn't know you only care about what changed. So you have to be as explicit about what you don't want as what you do:
export async function reviewWithOllama(diffChunk) {
const prompt = `
You are an AI GitHub pull request reviewer.
You are reviewing ONLY the provided git diff.
STRICT RULES:
- Review ONLY changed lines from the diff.
- Ignore unchanged code completely.
- Do NOT explain the whole application.
- Do NOT give generic software engineering advice.
- Do NOT hallucinate missing features.
- Do NOT invent security issues.
- Do NOT invent issue IDs, ticket numbers, PR references, or metadata.
- Do NOT mention issues outside the diff.
- Every issue MUST directly relate to a changed line.
- Keep response concise.
- If no real issue exists, reply exactly: "No significant issues found."
Focus ONLY on:
1. Potential bugs introduced
2. Style or readability issues introduced
3. Performance issues introduced
4. Security risks directly introduced
5. Small improvement suggestions directly related to the diff
Diff:
${diffChunk}
`;
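// Ollama's local HTTP API; stream: false returns the full reply in one JSON body.
const response = await fetch("http://localhost:11434/api/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: process.env.OLLAMA_MODEL,
prompt,
stream: false,
options: { temperature: 0 }
})
});
// Fail loudly if Ollama is down or the model is missing, instead of posting "undefined".
if (!response.ok) {
throw new Error(`Ollama request failed with status ${response.status}`);
}
const data = await response.json();
return data.response;
}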
Once I added the negative rules, the output became useful. The model stopped inventing context and started pointing at specific changed lines.
Setting temperature: 0 matters because code review isn't creative work. You want output that is as repeatable as possible, with the same diff producing consistent reviews. In my runs it also reduced hallucinations noticeably.
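Since each chunk gets its own call and the prompt asks for the exact sentence "No significant issues found.", a file with several clean chunks ends up repeating that sentence in the comment. A small helper can collapse those replies. This is a sketch, not part of the bot as shown here: the name mergeChunkReviews is hypothetical, and it assumes the model follows the exact-reply rule, which a local model will not always do.

```javascript
const NO_ISSUES = "No significant issues found.";

// Combines per-chunk model replies for one file into a single section body.
// Empty replies and exact "no issues" replies are dropped; if nothing is left,
// the file gets a single "no issues" line.
function mergeChunkReviews(reviews) {
  const findings = reviews
    .map((review) => (review || "").trim())
    .filter((review) => review && review !== NO_ISSUES);
  return findings.length > 0 ? findings.join("\n\n") : NO_ISSUES;
}
```

To use it, collect the replies for a file in an array instead of appending them to a string, then pass the array through this helper before building the file section.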
The webhook handler
Express receives the event, checks if it's worth handling, and orchestrates everything:
app.post("/webhook", async (req, res) => {
  try {
    const event = req.headers["x-github-event"];
    const action = req.body.action;
    // Only react to PRs being opened, reopened, or updated with new commits.
    if (event !== "pull_request") return res.status(200).send("Ignored");
    if (!["opened", "synchronize", "reopened"].includes(action)) {
      return res.status(200).send("Ignored action");
    }
    // Acknowledge immediately: GitHub expects a reply within 10 seconds,
    // and the LLM calls below can take far longer than that.
    res.status(202).send("Accepted");

    const installationId = req.body.installation.id;
    const owner = req.body.repository.owner.login;
    const repo = req.body.repository.name;
    const pullNumber = req.body.pull_request.number;
    const octokit = await getOctokit(installationId);
    const files = await getPullRequestFiles(octokit, owner, repo, pullNumber);

    let finalReview = `# 🤖 AI Code Review\n\n`;
    for (const file of files) {
      if (!shouldReviewFile(file)) continue;
      const chunks = chunkDiff(file.patch);
      let fileReview = "";
      // One Ollama call per chunk, concatenated per file.
      for (const chunk of chunks) {
        fileReview += await reviewWithOllama(chunk) + "\n\n";
      }
      finalReview += `## File: ${file.filename}\n\n${fileReview}---\n\n`;
    }
    await postReviewComment(octokit, owner, repo, pullNumber, finalReview);
  } catch (error) {
    console.error("Webhook error:", error);
    // Only reply if the early acknowledgement above has not been sent yet.
    if (!res.headersSent) res.status(500).send("Error processing webhook");
  }
});
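The handler above never checks WEBHOOK_SECRET, so anyone who finds the URL can trigger a review. GitHub signs each delivery with an HMAC-SHA256 of the raw request body and sends it in the X-Hub-Signature-256 header. Below is a minimal sketch of the check. The function name verifySignature and the req.rawBody property in the wiring comments are hypothetical, and the wiring assumes express.json() is the body parser.

```javascript
import crypto from "node:crypto";

// Returns true when the signature header matches an HMAC-SHA256 of the raw
// request body computed with the shared webhook secret.
function verifySignature(rawBody, signatureHeader, secret) {
  if (!signatureHeader || !secret) return false;
  const expected =
    "sha256=" +
    crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Possible wiring in server.js (req.rawBody is a made-up property name):
//   app.use(express.json({ verify: (req, _res, buf) => { req.rawBody = buf; } }));
//   ...then at the top of the /webhook handler:
//   const signature = req.headers["x-hub-signature-256"];
//   if (!verifySignature(req.rawBody, signature, process.env.WEBHOOK_SECRET)) {
//     return res.status(401).send("Invalid signature");
//   }
```

The signature has to be computed over the raw bytes, not over the re-serialized req.body, which is why the sketch captures the buffer before JSON parsing.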
Running it
Install Ollama from ollama.com and pull a model:
ollama pull llama3
# or, better for code
ollama pull deepseek-coder
Start the server and expose it with ngrok:
npm run dev
ngrok http 3000
Copy the ngrok HTTPS URL, update your GitHub App's webhook URL to https://your-ngrok-url/webhook, push a branch, open a PR.
CPU vs GPU: Local inference on CPU is slow. Llama 3 8B on a laptop CPU takes 10–20 seconds per diff chunk. For a background bot that's fine: you open the PR, switch tabs, and the comment shows up a minute or so later. With a supported NVIDIA GPU, Ollama picks it up automatically, and in my experience a chunk takes under 2 seconds.
Deploying it properly
ngrok dies when you close your terminal. For anything persistent, a cheap VPS works: DigitalOcean's $6/month droplet, a small EC2 instance, whatever. Just check that it has enough RAM for the model you run, since an 8B model needs several gigabytes. Install Node.js and Ollama on it, put Nginx in front with SSL, and point your webhook URL at the domain.
Tools like GitHub Copilot Code Review send your diff to external servers. With this setup, the diff goes from GitHub's API to your own machine and into your local model, so it never reaches a third-party AI service. For proprietary codebases that's not a minor thing.
Where the idea came from
I came across VickyBytes a few weeks back. It's a platform that posts structured project ideas in Creator Labs, where you build something, write about it, and earn from it if the content is good enough. This was one of those.
Having a concrete spec helped. "Play with Ollama" had been on my list for months and never happened. A specific project with a clear outcome actually got it done. The bot runs on my own repos now, so that worked out.
If you write technical content and want structured project ideas to build around, visit vickybytes.com.
Full source is on GitHub. The prompt is where most of the interesting experimentation happens. That's where you'll actually learn how local LLMs behave in practice.