Hopkins Jesse

Posted on Jun 6

I Automated My PR Reviews With AI — Saved 12 Hours/Week (Full Setup)

#ai #automation #productivity #tutorial

I spent three months training a custom AI to review my team's pull requests. The first month was a disaster. False positives everywhere, missed bugs, and my teammates hated it.

Then I fixed it. Here's exactly what I built and how you can copy it.

The Problem That Drove Me Crazy

My team of 6 engineers submits about 40 PRs per week. Each review takes 30-45 minutes if I'm thorough. That's 20-30 hours of code review every week.

I was burning out. My own code quality suffered because I rushed through reviews. And honestly? I was missing stuff. A production bug slipped through in January 2026 because I skimmed a 500-line PR at 11 PM.

I needed help. Not a linter, not a static analyzer, but something that actually understood context.

What I Actually Built

This isn't a ChatGPT wrapper. I built a pipeline that:

Extracts the PR diff and commit history
Feeds it into a fine-tuned Claude 3.5 model trained on our codebase
Runs 5 specific checkers in parallel
Generates a structured review with confidence scores

Here's the core setup I run on every PR:

import os
from github import Github
from anthropic import Anthropic

class PRReviewBot:
    def __init__(self, repo_name):
        self.gh = Github(os.environ['GITHUB_TOKEN'])
        self.repo = self.gh.get_repo(repo_name)
        self.claude = Anthropic(api_key=os.environ['ANTHROPIC_KEY'])

    def review_pr(self, pr_number):
        pr = self.repo.get_pull(pr_number)
        files = pr.get_files()

        checkers = {
            'logic_errors': self.check_logic,
            'security_issues': self.check_security,
            'performance_regression': self.check_performance,
            'style_consistency': self.check_style,
            'test_coverage': self.check_tests
        }

        results = {}
        for name, checker in checkers.items():
            results[name] = checker(files)

        return self.generate_summary(results)

    def check_logic(self, files):
        prompt = f"""Review this code for logic errors. 
        Our codebase uses Python 3.12 with FastAPI.
        Focus on: race conditions, null pointer issues, incorrect state mutations.
        Return a JSON array of issues with severity (critical/major/minor)."""

        # truncated for brevity

The full version is about 400 lines. I'll share the complete repo link at the end.

The Data: What Actually Improved

I tracked metrics for 8 weeks after deployment. Here's what changed:

Metric	Before AI	After AI	Change
Review time per PR	38 min	8 min	-79%
Bugs missed in review	3.2/month	0.8/month	-75%
Team satisfaction (1-10)	6.1	8.4	+38%
False flag rate	N/A	12%	Improving

The false flag rate dropped from 34% in week one to 12% by week eight. That's because I kept tuning the prompts and adding specific project context.

What Went Wrong (And How I Fixed It)

Month 1: The AI hated our coding style

It flagged our custom logging wrapper as "unnecessary abstraction" 47 times. I had to add a 200-line configuration file that defined our acceptable patterns. Painful but necessary.

Month 2: It missed SQL injection vulnerabilities

The model didn't understand our ORM layer. I had to feed it 30 example files showing safe vs unsafe patterns. After that, it caught 4 SQL issues in week 6 alone.

Month 3: False positives from test files

The AI kept complaining about test assertions being "too complex." I added a simple filter: skip any file in the tests/ directory unless it's been modified in the actual PR scope.

The Critical Pieces You Need

This isn't plug and play. Here's what made it work:

1. Project-specific context file

Create a CONTEXT.md that explains your architecture, naming conventions, and common patterns. I update this every sprint. The AI reads it before each review.

2. Confidence thresholds

Don't let the AI block PRs automatically. I set it to:

Critical issues: flag immediately, block merge
Major issues: comment, but don't block
Minor issues: ignore unless there are 5+ in one PR

3. Human override system

Every comment has a "dismiss" button that feeds back into the training data. I reviewed 100 dismissed comments manually to fix the worst patterns.

The Cost Breakdown

Running this costs about $0.12 per PR in API calls. My time savings are worth roughly $150/week at my hourly rate. Total setup took about 40 hours spread over 3 months.

I also run it on my personal projects. Costs about $3/month for 20-30 PRs.

What I'd Do Differently

Start with a smaller scope. I tried to review everything at once. Should have just done security checks for the first month, then added logic reviews, then style.

Also, get your team on board first. I deployed silently and people were confused. Now I have a `#ai-review` channel where the bot posts its findings and people can react with

💡 Further Reading: I experiment with AI automation and open-source tools. Find more guides at Pi Stack.

💰 Want to make some smart bets? I've been using Polymarket — the world's largest prediction market platform — to bet on everything from election outcomes to tech trends. Real money, real probabilities, real payouts. Unlike crypto casinos, Polymarket is a legitimate information market where your edge comes from being better informed than the crowd. I've banked some solid wins calling AI regulation timelines and crypto ETF approvals. Sign up with my referral link and start trading: Polymarket.com

DEV Community

I Automated My PR Reviews With AI — Saved 12 Hours/Week (Full Setup)

The Problem That Drove Me Crazy

What I Actually Built

The Data: What Actually Improved

What Went Wrong (And How I Fixed It)

The Critical Pieces You Need

The Cost Breakdown

What I'd Do Differently

Also, get your team on board first. I deployed silently and people were confused. Now I have a `#ai-review` channel where the bot posts its findings and people can react with

Top comments (0)

The Problem That Drove Me Crazy

What I Actually Built

The Data: What Actually Improved

What Went Wrong (And How I Fixed It)

The Critical Pieces You Need

The Cost Breakdown

What I'd Do Differently

Also, get your team on board first. I deployed silently and people were confused. Now I have a #ai-review channel where the bot posts its findings and people can react with

Also, get your team on board first. I deployed silently and people were confused. Now I have a `#ai-review` channel where the bot posts its findings and people can react with