Harshit Singh

Posted on May 1

I Built an AI-Powered Code Reviewer in Python (And What Broke Along the Way)

#ai #python #devops #opensource

As a backend engineer working on enterprise systems, I spend a lot of time thinking about security — credential rotation, access control, encryption. So when I decided to build a side project, I wanted it to be something that actually solves a real problem I face daily: catching security issues in code before they reach production.

The result: an AI-powered code review tool that automatically analyzes your git diffs and flags vulnerabilities, bugs, and bad practices — in seconds.

Here's exactly how I built it, what broke, and what I learned.

The Problem

Manual code review is slow and inconsistent. You might catch a hardcoded password on a good day and miss a SQL injection vulnerability on a bad one. I wanted a tool that:

Runs automatically after every commit
Catches security issues I might miss when tired
Gives specific, actionable feedback — not generic warnings

The Stack

Python — for scripting and LLM integration
Groq API — free LLM API running LLaMA 3.3 70B
Git — to extract code diffs automatically
GitHub Actions — to run the pipeline on every push
Docker — to containerize the whole thing

How It Works

The core idea is simple:

Run git diff HEAD~1 HEAD to get the latest code changes
Send the raw diff to an LLM with a structured prompt
Print the AI's feedback directly in the terminal

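import os
import subprocess

from dotenv import load_dotenv
from groq import Groq

load_dotenv()
API_KEY = os.getenv("GROQ_API_KEY")

def get_diff():
    # Changes introduced by the latest commit (HEAD vs. its parent).
    # stderr is not merged into the output, so git error text never reaches the prompt.
    return subprocess.check_output(
        ['git', 'diff', 'HEAD~1', 'HEAD']
    ).decode('utf-8', errors='replace')

def review_code(diff):
    client = Groq(api_key=API_KEY)
    prompt = f"""You are a senior software engineer reviewing a pull request.
Analyze this code diff and provide feedback on:
1. Security vulnerabilities
2. Bugs or logical errors
3. Performance issues
4. Best practices violations

Be specific and actionable, and reference the exact line changes.

Code diff:
{diff}"""

    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    diff = get_diff()
    if diff.strip():
        # Print the AI's feedback directly in the terminal.
        print(review_code(diff))
    else:
        print("No changes to review.")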

The prompt engineering here matters more than you'd think. Asking the model to be "specific and actionable" and to "reference exact line changes" produces dramatically better output than a vague "review this code."
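One way to push this further is to spell out the rules and a fixed per-finding format in the prompt itself. The sketch below is illustrative only: the `build_prompt` helper and its exact wording are hypothetical, not the prompt used in the repo.

```python
def build_prompt(diff: str) -> str:
    """Build a review prompt that asks for specific, line-referenced findings.

    Hypothetical helper: the wording is an illustration, not the repo's prompt.
    """
    categories = [
        "Security vulnerabilities",
        "Bugs or logical errors",
        "Performance issues",
        "Best practices violations",
    ]
    # Number the categories so the model mirrors that structure in its answer.
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(categories, start=1))
    return (
        "You are a senior software engineer reviewing a pull request.\n"
        "Analyze this code diff and report findings under these headings:\n"
        f"{numbered}\n\n"
        "Rules:\n"
        "- Be specific and actionable; no generic advice.\n"
        "- Reference the exact changed line for every finding.\n"
        "- For each finding, state the problem and a concrete recommendation.\n"
        "- If a category has no findings, write 'None found.'\n\n"
        f"Code diff:\n{diff}"
    )


if __name__ == "__main__":
    print(build_prompt("+password = 'admin123'"))
```

The returned string can be passed as the `content` of the user message in `review_code()` in place of the inline f-string.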

What the Output Looks Like

I tested it on a deliberately vulnerable file containing a hardcoded password and a SQL injection:

def get_user(user_id):
    password = "admin123"
    query = "SELECT * FROM users WHERE id=" + user_id
    return query

The AI caught both immediately:

Security Vulnerabilities:
1. Hardcoded password 'admin123' detected on line 2.
   Recommendation: Use environment variables or a secrets manager like HashiCorp Vault.

2. SQL Injection vulnerability on line 3.
   The query is constructed via string concatenation with user input.
   Recommendation: Use parameterized queries instead.

Exactly what a senior engineer would flag in a real review.
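For reference, here is roughly what the model's "parameterized queries" recommendation could look like with Python's built-in sqlite3 module. The in-memory database and the `users` table are made up for the example. The `?` placeholder makes the driver pass `user_id` as a bound value, so input like `1 OR 1=1` cannot change the shape of the query.

```python
import sqlite3


def get_user(conn: sqlite3.Connection, user_id: str):
    # The ? placeholder sends user_id as a bound value, never as SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,))
    return cur.fetchone()


if __name__ == "__main__":
    # Throwaway in-memory database, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

    print(get_user(conn, "1"))         # a normal lookup
    print(get_user(conn, "1 OR 1=1"))  # injection attempt is treated as a plain value
```

The injection attempt simply matches no row instead of returning the whole table.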

What Broke Along the Way

Problem 1: The Deprecated Library Trap

I started with google-generativeai for Gemini. First run threw a FutureWarning:

All support for the google.generativeai package has ended.
Please switch to the google.genai package.

Google had deprecated the library mid-project. I switched to google-genai, immediately hit quota limits, and finally moved to Groq. Lesson: always check the library's GitHub issues before building on a free API.

Problem 2: Windows BOM Encoding

This one took an hour to debug. os.getenv("GROQ_API_KEY") kept returning None, even though the .env file sat right next to my script.

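It turned out Windows Notepad had saved the file with a hidden BOM (byte order mark): the bytes EF BB BF at the very start of the file, which show up as ï»¿ when decoded as Windows-1252. python-dotenv treated those bytes as part of the first line, so GROQ_API_KEY was never set.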

Dumping the raw bytes with Format-Hex -Path .env revealed the culprit immediately. Fix:

[System.IO.File]::WriteAllText(
  "$PWD\.env",
  "GROQ_API_KEY=your-key`n",
  [System.Text.UTF8Encoding]::new($false)
)

The $false argument tells the UTF8Encoding constructor not to emit a BOM. Small thing, easy to miss.
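If you'd rather handle this from the Python side, here is a minimal stdlib-only sketch that checks for the three UTF-8 BOM bytes and rewrites the file without them. The `strip_utf8_bom` helper is hypothetical, not part of the project. As an alternative, python-dotenv's `load_dotenv` accepts an `encoding` argument, and passing `encoding="utf-8-sig"` should make it skip a leading BOM (worth verifying against your installed version).

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from a file in place.

    Returns True if a BOM was found and removed, False otherwise.
    Hypothetical helper, not part of the original project.
    """
    file = Path(path)
    data = file.read_bytes()
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        file.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False


if __name__ == "__main__":
    # Only touch .env if it exists in the current directory.
    if Path(".env").exists() and strip_utf8_bom(".env"):
        print("Removed BOM from .env")
```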

Problem 3: Git Diff Returning Binary Files

In Windows PowerShell 5.1, redirecting echo output to a file with > writes UTF-16LE by default. Git treats UTF-16 files as binary and won't produce a textual diff for them, so my reviewer received a diff with no actual code in it, and the AI responded with generic advice:

Binary files a/sample.py and b/sample.py differ

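Fix: write files with Set-Content and an explicit -Encoding UTF8 instead of redirecting echo output. (In Windows PowerShell 5.1 this still adds a UTF-8 BOM, but Git diffs such files as text.)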
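Because this failure is silent, the reviewer could also check the diff before sending it and warn about files git treated as binary. A minimal sketch, assuming git's default `Binary files a/X and b/Y differ` message and paths without embedded " and " sequences; the `binary_files_in_diff` name is hypothetical:

```python
import re

# git prints this line instead of a textual diff when it treats a file as binary.
BINARY_LINE = re.compile(r"^Binary files (.+) and (.+) differ$", re.MULTILINE)


def binary_files_in_diff(diff: str) -> list[str]:
    """Return the paths that git reported as binary in a diff."""
    paths = []
    for old, new in BINARY_LINE.findall(diff):
        # For deleted files the new side is /dev/null, so fall back to the old path.
        path = new if new != "/dev/null" else old
        # Drop git's a/ and b/ prefixes.
        paths.append(path[2:] if path[:2] in ("a/", "b/") else path)
    return paths


if __name__ == "__main__":
    sample = (
        "diff --git a/sample.py b/sample.py\n"
        "Binary files a/sample.py and b/sample.py differ\n"
    )
    skipped = binary_files_in_diff(sample)
    if skipped:
        print("Not reviewable (treated as binary by git):", ", ".join(skipped))
```

Calling this on the output of `get_diff()` before `review_code()` would have surfaced the UTF-16 problem immediately instead of producing generic advice.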

Adding a CI/CD Pipeline

Once the core tool worked, I added a GitHub Actions workflow that runs on every push to main:

name: CI/CD Pipeline

on:
  push:
    branches: [main]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install groq python-dotenv
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - run: docker build -t yourusername/ai-pr-reviewer:latest .
      - run: docker push yourusername/ai-pr-reviewer:latest

Every push to main now builds a fresh Docker image and pushes it to Docker Hub. No manual steps.
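The workflow above builds and publishes the image; it doesn't run the reviewer itself. If you add a step that does, note that actions/checkout fetches a single commit by default (fetch-depth: 1), so HEAD~1 won't exist and `git diff HEAD~1 HEAD` fails; setting `fetch-depth: 2` on the checkout step avoids that. A small guard, sketched here with a hypothetical `has_parent_commit` helper, can turn that failure into a readable message:

```python
import subprocess


def has_parent_commit(cwd=None) -> bool:
    """Return True if HEAD~1 resolves in the repository at cwd.

    In a depth-1 clone (the actions/checkout default) or on a repo's first
    commit, HEAD has no parent available locally, so this returns False.
    Hypothetical helper, not part of the original project.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--verify", "--quiet", "HEAD~1"],
            cwd=cwd,
            capture_output=True,
        )
    except FileNotFoundError:  # git is not installed
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if has_parent_commit():
        print("HEAD~1 found; safe to run git diff HEAD~1 HEAD")
    else:
        print("HEAD~1 not found (shallow clone or first commit); try fetch-depth: 2")
```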

Keeping the API Key Safe

Never hardcode API keys. Never. I learned to use a .env file excluded via .gitignore:

# .gitignore
.env
__pycache__/
*.pyc

And load it in code:

from dotenv import load_dotenv
import os

load_dotenv()
API_KEY = os.getenv("GROQ_API_KEY")
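Since a missing or misread .env silently gives you None (see Problem 2 above), it can help to fail fast with a readable error instead of a confusing authentication failure later. A minimal sketch; `require_env` is a hypothetical helper, and it only reads os.environ, so call it after load_dotenv():

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise RuntimeError with a hint."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Check that .env exists, is saved as UTF-8 "
            "without a BOM, and that load_dotenv() ran first."
        )
    return value
```

Usage would be `API_KEY = require_env("GROQ_API_KEY")` in place of the bare `os.getenv` call.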

For GitHub Actions, secrets go in Settings → Secrets and variables → Actions — never in the workflow file itself.

What I'd Add Next

GitHub PR integration — post review comments directly on pull requests via the GitHub API
Severity scoring — tag findings as Critical / Warning / Info
Multi-file support — currently reviews one diff at a time
Custom rules — let teams define their own coding standards in a config file
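For the multi-file idea, one possible approach is to split the diff on git's `diff --git` headers and review each file separately, which also keeps each prompt small. A rough sketch; `split_diff_by_file` is hypothetical and assumes paths without spaces:

```python
import re

# Each file section in `git diff` output starts with a header like:
# diff --git a/path/to/file.py b/path/to/file.py
HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def split_diff_by_file(diff: str) -> dict[str, str]:
    """Map each changed file's new path to its own section of the diff."""
    matches = list(HEADER.finditer(diff))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next header, or to the end of the diff.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(diff)
        sections[match.group(2)] = diff[match.start():end]
    return sections


if __name__ == "__main__":
    sample = (
        "diff --git a/app.py b/app.py\n+print('a')\n"
        "diff --git a/db.py b/db.py\n+print('b')\n"
    )
    for path, section in split_diff_by_file(sample).items():
        # In the real tool, each section could be sent to review_code() separately.
        print(path, len(section.splitlines()), "lines")
```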

Try It Yourself

The full project is on GitHub: github.com/harshit19424/ai-pr-reviewer

Setup takes under 5 minutes:

git clone https://github.com/harshit19424/ai-pr-reviewer.git
cd ai-pr-reviewer
pip install groq python-dotenv
# Add your Groq API key to .env
python reviewer.py

Groq has a free tier, and no credit card is required. Get your key at console.groq.com.

Key Takeaways

Prompt engineering matters. Specific instructions produce specific output. Vague prompts produce generic advice.
Free APIs have tight limits. Always check rate limits and quotas before you build on one.
Windows encoding is a minefield. Always specify UTF-8 without BOM explicitly.
CI/CD is not optional. Automating the build and push took 30 minutes and saves time on every future commit.

If you're a backend engineer who hasn't integrated an LLM into a workflow yet — this is the simplest possible starting point. The whole core script is under 50 lines of Python.

I'm a Software Engineer at Jio Platforms working on enterprise security systems. Connect with me on LinkedIn or check out my projects on GitHub.

DEV Community