As a backend engineer working on enterprise systems, I spend a lot of time thinking about security — credential rotation, access control, encryption. So when I decided to build a side project, I wanted it to be something that actually solves a real problem I face daily: catching security issues in code before they reach production.
The result: an AI-powered code review tool that automatically analyzes your git diffs and flags vulnerabilities, bugs, and bad practices — in seconds.
Here's exactly how I built it, what broke, and what I learned.
The Problem
Manual code review is slow and inconsistent. You might catch a hardcoded password on a good day and miss a SQL injection vulnerability on a bad one. I wanted a tool that:
- Runs automatically after every commit
- Catches security issues I might miss when tired
- Gives specific, actionable feedback — not generic warnings
The Stack
- Python — for scripting and LLM integration
- Groq API — free LLM API running LLaMA 3.3 70B
- Git — to extract code diffs automatically
- GitHub Actions — to run the pipeline on every push
- Docker — to containerize the whole thing
How It Works
The core idea is simple:
- Run git diff HEAD~1 HEAD to get the latest code changes
- Send the raw diff to an LLM with a structured prompt
- Print the AI's feedback directly in the terminal
import subprocess
from groq import Groq

def get_diff():
    # Diff the previous commit against the current one.
    diff = subprocess.check_output(
        ['git', 'diff', 'HEAD~1', 'HEAD'],
        stderr=subprocess.STDOUT
    ).decode('utf-8')
    return diff

def review_code(diff):
    # API_KEY is loaded from .env (see "Keeping the API Key Safe").
    client = Groq(api_key=API_KEY)
    prompt = f"""You are a senior software engineer reviewing a pull request.
Analyze this code diff and provide feedback on:
1. Security vulnerabilities
2. Bugs or logical errors
3. Performance issues
4. Best practices violations
Code diff:
{diff}"""
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024
    )
    # Return only the text of the first completion.
    return response.choices[0].message.content
The prompt engineering here matters more than you'd think. Asking the model to be "specific and actionable" and to "reference exact line changes" produces dramatically better output than a vague "review this code."
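To make that concrete, here is a minimal sketch of a prompt builder that spells those instructions out. The function name build_prompt and the exact wording are assumptions for illustration, not the prompt used in the project:

```python
def build_prompt(diff: str) -> str:
    """Build a review prompt that asks for specific, line-referenced feedback."""
    # The wording below is illustrative; tune it against real diffs.
    return (
        "You are a senior software engineer reviewing a pull request.\n"
        "Be specific and actionable, and reference exact line changes "
        "from the diff for every finding.\n"
        "Group findings under: Security vulnerabilities, Bugs or logical "
        "errors, Performance issues, Best practices violations.\n"
        "If a category has no findings, say so instead of guessing.\n\n"
        f"Code diff:\n{diff}"
    )


if __name__ == "__main__":
    sample = '+password = "admin123"'
    print(build_prompt(sample))
```

Keeping the prompt in its own function makes it easy to iterate on the wording without touching the API call.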
What the Output Looks Like
I tested it on a deliberately vulnerable file containing a hardcoded password and a SQL injection:
def get_user(user_id):
    password = "admin123"
    query = "SELECT * FROM users WHERE id=" + user_id
    return query
The AI caught both immediately:
Security Vulnerabilities:
1. Hardcoded password 'admin123' detected on line 2.
Recommendation: Use environment variables or a secrets manager like HashiCorp Vault.
2. SQL Injection vulnerability on line 3.
The query is constructed via string concatenation with user input.
Recommendation: Use parameterized queries instead.
Exactly what a senior engineer would flag in a real review.
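For reference, the two recommendations could look roughly like this. This is a sketch under assumptions: sqlite3 stands in for whatever database the real code would use, and the DB_PASSWORD variable and the users table layout are hypothetical.

```python
import os
import sqlite3


def get_user(conn: sqlite3.Connection, user_id):
    """Look up a user with a parameterized query instead of concatenation."""
    # The ? placeholder makes the driver treat user_id as data, not SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,))
    return cur.fetchone()


def get_db_password() -> str:
    """Read the secret from the environment rather than from source code."""
    # DB_PASSWORD is a hypothetical variable name chosen for this sketch.
    return os.environ.get("DB_PASSWORD", "")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
    print(get_user(conn, 1))            # the row for id 1
    print(get_user(conn, "1 OR 1=1"))   # the injection attempt matches no row
```

Because the value is bound as a parameter, the injection string is compared as a literal value and never parsed as SQL.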
What Broke Along the Way
Problem 1: The Deprecated Library Trap
I started with google-generativeai for Gemini. First run threw a FutureWarning:
All support for the google.generativeai package has ended.
Please switch to the google.genai package.
Google had deprecated the library mid-project. I switched to google-genai and hit quota limits immediately, so I switched to Groq. Lesson: always check the library's GitHub issues before building on a free API.
Problem 2: Windows BOM Encoding
This one took an hour to debug. My .env file was being read as None despite existing right next to my script.
Turned out Windows Notepad saves files with a hidden BOM (Byte Order Mark), the three bytes EF BB BF, prepended to the file. Python's dotenv library couldn't parse it.
Running Get-Content .env | Format-Hex revealed the culprit immediately. Fix:
[System.IO.File]::WriteAllText(
    "$PWD\.env",
    "GROQ_API_KEY=your-key`n",
    [System.Text.UTF8Encoding]::new($false)
)
The $false parameter explicitly disables BOM. Small thing, easy to miss.
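The same check can be done from Python. The following is a minimal sketch, with strip_utf8_bom as a hypothetical helper name, that detects a leading UTF-8 BOM in a file and rewrites the file without it:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from the file; return True if one was found."""
    p = Path(path)
    data = p.read_bytes()
    # codecs.BOM_UTF8 is the byte sequence EF BB BF.
    if not data.startswith(codecs.BOM_UTF8):
        return False
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling strip_utf8_bom(".env") before loading the file would be one way to guard against this, assuming the script is allowed to modify the file in place.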
Problem 3: Git Diff Returning Binary Files
In Windows PowerShell, redirecting echo output to a file writes UTF-16 by default. Git treats UTF-16 files as binary and won't diff them. So my reviewer was receiving a diff with no actual code in it, and the AI was responding with generic advice.
Binary files a/sample.py and b/sample.py differ
Fix: use Set-Content with explicit UTF-8 encoding instead of echo.
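A guard in the reviewer itself can catch this before any API call is made. The sketch below is an assumption about how that might look, not part of the original script. The name unreviewable_reason is hypothetical, and it rejects the whole diff if any file is reported as binary, which may be stricter than you want for mixed commits:

```python
import re

# Matches the line git prints for files it will not diff as text.
_BINARY_MARKER = re.compile(r"^Binary files .* differ\s*$", re.MULTILINE)


def unreviewable_reason(diff: str):
    """Return a short reason if the diff should not be sent to the model, else None."""
    if not diff.strip():
        return "empty diff"
    if _BINARY_MARKER.search(diff):
        return "git reported a binary file; check the file's encoding"
    return None


if __name__ == "__main__":
    print(unreviewable_reason("Binary files a/sample.py and b/sample.py differ\n"))
```

Printing the reason and exiting early avoids spending tokens on a diff the model cannot say anything useful about.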
Adding a CI/CD Pipeline
Once the core tool worked, I wired the project into GitHub Actions so the build runs automatically on every push to main:
name: CI/CD Pipeline
on:
  push:
    branches: [main]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install groq python-dotenv
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - run: docker build -t yourusername/ai-pr-reviewer:latest .
      - run: docker push yourusername/ai-pr-reviewer:latest
Every commit now automatically builds a fresh Docker image and pushes it to Docker Hub. No manual steps.
Keeping the API Key Safe
Never hardcode API keys. Never. I learned to use a .env file excluded via .gitignore:
# .gitignore
.env
__pycache__/
*.pyc
And load it in code:
from dotenv import load_dotenv
import os
load_dotenv()
API_KEY = os.getenv("GROQ_API_KEY")
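Since os.getenv silently returns None when the variable is missing (as in the BOM problem above), it can help to fail fast with a clear message. A minimal sketch, where require_env is a hypothetical helper name:

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable's value, or raise a clear error if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or the environment")
    return value
```

With this in place, API_KEY = require_env("GROQ_API_KEY") would stop the script at startup instead of failing later inside the API client.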
For GitHub Actions, secrets go in Settings → Secrets and variables → Actions — never in the workflow file itself.
What I'd Add Next
- GitHub PR integration — post review comments directly on pull requests via the GitHub API
- Severity scoring — tag findings as Critical / Warning / Info
- Multi-file support — currently reviews one diff at a time
- Custom rules — let teams define their own coding standards in a config file
Try It Yourself
The full project is on GitHub: github.com/harshit19424/ai-pr-reviewer
Setup takes under 5 minutes:
git clone https://github.com/harshit19424/ai-pr-reviewer.git
cd ai-pr-reviewer
pip install groq python-dotenv
# Add your Groq API key to .env
python reviewer.py
Groq is free — no credit card required. Get your key at console.groq.com.
Key Takeaways
- Prompt engineering matters. Specific instructions produce specific output. Vague prompts produce generic advice.
- Free APIs have hidden limits. Always test quota limits before building.
- Windows encoding is a minefield. Always specify UTF-8 without BOM explicitly.
- CI/CD is not optional. Automating the build and push took 30 minutes and saves time on every future commit.
If you're a backend engineer who hasn't integrated an LLM into a workflow yet — this is the simplest possible starting point. The whole core script is under 50 lines of Python.
I'm a Software Engineer at Jio Platforms working on enterprise security systems. Connect with me on LinkedIn or check out my projects on GitHub.