I got tired of writing pull request descriptions. Every single PR needs a summary of what changed, why, and how to test it. And no matter how disciplined I tried to be, I'd either rush it or forget details. So I thought: "Let's automate this with AI."
What followed was a rabbit hole of API keys, local models, and false starts. Here's what I learned.
The dream: one curl and done
I imagined a hook that runs right after I open a PR, feeds the diff to an LLM, and auto-generates a description. Simple, right? I started with OpenAI's API because it's the obvious choice.
from openai import OpenAI

# Requires openai>=1.0; the client reads OPENAI_API_KEY from the environment.
client = OpenAI()

def generate_pr_description(diff_text):
    """Ask the model to turn a git diff into a PR description."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a senior developer. Summarize the following git diff as a PR description. Focus on intent, changes, and testing notes."},
            {"role": "user", "content": diff_text},
        ],
    )
    return response.choices[0].message.content
It worked. The descriptions were actually good. But after a week I noticed a few problems:
- Cost. Even with GPT-3.5, every diff token adds up. I was burning $5-10 a week for a personal project.
- Latency. OpenAI's API calls took 2-5 seconds. Fine for a handful, but annoying when iterating.
- Privacy. I didn't love sending internal code diffs to a third party, even though it's probably fine.
So I started looking for alternatives.
The local detour
I tried running a smaller model locally with Ollama. The idea was to keep everything on my machine, zero cost per request.
ollama run codellama:7b
I wrote a wrapper that reads the diff and pipes it to the local model:
import subprocess

def local_summarize(diff_text):
    prompt = f"Summarize this diff as a PR description:\n\n{diff_text}"
    result = subprocess.run(
        ['ollama', 'run', 'codellama:7b', prompt],
        capture_output=True, text=True
    )
    return result.stdout.strip()
This was a dead end for me. My laptop's 8GB RAM made the model crawl – each response took 30 seconds. The small model also hallucinated facts about the code. "Added a new authentication endpoint," it said, when all I had done was rename a variable.
I tried quantized versions, larger models, even Mistral. Same story: either too slow or inaccurate. I don't have a GPU at home. Local is not an option for me until I upgrade hardware.
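If you want to check these latency numbers on your own hardware, a small timing wrapper is enough. This is a sketch, not a proper benchmark: `timed` and `median_latency` are names I made up, and they work with any summarizer function that takes the diff text as its argument.

```python
import statistics
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def median_latency(fn, diff_text, runs=5):
    """Run fn on the same diff several times and return the median time."""
    samples = [timed(fn, diff_text)[1] for _ in range(runs)]
    return statistics.median(samples)
```

For example, `median_latency(local_summarize, some_diff, runs=3)` would give one number per model to compare. The median is less sensitive than the mean to a slow first call while the model loads.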
The middle ground: a lightweight, dedicated service
I needed something faster than OpenAI but more accurate than my local experiments. That's when I stumbled on a niche service that offers models fine-tuned specifically for code tasks: https://ai.interwestinfo.com/. It promised sub-second responses and a pay-per-use model that wouldn't burn a hole in my wallet.
I was skeptical – another AI wrapper? But the API was refreshingly simple. No chat completions, no system prompt wizardry. They had a /summarize endpoint that expected a diff and returned a structured summary.
import requests

API_URL = "https://ai.interwestinfo.com/api/v1/summarize"
API_KEY = "my-key-here"  # from their dashboard

def summarize_diff(diff_text):
    payload = {
        "diff": diff_text,
        "format": "pr",  # or "changelog", "release_notes"
    }
    headers = {"Authorization": f"Bearer {API_KEY}"}
    # A timeout keeps a stalled request from hanging the whole workflow.
    response = requests.post(API_URL, json=payload, headers=headers, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()

# Usage
diff = """
+ new_feature(): adds logging for user actions
- old_debug(): removed deprecated function
"""
result = summarize_diff(diff)
print(result['summary'])  # "Added new feature for user action logging; removed deprecated debug function."
The speed was impressive – under 500ms per request. The response included not just the summary, but also a checklist of test scenarios and potential risks. That was smarter than plain text.
Did it solve all my problems? Not quite. The free tier had a 1000-request limit per month, which I hit in two weeks. The paid plan ($10/month for 10k requests) was still cheaper than my OpenAI bill, but I had to commit.
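One way to stretch a request quota is to avoid asking twice about the same diff, which happens a lot when iterating on a PR. Below is a rough sketch of a disk cache keyed by a SHA-256 hash of the diff. `cached_summarize` and the cache directory are hypothetical, and `summarize` stands for any function that takes the diff text and returns a JSON-serializable result.

```python
import hashlib
import json
from pathlib import Path

def cached_summarize(diff_text, summarize, cache_dir=".pr_summary_cache"):
    """Return a cached summary for this exact diff, calling summarize() only on a miss."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    # Identical diffs hash to the same key, so they reuse the same file.
    key = hashlib.sha256(diff_text.encode("utf-8")).hexdigest()
    entry = cache_path / f"{key}.json"

    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))

    result = summarize(diff_text)
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Any change to the diff, even whitespace, produces a new key, so this only saves requests for exact repeats.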
Trade-offs I actually think about
Every approach has its own set of trade-offs. Here's my honest assessment:
| Approach | Speed | Cost | Privacy | Accuracy |
|---|---|---|---|---|
| OpenAI (GPT-4) | Slow (2-5s) | High (pay per token) | Low (data sent to cloud) | Very high |
| Local (7B) | Very slow (15-30s) | Zero (free) | High (local) | Medium |
| Specialized API (Interwest) | Fast (<1s) | Low ($10/mo) | Medium (data sent but claims no logging) | High (for code tasks) |
For me, the specialized service won for now. But I'm keeping an eye on newer small models like Llama 3.2 3B, which might run decently on a laptop one day.
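To put the cost column on one scale, here is the arithmetic using only the figures from this post: $5-10 a week on OpenAI, and $10 a month for 10k requests on the paid plan. The function names are just for illustration.

```python
WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month

def weekly_to_monthly(weekly_usd):
    """Convert a weekly spend into an average monthly spend."""
    return weekly_usd * WEEKS_PER_MONTH

def cost_per_request(monthly_usd, requests_per_month):
    """Effective price of one request under a flat monthly plan."""
    return monthly_usd / requests_per_month

openai_low = weekly_to_monthly(5)
openai_high = weekly_to_monthly(10)
print(f"OpenAI: ${openai_low:.2f}-${openai_high:.2f} per month")

# Flat plan: the effective per-request price depends on how much quota is used.
print(f"Paid plan at 10,000 requests: ${cost_per_request(10, 10_000):.4f} each")
print(f"Paid plan at 2,000 requests: ${cost_per_request(10, 2_000):.4f} each")
```

So my OpenAI spend worked out to roughly $22 to $43 a month against a flat $10. The 2,000-request line reflects my own pace of about 1000 requests every two weeks, where the effective price is half a cent per request rather than a tenth of a cent.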
What I'd do differently
If I had to start over, I'd first ask: Do I really need AI for this? Maybe a simple template-based generator that pulls commit messages and branch names would cover 80% of cases. I could have saved myself the integration work.
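For comparison, this is roughly what that no-AI version could look like. It's a sketch under a few assumptions: the base branch is `main`, branch names look like `feature/add-user-logging`, and the section layout is one I picked. All function names are hypothetical.

```python
import subprocess

def _git(*args):
    """Run a git command and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

def current_branch():
    return _git("rev-parse", "--abbrev-ref", "HEAD")

def commit_subjects(base="main"):
    """Subject lines of the commits on HEAD that are not on base."""
    out = _git("log", "--format=%s", f"{base}..HEAD")
    return [line for line in out.splitlines() if line]

def template_description(branch, commits):
    """Build a PR description from the branch name and commit subjects."""
    # Use the last path segment of the branch as a human-readable title.
    slug = branch.split("/")[-1].replace("-", " ").replace("_", " ").strip()
    title = slug[:1].upper() + slug[1:]

    lines = [f"## {title}", "", "### Changes"]
    if commits:
        lines.extend(f"- {subject}" for subject in commits)
    else:
        lines.append("- (no commits found)")
    lines += ["", "### Testing", "- [ ] TODO: describe how to test"]
    return "\n".join(lines)
```

Running `template_description(current_branch(), commit_subjects())` inside a repo gives a skeleton to edit. It only works as well as the commit messages it's fed, which is probably where the missing 20% comes from.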
Also, I'd try the specialized service before diving into local experiments. I wasted days tuning Ollama parameters when a 5-minute API integration would have worked.
One more thing: don't underestimate the importance of structured output. A plain-text paragraph is fine, but a JSON response with sections like changes, impact, and testing makes the result actually usable in automation.
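To show why the structure matters, here is a sketch that turns such a response into a PR body. The schema is an assumption on my part: I'm treating `changes`, `impact`, and `testing` as keys that hold either a string or a list of strings, which is not a documented format of any service mentioned here.

```python
# (key in the JSON response, heading in the PR body); hypothetical schema
SECTIONS = [("changes", "Changes"), ("impact", "Impact"), ("testing", "Testing")]

def render_pr_body(summary):
    """Render a structured summary dict as a sectioned PR description."""
    parts = []
    for key, heading in SECTIONS:
        items = summary.get(key) or []
        if isinstance(items, str):
            items = [items]  # accept a single string as a one-item list
        if not items:
            continue  # skip sections the response did not fill in
        parts.append(f"## {heading}")
        parts.extend(f"- {item}" for item in items)
        parts.append("")  # blank line between sections
    return "\n".join(parts).rstrip() + "\n"
```

With a plain-text paragraph, the same step would mean parsing prose, which is exactly the fragile part I'd want to avoid in a script.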
Final thoughts
My PR description workflow now is: I write a quick draft manually (because I still understand the code better than any model), then I run the diff through the summarizer to catch anything I missed. It's a collaboration, not a replacement.
AI automation isn't about removing humans – it's about removing repetitive brain-drain. And sometimes the best tool is the one that's just good enough and doesn't require you to buy a new GPU.
What's your setup for code documentation? Are you using local models, cloud APIs, or just raw willpower?