I got tired of writing pull request descriptions. Every single PR needs a summary of what changed, why, how to test it. And no matter how disciplined I tried to be, I'd either rush it or forget details. So I thought: "Let's automate this with AI."
What followed was a rabbit hole of API keys, local models, and false starts. Here's what I learned.
The dream: one curl and done
I imagined a Git hook that runs after I create a PR, feeds the diff to an LLM, and auto-generates a description. Simple, right? I started with OpenAI's API because it's the obvious choice.
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_pr_description(diff_text):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a senior developer. Summarize the following git diff as a PR description. Focus on intent, changes, and testing notes."},
            {"role": "user", "content": diff_text},
        ],
    )
    return response.choices[0].message.content
```
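The hook still needs a diff to feed in. Here is a minimal sketch of how a script might collect it, assuming the PR branch is checked out and targets a base branch named `main` (both assumptions; adjust for your repo). The three-dot form `git diff main...HEAD` diffs against the merge base, so it shows only the changes made on the branch, which is usually what a PR contains.

```python
import subprocess

def branch_diff_command(base: str = "main") -> list[str]:
    """Build the git command for changes on HEAD since it diverged from `base`."""
    # Three-dot syntax compares against the merge base, matching what a PR shows.
    return ["git", "diff", f"{base}...HEAD"]

def get_branch_diff(base: str = "main") -> str:
    """Run git and return the diff text; raises CalledProcessError if git fails."""
    result = subprocess.run(
        branch_diff_command(base),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With both pieces in place, `generate_pr_description(get_branch_diff())` would produce a description for the current branch.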
It worked. The descriptions were actually good. But after a week I noticed a few problems:
- Cost. Even with GPT-3.5, every diff token adds up. I was burning $5-10 a week for a personal project.
- Latency. OpenAI's API calls took 2-5 seconds. Fine for a handful, but annoying when iterating.
- Privacy. I didn't love sending internal code diffs to a third party, even though it's probably fine.
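Since cost scales with tokens, one cheap mitigation is to cap how much of the diff gets sent. The sketch below uses a rough four-characters-per-token heuristic (an approximation, not a real tokenizer; a library such as `tiktoken` would give exact counts), and the 3,000-token budget is an arbitrary placeholder.

```python
def truncate_diff(diff_text: str, max_tokens: int = 3000, chars_per_token: int = 4) -> str:
    """Trim a diff to roughly `max_tokens` tokens, cutting at a line boundary."""
    max_chars = max_tokens * chars_per_token  # crude estimate, not a tokenizer
    if len(diff_text) <= max_chars:
        return diff_text
    # Prefer to cut at the last newline inside the budget so no line is split.
    cut = diff_text.rfind("\n", 0, max_chars)
    if cut == -1:  # no newline in range: fall back to a hard cut
        cut = max_chars
    return diff_text[:cut] + "\n[... diff truncated ...]\n"
```

Truncation trades accuracy for cost: the model never sees the dropped hunks, so a summary of a truncated diff deserves extra scrutiny.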
So I started looking for alternatives.
The local detour
I tried running a smaller model locally with Ollama. The idea was to keep everything on my machine, zero cost per request.
```shell
ollama run codellama:7b
```
I wrote a wrapper that reads the diff and pipes it to the local model:
```python
import subprocess

def local_summarize(diff_text):
    prompt = f"Summarize this diff as a PR description:\n\n{diff_text}"
    result = subprocess.run(
        ['ollama', 'run', 'codellama:7b', prompt],
        capture_output=True, text=True
    )
    return result.stdout.strip()
```
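Shelling out to `ollama run` starts a CLI process per request. Ollama also serves a local HTTP API (by default on `http://localhost:11434`). A minimal standard-library sketch of calling it, assuming the server is running and `codellama:7b` has been pulled; setting `"stream": false` asks `/api/generate` for a single JSON object whose `response` field holds the generated text.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(diff_text: str, model: str = "codellama:7b") -> dict:
    """Payload for Ollama's /api/generate; stream=False returns one JSON object."""
    return {
        "model": model,
        "prompt": f"Summarize this diff as a PR description:\n\n{diff_text}",
        "stream": False,
    }

def local_summarize_http(diff_text: str, timeout: float = 120.0) -> str:
    """POST the diff to a running Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(diff_text)).encode("utf-8")
    req = urllib.request.Request(  # supplying data makes this a POST
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

This won't make a 7B model any faster on 8GB of RAM, but it returns structured JSON and puts an explicit timeout on each call.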
This was a dead end for me. With only 8GB of RAM, my laptop made the model crawl: each response took up to 30 seconds. The small model also hallucinated facts about the code. "Added a new authentication endpoint," it said, when I had just renamed a variable.
I tried quantized versions, larger models, even Mistral. Same story: either too slow or inaccurate. I don't have a GPU at home, so local isn't an option for me until I upgrade my hardware.
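One possible mitigation for the hallucination problem, offered as an untested sketch rather than something from the experiments above, is to ground the prompt: pull the changed file paths out of the `diff --git` headers and tell the model explicitly to stay within them. A small model may still ignore the instruction, but a claim like "new authentication endpoint" becomes easy to spot-check against the file list. The prompt wording here is a hypothetical example.

```python
import re

# Matches headers like: diff --git a/src/app.py b/src/app.py
# (\S+ means paths containing spaces are not handled by this sketch.)
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)

def changed_files(diff_text: str) -> list[str]:
    """Return post-change paths from git diff headers, in order, without duplicates."""
    seen: list[str] = []
    for _old, new in DIFF_HEADER.findall(diff_text):
        if new not in seen:
            seen.append(new)
    return seen

def grounded_prompt(diff_text: str) -> str:
    """Prefix the diff with an explicit file list and an instruction to stay within it."""
    files = "\n".join(f"- {path}" for path in changed_files(diff_text)) or "- (none found)"
    return (
        "Summarize this diff as a PR description. Only describe changes that appear "
        "in the diff; if a change is only a rename or refactor, say so.\n"
        f"Files changed:\n{files}\n\n{diff_text}"
    )
```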
The middle ground: a lightweight, dedicated service
I needed something faster than OpenAI but more accurate than my local experiments. That's when I stumbled on a niche service that fine-tunes models specifically for code tasks: https://ai.interwestinfo.com/. It promised sub-second responses and pay-per-use pricing that wouldn't burn through my wallet.
I was skeptical: another AI wrapper? But the API was refreshingly simple. No chat completions, no system-prompt wizardry, just a /summarize endpoint that takes a diff and returns a structured summary.
```python
import requests

API_URL = "https://ai.interwestinfo.com/api/v1/summarize"
API_KEY = "my-key-here"  # from their dashboard

def summarize_diff(diff_text):
    payload = {
        "diff": diff_text,
        "format": "pr",  # or "changelog", "release_notes"
    }
    headers = {"Authorization": f"Bearer {API_KEY}"}
    response = requests.post(API_URL, json=payload, headers=headers, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Usage
diff = """
+ new_feature(): adds logging for user actions
- old_debug(): removed deprecated function
"""
result = summarize_diff(diff)
print(result['summary'])  # "Added new feature for user action logging; removed deprecated debug function."
```
The speed was impressive: under 500 ms per request. The response included not just the summary but also a checklist of test scenarios and potential risks, which was more useful than plain text.
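Because the response is structured, turning it into a PR body is mostly formatting. A sketch under an explicit assumption about field names: only `summary` appears in the example above, so `tests` and `risks` are hypothetical keys standing in for the checklist and risk list. Check the service's actual response schema before relying on them.

```python
def render_pr_body(result: dict) -> str:
    """Format a structured summary as Markdown. `tests`/`risks` keys are assumptions."""
    lines = ["## Summary", result.get("summary", "").strip() or "_No summary returned._"]
    tests = result.get("tests") or []  # hypothetical key: list of test scenarios
    risks = result.get("risks") or []  # hypothetical key: list of risk notes
    if tests:
        lines += ["", "## How to test"] + [f"- [ ] {item}" for item in tests]
    if risks:
        lines += ["", "## Risks"] + [f"- {item}" for item in risks]
    return "\n".join(lines) + "\n"
```

Rendering test scenarios as `- [ ]` checkboxes lets reviewers tick them off directly in the PR.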
Did it solve all my problems? Not quite. The free tier had a 1,000-request monthly limit, which I hit in two weeks. The paid plan ($10/month for 10k requests) was still cheaper than my OpenAI bill, but it meant committing to a subscription.
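If some of those requests are re-runs on an unchanged diff (likely when iterating on a PR), a cache can stretch the quota. A sketch, assuming an identical diff should get an identical summary: key results by a SHA-256 hash of the diff and only call the service on a miss. The `summarize` argument is any function with the shape of `summarize_diff` above.

```python
import hashlib
from typing import Callable, MutableMapping

def diff_key(diff_text: str) -> str:
    """Stable cache key: SHA-256 of the exact diff text."""
    return hashlib.sha256(diff_text.encode("utf-8")).hexdigest()

def cached_summary(
    diff_text: str,
    summarize: Callable[[str], dict],
    cache: MutableMapping[str, dict],
) -> dict:
    """Call `summarize` only for diffs not already in `cache`."""
    key = diff_key(diff_text)
    if key not in cache:
        cache[key] = summarize(diff_text)  # the only path that spends a request
    return cache[key]
```

A plain dict works within one run; `with shelve.open("pr_summaries") as cache: cached_summary(diff, summarize_diff, cache)` would persist results across runs.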
Trade-offs I actually think about
Every approach has its own set of trade-offs. Here's my honest assessment:
| Approach | Speed | Cost | Privacy | Accuracy |
|---|---|---|---|---|
| OpenAI (GPT-4) | Slow (2-5s) | High (pay per token) | Low (data sent to cloud) | Very high |
| Local (7B) | Very slow (15-30s) | Zero per request | High (local) | Medium |
| Specialized API (Interwest) | Fast (<1s) | Low ($10/mo) | Medium (data sent; vendor claims no logging) | High (for code tasks) |
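Latency figures like these depend heavily on hardware and network, so they're worth re-measuring on your own setup. A small timing harness, assuming each backend is wrapped as a function that takes a diff; it reports the median over a few runs so a single slow call (a cold start, say) doesn't dominate.

```python
import statistics
import time
from typing import Callable

def median_latency(fn: Callable[[str], object], diff_text: str, runs: int = 3) -> float:
    """Call fn(diff_text) `runs` times and return the median wall-clock seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(diff_text)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Running `median_latency(summarize_diff, diff)` and `median_latency(local_summarize, diff)` on the same input puts two backends side by side.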
For me, the specialized service won for now. But I'm keeping an eye on newer small models like Llama 3.2 3B, which might run decently on a laptop one day.
What I'd do differently
If I had to start over, I'd first ask: Do I really need AI for this? Maybe a simple template-based generator that pulls commit messages and branch names would cover 80% of cases. I could have saved myself the integration work.
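For comparison, a sketch of what that no-AI baseline might look like: it turns a branch name and commit subjects into a PR description skeleton. The branch convention (an optional `type/` prefix, words separated by `-` or `_`) and the section headings are assumptions; the commit subjects could come from `git log --format=%s main..HEAD`.

```python
import re

def template_description(branch: str, commit_subjects: list[str]) -> str:
    """Build a PR description from a branch name and commit subjects (no AI involved)."""
    # Assumed convention: optional "type/" prefix, then words separated by - or _.
    name = branch.split("/", 1)[-1]
    words = re.sub(r"[-_]+", " ", name).strip()
    # Uppercase only the first character so ticket IDs like JIRA keep their case.
    title = (words[:1].upper() + words[1:]) or "Untitled change"
    changes = [f"- {s.strip()}" for s in commit_subjects if s.strip()]
    sections = [
        f"## {title}",
        "",
        "### Changes",
        *(changes or ["- (no commits found)"]),
        "",
        "### How to test",
        "- [ ] TODO: describe manual or automated checks",
    ]
    return "\n".join(sections) + "\n"
```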
Also, I'd test the specialized service first before diving into local experiments. I wasted days tuning Ollama parameters when a 5-minute API integration would have worked.
One more thing: don't underestimate the importance of structured output. A plain-text paragraph is fine, but a JSON response with sections like changes, impact, and testing makes the result actually usable in automation.
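Structured output only helps automation if it's checked before use. A minimal validator, assuming the reply is a JSON object with `changes`, `impact`, and `testing` keys (the section names above; the exact schema is your choice).

```python
import json

REQUIRED_SECTIONS = ("changes", "impact", "testing")  # assumed schema

def missing_sections(data: dict) -> list[str]:
    """List required sections that are absent or empty."""
    return [key for key in REQUIRED_SECTIONS if not data.get(key)]

def parse_pr_json(raw: str) -> dict:
    """Parse a JSON reply and fail loudly if it isn't a complete PR object."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = missing_sections(data)
    if missing:
        raise ValueError(f"missing or empty sections: {', '.join(missing)}")
    return data
```

Since `json.JSONDecodeError` subclasses `ValueError`, a caller can catch `ValueError` once and fall back to a manual description for both malformed and incomplete replies.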
Final thoughts
My PR description workflow now is: I write a quick draft manually (because I still understand the code better than any model), then I run the diff through the summarizer to catch anything I missed. It's a collaboration, not a replacement.
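The "catch anything I missed" step can be partly mechanical. A sketch, assuming the summarizer's suggested test items arrive as a list of strings (the list shape is an assumption): append only the suggestions whose text doesn't already appear in the hand-written draft, under a clearly labeled heading so reviewers can tell which parts came from the model. Substring matching is deliberately crude here.

```python
def merge_suggestions(draft: str, suggestions: list[str]) -> str:
    """Append AI-suggested items missing from the draft, under a labeled heading."""
    lowered = draft.lower()
    new_items = [
        s.strip() for s in suggestions
        if s.strip() and s.strip().lower() not in lowered  # skip items already covered
    ]
    if not new_items:
        return draft  # the draft already covers every suggestion
    extra = "\n".join(f"- [ ] {item}" for item in new_items)
    return f"{draft.rstrip()}\n\n### Suggested by the summarizer (review before merging)\n{extra}\n"
```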
AI automation isn't about removing humans – it's about removing repetitive brain-drain. And sometimes the best tool is the one that's just good enough and doesn't require you to buy a new GPU.
What's your setup for code documentation? Are you using local models, cloud APIs, or just raw willpower?