Every engineer loves writing code (well, most of us do 😉).
Not every engineer loves writing:
- commit messages
- PR descriptions
- testing notes
- release summaries
- sprint updates
In fast-moving teams, these things usually become an afterthought.
And honestly, it shows.
We’ve all seen commits like:
fixed the bug
changes into MyView file
final-final-fix
ui updates on list view
Or some PR descriptions that simply say:
“Please review.”
The bigger the codebase gets, the worse this problem becomes.
In modular enterprise applications with multiple teams working in parallel, poor PR communication slows reviews, increases confusion, and creates release risks.
So I started building something for myself:
An AI-powered Git assistant that could understand code changes and automatically generate:
- meaningful commit messages
- structured PR summaries
- testing notes
- risk indicators
- release notes
And I wanted it to work offline.
Why I Built It
This started as a tiny productivity experiment.
I simply wanted:
- cleaner commit messages
- less repetitive writing
- faster PR creation
But after using it for a few weeks, I realized something interesting:
The real value wasn’t automation.
It was reducing cognitive overhead.
After spending hours solving architectural or UI problems, context-switching into documentation mode becomes mentally expensive.
The assistant helped bridge that gap.
High-Level Architecture
The workflow is actually pretty straightforward.
Git Diff
↓
File Analysis
↓
Context Extraction
↓
Prompt Generation
↓
Local LLM
↓
Commit + PR Output
The difficult part was making the outputs:
- concise
- trustworthy
- reviewer-friendly
- non-robotic
Extracting Git Changes
The first version simply passed the raw git diff directly into the model.
That worked terribly.
Large diffs:
- exceeded context windows
- produced noisy summaries
- generated inaccurate PR descriptions
So I added preprocessing layers.
Example:
import subprocess

# Read only the staged changes
diff = subprocess.check_output(
    ["git", "diff", "--cached"],
    text=True
)
print(diff[:1000])
But raw diffs were too noisy.
So the assistant started (a rough sketch follows this list):
- grouping changes by module
- ignoring formatting-only modifications
- detecting API changes
- identifying added vs removed logic
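Here is a simplified illustration of the grouping step, assuming a repository layout where the top-level directory doubles as the module name. The helper below is a sketch, not the exact implementation:

import subprocess
from collections import defaultdict

def group_diff_by_module(diff_text):
    # Each file's changes begin with a "diff --git a/<path> b/<path>" header.
    # Assumes paths without spaces; top-level directory stands in for "module".
    groups = defaultdict(list)
    for file_diff in diff_text.split("diff --git ")[1:]:
        header = file_diff.splitlines()[0]   # e.g. "a/Sources/Auth/LoginView.swift b/Sources/Auth/LoginView.swift"
        path = header.split(" ")[0][2:]      # strip the leading "a/"
        module = path.split("/")[0]
        groups[module].append("diff --git " + file_diff)
    return groups

diff = subprocess.check_output(["git", "diff", "--cached"], text=True)
for module, file_diffs in group_diff_by_module(diff).items():
    print(f"{module}: {len(file_diffs)} changed file(s)")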
Filtering Noise
One surprisingly important improvement was removing low-signal changes.
For example:
def should_ignore(line):
    ignored_patterns = [
        "import ",
        "swiftlint",
        "whitespace"
    ]
    return any(p in line for p in ignored_patterns)
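Applied line by line to the diff, it looks roughly like this (a sketch; diff is the staged diff text from earlier):

high_signal_diff = "\n".join(
    line for line in diff.splitlines()
    if not should_ignore(line)
)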
This dramatically improved the quality of generated summaries.
Prompt Engineering Was Harder Than Expected
One thing I underestimated was how sensitive outputs were to prompting.
A vague prompt generated vague PRs.
An overly detailed prompt generated essays nobody wanted to read.
Eventually I settled on prompts focused on:
- behavioral changes
- architectural impact
- reviewer clarity
- testing implications
Example:
prompt = f"""
You are an experienced software engineer reviewing a pull request.
Generate:
1. concise commit message
2. PR summary
3. testing notes
4. possible risks
Ignore formatting-only changes.
Git Diff:
{processed_diff}
"""
The single line:
Ignore formatting-only changes
improved results massively.
Running AI Offline
This became the most interesting part of the project.
I specifically wanted:
- local-first workflows
- zero cloud dependency
- privacy for enterprise repositories
- low latency
- no API costs
Sending proprietary diffs to external APIs was something I wanted to avoid entirely.
So I experimented with:
- Ollama
- llama.cpp
- quantized local models
- Apple Silicon optimizations
Calling the local model was surprisingly simple:
import requests
response = requests.post(
"http://localhost:11434/api/generate",
json={
"model": "mistral",
"prompt": prompt,
"stream": False
}
)
print(response.json()["response"])
For commit generation and PR summaries, smaller local models were often more than sufficient.
The Most Difficult Problems
The difficult part wasn’t generating text.
It was generating trustworthy text.
Hallucinated Features
Sometimes the model inferred functionality that didn’t exist, especially when refactors looked similar to feature additions.
To reduce this (a small prompt tweak is sketched after this list):
- prompts became shorter
- context became more constrained
- diffs were chunked intelligently
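For the constrained-context part, one tweak that illustrates the idea (a hedged example, not the exact prompt I use) is spelling the restriction out explicitly:

# Appended to the prompt built earlier
prompt += """
Only describe behavior that is visible in the diff.
If the purpose of a change is unclear, say so instead of guessing.
"""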
Huge Enterprise Diffs
Large modular repositories create massive PRs.
Passing entire diffs into the model quickly became inefficient.
So I added:
- chunking
- module prioritization
- high-signal file detection
Example:
MAX_CHUNK_SIZE = 4000

chunks = [
    diff[i:i + MAX_CHUNK_SIZE]
    for i in range(0, len(diff), MAX_CHUNK_SIZE)
]
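Naive fixed-size chunking can split a file’s hunks in half, so a slightly smarter variant under the same size budget splits on file boundaries first. The helper below is an illustrative sketch:

def chunk_by_file(diff_text, max_size=4000):
    # Split on per-file "diff --git" headers, then pack whole files into chunks.
    file_diffs = ["diff --git " + d for d in diff_text.split("diff --git ")[1:]]
    chunks, current = [], ""
    for file_diff in file_diffs:
        if current and len(current) + len(file_diff) > max_size:
            chunks.append(current)
            current = ""
        # A single file larger than max_size still becomes its own oversized chunk;
        # that case would need hunk-level splitting.
        current += file_diff
    if current:
        chunks.append(current)
    return chunks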
Robotic Language
Early outputs sounded overly AI-generated.
Too many phrases like:
- “enhanced functionality”
- “optimized architecture”
- “improved user experience”
Real engineers don’t write like that.
So prompts were tuned toward:
- concise engineering tone
- direct wording
- reviewer readability
Features That Became Surprisingly Useful
The project slowly evolved beyond commit generation.
PR Risk Detection
The assistant flags (a simplified heuristic follows this list):
- shared module changes
- navigation flow modifications
- authentication updates
- API contract changes
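A stripped-down version of that heuristic, assuming path and symbol patterns specific to my codebase (the patterns below are illustrative):

import subprocess

RISK_PATTERNS = {
    "shared module": ["Shared/", "Common/"],
    "navigation": ["Coordinator", "Router"],
    "authentication": ["Auth", "Session", "Token"],
    "API contract": ["APIClient", "Endpoints"],
}

def detect_risks(changed_paths):
    risks = set()
    for path in changed_paths:
        for label, patterns in RISK_PATTERNS.items():
            if any(p in path for p in patterns):
                risks.add(label)
    return sorted(risks)

changed_paths = subprocess.check_output(
    ["git", "diff", "--cached", "--name-only"], text=True
).splitlines()
print(detect_risks(changed_paths))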
Automatic Testing Notes
Example output:
Tested:
- Login flow
- Session recovery
- Token refresh handling
- Deep link navigation
This alone ended up saving a surprising amount of time.
Release Notes Generation
The assistant can summarize:
- bug fixes
- user-facing improvements
- technical refactors
directly from merged commits, as sketched below.
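A minimal sketch of that step, assuming notes are generated for everything since the last release tag (the tag name and prompt wording here are illustrative); the resulting prompt goes through the same local model call shown earlier:

import subprocess

# Merged PR subjects since the last release tag (example tag name)
commits = subprocess.check_output(
    ["git", "log", "v1.4.0..HEAD", "--merges", "--pretty=%s"],
    text=True
)

release_prompt = f"""
Summarize these merged pull requests as release notes.
Group them into: bug fixes, user-facing improvements, technical refactors.

{commits}
"""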
Example Output
Before
fixed auth issue
After
Refactor authentication recovery flow to support token refresh handling during session expiration
What I Learned
The biggest realization from this project was:
AI works best when augmenting engineering workflows, not replacing engineering decisions.
The assistant is not writing code for me.
It is removing repetitive cognitive work around:
- communication
- formatting
- summarization
- workflow overhead
And that turns out to be incredibly valuable.
Future Improvements
There’s still a lot I want to explore:
- Xcode integration
- Git hook automation
- reviewer suggestions
- Jira linking
- architecture drift detection
- PR quality scoring
- Slack release summaries
I also want to experiment with embedding-based code understanding for better long-term context awareness.
Final Thoughts
Building AI tooling for toy projects is easy.
Building AI tooling that works reliably in large modular enterprise repositories is a completely different challenge.
But that’s also what makes it exciting.
What started as a small commit-message helper slowly evolved into a developer workflow copilot that now saves me time almost every day.
And honestly, this feels like just the beginning of what local AI can do for software engineering workflows.


