DEV Community: Dragon Ha

GemmaDiff: I Built a Local AI Code Reviewer with Gemma 4 That Never Sends Your Code to the Cloud

Dragon Ha — Thu, 14 May 2026 17:26:04 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

GemmaDiff — a command-line tool that reviews your git diffs using Google's Gemma 4 model, running entirely on your local machine. No cloud APIs, no data leaving your laptop, no monthly subscriptions.

$ git add src/auth.py
$ gemmadiff

🔍 GemmaDiff - Local AI Code Review

📝 Reviewing staged changes
📁 Files: src/auth.py
📊 Changes: +23 -5
🤖 Analyzing code...
⏱️  Analysis time: 4.2s

⚠️  Found 1 issue
🟠 #1 [HIGH] security
   📍 src/auth.py:23
   Hardcoded JWT secret key in source code
   💡 Move to environment variable: os.getenv('JWT_SECRET')

The Problem

Every developer knows the drill: you write code, push it, and wait for a cloud-based code review tool to analyze it. But here's the friction I kept hitting:

Privacy: I can't send my client's proprietary code to GitHub Copilot or CodeRabbit
Latency: Waiting 10-30 seconds for a cloud API response breaks my coding flow
Cost: $10-20/month adds up when you're freelancing
Offline: Planes, trains, and terrible WiFi at coffee shops

I wanted something that:

Reviews code as fast as I can type
Works completely offline
Costs nothing
Actually catches real issues (not just style nits)

How I Used Gemma 4

Gemma 4 is the perfect model for this use case. Here's why:

The 256K Context Window is a Game Changer

Code reviews require understanding context. A security vulnerability in auth.py might depend on how config.py handles secrets. With Gemma 4's 256K context window, I can feed in entire diffs — even large PRs with 50+ files — and the model understands the relationships between changes.

# Cap very large diffs at 100,000 characters so the prompt stays a predictable size
if len(diff) > 100000:
    diff = diff[:100000] + "\n\n[... diff truncated ...]"
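Cutting at a fixed character offset can split a hunk in the middle of a line. One possible refinement, sketched here, keeps whole per-file sections until the budget is used up. The name `truncate_diff_by_file` and its `max_chars` default are assumptions for illustration and are not taken from GemmaDiff.

```python
import re


def truncate_diff_by_file(diff: str, max_chars: int = 100_000) -> str:
    """Keep whole per-file sections of a git diff until max_chars is reached."""
    if len(diff) <= max_chars:
        return diff

    # Each file section in `git diff` output starts with a "diff --git" line.
    sections = re.split(r"(?m)^(?=diff --git )", diff)

    kept, used, dropped = [], 0, 0
    for section in sections:
        if not section:
            continue  # re.split yields an empty string before the first section
        if used + len(section) <= max_chars:
            kept.append(section)
            used += len(section)
        else:
            dropped += 1

    note = f"\n[... {dropped} file section(s) omitted to fit the size limit ...]\n"
    return "".join(kept) + note
```

The trailing note tells the model that the diff is incomplete, which the original `[... diff truncated ...]` marker also does.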

The 26B MoE Model Hits the Sweet Spot

I chose the Gemma 4 26B MoE model because:

It only activates 3.8B parameters during inference (fast!)
But it has the knowledge of a 26B parameter model (smart!)
On my MacBook Pro M3, it reviews a typical diff in ~5 seconds

Structured Output with System Prompts

The key to making this work is a carefully crafted system prompt that forces Gemma 4 to output structured JSON:

REVIEW_SYSTEM_PROMPT = """You are a senior code reviewer. Analyze the git diff and provide a structured review.

Respond in JSON format:
{
  "summary": "One-line summary",
  "risk_level": "low|medium|high|critical",
  "issues": [{
    "severity": "critical|high|medium|low|info",
    "category": "security|bug|performance|style|maintainability",
    "file": "filename.py",
    "line": 42,
    "description": "What the issue is",
    "suggestion": "How to fix it"
  }],
  "positive": ["Good practices you noticed"],
  "suggestions": ["General improvement suggestions"]
}"""

This gives me predictable, parseable output that I can format into beautiful terminal output or pipe into CI/CD systems.
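A prompt only requests this shape; it does not guarantee it. A minimal sketch of a check that could run on the parsed response before formatting it, assuming the field names and allowed values from the prompt above (the function name `validate_review` is hypothetical):

```python
ALLOWED_RISK_LEVELS = {"low", "medium", "high", "critical"}
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low", "info"}
ALLOWED_CATEGORIES = {"security", "bug", "performance", "style", "maintainability"}


def _is_allowed(value, allowed: set) -> bool:
    """True when value is a string that appears in the allowed set."""
    return isinstance(value, str) and value in allowed


def validate_review(review: dict) -> list[str]:
    """Return a list of problems found in a parsed review; empty means valid."""
    problems = []
    if not isinstance(review.get("summary"), str):
        problems.append("summary must be a string")
    if not _is_allowed(review.get("risk_level"), ALLOWED_RISK_LEVELS):
        problems.append("risk_level is missing or not an allowed value")

    issues = review.get("issues")
    if not isinstance(issues, list):
        problems.append("issues must be a list")
        return problems

    for index, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"issues[{index}] must be an object")
            continue
        if not _is_allowed(issue.get("severity"), ALLOWED_SEVERITIES):
            problems.append(f"issues[{index}].severity is not an allowed value")
        if not _is_allowed(issue.get("category"), ALLOWED_CATEGORIES):
            problems.append(f"issues[{index}].category is not an allowed value")
        if not isinstance(issue.get("line"), int):
            problems.append(f"issues[{index}].line must be an integer")
    return problems
```

An empty list means the review can be rendered as-is; anything else is a reason to retry the request or fall back to printing the raw text.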

Demo

Basic Usage

# Review staged changes (most common workflow)
python gemmadiff.py

# Review all unstaged changes
python gemmadiff.py --all

# Review a specific commit
python gemmadiff.py --commit abc123

# Review changes vs main branch (for PRs)
python gemmadiff.py --pr

# Use smaller model for faster review
python gemmadiff.py --model gemma4:4b

# Output as JSON for CI/CD integration
python gemmadiff.py --json
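The post does not show how each flag selects a diff. A plausible mapping onto standard git commands might look like the sketch below. It assumes the base branch is named `main`, and the helper names `make_parser`, `build_git_command`, and `get_diff` are hypothetical.

```python
import argparse
import subprocess


def make_parser() -> argparse.ArgumentParser:
    """Command-line flags matching the usage examples above."""
    parser = argparse.ArgumentParser(prog="gemmadiff")
    parser.add_argument("--all", action="store_true", help="review unstaged changes")
    parser.add_argument("--commit", help="review a specific commit")
    parser.add_argument("--pr", action="store_true", help="review changes vs main")
    parser.add_argument("--model", default="gemma4:26b")
    parser.add_argument("--json", action="store_true", help="print raw JSON")
    return parser


def build_git_command(args: argparse.Namespace) -> list[str]:
    """Pick the git command that produces the diff for the chosen mode."""
    if args.commit:
        return ["git", "show", args.commit]      # commit message plus its patch
    if args.pr:
        return ["git", "diff", "main...HEAD"]    # changes since branching from main
    if args.all:
        return ["git", "diff"]                   # unstaged working-tree changes
    return ["git", "diff", "--cached"]           # staged changes (the default)


def get_diff(args: argparse.Namespace) -> str:
    """Run git and return its output as text; raises if git fails."""
    result = subprocess.run(
        build_git_command(args), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the command as a list rather than a shell string avoids quoting problems when the commit reference comes from user input.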

Real Example Output

I tested GemmaDiff on a real PR that added JWT authentication:

============================================================
📋 GemmaDiff Code Review
============================================================

📊 Change statistics
   Files: 2
   Added: +45
   Deleted: -12

📝 Summary
   Added JWT authentication with refresh token support
   Risk level: MEDIUM

⚠️  Found 2 issues
------------------------------------------------------------

  🟠 #1 [HIGH] security
     📍 src/auth.py:23
     Hardcoded JWT secret key in source code
     💡 Move to environment variable: os.getenv('JWT_SECRET')

  🟡 #2 [MEDIUM] performance
     📍 src/auth.py:45
     Database query in loop (N+1 problem)
     💡 Use batch query: User.query.filter(User.id.in_(user_ids))

👍 Well done
   ✨ Good use of bcrypt for password hashing
   ✨ Proper token expiration handling

💡 Suggestions for improvement
   • Add rate limiting for login endpoint
   • Add unit tests for token refresh logic
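To show how one entry of the JSON `issues` array could become the terminal lines above, here is a small formatting sketch. Only the 🟠 and 🟡 icons appear in the output shown; the other icons and the name `format_issue` are assumptions.

```python
# Icons for high and medium match the sample output; the rest are guesses.
SEVERITY_ICONS = {
    "critical": "🔴",
    "high": "🟠",
    "medium": "🟡",
    "low": "🟢",
    "info": "🔵",
}


def format_issue(number: int, issue: dict) -> str:
    """Render one issue dict as a four-line terminal block."""
    icon = SEVERITY_ICONS.get(issue["severity"], "⚪")
    lines = [
        f"  {icon} #{number} [{issue['severity'].upper()}] {issue['category']}",
        f"     📍 {issue['file']}:{issue['line']}",
        f"     {issue['description']}",
        f"     💡 {issue['suggestion']}",
    ]
    return "\n".join(lines)
```

Keeping the formatter separate from the model call means the `--json` path can skip it entirely.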

CI/CD Integration

GemmaDiff outputs JSON, making it easy to integrate into GitHub Actions:

name: Code Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ollama/ollama-action@v1
        with:
          model: gemma4:26b
      - run: |
          pip install ollama
          python gemmadiff.py --pr --json > review.json

Code

The full source is available on GitHub, but here's the core logic:

def review_diff(diff: str, model: str = 'gemma4:26b') -> dict:
    """Send diff to Gemma 4 for review."""

    response = ollama.chat(
        model=model,
        messages=[
            {
                'role': 'system',
                'content': REVIEW_SYSTEM_PROMPT
            },
            {
                'role': 'user',
                'content': f"Review this git diff:\n\n```diff\n{diff}\n```"
            }
        ],
        options={
            'temperature': 0.1,  # Low temp for consistent output
            'num_predict': 4096
        }
    )

    return json.loads(response['message']['content'])

The entire tool is ~400 lines of Python. No frameworks, no dependencies beyond ollama.
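The `json.loads` call in `review_diff` assumes the model returns bare JSON. Models sometimes wrap the object in a Markdown code fence or add a sentence around it, which would raise `json.JSONDecodeError`. A more forgiving parser could look like this sketch; `parse_review` is a hypothetical helper.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this listing readable


def parse_review(raw: str) -> dict:
    """Parse model output as JSON, tolerating a code fence or surrounding text."""
    text = raw.strip()

    # Case 1: the whole reply is a fenced block, optionally tagged "json".
    fenced = re.match(rf"^{FENCE}(?:json)?\s*\n(.*?)\n{FENCE}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)

    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2: extra prose around the object; try the outermost {...} span.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(text[start:end + 1])
```

If both attempts fail, the original `JSONDecodeError` propagates, so the caller can decide whether to retry or show the raw text.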

Why This Matters

For Individual Developers

Review your own code before committing
Catch issues early (before they hit CI/CD)
Learn from the AI's suggestions

For Teams

Integrate into CI/CD for automated reviews
Consistent review standards across the team
No code leaves your infrastructure

For Security-Sensitive Industries

Healthcare, finance, government — code never touches external servers
Compliance-friendly (HIPAA, SOC2, etc.)
Full audit trail with JSON output

Performance Benchmarks

Tested on MacBook Pro M3 (36GB RAM):

Diff Size	Lines Changed	Review Time	Memory
Small	~50 lines	2.1s	18.4GB
Medium	~200 lines	4.2s	18.4GB
Large	~1000 lines	8.7s	18.4GB
Huge	~5000 lines	18.3s	18.4GB

What I Learned

System prompts are everything: The quality of the review depends more on the prompt than the model. I spent 80% of my time refining the system prompt.
Structured output > free-form: Forcing JSON output makes the tool actually usable in real workflows. Free-form text is pretty but useless for automation.
MoE is perfect for this: The 26B MoE model gives me 26B-level intelligence at 3.8B-level speed. It's the ideal trade-off for a code review tool.
Local AI is production-ready: I was surprised by how well Gemma 4 performs on real-world code. It catches 80% of what cloud tools catch, and it's getting better every month.

Try It Yourself

# 1. Install Ollama
brew install ollama

# 2. Pull Gemma 4
ollama pull gemma4:26b

# 3. Clone and run
git clone https://github.com/DragonHa-XIA/gemmadiff
cd gemmadiff
pip install ollama
python gemmadiff.py

What's Next

[ ] VS Code extension
[ ] Pre-commit hook integration
[ ] Support for more languages (Go, Rust, Java)
[ ] Custom review rules via config file
[ ] GitHub Action (ready-to-use)

Built for the Gemma 4 Challenge. All code runs locally using Google's open-source Gemma 4 model.

What would you build with Gemma 4? Drop a comment below!

I Replaced ChatGPT with Gemma 4 Running on My MacBook — Here's What Happened

Dragon Ha — Thu, 14 May 2026 17:19:02 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Last week, I did something radical: I turned off my ChatGPT subscription and switched to running Gemma 4 entirely on my laptop. No cloud. No API keys. No monthly bills. Just me, my MacBook, and Google's latest open model.

Here's the honest truth about what worked, what didn't, and why I'm never going back.

Why I Made the Switch

I'm a developer who uses AI constantly — for code review, writing documentation, debugging, and brainstorming. But I kept hitting the same friction points:

Latency: Waiting for API responses during a coding flow breaks my concentration
Privacy: I couldn't send proprietary code or client data to cloud APIs
Cost: $20/month adds up, especially when I'm just experimenting
Offline: Planes, coffee shops with bad WiFi, power outages — I needed AI that works anywhere

When Google released Gemma 4 in April 2026, I saw my escape route. The model family includes everything from a 2B-parameter model that runs on a Raspberry Pi to a 31B dense model that rivals GPT-4o on benchmarks. And it's all Apache 2.0 licensed — completely free to use.

The Setup: 5 Minutes from Zero to Running

Getting Gemma 4 running locally is embarrassingly easy. Here's exactly what I did:

Step 1: Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

Step 2: Pull the Model

# The 26B MoE model — fast and smart
ollama pull gemma4:26b

# Or the smaller 4B model for lighter tasks
ollama pull gemma4:4b

Step 3: Start Chatting

ollama run gemma4:26b

That's it. Three commands. No Docker, no Python environment, no CUDA drivers to wrestle with. The model downloads (~16GB for the 26B variant), and you're running.

My Daily Workflow: What I Actually Use It For

1. Code Review (Surprisingly Good)

I feed it my PR diffs and ask for review:

import ollama

def review_code(diff: str) -> str:
    response = ollama.chat(
        model='gemma4:26b',
        messages=[{
            'role': 'system',
            'content': 'You are a senior code reviewer. Be concise. Focus on bugs, security issues, and performance problems.'
        }, {
            'role': 'user',
            'content': f'Review this diff:\n\n{diff}'
        }]
    )
    return response['message']['content']

Verdict: It catches 80% of what ChatGPT catches. It's particularly good at spotting SQL injection risks and missing error handling. It occasionally misses subtle logic bugs that require deep domain knowledge.

2. Documentation Generation (Excellent)

This is where Gemma 4 shines. I point it at a function and get clean docs:

def generate_docstring(code: str) -> str:
    response = ollama.chat(
        model='gemma4:26b',
        messages=[{
            'role': 'user',
            'content': f'''Write a clear docstring for this function. 
            Include Args, Returns, and a brief example.

            ```python
            {code}
            ```'''
        }]
    )
    return response['message']['content']

Verdict: On par with GPT-4o. The docstrings are clean, accurate, and include good examples.

3. Brainstorming and Rubber Ducking (Great)

When I'm stuck on architecture decisions, I talk through problems:

Me: I have a microservice that processes 10K events/second. 
    The current Redis pub/sub is becoming a bottleneck. 
    Should I switch to Kafka or NATS?

Gemma 4: [Detailed comparison with trade-offs, specific config 
          recommendations, and migration strategy]

Verdict: The 256K context window means I can dump entire codebases into the conversation. It remembers context across long conversations better than I expected.

4. Multimodal: Screenshot to Code (The Wow Moment)

The real surprise was Gemma 4's vision capability. I screenshot a UI component and ask it to generate the code:

import ollama
import base64

def screenshot_to_code(image_path: str) -> str:
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode()

    response = ollama.chat(
        model='gemma4:26b',
        messages=[{
            'role': 'user',
            'content': 'Convert this screenshot to React + Tailwind CSS code. Be precise about spacing, colors, and layout.',
            'images': [image_data]
        }]
    )
    return response['message']['content']

Verdict: It's not perfect, but it gets the layout right 70% of the time. I use it as a starting point, then refine. Saves me 30-60 minutes per component.

The Honest Comparison: Gemma 4 vs ChatGPT

Task	Gemma 4 (Local)	ChatGPT (Cloud)	Winner
Code Review	8/10	9/10	ChatGPT
Documentation	9/10	9/10	Tie
Brainstorming	8/10	9/10	ChatGPT
Speed (first token)	50ms	200-500ms	Gemma 4
Privacy	10/10	3/10	Gemma 4
Offline Use	10/10	0/10	Gemma 4
Cost	Free	$20/month	Gemma 4
Complex Reasoning	7/10	9/10	ChatGPT
Multilingual (140+)	8/10	9/10	ChatGPT

The pattern is clear: ChatGPT is better at complex reasoning and edge cases. But Gemma 4 is better at everything that matters for daily workflow — speed, privacy, cost, and availability.

Performance on My MacBook Pro M3

Here are real numbers from my testing:

Model: gemma4:26b (MoE, 3.8B active parameters)
Hardware: MacBook Pro M3, 36GB RAM

Prompt: "Write a Python function to merge two sorted lists"
- Time to first token: 45ms
- Tokens per second: 42
- Total response time: 3.2s
- Memory usage: 18.4GB

Prompt: "Review this 200-line diff" 
- Time to first token: 52ms
- Tokens per second: 38
- Total response time: 8.1s
- Memory usage: 18.4GB

The 26B MoE model is the sweet spot. It's fast enough to feel instant on short queries, and smart enough to handle complex tasks. The 4B model is even faster but noticeably less capable.
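For anyone who wants to reproduce numbers like these, timing can be taken from a streamed response. The sketch below accepts any iterable of chunks shaped like the ones `ollama.chat(..., stream=True)` yields, so it can be tried without a model. The function name `measure_stream` is an assumption, and it counts chunks, which is only an approximation of tokens.

```python
import time
from typing import Callable, Iterable


def measure_stream(
    chunks: Iterable[dict],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume streamed chat chunks and report basic timing figures."""
    start = clock()
    first_token_at = None
    pieces = []

    for chunk in chunks:
        text = chunk["message"]["content"]
        if text and first_token_at is None:
            first_token_at = clock()  # first non-empty piece of output
        pieces.append(text)

    end = clock()
    return {
        "time_to_first_token_s": None if first_token_at is None else first_token_at - start,
        "total_time_s": end - start,
        "chunks": len(pieces),
        "text": "".join(pieces),
    }
```

With the `ollama` package installed, a call such as `measure_stream(ollama.chat(model='gemma4:26b', messages=[...], stream=True))` would time a real request.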

Tips I Wish Someone Told Me

1. Use the Right Model for the Task

# Quick questions, simple code: use the small model
ollama run gemma4:4b

# Code review, documentation, complex tasks: use the big model
ollama run gemma4:26b

2. System Prompts Matter More Than You Think

# Bad: Generic prompt
"Write code to sort a list"

# Good: Specific, constrained prompt  
"""Write a Python function that sorts a list of dictionaries by a given key.
Requirements:
- Handle missing keys gracefully
- Support ascending and descending order
- Include type hints
- Add a docstring with examples"""

3. The 256K Context Window is a Game Changer

You can feed entire files into the conversation:

# Read an entire file and ask for refactoring suggestions
with open('legacy_module.py', 'r') as f:
    code = f.read()

response = ollama.chat(
    model='gemma4:26b',
    messages=[{
        'role': 'user',
        'content': f'Refactor this code to use modern Python patterns:\n\n```python\n{code}\n```'
    }]
)
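Before sending a whole file, it can help to check that it is likely to fit. The sketch below uses a rough rule of thumb of about four characters per token, which is an assumption and varies by language and tokenizer. The 256,000-token limit and the 4,096-token output reserve are taken from figures quoted in these posts, and the function names are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate based on character count."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(
    text: str,
    context_tokens: int = 256_000,
    reserve_for_output: int = 4096,
) -> bool:
    """True when the text likely fits, leaving room for the model's reply."""
    return estimate_tokens(text) <= context_tokens - reserve_for_output
```

Ollama also applies its own context length setting (the `num_ctx` option), which may be lower than the model's maximum, so it is worth checking that value as well.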

4. Combine with Other Local Tools

I use Gemma 4 with:

Continue.dev for VS Code integration
Aider for AI pair programming
LangChain for building local AI pipelines

What Doesn't Work (Yet)

I want to be honest about the limitations:

Complex mathematical reasoning: It struggles with multi-step proofs and advanced calculus. I still use Wolfram Alpha for this.
Real-time information: It doesn't know what happened yesterday. For current events, I still use web search.
Highly specialized domains: Medical, legal, and financial advice requires more caution. The model is good but not an expert.
Very long code generation: For generating 500+ lines of code, cloud models are still more reliable. Gemma 4 sometimes loses coherence in very long outputs.

The Bottom Line

After one week of using Gemma 4 exclusively:

Saved: $20/month (ChatGPT subscription)
Gained: Complete privacy for client code
Gained: Near-instant responses (about 50ms to first token) during coding flow
Gained: Offline AI on planes and in remote locations
Lost: ~15% accuracy on complex reasoning tasks

For 90% of my daily AI usage, Gemma 4 is not just "good enough" — it's better. The speed advantage alone makes it worth the switch. And knowing that my code, my client's data, and my conversations never leave my laptop? That's priceless.

Getting Started Today

If you want to try this yourself:

Install Ollama
Run ollama pull gemma4:26b
Start building

The models are free. The tools are free. The only cost is your time — and you'll get that back in productivity within the first day.

What's your experience with local AI models? Have you tried Gemma 4? Drop a comment below — I'd love to hear what you're building.