<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dragon Ha</title>
    <description>The latest articles on DEV Community by Dragon Ha (@dragon_ha_700d0d2f551032c).</description>
    <link>https://dev.to/dragon_ha_700d0d2f551032c</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3931721%2F11a56d61-30a1-4693-83b4-c88a69c07971.png</url>
      <title>DEV Community: Dragon Ha</title>
      <link>https://dev.to/dragon_ha_700d0d2f551032c</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dragon_ha_700d0d2f551032c"/>
    <language>en</language>
    <item>
      <title>GemmaDiff: I Built a Local AI Code Reviewer with Gemma 4 That Never Sends Your Code to the Cloud</title>
      <dc:creator>Dragon Ha</dc:creator>
      <pubDate>Thu, 14 May 2026 17:26:04 +0000</pubDate>
      <link>https://dev.to/dragon_ha_700d0d2f551032c/gemmadiff-i-built-a-local-ai-code-reviewer-with-gemma-4-that-never-sends-your-code-to-the-cloud-2eja</link>
      <guid>https://dev.to/dragon_ha_700d0d2f551032c/gemmadiff-i-built-a-local-ai-code-reviewer-with-gemma-4-that-never-sends-your-code-to-the-cloud-2eja</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GemmaDiff&lt;/strong&gt; — a command-line tool that reviews your git diffs using Google's Gemma 4 model, running entirely on your local machine. No cloud APIs, no data leaving your laptop, no monthly subscriptions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git add src/auth.py
&lt;span class="nv"&gt;$ &lt;/span&gt;gemmadiff

🔍 GemmaDiff - 本地 AI 代码审查

📝 审查暂存变更
📁 文件: src/auth.py
📊 变更: +23 &lt;span class="nt"&gt;-5&lt;/span&gt;
🤖 正在分析代码...
⏱️  分析耗时: 4.2s

⚠️  发现 1 个问题
🟠 &lt;span class="c"&gt;#1 [HIGH] security&lt;/span&gt;
   📍 src/auth.py:23
   Hardcoded JWT secret key &lt;span class="k"&gt;in &lt;/span&gt;&lt;span class="nb"&gt;source &lt;/span&gt;code
   💡 Move to environment variable: os.getenv&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'JWT_SECRET'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Every developer knows the drill: you write code, push it, and wait for a cloud-based code review tool to analyze it. But here's the friction I kept hitting:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt;: I can't send my client's proprietary code to GitHub Copilot or CodeRabbit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: Waiting 10-30 seconds for a cloud API response breaks my coding flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: $10-20/month adds up when you're freelancing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline&lt;/strong&gt;: Planes, trains, and terrible WiFi at coffee shops&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I wanted something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reviews code as fast as I can type&lt;/li&gt;
&lt;li&gt;Works completely offline&lt;/li&gt;
&lt;li&gt;Costs nothing&lt;/li&gt;
&lt;li&gt;Actually catches real issues (not just style nits)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is the perfect model for this use case. Here's why:&lt;/p&gt;

&lt;h3&gt;
  
  
  The 256K Context Window is a Game Changer
&lt;/h3&gt;

&lt;p&gt;Code reviews require understanding context. A security vulnerability in &lt;code&gt;auth.py&lt;/code&gt; might depend on how &lt;code&gt;config.py&lt;/code&gt; handles secrets. With Gemma 4's 256K context window, I can feed in entire diffs — even large PRs with 50+ files — and the model understands the relationships between changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The diff can be massive — Gemma 4 handles it
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;[... diff truncated ...]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The 26B MoE Model Hits the Sweet Spot
&lt;/h3&gt;

&lt;p&gt;I chose the &lt;strong&gt;Gemma 4 26B MoE&lt;/strong&gt; model because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It only activates &lt;strong&gt;3.8B parameters&lt;/strong&gt; during inference (fast!)&lt;/li&gt;
&lt;li&gt;But it has the knowledge of a 26B parameter model (smart!)&lt;/li&gt;
&lt;li&gt;On my MacBook Pro M3, it reviews a typical diff in &lt;strong&gt;~5 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Structured Output with System Prompts
&lt;/h3&gt;

&lt;p&gt;The key to making this work is a carefully crafted system prompt that forces Gemma 4 to output structured JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;REVIEW_SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a senior code reviewer. Analyze the git diff and provide a structured review.

Respond in JSON format:
{
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;One-line summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low|medium|high|critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [{
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical|high|medium|low|info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security|bug|performance|style|maintainability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 42,
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What the issue is&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggestion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How to fix it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
  }],
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;positive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Good practices you noticed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggestions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;General improvement suggestions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;]
}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives me predictable, parseable output that I can format into beautiful terminal output or pipe into CI/CD systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Basic Usage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Review staged changes (most common workflow)&lt;/span&gt;
python gemmadiff.py

&lt;span class="c"&gt;# Review all unstaged changes&lt;/span&gt;
python gemmadiff.py &lt;span class="nt"&gt;--all&lt;/span&gt;

&lt;span class="c"&gt;# Review a specific commit&lt;/span&gt;
python gemmadiff.py &lt;span class="nt"&gt;--commit&lt;/span&gt; abc123

&lt;span class="c"&gt;# Review changes vs main branch (for PRs)&lt;/span&gt;
python gemmadiff.py &lt;span class="nt"&gt;--pr&lt;/span&gt;

&lt;span class="c"&gt;# Use smaller model for faster review&lt;/span&gt;
python gemmadiff.py &lt;span class="nt"&gt;--model&lt;/span&gt; gemma4:4b

&lt;span class="c"&gt;# Output as JSON for CI/CD integration&lt;/span&gt;
python gemmadiff.py &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real Example Output
&lt;/h3&gt;

&lt;p&gt;I tested GemmaDiff on a real PR that added JWT authentication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;============================================================
📋 GemmaDiff Code Review
============================================================

📊 变更统计
   文件: 2 个
   新增: +45
   删除: -12

📝 总结
   Added JWT authentication with refresh token support
   风险等级: MEDIUM

⚠️  发现 2 个问题
------------------------------------------------------------

  🟠 #1 [HIGH] security
     📍 src/auth.py:23
     Hardcoded JWT secret key in source code
     💡 Move to environment variable: os.getenv('JWT_SECRET')

  🟡 #2 [MEDIUM] performance
     📍 src/auth.py:45
     Database query in loop (N+1 problem)
     💡 Use batch query: User.query.filter(User.id.in_(user_ids))

👍 做得好
   ✨ Good use of bcrypt for password hashing
   ✨ Proper token expiration handling

💡 改进建议
   • Add rate limiting for login endpoint
   • Add unit tests for token refresh logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  CI/CD Integration
&lt;/h3&gt;

&lt;p&gt;GemmaDiff outputs JSON, making it easy to integrate into GitHub Actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Code Review&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama-action@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemma4:26b&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;pip install ollama&lt;/span&gt;
          &lt;span class="s"&gt;python gemmadiff.py --pr --json &amp;gt; review.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;The full source is available on GitHub, but here's the core logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemma4:26b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send diff to Gemma 4 for review.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;REVIEW_SYSTEM_PROMPT&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this git diff:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;```
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
diff&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Low temp for consistent output
&lt;/span&gt;            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_predict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entire tool is ~400 lines of Python. No frameworks, no dependencies beyond &lt;code&gt;ollama&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  For Individual Developers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Review your own code before committing&lt;/li&gt;
&lt;li&gt;Catch issues early (before they hit CI/CD)&lt;/li&gt;
&lt;li&gt;Learn from the AI's suggestions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Teams
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Integrate into CI/CD for automated reviews&lt;/li&gt;
&lt;li&gt;Consistent review standards across the team&lt;/li&gt;
&lt;li&gt;No code leaves your infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Security-Sensitive Industries
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Healthcare, finance, government — code never touches external servers&lt;/li&gt;
&lt;li&gt;Compliance-friendly (HIPAA, SOC2, etc.)&lt;/li&gt;
&lt;li&gt;Full audit trail with JSON output&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Benchmarks
&lt;/h2&gt;

&lt;p&gt;Tested on MacBook Pro M3 (36GB RAM):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Diff Size&lt;/th&gt;
&lt;th&gt;Lines Changed&lt;/th&gt;
&lt;th&gt;Review Time&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;~50 lines&lt;/td&gt;
&lt;td&gt;2.1s&lt;/td&gt;
&lt;td&gt;18.4GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;~200 lines&lt;/td&gt;
&lt;td&gt;4.2s&lt;/td&gt;
&lt;td&gt;18.4GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large&lt;/td&gt;
&lt;td&gt;~1000 lines&lt;/td&gt;
&lt;td&gt;8.7s&lt;/td&gt;
&lt;td&gt;18.4GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Huge&lt;/td&gt;
&lt;td&gt;~5000 lines&lt;/td&gt;
&lt;td&gt;18.3s&lt;/td&gt;
&lt;td&gt;18.4GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System prompts are everything&lt;/strong&gt;: The quality of the review depends more on the prompt than the model. I spent 80% of my time refining the system prompt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Structured output &amp;gt; free-form&lt;/strong&gt;: Forcing JSON output makes the tool actually usable in real workflows. Free-form text is pretty but useless for automation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MoE is perfect for this&lt;/strong&gt;: The 26B MoE model gives me 26B-level intelligence at 3.8B-level speed. It's the ideal trade-off for a code review tool.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local AI is production-ready&lt;/strong&gt;: I was surprised by how well Gemma 4 performs on real-world code. It catches 80% of what cloud tools catch, and it's getting better every month.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install Ollama&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# 2. Pull Gemma 4&lt;/span&gt;
ollama pull gemma4:26b

&lt;span class="c"&gt;# 3. Clone and run&lt;/span&gt;
git clone https://github.com/DragonHa-XIA/gemmadiff
&lt;span class="nb"&gt;cd &lt;/span&gt;gemmadiff
pip &lt;span class="nb"&gt;install &lt;/span&gt;ollama
python gemmadiff.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] VS Code extension&lt;/li&gt;
&lt;li&gt;[ ] Pre-commit hook integration&lt;/li&gt;
&lt;li&gt;[ ] Support for more languages (Go, Rust, Java)&lt;/li&gt;
&lt;li&gt;[ ] Custom review rules via config file&lt;/li&gt;
&lt;li&gt;[ ] GitHub Action (ready-to-use)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge&lt;/a&gt;. All code runs locally using Google's open-source Gemma 4 model.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What would you build with Gemma 4? Drop a comment below!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>python</category>
    </item>
    <item>
      <title>I Replaced ChatGPT with Gemma 4 Running on My MacBook — Here's What Happened</title>
      <dc:creator>Dragon Ha</dc:creator>
      <pubDate>Thu, 14 May 2026 17:19:02 +0000</pubDate>
      <link>https://dev.to/dragon_ha_700d0d2f551032c/i-replaced-chatgpt-with-gemma-4-running-on-my-macbook-heres-what-happened-4j1c</link>
      <guid>https://dev.to/dragon_ha_700d0d2f551032c/i-replaced-chatgpt-with-gemma-4-running-on-my-macbook-heres-what-happened-4j1c</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  I Replaced ChatGPT with Gemma 4 Running on My MacBook — Here's What Happened
&lt;/h1&gt;

&lt;p&gt;Last week, I did something radical: I turned off my ChatGPT subscription and switched to running Gemma 4 entirely on my laptop. No cloud. No API keys. No monthly bills. Just me, my MacBook, and Google's latest open model.&lt;/p&gt;

&lt;p&gt;Here's the honest truth about what worked, what didn't, and why I'm never going back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Made the Switch
&lt;/h2&gt;

&lt;p&gt;I'm a developer who uses AI constantly — for code review, writing documentation, debugging, and brainstorming. But I kept hitting the same friction points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: Waiting for API responses during a coding flow breaks my concentration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt;: I couldn't send proprietary code or client data to cloud APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: $20/month adds up, especially when I'm just experimenting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline&lt;/strong&gt;: Planes, coffee shops with bad WiFi, power outages — I needed AI that works anywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When Google released Gemma 4 in April 2026, I saw my escape route. The model family includes everything from a 2B-parameter model that runs on a Raspberry Pi to a 31B dense model that rivals GPT-4o on benchmarks. And it's all Apache 2.0 licensed — completely free to use.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: 5 Minutes from Zero to Running
&lt;/h2&gt;

&lt;p&gt;Getting Gemma 4 running locally is embarrassingly easy. Here's exactly what I did:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install Ollama
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Pull the Model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The 26B MoE model — fast and smart&lt;/span&gt;
ollama pull gemma4:26b

&lt;span class="c"&gt;# Or the smaller 4B model for lighter tasks&lt;/span&gt;
ollama pull gemma4:4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Start Chatting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run gemma4:26b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Three commands. No Docker, no Python environment, no CUDA drivers to wrestle with. The model downloads (~16GB for the 26B variant), and you're running.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Daily Workflow: What I Actually Use It For
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Code Review (Surprisingly Good)
&lt;/h3&gt;

&lt;p&gt;I feed it my PR diffs and ask for review:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemma4:26b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;You are a senior code reviewer. Be concise. Focus on bugs, security issues, and performance problems.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Review this diff:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: It catches 80% of what ChatGPT catches. It's particularly good at spotting SQL injection risks and missing error handling. It occasionally misses subtle logic bugs that require deep domain knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Documentation Generation (Excellent)
&lt;/h3&gt;

&lt;p&gt;This is where Gemma 4 shines. I point it at a function and get clean docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_docstring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemma4:26b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;Write a clear docstring for this function. 
            Include Args, Returns, and a brief example.

            ```
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
python
            &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

            ```&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: On par with GPT-4o. The docstrings are clean, accurate, and include good examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Brainstorming and Rubber Ducking (Great)
&lt;/h3&gt;

&lt;p&gt;When I'm stuck on architecture decisions, I talk through problems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Me: I have a microservice that processes 10K events/second. 
    The current Redis pub/sub is becoming a bottleneck. 
    Should I switch to Kafka or NATS?

Gemma 4: [Detailed comparison with trade-offs, specific config 
          recommendations, and migration strategy]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: The 256K context window means I can dump entire codebases into the conversation. It remembers context across long conversations better than I expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Multimodal: Screenshot to Code (The Wow Moment)
&lt;/h3&gt;

&lt;p&gt;The real surprise was Gemma 4's vision capability. I screenshot a UI component and ask it to generate the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;screenshot_to_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemma4:26b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Convert this screenshot to React + Tailwind CSS code. Be precise about spacing, colors, and layout.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: It's not perfect, but it gets the layout right 70% of the time. I use it as a starting point, then refine. Saves me 30-60 minutes per component.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Comparison: Gemma 4 vs ChatGPT
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Gemma 4 (Local)&lt;/th&gt;
&lt;th&gt;ChatGPT (Cloud)&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Code Review&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brainstorming&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed (first token)&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;200-500ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;3/10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Offline Use&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;0/10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$20/month&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex Reasoning&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual (140+)&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is clear: ChatGPT is better at complex reasoning and edge cases. But Gemma 4 is better at everything that matters for daily workflow — speed, privacy, cost, and availability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance on My MacBook Pro M3
&lt;/h2&gt;

&lt;p&gt;Here are real numbers from my testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model: gemma4:26b (MoE, 3.8B active parameters)
Hardware: MacBook Pro M3, 36GB RAM

Prompt: "Write a Python function to merge two sorted lists"
- Time to first token: 45ms
- Tokens per second: 42
- Total response time: 3.2s
- Memory usage: 18.4GB

Prompt: "Review this 200-line diff" 
- Time to first token: 52ms
- Tokens per second: 38
- Total response time: 8.1s
- Memory usage: 18.4GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 26B MoE model is the sweet spot. It's fast enough to feel instant on short queries, and smart enough to handle complex tasks. The 4B model is even faster but noticeably less capable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips I Wish Someone Told Me
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use the Right Model for the Task
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick questions, simple code: use the small model&lt;/span&gt;
ollama run gemma4:4b

&lt;span class="c"&gt;# Code review, documentation, complex tasks: use the big model&lt;/span&gt;
ollama run gemma4:26b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. System Prompts Matter More Than You Think
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bad: Generic prompt
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write code to sort a list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Good: Specific, constrained prompt  
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Write a Python function that sorts a list of dictionaries by a given key.
Requirements:
- Handle missing keys gracefully
- Support ascending and descending order
- Include type hints
- Add a docstring with examples&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. The 256K Context Window is a Game Changer
&lt;/h3&gt;

&lt;p&gt;You can feed entire files into the conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Read an entire file and ask for refactoring suggestions
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;legacy_module.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemma4:26b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Refactor this code to use modern Python patterns:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;```
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
python&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
```&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Combine with Other Local Tools
&lt;/h3&gt;

&lt;p&gt;I use Gemma 4 with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continue.dev&lt;/strong&gt; for VS Code integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aider&lt;/strong&gt; for AI pair programming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangChain&lt;/strong&gt; for building local AI pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Doesn't Work (Yet)
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the limitations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Complex mathematical reasoning&lt;/strong&gt;: It struggles with multi-step proofs and advanced calculus. I still use Wolfram Alpha for this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-time information&lt;/strong&gt;: It doesn't know what happened yesterday. For current events, I still use web search.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Highly specialized domains&lt;/strong&gt;: Medical, legal, and financial advice requires more caution. The model is good but not an expert.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Very long code generation&lt;/strong&gt;: For generating 500+ lines of code, cloud models are still more reliable. Gemma 4 sometimes loses coherence in very long outputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;After one week of using Gemma 4 exclusively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Saved&lt;/strong&gt;: $20/month (ChatGPT subscription)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gained&lt;/strong&gt;: Complete privacy for client code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gained&lt;/strong&gt;: Zero-latency responses during coding flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gained&lt;/strong&gt;: Offline AI on planes and in remote locations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lost&lt;/strong&gt;: ~15% accuracy on complex reasoning tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 90% of my daily AI usage, Gemma 4 is not just "good enough" — it's better. The speed advantage alone makes it worth the switch. And knowing that my code, my client's data, and my conversations never leave my laptop? That's priceless.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started Today
&lt;/h2&gt;

&lt;p&gt;If you want to try this yourself:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;ollama pull gemma4:26b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start building&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The models are free. The tools are free. The only cost is your time — and you'll get that back in productivity within the first day.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your experience with local AI models? Have you tried Gemma 4? Drop a comment below — I'd love to hear what you're building.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
