<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Enny Rodríguez</title>
    <description>The latest articles on DEV Community by Enny Rodríguez (@theelmix).</description>
    <link>https://dev.to/theelmix</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3920032%2Fa3392222-c55f-426f-9b9e-3f6e7f8340ca.png</url>
      <title>DEV Community: Enny Rodríguez</title>
      <link>https://dev.to/theelmix</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/theelmix"/>
    <language>en</language>
    <item>
      <title>My Local Copilot: Gemma 4 + Open WebUI + OpenHands for Coding Without Leaving My Machine</title>
      <dc:creator>Enny Rodríguez</dc:creator>
      <pubDate>Fri, 08 May 2026 22:42:23 +0000</pubDate>
      <link>https://dev.to/theelmix/my-local-copilot-gemma-4-open-webui-openhands-for-coding-without-leaving-my-machine-180j</link>
      <guid>https://dev.to/theelmix/my-local-copilot-gemma-4-open-webui-openhands-for-coding-without-leaving-my-machine-180j</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: this post describes a real local architecture I use for development. Exact model names in Ollama, Hugging Face or Kaggle may vary depending on the runtime you use. The important part is not memorizing one command, but understanding how to separate chat, reasoning, multimodal context, code execution and repositories on your own machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  My Local Copilot: Gemma 4 + Open WebUI + OpenHands for Coding Without Leaving My Machine
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flfa2clwdus9nw1reo9mz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flfa2clwdus9nw1reo9mz.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a long time, I used local models as if they were just another chat window.&lt;/p&gt;

&lt;p&gt;I pasted an error, copied the answer, went back to my editor, ran tests, copied the next error, and repeated the loop.&lt;/p&gt;

&lt;p&gt;That works, but it leaves a lot on the table.&lt;/p&gt;

&lt;p&gt;What makes Gemma 4 interesting to me is not only that it is an open model family with multimodal capabilities and variants that can target different hardware profiles. What makes it interesting is that it lets me think about a different kind of setup: an environment where the model is not isolated in a tab, but connected to a local development workflow.&lt;/p&gt;

&lt;p&gt;My goal for this experiment was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I want a development copilot that runs locally, can reason with me, can understand visual and textual context, can read project files, and, when it makes sense, can act on a repository without sending my entire codebase to an external service.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To do that, I built a stack with a few pieces that complement each other:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4&lt;/strong&gt; as the open model family for reasoning, explanation and assistance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; running natively on macOS to use the local hardware efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open WebUI&lt;/strong&gt; as the general interface for chat, model comparison, multimodal input and image generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenHands&lt;/strong&gt; as the development agent that can read files, use a terminal and work on repositories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub and GitLab&lt;/strong&gt; as the real source of issues, pull requests, merge requests and product context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main idea is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Gemma 4 becomes much more useful when it stops being "a model in a box" and becomes part of a local development architecture.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;The stack separates responsibilities. Ollama runs on the host because, on macOS Apple Silicon, running natively is the practical way to use the GPU through Metal; Docker on macOS does not pass the GPU through to containers. The interfaces run in Docker. Open WebUI is where I think, compare, inspect visual context and generate supporting images. OpenHands is where I move from conversation to action.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxovt4ivj5a9jzy3b7mfv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxovt4ivj5a9jzy3b7mfv.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That separation changes the experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If I need to think, summarize, compare approaches or work with images, I use Open WebUI.&lt;/li&gt;
&lt;li&gt;If I need the model to read files, propose changes and run commands, I use OpenHands.&lt;/li&gt;
&lt;li&gt;If the task comes from real work, I start from GitHub or GitLab and bring the context into my local workspace.&lt;/li&gt;
&lt;li&gt;If I want to change the model, I do it through Ollama without redesigning the rest of the stack.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Gemma 4 Fits This Workflow
&lt;/h2&gt;

&lt;p&gt;Google introduced Gemma 4 as an open model family with variants for different hardware and use cases. That matters for local development because not every task needs the same model.&lt;/p&gt;

&lt;p&gt;For my workflow, four capabilities are especially relevant.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;model size becomes a routing decision&lt;/strong&gt;. Sometimes I want a quick answer about a function. Sometimes I want a deeper review of a multi-module change. Those are not the same task.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;longer context changes how a model can work with code&lt;/strong&gt;. A useful coding assistant needs to understand conventions, nearby files, previous decisions and test structure.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;agents need more than good text generation&lt;/strong&gt;. A coding agent has to hold instructions, use tools, read results and correct itself. The model matters, but the surrounding architecture matters too.&lt;/p&gt;

&lt;p&gt;Fourth, &lt;strong&gt;multimodality changes how software tasks are described&lt;/strong&gt;. Sometimes the context is not in a &lt;code&gt;.py&lt;/code&gt; or &lt;code&gt;.ts&lt;/code&gt; file. It is a broken UI screenshot, a diagram, a wireframe, a generated asset, a chart or an error capture. Open WebUI gives me a natural entry point for that material before turning it into a development task.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Local Setup
&lt;/h2&gt;

&lt;p&gt;My setup uses Docker Compose for the interfaces and keeps Ollama running directly on the host.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faotep940wltgfoscs5zu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faotep940wltgfoscs5zu.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key detail is that OpenHands talks to Ollama through the OpenAI-compatible endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[llm]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai/gemma4:e4b"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://host.docker.internal:11434/v1"&lt;/span&gt;
&lt;span class="py"&gt;ollama_base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://host.docker.internal:11434"&lt;/span&gt;
&lt;span class="py"&gt;api_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"local-llm"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my configuration repo, this same pattern already works with other local models. For Gemma 4, the conceptual change is to replace the model with the variant I want to test: a smaller one for latency, a stronger one for planning, or a larger one for architectural review.&lt;/p&gt;

&lt;p&gt;I also keep multiple models available in OpenHands. I do not use one model for everything. I can start with a fast variant for inspection, move to a stronger variant for implementation planning, and reserve a larger variant for decisions where the cost of being wrong is higher.&lt;/p&gt;
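
&lt;p&gt;Before switching, a quick check of what is available locally helps (the exact tags depend on what your runtime publishes):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List the variants already pulled locally
ollama list

# Inspect a variant's parameters and context window before pointing OpenHands at it
ollama show gemma4:e4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;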

&lt;h2&gt;
  
  
  Open WebUI as the Multimodal Lane
&lt;/h2&gt;

&lt;p&gt;I do not use Open WebUI only as a nicer chat UI. In my workflow it has three roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Technical chat:&lt;/strong&gt; discuss a bug, explain a module, compare implementation approaches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal input:&lt;/strong&gt; upload screenshots, diagrams, error captures, UI images or visual material that helps describe a task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image generation:&lt;/strong&gt; create quick assets, documentation visuals, cover images or architecture illustrations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxc3ydpxcx8v7tw9497t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxc3ydpxcx8v7tw9497t.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is useful because many real tasks start visually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"This component looks broken."&lt;/li&gt;
&lt;li&gt;"This onboarding flow is confusing."&lt;/li&gt;
&lt;li&gt;"This chart does not explain the data."&lt;/li&gt;
&lt;li&gt;"This error appears on screen after checkout."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of manually translating all of that into text, I use Open WebUI to turn visual material into actionable context.&lt;/p&gt;

&lt;p&gt;For images, my stack can use Ollama's OpenAI-compatible API from Open WebUI. I also keep a separate ComfyUI lane for more controlled image workflows. I do not mix that with OpenHands: multimodal reasoning and image generation live in Open WebUI; code editing lives in OpenHands.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Workflow
&lt;/h2&gt;

&lt;p&gt;The pattern that works best for me is not asking the agent to do everything at once. I use an explicit workflow, and it often starts in GitHub or GitLab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv79jbsi3i39n88kvb9t3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv79jbsi3i39n88kvb9t3.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open WebUI and OpenHands do not play the same role.&lt;/p&gt;

&lt;p&gt;Open WebUI is the table where I lay out reasoning and multimodal context. OpenHands is the workbench where the work gets done. GitHub and GitLab are the real task queue.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub and GitLab as Workflow Inputs
&lt;/h2&gt;

&lt;p&gt;There is a big difference between "trying a model" and "working with a copilot." The difference is where tasks come from.&lt;/p&gt;

&lt;p&gt;In my case, many tasks already exist as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub issues;&lt;/li&gt;
&lt;li&gt;GitLab issues;&lt;/li&gt;
&lt;li&gt;pull requests with pending review comments;&lt;/li&gt;
&lt;li&gt;merge requests with feedback;&lt;/li&gt;
&lt;li&gt;bugs reported with screenshots;&lt;/li&gt;
&lt;li&gt;technical discussions that need to become code changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The flow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovqnfctnhpsq9k6ftaq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovqnfctnhpsq9k6ftaq6.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This helps me avoid vague prompts. Instead of telling the agent "improve this project," I start from a concrete task that already has social and product context: who asked for it, why it matters, what was discussed, which files it may touch and how it will be reviewed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: From Bug Report to Local Patch
&lt;/h2&gt;

&lt;p&gt;Suppose I have this bug:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The search endpoint returns duplicate results when the user sends the same filter with different casing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In Open WebUI, I start broadly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I am working on a backend with search endpoints.
There is a bug: if the user sends repeated filters with different casing,
the endpoint returns duplicate results.

Before touching code, give me an investigation plan:
- which files would you look for
- which tests would you expect to find
- which edge cases should be covered
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemma 4 does not need to touch the repository yet. I only want help thinking.&lt;/p&gt;

&lt;p&gt;Then I move to OpenHands with a more concrete task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Work in /workspace/my-repo.

Goal:
Fix the bug where repeated filters with different casing generate duplicate results.

Constraints:
- Do not change the public API.
- Keep the existing project style.
- Add or adjust focused tests.
- Run the relevant suite before finishing.

Deliverable:
- Summary of changed files.
- Short explanation of the fix.
- Commands executed and their result.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That prompt change is intentional. I do not say "fix it" in a generic way. I give context, boundaries and a verifiable deliverable.&lt;/p&gt;

&lt;p&gt;If the bug comes from GitHub or GitLab, I add one more layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Remote context:
- Issue: https://github.com/org/repo/issues/123
- Base branch: main
- Suggested work branch: fix/search-filter-deduplication

Read the issue as the functional specification.
If there is ambiguity between the issue and the current code,
prioritize existing behavior and call out the question in the final summary.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the issue includes screenshots, I inspect them first in Open WebUI with Gemma 4. That lets me turn visual evidence into acceptance criteria before asking OpenHands to edit files.&lt;/p&gt;
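
&lt;p&gt;The prompt for that step can stay simple. An illustrative example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Here is a screenshot attached to the issue.

Describe:
- what the UI currently shows
- what the expected behavior appears to be
- which acceptance criteria would verify a fix

Output the acceptance criteria as a short checklist
I can paste into the OpenHands task.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;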

&lt;h2&gt;
  
  
  How I Choose a Gemma 4 Variant
&lt;/h2&gt;

&lt;p&gt;I do not think about models as a ladder where "bigger always wins." I think in lanes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task type&lt;/th&gt;
&lt;th&gt;Gemma 4 variant I would try first&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Quick chat, classification, short summaries&lt;/td&gt;
&lt;td&gt;E2B&lt;/td&gt;
&lt;td&gt;Low latency and a good fit for simple tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screenshots, diagrams, UI explanation, task drafting&lt;/td&gt;
&lt;td&gt;E4B&lt;/td&gt;
&lt;td&gt;Good balance for multimodal reasoning and general assistance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explaining code, reviewing functions, drafting tests&lt;/td&gt;
&lt;td&gt;E4B / 26B A4B&lt;/td&gt;
&lt;td&gt;Depends on the size of the change and the context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium refactors, multi-file debugging&lt;/td&gt;
&lt;td&gt;26B A4B&lt;/td&gt;
&lt;td&gt;More capacity without always jumping to the heaviest model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture review, long context, complex decisions&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;When quality matters more than latency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table is not a universal truth. It is a practical starting point. Local hardware, quantization, runtime and configured context size can change the experience a lot.&lt;/p&gt;

&lt;p&gt;In OpenHands, I like having more than one option configured because the agent's behavior changes with the model. A smaller variant may be enough for short inspection tasks. For multi-module planning, I prefer a stronger one. For architectural review, I accept more latency if the answer is more careful.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Prompt Template for Local Agents
&lt;/h2&gt;

&lt;p&gt;This is the structure I use most often with OpenHands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Context:
I am in an existing repository. Read before editing.
The task comes from [GitHub/GitLab issue or PR/MR].

Goal:
[describe the expected result in one sentence]

Constraints:
- Keep existing patterns.
- Do not do unrelated refactors.
- Do not change global configuration unless required.
- If there is ambiguity, explain the decision.

Verification:
- Run the related tests.
- If something cannot be run, explain why.

Deliverable:
- Changed files.
- Summary of the change.
- Commands executed.
- Link or reference to the remote task.
- Risks or follow-ups.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With local models, this structure helps a lot. It reduces ambiguity and pushes the agent to behave like a software collaborator instead of a text generator.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Cycle I Use
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stateDiagram-v2
    [*] --&amp;gt; Think
    Think: Open WebUI\nunderstand problem\ntext + images
    Think --&amp;gt; Scope
    Scope: small task\nissue/PR/MR + constraints
    Scope --&amp;gt; Act
    Act: OpenHands\nselected Gemma model\nread edit run
    Act --&amp;gt; Review
    Review: inspect diff\nvalidate tests
    Review --&amp;gt; Commit: if good
    Review --&amp;gt; Scope: if context is missing
    Commit --&amp;gt; [*]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is keeping tasks small. A local agent can be very useful, but it is still probabilistic software. My rule is simple: if I could not review the diff in a few minutes, the task is too large.&lt;/p&gt;
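
&lt;p&gt;A quick way to gauge that before reading line by line, reusing the branch name from the earlier example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# How big is the diff, really? A short stat output means a reviewable task
git diff --stat main...fix/search-filter-deduplication
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;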

&lt;h2&gt;
  
  
  What Worked Well
&lt;/h2&gt;

&lt;p&gt;The best part of the setup is the feeling of control.&lt;/p&gt;

&lt;p&gt;I can start the local stack, switch models, test prompts, share only the folders I want and shut everything down when I am done. For private projects, prototypes and learning, that reduced friction matters.&lt;/p&gt;

&lt;p&gt;I also like having separate modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal conversation mode:&lt;/strong&gt; I think with Gemma 4 in Open WebUI using text, images, screenshots and diagrams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual generation mode:&lt;/strong&gt; I create images or supporting assets from Open WebUI when a post, documentation page or product task needs them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action mode:&lt;/strong&gt; I delegate a concrete task to OpenHands and choose the Gemma model that best fits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repository mode:&lt;/strong&gt; I bring context from GitHub or GitLab and turn it into a local branch with a reviewable diff.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That boundary prevents every conversation from becoming an execution. Not every prompt deserves filesystem access.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Still Requires Care
&lt;/h2&gt;

&lt;p&gt;Not everything is automatic.&lt;/p&gt;

&lt;p&gt;Local agents are sensitive to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt quality;&lt;/li&gt;
&lt;li&gt;configured context size;&lt;/li&gt;
&lt;li&gt;quantization choices;&lt;/li&gt;
&lt;li&gt;hardware latency;&lt;/li&gt;
&lt;li&gt;runtime stability;&lt;/li&gt;
&lt;li&gt;the model's ability to follow tool instructions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also learned that it is useful to keep fallback models. In my stack, I keep coding-specialized models next to the general model. That lets me compare answers or switch lanes if a specific task gets stuck.&lt;/p&gt;
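
&lt;p&gt;For example, something like this (these tags are illustrative; any coding-focused local model your runtime offers plays the same role):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Coding-specialized fallbacks next to the general Gemma models
ollama pull qwen2.5-coder:7b
ollama pull deepseek-coder-v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;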

&lt;p&gt;Another lesson: connected repositories speed things up, but they also require discipline. A GitHub or GitLab issue can carry a lot of context, but not all of that context is specification. Sometimes it includes opinions, old assumptions or contradictory comments. That is why I like passing through Open WebUI first to synthesize acceptance criteria before opening the OpenHands lane.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local Security: Not Magic, But Better Boundaries
&lt;/h2&gt;

&lt;p&gt;Running locally does not automatically mean "secure." It means I have more control over where the code lives and which processes can read it.&lt;/p&gt;

&lt;p&gt;My basic rules are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;expose Open WebUI and OpenHands only on &lt;code&gt;127.0.0.1&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;mount a scoped working directory, not the whole disk;&lt;/li&gt;
&lt;li&gt;review diffs before committing;&lt;/li&gt;
&lt;li&gt;do not give real secrets to the agent;&lt;/li&gt;
&lt;li&gt;use GitHub/GitLab tokens with minimum required permissions when needed;&lt;/li&gt;
&lt;li&gt;avoid mounting global credentials into the sandbox;&lt;/li&gt;
&lt;li&gt;use disposable repositories for aggressive experiments;&lt;/li&gt;
&lt;li&gt;keep logs and configuration outside the application repository.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Privacy does not come from one tool. It comes from designing the workflow with clear limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;The discussion around open models often stays at the benchmark level. Benchmarks matter, but as a developer I care about a more practical question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What can I do today, on my own machine, with enough quality and control to actually change my development workflow?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemma 4 points directly at that question. Not because it automatically replaces every closed model, but because it makes a category of local setups more viable: assistants that can reason over text and images, generate supporting material, work with repositories and integrate with open tools.&lt;/p&gt;

&lt;p&gt;For me, the near future is not one giant cloud copilot. It is a combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open models;&lt;/li&gt;
&lt;li&gt;local runtimes;&lt;/li&gt;
&lt;li&gt;hackable interfaces;&lt;/li&gt;
&lt;li&gt;multimodal inputs;&lt;/li&gt;
&lt;li&gt;agents with limited permissions;&lt;/li&gt;
&lt;li&gt;repositories connected to real tasks;&lt;/li&gt;
&lt;li&gt;developers who understand their own architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemma 4 fits that direction well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Base Commands for the Stack
&lt;/h2&gt;

&lt;p&gt;My local flow starts with Ollama on the host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OLLAMA_CONTEXT_LENGTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;32768 &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;OLLAMA_KEEP_ALIVE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30m &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0:11434 &lt;span class="se"&gt;\&lt;/span&gt;
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I pull the model I want to test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4:e4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I can also keep multiple variants available and choose by task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:26b-a4b
ollama pull gemma4:31b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your runtime publishes the variants under different names, replace those identifiers with the correct names for Ollama, Hugging Face or Kaggle.&lt;/p&gt;

&lt;p&gt;For image generation from Open WebUI, my stack uses a local OpenAI-compatible endpoint. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull x/flux2-klein:4b
ollama pull x/z-image-turbo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I start the interfaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; open-webui openhands comfyui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Local URLs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open WebUI: &lt;code&gt;http://localhost:3000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;OpenHands: &lt;code&gt;http://localhost:3001&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;ComfyUI: &lt;code&gt;http://localhost:8188&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Ollama API: &lt;code&gt;http://localhost:11434&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
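
&lt;p&gt;Before opening a browser, I can confirm each lane is answering. Ollama's &lt;code&gt;/api/tags&lt;/code&gt; endpoint lists the installed models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Quick liveness checks for each lane
curl -s http://localhost:11434/api/tags | head -c 200   # Ollama model list
curl -sI http://localhost:3000 | head -n 1              # Open WebUI
curl -sI http://localhost:3001 | head -n 1              # OpenHands
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;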

&lt;p&gt;To bring in tasks and branches from remote repositories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone git@github.com:org/repo.git
git clone git@gitlab.com:org/repo.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also use &lt;code&gt;gh&lt;/code&gt; or &lt;code&gt;glab&lt;/code&gt; to fetch issues, check out PRs/MRs or inspect review comments from the terminal.&lt;/p&gt;
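
&lt;p&gt;For example (the issue and PR/MR numbers are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# GitHub: read an issue, then check out a pull request locally
gh issue view 123
gh pr checkout 45

# GitLab: the same idea with glab
glab issue view 123
glab mr checkout 45
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;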

&lt;h2&gt;
  
  
  Minimal OpenHands Configuration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[core]&lt;/span&gt;

&lt;span class="nn"&gt;[llm]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai/gemma4:e4b"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://host.docker.internal:11434/v1"&lt;/span&gt;
&lt;span class="py"&gt;ollama_base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://host.docker.internal:11434"&lt;/span&gt;
&lt;span class="py"&gt;api_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"local-llm"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To switch models, I keep explicit &lt;code&gt;model&lt;/code&gt; values for the task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fast inspection&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai/gemma4:e2b"&lt;/span&gt;

&lt;span class="c"&gt;# General balance&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai/gemma4:e4b"&lt;/span&gt;

&lt;span class="c"&gt;# More complex changes&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai/gemma4:26b-a4b"&lt;/span&gt;

&lt;span class="c"&gt;# Deeper review&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai/gemma4:31b"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Docker Compose, the important part is mounting the workspace and pointing OpenHands to the local endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;openhands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker.openhands.dev/openhands/openhands:1.6&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:3001:3000"&lt;/span&gt;
  &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;RUNTIME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docker"&lt;/span&gt;
    &lt;span class="na"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gemma4:e4b"&lt;/span&gt;
    &lt;span class="na"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://host.docker.internal:11434/v1"&lt;/span&gt;
    &lt;span class="na"&gt;LLM_OLLAMA_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://host.docker.internal:11434"&lt;/span&gt;
    &lt;span class="na"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local-llm"&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/var/run/docker.sock:/var/run/docker.sock&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./workspace:/workspace:rw&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/Users/me/projects:/workspace/host-projects:rw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Image Generation in Open WebUI
&lt;/h2&gt;

&lt;p&gt;In Open WebUI, I enable image generation against my local endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ENABLE_IMAGE_GENERATION=true
IMAGE_GENERATION_ENGINE=openai
IMAGES_OPENAI_API_BASE_URL=http://host.docker.internal:11434/v1
IMAGES_OPENAI_API_KEY=ollama
IMAGE_GENERATION_MODEL=x/flux2-klein:4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final Mental Model
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudgy025xx7ys2cfjkqfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudgy025xx7ys2cfjkqfl.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most important part of the diagram is the last one: &lt;strong&gt;developer judgment&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The model accelerates. The agent executes. But the engineering judgment is still mine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is exciting because it lowers the barrier for building more useful local assistants. Not just chatbots. Not just demos. Real workflows where an open model can help understand text and images, generate supporting assets, modify code and validate software inside a machine I control.&lt;/p&gt;

&lt;p&gt;My conclusion after building this setup is simple: the leap is not only in the model. It is in connecting the model to a well-designed workflow.&lt;/p&gt;

&lt;p&gt;Gemma 4 + Open WebUI + OpenHands + GitHub/GitLab is one concrete way to do that.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
    <item>
      <title>Gemma Local Code Mentor: A Local-First VS Code AI Assistant Powered by Gemma 4</title>
      <dc:creator>Enny Rodríguez</dc:creator>
      <pubDate>Fri, 08 May 2026 21:12:08 +0000</pubDate>
      <link>https://dev.to/theelmix/i-built-a-local-first-vscode-code-mentor-with-gemma-4-your-code-never-leaves-your-machine-143c</link>
      <guid>https://dev.to/theelmix/i-built-a-local-first-vscode-code-mentor-with-gemma-4-your-code-never-leaves-your-machine-143c</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most AI coding tools ask for the same tradeoff:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Give me your code, and I'll give you help."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wanted to try the opposite.&lt;/p&gt;

&lt;p&gt;What if a coding mentor lived inside VS Code, understood your repository, helped with real developer tasks, and kept your code on your own machine by default?&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Gemma Local Code Mentor&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Gemma Local Code Mentor&lt;/strong&gt; is a local-first VS Code extension powered by Gemma 4.&lt;/p&gt;

&lt;p&gt;It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explain selected code&lt;/li&gt;
&lt;li&gt;Suggest refactors&lt;/li&gt;
&lt;li&gt;Generate tests&lt;/li&gt;
&lt;li&gt;Summarize files&lt;/li&gt;
&lt;li&gt;Summarize repository architecture&lt;/li&gt;
&lt;li&gt;Answer questions about the repo&lt;/li&gt;
&lt;li&gt;Run through a local FastAPI backend&lt;/li&gt;
&lt;li&gt;Use Ollama as the default local model runtime&lt;/li&gt;
&lt;li&gt;Keep Local Only Mode enabled by default&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No telemetry.&lt;br&gt;
No cloud fallback.&lt;br&gt;
No external API calls while Local Only Mode is on.&lt;br&gt;
Your code stays where it belongs: on your machine.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built a VS Code extension plus a Dockerized FastAPI backend for developers who want AI help without sending private code to a remote API.&lt;/p&gt;

&lt;p&gt;The workflow is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select code in VS Code.&lt;/li&gt;
&lt;li&gt;Run a &lt;code&gt;Gemma:&lt;/code&gt; command.&lt;/li&gt;
&lt;li&gt;The extension sends context to &lt;code&gt;127.0.0.1:8765&lt;/code&gt; (a request sketch follows this list).&lt;/li&gt;
&lt;li&gt;The backend builds a task-specific prompt.&lt;/li&gt;
&lt;li&gt;Gemma 4 responds through a local provider.&lt;/li&gt;
&lt;li&gt;The result appears in a VS Code side panel.&lt;/li&gt;
&lt;/ol&gt;
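
&lt;p&gt;To make step 3 concrete, here is a hedged sketch of the kind of request the extension sends. The route and JSON fields are illustrative assumptions, not the backend's documented API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical request shape; the real routes live in the backend code
curl -s -X POST http://127.0.0.1:8765/explain \
  -H "Content-Type: application/json" \
  -d '{"code": "def add(a, b): return a + b", "language": "python"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;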

&lt;p&gt;The extension currently includes these commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Gemma: Explain Selection&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma: Refactor Selection&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma: Generate Tests&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma: Summarize File&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma: Summarize Architecture&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma: Ask Repository&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma: Toggle Local Only Mode&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma: Open Panel&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not just a chat box glued into an editor. The backend has structured prompt builders, response parsing, provider routing, tests, repository context handling, and privacy checks.&lt;/p&gt;
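
&lt;p&gt;To give a flavor of that layer, here is a minimal sketch of a task-specific prompt builder (the function and template are illustrative, not the repo's exact code):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative task-specific prompt builder; the real templates live in the backend.
def build_explain_prompt(code: str, language: str, file_path: str) -&amp;gt; str:
    return (
        f"You are a local code mentor. Explain this {language} code from {file_path}.\n"
        "Focus on intent, inputs and outputs, and edge cases. Be concise.\n\n"
        f"```{language}\n{code}\n```"
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;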
&lt;h2&gt;
  
  
  Why I Built It
&lt;/h2&gt;

&lt;p&gt;There are many AI coding assistants now, but the privacy model often feels backwards.&lt;/p&gt;

&lt;p&gt;For open source code, cloud tools are usually fine.&lt;/p&gt;

&lt;p&gt;For client code, internal company projects, security-sensitive prototypes, or early startup ideas, uploading code somewhere else can be a blocker.&lt;/p&gt;

&lt;p&gt;I wanted a coding assistant with different defaults:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Typical Cloud Assistant&lt;/th&gt;
&lt;th&gt;Gemma Local Code Mentor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runs in VS Code&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explains code&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generates tests&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refactors code&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sends code to cloud&lt;/td&gt;
&lt;td&gt;Often&lt;/td&gt;
&lt;td&gt;No by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works with local models&lt;/td&gt;
&lt;td&gt;Usually no&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Has a local-only switch&lt;/td&gt;
&lt;td&gt;Rare&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can be hacked by contributors&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Fully open source&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The goal is not to beat every commercial coding assistant.&lt;/p&gt;

&lt;p&gt;The goal is to prove that a useful AI coding mentor can be local-first from day one.&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Suggested demo flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a real code file in VS Code.&lt;/li&gt;
&lt;li&gt;Select a function.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;Gemma: Explain Selection&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;Gemma: Generate Tests&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Ask a repository-level question.&lt;/li&gt;
&lt;li&gt;Show the side panel with &lt;code&gt;Local Only Mode: ON&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Show the backend running locally.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;Repository:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/ennydev-2026" rel="noopener noreferrer"&gt;
        ennydev-2026
      &lt;/a&gt; / &lt;a href="https://github.com/ennydev-2026/GemmaLocalCodeMentor" rel="noopener noreferrer"&gt;
        GemmaLocalCodeMentor
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      GLCM - Gemma Local Code Mentor
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Gemma Local Code Mentor&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;Gemma Local Code Mentor is a local-first VSCode extension and Dockerized FastAPI backend for explaining, refactoring, testing, and summarizing code with local Gemma models.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What It Does&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;The project runs on the developer's machine:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VSCode extension in TypeScript.&lt;/li&gt;
&lt;li&gt;Local FastAPI backend on &lt;code&gt;127.0.0.1:8765&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Ollama as the default local model runtime.&lt;/li&gt;
&lt;li&gt;Local sample provider for development and tests without installed models.&lt;/li&gt;
&lt;li&gt;Dual-model routing
&lt;ul&gt;
&lt;li&gt;Fast model for short explanations and lightweight chat.&lt;/li&gt;
&lt;li&gt;Deep model for refactors, tests, architecture, and larger context.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Local Only Mode enabled by default.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Architecture&lt;/h2&gt;

&lt;/div&gt;

  &lt;div class="js-render-enrichment-target"&gt;
    &lt;div class="render-plaintext-hidden"&gt;
      &lt;pre&gt;flowchart LR
    A["VSCode Extension"] --&amp;gt; B["FastAPI Backend :8765"]
    B --&amp;gt; C["Prompt Orchestrator"]
    B --&amp;gt; D["Repo Context Builder"]
    B --&amp;gt; E["Local Index Store"]
    B --&amp;gt; F["Model Router"]
    F --&amp;gt; G["Fast Gemma Model"]
    F --&amp;gt; H["Deep Gemma Model"]
    G --&amp;gt; I["Ollama"]
    H --&amp;gt; I
    B --&amp;gt; J["Response Parser"]
    J --&amp;gt; A
&lt;/pre&gt;
    &lt;/div&gt;
  &lt;/div&gt;


&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Commands&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gemma.explainSelection&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma.refactorSelection&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma.generateTests&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma.summarizeFile&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma.summarizeArchitecture&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma.askRepo&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma.togglePrivacyMode&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma.openPanel&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;…&lt;/div&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/ennydev-2026/GemmaLocalCodeMentor" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Direct link:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ennydev-2026/GemmaLocalCodeMentor" rel="noopener noreferrer"&gt;https://github.com/ennydev-2026/GemmaLocalCodeMentor&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;I used Gemma 4 as the reasoning layer behind the local code mentor.&lt;/p&gt;

&lt;p&gt;The project is designed around two model roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4 E4B&lt;/strong&gt; for fast tasks like short explanations and lightweight chat&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4 31B Dense&lt;/strong&gt; for deeper tasks like refactoring, test generation, architecture summaries, and larger context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That choice was intentional.&lt;/p&gt;

&lt;p&gt;A code mentor should not use the largest model for every single request. If I ask what a small function does, I want a fast answer. If I ask for tests, architecture, or a refactor, I want deeper reasoning.&lt;/p&gt;

&lt;p&gt;So the backend includes a model router:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;fast&lt;/code&gt; mode uses the fast model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;deep&lt;/code&gt; mode uses the deep model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;auto&lt;/code&gt; mode chooses based on task type and context size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This routing makes Gemma 4 feel like a practical local development tool rather than a single hardcoded model call.&lt;/p&gt;
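
&lt;p&gt;As a minimal sketch, the &lt;code&gt;auto&lt;/code&gt; decision can be as small as this (model tags and the threshold are assumptions, not the repo's exact values):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of an auto-routing policy between a fast and a deep local model.
# Model tags and the context threshold are illustrative assumptions.
FAST_MODEL = "gemma4:e4b"
DEEP_MODEL = "gemma4:31b"

DEEP_TASKS = {"refactor", "generate_tests", "summarize_architecture"}
CONTEXT_THRESHOLD = 8_000  # characters of code context; tune per machine


def route_model(task: str, context: str, mode: str = "auto") -&amp;gt; str:
    """Pick a model tag based on mode, task type and context size."""
    if mode == "fast":
        return FAST_MODEL
    if mode == "deep":
        return DEEP_MODEL
    # auto: heavy task types or large context go to the deep model
    if task in DEEP_TASKS or len(context) &amp;gt; CONTEXT_THRESHOLD:
        return DEEP_MODEL
    return FAST_MODEL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;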

&lt;h2&gt;
  
  
  Why Gemma 4 Specifically
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Explain selection&lt;/td&gt;
&lt;td&gt;Gemma 4 E4B&lt;/td&gt;
&lt;td&gt;Fast local response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generate tests&lt;/td&gt;
&lt;td&gt;Gemma 4 31B Dense&lt;/td&gt;
&lt;td&gt;More reasoning depth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture summary&lt;/td&gt;
&lt;td&gt;Gemma 4 31B Dense&lt;/td&gt;
&lt;td&gt;Larger context and better synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ask repo&lt;/td&gt;
&lt;td&gt;Auto router&lt;/td&gt;
&lt;td&gt;Chooses by task/context size&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvdfm6yovralalqz9idr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvdfm6yovralalqz9idr.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VS Code extension in TypeScript&lt;/li&gt;
&lt;li&gt;FastAPI backend in Python&lt;/li&gt;
&lt;li&gt;Ollama as the default local runtime&lt;/li&gt;
&lt;li&gt;Docker support&lt;/li&gt;
&lt;li&gt;Mock provider for development and tests&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.gemmaignore&lt;/code&gt; support&lt;/li&gt;
&lt;li&gt;Local URL safety checks&lt;/li&gt;
&lt;li&gt;Backend test coverage with &lt;code&gt;pytest&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Local-First Is a Product Decision
&lt;/h2&gt;

&lt;p&gt;The privacy layer is not just a README promise.&lt;/p&gt;

&lt;p&gt;The repo includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local Only Mode enabled by default&lt;/li&gt;
&lt;li&gt;Backend URL validation (sketched after this list)&lt;/li&gt;
&lt;li&gt;No telemetry&lt;/li&gt;
&lt;li&gt;No cloud fallback&lt;/li&gt;
&lt;li&gt;No external API calls while local-only is enabled&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.gemmaignore&lt;/code&gt; for excluding sensitive files&lt;/li&gt;
&lt;li&gt;Mock mode so contributors can work without installing a model first&lt;/li&gt;
&lt;/ul&gt;
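
&lt;p&gt;The URL validation from that list can be as small as a loopback allowlist. A minimal sketch, assuming the guard runs before any request leaves the machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of a local-only URL guard; the actual check lives in the repo.
from urllib.parse import urlparse

ALLOWED_HOSTS = {"127.0.0.1", "localhost", "::1"}


def is_local_url(url: str) -&amp;gt; bool:
    """Allow only loopback backends while Local Only Mode is enabled."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS


assert is_local_url("http://127.0.0.1:8765")
assert not is_local_url("https://api.example.com/v1")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;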

&lt;p&gt;That matters because local AI changes who can safely use these tools.&lt;/p&gt;

&lt;p&gt;A freelancer can use it on client code.&lt;br&gt;
A company can test AI workflows without sending source code away.&lt;br&gt;
A student can learn from a mentor without paying API costs.&lt;br&gt;
An open-source maintainer can customize the whole stack.&lt;/p&gt;
&lt;h2&gt;
  
  
  Run It Locally
&lt;/h2&gt;

&lt;p&gt;Backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;backend
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
uvicorn app.main:app &lt;span class="nt"&gt;--host&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;--port&lt;/span&gt; 8765 &lt;span class="nt"&gt;--reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mock mode, no model required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;backend
&lt;span class="nv"&gt;GEMMA_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mock uvicorn app.main:app &lt;span class="nt"&gt;--host&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;--port&lt;/span&gt; 8765 &lt;span class="nt"&gt;--reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;extension
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run compile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open the project in VS Code, press &lt;code&gt;F5&lt;/code&gt;, and run any &lt;code&gt;Gemma:&lt;/code&gt; command.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Want Help With
&lt;/h2&gt;

&lt;p&gt;This is where I want the community involved.&lt;/p&gt;

&lt;p&gt;I would love contributors for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better repository indexing&lt;/li&gt;
&lt;li&gt;Smarter prompt templates&lt;/li&gt;
&lt;li&gt;More language-aware code analysis&lt;/li&gt;
&lt;li&gt;Inline code actions&lt;/li&gt;
&lt;li&gt;Diff previews before applying refactors&lt;/li&gt;
&lt;li&gt;Local embeddings for repo search&lt;/li&gt;
&lt;li&gt;Better test framework detection&lt;/li&gt;
&lt;li&gt;llama.cpp provider support&lt;/li&gt;
&lt;li&gt;MLX provider support&lt;/li&gt;
&lt;li&gt;A polished marketplace-ready VSIX&lt;/li&gt;
&lt;li&gt;UI improvements for the side panel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you care about local AI, open models, privacy-respecting devtools, or VS Code extensions, jump in.&lt;/p&gt;

&lt;p&gt;Fork it.&lt;br&gt;
Open an issue.&lt;br&gt;
Try another Gemma 4 model.&lt;br&gt;
Add a provider.&lt;br&gt;
Improve the prompts.&lt;br&gt;
Make the UX better.&lt;/p&gt;
&lt;h2&gt;
  
  
  Install the VS Code Extension
&lt;/h2&gt;

&lt;p&gt;You can install the extension directly in VS Code using this identifier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ennydev-2026.gemma-local-code-mentor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
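
&lt;p&gt;Or from a terminal, using the VS Code CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;code --install-extension ennydev-2026.gemma-local-code-mentor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;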



&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;AI coding tools are becoming part of the daily developer workflow.&lt;/p&gt;

&lt;p&gt;That means defaults matter.&lt;/p&gt;

&lt;p&gt;The default should not always be:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Upload your code first."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sometimes the best place for your code is exactly where it already is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;on your machine.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What would you add to a local-first VS Code code mentor?&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>vscode</category>
    </item>
  </channel>
</rss>
