<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Harsha.B.M</title>
    <description>The latest articles on DEV Community by Harsha.B.M (@harshabm_e558522b28f940).</description>
    <link>https://dev.to/harshabm_e558522b28f940</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936446%2Fe45d7662-1eb5-497d-84bb-2fcf9810cabe.jpg</url>
      <title>DEV Community: Harsha.B.M</title>
      <link>https://dev.to/harshabm_e558522b28f940</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harshabm_e558522b28f940"/>
    <language>en</language>
    <item>
      <title>From Prompts to Action: What Gemini 3.5 Flash and the Agentic Stack Mean for Developers</title>
      <dc:creator>Harsha.B.M</dc:creator>
      <pubDate>Sun, 24 May 2026 07:33:31 +0000</pubDate>
      <link>https://dev.to/harshabm_e558522b28f940/from-prompts-to-action-what-gemini-35-flash-and-the-agentic-stack-mean-for-developers-i49</link>
      <guid>https://dev.to/harshabm_e558522b28f940/from-prompts-to-action-what-gemini-35-flash-and-the-agentic-stack-mean-for-developers-i49</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There's a phrase Google kept repeating throughout the I/O 2026 keynotes: &lt;strong&gt;"from prompts to action."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At first, it sounds like marketing. But after sitting with the full set of announcements — Gemini 3.5 Flash, Managed Agents, Antigravity 2.0, WebMCP — I think it's actually a precise description of where we are right now as developers. And it's worth unpacking seriously, because the implications for how we build software are bigger than any single model release.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Headline: Gemini 3.5 Flash Beats Last Year's Pro
&lt;/h2&gt;

&lt;p&gt;Let's start with the model itself, because the benchmark story is genuinely interesting.&lt;/p&gt;

&lt;p&gt;Gemini 3.5 Flash outperforms Gemini 3.1 Pro across almost all benchmarks — including challenging agentic benchmarks like Terminal-Bench 2.1 (76.2%) and MCP Atlas (83.6%) — while running &lt;strong&gt;four times faster&lt;/strong&gt; than comparable frontier models. It's available today via the Gemini API, AI Studio, and Android Studio.&lt;/p&gt;

&lt;p&gt;This matters for a specific reason: historically, you traded speed for intelligence. Flash was fast and cheap; Pro was smart but slow. That trade-off shaped how we architected agentic systems — you'd use Flash for quick tool calls and route harder reasoning to Pro.&lt;/p&gt;

&lt;p&gt;3.5 Flash collapses that boundary. A model at Flash speed that thinks like a Pro model changes the economics and architecture of every agent loop you're building.&lt;/p&gt;

&lt;p&gt;Pricing sits at $1.50 input / $9.00 output per million tokens, with a 1M token context window. Dynamic thinking is on by default.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Story: Google Shipped a Vertical Stack
&lt;/h2&gt;

&lt;p&gt;Here's what I think most post-event coverage is underweighting: &lt;strong&gt;Google didn't just ship a model. They shipped a production pipeline.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lay it out end to end:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt; — the fast, frontier-grade model powering every layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed Agents in the Gemini API&lt;/strong&gt; — a single API call that spins up an isolated Linux sandbox, where an agent can reason, use tools, execute code, manage files, and browse the web, with persistent state across calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Antigravity 2.0&lt;/strong&gt; — a standalone desktop app for orchestrating agents, with parallel subagent execution, scheduled background tasks, and integrations across AI Studio, Android, and Firebase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Antigravity CLI + SDK&lt;/strong&gt; — command-line and programmatic access to the same agent harness&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebMCP&lt;/strong&gt; — a proposed open web standard that lets you expose JavaScript functions and HTML forms as structured tools to browser-based agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modern Web Guidance&lt;/strong&gt; — curated, expert-vetted skills that guide AI coding tools across common use cases, defined in simple markdown files like &lt;code&gt;AGENTS.md&lt;/code&gt; and &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a model + plugin. It's a full vertical from model inference to production deployment, with Google owning Chrome, Android, Play, and the web standards process at the edges. That's a meaningfully different competitive posture.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Managed Agents Actually Unlocks
&lt;/h2&gt;

&lt;p&gt;The feature I keep coming back to is Managed Agents, and I think it deserves a closer look.&lt;/p&gt;

&lt;p&gt;Previously, building a stateful agent workflow meant managing your own execution environment: provisioning compute, handling context across turns, wiring up tools, and keeping state between calls. A lot of the complexity in agentic systems wasn't AI logic — it was infrastructure plumbing.&lt;/p&gt;

&lt;p&gt;Managed Agents changes this. One API call provisions an isolated cloud Linux environment. The agent has tools, can execute code, browse, manage files. Subsequent API calls resume the same session with all state intact — no reinitializing context on every turn. Google describes it as multi-turn agentic workflows that just work.&lt;/p&gt;

&lt;p&gt;For developers who've spent time building agent infrastructure from scratch, this is the kind of abstraction that genuinely saves weeks.&lt;/p&gt;




&lt;h2&gt;
  
  
  One Honest Caveat on Developer Experience
&lt;/h2&gt;

&lt;p&gt;I want to flag something that the official announcements gloss over.&lt;/p&gt;

&lt;p&gt;If you're migrating from &lt;code&gt;gemini-3-flash-preview&lt;/code&gt; to &lt;code&gt;gemini-3.5-flash&lt;/code&gt;, there's a silent breaking change: &lt;strong&gt;the default &lt;code&gt;thinking_level&lt;/code&gt; is now &lt;code&gt;medium&lt;/code&gt;, not &lt;code&gt;high&lt;/code&gt;&lt;/strong&gt;. A straight copy-paste port will produce different outputs without any obvious error.&lt;/p&gt;

&lt;p&gt;Also worth knowing: if you're running agent workflows through GitHub Copilot, each Flash call meters at 14x premium requests. For serious agentic work, the direct API path through the Antigravity SDK or Vertex AI is dramatically cheaper — roughly 37x cheaper at scale.&lt;/p&gt;

&lt;p&gt;These are the kinds of details that matter when you're building in production, and I wish they were more prominent in the launch documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Shift Worth Paying Attention To
&lt;/h2&gt;

&lt;p&gt;Here's what I think I/O 2026 signals at the macro level.&lt;/p&gt;

&lt;p&gt;We spent the last two years asking "how smart is the model?" That question is becoming less useful. 3.5 Flash beating 3.1 Pro on agentic benchmarks while running faster is partly a story about model capability — but it's mostly a story about &lt;strong&gt;optimization for a specific use case&lt;/strong&gt;: multi-step, tool-heavy, real-world agent loops.&lt;/p&gt;

&lt;p&gt;The new question developers need to be asking is: &lt;strong&gt;what is the execution surface?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google's answer is clear: the execution surface is the agent harness, and they want it to be Antigravity — running in their cloud, on their desktop app, through their API, deployed to Android through their studio. AppFunctions on Android lets apps expose capabilities directly to intelligent agents. WebMCP brings the same primitive to the browser.&lt;/p&gt;

&lt;p&gt;This is Google saying: the next layer of developer platform isn't a runtime or a framework. It's an agent execution environment. And they're racing to own it end-to-end.&lt;/p&gt;

&lt;p&gt;Whether that's exciting or concerning probably depends on your appetite for platform consolidation. But either way, it's the most coherent platform story I've seen from Google in years.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm Watching Next
&lt;/h2&gt;

&lt;p&gt;A few things I'll be paying close attention to in the weeks ahead:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 3.5 Pro&lt;/strong&gt; is confirmed in development and expected to roll out next month (June 2026). If it extends the 3.5 Flash pattern — frontier reasoning at improved speed — that's a significant shift in the model tier structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebMCP adoption&lt;/strong&gt; will be the real test of whether Google can make agent-native web a standard rather than a proprietary feature. Open standards only work when other browsers and toolchains adopt them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed Agents in production&lt;/strong&gt; — I want to see real developer reports on latency, reliability, and cost at scale before recommending it for production workloads. The abstraction is elegant; the question is whether the infrastructure behind it delivers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;Google I/O 2026 wasn't a "look how smart our model is" event. It was a platform architecture announcement dressed up as a model launch.&lt;/p&gt;

&lt;p&gt;The Gemini 3.5 Flash numbers are real and impressive. But the more important thing Google shipped is a complete vertical stack for agent development — from a fast, frontier-grade model to managed execution environments to desktop tooling to web standards. That's infrastructure, not just AI.&lt;/p&gt;

&lt;p&gt;For developers, the immediate practical wins are clear: faster and cheaper inference for agentic workflows, and a significantly lower infrastructure burden if you're building stateful agents. The longer arc — whether Google's agentic platform becomes the dominant execution layer for the next generation of applications — is a bigger question, and one that's going to be answered by what gets built on it.&lt;/p&gt;

&lt;p&gt;That's the part I find most worth watching.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you tried Gemini 3.5 Flash or Managed Agents yet? I'd love to hear what you're building in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Which Gemma 4 Model Should You Actually Use? A Developer’s Honest Guide</title>
      <dc:creator>Harsha.B.M</dc:creator>
      <pubDate>Sun, 24 May 2026 07:13:34 +0000</pubDate>
      <link>https://dev.to/harshabm_e558522b28f940/which-gemma-4-model-should-you-actually-use-a-developers-honest-guide-339m</link>
      <guid>https://dev.to/harshabm_e558522b28f940/which-gemma-4-model-should-you-actually-use-a-developers-honest-guide-339m</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Which Gemma 4 Model Should You Actually Use? A Developer's Honest Guide
&lt;/h1&gt;

&lt;p&gt;When Google DeepMind dropped Gemma 4 on April 2, 2026, the community response was immediate — 207,000 Ollama pulls in 48 hours, front page of Hacker News, and a same-day Ollama update to support all four variants. The hype was real. But so was the confusion.&lt;/p&gt;

&lt;p&gt;Four models. Three naming conventions. Two architectures. One question every developer is quietly Googling:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Which one do I actually run?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is that answer — practical, specific, with no benchmark-pasting.&lt;/p&gt;




&lt;h2&gt;
  
  
  First, Decode the Names
&lt;/h2&gt;

&lt;p&gt;The naming is the first thing that trips people up. Let's fix that.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;What the name means&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;E2B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;Effective&lt;/em&gt; 2 Billion parameters&lt;/td&gt;
&lt;td&gt;Dense + Per-Layer Embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;E4B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;Effective&lt;/em&gt; 4 Billion parameters&lt;/td&gt;
&lt;td&gt;Dense + Per-Layer Embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;26B A4B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;26B total, &lt;em&gt;4B Active&lt;/em&gt; per token&lt;/td&gt;
&lt;td&gt;Mixture of Experts (MoE)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;31B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;31 Billion parameters, all of them&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;E&lt;/code&gt; in E2B and E4B stands for &lt;strong&gt;effective&lt;/strong&gt; — not just a raw parameter count. These models use Per-Layer Embeddings (PLE), an architectural trick that lets them punch above their weight on constrained hardware. The &lt;code&gt;A&lt;/code&gt; in 26B A4B stands for &lt;strong&gt;active&lt;/strong&gt; — only 4 billion of those 26 billion parameters fire for any given token. That's the magic of Mixture of Experts.&lt;/p&gt;

&lt;p&gt;If the names still feel weird, read them like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;E2B: "tiny but smart for its size"&lt;/li&gt;
&lt;li&gt;E4B: "the everyday laptop model"&lt;/li&gt;
&lt;li&gt;26B A4B: "26B quality, 4B speed" ← the sleeper pick&lt;/li&gt;
&lt;li&gt;31B: "no compromises"&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Hardware Reality
&lt;/h2&gt;

&lt;p&gt;Before picking a model, be honest about your machine:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E2B&lt;/strong&gt; — ~2–3 GB storage, runs on phones, Raspberry Pi, and anything with a CPU. If you're deploying to edge devices or need zero-latency local inference on minimal hardware, this is it. Don't use it for complex reasoning — it'll disappoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E4B&lt;/strong&gt; — ~9.6 GB download via Ollama. This is the default &lt;code&gt;ollama pull gemma4&lt;/code&gt; variant for a reason. Runs comfortably on a 16 GB MacBook (M1 or later). Fast enough for interactive use. Good enough for most real tasks. If you're not sure which to pick, this is your answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;26B A4B&lt;/strong&gt; — The one most people overlook. You need around 24 GB of RAM (or a 24 GB GPU like an RTX 3090 or 4090). But what you get is near-31B quality at roughly E4B inference speed, because MoE only activates 3.8B parameters per token. Apple Silicon Mac with 32 GB unified memory? This is your best model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;31B Dense&lt;/strong&gt; — 20 GB minimum RAM/VRAM, 24 GB recommended. Every single one of those 31 billion parameters fires for every token. No shortcuts. It currently sits at &lt;strong&gt;#3 among all open models globally&lt;/strong&gt; on the Arena AI leaderboard. If you have a 4090 or an M2 Ultra, run this.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup (Ollama, 5 Minutes)
&lt;/h2&gt;

&lt;p&gt;Ollama is the fastest path from zero to running. Make sure you have Ollama 0.22 or newer — earlier versions don't handle Gemma 4 properly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check your version&lt;/span&gt;
ollama &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# Pull the model that matches your hardware&lt;/span&gt;
ollama pull gemma4:e2b    &lt;span class="c"&gt;# phones, Pi, CPU-only machines&lt;/span&gt;
ollama pull gemma4        &lt;span class="c"&gt;# E4B — 16 GB laptops (default)&lt;/span&gt;
ollama pull gemma4:26b    &lt;span class="c"&gt;# 24 GB RAM — MoE, best quality/speed&lt;/span&gt;
ollama pull gemma4:31b    &lt;span class="c"&gt;# 24 GB+ VRAM — maximum quality&lt;/span&gt;

&lt;span class="c"&gt;# Run it&lt;/span&gt;
ollama run gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  One Critical Fix You Need to Make
&lt;/h3&gt;

&lt;p&gt;Ollama's default context window for Gemma 4 is set to &lt;strong&gt;4K tokens&lt;/strong&gt; — but the actual models support 128K (E2B/E4B) and 256K (26B/31B). That default silently cripples long-context work. Fix it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a Modelfile with the right context&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;' &amp;gt; Modelfile
FROM gemma4
PARAMETER num_ctx 32768
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Build a custom named model&lt;/span&gt;
ollama create gemma4-32k &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile
ollama run gemma4-32k
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For LM Studio users: search for Gemma 4 GGUF builds and use &lt;code&gt;Q4_K_M&lt;/code&gt; quantization — it's the sweet spot between quality and RAM usage. &lt;code&gt;Q5&lt;/code&gt; if you have headroom to spare.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Gemma 4 Actually Gets Right
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Multimodal is native, not bolted on
&lt;/h3&gt;

&lt;p&gt;Every Gemma 4 model handles text and images in a single model call — no separate vision pipeline, no switching endpoints. The E2B and E4B models go further and support audio input natively (up to 30 seconds), and the 26B/31B models handle video up to 60 seconds at 1fps. This isn't a demo feature. It's built into the base architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  128K context is usable in practice
&lt;/h3&gt;

&lt;p&gt;A lot of models &lt;em&gt;claim&lt;/em&gt; long context and then quietly degrade in quality past a few thousand tokens. Gemma 4 uses a hybrid attention mechanism — interleaving local sliding window attention with full global attention — specifically designed to maintain coherence at long range. For RAG pipelines, codebase analysis, or long-document work, this matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  The license is actually open
&lt;/h3&gt;

&lt;p&gt;Apache 2.0. Not Google's previous custom Gemma license. You can use it commercially, modify it, fine-tune it, and deploy it in products — no restrictions, no royalties. For developers building on top of a local model, this changes the calculus entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Decision Tree
&lt;/h2&gt;

&lt;p&gt;Stop overthinking it. Use this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What hardware do you have?
│
├─ Phone / Raspberry Pi / CPU-only → E2B
│
├─ 16 GB laptop (Mac, Windows, Linux) → E4B (ollama pull gemma4)
│
├─ 32 GB Apple Silicon or RTX 3090/4090 → 26B A4B ← don't skip this one
│
└─ 64 GB+ Mac or RTX 4090 and you need maximum quality → 31B Dense

What are you building?
│
├─ Mobile / edge app → E2B or E4B
│
├─ Local dev tool, coding assistant, RAG → E4B or 26B A4B
│
├─ Long-context document analysis, codebase reasoning → 26B or 31B (+ increase num_ctx)
│
└─ Fine-tuning for a specific domain → Start with 26B A4B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What This Actually Means
&lt;/h2&gt;

&lt;p&gt;Here's the thing worth sitting with for a moment.&lt;/p&gt;

&lt;p&gt;The 31B Dense model — the one that ranks third among all open models on Earth — runs on a consumer GPU. A single RTX 4090, the kind of card a serious gamer or developer might already own, is sufficient. No cluster. No cloud bill. No API rate limits. No data leaving your machine.&lt;/p&gt;

&lt;p&gt;Two years ago, a model this capable required either a research institution's compute budget or a cloud provider's infrastructure. Today you pull it with one terminal command and it runs on hardware you might already own.&lt;/p&gt;

&lt;p&gt;The E4B model — the &lt;em&gt;second-smallest&lt;/em&gt; in the family — handles image input, supports 128K context, reasons in 140+ languages, and fits in 16 GB of RAM. That's a family phone or a mid-range MacBook.&lt;/p&gt;

&lt;p&gt;Developers who internalize this shift will build very differently from those who don't. When inference is local and free, the calculus around what's worth building changes. Offline-first AI features stop being a niche edge case and start being a design choice. Privacy-sensitive applications that couldn't viably use cloud AI now have a real path.&lt;/p&gt;

&lt;p&gt;That's what Gemma 4 is: not just a better model, but a different kind of constraint on what's possible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;E2B&lt;/th&gt;
&lt;th&gt;E4B&lt;/th&gt;
&lt;th&gt;26B A4B&lt;/th&gt;
&lt;th&gt;31B Dense&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Edge, mobile&lt;/td&gt;
&lt;td&gt;Everyday dev&lt;/td&gt;
&lt;td&gt;Quality + speed&lt;/td&gt;
&lt;td&gt;Max quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM needed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;8–16 GB&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;20–24 GB+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multimodal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Text + Image + Audio&lt;/td&gt;
&lt;td&gt;Text + Image + Audio&lt;/td&gt;
&lt;td&gt;Text + Image + Video&lt;/td&gt;
&lt;td&gt;Text + Image + Video&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ollama tag&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemma4:e2b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gemma4&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemma4:26b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemma4:31b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;Pick the model that matches your hardware. Fix the &lt;code&gt;num_ctx&lt;/code&gt; default. Build something real.&lt;/p&gt;

&lt;p&gt;That's it.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
    <item>
      <title>GemmaLens — AI-Powered Repository Understanding Engine Built with Gemma 4 31B</title>
      <dc:creator>Harsha.B.M</dc:creator>
      <pubDate>Sun, 24 May 2026 06:48:08 +0000</pubDate>
      <link>https://dev.to/harshabm_e558522b28f940/gemmalens-ai-powered-repository-understanding-engine-built-with-gemma-4-31b-3ol3</link>
      <guid>https://dev.to/harshabm_e558522b28f940/gemmalens-ai-powered-repository-understanding-engine-built-with-gemma-4-31b-3ol3</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GemmaLens&lt;/strong&gt; — an AI-powered repository understanding and architectural memory engine that lets you drop in any GitHub URL and walk away actually understanding the codebase.&lt;/p&gt;

&lt;p&gt;Most developers know the pain: you inherit a repo, join a new team, or revisit old code — and you're stuck archaeologically digging through folders, grepping for entry points, manually tracing imports. GemmaLens eliminates that cold-start problem entirely.&lt;/p&gt;

&lt;p&gt;Here's what happens when you paste a GitHub URL:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Real cloning&lt;/strong&gt; — the backend clones the repo via GitPython (no scraping, no fake data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep scanning&lt;/strong&gt; — detects languages (20+), frameworks, package managers, and parses actual dependencies from &lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;requirements.txt&lt;/code&gt;, &lt;code&gt;Cargo.toml&lt;/code&gt;, &lt;code&gt;pom.xml&lt;/code&gt;, and &lt;code&gt;composer.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture graph&lt;/strong&gt; — traces Python and JS/TS imports across the codebase, builds a NetworkX directed graph, and renders it live with React Flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma AI summary&lt;/strong&gt; — the full context (file tree, imports, key file contents, dependency list) is passed to Gemma 4, which produces a grounded, non-hallucinated architectural summary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GemmaChat&lt;/strong&gt; — ask anything: &lt;em&gt;"explain the authentication flow"&lt;/em&gt;, &lt;em&gt;"what services depend on the database?"&lt;/em&gt;, &lt;em&gt;"where is the API defined?"&lt;/em&gt; — all answers are grounded in the real scanned context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation generation&lt;/strong&gt; — one click produces full markdown docs from the actual analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LensContext export&lt;/strong&gt; — a structured JSON file you can paste into Claude, Cursor, ChatGPT, or any AI tool to give it instant repo awareness&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Tech stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend: Next.js + Tailwind CSS + React Flow (@xyflow/react) + react-markdown&lt;/li&gt;
&lt;li&gt;Backend: FastAPI + GitPython + NetworkX + httpx&lt;/li&gt;
&lt;li&gt;AI: Gemma 4 31B Dense via OpenRouter&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;🚧 &lt;em&gt;Deployment in progress — video walkthrough coming shortly.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example flow on a real repo (&lt;code&gt;https://github.com/fastapi/fastapi&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Overview tab&lt;/strong&gt;: Gemma's summary correctly identifies the ASGI framework, Starlette dependency, Pydantic integration, and test structure — all from scanned files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture Graph&lt;/strong&gt;: nodes for &lt;code&gt;fastapi/&lt;/code&gt;, &lt;code&gt;tests/&lt;/code&gt;, &lt;code&gt;docs_src/&lt;/code&gt; with real import edges between modules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GemmaChat&lt;/strong&gt;: asking &lt;em&gt;"how does dependency injection work here?"&lt;/em&gt; returns an accurate answer citing actual files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export&lt;/strong&gt;: a JSON blob ready to paste into any AI assistant for instant context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📁 &lt;strong&gt;GitHub Repository&lt;/strong&gt;: &lt;a href="https://github.com/HarshaBM-25/gemmalens" rel="noopener noreferrer"&gt;github.com/HarshaBM-25/gemmalens&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;Full source on GitHub: &lt;strong&gt;&lt;a href="https://github.com/HarshaBM-25/gemmalens" rel="noopener noreferrer"&gt;https://github.com/HarshaBM-25/gemmalens&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Project structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemmalens/
├── backend/
│   ├── app/
│   │   ├── main.py               # FastAPI app + CORS
│   │   ├── models/               # Pydantic request models
│   │   ├── routes/
│   │   │   ├── analyze.py        # Clone + scan endpoint
│   │   │   ├── chat.py           # GemmaChat endpoint
│   │   │   ├── docs.py           # Documentation generation
│   │   │   └── export.py         # LensContext JSON export
│   │   └── services/
│   │       ├── analyzer.py       # GitPython, file scanning, NetworkX graph
│   │       └── gemma.py          # OpenRouter / Gemma 4 integration
│   └── requirements.txt
└── frontend/
    ├── app/
    │   ├── page.tsx              # Homepage with repo input
    │   └── analyze/[repoId]/
    │       └── page.tsx          # 5-tab analysis dashboard
    └── components/
        └── GraphView.tsx         # React Flow architecture graph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Backend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;backend
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here
uvicorn app.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000

&lt;span class="c"&gt;# Frontend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev
&lt;span class="c"&gt;# Open http://localhost:3000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;I chose &lt;strong&gt;Gemma 4 31B Dense&lt;/strong&gt; via OpenRouter, and the reasoning was specific — not arbitrary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why 31B Dense over E2B or E4B?
&lt;/h3&gt;

&lt;p&gt;The core challenge in repository understanding isn't code generation — it's &lt;strong&gt;coherent reasoning across a large, heterogeneous context&lt;/strong&gt;. A single repository analysis call sends Gemma:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The full file tree (up to 200 files)&lt;/li&gt;
&lt;li&gt;Detected languages, frameworks, and all dependencies&lt;/li&gt;
&lt;li&gt;Module-to-module import relationships&lt;/li&gt;
&lt;li&gt;Key file contents (README, entrypoints, config files) — up to ~20,000 tokens of real code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Smaller models struggle to maintain coherence when the context mixes directory trees, JSON dependency lists, Python imports, and raw source code simultaneously. The &lt;strong&gt;31B Dense architecture&lt;/strong&gt; handles this without losing the thread — it correctly identifies which file is the entrypoint, how modules relate, and what design patterns are in use.&lt;/p&gt;

&lt;p&gt;The E2B and E4B MoE variants, while efficient, trade depth for speed in ways that matter here. When a user asks &lt;em&gt;"explain how authentication is implemented"&lt;/em&gt;, the answer needs to correctly synthesize information from potentially 5–8 different files. That requires the full model capacity of 31B Dense.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Gemma 4 runs in GemmaLens
&lt;/h3&gt;

&lt;p&gt;Gemma 4 powers four distinct features, each with a purpose-built system prompt:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Repository Summarization&lt;/strong&gt;&lt;br&gt;
After the backend scans the repo, Gemma receives the full context and produces a structured architectural summary — identifying purpose, key design decisions, module relationships, and entry points. The system prompt explicitly instructs it to ground every claim in the provided data and flag anything it cannot verify.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. GemmaChat Q&amp;amp;A&lt;/strong&gt;&lt;br&gt;
Every chat message is sent with the full repository context prepended. Gemma answers questions like &lt;em&gt;"where is the database connection configured?"&lt;/em&gt; by reasoning over the real file tree and import map — not hallucinating. The system prompt holds it accountable: &lt;em&gt;"if it's not in the data, say so."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Documentation Generation&lt;/strong&gt;&lt;br&gt;
Gemma generates a full markdown README — Overview, Architecture, Directory Structure, Dependencies, Getting Started, Key Modules — from the real scanned data. No template filling; actual synthesis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Architecture Explanation&lt;/strong&gt;&lt;br&gt;
When users ask about specific modules in GemmaChat, Gemma explains the module's role, its dependents, and its dependencies using the real NetworkX graph data passed as context.&lt;/p&gt;

&lt;h3&gt;
  
  
  The key design decision
&lt;/h3&gt;

&lt;p&gt;I deliberately did &lt;strong&gt;not&lt;/strong&gt; pre-summarize or compress the context before sending it to Gemma. The raw file tree, raw dependency list, raw import relationships — all of it goes in. This is where Gemma 4 31B Dense earns its place: it handles the noise, finds the signal, and produces answers that are genuinely useful rather than generically plausible.&lt;/p&gt;

&lt;p&gt;The result is an AI that actually &lt;em&gt;knows&lt;/em&gt; the repository — not one that performs knowing it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built for the Gemma 4 Challenge. No mock data, no fake dashboards, no hardcoded outputs — everything you see comes from real repository analysis.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
