<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: LightShield</title>
    <description>The latest articles on DEV Community by LightShield (@lightshield).</description>
    <link>https://dev.to/lightshield</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948252%2F2d7a51ba-53ad-4795-959e-be6eeb09204a.png</url>
      <title>DEV Community: LightShield</title>
      <link>https://dev.to/lightshield</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lightshield"/>
    <language>en</language>
    <item>
      <title>Guild - A Free Autonomous Coding Agent That Escalates Through Gemma 4 Models</title>
      <dc:creator>LightShield</dc:creator>
      <pubDate>Sun, 24 May 2026 14:59:28 +0000</pubDate>
      <link>https://dev.to/lightshield/guild-a-free-autonomous-coding-agent-that-escalates-through-gemma-4-models-3dma</link>
      <guid>https://dev.to/lightshield/guild-a-free-autonomous-coding-agent-that-escalates-through-gemma-4-models-3dma</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Guild&lt;/strong&gt; is a free, locally running autonomous coding agent that works while you're away and backs off when you're present.&lt;/p&gt;

&lt;p&gt;The problem: cloud-based AI coding agents (Copilot, Cursor, Claude Code) require paid APIs, need constant babysitting, and hog your machine. I wanted an agent that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs for free on local hardware&lt;/li&gt;
&lt;li&gt;Works autonomously on tasks without me watching&lt;/li&gt;
&lt;li&gt;Knows when I'm using the machine and throttles itself&lt;/li&gt;
&lt;li&gt;Gets smarter over time by learning from its own mistakes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Guild solves this with an &lt;strong&gt;escalation-first architecture&lt;/strong&gt;: start with a small Gemma 4 model, and move up only when the agent gets stuck. Most tasks don't need the biggest model, but when one does, the system adapts automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Escalation chain&lt;/strong&gt;: Gemma 4 E4B → Gemma 4 31B Dense → cloud CLI tools (Claude, Codex) → human (last resort)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual flow composer&lt;/strong&gt;: drag-and-drop web UI to design multi-agent workflows, save reusable blocks, expand to inspect internals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Good neighbor" mode&lt;/strong&gt;: detects user activity via CPU/input monitoring, throttles to zero when you're working, runs full-speed when idle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Truly autonomous&lt;/strong&gt;: survives reboots, sleep/wake cycles, crashes — picks up where it left off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-improving&lt;/strong&gt;: extracts learnings from completed tasks, injects them into future sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent teams&lt;/strong&gt;: decompose complex tasks into blocks, each running its own Gemma 4 instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permission tiers&lt;/strong&gt;: nothing / ask / scoped / autopilot — with a hardcoded-never safety layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design → Run → Monitor (all in one UI)
&lt;/h3&gt;

&lt;p&gt;Guild's web interface lets you design multi-agent workflows visually, then run them with one click:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Composer Studio — Design your agent team:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq89e9p71xqu1dbvbtceh.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq89e9p71xqu1dbvbtceh.jpeg" alt="Flow Composer showing Python Dev Loop with expanded TDD block"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The "Python Dev Loop" preset: requirements→architect feed into verifiers, which gate a &lt;code&gt;tdd_implementer&lt;/code&gt; block. Click to expand that block and see the internal pipeline (planner→test_writer→implementer→refactorer). The edit panel on the right shows agent configuration — name, role, Gemma 4 model selection, instructions, and ports.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmf6pqqm20dvohnpf9e1.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmf6pqqm20dvohnpf9e1.jpeg" alt="Composer Studio with live execution status"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same composer, now running a live workflow: "write me a hello app in assembly 8086". The planner block has completed and the coder block is executing. Status badges update in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Workflow Execution — Watch blocks run in sequence:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvxgf9d6tycnjz58qfw3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvxgf9d6tycnjz58qfw3.jpeg" alt="Workflow detail view showing planner completed with assembly code output, coder running"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The planner agent (powered by Gemma 4) decomposed the task and produced assembly instructions. The coder block is now executing those instructions. Each block's output is visible in real time, with a timeline showing the full execution history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Task Management — Launch and monitor agents:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9jdjf4ruo8zslnh73sf.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9jdjf4ruo8zslnh73sf.jpeg" alt="Tasks view showing running workflow blocks with status badges"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Launch workflows or individual agents from the UI. Filter by status, inspect execution details, stop running tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Good Neighbor" Mode — Resource Awareness
&lt;/h3&gt;

&lt;p&gt;Guild detects when you're using the machine and throttles itself:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Without Good Neighbor&lt;/th&gt;
&lt;th&gt;With Good Neighbor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6j1qt2m9ithp9pzjecdq.jpeg" alt="Ollama using 10GB RAM, 87% memory"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3u39zwt1peshvxqw1m7.jpeg" alt="Ollama using 7.5GB RAM, 69% memory"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama consumes 10.2 GB, system at 87% memory&lt;/td&gt;
&lt;td&gt;Throttled down — 7.5 GB, system at 69% memory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you're gaming, browsing, or coding, Guild backs off automatically. When you're away, it ramps back up.&lt;/p&gt;
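
&lt;p&gt;A minimal sketch of how such a throttle decision could look, assuming the monitor reports CPU load from other processes and the time since the last input event. The names &lt;code&gt;ActivitySample&lt;/code&gt; and &lt;code&gt;throttle_factor&lt;/code&gt; and both thresholds are hypothetical, chosen for illustration rather than taken from Guild's code:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivitySample:
    """One reading of machine activity (field names are hypothetical)."""
    other_cpu_percent: float    # CPU used by processes other than the agent
    seconds_since_input: float  # time since the last keyboard or mouse event


def throttle_factor(sample: ActivitySample,
                    idle_after_s: float = 300.0,
                    busy_cpu_percent: float = 50.0) -> float:
    """Return the share of full speed the agent may use, from 0.0 to 1.0.

    The thresholds are illustrative assumptions, not Guild's defaults.
    """
    user_idle = sample.seconds_since_input >= idle_after_s
    if not user_idle:
        return 0.0  # someone is at the keyboard: back off completely
    if sample.other_cpu_percent >= busy_cpu_percent:
        return 0.0  # no input, but another workload owns the CPU
    # Idle machine: scale up as other CPU load approaches zero.
    return 1.0 - sample.other_cpu_percent / busy_cpu_percent
```

&lt;p&gt;Returning a factor rather than a yes/no flag lets the caller ramp up gradually, which matches the "throttle to zero, full speed when idle" behaviour described above.&lt;/p&gt;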

&lt;h3&gt;
  
  
  CLI — For When You Prefer the Terminal
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install and initialize&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[dev]"&lt;/span&gt;
guild init

&lt;span class="c"&gt;# Configure Gemma 4 escalation chain&lt;/span&gt;
guild config &lt;span class="nt"&gt;--set&lt;/span&gt; provider.model&lt;span class="o"&gt;=&lt;/span&gt;gemma4-e4b
guild config &lt;span class="nt"&gt;--set&lt;/span&gt; escalation.escalation_chain&lt;span class="o"&gt;=&lt;/span&gt;gemma4-31b-dense

&lt;span class="c"&gt;# Run a task — watch it escalate when needed&lt;/span&gt;
guild task &lt;span class="s2"&gt;"Refactor the auth module to use JWT tokens instead of sessions"&lt;/span&gt;

&lt;span class="c"&gt;# Or run in background while you work&lt;/span&gt;
guild task &lt;span class="s2"&gt;"Add comprehensive error handling to the API layer"&lt;/span&gt; &lt;span class="nt"&gt;--background&lt;/span&gt;
guild ps  &lt;span class="c"&gt;# check progress anytime&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real Example: Tinnitus Therapy Music Player
&lt;/h3&gt;

&lt;p&gt;My father has tinnitus. One treatment is "notched music": music with a narrow band around the tinnitus frequency filtered out, listened to over time. I had Guild build the player instead of doing it myself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team:&lt;/strong&gt; Gemma 4 E4B (coder) + Gemma 4 31B Dense (verifier)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Iteration 1:
  [coder/E4B]     Wrote music_player.py (7455 chars)
                  — real-time IIR notch filter via scipy.signal
                  — sounddevice OutputStream callback
                  — keyboard thread for live frequency control
  [verifier/31B]  Running verification:
                  ✓ Files exist
                  ✓ Syntax valid
                  ✗ FAIL: lfilter called with wrong argument order
                    (data passed before coefficients)

Iteration 2:
  [coder/E4B]     Fixed lfilter call order, added zi state persistence
  [verifier/31B]  Running verification:
                  ✓ Files exist
                  ✓ Syntax valid
                  ✓ API usage correct (iirnotch + lfilter + lfilter_zi)
                  ✓ 3-second playback test — no errors
                  PASS (score: 90)

[guild] Team completed. Learning: "lfilter(b, a, x, zi=zi) — coefficients first, not data"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The verifier caught a subtle API misuse that would have caused silent audio corruption. Without the verification loop, the bug ships. With it, the coder gets specific feedback and fixes it in one turn.&lt;/p&gt;

&lt;p&gt;Full source + execution trace: &lt;a href="https://github.com/LightShield/guild/tree/main/examples/music-player-poc" rel="noopener noreferrer"&gt;&lt;code&gt;examples/music-player-poc/&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;
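
&lt;p&gt;For readers unfamiliar with the API in the trace, here is a minimal sketch of the corrected pattern: coefficients first, data second, with the filter state carried between blocks. It uses the real SciPy functions named in the trace (&lt;code&gt;iirnotch&lt;/code&gt;, &lt;code&gt;lfilter&lt;/code&gt;, &lt;code&gt;lfilter_zi&lt;/code&gt;), but the &lt;code&gt;NotchFilter&lt;/code&gt; class, its defaults, and the sample rate are assumptions for illustration and are not copied from &lt;code&gt;music_player.py&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np
from scipy.signal import iirnotch, lfilter, lfilter_zi


class NotchFilter:
    """Streaming notch filter; class and parameter names are hypothetical."""

    def __init__(self, notch_hz: float, q: float = 30.0, fs: float = 44_100.0):
        # b (numerator) and a (denominator) are the filter coefficients.
        self.b, self.a = iirnotch(notch_hz, q, fs=fs)
        self.zi = None  # filter state carried from one block to the next

    def process(self, block: np.ndarray) -> np.ndarray:
        """Filter one block of mono samples, keeping state across calls."""
        if self.zi is None:
            # Scale the steady-state vector by the first sample to
            # reduce the start-up transient.
            self.zi = lfilter_zi(self.b, self.a) * block[0]
        # Coefficients first, then data. With zi given, lfilter
        # returns (output, final_state).
        out, self.zi = lfilter(self.b, self.a, block, zi=self.zi)
        return out
```

&lt;p&gt;Calling &lt;code&gt;process&lt;/code&gt; once per audio block gives the same result as filtering the whole signal in one call, because the state in &lt;code&gt;zi&lt;/code&gt; bridges the block boundaries.&lt;/p&gt;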

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/LightShield/guild" rel="noopener noreferrer"&gt;github.com/LightShield/guild&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture (3 layers)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1 — Harness
├── Process lifecycle (daemon, sleep/wake, crash recovery)
├── Resource monitor (CPU throttling, "good neighbor")
├── Tools (file_read, file_write, shell, search, spawn_agent)
├── Storage (SQLite: tasks, messages, learnings, audit)
└── Permissions (4-tier + hardcoded-never)

Layer 2 — Agent Behaviors
├── Core loop (call model → execute tools → repeat)
├── Stuck detection (repeated errors, no-progress, loops)
├── Escalation chain (weak model → strong model → tools → human)
├── Self-review (adversarial check after task completion)
└── Learning extraction (confidence-scored insights)

Layer 3 — Orchestration
├── Team runner (multi-block task decomposition)
├── Message bus (agent-to-agent communication)
├── Agent spawner (sub-agents as tool calls)
└── Block definitions (TOML-based composable roles)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Web UI — Visual Flow Composer
&lt;/h3&gt;

&lt;p&gt;Guild includes a web-based flow composer (&lt;code&gt;guild serve&lt;/code&gt;) for designing multi-agent teams visually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dark-mode canvas&lt;/strong&gt; powered by xyflow — drag agents from palette, connect with edges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reusable blocks&lt;/strong&gt; — multi-select agents, save as a named block, drag it back as a single node&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inline expansion&lt;/strong&gt; — click a block to expand it on the canvas showing internal nodes and dashed connection lines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verifier decorators&lt;/strong&gt; — attach approval loops to any agent (loop until verifier passes, max N iterations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preset flows&lt;/strong&gt; — one-click "Full Development" loads a complete requirements→architecture→TDD→review→verification pipeline&lt;/li&gt;
&lt;/ul&gt;
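
&lt;p&gt;The verifier decorator in the list above (loop until the verifier passes, at most N iterations) can be sketched as a small control loop. This is an illustration under assumed signatures, not Guild's implementation: &lt;code&gt;run_with_verifier&lt;/code&gt;, &lt;code&gt;Coder&lt;/code&gt;, and &lt;code&gt;Verifier&lt;/code&gt; are hypothetical names, and both agents are modelled as plain callables:&lt;/p&gt;

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: a coder maps (task, feedback) to a draft,
# and a verifier maps a draft to (passed, feedback).
Coder = Callable[[str, Optional[str]], str]
Verifier = Callable[[str], Tuple[bool, str]]


def run_with_verifier(task: str, coder: Coder, verifier: Verifier,
                      max_iterations: int = 3) -> Tuple[bool, str, int]:
    """Loop coder and verifier until the draft passes or attempts run out.

    Returns (passed, last_draft, iterations_used).
    """
    feedback: Optional[str] = None
    draft = ""
    for iteration in range(1, max_iterations + 1):
        # The coder sees the verifier's previous feedback, if any.
        draft = coder(task, feedback)
        passed, feedback = verifier(draft)
        if passed:
            return True, draft, iteration
    return False, draft, max_iterations
```

&lt;p&gt;Feeding the verifier's message back into the next coder call is what allows a specific failure to be fixed in a single extra turn, while the iteration cap keeps a stuck pair from looping forever.&lt;/p&gt;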

&lt;h3&gt;
  
  
  Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;106 source modules&lt;/strong&gt; across 20 domain-grouped packages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2383 Python tests&lt;/strong&gt; + &lt;strong&gt;246 Playwright E2E tests&lt;/strong&gt; (2629 total)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100% branch coverage&lt;/strong&gt; on Python code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;213 requirements&lt;/strong&gt; with full acceptance criteria traceability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0 semantic lies&lt;/strong&gt; — all tests adversarially verified for honesty&lt;/li&gt;
&lt;li&gt;Pure Python 3.11+, async throughout, zero cloud dependency&lt;/li&gt;
&lt;li&gt;Built using a self-improving development system with gated flows (see &lt;a href="https://github.com/LightShield/Guidelines" rel="noopener noreferrer"&gt;Guidelines&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Model Selection: Why Gemma 4?
&lt;/h3&gt;

&lt;p&gt;Gemma 4 is the ideal model family for Guild because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Runs locally via Ollama&lt;/strong&gt; — zero API cost, complete privacy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple size tiers (E2B, E4B, 31B Dense)&lt;/strong&gt; — enables the escalation architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;128K context window&lt;/strong&gt; — can hold entire codebases in context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong code reasoning&lt;/strong&gt; — particularly the 31B Dense variant&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Escalation Architecture
&lt;/h3&gt;

&lt;p&gt;The core insight: &lt;strong&gt;most agent turns don't need the 31B Dense model&lt;/strong&gt;. Reading a file, running a test, writing a simple function — Gemma 4 E4B handles these fine. But when the agent encounters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repeated failures (same error 3+ times)&lt;/li&gt;
&lt;li&gt;Complex multi-file reasoning&lt;/li&gt;
&lt;li&gt;Architectural decisions requiring broad context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...it automatically escalates to Gemma 4 31B Dense, which has the reasoning depth to break through. If even that isn't enough, the chain continues to cloud providers (Claude, Codex) as a final tier before asking a human.&lt;/p&gt;

&lt;p&gt;This gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;80% of turns&lt;/strong&gt; at E4B speed (fast, local, free)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15% of turns&lt;/strong&gt; at 31B Dense quality (complex local reasoning)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5% of turns&lt;/strong&gt; at cloud tier (when local models genuinely can't solve it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Near-zero cost&lt;/strong&gt; — cloud is only used as last resort&lt;/li&gt;
&lt;/ul&gt;
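
&lt;p&gt;A minimal sketch of the "same error 3+ times" trigger, assuming errors are compared as exact strings and each escalation moves up exactly one tier. The &lt;code&gt;Escalator&lt;/code&gt; class and its methods are hypothetical; only the tier order and the threshold of three come from the text above:&lt;/p&gt;

```python
from collections import Counter
from dataclasses import dataclass, field

# Tier order follows the article; class and method names are hypothetical.
DEFAULT_CHAIN = ("gemma4-e4b", "gemma4-31b-dense", "claude", "human")
STUCK_AFTER = 3  # the same error seen this many times counts as "stuck"


@dataclass
class Escalator:
    chain: tuple = DEFAULT_CHAIN
    tier: int = 0
    errors: Counter = field(default_factory=Counter)

    @property
    def current(self) -> str:
        """Name of the tier that should handle the next turn."""
        return self.chain[self.tier]

    def record_error(self, message: str) -> str:
        """Count an error; move up one tier once it repeats too often."""
        self.errors[message] += 1
        on_last_tier = self.tier == len(self.chain) - 1
        if self.errors[message] >= STUCK_AFTER and not on_last_tier:
            self.tier += 1
            self.errors.clear()  # the stronger tier starts with a clean slate
        return self.current
```

&lt;p&gt;Counting per error message means three different one-off failures do not trigger an escalation; only a repeated, identical failure does.&lt;/p&gt;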

&lt;h3&gt;
  
  
  The Full Escalation Chain
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Ollama (local)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 E2B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Routing, permission checks, trivial ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Ollama (local)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 E4B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Default — file ops, shell, simple code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Ollama (local)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 31B Dense&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex reasoning, architecture, debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Claude / Codex&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;When local models are stuck (3+ failures)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Human&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Truly irreversible decisions only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Teams can also mix providers per block, e.g. Gemma 4 E4B as the fast coder and Claude as the strict reviewer. Each block in a workflow defines its own provider independently.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Configure the escalation chain&lt;/span&gt;
guild config &lt;span class="nt"&gt;--set&lt;/span&gt; provider.provider_name&lt;span class="o"&gt;=&lt;/span&gt;ollama
guild config &lt;span class="nt"&gt;--set&lt;/span&gt; provider.model&lt;span class="o"&gt;=&lt;/span&gt;gemma4-e4b
guild config &lt;span class="nt"&gt;--set&lt;/span&gt; escalation.escalation_chain&lt;span class="o"&gt;=&lt;/span&gt;gemma4-31b-dense,claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Not Just Use the Big Model?
&lt;/h3&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Resource contention&lt;/strong&gt; — 31B Dense uses significant RAM/VRAM. The "good neighbor" philosophy means minimizing resource usage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; — E4B responds in 1-2 seconds; 31B Dense takes 10-15 seconds. For simple file reads, that latency is wasted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomy duration&lt;/strong&gt; — when running overnight on a coding task, token efficiency means more work done per charge cycle.&lt;/li&gt;
&lt;/ol&gt;
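
&lt;p&gt;The speed argument can be put into rough numbers using figures already quoted in this post: the 80% / 15% turn split and the 1-2 s and 10-15 s response times. Taking the midpoint of each range is an assumption, and the 5% of cloud turns are left out because no latency is given for them, so treat the result as a back-of-the-envelope estimate:&lt;/p&gt;

```python
# Turn shares and latency ranges are quoted in the article;
# using the midpoint of each range is an assumption.
E4B_SHARE, DENSE_SHARE = 0.80, 0.15  # cloud turns (5%) are left out
E4B_SECONDS = (1 + 2) / 2            # midpoint of 1-2 s
DENSE_SECONDS = (10 + 15) / 2        # midpoint of 10-15 s

# Average latency over local turns only, so renormalise the shares.
local_share = E4B_SHARE + DENSE_SHARE
mixed_seconds = (E4B_SHARE * E4B_SECONDS
                 + DENSE_SHARE * DENSE_SECONDS) / local_share
speedup = DENSE_SECONDS / mixed_seconds

print(f"escalating: {mixed_seconds:.1f} s per local turn")
print(f"always 31B Dense: {DENSE_SECONDS:.1f} s per turn")
print(f"speedup: {speedup:.1f}x")
```

&lt;p&gt;Under these assumptions the escalating setup averages about 3.2 seconds per local turn against 12.5 seconds for running 31B Dense on every turn, roughly a 3.9x reduction.&lt;/p&gt;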

&lt;p&gt;The escalation chain is configurable. If you have the hardware, run 31B Dense all the time. If you're on a laptop, start at E4B and let Guild decide when to bring in the heavy model. If you need cloud power for the hardest problems, add Claude/Codex to the chain — Guild will only use them when local models are genuinely stuck.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
