<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rob</title>
    <description>The latest articles on DEV Community by Rob (@carryologist).</description>
    <link>https://dev.to/carryologist</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3884903%2Ff7cf0bfd-0b92-4dca-9095-683af23a19e3.png</url>
      <title>DEV Community: Rob</title>
      <link>https://dev.to/carryologist</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/carryologist"/>
    <language>en</language>
    <item>
      <title>From Idea to Infrastructure: Standing Up a Self-Hosted AI Dev Environment</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:37:37 +0000</pubDate>
      <link>https://dev.to/carryologist/from-idea-to-infrastructure-standing-up-a-self-hosted-ai-dev-environment-1902</link>
      <guid>https://dev.to/carryologist/from-idea-to-infrastructure-standing-up-a-self-hosted-ai-dev-environment-1902</guid>
      <description>&lt;p&gt;The first wave of content here on Vibes Coder was meta by design: a blog about building a blog, from a cabana in Cabo, on an iPhone. But that was always just the foundation. The thing I actually want to explore is local and self-hosted AI, and that starts with infrastructure.&lt;/p&gt;

&lt;p&gt;This post is the journey from "I should build a home lab" to a fully running Coder server with GitHub integration, workspace templates, multi-user support, and AI agents that are genuinely useful out of the box. Everything here was done conversationally through &lt;a href="https://coder.com/agents" rel="noopener noreferrer"&gt;Coder Agents&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Self-Hosted, Why Now
&lt;/h2&gt;

&lt;p&gt;We're in the middle of an explosion in local hardware capabilities. Apple's shipped insanely powerful M-series silicon for generations. Qualcomm's latest Snapdragon Elite processors are serious. NVIDIA keeps pushing consumer GPUs with more VRAM, and is now getting into CPUs with the N1 chips. The combination of CPUs, GPUs, and NPUs available today far exceeds what standard productivity apps actually require.&lt;/p&gt;

&lt;p&gt;It's pretty clear where this is heading: sophisticated LLMs running directly on our devices. I genuinely believe the future is Siri interfacing with a local model on an iPhone. A self-hosted home lab is the best approximation for testing that future before on-device capabilities go mainstream.&lt;/p&gt;

&lt;p&gt;So I broke this into three phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Get the hardware&lt;/strong&gt; capable of real inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build the dev environment&lt;/strong&gt; to work with it (Coder, agents, templates)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run local models&lt;/strong&gt; and wire them into the coding workflow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This post covers phases 1 and 2. Phase 3 gets its own post next.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hardware Hack: Buy a Gaming PC
&lt;/h2&gt;

&lt;p&gt;How do you get a machine powerful enough for serious AI work when RAM, storage, and GPU prices are brutal?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Buy a pre-built gaming PC.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Individually sourcing components means paying extreme markups, thanks to AI's ripple effects on GPUs, memory, and storage. But gaming PCs built a few months ago with all the latest parts are just sitting on shelves at Best Buy, Newegg, and Micro Center. These complete systems are actually worth more parted out than what they're selling for. The pricing is inverted.&lt;/p&gt;

&lt;p&gt;I picked up a rig from Newegg. Here's what's inside:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;AMD Ryzen 9 9950X3D&lt;/td&gt;
&lt;td&gt;16-core Zen 5, 5.75 GHz boost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU&lt;/td&gt;
&lt;td&gt;Zotac RTX 5090&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;32 GB GDDR7&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM&lt;/td&gt;
&lt;td&gt;G.Skill Trident Z5 RGB&lt;/td&gt;
&lt;td&gt;64 GB (2x32 GB) DDR5-6000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Samsung 9100 Pro&lt;/td&gt;
&lt;td&gt;2 TB Gen5 NVMe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PSU&lt;/td&gt;
&lt;td&gt;Thermaltake Toughpower GT&lt;/td&gt;
&lt;td&gt;1200W 80+ Gold ATX 3.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OS&lt;/td&gt;
&lt;td&gt;Ubuntu 24.04 LTS&lt;/td&gt;
&lt;td&gt;NVIDIA driver 590.48.01&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The spec that matters most for local LLMs is VRAM. The 32 GB on the RTX 5090 is the sweet spot: enough to run 27B-35B parameter models at full quality, or 70B models at aggressive quantization. The 64 GB of system RAM provides headroom for KV cache spillover, and the 2 TB NVMe means models load fast and you can store plenty without worry. More on all of that in the next post. For now I have a capable AI workstation, in all its RGB puke glory.&lt;/p&gt;

&lt;p&gt;But a powerful machine sitting in a closet isn't useful until you can actually develop on it. That's where Coder comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Standing Up Coder
&lt;/h2&gt;

&lt;p&gt;I installed Ubuntu on the workstation. Why Ubuntu? It has the most documentation and is usually what surfaces first in searches. Basically, it's the most agent-friendly distro. I didn't want an agent troubleshooting my deployment to get tangled up in solutions meant for Mint or Pop!_OS. The install was pretty straightforward, minus a snafu getting the RTX 5090 drivers working: turns out you have to install the open kernel modules, not the NVIDIA proprietary ones. Thankfully my motherboard had a built-in HDMI port I could use with the Ryzen's iGPU in the meantime.&lt;/p&gt;

&lt;p&gt;Fifteen minutes later I had Coder installed and my Ubuntu workstation connected via a Coder tunnel. This gives me a full cloud development environment accessible from anywhere, including my phone. Workspaces run as Docker containers on the machine, each with its own isolated environment, tools, and credentials.&lt;/p&gt;
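
&lt;p&gt;For reference, the install itself is tiny. A sketch of the two commands involved (the install script URL and the tunnel behavior come from Coder's standard setup docs, not the exact session transcript):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Install the Coder binary via the official install script
curl -L https://coder.com/install.sh | sh

# Start the server. With no access URL configured, Coder provisions
# a *.try.coder.app tunnel, making the dashboard reachable from anywhere.
coder server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;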

&lt;p&gt;The goal: anyone who creates a workspace from the template gets GitHub access, a full toolchain, and AI agents that know how to use everything, automatically. No manual setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Auth: The Long Way Around
&lt;/h3&gt;

&lt;p&gt;The first task was connecting Coder workspaces to GitHub so agents could clone repos, commit, push, and create PRs without manual token management.&lt;/p&gt;

&lt;p&gt;I explored three options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personal Access Tokens&lt;/strong&gt; — works but doesn't scale to multiple users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSH keys&lt;/strong&gt; — same problem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coder External Auth (OAuth)&lt;/strong&gt; — configure once on the server, every user authenticates through the browser with their own GitHub account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Chose option 3. Created a GitHub OAuth App, configured the callback URLs, and started fighting with the server configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first struggle&lt;/strong&gt;: Coder wasn't running in Docker (just using Docker for workspaces). It was running as a manual &lt;code&gt;coder server&lt;/code&gt; process. The config file at &lt;code&gt;/etc/coder.d/coder.env&lt;/code&gt; existed but wasn't being loaded because the file uses &lt;code&gt;VAR=value&lt;/code&gt; format without &lt;code&gt;export&lt;/code&gt;, and &lt;code&gt;source&lt;/code&gt; reads the file but doesn't export to child processes. Had to &lt;code&gt;export&lt;/code&gt; the variables directly in the shell before running the server.&lt;/p&gt;
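
&lt;p&gt;For anyone hitting the same wall, here's the shell mechanics in miniature (a sketch, not the exact session commands):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Doesn't work: source sets the variables in this shell,
# but they aren't exported, so the child process never sees them
source /etc/coder.d/coder.env
coder server

# Works: set -a marks every assignment in the sourced file for export
set -a
source /etc/coder.d/coder.env
set +a
coder server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;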

&lt;p&gt;&lt;strong&gt;The plot twist&lt;/strong&gt;: After all the OAuth App setup, I discovered the Coder version had a &lt;strong&gt;built-in default GitHub provider&lt;/strong&gt; that was already enabled. Navigating to &lt;code&gt;/external-auth/github&lt;/code&gt; just worked. Didn't even need the custom OAuth App.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: Check &lt;code&gt;coder server --help&lt;/code&gt; before manually configuring things. Or, realistically, ask your agent to do it for you. The answer was in the flags the whole time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wiring GitHub Into the Workspace Template
&lt;/h3&gt;

&lt;p&gt;Even after authenticating, workspaces didn't automatically have GitHub credentials available. The external auth token existed but nothing told &lt;code&gt;git&lt;/code&gt; or &lt;code&gt;gh&lt;/code&gt; to use it.&lt;/p&gt;

&lt;p&gt;The fix was template changes to &lt;code&gt;main.tf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"coder_external_auth"&lt;/span&gt; &lt;span class="s2"&gt;"github"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"github"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plus injecting &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; into the agent's environment variables and adding a startup script that configures the git credential helper and installs the GitHub CLI.&lt;/p&gt;
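
&lt;p&gt;Condensed, the startup script addition looks something like this. A sketch: &lt;code&gt;gh auth setup-git&lt;/code&gt; and the &lt;code&gt;access_token&lt;/code&gt; attribute on &lt;code&gt;coder_external_auth&lt;/code&gt; are real, the rest is illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# GITHUB_TOKEN is injected by the template's coder_agent env block,
# sourced from data.coder_external_auth.github.access_token

# Install the GitHub CLI on first boot only
if ! command -v gh &amp;amp;&amp;gt; /dev/null; then
  sudo apt-get update &amp;amp;&amp;amp; sudo apt-get install -y gh
fi

# Route git's credential helper through gh, which reads GITHUB_TOKEN
gh auth setup-git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;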

&lt;p&gt;The template workflow I learned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/coder-templates/docker
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/coder-templates/docker
coder templates pull docker &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="c"&gt;# edit main.tf&lt;/span&gt;
coder templates push docker
coder update my-workspace  &lt;span class="c"&gt;# critical — stop/start alone reuses the old version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last line is a gotcha worth highlighting: &lt;strong&gt;stopping and starting a workspace doesn't update the template version.&lt;/strong&gt; You must run &lt;code&gt;coder update&lt;/code&gt; to apply new template changes to an existing workspace.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Instructions That Actually Work
&lt;/h3&gt;

&lt;p&gt;With GitHub fully wired up, agents still had a problem: they'd ask users to authenticate or provide tokens. They didn't know the environment was pre-configured.&lt;/p&gt;

&lt;p&gt;The fix was adding system instructions in the Coder admin panel (Agents &amp;gt; Settings &amp;gt; Behavior) that apply to all users. The key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub access is pre-configured. Never ask users to authenticate.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;gh&lt;/code&gt; CLI for all GitHub operations.&lt;/li&gt;
&lt;li&gt;Always commit and push. Workspaces are ephemeral; GitHub is the source of truth.&lt;/li&gt;
&lt;li&gt;Bias toward action. Build first, ask questions only when genuinely ambiguous.&lt;/li&gt;
&lt;li&gt;Do the full loop: write code, install deps, test, commit, push.&lt;/li&gt;
&lt;li&gt;Install tools with &lt;code&gt;sudo&lt;/code&gt; as needed without asking permission.&lt;/li&gt;
&lt;li&gt;Don't ask "would you like me to..." for obvious next steps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the difference between an agent that's technically capable and one that's actually useful. Without these instructions, every session started with five minutes of the agent asking permission to do things it already had access to do.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Vibe Coding Toolchain
&lt;/h3&gt;

&lt;p&gt;The base Docker image was missing most of what a modern coding session needs. Added to the startup script:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GitHub CLI (&lt;code&gt;gh&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Repo management, PRs, issues from the terminal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node.js + npm&lt;/td&gt;
&lt;td&gt;Most web projects need it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vercel CLI&lt;/td&gt;
&lt;td&gt;Deploy directly from the workspace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;uv&lt;/td&gt;
&lt;td&gt;Fast Python package manager for new projects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;zip, unzip, sqlite3&lt;/td&gt;
&lt;td&gt;Common utilities that were missing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All installs are idempotent (&lt;code&gt;if ! command -v ... &amp;amp;&amp;gt; /dev/null&lt;/code&gt;) so they only run on first boot.&lt;/p&gt;
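
&lt;p&gt;The pattern, using Node and Vercel as examples (illustrative; the NodeSource setup script is their standard install path):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Guard every install behind a command check so re-runs are no-ops
if ! command -v node &amp;amp;&amp;gt; /dev/null; then
  curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
  sudo apt-get install -y nodejs
fi

if ! command -v vercel &amp;amp;&amp;gt; /dev/null; then
  sudo npm install -g vercel
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;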

&lt;h3&gt;
  
  
  Multi-User Setup
&lt;/h3&gt;

&lt;p&gt;The real test: could my partner use the same server with her own account and GitHub credentials?&lt;/p&gt;

&lt;p&gt;Setup was three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;coder users create&lt;/code&gt; on the host&lt;/li&gt;
&lt;li&gt;She logs in, creates a workspace from the Docker template&lt;/li&gt;
&lt;li&gt;Visits &lt;code&gt;/external-auth/github&lt;/code&gt; once to link her GitHub account&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else (gh, git credentials, Vercel, system instructions) was automatic from the template. That's the whole point of doing this at the template level rather than per-workspace.&lt;/p&gt;
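
&lt;p&gt;Step 1 really is a one-liner on the host (flags per &lt;code&gt;coder users create --help&lt;/code&gt;; the values here are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;coder users create --username partner --email partner@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;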

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Here's what we ended up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ubuntu AI Workstation (home lab)
├── coder server (running via tunnel)
│   ├── Built-in GitHub OAuth provider
│   ├── Agents with system instructions
│   └── Docker template
│       ├── GITHUB_TOKEN auto-injected per user
│       ├── gh CLI pre-installed
│       ├── Node.js + npm + Vercel CLI
│       ├── Python 3.12 + uv
│       └── code-server (VS Code in browser)
├── Docker (runs workspace containers)
└── Coder tunnel (*.try.coder.app)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every user gets their own isolated workspace with full GitHub integration, a complete toolchain, and AI agents that know how to use all of it. The server handles auth, templates handle environment setup, and system instructions handle agent behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas Worth Knowing
&lt;/h2&gt;

&lt;p&gt;A few things that cost us time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;source&lt;/code&gt; vs &lt;code&gt;export&lt;/code&gt;&lt;/strong&gt;: &lt;code&gt;source /etc/coder.d/coder.env&lt;/code&gt; reads the file but doesn't export variables to child processes. If your env file doesn't use &lt;code&gt;export&lt;/code&gt; statements, child processes (like &lt;code&gt;coder server&lt;/code&gt;) won't see the values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Template versioning&lt;/strong&gt;: Stopping and starting a workspace reuses the old template version. You must run &lt;code&gt;coder update &amp;lt;workspace&amp;gt;&lt;/code&gt; to pick up new template changes. This one bit us three times before it stuck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agents settings vs Deployment settings&lt;/strong&gt;: They're in completely different places in the Coder UI. Agents settings control AI behavior; deployment settings control server config. Easy to confuse.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The built-in GitHub provider&lt;/strong&gt;: We spent time creating a custom OAuth App before discovering Coder ships with a default GitHub provider that was already enabled. The &lt;code&gt;--help&lt;/code&gt; output had the answer all along.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent session refresh&lt;/strong&gt;: After template changes that modify environment variables, you need a fresh Agents session. The running session won't pick up the new values.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next: Local LLMs
&lt;/h2&gt;

&lt;p&gt;The hardware is ready. The dev environment is running. But right now, all the AI work is still going through cloud APIs: Claude for blog generation, Claude for coding agents.&lt;/p&gt;

&lt;p&gt;Tomorrow, we change that.&lt;/p&gt;

&lt;p&gt;The RTX 5090's 32 GB of VRAM is sitting idle, and there's an entire ecosystem of open-source models that can run locally on this hardware. We're going to install Ollama, pull a stack of models purpose-built for different coding tasks, and start wiring local inference into the development workflow.&lt;/p&gt;

&lt;p&gt;If you've ever wondered what it takes to run a 35-billion-parameter model on consumer hardware, or whether local models can actually keep up with cloud APIs for real coding work, that's what we're testing next.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1 gaming PC purchased from Newegg&lt;/li&gt;
&lt;li&gt;1 Coder server running via tunnel&lt;/li&gt;
&lt;li&gt;1 GitHub OAuth integration (built-in, no custom app needed)&lt;/li&gt;
&lt;li&gt;1 workspace template with 6 pre-installed tools&lt;/li&gt;
&lt;li&gt;2 users configured&lt;/li&gt;
&lt;li&gt;3 template pushes to get everything right&lt;/li&gt;
&lt;li&gt;~15 minutes debugging &lt;code&gt;export&lt;/code&gt; vs &lt;code&gt;source&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;0 lines of code written outside of Coder Agents&lt;/li&gt;
&lt;li&gt;32 GB of VRAM waiting for local models&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>coder</category>
      <category>agents</category>
      <category>homelab</category>
    </item>
    <item>
      <title>Putting the GPU to Work: Running Local LLMs on a Home Lab</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:37:04 +0000</pubDate>
      <link>https://dev.to/carryologist/putting-the-gpu-to-work-running-local-llms-on-a-home-lab-2abp</link>
      <guid>https://dev.to/carryologist/putting-the-gpu-to-work-running-local-llms-on-a-home-lab-2abp</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/posts/from-idea-to-infrastructure-standing-up-a-self-hosted-ai-dev-environment"&gt;Yesterday&lt;/a&gt; we went from a gaming PC on a shelf to a fully configured Coder server with GitHub integration, workspace templates, and AI agents. The dev environment is running. But the RTX 5090's 32 GB of VRAM has been sitting idle, and all the AI work is still going through cloud APIs.&lt;/p&gt;

&lt;p&gt;Today, we change that. This session was about installing Ollama, choosing the right models for different coding tasks, getting local inference running on the workstation, and then wiring it all into Coder Agents so local models show up right alongside Anthropic in the model selector. Everything here was done conversationally through &lt;a href="https://coder.com/agents" rel="noopener noreferrer"&gt;Coder Agents&lt;/a&gt;, same as always.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why VRAM Is the Only Spec That Matters
&lt;/h2&gt;

&lt;p&gt;Before pulling any models, it helps to understand the constraint you're optimizing around. For local LLMs, that constraint is VRAM. Not CPU cores, not system RAM, not disk speed. VRAM determines what models you can run, and model size determines how useful they are.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;What You Can Run&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;8-12 GB&lt;/td&gt;
&lt;td&gt;7B models (Qwen3:8b, DeepSeek-Coder 6.7B)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;14B-20B models (DeepSeek R1 14B, Codestral 25.12)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;24-32 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;27B-35B models, the sweet spot for agentic coding&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32 GB+ / unified&lt;/td&gt;
&lt;td&gt;70B quantized, Qwen3-Coder-Next&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 32 GB on the RTX 5090 lands squarely in the sweet spot. We can run 35B-parameter models at full quality, which is where the current generation of agentic coding models lives. The 64 GB of system RAM provides headroom for KV cache spillover when context windows get long, and the 2 TB NVMe means models load fast and we can store a whole library of them.&lt;/p&gt;

&lt;p&gt;Only one generation model loads into VRAM at a time. Ollama automatically unloads the previous model when you switch. The embedding model is small enough to coexist with any of them.&lt;/p&gt;
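
&lt;p&gt;Once Ollama is up (we install it below), you can watch the swap happen with &lt;code&gt;ollama ps&lt;/code&gt;. The columns are real; the row shown is illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL
qwen3.5:35b-a3b    ...             23 GB    100% GPU     29 minutes from now
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;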

&lt;h2&gt;
  
  
  The Setup Script
&lt;/h2&gt;

&lt;p&gt;Rather than running commands one at a time, we built a single bash script that handles the entire setup in six phases: hardware verification, Ollama install, service configuration, model pulls, verification, and a connection reference card.&lt;/p&gt;

&lt;p&gt;The script is tailored to this exact hardware profile but the structure works for any NVIDIA GPU setup. It supports &lt;code&gt;--models-only&lt;/code&gt; (already have Ollama, just pull models) and &lt;code&gt;--verify-only&lt;/code&gt; (check that everything is working) flags for re-runs.&lt;/p&gt;
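
&lt;p&gt;The flag handling is nothing exotic; a condensed sketch of how the real script branches:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;MODELS_ONLY=false
VERIFY_ONLY=false
for arg in "$@"; do
  case "$arg" in
    --models-only) MODELS_ONLY=true ;;   # skip install, jump straight to model pulls
    --verify-only) VERIFY_ONLY=true ;;   # skip everything except the verification phase
  esac
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;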



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="c"&gt;# Target: AMD Ryzen 9 9950X3D / RTX 5090 32GB / 64GB DDR5 / Ubuntu 24.04 LTS&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;PRIMARY_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.5:35b-a3b"&lt;/span&gt;
&lt;span class="nv"&gt;CODING_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"devstral-small:24b"&lt;/span&gt;
&lt;span class="nv"&gt;REASONING_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"deepseek-r1:14b"&lt;/span&gt;
&lt;span class="nv"&gt;AUTOCOMPLETE_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"codestral:22b"&lt;/span&gt;
&lt;span class="nv"&gt;EMBEDDING_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"nomic-embed-text"&lt;/span&gt;
&lt;span class="nv"&gt;KEEP_ALIVE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"30m"&lt;/span&gt;
&lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"127.0.0.1"&lt;/span&gt;
&lt;span class="nv"&gt;OLLAMA_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"11434"&lt;/span&gt;

&lt;span class="c"&gt;# Phase 1: Verify hardware &amp;amp; drivers&lt;/span&gt;
nvidia-smi &lt;span class="nt"&gt;--query-gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;driver_version,name,memory.total &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;csv,noheader

&lt;span class="c"&gt;# Phase 2: Install Ollama&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Phase 3: Configure service (keep models loaded 30min)&lt;/span&gt;
&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /etc/systemd/system/ollama.service.d
&lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/systemd/system/ollama.service.d/override.conf &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
[Service]
Environment="OLLAMA_KEEP_ALIVE=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;KEEP_ALIVE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
Environment="OLLAMA_HOST=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_PORT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl daemon-reload &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart ollama

&lt;span class="c"&gt;# Phase 4: Pull models&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;model &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PRIMARY_MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CODING_MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REASONING_MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
             &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$AUTOCOMPLETE_MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$EMBEDDING_MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;ollama pull &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Phase 5: Verify&lt;/span&gt;
ollama run &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PRIMARY_MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"What is 2+2? Reply with just the number."&lt;/span&gt;
curl &lt;span class="nt"&gt;-sf&lt;/span&gt; &lt;span class="s2"&gt;"http://&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_PORT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/v1/models"&lt;/span&gt;

&lt;span class="c"&gt;# Phase 6: Connection reference&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Ollama API:    http://&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_PORT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"OpenAI API:    http://&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_PORT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/v1"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Primary Model: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PRIMARY_MODEL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;The real script is ~330 lines with color output, error handling, idempotent checks, and flag parsing. This is the condensed version showing the actual work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened When We Ran It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Hardware Detection
&lt;/h3&gt;

&lt;p&gt;All green:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[OK]  NVIDIA driver: 590.48.01
[OK]  GPU: NVIDIA GeForce RTX 5090 (32607 MiB)
[OK]  Driver 590.48.01 meets minimum requirement (550+).
[OK]  System RAM: 60 GB
[OK]  Available disk: 1717 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;System RAM reports 60 GB instead of 64 GB. Normal; the kernel and firmware reserve some. Not a problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: Ollama Install
&lt;/h3&gt;

&lt;p&gt;One curl command. Ollama v0.21.0 installed cleanly, auto-detected the NVIDIA GPU, created a systemd service, and added the user to render/video groups.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Service Configuration
&lt;/h3&gt;

&lt;p&gt;The important piece here is &lt;code&gt;KEEP_ALIVE=30m&lt;/code&gt;. Without it, Ollama unloads models from VRAM after 5 minutes of inactivity. Loading a 23 GB model back into memory takes time, and if you're switching between coding and chatting every few minutes, you're hitting cold starts constantly. Thirty minutes keeps things warm during a real work session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4: Model Downloads
&lt;/h3&gt;

&lt;p&gt;~44 GB pulled. One failure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;qwen3.5:35b-a3b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;23 GB&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;td&gt;Primary agentic coder. MoE, only 3B params active per token.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;devstral-small:24b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;FAILED&lt;/td&gt;
&lt;td&gt;Registry name wrong.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deepseek-r1:14b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;9.0 GB&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;td&gt;Chain-of-thought reasoning.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codestral:22b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;td&gt;Fast autocomplete for IDE tab-completion.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nomic-embed-text&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;274 MB&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;td&gt;Embedding model for codebase search.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;devstral-small:24b&lt;/code&gt; doesn't exist on Ollama's registry. The correct pull is &lt;code&gt;ollama pull devstral&lt;/code&gt;. Registry names don't always match what blogs and guides reference. This is the kind of thing you only learn by running it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 5: Verification
&lt;/h3&gt;

&lt;p&gt;The automated inference test returned empty. Cold-start timing issue: the bash &lt;code&gt;$()&lt;/code&gt; capture returned before the model finished loading 23 GB into VRAM. Manual verification worked immediately after:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ollama run qwen3.5:35b-a3b &lt;span class="s2"&gt;"What is 2+2? Reply with just the number."&lt;/span&gt;
4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OpenAI-compatible API endpoint was confirmed working at &lt;code&gt;http://127.0.0.1:11434/v1/models&lt;/code&gt;.&lt;/p&gt;
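
&lt;p&gt;If we were hardening the script, the fix would be to load the model before capturing any output. Ollama's &lt;code&gt;/api/generate&lt;/code&gt; endpoint loads a model without generating when given no prompt, so a sketch of the workaround looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Warm the model first: a generate request with no prompt only loads it
curl -s http://127.0.0.1:11434/api/generate -d '{"model": "qwen3.5:35b-a3b"}' &amp;gt; /dev/null

# Now the captured run starts against a hot model
result=$(ollama run qwen3.5:35b-a3b "What is 2+2? Reply with just the number.")
echo "$result"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;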

&lt;h2&gt;
  
  
  Why These Models
&lt;/h2&gt;

&lt;p&gt;Every model in the stack was chosen for a specific job. This isn't a "download the biggest model that fits" strategy. Different tasks have different requirements, and the right model for autocomplete is not the right model for debugging a race condition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primary: &lt;code&gt;qwen3.5:35b-a3b&lt;/code&gt;&lt;/strong&gt; is the all-rounder. Best agentic coder available in April 2026 at this VRAM tier. Mixture-of-Experts architecture means only 3B parameters are active per token despite being a 35B model. That gives you big-model quality with small-model speed. 256K context window. Strong tool-calling support. Fits comfortably in 32 GB VRAM at ~22 GB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding: &lt;code&gt;devstral&lt;/code&gt;&lt;/strong&gt; (Mistral's agentic coding model) is trained specifically for multi-file edits, terminal automation, and code repair. Benchmarks highest on Ollama for pure coding tasks. When you need raw code generation without the overhead of reasoning chains, this is the one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reasoning: &lt;code&gt;deepseek-r1:14b&lt;/code&gt;&lt;/strong&gt; is the chain-of-thought model. It thinks before answering. Slower, but catches bugs other models miss. At 14B it only needs ~12 GB VRAM, so it loads fast and leaves headroom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autocomplete: &lt;code&gt;codestral:22b&lt;/code&gt;&lt;/strong&gt; is optimized for fast inline code completion (fill-in-the-middle). Best fit for IDE tab-complete via Continue.dev. You want this model to be fast above all else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embeddings: &lt;code&gt;nomic-embed-text&lt;/code&gt;&lt;/strong&gt; is a lightweight (274 MB) embedding model for codebase search and RAG pipelines. Small enough to run alongside any generation model without VRAM pressure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring It Into Dev Tools
&lt;/h2&gt;

&lt;p&gt;With Ollama running, everything that speaks the OpenAI API format can connect to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;Ollama&lt;/span&gt; &lt;span class="py"&gt;API&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;http://127.0.0.1:11434&lt;/span&gt;
&lt;span class="err"&gt;OpenAI&lt;/span&gt; &lt;span class="py"&gt;API&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;http://127.0.0.1:11434/v1&lt;/span&gt;
&lt;span class="err"&gt;API&lt;/span&gt; &lt;span class="py"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="s"&gt;ollama  (placeholder, not validated)&lt;/span&gt;
&lt;span class="err"&gt;Primary&lt;/span&gt; &lt;span class="py"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qwen3.5:35b-a3b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Continue.dev (VS Code)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Local Coder&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Qwen3.5 35B (Chat/Edit)&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qwen3.5:35b-a3b&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;chat&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;edit&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;apply&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Devstral (Coding)&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;devstral-small:24b&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;chat&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;edit&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Codestral (Autocomplete)&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;codestral:22b&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;autocomplete&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Nomic Embed&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;embed&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;code&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docs&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diff&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terminal&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;codebase&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;h3&gt;
  
  
  Environment Variables
&lt;/h3&gt;

&lt;p&gt;For scripts and agents that use the OpenAI client format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://127.0.0.1:11434
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://127.0.0.1:11434/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
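
&lt;p&gt;A quick smoke test with those variables set, using the standard chat-completions request shape:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -s "$OPENAI_API_BASE/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5:35b-a3b",
       "messages": [{"role": "user", "content": "Say hello in five words."}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;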



&lt;h2&gt;
  
  
  Connecting to Coder Agents
&lt;/h2&gt;

&lt;p&gt;This is the real payoff. Ollama is running, the models are loaded, and the OpenAI-compatible API is live on localhost. Now we wire it into &lt;a href="https://coder.com/agents" rel="noopener noreferrer"&gt;Coder Agents&lt;/a&gt; so local models appear as selectable options right alongside the cloud providers.&lt;/p&gt;

&lt;p&gt;Coder Agents runs the LLM loop in the control plane, not inside workspaces. That means the Coder server process makes the API calls directly. Since Ollama and the Coder server are running on the same machine, this is just pointing one localhost process at another. No tunnels, no port forwarding, no API keys leaving the box.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Add the Provider
&lt;/h3&gt;

&lt;p&gt;In the Coder dashboard, navigate to &lt;strong&gt;Agents &amp;gt; Admin &amp;gt; Providers&lt;/strong&gt; and select &lt;strong&gt;OpenAI Compatible&lt;/strong&gt;. Coder treats any endpoint that implements the OpenAI chat completions API as a first-class provider.&lt;/p&gt;

&lt;p&gt;Set the &lt;strong&gt;Base URL&lt;/strong&gt; to &lt;code&gt;http://127.0.0.1:11434/v1&lt;/code&gt; and enter &lt;code&gt;ollama&lt;/code&gt; as the API key. Ollama doesn't validate keys, but Coder requires one, so this is a placeholder.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/putting-the-gpu-to-work/openai-compatible-setup.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/putting-the-gpu-to-work/openai-compatible-setup.png" alt="OpenAI Compatible provider configuration in the Coder admin panel, with the base URL set to the local Ollama endpoint"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;Key policy&lt;/strong&gt;, keep the defaults: Central API key on, user API keys off. There's no reason for individual developers to bring their own key to a local Ollama instance. Everyone hits the same GPU.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Add Models
&lt;/h3&gt;

&lt;p&gt;Switch to the &lt;strong&gt;Models&lt;/strong&gt; tab and add each model you want available in the Agents chat. The &lt;strong&gt;Model Identifier&lt;/strong&gt; must match exactly what Ollama expects, because that string is sent directly to the &lt;code&gt;/v1/chat/completions&lt;/code&gt; endpoint.&lt;/p&gt;
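
&lt;p&gt;The safest source for that string is &lt;code&gt;ollama list&lt;/code&gt; on the host (real columns, illustrative rows):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ ollama list
NAME                ID              SIZE      MODIFIED
qwen3.5:35b-a3b     ...             23 GB     2 hours ago
devstral:latest     ...             14 GB     2 hours ago
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;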

&lt;p&gt;&lt;a href="/images/putting-the-gpu-to-work/qwen-model-setup.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/putting-the-gpu-to-work/qwen-model-setup.png" alt="Adding the Qwen 3.5B model with its identifier and context limit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We added two models to start:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Identifier&lt;/th&gt;
&lt;th&gt;Display Name&lt;/th&gt;
&lt;th&gt;Context Limit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;qwen3:35b-a3b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Qwen 3.5B&lt;/td&gt;
&lt;td&gt;32,768&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;devstral&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Devstral&lt;/td&gt;
&lt;td&gt;131,072&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;Cost Tracking&lt;/strong&gt;, &lt;strong&gt;Provider Configuration&lt;/strong&gt;, and &lt;strong&gt;Advanced&lt;/strong&gt; sections can all be skipped for local models. No token pricing to track (it's your own GPU), and the default generation parameters work fine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Use It
&lt;/h3&gt;

&lt;p&gt;That's it. The models now appear in the Agents model selector dropdown alongside the existing Anthropic models. Pick one, start a conversation, and the entire inference loop runs on the local GPU.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/putting-the-gpu-to-work/devstral-example.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/putting-the-gpu-to-work/devstral-example.png" alt="Devstral running a task in Coder Agents, fully local inference"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What Surprised Us
&lt;/h3&gt;

&lt;p&gt;They work. Not "sort of work" or "work for simple prompts." The local models handle real agentic tasks through Coder Agents: reading files, running shell commands, editing code across multiple files, and reasoning about the results. Devstral in particular was impressive for code-focused work.&lt;/p&gt;

&lt;p&gt;The latency difference compared to cloud providers is noticeable but not a dealbreaker. First-token time is slower because the model is running on a single consumer GPU rather than a cluster, but once inference is rolling, the throughput is solid. For the kind of iterative coding tasks Coder Agents handles, the tradeoff is worth it: zero API costs, zero data leaving your network, and no rate limits.&lt;/p&gt;

&lt;p&gt;The practical recommendation: keep your cloud provider (Anthropic, OpenAI, whatever you're already using) as the default for complex, multi-step tasks. Use the local models for focused coding work, experimentation, and anything where you want to iterate fast without watching a billing dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ollama vs vLLM: When to Scale Up
&lt;/h2&gt;

&lt;p&gt;We chose Ollama because this is a single-developer workstation. Ollama wins on simplicity, resource efficiency, and single-user performance. One curl to install, one command to pull models, and it just works.&lt;/p&gt;

&lt;p&gt;The tradeoff: if you later need to serve multiple concurrent Coder workspaces (5+ users hitting the same GPU), vLLM delivers roughly 16x more throughput under concurrent load. That's a future upgrade path, not a day-one requirement.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="nt"&gt;--ipc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host &lt;span class="nt"&gt;--network&lt;/span&gt; host &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ~/.cache/huggingface:/root/.cache/huggingface &lt;span class="se"&gt;\&lt;/span&gt;
  nvcr.io/nvidia/vllm:26.01-py3 &lt;span class="se"&gt;\&lt;/span&gt;
  vllm serve &lt;span class="s2"&gt;"Qwen/Qwen3-Coder-Next-FP8"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--served-model-name&lt;/span&gt; qwen3-coder-next &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 170000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.90 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-prefix-caching&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kv-cache-dtype&lt;/span&gt; fp8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;h2&gt;
  
  
  Gotchas
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Registry names lie.&lt;/strong&gt; &lt;code&gt;devstral-small:24b&lt;/code&gt; is what guides reference. &lt;code&gt;devstral&lt;/code&gt; is what Ollama's registry actually has. Always check the model library on the Ollama website before assuming a model name.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cold starts kill scripted tests.&lt;/strong&gt; Loading 23 GB into VRAM takes real time. If you're capturing output in a bash script with &lt;code&gt;$()&lt;/code&gt;, the command can return before the model finishes loading. Manual &lt;code&gt;ollama run&lt;/code&gt; works fine because it waits interactively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;KEEP_ALIVE&lt;/code&gt; is essential.&lt;/strong&gt; The default 5-minute unload timer means constant cold starts during normal coding. Set it to &lt;code&gt;30m&lt;/code&gt; or &lt;code&gt;-1&lt;/code&gt; (indefinite) via the systemd override. This is the single biggest quality-of-life improvement.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;60 GB != 64 GB is normal.&lt;/strong&gt; The kernel and firmware reserve memory. Your 64 GB kit will report ~60 GB usable. This is expected, not a hardware problem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coder requires an API key even when the provider doesn't.&lt;/strong&gt; Ollama doesn't authenticate requests, but Coder's provider config won't save without a key. Use any placeholder string. &lt;code&gt;ollama&lt;/code&gt; works.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model identifiers must be exact.&lt;/strong&gt; The string you enter in Coder's admin panel is sent verbatim to the &lt;code&gt;/v1/chat/completions&lt;/code&gt; endpoint. If you type &lt;code&gt;qwen3.5:35b-a3b&lt;/code&gt; but Ollama expects &lt;code&gt;qwen3:35b-a3b&lt;/code&gt;, you'll get a model-not-found error. Run &lt;code&gt;ollama list&lt;/code&gt; and copy the name exactly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The models are running locally and wired into Coder Agents. We have a fully self-hosted AI coding environment: Coder server, Ollama, and local inference on the same box, with cloud providers as a fallback.&lt;/p&gt;

&lt;p&gt;The next step is benchmarking. How many tokens per second does &lt;code&gt;qwen3.5:35b-a3b&lt;/code&gt; actually push on this hardware? Is the 256K context window usable in practice, or does performance degrade at long contexts? Does &lt;code&gt;codestral:22b&lt;/code&gt; autocomplete feel instant in the IDE, or is there noticeable lag? And the real question: for which tasks do local models match cloud providers, and where do they fall short?&lt;/p&gt;

&lt;p&gt;Numbers coming soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1 Ollama install (v0.21.0, single curl command)&lt;/li&gt;
&lt;li&gt;5 models pulled (4 generation + 1 embedding)&lt;/li&gt;
&lt;li&gt;~44 GB total model storage&lt;/li&gt;
&lt;li&gt;32,607 MiB VRAM available&lt;/li&gt;
&lt;li&gt;2 models configured in Coder Agents (Qwen 3.5B + Devstral)&lt;/li&gt;
&lt;li&gt;1 model name that was wrong (&lt;code&gt;devstral-small:24b&lt;/code&gt; -&amp;gt; &lt;code&gt;devstral&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;1 cold-start timing bug in the verification script&lt;/li&gt;
&lt;li&gt;15 minutes from script start to working local inference&lt;/li&gt;
&lt;li&gt;0 cloud API calls required&lt;/li&gt;
&lt;li&gt;0 data leaving the network&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>homelab</category>
      <category>llm</category>
    </item>
    <item>
      <title>The Agentic Gap: Claude Oneshots, Gemma Fails</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:36:32 +0000</pubDate>
      <link>https://dev.to/carryologist/the-agentic-gap-claude-oneshots-gemma-fails-14g2</link>
      <guid>https://dev.to/carryologist/the-agentic-gap-claude-oneshots-gemma-fails-14g2</guid>
      <description>&lt;p&gt;Two days ago, Gemma 4 topped our &lt;a href="https://dev.to/posts/model-showdown-round-2-gemma-kimi-and-579gb-of-stubborn-optimism"&gt;local model benchmark&lt;/a&gt; — 167 tokens per second, perfect code quality score, smallest download. Faster than Sonnet. Faster than Opus. The blog post said "Gemma 4 is the new default."&lt;/p&gt;

&lt;p&gt;Today we tested whether that's actually true.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;Instead of another toy benchmark, we pulled a real item off the vibescoder.dev backlog: &lt;strong&gt;public-facing search across all blog posts&lt;/strong&gt;. Multi-file feature, architectural decisions required, design system integration, no specification beyond "make search work."&lt;/p&gt;

&lt;p&gt;Two models. Same prompt. Same codebase. Same workspace template. One shot — no follow-up instructions, no hand-holding. Walk away and see what happens.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Gemma 4 27B&lt;/th&gt;
&lt;th&gt;Opus 4.6&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Provider&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ollama (local, RTX 5090)&lt;/td&gt;
&lt;td&gt;Anthropic API (cloud)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Benchmark speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;167.1 tok/s&lt;/td&gt;
&lt;td&gt;74.3 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Benchmark score&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100/100&lt;/td&gt;
&lt;td&gt;100/100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Per-token pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The prompt was deliberately vague on implementation details:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Add public-facing search to vibescoder.dev. Users should be able to search across all published blog posts by title, content, tags, and description. The search should feel fast and match the site's existing Neon Brutalist design system. Consider: how users discover search, how results display, empty/no-result states, search state management (URL, keyboard shortcuts). Must be accessible from any page, work on mobile, and not introduce new design libraries. Commit and push when complete. Do not ask clarifying questions — make your own decisions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Setting Up the Arena
&lt;/h2&gt;

&lt;p&gt;Each model got its own Coder workspace with identical starting conditions: same Docker template, same base commit on &lt;code&gt;main&lt;/code&gt;, same content repo.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/gemma-vs-opus-search-showdown/create-workspace-gemma4.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/gemma-vs-opus-search-showdown/create-workspace-gemma4.png" alt="Creating the search-gemma4 workspace in Coder"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Both workspaces built from the same Docker template — only the model selection differed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We created two feature branches from the same commit (&lt;code&gt;12fd589&lt;/code&gt;) and verified Vercel was configured to auto-build preview deployments for any branch push.&lt;/p&gt;
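
&lt;p&gt;The branch setup was plain git, run once per workspace (branch names illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# In each workspace: branch from the same pinned commit, then push
git checkout -b search-gemma4 12fd589
git push -u origin search-gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;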

&lt;p&gt;&lt;a href="/images/gemma-vs-opus-search-showdown/vercel-git-preview-settings.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/gemma-vs-opus-search-showdown/vercel-git-preview-settings.png" alt="Vercel Git settings showing auto-deploy enabled"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Vercel preview deployments would give us side-by-side URLs to compare the finished features.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Both prompts were delivered at the same time. Then we stepped back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Opus 4.6: The Quiet Professional
&lt;/h2&gt;

&lt;p&gt;Opus received the prompt and went silent. No questions. No plan narrated back. Just the spinning indicator showing it was working.&lt;/p&gt;

&lt;p&gt;Over the next eight minutes, Opus:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cloned both repos and installed dependencies&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;tsconfig.json&lt;/code&gt;, the app layout, existing components, &lt;code&gt;lib/posts.ts&lt;/code&gt;, &lt;code&gt;lib/types.ts&lt;/code&gt;, the design system in &lt;code&gt;globals.css&lt;/code&gt;, and the middleware&lt;/li&gt;
&lt;li&gt;Made architectural decisions: Cmd+K dialog with live API results for quick navigation, plus a full &lt;code&gt;/search&lt;/code&gt; page for detailed browsing&lt;/li&gt;
&lt;li&gt;Built a weighted scoring search API (&lt;code&gt;/api/search&lt;/code&gt;) that ranks title matches above tag matches above content matches&lt;/li&gt;
&lt;li&gt;Created a 407-line &lt;code&gt;SearchDialog&lt;/code&gt; component with keyboard navigation, body scroll lock, abort controllers for in-flight requests, and ARIA accessibility&lt;/li&gt;
&lt;li&gt;Built a server-rendered search results page with debounced URL state&lt;/li&gt;
&lt;li&gt;Modified &lt;code&gt;Header.tsx&lt;/code&gt; — three lines: import, component placement, mobile nav link&lt;/li&gt;
&lt;li&gt;Updated middleware to whitelist the search API route&lt;/li&gt;
&lt;li&gt;Committed everything in one clean commit and pushed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One prompt. One commit. &lt;strong&gt;698 lines across 6 files.&lt;/strong&gt; Pushed to GitHub, Vercel preview building.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt; src/app/api/search/route.ts     | 104 ++++++++++
 src/app/search/SearchInput.tsx  |  78 ++++++++
 src/app/search/page.tsx         |  97 ++++++++++
 src/components/Header.tsx       |  10 +
 src/components/SearchDialog.tsx | 407 +++++++++++++++++++++++++++++++++
 src/middleware.ts               |   3 +-
 6 files changed, 698 insertions(+), 1 deletion(-)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What Opus Built
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A Cmd+K search dialog.&lt;/strong&gt; Press &lt;code&gt;Cmd+K&lt;/code&gt; (or &lt;code&gt;/&lt;/code&gt;) from any page and a full-screen overlay appears with a search input. Results appear live as you type, debounced at 200ms, with score-based ranking. Arrow keys navigate results, Enter selects, Escape closes. The dialog shows up to 8 results with title, date, tags, reading time, and a context snippet showing where the match was found.&lt;/p&gt;
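
&lt;p&gt;That combination — debounce plus abort — is the pattern that keeps live search from racing itself. Here's a minimal sketch of it as a React hook (not the actual &lt;code&gt;SearchDialog&lt;/code&gt; code; the endpoint and timing mirror the description above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { useEffect, useState } from "react";

interface SearchResult {
  slug: string;
  title: string;
  snippet: string;
}

// Debounce the query, and abort any in-flight request when a newer one starts.
export function useLiveSearch(query: string) {
  const [results, setResults] = useState&amp;lt;SearchResult[]&amp;gt;([]);

  useEffect(() =&amp;gt; {
    if (!query) {
      setResults([]);
      return;
    }
    const controller = new AbortController();
    const timer = setTimeout(async () =&amp;gt; {
      try {
        const res = await fetch(`/api/search?q=${encodeURIComponent(query)}`, {
          signal: controller.signal,
        });
        setResults(await res.json());
      } catch (err) {
        // AbortError is expected when the user keeps typing
        if ((err as Error).name !== "AbortError") console.error(err);
      }
    }, 200); // 200ms debounce, per the behavior described above

    return () =&amp;gt; {
      clearTimeout(timer);
      controller.abort();
    };
  }, [query]);

  return results;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
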

&lt;p&gt;&lt;strong&gt;A full search page at &lt;code&gt;/search&lt;/code&gt;.&lt;/strong&gt; Accessible from the mobile hamburger menu and via a "View all results" link in the dialog. Server-rendered with URL state (&lt;code&gt;/search?q=cloudflare&lt;/code&gt;). Shows the full &lt;code&gt;PostCard&lt;/code&gt; component for each result — consistent with the blog's existing post listing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A scored API route.&lt;/strong&gt; Title matches score 100 points (150 for exact match). Tag matches score 50. Description matches score 25. Content matches score 10 plus 3 per occurrence. Results are sorted by score descending, capped at 20. The API strips markdown from content before extracting snippets.&lt;/p&gt;
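
&lt;p&gt;Scoring like that is easy to sketch. Here's a hedged approximation of the weighting described above — the field names (&lt;code&gt;title&lt;/code&gt;, &lt;code&gt;tags&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;content&lt;/code&gt;) are assumptions based on the post metadata this blog exposes, not the route's actual code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface Post {
  slug: string;
  title: string;
  tags: string[];
  description: string;
  content: string;
}

// Weighted scoring per the description: title 100 (150 exact),
// tags 50, description 25, content 10 plus 3 per occurrence.
function scorePost(post: Post, q: string): number {
  const query = q.toLowerCase();
  if (!query) return 0;
  let score = 0;

  const title = post.title.toLowerCase();
  if (title === query) score += 150;
  else if (title.includes(query)) score += 100;

  if (post.tags.some((t) =&amp;gt; t.toLowerCase().includes(query))) score += 50;
  if (post.description.toLowerCase().includes(query)) score += 25;

  const occurrences = post.content.toLowerCase().split(query).length - 1;
  if (occurrences &amp;gt; 0) score += 10 + 3 * occurrences;

  return score;
}

export function search(posts: Post[], q: string) {
  return posts
    .map((post) =&amp;gt; ({ post, score: scorePost(post, q) }))
    .filter((r) =&amp;gt; r.score &amp;gt; 0)
    .sort((a, b) =&amp;gt; b.score - a.score)
    .slice(0, 20); // capped at 20, as in the API route
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
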

&lt;p&gt;The Header diff tells the story of good integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gi"&gt;+import { SearchDialog } from "@/components/SearchDialog";
&lt;/span&gt; ...
&lt;span class="gi"&gt;+          &amp;lt;SearchDialog /&amp;gt;
&lt;/span&gt;           &amp;lt;ThemeToggle /&amp;gt;
 ...
&lt;span class="gi"&gt;+          &amp;lt;Link href="/search" ...&amp;gt;Search&amp;lt;/Link&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three lines to wire a 407-line feature into the existing layout. That's how you integrate with a codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 4: The Brilliant Planner Who Never Coded
&lt;/h2&gt;

&lt;p&gt;Gemma's run was a different experience entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt 1&lt;/strong&gt; — the original task. Gemma cloned the repos, checked out the branch... and stopped. Checkmark. Done. It treated the setup commands as the entire task.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/gemma-vs-opus-search-showdown/gemma-stopped-after-setup.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/gemma-vs-opus-search-showdown/gemma-stopped-after-setup.png" alt="Gemma stopped after completing only the git clone setup steps"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Gemma completed the setup and declared victory. The search feature hadn't been mentioned yet.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt 2&lt;/strong&gt; — "You only completed the setup steps. Now read the codebase and implement the search feature." Gemma cloned the content repo, listed the directory, read &lt;code&gt;package.json&lt;/code&gt;, wrote a thoughtful analysis plan (Architecture, Data Flow, Design System)... and stopped again. "I will begin by reading &lt;code&gt;package.json&lt;/code&gt;."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt 3&lt;/strong&gt; — "Keep going. Execute your plan — read the files, build the search feature, commit and push." Gemma read more files, outlined a detailed preliminary plan with four numbered steps... and stopped. "I will start by reading the core scripts directory."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt 4&lt;/strong&gt; — "Do not stop between steps. Read all the files you need, write all the code, and push to git — all in one go. Do not pause or ask for confirmation."&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/gemma-vs-opus-search-showdown/gemma-planning-not-executing.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/gemma-vs-opus-search-showdown/gemma-planning-not-executing.png" alt="Gemma planning extensively but not executing code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The plan was actually good — client-side JSON index, search in header, &lt;code&gt;/search?q=query&lt;/code&gt; routing. It just wouldn't execute it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Gemma responded with its most detailed plan yet. A JSON search index builder, modifications to &lt;code&gt;fetch-content.sh&lt;/code&gt;, a &lt;code&gt;SearchInput&lt;/code&gt; component, Header integration, a search results page. Smart architectural choices. Then: "I will perform all these changes now." And stopped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt 5&lt;/strong&gt; — "Stop planning. Start coding. Write the files now."&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/gemma-vs-opus-search-showdown/gemma-code-in-chat-not-files.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/gemma-vs-opus-search-showdown/gemma-code-in-chat-not-files.png" alt="Gemma showing code in chat instead of writing to files"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;After being told to code, Gemma showed code in the chat window instead of writing it to disk.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This time Gemma actually wrote some code — &lt;code&gt;build-search-index.js&lt;/code&gt;, an edit to &lt;code&gt;fetch-content.sh&lt;/code&gt;, and &lt;code&gt;SearchInput.tsx&lt;/code&gt;. Three files to disk. Progress. Then it listed the three remaining tasks (Header, search page, commit) and stopped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts 6, 7, 8&lt;/strong&gt; — "Go." / "Go." / Explicit task list with three items. Gemma showed "Thinking..." briefly, then nothing. No output. No tool calls. The workspace eventually showed "unhealthy."&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/gemma-vs-opus-search-showdown/gemma-final-stall.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/gemma-vs-opus-search-showdown/gemma-final-stall.png" alt="Gemma's final stall — "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Eight prompts. Three partial files. Zero commits.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The AGENTS.md Experiment
&lt;/h3&gt;

&lt;p&gt;Before giving up, we tried one more thing. We added explicit agentic behavioral instructions to &lt;code&gt;AGENTS.md&lt;/code&gt; in the repo — the file that Coder agents read for project-level guidance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Agentic Execution Rules&lt;/span&gt;

You are an autonomous coding agent. Execute tasks end-to-end
in a single turn. Never stop to describe what you will do
next — just do it.

&lt;span class="gu"&gt;## What You Must Never Do&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Output a multi-step plan and then stop.
&lt;span class="p"&gt;-&lt;/span&gt; Describe code you intend to write instead of writing it.
&lt;span class="p"&gt;-&lt;/span&gt; Leave uncommitted changes in the workspace.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Started a fresh Gemma session with the same prompt. Same result. Clone, read &lt;code&gt;package.json&lt;/code&gt;, plan, stop.&lt;/p&gt;

&lt;p&gt;The instructions were clear. Gemma read them. And then it planned what it was going to do next and stopped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scoreboard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Opus 4.6&lt;/th&gt;
&lt;th&gt;Gemma 4 27B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prompts needed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8 (incomplete)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Files changed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;3 (never committed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines written&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;698&lt;/td&gt;
&lt;td&gt;~150 (partial, uncommitted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Commits pushed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feature complete&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to completion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~8 minutes&lt;/td&gt;
&lt;td&gt;Never&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Errors self-corrected&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (middleware, routing)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Design system match&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (Neon Brutalist tokens)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Keyboard shortcuts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cmd+K, /, Escape, arrows&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mobile support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (hamburger menu link)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accessibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full ARIA&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Technical Review: Opus Implementation
&lt;/h3&gt;

&lt;p&gt;The code isn't perfect. A few things to fix before merging:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Duplicate search logic.&lt;/strong&gt; The API route uses weighted scoring. The search page uses flat boolean filtering. Same query, different result order depending on which surface you use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unsafe type cast.&lt;/strong&gt; &lt;code&gt;post as Post&lt;/code&gt; in the search page strips content then casts back to &lt;code&gt;Post&lt;/code&gt;, which expects a content field. Works at runtime but lies to TypeScript.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Missing Suspense boundary.&lt;/strong&gt; &lt;code&gt;useSearchParams()&lt;/code&gt; in &lt;code&gt;SearchInput&lt;/code&gt; needs a &lt;code&gt;Suspense&lt;/code&gt; wrapper for Next.js 14+.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But these are code review items — the kind of thing you'd catch in a PR review and fix in 10 minutes. The feature works, the architecture is sound, the UX is polished.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Score: 87.5/100&lt;/strong&gt; across correctness (88), architecture (82), code quality (90), performance (85), completeness (92), and integration (91).&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Benchmarks test generation, not agency.&lt;/strong&gt; Gemma 4 writes excellent code when you tell it exactly what to write. That's what our todo-app benchmark measured — single-turn code generation from a clear spec. Agentic coding is a different skill entirely: reading a codebase, making decisions, chaining dozens of tool calls, self-correcting, and maintaining a plan across many steps. Gemma can't do that yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The plan-and-stop pattern is a model behavior, not a configuration problem.&lt;/strong&gt; We tried explicit instructions ("do not stop"), behavioral directives in AGENTS.md, and increasingly urgent nudges. Gemma consistently planned what it would do, narrated the plan in detail, and then yielded control back to the user. It's not a token limit or context issue — it's how the model was trained to interact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed doesn't matter if you can't finish.&lt;/strong&gt; Gemma generates at 167 tok/s. Opus generates at 74 tok/s. But Opus delivered a complete, working, tested feature in 8 minutes with zero human intervention. Gemma delivered nothing usable in 20+ minutes with eight human prompts. The fastest model in our benchmark is the slowest in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The daily driver earned its spot.&lt;/strong&gt; Opus 4.6 has been behind every line of code on vibescoder.dev since day one. This experiment didn't just confirm that choice — it quantified why. On a real task, the gap between "writes great code" and "builds great features" is the difference between a benchmark score and a shipping product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local models aren't there yet for agentic coding.&lt;/strong&gt; This isn't a permanent verdict. Gemma 4 was released weeks ago. Agentic capabilities are the frontier every model vendor is racing toward. But today, if you need an AI agent that can autonomously build features, cloud models with tool-calling training (Claude, GPT) are still the only game in town.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We have a working search feature on a Vercel preview branch, courtesy of Opus. Next step is reviewing the code, fixing the three issues identified, and merging it to production. vibescoder.dev gets search.&lt;/p&gt;

&lt;p&gt;But we're not done with Gemma. The more we dug into the results, the more we think this shootout wasn't a fair fight. Our &lt;a href="https://dev.to/posts/model-showdown-round-2-gemma-kimi-and-579gb-of-stubborn-optimism"&gt;Gemma 4 deep dive&lt;/a&gt; ran Gemma through Ollama with default settings — and we've since discovered that Gemma's reasoning tokens are &lt;em&gt;invisible&lt;/em&gt; but still eat your generation budget. With &lt;code&gt;num_predict: 16384&lt;/code&gt;, the model may have blown its entire token budget on chain-of-thought we never saw, leaving nothing for actual code output. That would explain the plan-and-stop pattern perfectly: Gemma wasn't refusing to code — it was running out of room.&lt;/p&gt;

&lt;p&gt;So we're rerunning the shootout. This time we're loading both models through llama.cpp directly, giving us fine-grained control over thinking budgets and VRAM allocation. We'll crank &lt;code&gt;num_predict&lt;/code&gt; and &lt;code&gt;num_ctx&lt;/code&gt; to 32K+, experiment with &lt;code&gt;--reasoning-budget&lt;/code&gt; to cap invisible thinking tokens, and give Gemma the full 32 GB of RTX 5090 VRAM to work with. No more starving the local model on default settings and then calling it a fair comparison.&lt;/p&gt;
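
&lt;p&gt;For reference, here's roughly what "not starving the model" looks like through Ollama's REST API — a sketch only (the model tag is hypothetical, and the rerun itself will use llama.cpp server flags instead):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Raise the limits so invisible reasoning tokens can't exhaust the
// generation budget before any code gets written.
async function generate(prompt: string) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma4:27b", // hypothetical tag, for illustration
      prompt,
      stream: false,
      options: {
        num_ctx: 32768,     // context window
        num_predict: 32768, // generation budget: reasoning + actual output
      },
    }),
  });
  const data = await res.json();
  return data.response as string;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
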

&lt;p&gt;If Gemma was choking on its own reasoning, the fix might be as simple as giving it room to breathe. If it still can't finish — even with aggressive resources and tuned inference settings — then the agentic gap is real and it's in the model weights, not the configuration. Either way, we'll have a definitive answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2&lt;/strong&gt; models tested head-to-head on a real feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; prompt — identical for both; Opus needed no follow-ups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8&lt;/strong&gt; prompts needed for Gemma before it stalled permanently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;698&lt;/strong&gt; lines of working code from Opus&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; lines committed by Gemma&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6&lt;/strong&gt; files changed by Opus (API route, search dialog, search page, input component, header, middleware)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; files partially written by Gemma (never committed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 minutes&lt;/strong&gt; from prompt to pushed commit (Opus)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20+ minutes&lt;/strong&gt; of attempted nudging before calling it (Gemma)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;87.5/100&lt;/strong&gt; technical review score for Opus implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;407&lt;/strong&gt; lines in SearchDialog.tsx alone — keyboard nav, ARIA, scroll lock, abort controllers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; code review items to fix before merging (duplicated logic, type cast, Suspense)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$0&lt;/strong&gt; spent on Gemma inference (also $0 of value delivered)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; AGENTS.md rewrite attempted to fix Gemma's behavior (didn't work)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; clear winner&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>benchmark</category>
      <category>homelab</category>
    </item>
    <item>
      <title>Thursday Thought: Chat is the New Source Code</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:35:59 +0000</pubDate>
      <link>https://dev.to/carryologist/thursday-thought-chat-is-the-new-source-code-4eie</link>
      <guid>https://dev.to/carryologist/thursday-thought-chat-is-the-new-source-code-4eie</guid>
      <description>&lt;p&gt;I just walked out of a customer meeting that completely shifted my perspective on the future of software development. What they told me sounds almost revolutionary, but it makes perfect sense when you think about it: &lt;strong&gt;chat is becoming the new source code&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradigm Shift: From Code to Conversation
&lt;/h2&gt;

&lt;p&gt;Here's what blew my mind. This customer explained that in their AI-agent-powered workflow, generating code has become the easy part. What's actually difficult—and incredibly valuable—is recreating the &lt;strong&gt;context&lt;/strong&gt;, the &lt;strong&gt;intent&lt;/strong&gt;, and the &lt;strong&gt;reasoning&lt;/strong&gt; that led to that code.&lt;/p&gt;

&lt;p&gt;Think about it: when you're working with an AI agent, the magic isn't just in the final output. It's in the entire conversation—the back-and-forth refinements, the clarifications, the "actually, let me change that" moments that shape the final solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storing Chat History in GitHub: A Game Changer
&lt;/h2&gt;

&lt;p&gt;This customer has started doing something fascinating: &lt;strong&gt;they store their chat histories directly in GitHub&lt;/strong&gt;. Not just the code that results from those chats, but the entire conversational thread that led to it.&lt;/p&gt;

&lt;p&gt;Why? Because they've discovered something profound:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They can &lt;strong&gt;fork chat conversations&lt;/strong&gt; just like code branches&lt;/li&gt;
&lt;li&gt;They can &lt;strong&gt;roll back to previous chat states&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;Most importantly, they can &lt;strong&gt;recreate any piece of code trivially&lt;/strong&gt; from the chat history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's like having a perfect record of not just &lt;em&gt;what&lt;/em&gt; was built, but &lt;em&gt;why&lt;/em&gt; it was built and &lt;em&gt;how&lt;/em&gt; the thinking evolved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intent Over Implementation
&lt;/h2&gt;

&lt;p&gt;This represents a fundamental shift in how we think about software development. We're moving from an &lt;strong&gt;implementation-first&lt;/strong&gt; world to an &lt;strong&gt;intent-first&lt;/strong&gt; world.&lt;/p&gt;

&lt;p&gt;In traditional development:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Idea → Code → Version Control → Collaboration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the new agent-assisted world:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Intent → Conversation → Code Generation → Chat History Storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code becomes ephemeral—easily regenerated. The conversation becomes permanent—the true source of truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future of Version Control
&lt;/h2&gt;

&lt;p&gt;I predict we're going to see GitHub, GitLab, and other version control platforms rapidly evolve into something entirely different: &lt;strong&gt;extensible memory layers for agentic coding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of primarily tracking file changes, these platforms will become sophisticated conversation managers that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Branch conversations&lt;/strong&gt; at any point in the dialogue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merge different conversational threads&lt;/strong&gt; when collaborating&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff chat histories&lt;/strong&gt; to see how approaches diverged&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replay conversations&lt;/strong&gt; with different agents or parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What This Means for Developers
&lt;/h2&gt;

&lt;p&gt;This shift has huge implications for how we work:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Documentation Becomes Native&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The chat history &lt;em&gt;is&lt;/em&gt; the documentation. No more outdated comments or README files—the reasoning is preserved in the conversation that created the code.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Collaboration Changes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of reviewing pull requests, we might be reviewing conversation threads. "I see you took this approach in your chat with the agent, but what if we tried this angle instead?"&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Debugging Gets Easier&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When something breaks, you don't just look at the code—you look at the conversation that created it. The context and assumptions are right there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Big Picture
&lt;/h2&gt;

&lt;p&gt;We're witnessing the emergence of &lt;strong&gt;conversational version control&lt;/strong&gt;. Just as Git revolutionized how we think about code collaboration, chat-based development is about to revolutionize how we think about preserving and sharing &lt;em&gt;intent&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The source code was never really the valuable part—it was always the human thinking behind it. AI agents are just making that distinction crystal clear.&lt;/p&gt;

&lt;p&gt;What do you think? Are you ready for a world where your Git repos contain more conversations than code? Let me know in the comments—this feels like one of those moments where the industry is about to take a sharp turn, and I'm curious to hear how others are experiencing this shift.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you experimented with storing chat histories as part of your development workflow? I'd love to hear about your experiences and approaches.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>versioncontrol</category>
      <category>conversationalcoding</category>
      <category>github</category>
    </item>
    <item>
      <title>Your AI Strategy Has a Blind Spot: An SEO and AEO Audit of vibescoder.dev</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:35:27 +0000</pubDate>
      <link>https://dev.to/carryologist/your-ai-strategy-has-a-blind-spot-an-seo-and-aeo-audit-of-vibescoderdev-379g</link>
      <guid>https://dev.to/carryologist/your-ai-strategy-has-a-blind-spot-an-seo-and-aeo-audit-of-vibescoderdev-379g</guid>
      <description>&lt;p&gt;I spend a lot of time thinking about how AI agents discover and consume content. I run a company that builds developer tools. I write a blog about building with AI agents. And most importantly, I'm married to a woman that runs an &lt;a href="https://genedge.co" rel="noopener noreferrer"&gt;AI consulting practice&lt;/a&gt;. Through the home-office wall I've heard her warn many a client that they have a silent suppressor in their content strategy if they're a Cloudflare customer. She recommends a site audit. &lt;/p&gt;

&lt;p&gt;And she was right. Until this morning, &lt;strong&gt;every major AI crawler was blocked from reading my site&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not by choice. Not by misconfiguration. By a Cloudflare setting I'd already turned off — that got silently re-enabled by a different setting I didn't know existed.&lt;/p&gt;

&lt;p&gt;If you're a content creator, marketer, or engineer who cares about whether ChatGPT, Perplexity, Google AI Overviews, or Claude can find your work — read this. The infrastructure between your content and your audience may be working against you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The TL;DR for Non-Technical Readers
&lt;/h2&gt;

&lt;p&gt;If you don't want to read the whole audit, here's what matters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cloudflare's free tier blocks AI search engines by default.&lt;/strong&gt; If your site uses Cloudflare (and millions do), your content may be invisible to ChatGPT, Perplexity, Claude, and Google's AI features — even if you never asked for that.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;There are now two categories of discoverability.&lt;/strong&gt; Traditional SEO (Google search results) and AEO — Answer Engine Optimization (AI-powered search and assistants). You need both. They require different things.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The fix for Cloudflare takes 60 seconds&lt;/strong&gt; — but you have to know it exists. Go to Security → Settings → "Manage your robots.txt" and switch from "Instruct AI bots to not scrape content" to either "Content Signals Policy" or "Disable robots.txt configuration."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;There's a new file called &lt;code&gt;llms.txt&lt;/code&gt;&lt;/strong&gt; that's becoming the robots.txt for AI. It tells AI agents what your site is, what it covers, and where to find content. If you don't have one, you're leaving discoverability on the table.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The TL;DR for Technical Readers
&lt;/h2&gt;

&lt;p&gt;We ran a full SEO + AEO audit against vibescoder.dev and found 20 issues across 4 severity levels. The highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4 P0 (critical):&lt;/strong&gt; Cloudflare's managed robots.txt was blocking GPTBot, ClaudeBot, Google-Extended, and 5 others. RSS feed had wrong URL prefix (15 broken links). Sitemap.xml was referenced but returned 404. Duplicate &lt;code&gt;User-agent: *&lt;/code&gt; blocks in robots.txt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6 P1 (high):&lt;/strong&gt; No JSON-LD structured data. No llms.txt. No canonical URLs. No heading anchor IDs. Missing article:author/tag meta. Homepage force-dynamic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Everything was fixed in a single session&lt;/strong&gt; — 17 files changed, 428 insertions, pushed and deployed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The commit: &lt;a href="https://github.com/carryologist/the-vibe-coder/commit/15b3483" rel="noopener noreferrer"&gt;SEO/AEO overhaul&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Audit
&lt;/h2&gt;

&lt;p&gt;I asked my Coder agent to evaluate vibescoder.dev on two dimensions: traditional search engine optimization (SEO) and &lt;a href="https://ahrefs.com/blog/answer-engine-optimization/" rel="noopener noreferrer"&gt;Answer Engine Optimization&lt;/a&gt; (AEO) — making the site discoverable and citable by AI agents like ChatGPT Search, Perplexity, Google AI Overviews, and Claude.&lt;/p&gt;

&lt;p&gt;The agent cloned the engine repo, crawled the live site, inspected every response header, parsed every meta tag, and cross-referenced the codebase against both SEO and AEO best practices.&lt;/p&gt;

&lt;p&gt;The results were humbling.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cloudflare Gotcha (Yes, Again)
&lt;/h2&gt;

&lt;p&gt;I wrote about Cloudflare's AI crawler settings &lt;a href="https://dev.to/posts/downtime-is-a-feature-custom-domains-cloudflare-and-mcp"&gt;two weeks ago&lt;/a&gt;. In that post, I specifically called out that Cloudflare's free tier has &lt;strong&gt;"Block AI bots"&lt;/strong&gt; and &lt;strong&gt;"AI Labyrinth"&lt;/strong&gt; turned on by default. I explicitly turned both off. I even wrote this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"If your site exists for thought leadership, you want AI services to find, index, and cite your content. Blocking AI crawlers is blocking your distribution channel."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I was right. And I was still blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;: Cloudflare has a &lt;em&gt;separate&lt;/em&gt; setting called &lt;strong&gt;"Manage your robots.txt"&lt;/strong&gt; under Security → Settings. It's not the same as "Block AI bots." It's a newer feature that injects directives directly into your robots.txt file at the edge — &lt;em&gt;after&lt;/em&gt; your origin server responds.&lt;/p&gt;

&lt;p&gt;Here's what the agent found when it compared my repo's &lt;code&gt;robots.txt&lt;/code&gt; (100 bytes, 7 lines) to what Cloudflare was actually serving:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;My robots.txt&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100 bytes, 7 lines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Content-Length header&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100 (Vercel's original)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Actual response body&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,838 bytes, ~65 lines&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cloudflare was prepending 1,738 bytes of content — including &lt;code&gt;Disallow: /&lt;/code&gt; rules for ClaudeBot, GPTBot, Google-Extended, Amazonbot, CCBot, Bytespider, and meta-externalagent — &lt;strong&gt;without updating the Content-Length header&lt;/strong&gt;. The setting responsible? "Instruct AI bots to not scrape content," which was selected by default.&lt;/p&gt;
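
&lt;p&gt;This is easy to check for yourself. A quick sketch that compares the declared &lt;code&gt;Content-Length&lt;/code&gt; against the bytes actually served (note that compression can also cause benign mismatches):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Compare the declared Content-Length with the bytes actually served.
// A large mismatch suggests something between the origin and the client
// is rewriting the response.
async function checkRobots(site: string) {
  const res = await fetch(`${site}/robots.txt`);
  const declared = res.headers.get("content-length");
  const body = await res.text();
  const actual = new TextEncoder().encode(body).length;

  console.log(`declared: ${declared ?? "n/a"} bytes, actual: ${actual} bytes`);
  if (declared !== null &amp;&amp; Number(declared) !== actual) {
    console.warn("Content-Length mismatch -- possible edge injection");
  }
}

checkRobots("https://vibescoder.dev");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
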



&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: Security → Settings → "Manage your robots.txt" → select "Disable robots.txt configuration." This tells Cloudflare to stop modifying your robots.txt entirely. Your origin file gets served as-is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why "Disable" and not "Content Signals Policy"?&lt;/strong&gt; The Content Signals option keeps a &lt;code&gt;Content-Signal: ai-train=no&lt;/code&gt; directive, which tells AI crawlers not to use your content for model training. That sounds reasonable — but for a personal blog trying to maximize reach, being in the training corpus means AI models are more likely to know about you and reference your ideas. The risk it protects against (content absorbed without credit) is theoretical. The cost (reduced presence in AI systems) is concrete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha #1&lt;/strong&gt;: Cloudflare has three separate AI-related settings, and changing one doesn't affect the others. You need to check all three:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Block AI Bots Scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security → Settings&lt;/td&gt;
&lt;td&gt;Deploys firewall rules blocking AI training crawlers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Labyrinth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security → Settings&lt;/td&gt;
&lt;td&gt;Injects fake content links to trap non-compliant bots&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Manage your robots.txt&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security → Settings&lt;/td&gt;
&lt;td&gt;Modifies robots.txt at the edge to add AI crawler directives&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I had turned off #1 and #2 weeks ago. But #3 was still on — silently rewriting my robots.txt at the CDN layer.&lt;/p&gt;

&lt;p&gt;Here's the full picture — the Security Overview flagging the AI-related action items, and each of the three settings:&lt;/p&gt;





&lt;h2&gt;
  
  
  What Is AEO?
&lt;/h2&gt;

&lt;p&gt;AEO — &lt;a href="https://ahrefs.com/blog/answer-engine-optimization/" rel="noopener noreferrer"&gt;Answer Engine Optimization&lt;/a&gt; — is the practice of making your content discoverable and citable by AI agents. (You'll also see it referred to as AI Engine Optimization or Agentic Engine Optimization — the discipline is new enough that the name is still settling.) It's the emerging counterpart to SEO. Where SEO focuses on Google's traditional index, AEO targets the systems that power ChatGPT Search, Perplexity, Google AI Overviews, Claude, and whatever comes next.&lt;/p&gt;

&lt;p&gt;The key differences:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;SEO&lt;/th&gt;
&lt;th&gt;AEO&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary consumer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Googlebot&lt;/td&gt;
&lt;td&gt;GPTBot, ClaudeBot, PerplexityBot, Google-Extended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Content format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HTML with meta tags&lt;/td&gt;
&lt;td&gt;Structured data (JSON-LD), plain text (llms.txt), RSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Discovery mechanism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sitemap, backlinks, crawling&lt;/td&gt;
&lt;td&gt;Sitemap, RSS, llms.txt, structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ranking signal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PageRank, content quality, Core Web Vitals&lt;/td&gt;
&lt;td&gt;Authorship (Person schema + sameAs), recency, structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Citation style&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Blue link with snippet&lt;/td&gt;
&lt;td&gt;Inline citation with direct quote and link&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Key enabler&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Canonical URLs, meta descriptions&lt;/td&gt;
&lt;td&gt;JSON-LD, llms.txt, heading anchors for deep linking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You need both. Many of the improvements help both. But some are AEO-specific.&lt;/p&gt;

&lt;h2&gt;
  
  
  AEO-Specific Changes
&lt;/h2&gt;

&lt;p&gt;These improvements specifically target AI agent discoverability:&lt;/p&gt;

&lt;h3&gt;
  
  
  llms.txt and llms-full.txt
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;llms.txt&lt;/code&gt; is an emerging convention — think of it as robots.txt for AI &lt;em&gt;comprehension&lt;/em&gt; rather than crawling. It tells AI agents what your site is, what topics it covers, and where to find content.&lt;/p&gt;

&lt;p&gt;We created two files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/llms.txt&lt;/code&gt;&lt;/strong&gt; — a structured summary: site description, author, topics, key posts, and links&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/llms-full.txt&lt;/code&gt;&lt;/strong&gt; — a dynamic route that serves every published post's full content as plain text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full-content version is the important one. When an AI agent wants to cite your work, it needs the actual content — not just metadata. &lt;code&gt;llms-full.txt&lt;/code&gt; is a single endpoint that gives it everything.&lt;/p&gt;
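
&lt;p&gt;A route like that is small. Here's a minimal sketch in Next.js — assuming a &lt;code&gt;getAllPosts()&lt;/code&gt; helper along the lines of the engine's &lt;code&gt;lib/posts.ts&lt;/code&gt;, which is an assumption rather than the shipped code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// src/app/llms-full.txt/route.ts -- serve every published post as plain text.
// Assumes a getAllPosts() helper returning { title, date, content } records.
import { getAllPosts } from "@/lib/posts";

export async function GET() {
  const posts = await getAllPosts();

  const body = posts
    .map((p) =&amp;gt; `# ${p.title}\nPublished: ${p.date}\n\n${p.content}`)
    .join("\n\n---\n\n");

  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
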

&lt;h3&gt;
  
  
  Person Schema with &lt;code&gt;sameAs&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;JSON-LD structured data tells AI engines &lt;em&gt;who&lt;/em&gt; wrote something and &lt;em&gt;where else&lt;/em&gt; that person exists online. The &lt;code&gt;sameAs&lt;/code&gt; property connects identity across platforms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Person"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rob Whiteley"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://vibescoder.dev/about"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobTitle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CEO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sameAs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"https://www.linkedin.com/in/rwhiteley"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"https://github.com/carryologist"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"https://x.com/rwhiteley0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"worksFor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Organization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Coder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://coder.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When ChatGPT or Perplexity decides whether to cite "Rob Whiteley, CEO of Coder" in a response about AI-assisted development, this structured data is what gives it confidence in the attribution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Full-Content RSS
&lt;/h3&gt;

&lt;p&gt;The existing RSS feed had only &lt;code&gt;&amp;lt;description&amp;gt;&lt;/code&gt; (a short excerpt). AI agents that consume RSS — Perplexity in particular indexes it — get significantly more context from full-content feeds. We added &lt;code&gt;&amp;lt;content:encoded&amp;gt;&lt;/code&gt; with the full post body, plus &lt;code&gt;&amp;lt;author&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;managingEditor&amp;gt;&lt;/code&gt; tags.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unblocking AI Crawlers
&lt;/h3&gt;

&lt;p&gt;The Cloudflare fix described above. The single highest-impact AEO change — going from completely invisible to fully accessible.&lt;/p&gt;

&lt;h2&gt;
  
  
  SEO-Specific Changes
&lt;/h2&gt;

&lt;p&gt;These target traditional Google search:&lt;/p&gt;

&lt;h3&gt;
  
  
  Sitemap.xml
&lt;/h3&gt;

&lt;p&gt;robots.txt referenced it. It didn't exist. Every SEO tool and Google Search Console would flag this. We created &lt;code&gt;src/app/sitemap.ts&lt;/code&gt; with dynamic generation — all posts, tags, and static pages with &lt;code&gt;lastmod&lt;/code&gt; dates from the changelog.&lt;/p&gt;
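
&lt;p&gt;In the App Router this is nearly declarative — a file at &lt;code&gt;src/app/sitemap.ts&lt;/code&gt; whose default export returns the entries. A sketch, again assuming a &lt;code&gt;getAllPosts()&lt;/code&gt; helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// src/app/sitemap.ts -- Next.js picks this up and serves /sitemap.xml.
import type { MetadataRoute } from "next";
import { getAllPosts } from "@/lib/posts";

export default async function sitemap(): Promise&amp;lt;MetadataRoute.Sitemap&amp;gt; {
  const posts = await getAllPosts();

  return [
    { url: "https://vibescoder.dev", lastModified: new Date() },
    ...posts.map((p) =&amp;gt; ({
      url: `https://vibescoder.dev/posts/${p.slug}`,
      lastModified: new Date(p.date), // lastmod from post metadata
    })),
  ];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
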

&lt;h3&gt;
  
  
  Canonical URLs
&lt;/h3&gt;

&lt;p&gt;No page had &lt;code&gt;&amp;lt;link rel="canonical"&amp;gt;&lt;/code&gt;. Without it, Google can treat URL variants (&lt;code&gt;?utm_source=twitter&lt;/code&gt;, &lt;code&gt;?ref=hackernews&lt;/code&gt;) as separate pages. We added explicit canonical URLs to every page type — homepage, posts, about, tags, and individual tag pages.&lt;/p&gt;
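
&lt;p&gt;In Next.js metadata terms, the canonical is a single field. A sketch for the post page — &lt;code&gt;getPostBySlug()&lt;/code&gt; is an assumed helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// src/app/posts/[slug]/page.tsx (metadata only).
import type { Metadata } from "next";
import { getPostBySlug } from "@/lib/posts";

export async function generateMetadata({
  params,
}: {
  params: { slug: string };
}): Promise&amp;lt;Metadata&amp;gt; {
  const post = await getPostBySlug(params.slug);

  return {
    title: post.title,
    description: post.description,
    alternates: {
      // Canonical URL: query-string variants all collapse to this.
      canonical: `https://vibescoder.dev/posts/${params.slug}`,
    },
  };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
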

&lt;h3&gt;
  
  
  Homepage Caching
&lt;/h3&gt;

&lt;p&gt;The homepage was set to &lt;code&gt;force-dynamic&lt;/code&gt; — every request hit the server with zero caching. For a blog that publishes daily at most, that's unnecessary. We switched to ISR with a 60-second revalidation window. (Vercel still serves it dynamically due to a &lt;code&gt;cookies()&lt;/code&gt; call for admin detection — a future refactor.)&lt;/p&gt;
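
&lt;p&gt;The change itself is route segment config — two lines, shown here as a sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// src/app/page.tsx -- before: every request hit the server.
// export const dynamic = "force-dynamic";

// After: render statically, revalidate in the background every 60s (ISR).
export const revalidate = 60;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
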

&lt;h3&gt;
  
  
  Custom 404 Page
&lt;/h3&gt;

&lt;p&gt;The default Next.js 404 is a dead end. Our custom version shows recent posts and navigation links — keeping both users and crawlers moving through the site instead of bouncing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Changes That Help Both
&lt;/h2&gt;

&lt;p&gt;Most improvements benefit both SEO and AEO:&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON-LD Structured Data
&lt;/h3&gt;

&lt;p&gt;The single biggest miss. We added three schema types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;WebSite&lt;/code&gt;&lt;/strong&gt; — site-level metadata with author info (every page)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;BlogPosting&lt;/code&gt;&lt;/strong&gt; — per-post schema with headline, dates, author, keywords, reading time (post pages)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;BreadcrumbList&lt;/code&gt;&lt;/strong&gt; — navigation hierarchy (post pages)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For SEO, this enables rich results in Google — article carousels, author info, breadcrumbs. For AEO, it's how AI engines understand content relationships and authorship with confidence.&lt;/p&gt;
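
&lt;p&gt;Embedding JSON-LD in the App Router is just a script tag rendered with the page. A sketch of the &lt;code&gt;BlogPosting&lt;/code&gt; case, fields abbreviated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Rendered inside the post page. Fields abbreviated; the real schema
// also carries dates modified, keywords, and reading time.
function BlogPostingJsonLd({
  post,
}: {
  post: { title: string; date: string; slug: string };
}) {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    datePublished: post.date,
    author: { "@type": "Person", name: "Rob Whiteley" },
    url: `https://vibescoder.dev/posts/${post.slug}`,
  };

  return (
    &amp;lt;script
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
    /&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
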

&lt;h3&gt;
  
  
  Heading Anchor IDs
&lt;/h3&gt;

&lt;p&gt;Added &lt;code&gt;rehype-slug&lt;/code&gt; to the MDX pipeline. Every H2 and H3 now gets an auto-generated &lt;code&gt;id&lt;/code&gt; attribute.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SEO&lt;/strong&gt;: Google uses these for "jump to" links in search results and featured snippets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AEO&lt;/strong&gt;: AI agents cite specific sections via fragment URLs (&lt;code&gt;#the-cloudflare-gotcha&lt;/code&gt;). Without heading IDs, citations can only link to the full page.&lt;/li&gt;
&lt;/ul&gt;
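
&lt;p&gt;The wiring is a one-line plugin addition. A sketch assuming the MDX content is rendered through &lt;code&gt;next-mdx-remote&lt;/code&gt; — the engine's actual pipeline may differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { MDXRemote } from "next-mdx-remote/rsc";
import rehypeSlug from "rehype-slug";

// Every H2/H3 in `source` now renders with an auto-generated id attribute,
// so /posts/foo#the-cloudflare-gotcha resolves to the right heading.
export function PostBody({ source }: { source: string }) {
  return (
    &amp;lt;MDXRemote
      source={source}
      options={{ mdxOptions: { rehypePlugins: [rehypeSlug] } }}
    /&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
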

&lt;h3&gt;
  
  
  RSS Feed Fix
&lt;/h3&gt;

&lt;p&gt;Every link in the RSS feed was a 404. The feed used &lt;code&gt;/blog/&lt;/code&gt; as the URL prefix, but the actual routes use &lt;code&gt;/posts/&lt;/code&gt;. All 15 posts were broken. One-line fix, massive impact — RSS is a primary discovery mechanism for both Google and AI agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Article Meta Tags
&lt;/h3&gt;

&lt;p&gt;Added &lt;code&gt;article:author&lt;/code&gt;, &lt;code&gt;article:tag&lt;/code&gt;, &lt;code&gt;article:modified_time&lt;/code&gt;, and &lt;code&gt;og:site_name&lt;/code&gt; to post OpenGraph metadata. These help both Google and AI engines categorize and attribute content correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Image Improvements
&lt;/h3&gt;

&lt;p&gt;MDX images now render inside &lt;code&gt;&amp;lt;figure&amp;gt;&lt;/code&gt; with &lt;code&gt;&amp;lt;figcaption&amp;gt;&lt;/code&gt; elements, and images without explicit alt text get an auto-generated fallback from the filename. Both changes improve how crawlers — traditional and AI — understand image content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Cloudflare Settings We Checked While in the Dashboard
&lt;/h2&gt;

&lt;p&gt;While fixing the robots.txt issue, we also reviewed three other Cloudflare settings — enabling two and confirming a third was still off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Early Hints&lt;/strong&gt; — enabled. Cloudflare sends &lt;code&gt;103 Early Hints&lt;/code&gt; responses from the edge, letting browsers start loading fonts and CSS before Vercel even responds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Tiered Caching&lt;/strong&gt; — enabled. Cloudflare edge nodes share cached content with each other, reducing origin hits. Ready to deliver benefits once ISR caching is fully enabled.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Labyrinth&lt;/strong&gt; — confirmed still off. This injects fake content links to trap AI crawlers — the opposite of what a content site wants.&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  The Complete Scorecard
&lt;/h2&gt;

&lt;p&gt;Every change, its impact, and whether it addresses AEO, SEO, or both:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;AEO&lt;/th&gt;
&lt;th&gt;SEO&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Disable Cloudflare managed robots.txt&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Critical&lt;/strong&gt; — AI crawlers could not access the site&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fix RSS feed URLs (&lt;code&gt;/blog/&lt;/code&gt; → &lt;code&gt;/posts/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Critical&lt;/strong&gt; — all 15 RSS links were 404s&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create sitemap.xml&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Critical&lt;/strong&gt; — referenced in robots.txt but returned 404&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consolidate robots.txt (disable CF injection)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Critical&lt;/strong&gt; — duplicate User-agent blocks caused ambiguity&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add JSON-LD structured data&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High&lt;/strong&gt; — zero structured data across entire site&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create llms.txt + llms-full.txt&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High&lt;/strong&gt; — no AI discovery files existed&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add canonical URLs to all pages&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High&lt;/strong&gt; — no page declared itself as canonical&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add heading anchor IDs (rehype-slug)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High&lt;/strong&gt; — no deep linking possible&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add article:author, article:tag to OG meta&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High&lt;/strong&gt; — tags and author missing from metadata&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add Person schema with sameAs&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High&lt;/strong&gt; — no cross-platform identity linking&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Switch homepage to ISR (revalidate: 60)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Medium&lt;/strong&gt; — every request was a cold server render&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add RSS author + full content (&lt;code&gt;content:encoded&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Medium&lt;/strong&gt; — feed had excerpts only, no author&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add twitter:site and twitter:creator&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Medium&lt;/strong&gt; — social cards had no account attribution&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create custom 404 page&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Medium&lt;/strong&gt; — default 404 was a dead end&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wrap images in figure/figcaption&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Low&lt;/strong&gt; — bare img tags with no semantic context&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alt text fallback from filenames&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Low&lt;/strong&gt; — empty alt on content images&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove x-powered-by header&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Low&lt;/strong&gt; — minor information disclosure&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add humans.txt&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Low&lt;/strong&gt; — minor authorship signal&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enable Cloudflare Early Hints&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Low&lt;/strong&gt; — browsers preload assets faster&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enable Smart Tiered Caching&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Low&lt;/strong&gt; — prepared for when ISR is fully active&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total: 20 changes. 12 help AEO. 17 help SEO. 9 help both.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AEO is a real discipline now, not a buzzword.&lt;/strong&gt; The gap between "my content exists on the internet" and "AI agents can find, understand, and cite my content" is significant. Structured data, llms.txt, full-content RSS, heading anchors — these aren't nice-to-haves. They're the difference between being in the AI conversation and being invisible to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your CDN can silently undermine your content strategy.&lt;/strong&gt; This is the one that stings. I &lt;em&gt;knew&lt;/em&gt; about the Cloudflare AI bot setting. I &lt;em&gt;wrote a blog post about turning it off.&lt;/em&gt; And a different setting — one I didn't know existed — was doing the same thing through a different mechanism. If you use Cloudflare, check your robots.txt right now. Not the file in your repo — the one Cloudflare is actually serving. &lt;code&gt;curl https://yoursite.com/robots.txt&lt;/code&gt; and compare it to what you expect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The audit paid for itself in the first finding.&lt;/strong&gt; Everything else — the JSON-LD, the canonical URLs, the sitemap — those are incremental improvements that compound over time. But the Cloudflare fix was binary: invisible → visible. Every day that setting was on was a day ChatGPT Search, Perplexity, and Google AI Overviews couldn't index my content.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The one thing we identified but didn't implement: &lt;strong&gt;FAQPage schema&lt;/strong&gt; for how-to posts. Several posts follow a problem/solution pattern that could surface as direct answers in AI search. The frontmatter already has a &lt;code&gt;type&lt;/code&gt; field distinguishing &lt;code&gt;how-to&lt;/code&gt; from &lt;code&gt;opinion&lt;/code&gt; — the infrastructure is there. That's next.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1,738&lt;/strong&gt; bytes of robots.txt injected by Cloudflare without updating Content-Length&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8&lt;/strong&gt; AI crawlers blocked (GPTBot, ClaudeBot, Google-Extended, Amazonbot, CCBot, Bytespider, Applebot-Extended, meta-externalagent)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15&lt;/strong&gt; RSS feed links returning 404 — every single one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; → &lt;strong&gt;3&lt;/strong&gt; JSON-LD schema types (WebSite, BlogPosting, BreadcrumbList)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; → &lt;strong&gt;5&lt;/strong&gt; pages with canonical URLs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;17&lt;/strong&gt; files changed, &lt;strong&gt;428&lt;/strong&gt; lines added&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; Cloudflare settings that control AI crawlers — and you have to check all of them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;60 seconds&lt;/strong&gt; to fix the Cloudflare setting that was blocking all AI visibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~2 hours&lt;/strong&gt; for the full audit and implementation of all 20 changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; blog post that I thought had solved this problem — it hadn't&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>seo</category>
      <category>aeo</category>
      <category>cloudflare</category>
      <category>agents</category>
    </item>
    <item>
      <title>Invisible Failures: The Bugs That Hide in Plain Sight</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:34:54 +0000</pubDate>
      <link>https://dev.to/carryologist/invisible-failures-the-bugs-that-hide-in-plain-sight-3lf4</link>
      <guid>https://dev.to/carryologist/invisible-failures-the-bugs-that-hide-in-plain-sight-3lf4</guid>
      <description>&lt;p&gt;Lucky you — bonus fix content, and you don't even have to wait until Friday.&lt;/p&gt;

&lt;p&gt;I had a work trip to Austin coming up. The homelab was humming along at home, but I realized something uncomfortable: if anything went sideways while I was gone, I had no reliable way to fix it. SSH works when everything is running. SSH doesn't help when you need to see a stuck GUI dialog, a frozen window manager, or a service that needs a browser to configure.&lt;/p&gt;

&lt;p&gt;So before I left, I fixed the access problem. Then, from a hotel room in Austin, I found and fixed three bugs that had been silently breaking things for days. None of them crashed. None of them logged errors. They just quietly did the wrong thing until the consequences finally became visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Setting Up Remote Access Before the Trip
&lt;/h2&gt;

&lt;p&gt;I'd been putting off remote desktop because "I can always walk over to it." A trip to Austin fixed that mindset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluating options&lt;/strong&gt;: Compared five tools. xrdp is a common recommendation but it's wrong for this use case — it spawns a new desktop session instead of mirroring the existing one. If something is stuck on the real display, xrdp can't help you see it. VNC works but it's laggy and unencrypted by default. Chrome Remote Desktop depends on Google's servers and a running Chrome instance. NoMachine is great but closed source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RustDesk won&lt;/strong&gt;: Open source, self-hostable, mirrors the real desktop, has iOS and macOS clients.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted server&lt;/strong&gt;: Two Docker containers — a rendezvous server and a relay server — so connections route through my own infrastructure instead of RustDesk's public relays:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; rustdesk-hbbs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 21115:21115 &lt;span class="nt"&gt;-p&lt;/span&gt; 21116:21116 &lt;span class="nt"&gt;-p&lt;/span&gt; 21116:21116/udp &lt;span class="nt"&gt;-p&lt;/span&gt; 21118:21118 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /opt/rustdesk-server:/root &lt;span class="se"&gt;\&lt;/span&gt;
  rustdesk/rustdesk-server hbbs

&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; rustdesk-hbbr &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 21117:21117 &lt;span class="nt"&gt;-p&lt;/span&gt; 21119:21119 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /opt/rustdesk-server:/root &lt;span class="se"&gt;\&lt;/span&gt;
  rustdesk/rustdesk-server hbbr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Networking via Tailscale&lt;/strong&gt;: Cloudflare Tunnels already handle &lt;code&gt;coder.vibescoder.dev&lt;/code&gt;, but they only proxy TCP/HTTP — RustDesk's rendezvous server requires UDP on port 21116. Tailscale is a WireGuard mesh VPN that handles TCP and UDP natively. Each tool has its lane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Tunnel&lt;/strong&gt; → HTTP/S services (Coder dashboard, blog)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailscale&lt;/strong&gt; → everything else (SSH, RustDesk, any UDP/TCP service)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clients on macOS and iOS point at the workstation's Tailscale IP. A permanent password means fully unattended access: no need to walk over and click "Accept" on a popup, which would defeat the entire purpose of remote desktop for recovery scenarios.&lt;/p&gt;

&lt;p&gt;Everything auto-starts on reboot: Docker containers with &lt;code&gt;--restart always&lt;/code&gt;, RustDesk and Tailscale as systemd services. Five components confirmed persistent across power cycles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The architecture now looks like this:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────────────────────────────────────────────────┐
│              HOMELAB (Ubuntu + RTX 5090)              │
│                                                       │
│  Coder Server ──── Cloudflare Tunnel ──── Internet    │
│  (systemd, :3000)   (TCP/HTTP only)                   │
│                                                       │
│  RustDesk Client ── Tailscale Mesh ──── MacBook       │
│  (systemd)           (TCP + UDP)        iPhone        │
│                                                       │
│  RustDesk Server (Docker, --restart always)           │
│  ├── hbbs (rendezvous, :21115-21116, :21118)          │
│  └── hbbr (relay, :21117, :21119)                     │
│                                                       │
│  llama-server (Gemma 4, :8080)                        │
│  Tailscale (systemd)                                  │
│  Cloudflared (systemd, tunnel)                        │
└───────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="/images/invisible-failures-the-bugs-that-hide-in-plain-sight/network-interfaces-redacted.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/invisible-failures-the-bugs-that-hide-in-plain-sight/network-interfaces-redacted.png" alt="Network interfaces on the workstation — Tailscale mesh, Docker bridge, and host networking all coexisting"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With that in place, I headed to Austin.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The Deploy That Only Fails on New Content
&lt;/h2&gt;

&lt;p&gt;Three out of four Vercel deploys crashed with the same error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Can't load image https://vibescoder.dev/images/downtime-is-a-feature/vercel-dns-ipv4-error.png: fetch failed
Error: Image size cannot be determined.
Export encountered an error on /posts/[slug]/opengraph-image/route
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The blog generates dynamic OpenGraph cards for social sharing — each post gets a unique 1200×630 image with the title, description, and a faded background pulled from the post's first image. The OG image route extracts the first &lt;code&gt;![alt](/images/...)&lt;/code&gt; reference from the markdown and renders it.&lt;/p&gt;
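
&lt;p&gt;The extraction itself is a small regex pass. A sketch of the pattern (the repo's exact regex isn't shown; this assumes the &lt;code&gt;![alt](/images/...)&lt;/code&gt; convention):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch of the first-image extraction described above. The repo's exact
// regex isn't shown; this one assumes the ![alt](/images/...) convention.
function extractFirstImage(markdown: string): string | undefined {
  const match = markdown.match(/!\[[^\]]*\]\((\/images\/[^)]+)\)/);
  return match?.[1]; // e.g. "/images/some-post/screenshot.png"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
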

&lt;p&gt;The problem was &lt;strong&gt;how&lt;/strong&gt; it loaded that image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt;
  &lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;`https://vibescoder.dev&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;firstImage&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;100%&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;100%&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;objectFit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cover&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During &lt;code&gt;next build&lt;/code&gt;, this fetches from the &lt;strong&gt;live production site&lt;/strong&gt;. For a new post, those images don't exist on production yet — they're only in the current build's &lt;code&gt;public/&lt;/code&gt; directory, copied there by the prebuild script. The fetch fails, &lt;code&gt;next/og&lt;/code&gt; can't determine dimensions, and the entire build crashes.&lt;/p&gt;

&lt;p&gt;The one post that succeeded? Its first image already existed on production from a previous deploy.&lt;/p&gt;

&lt;p&gt;This is a classic chicken-and-egg bug. It only affects new content with new images. If you redeploy the same content twice, it works — because the first deploy put the images on the live site. You could publish for weeks without hitting it, then get three failures in a row when you finally add a post with a fresh screenshot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: Read from the local filesystem instead of fetching from production. The images are already in &lt;code&gt;public/images/&lt;/code&gt; at build time, so we read from disk and encode as a base64 data URI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imgPath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;public&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;rawImage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imgPath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawImage&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jpg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jpeg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image/jpeg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`image/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ext&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;firstImage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`data:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;mime&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;;base64,&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;try/catch&lt;/code&gt; around the read means a missing image degrades gracefully — no background in the OG card instead of a build-killing crash.&lt;/p&gt;
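
&lt;p&gt;Put together, the guarded read looks roughly like this (a sketch reusing the names from the excerpt above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// A sketch of the guarded local read, reusing names from the excerpt above.
// Any filesystem error degrades to "no background" instead of a build crash.
import fs from "fs";
import path from "path";

function loadBackground(rawImage: string): string | undefined {
  try {
    const imgPath = path.join(process.cwd(), "public", rawImage);
    const buf = fs.readFileSync(imgPath);
    const ext = path.extname(rawImage).replace(".", "").toLowerCase();
    const mime = ext === "jpg" || ext === "jpeg" ? "image/jpeg" : `image/${ext}`;
    return `data:${mime};base64,${buf.toString("base64")}`;
  } catch {
    return undefined; // missing image: render the OG card without a background
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
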

&lt;h2&gt;
  
  
  3. The Shell Guard That Eats Your Auth
&lt;/h2&gt;

&lt;p&gt;While pushing the OG image fix, &lt;code&gt;git push&lt;/code&gt; failed with an auth error. This had happened before. Every time, we'd manually run &lt;code&gt;coder external-auth access-token github&lt;/code&gt;, paste the token into the remote URL, and move on. This time we decided to actually trace it.&lt;/p&gt;

&lt;p&gt;Three layers of GitHub auth were configured in the workspace startup script, and &lt;strong&gt;all three were broken&lt;/strong&gt; in agent sessions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;What It Did&lt;/th&gt;
&lt;th&gt;Why It Failed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;credential.helper&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shell function reading &lt;code&gt;$GITHUB_TOKEN&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Env var was empty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;GITHUB_TOKEN&lt;/code&gt; export&lt;/td&gt;
&lt;td&gt;Appended to &lt;code&gt;~/.bashrc&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Below the interactive guard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gh auth login&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ran at startup&lt;/td&gt;
&lt;td&gt;Token persisted, but &lt;code&gt;gh&lt;/code&gt; also checks &lt;code&gt;GH_TOKEN&lt;/code&gt; env&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The startup script appended &lt;code&gt;export GITHUB_TOKEN=...&lt;/code&gt; to &lt;code&gt;~/.bashrc&lt;/code&gt;. That line landed at line 118 — after the interactive guard at line 8:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nv"&gt;$-&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;i&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt; &lt;span class="k"&gt;esac&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every non-interactive shell — which is what Coder agent &lt;code&gt;execute()&lt;/code&gt; calls use — bailed out at line 8 and never reached the exports. The credential helper then read an empty &lt;code&gt;$GITHUB_TOKEN&lt;/code&gt; and returned an empty password. Git got a 401. The agent worked around it. Nobody noticed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt; was three changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Credential helper calls &lt;code&gt;coder external-auth&lt;/code&gt; directly&lt;/strong&gt; — no dependency on env vars, fresh token on every git operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git config &lt;span class="nt"&gt;--global&lt;/span&gt; credential.helper &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s1"&gt;'!f() { echo "username=x-access-token"; echo "password=$(coder external-auth access-token github 2&amp;gt;/dev/null)"; }; f'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Coder's &lt;code&gt;env&lt;/code&gt; block injects tokens via Terraform&lt;/strong&gt; — set in the agent process environment by Coder itself, inherited by every &lt;code&gt;execute()&lt;/code&gt; call, no shell sourcing needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"coder_agent"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;GITHUB_TOKEN&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;coder_external_auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;
    &lt;span class="nx"&gt;GH_TOKEN&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;coder_external_auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cleanup of stale &lt;code&gt;.bashrc&lt;/code&gt; entries&lt;/strong&gt; — removed the old exports so they don't confuse future debugging.&lt;/p&gt;

&lt;p&gt;The auth bug had been hiding for multiple sessions. The startup script &lt;em&gt;looked&lt;/em&gt; correct. The agent silently worked around it every time. The fix was to stop relying on shell init files entirely and let Coder's process environment do the work at a level the shell can't interfere with.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/invisible-failures-the-bugs-that-hide-in-plain-sight/coder-agents-error-sidebar.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/invisible-failures-the-bugs-that-hide-in-plain-sight/coder-agents-error-sidebar.png" alt="Coder Agents sidebar — multiple models, multiple errors, none of them obvious until you trace the auth chain"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The Date That Froze at Draft Creation
&lt;/h2&gt;

&lt;p&gt;I published "The Agentic Gap." It went live. Then I noticed the date: &lt;strong&gt;April 26&lt;/strong&gt;, three days earlier. The post sorted behind three days of newer content instead of landing at the top.&lt;/p&gt;

&lt;p&gt;The blog engine had &lt;strong&gt;no mechanism to update the frontmatter &lt;code&gt;date&lt;/code&gt; when a draft is published&lt;/strong&gt;. The flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Draft created → Claude sets &lt;code&gt;date&lt;/code&gt; to the day it generates the content&lt;/li&gt;
&lt;li&gt;Draft sits unpublished for N days&lt;/li&gt;
&lt;li&gt;Someone flips &lt;code&gt;published: false&lt;/code&gt; → &lt;code&gt;published: true&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Post appears on the blog sorted by its &lt;strong&gt;creation date&lt;/strong&gt;, not its &lt;strong&gt;publish date&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two publish paths existed — the admin UI and the API — and neither touched the date. The admin UI did a single regex replace on the boolean. The API had a &lt;code&gt;fixDateYear()&lt;/code&gt; function that corrects stale years (e.g., "2025" when it's 2026), but same-year drafts sailed right through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt; was two layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client-side&lt;/strong&gt;: &lt;code&gt;handlePublish()&lt;/code&gt; stamps today's date immediately after flipping the boolean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;T&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="nx"&gt;published&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;published&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="sr"&gt;/^date:&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*'&lt;/span&gt;&lt;span class="se"&gt;[^&lt;/span&gt;&lt;span class="sr"&gt;'&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;*'/m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;`date: '&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;today&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;'`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Server-side&lt;/strong&gt;: A new &lt;code&gt;stampPublishDate()&lt;/code&gt; function detects the &lt;code&gt;false&lt;/code&gt; → &lt;code&gt;true&lt;/code&gt; transition by comparing old and new content, and rewrites the date if it's a fresh publish. Safety net for all paths.&lt;/p&gt;
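
&lt;p&gt;A minimal sketch of that safety net (the function name comes from the post; the frontmatter regexes are assumptions mirroring the client-side snippet):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch of stampPublishDate. The name comes from the post; the regexes are
// assumptions mirroring the client-side snippet above.
function stampPublishDate(oldContent: string, newContent: string): string {
  const wasPublished = /^published:\s*true/m.test(oldContent);
  const isPublished = /^published:\s*true/m.test(newContent);
  if (wasPublished || !isPublished) return newContent; // not a fresh publish

  const today = new Date().toISOString().split("T")[0];
  return newContent.replace(/^date:\s*'[^']*'/m, `date: '${today}'`);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
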

&lt;p&gt;There's a third publish path — an agent directly editing the MDX and pushing to git, which is exactly what caused this bug. No code can fix that path. The fix there is process: the agent skill now instructs agents to always set the date to today when publishing.&lt;/p&gt;

&lt;p&gt;The meta moment: the agent that introduced this bug also diagnosed, fixed, and documented it. In about fifteen minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Capacity Planning: How Many Workspaces Can This Machine Run?
&lt;/h2&gt;

&lt;p&gt;With the homelab now running Coder, Gemma 4 via llama.cpp, RustDesk, Tailscale, and Cloudflare, the question became: how much headroom is left?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The inventory&lt;/strong&gt;: Ryzen 9 9950X3D, 64 GB RAM, RTX 5090 32 GB VRAM. Profiled from inside a container via &lt;code&gt;/proc&lt;/code&gt; and from the host via Termius on an iPhone.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/invisible-failures-the-bugs-that-hide-in-plain-sight/iphone-capacity-planning-redacted.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/invisible-failures-the-bugs-that-hide-in-plain-sight/iphone-capacity-planning-redacted.png" alt="Capacity diagnostics via Termius on iPhone — Docker stats, RAM breakdown, and service status from a hotel room"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key findings&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Host services (GNOME, Coder, Docker, Tailscale, Cloudflare, RustDesk)&lt;/td&gt;
&lt;td&gt;~5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7 idle workspaces&lt;/td&gt;
&lt;td&gt;~1.7 GB (230–270 MB each)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 (32K context, llama.cpp)&lt;/td&gt;
&lt;td&gt;~19 GB &lt;strong&gt;VRAM&lt;/strong&gt; — zero RAM impact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Available for workspaces&lt;/td&gt;
&lt;td&gt;~58 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The GPU insight&lt;/strong&gt;: Gemma 4 runs entirely in VRAM on the RTX 5090. Zero system RAM impact. Workspaces call it via the OpenAI-compatible API on the host. The 32 GB VRAM pool is completely separate from the 64 GB system RAM — running a local LLM doesn't reduce workspace capacity at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capacity estimates&lt;/strong&gt;: 8–12 active agent sessions comfortably (CPU is the bottleneck, not RAM). Dozens of idle workspaces parked at ~250 MB each.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource guardrails&lt;/strong&gt;: Added an 8 GB per-container memory limit as a safety net — 32× normal usage, so it never constrains normal work, but a runaway &lt;code&gt;npm install&lt;/code&gt; or memory leak gets OOM-killed cleanly instead of dragging the whole host into swap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"docker_container"&lt;/span&gt; &lt;span class="s2"&gt;"workspace"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;memory&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;  &lt;span class="c1"&gt;# MB — safety net, not a throttle&lt;/span&gt;
  &lt;span class="nx"&gt;memory_swap&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;  &lt;span class="c1"&gt;# equal to memory = no swap for container&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Autostop&lt;/strong&gt;: 2-hour default TTL with 1-hour activity bump. Workspaces auto-stop when forgotten. All configurable via &lt;code&gt;coder templates edit&lt;/code&gt; — template metadata, not Terraform.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Set up remote access before you need it.&lt;/strong&gt; Every one of the fixes above happened from a hotel room because I'd set up RustDesk and Tailscale the day before I left. If I'd waited, I'd have come home to three days of broken deploys and a post sorted in the wrong place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invisible bugs are the most expensive.&lt;/strong&gt; The deploy bug only hit new content. The auth bug was silently worked around. The date bug shipped a post three days stale before anyone noticed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shell init files are a liability.&lt;/strong&gt; Anything that depends on &lt;code&gt;.bashrc&lt;/code&gt; or &lt;code&gt;.profile&lt;/code&gt; is fragile by default. Non-interactive shells, cron jobs, agent tool calls — none of them source your profile. If auth or config needs to be available everywhere, put it in the process environment or in a tool that's always in &lt;code&gt;$PATH&lt;/code&gt;.&lt;/p&gt;
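
&lt;p&gt;The failure mode is easy to reproduce. A sketch in Node (nothing here is from the repo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Spawn a non-interactive bash, the same kind an agent's execute() call uses.
// Whether bash skips ~/.bashrc entirely or bails at the interactive guard,
// an export living below the guard never lands.
import { execSync } from "child_process";

const out = execSync(`bash -c 'echo "GITHUB_TOKEN=[\${GITHUB_TOKEN:-unset}]"'`)
  .toString()
  .trim();
console.log(out); // "GITHUB_TOKEN=[unset]" unless set in the process environment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
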

&lt;p&gt;&lt;strong&gt;Self-hosted doesn't mean unmanaged.&lt;/strong&gt; Adding RustDesk, Tailscale, and resource guardrails isn't gold-plating — it's the difference between a homelab that works when you're sitting in front of it and one that works when you're debugging from a phone in another room. Recovery access is table stakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent finds its own bugs.&lt;/strong&gt; The publish-date bug was introduced by an agent, discovered by a human, then diagnosed, fixed, and documented by the same agent. That loop — agent ships, human reviews, agent fixes — is becoming the default workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; trip to Austin that forced the remote access setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; invisible bugs found and fixed, all of them remotely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; failed Vercel deploys from the OG image chicken-and-egg bug&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; layers of broken GitHub auth (credential helper, env var, gh CLI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 days&lt;/strong&gt; of staleness on the publish date when the post went live&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~58 GB&lt;/strong&gt; RAM available for workspaces after all services running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8–12&lt;/strong&gt; concurrent active agent sessions the workstation can handle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 GB&lt;/strong&gt; per-container memory limit as a safety net&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5&lt;/strong&gt; tools evaluated for remote desktop — RustDesk won&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2&lt;/strong&gt; Docker containers for self-hosted RustDesk server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5&lt;/strong&gt; services confirmed persistent across reboots&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; ports exposed to the internet (Tailscale mesh, Cloudflare tunnel)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~15 minutes&lt;/strong&gt; from "the date is wrong" to fix deployed across both repos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~30 minutes&lt;/strong&gt; from "how do I remote desktop" to working iPhone → Linux access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; agent that introduced a bug, then found and fixed it&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>homelab</category>
      <category>agents</category>
      <category>devops</category>
      <category>debugging</category>
    </item>
    <item>
      <title>Slaying the Gemma Beast: How We Fixed Local AI and Shipped Search</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:34:22 +0000</pubDate>
      <link>https://dev.to/carryologist/slaying-the-gemma-beast-how-we-fixed-local-ai-and-shipped-search-3hbe</link>
      <guid>https://dev.to/carryologist/slaying-the-gemma-beast-how-we-fixed-local-ai-and-shipped-search-3hbe</guid>
      <description>&lt;p&gt;Two days ago, Gemma 4 couldn't finish a feature. Today it built one, pushed it to GitHub, and it's live on this site right now.&lt;/p&gt;

&lt;p&gt;If you press &lt;code&gt;⌘K&lt;/code&gt; (or &lt;code&gt;Ctrl+K&lt;/code&gt;) on any page of vibescoder.dev, you'll see a search modal. Gemma 4 built that — running locally on an RTX 5090, zero cloud API calls, zero dollars spent. Then Claude reviewed the code, fixed the rough edges, and merged the polish. The feature you're using is a collaboration between a local model and a cloud model, each doing what they're best at.&lt;/p&gt;

&lt;p&gt;Here's how we got there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previously: The Agentic Gap
&lt;/h2&gt;

&lt;p&gt;In our &lt;a href="https://dev.to/posts/the-agentic-gap-claude-oneshots-gemma-fails"&gt;last experiment&lt;/a&gt;, we pitted Gemma 4 against Opus 4.6 on the same task: build public-facing search for this blog. Opus one-shot it — 698 lines across 6 files, committed and pushed in 8 minutes. Gemma planned brilliantly, then stopped. Eight prompts later: 3 partial files, 0 commits.&lt;/p&gt;

&lt;p&gt;We called it "the agentic gap" — the difference between a model that writes great code and one that builds great features. But we also left a thread dangling: maybe Gemma wasn't refusing to code. Maybe it was running out of room.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Diagnosis
&lt;/h2&gt;

&lt;p&gt;Our &lt;a href="https://dev.to/posts/friday-fixes-the-agent-was-flying-blind"&gt;deep dive into Gemma 4's local inference&lt;/a&gt; uncovered the root cause: &lt;strong&gt;invisible thinking tokens consume your generation budget&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Gemma 4 defaults to a reasoning mode where it generates chain-of-thought tokens before producing visible output. These thinking tokens are hidden — you never see them in the response — but they still count against &lt;code&gt;num_predict&lt;/code&gt;. With Ollama's defaults, the model was blowing its entire token budget on reasoning, leaving nothing for actual code.&lt;/p&gt;
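
&lt;p&gt;To make the arithmetic concrete, a minimal sketch (the numbers mirror the 256-token probe later in this post):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hidden chain-of-thought spends the generation budget before any visible
// output. With a small num_predict, nothing is left for code.
const numPredict = 256;     // total generation budget
const thinkingTokens = 256; // consumed by invisible reasoning first
const visibleTokens = Math.max(0, numPredict - thinkingTokens);
console.log(visibleTokens); // 0: the model "planned" and produced nothing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
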

&lt;p&gt;That's not a model failure. That's a configuration failure.&lt;/p&gt;

&lt;p&gt;The fix on paper was straightforward: give the model a bigger budget. But getting there required switching the entire inference stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Switching from Ollama to llama.cpp
&lt;/h2&gt;

&lt;p&gt;Ollama is great for pulling and running models. It's not great for fine-grained control. The specific controls we needed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;Ollama&lt;/th&gt;
&lt;th&gt;llama.cpp&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context window (&lt;code&gt;num_ctx&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Modelfile only&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--ctx-size&lt;/code&gt; flag&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output limit (&lt;code&gt;num_predict&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;API parameter&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;-n&lt;/code&gt; flag + API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reasoning budget&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Not available&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;--reasoning-budget&lt;/code&gt; flag&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool calling&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;Grammar-constrained&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;--reasoning-budget&lt;/code&gt; flag is the key. It caps how many tokens the model can spend on invisible chain-of-thought, forcing it to start producing real content after hitting the limit. Ollama has zero equivalent.&lt;/p&gt;

&lt;p&gt;The switch itself was an adventure. We couldn't use Ollama's blob files directly — llama.cpp expects standard GGUF files, but Ollama stores models in a split format that standalone tools can't load. We pulled the full Gemma 4 26B-A4B GGUF from Hugging Face (&lt;code&gt;unsloth/gemma-4-26B-A4B-it-GGUF&lt;/code&gt;, Q4_K_M quantization, 16.9 GB download) and launched llama-server with tuned settings:&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/slaying-the-gemma-beast/huggingface-gemma4-download.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/slaying-the-gemma-beast/huggingface-gemma4-download.png" alt="Downloading Gemma 4 26B GGUF from Hugging Face — 16.9 GB at 82 MB/s"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/llama.cpp/build/bin/llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; ~/models/gemma4-26b/gemma-4-26B-A4B-it-UD-Q4_K_M.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ctx-size&lt;/span&gt; 32768 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; 32768 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-budget&lt;/span&gt; 4096 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-format&lt;/span&gt; deepseek &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--parallel&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ngl&lt;/span&gt; 999
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="/images/slaying-the-gemma-beast/llama-server-gemma4-loaded.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/slaying-the-gemma-beast/llama-server-gemma4-loaded.png" alt="llama-server loaded with Gemma 4 — model ready, server listening on port 8080"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--ctx-size 32768&lt;/code&gt;&lt;/strong&gt; — 32K context window. Fits comfortably at ~19 GB on the 5090.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;-n 32768&lt;/code&gt;&lt;/strong&gt; — 32K max output tokens. Room for both reasoning and code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--reasoning-budget 4096&lt;/code&gt;&lt;/strong&gt; — Cap invisible thinking at 4K tokens. The rest is for actual output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--reasoning-format deepseek&lt;/code&gt;&lt;/strong&gt; — Expose thinking tokens in the API response so we can see what's happening.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--parallel 1&lt;/code&gt;&lt;/strong&gt; — Single slot instead of default 4. Four slots × 32K context was causing OOM kills.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then we pointed Coder at the new endpoint. The provider base URL switched from Ollama's &lt;code&gt;localhost:11434&lt;/code&gt; to llama-server's &lt;code&gt;localhost:8080/v1/&lt;/code&gt;, and the model config got the full GGUF filename with 32K context and output limits.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/slaying-the-gemma-beast/coder-provider-config-llamacpp.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/slaying-the-gemma-beast/coder-provider-config-llamacpp.png" alt="Coder Agents provider configuration — base URL pointing to llama.cpp's OpenAI-compatible endpoint"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/slaying-the-gemma-beast/coder-model-config-gemma4.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/slaying-the-gemma-beast/coder-model-config-gemma4.png" alt="Coder Agents model configuration — Gemma 4 GGUF with 32K context limit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/slaying-the-gemma-beast/coder-model-config-advanced.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/slaying-the-gemma-beast/coder-model-config-advanced.png" alt="Advanced model settings — max output tokens set to 32768 to match the context window"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Attempts to Slay the Beast
&lt;/h2&gt;

&lt;p&gt;It didn't work on the first try.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1&lt;/strong&gt;: Gemma made tool calls — real progress compared to the original test — but hit a GitHub auth failure (&lt;code&gt;$GITHUB_TOKEN&lt;/code&gt; wasn't set in the workspace) and stalled. The last output was raw token leakage: &lt;code&gt;call:execute{command:&amp;lt;|"&amp;gt;find...&lt;/code&gt; — special tokens leaking into the response, one of the known Gemma issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2&lt;/strong&gt;: We fixed the auth, added &lt;code&gt;--reasoning-format deepseek&lt;/code&gt;, and restarted. Gemma got much further — wrote a search index generator, ran it, started exploring the codebase. Then llama-server got &lt;code&gt;Killed&lt;/code&gt; — the OOM killer struck. Four parallel slots at 32K context each was too much VRAM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3&lt;/strong&gt;: Reduced to &lt;code&gt;--parallel 1&lt;/code&gt;, pre-cloned both repos in the workspace so Gemma didn't have to fight auth during exploration. This time it worked. Gemma laid out a clear implementation plan, and after one nudge — "keep going, don't stop, code and commit" — it executed the entire thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Fast Was It?
&lt;/h2&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/posts/model-showdown-round-2-adding-gemma-kimi-and-579-gb-of-stubborn-optimism"&gt;Model Showdown Round 2&lt;/a&gt;, Gemma 4 clocked 167.1 tok/s on a short benchmark task via Ollama — the fastest perfect scorer. But a benchmark prompt and an agentic coding session are different workloads. How does Gemma perform when it's actually building something?&lt;/p&gt;

&lt;p&gt;We ran fresh benchmarks against the llama.cpp server with coding prompts at different output lengths:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Prompt Tokens&lt;/th&gt;
&lt;th&gt;Output Tokens&lt;/th&gt;
&lt;th&gt;TTFT&lt;/th&gt;
&lt;th&gt;Tok/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Short (debounce function)&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;27ms&lt;/td&gt;
&lt;td&gt;179.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium (React component)&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;2,048&lt;/td&gt;
&lt;td&gt;28ms&lt;/td&gt;
&lt;td&gt;177.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long (full Node.js script)&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;2,679&lt;/td&gt;
&lt;td&gt;29ms&lt;/td&gt;
&lt;td&gt;181.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three things stand out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time to first token is near-instant.&lt;/strong&gt; 27–29ms TTFT means the streaming UI starts filling in almost immediately. For comparison, cloud models typically hit 500ms–2s TTFT depending on load and routing. On a local GPU, there's no network round-trip, no queue, no cold start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generation speed doesn't degrade.&lt;/strong&gt; Whether Gemma is writing 512 tokens or 2,679 tokens, throughput stays locked at 177–181 tok/s. There's no slowdown as context grows — at least not at these output lengths. During the actual search build session, with thousands of tokens of accumulated context from tool calls and file contents, we observed ~159 tok/s. That's a ~12% drop from peak, which is expected: more context means more attention computation per token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The reasoning budget has a real cost.&lt;/strong&gt; With &lt;code&gt;--reasoning-format deepseek&lt;/code&gt;, Gemma's thinking tokens are visible in the API response. On a short 256-token request, the model spent all 256 tokens reasoning and produced zero visible output. That's the invisible thinking token problem in action — and exactly why &lt;code&gt;--reasoning-budget 4096&lt;/code&gt; matters. Cap the thinking, and the remaining budget goes to code.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Ollama (Showdown R2)&lt;/th&gt;
&lt;th&gt;llama.cpp (this session)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tok/s (benchmark)&lt;/td&gt;
&lt;td&gt;167.1&lt;/td&gt;
&lt;td&gt;177–181&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tok/s (real workload)&lt;/td&gt;
&lt;td&gt;N/A (failed)&lt;/td&gt;
&lt;td&gt;~159&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTFT&lt;/td&gt;
&lt;td&gt;3.92s&lt;/td&gt;
&lt;td&gt;~28ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning budget control&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--reasoning-budget 4096&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The TTFT difference is dramatic — 3.92s vs 28ms. Ollama's 3.92s likely included model loading or prompt cache misses. llama-server keeps the model hot in VRAM with a persistent prompt cache, so subsequent requests start generating almost instantly.&lt;/p&gt;

&lt;p&gt;Bottom line: Gemma 4 on an RTX 5090 via llama.cpp generates code at ~180 tok/s peak, ~159 tok/s under real agentic load, with sub-30ms TTFT. That's fast enough that the model is never the bottleneck — tool execution (git operations, file I/O, npm installs) takes longer than inference.&lt;/p&gt;
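
&lt;p&gt;To sanity-check TTFT on your own hardware, a rough streaming probe is enough. A sketch with plain &lt;code&gt;fetch&lt;/code&gt; against llama-server's OpenAI-compatible endpoint (not the harness behind the numbers above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Rough TTFT probe: time until the first streamed chunk arrives.
// Works in Node 18+ or any runtime with WHATWG fetch and streams.
const start = performance.now();
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma-4-26B-A4B-it-UD-Q4_K_M.gguf",
    stream: true,
    max_tokens: 512,
    messages: [{ role: "user", content: "Write a debounce function." }],
  }),
});
const reader = res.body!.getReader();
await reader.read(); // first SSE chunk
console.log(`TTFT: ${Math.round(performance.now() - start)}ms`);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
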

&lt;h2&gt;
  
  
  What Gemma Built
&lt;/h2&gt;

&lt;p&gt;Two prompts. One feature. Pushed to main.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt; package-lock.json                | 466 +++++++++++++++++++
 package.json                     |   3 +-
 public/search-index.json         |  34 +++
 scripts/generate-search-index.ts |  40 ++++
 src/components/Header.tsx        |  32 +++
 src/components/SearchModal.tsx   | 216 ++++++++++++++++++
 6 files changed, 618 insertions(+), 173 deletions(-)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The architecture: a &lt;strong&gt;client-side Fuse.js search&lt;/strong&gt; with a pre-generated JSON index. A build-time script reads all published posts and generates &lt;code&gt;public/search-index.json&lt;/code&gt;. The &lt;code&gt;SearchModal&lt;/code&gt; component loads this index on first open, runs fuzzy searches with Fuse.js, and renders results in a Cmd+K overlay.&lt;/p&gt;
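
&lt;p&gt;A sketch of a generator in that shape (the &lt;code&gt;getAllPosts&lt;/code&gt; helper and the field names are assumptions, not the repo's script):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Build-time index generation in the shape described above. getAllPosts and
// the field names are assumptions; the repo's actual script isn't shown.
import fs from "fs";
import { getAllPosts } from "../src/lib/posts"; // hypothetical helper

const index = getAllPosts()
  .filter((post) =&amp;gt; post.published)
  .map((post) =&amp;gt; ({
    slug: post.slug,
    title: post.title,
    description: post.description,
    tags: post.tags,
    content: post.content,
  }));

fs.writeFileSync("public/search-index.json", JSON.stringify(index));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
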

&lt;p&gt;Gemma even hit an authentication error during &lt;code&gt;git push&lt;/code&gt; — and &lt;strong&gt;self-corrected&lt;/strong&gt;. It ran &lt;code&gt;coder external-auth access-token github&lt;/code&gt;, reconfigured the git remote with the token, and pushed successfully. That's agentic behavior — the thing that was completely absent in the original test.&lt;/p&gt;

&lt;p&gt;The commit message: &lt;code&gt;afb5c73 feat: add search functionality with Fuse.js&lt;/code&gt;. Vercel auto-deployed from main. The feature went live.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/slaying-the-gemma-beast/gemma4-search-plan-coder-agents.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/slaying-the-gemma-beast/gemma4-search-plan-coder-agents.png" alt="Gemma 4 in Coder Agents — laying out its search implementation plan before writing code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/slaying-the-gemma-beast/vibescoder-homepage-live.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/slaying-the-gemma-beast/vibescoder-homepage-live.png" alt="vibescoder.dev homepage with the search feature now live"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/slaying-the-gemma-beast/search-modal-no-results.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/slaying-the-gemma-beast/search-modal-no-results.png" alt="The search modal in action — Gemma built this"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code Review: What Gemma Got Right and Wrong
&lt;/h2&gt;

&lt;p&gt;Working code that ships is a milestone. But "it works" and "it's production-quality" are different standards. Claude reviewed every line of Gemma's implementation. Here's the honest assessment.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Gemma Got Right
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Architecture was sound.&lt;/strong&gt; Client-side search with a pre-generated JSON index is the correct call for a 14-post blog. No server-side API needed, no database, sub-5ms search times. The index is ~130 KB — smaller than a hero image.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Component structure was clean.&lt;/strong&gt; Separate &lt;code&gt;SearchModal&lt;/code&gt; component, separate build script, clean Header integration. Three lines to wire it into the existing layout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It used the existing design system.&lt;/strong&gt; CSS variables like &lt;code&gt;bg-surface&lt;/code&gt;, &lt;code&gt;border-primary&lt;/code&gt;, &lt;code&gt;text-on-surface&lt;/code&gt; — all from the Neon Brutalist theme. It read the codebase and matched the patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-correcting on errors.&lt;/strong&gt; When &lt;code&gt;git push&lt;/code&gt; failed, Gemma diagnosed the auth issue and fixed it autonomously. Three tool calls: fetch token → reconfigure remote → push. No human intervention needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Gemma Got Wrong
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Zero accessibility.&lt;/strong&gt; No &lt;code&gt;role="dialog"&lt;/code&gt;, no &lt;code&gt;role="combobox"&lt;/code&gt;, no &lt;code&gt;aria-modal&lt;/code&gt;, no &lt;code&gt;aria-activedescendant&lt;/code&gt;, no focus trap. A screen reader would have no idea this modal existed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Broken exit animations.&lt;/strong&gt; The &lt;code&gt;AnimatePresence&lt;/code&gt; wrapper contained a regular &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; instead of a &lt;code&gt;motion.div&lt;/code&gt;. When the modal closed, React unmounted the wrapper immediately, killing the exit animations before they played. The code looked right but didn't work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance anti-pattern.&lt;/strong&gt; A new &lt;code&gt;Fuse&lt;/code&gt; instance was constructed on every keystroke. Fuse builds an internal index on construction — that's wasted work. Should be &lt;code&gt;useMemo&lt;/code&gt; keyed on the index data.&lt;/p&gt;
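
&lt;p&gt;The memoized version the review called for looks roughly like this (hook and type names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Memoize the Fuse instance so its internal index is built once per index
// payload instead of on every keystroke. Names are illustrative.
import Fuse from "fuse.js";
import { useMemo } from "react";

type SearchEntry = {
  slug: string;
  title: string;
  description: string;
  tags: string[];
  content: string;
};

function useSearchResults(index: SearchEntry[] | null, query: string) {
  const fuse = useMemo(
    () =&amp;gt; (index ? new Fuse(index, { keys: ["title", "tags", "description", "content"] }) : null),
    [index],
  );
  if (!fuse || query.length &amp;lt; 2) return []; // 2-char minimum, per the polish pass
  return fuse.search(query);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
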

&lt;p&gt;&lt;strong&gt;Eager loading.&lt;/strong&gt; The search index was fetched on every page load, even if the user never opened search. Should lazy-load on first modal open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong fonts.&lt;/strong&gt; Applied &lt;code&gt;--font-headline&lt;/code&gt; (Space Grotesk) to the entire modal including body text and descriptions. The codebase uses headline for titles only, with the default font for body text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignored existing components.&lt;/strong&gt; Rendered tags as raw &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; elements with custom styling instead of reusing the existing &lt;code&gt;TagBadge&lt;/code&gt; component that already had the right design tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stale search index committed to git.&lt;/strong&gt; The generated &lt;code&gt;search-index.json&lt;/code&gt; was committed with 3 placeholder posts. It's a build artifact — should be in &lt;code&gt;.gitignore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content truncated too aggressively.&lt;/strong&gt; Each post's content was cut to 1,000 characters. Terms that only appeared deeper in posts (like "RustDesk" in our infrastructure writeups) were invisible to search.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Polish Pass
&lt;/h2&gt;

&lt;p&gt;Claude's fix addressed every issue in a single PR:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accessibility&lt;/strong&gt;: Full ARIA combobox pattern — &lt;code&gt;role="dialog"&lt;/code&gt;, &lt;code&gt;role="combobox"&lt;/code&gt; on the input with &lt;code&gt;aria-expanded&lt;/code&gt;/&lt;code&gt;aria-activedescendant&lt;/code&gt;, &lt;code&gt;role="listbox"&lt;/code&gt; and &lt;code&gt;role="option"&lt;/code&gt; on results, &lt;code&gt;aria-live="polite"&lt;/code&gt; for result count announcements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keyboard navigation&lt;/strong&gt;: Arrow Up/Down to move through results, Enter to navigate, Escape to close. Active result scrolls into view automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Fuse instance memoized with &lt;code&gt;useMemo&lt;/code&gt; (rebuilds only when index changes). Index fetched lazily on first modal open. Minimum 2 characters before searching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search quality&lt;/strong&gt;: Weighted field scoring — title matches score 3× higher than content matches, tags 2×, descriptions 1.5×. Markdown stripped from indexed content. Full post content indexed with no truncation.&lt;/p&gt;
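
&lt;p&gt;In Fuse.js terms, that weighting is per-key weights. The ratios below mirror the post; the other options are assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// The weighted scoring described above, expressed as Fuse.js key weights.
// Fuse normalizes weights internally, so only the ratios matter.
const options = {
  includeScore: true,
  minMatchCharLength: 2,
  keys: [
    { name: "title", weight: 3 },
    { name: "tags", weight: 2 },
    { name: "description", weight: 1.5 },
    { name: "content", weight: 1 },
  ],
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
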

&lt;p&gt;&lt;strong&gt;Design system&lt;/strong&gt;: Correct font usage matching PostCard patterns. TagBadge component reused. Platform-aware keyboard hint (⌘K on Mac, Ctrl+K elsewhere).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Animation fix&lt;/strong&gt;: Outer wrapper is now a &lt;code&gt;motion.div&lt;/code&gt; — exit animations actually play.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleanup&lt;/strong&gt;: Body scroll lock, query cleared on close, build artifact gitignored, dead imports removed.&lt;/p&gt;

&lt;p&gt;The polish commit: 383 insertions, 201 deletions across 5 files. The combined feature is 804 lines across 6 files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Opus vs. Gemma+Opus: An Honest Comparison
&lt;/h2&gt;

&lt;p&gt;We now have two complete implementations of the same feature. Opus 4.6's original branch (&lt;code&gt;feature/search-opus46&lt;/code&gt;, 698 lines) is still in the repo. Here's how they compare.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Opus 4.6 (original)&lt;/th&gt;
&lt;th&gt;Gemma 4 + Opus (shipped)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Search engine&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server-side API route with weighted scoring&lt;/td&gt;
&lt;td&gt;Client-side Fuse.js with weighted config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Index&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None — reads posts at request time&lt;/td&gt;
&lt;td&gt;Pre-generated JSON, fetched once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Surfaces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cmd+K dialog + &lt;code&gt;/search&lt;/code&gt; page&lt;/td&gt;
&lt;td&gt;Cmd+K modal only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;URL state&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (&lt;code&gt;/search?q=cloudflare&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Opus's architecture is more feature-complete. A dedicated &lt;code&gt;/search&lt;/code&gt; page with URL state means search results are linkable and shareable. The server-side API route means the search logic runs where the content lives, with no index to generate or cache.&lt;/p&gt;

&lt;p&gt;Gemma's architecture is simpler and arguably better for this scale. A static JSON index means zero server load, instant results, and the feature works on Vercel's free tier without hitting function invocation limits. At 14 posts and 130 KB, client-side search is the right call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Quality
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Opus 4.6&lt;/th&gt;
&lt;th&gt;Gemma 4 (raw)&lt;/th&gt;
&lt;th&gt;Gemma 4 + Opus (merged)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accessibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full ARIA, keyboard nav&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Full ARIA, keyboard nav&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Animation correctness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Correct&lt;/td&gt;
&lt;td&gt;Broken exits&lt;/td&gt;
&lt;td&gt;Fixed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AbortController for API calls&lt;/td&gt;
&lt;td&gt;Fuse recreated per keystroke&lt;/td&gt;
&lt;td&gt;Memoized, lazy-loaded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Design system&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mostly correct&lt;/td&gt;
&lt;td&gt;Mostly correct&lt;/td&gt;
&lt;td&gt;Fully correct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Known bugs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3 (duplicate logic, type cast, missing Suspense)&lt;/td&gt;
&lt;td&gt;7 (see review above)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Opus's raw output was higher quality. Its SearchDialog had 407 lines including full ARIA, keyboard navigation, body scroll lock, and abort controllers — things Gemma missed entirely. But Opus also had its own bugs: duplicated search logic between the API route and the &lt;code&gt;/search&lt;/code&gt; page, an unsafe type cast, and a missing Suspense boundary. We scored it 87.5/100 in the original review.&lt;/p&gt;

&lt;p&gt;The merged Gemma+Opus implementation is the cleanest of the three. It takes Gemma's simpler architecture, applies Opus's quality standards for accessibility and interaction design, and fixes the issues both models left behind.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Comparison
&lt;/h3&gt;

&lt;p&gt;The honest truth: if I had to ship search today with one model and no review, I'd pick Opus. It produced higher-quality code in a single turn with zero intervention. The 87.5/100 score reflects real, shippable work with minor fixable issues.&lt;/p&gt;

&lt;p&gt;But that's not the interesting takeaway. The interesting takeaway is that &lt;strong&gt;the configuration changes mattered more than the model differences.&lt;/strong&gt; The original Gemma test didn't fail because Gemma is a bad model. It failed because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;num_predict&lt;/code&gt; was too low (invisible thinking tokens consumed the budget)&lt;/li&gt;
&lt;li&gt;Ollama doesn't expose &lt;code&gt;--reasoning-budget&lt;/code&gt; (no way to cap thinking)&lt;/li&gt;
&lt;li&gt;Default parallel slots exhausted VRAM&lt;/li&gt;
&lt;li&gt;GitHub auth wasn't configured in the workspace&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Fix those four things — all infrastructure, not model weights — and Gemma went from "0 commits in 8 prompts" to "shipped a feature in 2 prompts." The model was the same. The environment was different.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Local Models
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Local models can ship production features.&lt;/strong&gt; Not hypothetically. This search feature is live, built entirely by Gemma 4 running on consumer hardware. The code needed polish — but so does most code from any developer, human or AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration is the bottleneck, not capability.&lt;/strong&gt; The difference between "Gemma can't finish anything" and "Gemma ships a feature" was four infrastructure changes. Most teams evaluating local models are testing against default settings that actively sabotage the model's output. Invisible thinking tokens, insufficient context windows, VRAM contention — these are environment bugs, not model bugs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The best workflow might be local + cloud.&lt;/strong&gt; Gemma built the feature (free, fast, private). Claude reviewed and polished it (thorough, quality-focused). Each model did what it's best at. The total cost was one Opus API call for the review pass, not dozens for the entire build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;llama.cpp is the right tool for serious local inference.&lt;/strong&gt; Ollama is great for getting started. For production use — where you need reasoning budgets, precise context control, and OpenAI-compatible APIs that tools like Coder can consume — llama-server gives you the knobs you actually need.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Settings That Made It Work
&lt;/h2&gt;

&lt;p&gt;For anyone running Gemma 4 locally, here's the configuration that turned it from a planning machine into a shipping machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Flag notes live up here because bash can't carry inline comments
# across line-continuation backslashes:
#   --ctx-size 32768              32K context (~19 GB VRAM on the 5090)
#   -n 32768                      32K max output tokens
#   --reasoning-budget 4096       cap invisible thinking at 4K tokens
#   --reasoning-format deepseek   expose thinking in the API response
#   --parallel 1                  single slot (4 × 32K slots caused OOM)
#   -ngl 999                      all layers on GPU
llama-server \
  -m gemma-4-26B-A4B-it-UD-Q4_K_M.gguf \
  --ctx-size 32768 \
  -n 32768 \
  --reasoning-budget 4096 \
  --reasoning-format deepseek \
  --parallel 1 \
  -ngl 999
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--reasoning-budget 4096&lt;/code&gt; is the single most important flag. Without it, Gemma can spend its entire output budget on reasoning you never see. With it, the model gets 4K tokens to think, then the rest is for actual code. That one flag is the difference between a model that plans forever and a model that ships.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Right now, Gemma 4 serves a single Coder instance on the workstation where it runs. That's fine for one person, but the RTX 5090 is sitting idle most of the day. The obvious next step: &lt;strong&gt;make it available to every machine on the local network.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My wife runs &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; on a Mac Mini in the other room. With Tailscale already meshing our devices together, pointing her OpenClaw instance at &lt;code&gt;http://workstation:8080/v1/&lt;/code&gt; is trivial — llama-server's OpenAI-compatible API means any tool that speaks the OpenAI protocol can use it. One GPU, multiple clients, zero cloud costs.&lt;/p&gt;
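
&lt;p&gt;For any client on the tailnet, the integration is just a base URL. A sketch using the stock &lt;code&gt;openai&lt;/code&gt; npm package (the hostname is the Tailscale name from above; the model name is a placeholder, since llama-server answers with whatever model it loaded):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: any OpenAI-protocol client can point at the homelab box directly.
// "workstation" is the Tailscale hostname; llama-server doesn't check the
// API key, but the SDK insists on one being set.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://workstation:8080/v1",
  apiKey: "unused-locally",
});

const reply = await client.chat.completions.create({
  model: "local", // placeholder: llama-server serves whatever it loaded
  messages: [{ role: "user", content: "Draft a commit message for this diff." }],
});

console.log(reply.choices[0].message.content);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
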

&lt;p&gt;Beyond that: migrating the remaining Ollama models to llama.cpp (for the same reasoning budget control we needed here), experimenting with longer context windows now that we know the VRAM budget, and — inevitably — the next model showdown when Gemma 4's bigger variants drop.&lt;/p&gt;

&lt;p&gt;The homelab keeps growing. Who knows? Maybe the lobster starts vibe coding for me, too.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; attempts before Gemma completed the task (auth fix, OOM fix, success)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2&lt;/strong&gt; prompts in the successful run (vs 8 failed prompts in the original test)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;618&lt;/strong&gt; lines written by Gemma 4 across 6 files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;383&lt;/strong&gt; lines changed in the Opus polish pass (insertions + deletions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;804&lt;/strong&gt; total lines in the merged feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; cloud API calls for the build phase (Gemma ran 100% local)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;177–181&lt;/strong&gt; tokens per second — Gemma's peak generation speed on the RTX 5090&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~159&lt;/strong&gt; tokens per second — effective speed under real agentic load (accumulated context)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;28ms&lt;/strong&gt; time to first token — near-instant streaming start&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16.9 GB&lt;/strong&gt; model size (Gemma 4 26B-A4B, Q4_K_M quantization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~19 GB&lt;/strong&gt; total VRAM at 32K context (comfortable fit on 32 GB card)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4,096&lt;/strong&gt; reasoning budget tokens — the setting that made it all work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$0&lt;/strong&gt; inference cost for the feature build&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; nudge needed ("keep going, don't stop, code and commit")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;7&lt;/strong&gt; bugs found in Gemma's code during review (all fixed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; bugs in Opus's original implementation (never merged, never fixed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; bugs in the merged Gemma+Opus version&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; production feature, live on vibescoder.dev right now — press ⌘K to try it&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>benchmark</category>
      <category>homelab</category>
    </item>
    <item>
      <title>Day 3: Building the Editing Layer</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:30:32 +0000</pubDate>
      <link>https://dev.to/carryologist/day-3-building-the-editing-layer-53jn</link>
      <guid>https://dev.to/carryologist/day-3-building-the-editing-layer-53jn</guid>
      <description>&lt;h2&gt;
  
  
  The Problem with Day 2
&lt;/h2&gt;

&lt;p&gt;Day 2 ended with a beautiful site. Neon Brutalist palette, light/dark toggle, design tokens wired up properly. It looked great. But I couldn't actually use it.&lt;/p&gt;

&lt;p&gt;The admin login page accepted the correct password and then… nothing happened. No error, no redirect, no feedback. Just the same login form staring back at me. The edit button on posts linked to a recording page that ignored the fact that you were editing — it always started fresh. The admin dashboard had one big "Record" button and no way to find an existing post.&lt;/p&gt;

&lt;p&gt;Day 3 was about turning a pretty blog into a functional one. Less visual, more plumbing. The kind of session where everything you build is invisible to readers but makes the difference between "I'll update that later" and actually updating it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Silent Login Bug
&lt;/h2&gt;

&lt;p&gt;This one was subtle. The login form called &lt;code&gt;fetch&lt;/code&gt; to set a session cookie, then used Next.js &lt;code&gt;router.push("/admin")&lt;/code&gt; to navigate. In the App Router, &lt;code&gt;router.push&lt;/code&gt; is a soft navigation — it fetches a React Server Component payload over the wire and patches the DOM. No full page reload.&lt;/p&gt;

&lt;p&gt;The problem: after &lt;code&gt;fetch&lt;/code&gt; sets a cookie, a soft navigation can reuse a stale client-side cache, or the middleware's redirect gets silently swallowed. The browser has the cookie, but the RSC request either doesn't send it or the middleware response gets eaten by the router. The user lands right back on &lt;code&gt;/admin/login&lt;/code&gt; with zero feedback.&lt;/p&gt;

&lt;p&gt;The fix is almost embarrassingly simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before: soft navigation, cookie may not propagate&lt;/span&gt;
&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/admin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// After: hard navigation, browser sends fresh cookies&lt;/span&gt;
&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/admin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same fix for logout. &lt;code&gt;window.location.href&lt;/code&gt; forces the browser to make a full request with the current cookie jar. It's the standard pattern for auth state transitions in Next.js App Router — any time cookies change, don't trust soft nav.&lt;/p&gt;

&lt;p&gt;This is the kind of bug that doesn't show up in development. You're already authenticated, the cache is warm, everything works. It only bites you in production when a real user (me, from my phone) hits the cold path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Admin Access
&lt;/h2&gt;

&lt;p&gt;Two small UX fixes while I was in the auth flow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Footer link.&lt;/strong&gt; Added a subtle "Admin" link in the site footer — dimmed by default, lights up on hover with the primary accent. No security implications; the middleware blocks everything without a valid JWT. But now I don't have to type &lt;code&gt;/admin&lt;/code&gt; from memory on a phone keyboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visitor-friendly login page.&lt;/strong&gt; If a non-admin stumbles onto &lt;code&gt;/admin/login&lt;/code&gt;, they now see "This area is for the site owner" with a "Back to the blog →" link instead of a bare password field. Small thing, but it's the difference between a page that looks broken and one that looks intentional.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Edit Pipeline
&lt;/h2&gt;

&lt;p&gt;This was the biggest fix of the session. The blog had a voice recording feature from Day 1 — talk into the browser, Claude generates an MDX post, one-click publish to GitHub. But it was create-only. The "Edit" button on each post linked to &lt;code&gt;/admin/record?edit=slug&lt;/code&gt;, and the record page completely ignored the query parameter. Every recording session created a new post.&lt;/p&gt;

&lt;p&gt;The fix touched five files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Record page&lt;/strong&gt; reads &lt;code&gt;?edit=slug&lt;/code&gt; via &lt;code&gt;useSearchParams&lt;/code&gt; (wrapped in a Suspense boundary — App Router requires this for client-side search params)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On mount in edit mode&lt;/strong&gt;, fetches the existing post content from the API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation API&lt;/strong&gt; receives the existing content alongside the new transcript, so Claude merges rather than rewrites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publish call&lt;/strong&gt; uses &lt;code&gt;PUT&lt;/code&gt; (update) instead of &lt;code&gt;POST&lt;/code&gt; (create) when in edit mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostPreview&lt;/strong&gt; locks the slug field and shows "Update on GitHub" instead of "Publish to GitHub"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;useSearchParams&lt;/code&gt; + Suspense requirement is a Next.js App Router detail worth calling out. Without the Suspense boundary, the page throws during static rendering because search params aren't available server-side. It's documented, but it's the kind of thing that bites you when you're porting a pattern from Pages Router.&lt;/p&gt;
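
&lt;p&gt;The shape of the fix, sketched with illustrative component names (only &lt;code&gt;useSearchParams&lt;/code&gt; and the &lt;code&gt;Suspense&lt;/code&gt; wrapper are the actual requirement):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;"use client";

// Sketch of the required pattern: useSearchParams lives in a client
// component that's wrapped in a Suspense boundary. Names are illustrative.
import { Suspense } from "react";
import { useSearchParams } from "next/navigation";

function RecordForm() {
  const editSlug = useSearchParams().get("edit"); // null means create mode
  return &lt;p&gt;{editSlug ? `Editing: ${editSlug}` : "Recording a new post"}&lt;/p&gt;;
}

export default function RecordPage() {
  return (
    &lt;Suspense fallback={&lt;p&gt;Loading…&lt;/p&gt;}&gt;
      &lt;RecordForm /&gt;
    &lt;/Suspense&gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
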

&lt;p&gt;Now the voice-to-blog pipeline works in both directions: create from scratch, or record additional context and merge it into an existing post. Same UI, same flow, different HTTP verb.&lt;/p&gt;

&lt;h2&gt;
  
  
  Admin Dashboard Overhaul
&lt;/h2&gt;

&lt;p&gt;The original admin dashboard had three cards: Record, Manage, Settings. Record was the only one that did anything useful. For Day 3, it became two distinct entry points:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Record New Post&lt;/strong&gt; — same as before, links to &lt;code&gt;/admin/record&lt;/code&gt; with no query params.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit Existing Post&lt;/strong&gt; — a searchable dropdown picker. Type to filter posts by title (case-insensitive substring match), click to navigate to &lt;code&gt;/admin/record?edit=slug&lt;/code&gt;. The dropdown is scrollable (&lt;code&gt;max-h-64&lt;/code&gt;) and closes on click-outside via a &lt;code&gt;pointerdown&lt;/code&gt; listener. Designed to handle hundreds of posts without pagination — client-side filtering is fine at blog scale.&lt;/p&gt;
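
&lt;p&gt;The click-outside piece, sketched as a small hook (names are illustrative, not the actual component):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: close the dropdown when a pointerdown lands outside it. Used
// from a client component; pointerdown fires before click, so the menu
// closes even when the outside press starts a drag.
import { useEffect, useRef } from "react";

function usePointerDownOutside(onOutside) {
  const ref = useRef(null);
  useEffect(() =&gt; {
    const handle = (event) =&gt; {
      if (ref.current &amp;&amp; !ref.current.contains(event.target)) onOutside();
    };
    document.addEventListener("pointerdown", handle);
    return () =&gt; document.removeEventListener("pointerdown", handle);
  }, [onOutside]);
  return ref; // attach to the dropdown's container element
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
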

&lt;h2&gt;
  
  
  Inline Text Editing
&lt;/h2&gt;

&lt;p&gt;The voice recording flow is great for substantial rewrites, but overkill for fixing a typo. So I added a third option: inline text editing directly on the blog post page.&lt;/p&gt;

&lt;p&gt;The admin bar on each post (visible only when logged in) expanded from two actions to three:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Type Edits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opens a textarea with the raw MDX, right on the post page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Record Edits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Links to the voice recording flow for substantial changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Delete Post&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Removes the post from GitHub (moved to right side)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The inline editor loads the raw MDX source, lets you edit in place, and saves via &lt;code&gt;PUT /api/posts&lt;/code&gt;. No page navigation, no recording session, no Claude generation. Just fix the typo and hit Save.&lt;/p&gt;
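
&lt;p&gt;The save call is about as small as it sounds. A sketch (the field names are assumptions, not the blog's actual API shape):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch of the inline save: one PUT with the slug and the edited MDX.
async function savePost(slug, mdx) {
  const res = await fetch("/api/posts", {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ slug, content: mdx }),
  });
  if (!res.ok) throw new Error(`Save failed: ${res.status}`);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
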

&lt;p&gt;This is the feature that changed how I use the site. Before, fixing a date typo meant opening the voice recorder, describing the change, waiting for Claude to regenerate, and publishing. Now it's: click Type Edits, fix the character, Save. Five seconds instead of two minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Public Changelog
&lt;/h2&gt;

&lt;p&gt;Every edit should be transparent. I added a &lt;code&gt;changelog&lt;/code&gt; field to the post frontmatter — an array of &lt;code&gt;{date, summary}&lt;/code&gt; entries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;changelog&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2026-04-16'&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fixed Koto -&amp;gt; Coder typo&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A collapsible &lt;code&gt;&amp;lt;Changelog&amp;gt;&lt;/code&gt; component renders between the post header and content. Collapsed by default — "▸ 1 update" or "▸ N updates" — and expands on click to show the full history. Subtle monospace styling that doesn't compete with the post content.&lt;/p&gt;

&lt;p&gt;The changelog entries are generated automatically based on the edit type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inline edits&lt;/strong&gt; get a summary derived from a line-level diff: "Minor text edits," "Edited N lines," or "Revised post content (N lines changed)" depending on scope&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice recording edits&lt;/strong&gt; default to "Updated via voice recording"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual summaries&lt;/strong&gt; are still supported if you want to be specific&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No AI needed for the diff summaries — just line counting. No commit SHAs, file paths, or system information exposed. Only reader-facing descriptions.&lt;/p&gt;
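
&lt;p&gt;A sketch of that heuristic (the thresholds are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: derive a reader-facing summary from a naive line-level diff.
function changelogSummary(before, after) {
  const a = before.split("\n");
  const b = after.split("\n");
  let changed = 0;
  for (let i = 0; i &lt; Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) changed++;
  }
  if (changed &lt;= 2) return "Minor text edits";
  if (changed &lt;= 10) return `Edited ${changed} lines`;
  return `Revised post content (${changed} lines changed)`;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
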

&lt;p&gt;One gotcha: the initial changelog text used &lt;code&gt;text-outline-variant/60&lt;/code&gt;, which was nearly invisible in light mode. Bumped to &lt;code&gt;text-on-surface-variant&lt;/code&gt; for proper readability in both themes. Another reminder to test both modes after every UI change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Blog Post Consolidation
&lt;/h2&gt;

&lt;p&gt;A meta moment: during Day 3, I used the voice recording feature to publish a post about the Google Stitch workflow from Day 2. Then I realized it belonged in the Day 2 post, not as a standalone entry.&lt;/p&gt;

&lt;p&gt;So I merged it — the brand guidelines trick and the mobile pipeline section got folded into the Day 2 post as new subsections, and the standalone third post got deleted. This is the kind of editorial decision that's easy when your content is just files in git. No CMS to wrestle with, no database records to reconcile. Delete a file, edit another file, push.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Soft navigation is not your friend during auth transitions.&lt;/strong&gt; Next.js App Router's &lt;code&gt;router.push&lt;/code&gt; is a client-side RSC fetch. If you've just set or cleared a cookie, the soft navigation may not reflect that. Use &lt;code&gt;window.location.href&lt;/code&gt; any time authentication state changes. This isn't a bug — it's a fundamental aspect of how RSC caching works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build the editing tools early.&lt;/strong&gt; I should have built inline editing on Day 1. Every session since has involved going back to fix small things in previous posts — typos, date errors, wording tweaks. Without quick inline editing, each fix was a multi-step process through the voice pipeline. The moment I had a textarea and a Save button, my publishing velocity doubled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transparency scales trust.&lt;/strong&gt; The public changelog is a small feature, but it signals something: this content is alive, corrections are acknowledged, and readers can see what changed. For a blog about building in public, that's table stakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Each session fixes friction from the last one.&lt;/strong&gt; Day 1 built the foundation. Day 2 made it look right. Day 3 made it usable. The pattern is consistent: build something, use it for real, discover what's broken or slow, fix it next session. The site isn't being designed upfront — it's being discovered through use.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1 silent auth bug&lt;/strong&gt; fixed (soft nav → hard nav)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 files&lt;/strong&gt; changed to make the edit pipeline work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 admin actions&lt;/strong&gt; per post (type edits, record edits, delete)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 new component&lt;/strong&gt; (InlineEditor)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 new frontmatter field&lt;/strong&gt; (changelog)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2 blog posts&lt;/strong&gt; retroactively given changelog entries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 post&lt;/strong&gt; merged into another, 1 deleted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0 design changes&lt;/strong&gt; — all plumbing, all invisible to readers&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>meta</category>
      <category>ai</category>
      <category>coder</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>Open-Sourcing a Blog Without Open-Sourcing Your Drafts</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:28:59 +0000</pubDate>
      <link>https://dev.to/carryologist/open-sourcing-a-blog-without-open-sourcing-your-drafts-3pnl</link>
      <guid>https://dev.to/carryologist/open-sourcing-a-blog-without-open-sourcing-your-drafts-3pnl</guid>
      <description>&lt;p&gt;I open-sourced my personal blog repo so I could use &lt;a href="https://giscus.app" rel="noopener noreferrer"&gt;Giscus&lt;/a&gt; for blog comments — it needs a public repo with GitHub Discussions enabled. But open-sourcing the repo meant &lt;em&gt;everything&lt;/em&gt; was public: unpublished drafts, raw session notes, half-baked ideas, and my TODO list. For a thought leadership blog, that's a problem. People could just read GitHub instead of the site.&lt;/p&gt;

&lt;p&gt;Before we even got to that realization, though, we found something worse.&lt;/p&gt;

&lt;p&gt;All of the work in this session was done conversationally through &lt;a href="https://coder.com/agents" rel="noopener noreferrer"&gt;Coder Agents&lt;/a&gt; on a self-hosted home lab setup. (You'll hear a lot more about that setup soon — I'll be writing about the full home lab build next.)&lt;/p&gt;

&lt;h2&gt;
  
  
  The Security Audit
&lt;/h2&gt;

&lt;p&gt;First thing we did was scan the repo for anything sensitive now that it was public. Found three issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The .gitignore Was Gone
&lt;/h3&gt;

&lt;p&gt;In a previous session, an agent had tried to set up a drafts workflow. The idea was to use &lt;code&gt;.gitignore&lt;/code&gt; to keep drafts out of the repo. When that broke persistence (gitignored files don't survive workspace destruction), I asked the agent to fix it. Instead of removing the one &lt;code&gt;blog-drafts/&lt;/code&gt; line, it replaced the entire &lt;code&gt;.gitignore&lt;/code&gt; with a single comment — deleting all 50 standard Next.js ignore patterns.&lt;/p&gt;

&lt;p&gt;This meant &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.env.local&lt;/code&gt;, &lt;code&gt;node_modules/&lt;/code&gt;, &lt;code&gt;.next/&lt;/code&gt;, &lt;code&gt;.vercel/&lt;/code&gt;, &lt;code&gt;*.pem&lt;/code&gt; — none of it was being ignored. If anyone (or any agent) had run &lt;code&gt;git add .&lt;/code&gt;, every secret in &lt;code&gt;.env&lt;/code&gt; would have been committed to a public repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The root cause&lt;/strong&gt;: The agent conflated two separate concerns — git tracking (persistence) and site publishing (visibility). &lt;code&gt;.gitignore&lt;/code&gt; controls what git tracks, not what the site renders. The blog already had &lt;code&gt;published: false&lt;/code&gt; frontmatter support in &lt;code&gt;posts.ts&lt;/code&gt; that filters unpublished posts from the public site. The agent didn't look at existing code before reaching for a filesystem-level solution.&lt;/p&gt;
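
&lt;p&gt;That existing gate looks roughly like this (a sketch; the real &lt;code&gt;posts.ts&lt;/code&gt; will differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: visibility is a rendering concern handled at load time, while
// git keeps tracking every draft for persistence and backup.
export function getPublicPosts(allPosts) {
  return allPosts.filter((post) =&gt; post.frontmatter.published !== false);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
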

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: When an AI agent suggests a fix, check whether the codebase already solves the problem. Also, always diff what an agent changed — don't assume a targeted edit was actually targeted.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Live Server URL Exposed
&lt;/h3&gt;

&lt;p&gt;The blog drafts contained my actual Coder server tunnel URL — a live endpoint to my self-hosted instance sitting in the blog fodder notes from a previous session. Anyone could have tried to hit it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;: Replaced with a placeholder and rotated the URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Infrastructure Reconnaissance
&lt;/h3&gt;

&lt;p&gt;The drafts also contained hardware specs, home lab architecture details, systemd config paths, and multi-user setup info. Not a vulnerability per se, but useful reconnaissance for someone targeting the setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Acceptable for a "building in public" blog, but worth being aware of.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem: Code vs. Content Visibility
&lt;/h2&gt;

&lt;p&gt;After fixing the immediate issues, we hit the bigger question: what happens once this is a well-trafficked site? Anyone could ignore the site entirely and browse GitHub for unpublished drafts, upcoming topics, and editorial strategy.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;published: false&lt;/code&gt; only gates the rendered site. GitHub shows everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Not Just Make the Repo Private?
&lt;/h3&gt;

&lt;p&gt;Giscus. The whole reason we open-sourced was for blog comments. Giscus requires a public repo with GitHub Discussions enabled. Making the repo private kills comments.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Key Insight
&lt;/h3&gt;

&lt;p&gt;Giscus doesn't care what's &lt;em&gt;in&lt;/em&gt; the repo — it just needs a public repo to host GitHub Discussions. The discussions are completely independent of the repo's file contents. So we could separate the code from the content without touching Giscus at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Two Repos
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;the-vibe-coder&lt;/code&gt;&lt;/strong&gt; (public) — The blog engine. All source code, configs, components, API routes. Giscus stays pointed here. Open source, as intended.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;the-vibe-coder-content&lt;/code&gt;&lt;/strong&gt; (private) — All content: published posts, unpublished drafts, raw session notes, settings, images, TODO list. Nobody sees this but me.&lt;/p&gt;

&lt;h3&gt;
  
  
  How They Connect
&lt;/h3&gt;

&lt;p&gt;The critical design decision: the private repo uses the &lt;strong&gt;exact same directory structure&lt;/strong&gt; as the original. This meant zero code changes to the GitHub API client, the post loader, or any admin panel routes. The only change was pointing the &lt;code&gt;GITHUB_REPO&lt;/code&gt; environment variable at the private repo.&lt;/p&gt;

&lt;p&gt;A prebuild script clones the private repo at build time and overlays the content into the working tree. On Vercel, this runs automatically before &lt;code&gt;next build&lt;/code&gt;. Locally, you clone the content repo once and copy or symlink.&lt;/p&gt;
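
&lt;p&gt;The real &lt;code&gt;fetch-content.sh&lt;/code&gt; is bash; here's the same logic sketched in JavaScript, with a placeholder owner and content paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch of the prebuild overlay (the real fetch-content.sh is bash).
// GITHUB_TOKEN needs read access to the private content repo; OWNER and
// the content directories are placeholders.
import { execSync } from "node:child_process";
import { cpSync, rmSync } from "node:fs";

const repo =
  `https://x-access-token:${process.env.GITHUB_TOKEN}` +
  "@github.com/OWNER/the-vibe-coder-content.git";

rmSync("/tmp/content", { recursive: true, force: true });
execSync(`git clone --depth 1 ${repo} /tmp/content`, { stdio: "inherit" });

// Identical directory structure, so the overlay is a straight copy.
cpSync("/tmp/content/posts", "posts", { recursive: true });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
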

&lt;h3&gt;
  
  
  The Deploy Hook
&lt;/h3&gt;

&lt;p&gt;Since Vercel watches the public code repo, it wouldn't know to rebuild when content changes in the private repo. A GitHub Action on the private repo hits a Vercel Deploy Hook on every push to main:&lt;/p&gt;

&lt;p&gt;Content commit → GitHub Action → Vercel rebuild → site updated.&lt;/p&gt;
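
&lt;p&gt;The action's only real step is a single request. A sketch (the hook URL lives in the private repo's secrets):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: the entire GitHub Action boils down to POSTing the Deploy Hook.
// VERCEL_DEPLOY_HOOK_URL is a repo secret exposed as an env var.
const res = await fetch(process.env.VERCEL_DEPLOY_HOOK_URL, { method: "POST" });
console.log("Deploy hook responded:", res.status);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
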

&lt;h2&gt;
  
  
  The Wiring
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Created the private content repo&lt;/li&gt;
&lt;li&gt;Pushed all content files (same directory structure)&lt;/li&gt;
&lt;li&gt;Added a &lt;code&gt;fetch-content.sh&lt;/code&gt; prebuild script to the public repo&lt;/li&gt;
&lt;li&gt;Updated &lt;code&gt;.gitignore&lt;/code&gt; to exclude content directories&lt;/li&gt;
&lt;li&gt;Removed content files from the public repo&lt;/li&gt;
&lt;li&gt;Updated &lt;code&gt;GITHUB_REPO&lt;/code&gt; on Vercel to point to private repo&lt;/li&gt;
&lt;li&gt;Created a Vercel Deploy Hook + GitHub Action trigger&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Token Gotcha
&lt;/h3&gt;

&lt;p&gt;First deploy failed with exit code 128 (git auth failure). The &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; was a fine-grained PAT scoped to only the original repo. Had to update it in GitHub to also include the new private repo. Fine-grained PATs don't automatically pick up new repos — if your build pipeline uses one and you add a new private repo, you'll get a 403 until you update the token's repo list.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Agents and .gitignore
&lt;/h3&gt;

&lt;p&gt;Agents reach for &lt;code&gt;.gitignore&lt;/code&gt; as a blunt instrument. When the problem is "don't show this on the site," the answer is almost never "don't track it in git." Those are different concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Git tracking&lt;/strong&gt; = persistence, collaboration, backup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Site publishing&lt;/strong&gt; = what visitors see&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conflating them leads to either lost work (gitignored files vanish) or the opposite — a gutted &lt;code&gt;.gitignore&lt;/code&gt; that exposes secrets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Always Audit Before Open-Sourcing
&lt;/h3&gt;

&lt;p&gt;We caught three issues in a five-minute scan. The &lt;code&gt;.gitignore&lt;/code&gt; one was a genuine time bomb. Open-sourcing without a security pass is shipping without testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Giscus Is Decoupled From Content
&lt;/h3&gt;

&lt;p&gt;This was the unlock. You can have a public repo with zero content files and Giscus works perfectly. "I need Giscus" and "I need private content" aren't in conflict.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Same-Structure Trick
&lt;/h3&gt;

&lt;p&gt;By keeping the private repo's directory layout identical to the original, we avoided code changes entirely. The admin panel, the build process, and the content API all work unchanged — they just talk to a different repo via the same env var. This is the kind of thing that makes a migration smooth instead of a refactor.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1 &lt;code&gt;.gitignore&lt;/code&gt; restored from 1 line to 52 lines&lt;/li&gt;
&lt;li&gt;1 server URL redacted and rotated&lt;/li&gt;
&lt;li&gt;2 repos (1 public, 1 private)&lt;/li&gt;
&lt;li&gt;0 code changes to the blog engine&lt;/li&gt;
&lt;li&gt;1 prebuild script (24 lines of bash)&lt;/li&gt;
&lt;li&gt;1 GitHub Action (8 lines of YAML)&lt;/li&gt;
&lt;li&gt;1 Vercel Deploy Hook&lt;/li&gt;
&lt;li&gt;1 fine-grained PAT updated&lt;/li&gt;
&lt;li&gt;5 published posts confirmed rendering&lt;/li&gt;
&lt;li&gt;1 unpublished draft confirmed hidden&lt;/li&gt;
&lt;li&gt;~45 minutes from "is there anything sensitive?" to verified deploy&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>security</category>
    </item>
    <item>
      <title>Downtime Is a Feature: Custom Domains, Cloudflare, and MCP While Models Download</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:28:55 +0000</pubDate>
      <link>https://dev.to/carryologist/downtime-is-a-feature-custom-domains-cloudflare-and-mcp-while-models-download-33ba</link>
      <guid>https://dev.to/carryologist/downtime-is-a-feature-custom-domains-cloudflare-and-mcp-while-models-download-33ba</guid>
      <description>&lt;p&gt;You know how it goes in AI development — sometimes you're stuck watching progress bars crawl forward. I've been preparing for the next installment of the Local Model Showdown series, and that means downloading some hefty models. Kimi K2.6 decided to take its sweet time. We're talking &lt;em&gt;hours&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But here's the thing: downtime is really just opportunity in disguise. Instead of watching percentages tick up, I knocked out three items from the backlog that had been bugging me for a week. All done conversationally through &lt;a href="https://coder.com/agents" rel="noopener noreferrer"&gt;Coder Agents&lt;/a&gt;, naturally.&lt;/p&gt;

&lt;p&gt;The hit list:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Put my self-hosted Coder instance behind a real domain&lt;/li&gt;
&lt;li&gt;Harden the Cloudflare setup (and discover a gotcha that every content creator needs to know)&lt;/li&gt;
&lt;li&gt;Wire up MCP servers to give my agents superpowers&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Goal: coder.vibescoder.dev
&lt;/h2&gt;

&lt;p&gt;My Coder instance was running on my homelab Ubuntu workstation, accessible through a &lt;code&gt;try.coder.app&lt;/code&gt; tunnel URL — functional but ugly, hard to remember, and not exactly on-brand. I bought &lt;code&gt;vibescoder.dev&lt;/code&gt; for the blog. Time to use &lt;code&gt;coder.vibescoder.dev&lt;/code&gt; for the dev environment.&lt;/p&gt;

&lt;p&gt;Sounds simple. It wasn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt 1: CNAME Records (The Naive Approach)
&lt;/h2&gt;

&lt;p&gt;First move was straightforward — add CNAME records in Vercel's DNS management (since I bought the domain through Vercel) pointing the &lt;code&gt;coder&lt;/code&gt; subdomain to the existing tunnel URL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha #1&lt;/strong&gt;: Vercel's DNS form defaults to record type "A" (which expects an IPv4 address). Spent a minute confused by the "value should match format ipv4" error before realizing I needed to switch the Type dropdown to "CNAME." Small thing, but it'll trip you up if you're not looking at the form defaults.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/downtime-is-a-feature/vercel-dns-ipv4-error.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/downtime-is-a-feature/vercel-dns-ipv4-error.png" alt="Vercel DNS form showing " value=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Vercel's DNS form defaults to an A record — switch the type to CNAME or you'll get this unhelpful error.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then I went to update the Access URL in Coder's dashboard. Deployment → General → Access URL. It's right there on the screen... and it's read-only. The UI shows you the value but can't change it. The badges underneath tell the story: &lt;code&gt;CLI --access-url&lt;/code&gt;, &lt;code&gt;ENV CODER_ACCESS_URL&lt;/code&gt;, &lt;code&gt;YAML accessURL&lt;/code&gt;. Server config only.&lt;/p&gt;

&lt;p&gt;Since Coder runs via systemd on my homelab, the config lives at &lt;code&gt;/etc/coder.d/coder.env&lt;/code&gt;. But before I could update it, I needed to solve the bigger problem: getting traffic from the internet to my machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt 2: Port Forwarding (The Frustrating Detour)
&lt;/h2&gt;

&lt;p&gt;For a CNAME-based approach to work, my homelab needs to be reachable from the internet on ports 80 and 443. That means port forwarding on the router — a TP-Link Archer BE800.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The router app doesn't expose port forwarding.&lt;/strong&gt; The Tether iPhone app has a "More" menu with various settings, but NAT Forwarding isn't there. Had to use the web interface at &lt;code&gt;192.168.0.1&lt;/code&gt; instead. Found it under Advanced → NAT Forwarding → Port Forwarding.&lt;/p&gt;

&lt;p&gt;Set up the rules: ports 80 and 443, TCP, forwarded to my machine's internal IP (&lt;code&gt;192.168.0.243&lt;/code&gt;). Then tested with a simple Python HTTP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"from http.server import HTTPServer, SimpleHTTPRequestHandler; HTTPServer(('0.0.0.0', 443), SimpleHTTPRequestHandler).handle_request()"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Local curl worked. External requests to my public IP? "Connection refused." Not a timeout — &lt;em&gt;refused&lt;/em&gt;. Tried non-standard ports too. Same result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The diagnosis&lt;/strong&gt;: "Connection refused" from outside while "works locally" means traffic is reaching the public IP but getting actively rejected before it hits the machine. The ISP is likely blocking inbound connections or there's a NAT layer beyond the router. A timeout would mean packets are being dropped. Refused means something is saying "no."&lt;/p&gt;

&lt;p&gt;I spent more time on this than I'd like to admit. Time for Plan B.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Cloudflare Tunnel
&lt;/h2&gt;

&lt;p&gt;Cloudflare Tunnel flips the model entirely. Instead of opening inbound ports, it creates an &lt;em&gt;outbound&lt;/em&gt; connection from your machine to Cloudflare's edge. No port forwarding. No router config. No public IP exposure. And it's free.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Move Nameservers
&lt;/h3&gt;

&lt;p&gt;Since I needed Cloudflare to manage DNS for &lt;code&gt;vibescoder.dev&lt;/code&gt;, I:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Created a free Cloudflare account&lt;/li&gt;
&lt;li&gt;Added &lt;code&gt;vibescoder.dev&lt;/code&gt; as a site&lt;/li&gt;
&lt;li&gt;Let it auto-import my existing DNS records (Vercel A records, CAA records, everything)&lt;/li&gt;
&lt;li&gt;Selected the Free plan&lt;/li&gt;
&lt;li&gt;Updated nameservers at Vercel (the registrar) to point to Cloudflare's nameservers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The blog continues to work — Cloudflare imported all existing records, so &lt;code&gt;vibescoder.dev&lt;/code&gt; still routes to Vercel's servers.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/downtime-is-a-feature/cloudflare-dns-setup.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/downtime-is-a-feature/cloudflare-dns-setup.png" alt="Cloudflare DNS setup page showing "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cloudflare auto-imports your existing DNS records — the blog keeps working while you set up the tunnel.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create the Tunnel
&lt;/h3&gt;

&lt;p&gt;On the homelab machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-L&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
&lt;span class="nb"&gt;sudo &lt;/span&gt;dpkg &lt;span class="nt"&gt;-i&lt;/span&gt; cloudflared.deb
cloudflared tunnel login
cloudflared tunnel create coder-tunnel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="/images/downtime-is-a-feature/cloudflare-tunnel-success.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/downtime-is-a-feature/cloudflare-tunnel-success.png" alt="Cloudflared certificate installation success page"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;After &lt;code&gt;cloudflared tunnel login&lt;/code&gt;, the browser confirms the certificate is installed and you're authorized.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Config file at &lt;code&gt;~/.cloudflared/config.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tunnel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;a1b2c3d4-e5f6-7890-abcd-ef1234567890&lt;/span&gt;
&lt;span class="na"&gt;credentials-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/home/youruser/.cloudflared/a1b2c3d4-e5f6-7890-abcd-ef1234567890.json&lt;/span&gt;

&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coder.vibescoder.dev&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:3000&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.coder.vibescoder.dev"&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:3000&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http_status:404&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wildcard entry handles Coder's app proxying — port forwarding, web terminals, and workspace apps all use subdomains like &lt;code&gt;8080--main--ws--user--coder.vibescoder.dev&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Wire Up DNS and Coder
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared tunnel route dns coder-tunnel coder.vibescoder.dev
cloudflared tunnel route dns coder-tunnel &lt;span class="s2"&gt;"*.coder.vibescoder.dev"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Updated &lt;code&gt;/etc/coder.d/coder.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;CODER_ACCESS_URL&lt;/span&gt;=&lt;span class="n"&gt;https&lt;/span&gt;://&lt;span class="n"&gt;coder&lt;/span&gt;.&lt;span class="n"&gt;vibescoder&lt;/span&gt;.&lt;span class="n"&gt;dev&lt;/span&gt;
&lt;span class="n"&gt;CODER_WILDCARD_ACCESS_URL&lt;/span&gt;=*.&lt;span class="n"&gt;coder&lt;/span&gt;.&lt;span class="n"&gt;vibescoder&lt;/span&gt;.&lt;span class="n"&gt;dev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restarted Coder, ran the tunnel — four connections registered to Cloudflare's SJC edge locations. Opened &lt;code&gt;https://coder.vibescoder.dev&lt;/code&gt; in a browser. Coder login screen. Done.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Make It Permanent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Gotcha #2&lt;/strong&gt;: &lt;code&gt;sudo cloudflared service install&lt;/code&gt; couldn't find the config. It looks in &lt;code&gt;/etc/cloudflared/&lt;/code&gt;, not &lt;code&gt;~/.cloudflared/&lt;/code&gt;. Had to copy both files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /etc/cloudflared
&lt;span class="nb"&gt;sudo cp&lt;/span&gt; ~/.cloudflared/config.yml /etc/cloudflared/config.yml
&lt;span class="nb"&gt;sudo cp&lt;/span&gt; ~/.cloudflared/&lt;span class="k"&gt;*&lt;/span&gt;.json /etc/cloudflared/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Updated the &lt;code&gt;credentials-file&lt;/code&gt; path in the copied config to point to &lt;code&gt;/etc/cloudflared/&lt;/code&gt;, then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;cloudflared service &lt;span class="nb"&gt;install
sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;cloudflared
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start cloudflared
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two systemd services now: &lt;code&gt;coder&lt;/code&gt; and &lt;code&gt;cloudflared&lt;/code&gt;. Both start on boot. The architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser → https://coder.vibescoder.dev
       → Cloudflare Edge (TLS termination, DNS)
       → Cloudflare Tunnel (outbound from homelab)
       → localhost:3000 (Coder server)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero inbound ports. Zero public IP exposure. Free TLS from Cloudflare. Survives reboots.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardening Cloudflare (And the AI Crawler Gotcha)
&lt;/h2&gt;

&lt;p&gt;With Cloudflare in front of everything, I reviewed the security settings. Here's what's worth enabling on the free tier:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSL/TLS → Full (strict)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ensures encryption all the way to the origin, not just browser-to-Cloudflare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bot Fight Mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Challenges malicious bots — scrapers, credential stuffers, spam&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DDoS Protection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Already active by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Always Online&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serves cached pages if your origin goes down&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;WAF Managed Ruleset&lt;/strong&gt; (SQL injection, XSS protection) requires a Pro plan. Skipped for now.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/downtime-is-a-feature/cloudflare-ssl-full.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/downtime-is-a-feature/cloudflare-ssl-full.png" alt="Cloudflare SSL/TLS encryption mode set to Full"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Set SSL/TLS to Full (strict) to encrypt traffic all the way from Cloudflare to your origin server.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Part Every Content Creator Needs to Read
&lt;/h3&gt;

&lt;p&gt;Cloudflare's free tier includes two AI-related settings that are &lt;strong&gt;on by default&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Block AI bots&lt;/strong&gt; — Blocks bots Cloudflare categorizes as AI training crawlers (GPTBot, CCBot, Google-Extended, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Labyrinth&lt;/strong&gt; (Beta) — Injects fake AI-generated content into your pages to poison bots that ignore crawling standards&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both sound great if you want to protect your content from being scraped. But think about what these actually do: they block the crawlers that feed ChatGPT search, Perplexity, Google AI Overviews, and every other AI-powered discovery tool.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/downtime-is-a-feature/cloudflare-block-ai-bots.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/downtime-is-a-feature/cloudflare-block-ai-bots.png" alt="Cloudflare Block AI bots toggle in the security settings"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This innocent-looking toggle blocks the AI crawlers that power ChatGPT search, Perplexity, and Google AI Overviews.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If your site exists for thought leadership, you &lt;em&gt;want&lt;/em&gt; AI services to find, index, and cite your content.&lt;/strong&gt; That's the entire point. Blocking AI crawlers is blocking your distribution channel.&lt;/p&gt;

&lt;p&gt;The distinction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Block AI bots / AI Labyrinth&lt;/strong&gt; = blocks crawlers that feed AI search and training. Kills discoverability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bot Fight Mode&lt;/strong&gt; = blocks malicious bots. Doesn't affect legitimate AI crawlers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I turned both &lt;strong&gt;Block AI bots&lt;/strong&gt; and &lt;strong&gt;AI Labyrinth off&lt;/strong&gt;, while keeping Bot Fight Mode on. If you're running a personal brand, a company blog, or anything where you care about AI-powered search visibility — check these settings immediately after onboarding to Cloudflare. The defaults optimize for content protection, not content distribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP: Giving Agents Superpowers
&lt;/h2&gt;

&lt;p&gt;With the infrastructure sorted, I moved to the fun part: MCP (Model Context Protocol) integration. MCP lets AI agents access external tools — think of it as a plugin system for LLMs.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Gateway
&lt;/h3&gt;

&lt;p&gt;Two lines in &lt;code&gt;coder.env&lt;/code&gt; unlock the big stuff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;CODER_EXPERIMENTS&lt;/span&gt;=&lt;span class="n"&gt;oauth2&lt;/span&gt;,&lt;span class="n"&gt;mcp&lt;/span&gt;-&lt;span class="n"&gt;server&lt;/span&gt;-&lt;span class="n"&gt;http&lt;/span&gt;
&lt;span class="n"&gt;CODER_EXTERNAL_AUTH_0_MCP_URL&lt;/span&gt;=&lt;span class="n"&gt;https&lt;/span&gt;://&lt;span class="n"&gt;api&lt;/span&gt;.&lt;span class="n"&gt;githubcopilot&lt;/span&gt;.&lt;span class="n"&gt;com&lt;/span&gt;/&lt;span class="n"&gt;mcp&lt;/span&gt;/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first enables Coder's experimental MCP support. The second wires GitHub's MCP server into Coder's AI Gateway. Since I already had GitHub OAuth configured, this means the gateway automatically injects GitHub tools (prefixed with &lt;code&gt;bmcp_&lt;/code&gt;) into every agent's LLM requests. Every agent in every workspace gets GitHub repo access, PR management, issue tracking — zero per-workspace config.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choosing MCP Servers
&lt;/h3&gt;

&lt;p&gt;Researched the ecosystem and selected five servers based on this specific stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;MCP Server&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;GitHub&lt;/strong&gt; (official, 29K stars)&lt;/td&gt;
&lt;td&gt;Blog content is a private GitHub repo. Handled via AI Gateway.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Context7&lt;/strong&gt; (Upstash, 53K stars)&lt;/td&gt;
&lt;td&gt;Feeds current library docs to LLMs instead of hallucinated APIs. Critical for Next.js 16.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Vercel&lt;/strong&gt; (official)&lt;/td&gt;
&lt;td&gt;Check deployments, read build logs, manage env vars.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cloudflare&lt;/strong&gt; (official, 3.6K stars)&lt;/td&gt;
&lt;td&gt;DNS analytics, tunnel debugging, observability.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Playwright&lt;/strong&gt; (Microsoft, 31K stars)&lt;/td&gt;
&lt;td&gt;Visual testing of blog deployments.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What I deliberately skipped: &lt;strong&gt;Ollama-specific MCP servers&lt;/strong&gt;. This is a common misconception worth calling out. You don't need an "Ollama MCP server." Ollama is the LLM backend — agents call it for inference. MCP servers provide &lt;em&gt;tools&lt;/em&gt; (GitHub access, deployment management, browser automation). The agent uses Ollama to &lt;em&gt;think&lt;/em&gt; about what to do, and MCP tools to &lt;em&gt;do&lt;/em&gt; it. They're separate concerns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wiring It Into the Template
&lt;/h3&gt;

&lt;p&gt;MCP servers in Coder aren't configured in the admin panel — they're discovered via a &lt;code&gt;.mcp.json&lt;/code&gt; file in the workspace root. &lt;strong&gt;Gotcha #3&lt;/strong&gt;: I spent time looking for an MCP settings page in the Coder dashboard before discovering this.&lt;/p&gt;

&lt;p&gt;To make it persistent across all workspaces, I edited the Docker template's &lt;code&gt;main.tf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;coder templates pull docker &lt;span class="nb"&gt;.&lt;/span&gt;
nano main.tf  &lt;span class="c"&gt;# add .mcp.json to startup_script&lt;/span&gt;
coder templates push docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="/images/downtime-is-a-feature/nano-main-tf-mcp.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/downtime-is-a-feature/nano-main-tf-mcp.png" alt="Nano editor showing main.tf with MCP server configuration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Editing the Docker template's main.tf to inject .mcp.json into every workspace at startup.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The startup script now writes this &lt;code&gt;.mcp.json&lt;/code&gt; to every workspace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"context7"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@upstash/context7-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vercel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp-remote"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://mcp.vercel.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cloudflare"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp-remote"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://agents.cloudflare.com/mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"playwright"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@playwright/mcp@latest"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context7 and Playwright run locally as stdio processes. Vercel and Cloudflare connect to remote HTTP endpoints and handle OAuth on first use.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│              HOMELAB (Ubuntu + RTX 5090)             │
│                                                     │
│  Ollama (local LLMs) ◄── Coder AI Gateway           │
│                           │                         │
│  Coder Server ────────────┤  Injected tools:        │
│  (systemd, port 3000)     │  • bmcp_github_*        │
│                           │                         │
│  Cloudflared ─────────────┤                         │
│  (systemd, tunnel)        │                         │
│                           │                         │
│  Workspace (.mcp.json):   │                         │
│  • context7 (stdio)       │                         │
│  • playwright (stdio)     │                         │
│  • vercel (remote HTTP)   │                         │
│  • cloudflare (remote HTTP)                         │
└───────────────┬─────────────────────────────────────┘
                │ Cloudflare Tunnel
                ▼
┌─────────────────────────────────────────────────────┐
│              EXTERNAL SERVICES                      │
│                                                     │
│  Cloudflare Edge (DNS, TLS, DDoS, Bot Fight Mode)   │
│  GitHub (content repo, MCP via AI Gateway)           │
│  Vercel (blog hosting, MCP remote server)            │
│  Anthropic Claude (blog post generation)             │
│  Upstash Redis (analytics)                           │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;2 port forwarding rules attempted (failed — ISP blocking)&lt;/li&gt;
&lt;li&gt;1 Cloudflare Tunnel created (0 inbound ports required)&lt;/li&gt;
&lt;li&gt;4 environment variables changed in &lt;code&gt;coder.env&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;2 systemd services running (coder + cloudflared)&lt;/li&gt;
&lt;li&gt;5 Cloudflare security settings reviewed&lt;/li&gt;
&lt;li&gt;2 AI-blocking features disabled for thought leadership discoverability&lt;/li&gt;
&lt;li&gt;5 MCP servers configured (GitHub via AI Gateway + 4 in .mcp.json)&lt;/li&gt;
&lt;li&gt;1 workspace template updated&lt;/li&gt;
&lt;li&gt;3 gotchas discovered (Cloudflare AI defaults, &lt;code&gt;.mcp.json&lt;/code&gt; discovery, &lt;code&gt;cloudflared&lt;/code&gt; config paths)&lt;/li&gt;
&lt;li&gt;~2 hours of productive "downtime" while models downloaded&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>coder</category>
      <category>cloudflare</category>
      <category>homelab</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Model Showdown: Benchmarking Local vs Cloud LLMs on a Real Coding Task</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:28:28 +0000</pubDate>
      <link>https://dev.to/carryologist/model-showdown-benchmarking-local-vs-cloud-llms-on-a-real-coding-task-10d5</link>
      <guid>https://dev.to/carryologist/model-showdown-benchmarking-local-vs-cloud-llms-on-a-real-coding-task-10d5</guid>
      <description>&lt;p&gt;Last post we stood up Ollama on the RTX 5090, pulled a stack of models, and wired them into our coding workflow. The whole time there was an obvious question hanging over it: are local models actually good enough?&lt;/p&gt;

&lt;p&gt;Not good enough in the abstract benchmarks-on-a-leaderboard sense. Good enough for the thing we’re journaling: vibe coding. Specifically, can a model running on consumer hardware in my homelab produce code that's as correct, as fast, and as complete as what comes back from Anthropic's cloud?&lt;/p&gt;

&lt;p&gt;We built a benchmark to find out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Six models, one prompt, no second chances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud (Anthropic API):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sonnet 4.6 (&lt;code&gt;claude-sonnet-4-20250514&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Opus 4.6 (&lt;code&gt;claude-opus-4-20250514&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Local (Ollama on RTX 5090, 32 GB VRAM):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Codestral 22B (&lt;code&gt;codestral:22b&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;DeepSeek R1 14B (&lt;code&gt;deepseek-r1:14b&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Devstral (&lt;code&gt;devstral:latest&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Qwen3.5 35B MoE (&lt;code&gt;qwen3.5:35b-a3b&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prompt was intentionally straightforward: build a Python CLI todo app with SQLite persistence, CRUD commands (add, list, complete, delete), timestamps, pretty output, error handling, and a &lt;code&gt;__main__&lt;/code&gt; block. The kind of task that shows up in real work. A simple "write a small, complete program."&lt;/p&gt;

&lt;p&gt;Every model got the exact same prompt with the instruction: "Respond with ONLY the Python code, no explanation."&lt;/p&gt;

&lt;p&gt;We measured (see the harness sketch after this list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time to first token (TTFT)&lt;/strong&gt;: how long before output starts streaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total generation time&lt;/strong&gt;: wall clock from request to last token&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens&lt;/strong&gt;: how much the model wrote&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokens per second&lt;/strong&gt;: raw generation throughput&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation&lt;/strong&gt;: does it parse, does it have all the features, does it actually run through a functional test suite of 7 operations (add two todos, list, complete one, list again, delete one, list again)&lt;/li&gt;
&lt;/ul&gt;
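
&lt;p&gt;To make those numbers concrete, here's a minimal sketch of how the local side can be timed against Ollama's streaming &lt;code&gt;/api/generate&lt;/code&gt; endpoint. The field names (&lt;code&gt;eval_count&lt;/code&gt;, &lt;code&gt;eval_duration&lt;/code&gt;) come from the final chunk Ollama streams back; the &lt;code&gt;benchmark&lt;/code&gt; helper itself is illustrative, not our exact harness:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import time

import requests  # assumed to be available in the workspace


def benchmark(model: str, prompt: str) -&amp;gt; dict:
    """Stream one completion from Ollama and capture the timing metrics."""
    url = "http://localhost:11434/api/generate"
    start = time.perf_counter()
    ttft = None
    final = {}
    with requests.post(url, json={"model": model, "prompt": prompt, "stream": True},
                       stream=True, timeout=600) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if ttft is None and chunk.get("response"):
                ttft = time.perf_counter() - start  # time to first token
            if chunk.get("done"):
                final = chunk  # the last chunk carries the token accounting
    total = time.perf_counter() - start
    tokens = final.get("eval_count", 0)
    # eval_duration is reported in nanoseconds of pure generation time
    tps = tokens / (final["eval_duration"] / 1e9) if final.get("eval_duration") else 0.0
    return {"ttft_s": ttft, "total_s": total, "output_tokens": tokens, "tok_per_s": tps}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;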

&lt;h2&gt;
  
  
  The Results: Performance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;TTFT&lt;/th&gt;
&lt;th&gt;Total Time&lt;/th&gt;
&lt;th&gt;Output Tokens&lt;/th&gt;
&lt;th&gt;Tok/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;0.87s&lt;/td&gt;
&lt;td&gt;14.89s&lt;/td&gt;
&lt;td&gt;1,461&lt;/td&gt;
&lt;td&gt;104.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;1.23s&lt;/td&gt;
&lt;td&gt;19.06s&lt;/td&gt;
&lt;td&gt;1,324&lt;/td&gt;
&lt;td&gt;74.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codestral 22B&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;15.81s&lt;/td&gt;
&lt;td&gt;22.11s&lt;/td&gt;
&lt;td&gt;620&lt;/td&gt;
&lt;td&gt;98.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;11.74s&lt;/td&gt;
&lt;td&gt;20.64s&lt;/td&gt;
&lt;td&gt;1,707&lt;/td&gt;
&lt;td&gt;191.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devstral&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;2.24s&lt;/td&gt;
&lt;td&gt;10.26s&lt;/td&gt;
&lt;td&gt;723&lt;/td&gt;
&lt;td&gt;90.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5 35B&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;28.20s&lt;/td&gt;
&lt;td&gt;30.91s&lt;/td&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;1,510.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few things jump out immediately. Devstral finished faster than every other model, cloud or local. Qwen's tokens-per-second number is absurd. And DeepSeek R1 produced the most tokens despite writing roughly the same amount of code as everyone else (more on why in a minute).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results: Quality
&lt;/h2&gt;

&lt;p&gt;Performance doesn't matter if the code is wrong. Here's how each model scored:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Syntax Valid&lt;/th&gt;
&lt;th&gt;Features (X/10)&lt;/th&gt;
&lt;th&gt;Functional (X/7)&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;7/7&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;7/7&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devstral&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;7/7&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codestral 22B&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;0/7&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;0/7&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5 35B&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;0/7&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three perfect scores. Two models that wrote valid code that didn't pass functional tests. One that didn't even produce valid Python.&lt;/p&gt;

&lt;p&gt;Let's talk about what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Went Wrong (and Right)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Interactive Menu Problem
&lt;/h3&gt;

&lt;p&gt;Codestral 22B and DeepSeek R1 both scored 10/10 on features. Their code had SQLite, all four CRUD operations, timestamps, completion tracking, error handling, a main block, and pretty output. On paper, they nailed it.&lt;/p&gt;

&lt;p&gt;The problem: both interpreted "Commands: add, list, complete, delete" as an interactive menu application. They built &lt;code&gt;while True&lt;/code&gt; loops with &lt;code&gt;input()&lt;/code&gt; prompts instead of CLI argument parsers.&lt;/p&gt;

&lt;p&gt;Codestral's approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter a command (add, list, complete, delete): &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DeepSeek R1 went even further, building an entire menu system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Todo App Menu:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;               ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add - Add a new todo      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list - List all todos     &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter command: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both are perfectly valid interpretations of "commands." Both produced clean, working code. But our automated test suite calls the script with command-line arguments (&lt;code&gt;python todo.py add "Buy groceries"&lt;/code&gt;), not interactive input. The scripts immediately hit &lt;code&gt;EOFError: EOF when reading a line&lt;/code&gt; because there's no stdin to read from.&lt;/p&gt;
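
&lt;p&gt;Concretely, the harness drives each generated script the way a shell user would, with stdin closed so a menu-driven script fails fast instead of hanging. A simplified sketch (the &lt;code&gt;run_suite&lt;/code&gt; helper is illustrative, and scoring here is reduced to exit codes):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import subprocess
import sys
import tempfile

# The seven operations from the functional suite
OPS = [
    ["add", "Buy groceries"],
    ["add", "Write blog post"],
    ["list"],
    ["complete", "1"],
    ["list"],
    ["delete", "2"],
    ["list"],
]


def run_suite(script_path: str) -&amp;gt; int:
    script = os.path.abspath(script_path)
    passed = 0
    # Fresh working directory per run, so each model starts with an empty todos.db
    with tempfile.TemporaryDirectory() as tmp:
        for op in OPS:
            try:
                result = subprocess.run(
                    [sys.executable, script, *op],
                    stdin=subprocess.DEVNULL,  # closed stdin: input() raises EOFError immediately
                    capture_output=True,
                    text=True,
                    cwd=tmp,
                    timeout=30,
                )
            except subprocess.TimeoutExpired:
                continue  # a menu loop that swallows EOFError gets killed here instead
            if result.returncode == 0:
                passed += 1
    return passed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;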

&lt;p&gt;This is arguably a prompt clarity issue, not a model quality issue. If the prompt had said "using argparse" or "using sys.argv," both models would have nailed it. But the three models that scored 100 all inferred CLI arguments without being told, which is the more common pattern for "command-line app" in the training data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Token Limit Trap
&lt;/h3&gt;

&lt;p&gt;Qwen3.5 35B is fascinating and frustrating in equal measure.&lt;/p&gt;

&lt;p&gt;That 1,510 tokens-per-second number is real. The model uses a Mixture of Experts (MoE) architecture: 35 billion total parameters, but only ~3 billion active per token. The RTX 5090 tears through it. In pure generation speed, nothing else comes close.&lt;/p&gt;

&lt;p&gt;But it hit the 4,096 output token limit mid-f-string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Created At&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;id_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;completed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;
            &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[X]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;completed&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ ]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The code cuts off right there. No closing quote, no remaining functions, no main block. The syntax is invalid. The features for &lt;code&gt;complete&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, and &lt;code&gt;__main__&lt;/code&gt; are missing because the model never got to write them.&lt;/p&gt;

&lt;p&gt;The speed is meaningless if the output is incomplete. The lesson: always set generous &lt;code&gt;max_tokens&lt;/code&gt; for code generation tasks. A 4,096 limit that's fine for chat responses will absolutely truncate a complete program. We should have set 8,192 or higher. That's on us.&lt;/p&gt;
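
&lt;p&gt;In Ollama terms, that cap is the &lt;code&gt;num_predict&lt;/code&gt; option. A sketch of the fix, with the prompt shortened to a stand-in:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests

prompt = "Build a Python CLI todo app with SQLite persistence..."  # stand-in for the full prompt

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3.5:35b-a3b",
        "prompt": prompt,
        "stream": False,
        # num_predict is Ollama's output-token ceiling; the 4,096 cap we used
        # is what truncated Qwen mid-f-string
        "options": {"num_predict": 8192},
    },
    timeout=600,
)
print(resp.json()["response"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;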

&lt;h3&gt;
  
  
  DeepSeek R1's Thinking Tax
&lt;/h3&gt;

&lt;p&gt;DeepSeek R1 produced 1,707 output tokens, the most of any model, but its actual code was only 156 lines. Where did the extra tokens go?&lt;/p&gt;

&lt;p&gt;Into &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; blocks. DeepSeek R1 is a reasoning model. Before writing code, it spends tokens working through the problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Let me think about how to structure this... I need SQLite for persistence... I'll use a class-based approach with a menu system..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is genuinely useful for hard debugging problems or complex architectural decisions. But for straightforward code generation where the answer is obvious, it's wasted compute. You're paying (in time and tokens) for the model to reason through something it could just write directly.&lt;/p&gt;
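
&lt;p&gt;It also means a scoring harness has to strip the reasoning before validating the output, or every R1 response looks like a syntax error. A small sketch of that preprocessing, assuming the reasoning arrives wrapped in literal &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ast
import re


def strip_think(text: str) -&amp;gt; str:
    # Drop the &amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt; reasoning block; what's left should be code
    return re.sub(r"&amp;lt;think&amp;gt;.*?&amp;lt;/think&amp;gt;", "", text, flags=re.DOTALL).strip()


def is_valid_python(source: str) -&amp;gt; bool:
    try:
        ast.parse(strip_think(source))
        return True
    except SyntaxError:
        return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;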

&lt;h3&gt;
  
  
  Devstral's Quiet Dominance
&lt;/h3&gt;

&lt;p&gt;The standout result of the entire benchmark. Devstral is a 24B parameter model from Mistral, purpose-built for coding tasks. On paper it's smaller than some of the competition. In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fastest total time&lt;/strong&gt;: 10.26 seconds, beating even the cloud models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best local TTFT&lt;/strong&gt;: 2.24 seconds, nearly as fast as cloud cold-start&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perfect score&lt;/strong&gt;: 100/100 on quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean architecture&lt;/strong&gt;: argparse-based CLI, exactly what the test expected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It didn't overthink it. It didn't build a menu system. It didn't run out of tokens. It just wrote a clean, correct, well-structured todo app and moved on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Comparison: The Three Perfect Scores
&lt;/h2&gt;

&lt;p&gt;All three 100-score models (Sonnet, Opus, Devstral) used argument-based CLI patterns, but their implementations differ in interesting ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sonnet 4.6&lt;/strong&gt; went with &lt;code&gt;argparse&lt;/code&gt; and a class-based design. 149 lines. Full docstrings, type hints, and emoji-rich output with status indicators:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoApp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;todos.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db_path&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init_database&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# ...
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Added todo #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Opus 4.6&lt;/strong&gt; also used a class, but with manual &lt;code&gt;sys.argv&lt;/code&gt; parsing instead of argparse. 157 lines. It used &lt;code&gt;sqlite3.Row&lt;/code&gt; for named column access and a hand-rolled usage printer, and it's more defensive, closing the connection explicitly in a &lt;code&gt;finally&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoApp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;todos.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;row_factory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Row&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_table&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Devstral&lt;/strong&gt; took the most minimal approach. 99 lines. Flat functions instead of a class, &lt;code&gt;argparse&lt;/code&gt; with subparsers, &lt;code&gt;CURRENT_TIMESTAMP&lt;/code&gt; in SQL instead of Python-side datetime generation. No emoji, no decorations, just clean output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_todos&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[x]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ ]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (created at: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The style differences are telling. Sonnet writes like a senior developer who cares about DX. Opus writes like someone who's been burned by unclosed connections before. Devstral writes like someone who bills by the project and values conciseness. All three approaches are correct. Devstral's is the one I'd actually want in a codebase.&lt;/p&gt;

&lt;p&gt;For reference, the full implementations follow, starting with Sonnet 4.6:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tuple&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoApp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;todos.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db_path&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init_database&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Initialize the SQLite database and create the todos table if it doesn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t exist.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            CREATE TABLE IF NOT EXISTS todos (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                title TEXT NOT NULL,
                created_at TEXT NOT NULL,
                completed INTEGER NOT NULL DEFAULT 0
            )
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Add a new todo item.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Error: Todo title cannot be empty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO todos (title, created_at, completed) VALUES (?, ?, 0)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;todo_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lastrowid&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Added todo #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_todos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;List all todo items.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, title, created_at, completed FROM todos ORDER BY id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;todos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;todos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;📝 No todos found. Add some with: python todo.py add &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;Your task&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;📋 Your Todos:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;completed&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;todos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;completed&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;○&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;status_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DONE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;completed&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TODO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    📅 Created: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; todos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;complete_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Mark a todo as completed.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, title, completed FROM todos WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
        &lt;span class="n"&gt;todo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Error: Todo #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ℹ️  Todo #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is already completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATE todos SET completed = 1 WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Completed todo #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Delete a todo item.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, title FROM todos WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
        &lt;span class="n"&gt;todo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Error: Todo #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE FROM todos WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🗑️  Deleted todo #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TodoApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A simple command-line todo application&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subparsers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_subparsers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Available commands&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;add_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subparsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add a new todo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Todo title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;list_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subparsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List all todos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;complete_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subparsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mark a todo as completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;complete_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Todo ID to complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;delete_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subparsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Delete a todo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;delete_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Todo ID to delete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_todos&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_help&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;







&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tuple&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoApp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;todos.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;row_factory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Row&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_table&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            CREATE TABLE IF NOT EXISTS todos (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                title TEXT NOT NULL,
                created_at TIMESTAMP NOT NULL,
                completed BOOLEAN NOT NULL DEFAULT 0
            )
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Todo title cannot be empty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO todos (title, created_at) VALUES (?, ?)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Added todo: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Failed to add todo - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM todos ORDER BY created_at DESC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;todos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;todos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No todos found. Add one with &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;add &amp;lt;title&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;

            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TODO LIST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;todo&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;todos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;○&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;created&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H:%M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;   Created: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Failed to list todos - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;todo_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Invalid todo ID. Must be a number.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATE todos SET completed = 1 WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Todo with ID &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Completed todo &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Failed to complete todo - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;todo_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Invalid todo ID. Must be a number.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE FROM todos WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Todo with ID &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Deleted todo &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;todo_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Failed to delete todo - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;print_usage&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Usage:
    python todo.py add &amp;lt;title&amp;gt;      - Add a new todo
    python todo.py list             - List all todos
    python todo.py complete &amp;lt;id&amp;gt;    - Mark a todo as completed
    python todo.py delete &amp;lt;id&amp;gt;      - Delete a todo
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TodoApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print_usage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Please provide a todo title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Usage: python todo.py add &amp;lt;title&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:])&lt;/span&gt;
                &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Please provide a todo ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Usage: python todo.py complete &amp;lt;id&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Please provide a todo ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Usage: python todo.py delete &amp;lt;id&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Unknown command &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print_usage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;







&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="n"&gt;DATABASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;todos.db&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init_db&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;CREATE TABLE IF NOT EXISTS todos (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                title TEXT NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                completed BOOLEAN NOT NULL CHECK (completed IN (0, 1))
              )&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO todos (title, completed) VALUES (?, ?)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_todos&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, title, created_at, completed FROM todos ORDER BY completed, created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[x]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ ]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (created at: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;complete_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATE todos SET completed = 1 WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
    &lt;span class="n"&gt;rows_affected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowcount&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rows_affected&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Todo with ID &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE FROM todos WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
    &lt;span class="n"&gt;rows_affected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowcount&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rows_affected&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Todo with ID &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CLI todo application&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subparsers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_subparsers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;add_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subparsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Add a new todo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Todo title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;list_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subparsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;list&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;List all todos&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;complete_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subparsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mark a todo as completed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;complete_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TODO ID to mark as completed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;delete_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subparsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;delete&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Delete a todo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;delete_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TODO ID to delete&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;init_db&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;add_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;list&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;list_todos&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;complete_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;delete&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;delete_todo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_help&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;h2&gt;
  
  
  Speed Analysis
&lt;/h2&gt;

&lt;p&gt;The performance numbers tell two very different stories depending on what you care about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For interactive chat and streaming&lt;/strong&gt;, TTFT (time to first token) is what matters. Cloud models dominated here. Sonnet 4.6 started streaming in 0.87 seconds. Opus in 1.23 seconds. You ask a question, you immediately see output. That responsiveness is a big part of why cloud models feel fast even when their total generation time is longer.&lt;/p&gt;

&lt;p&gt;Local models have a fundamentally different cost model. TTFT includes model loading time, and on first request after a cold start, that loading time is significant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Devstral&lt;/strong&gt;: 2.24s TTFT (best local, model stays warm in VRAM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek R1&lt;/strong&gt;: 11.74s (14B params loading into VRAM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codestral 22B&lt;/strong&gt;: 15.81s (22B params, larger model footprint)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen 3.5B&lt;/strong&gt;: 28.20s (35B total params, 23 GB model loading from disk into VRAM despite only 3B active)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Qwen's 28-second TTFT is brutal for interactive use. You type a prompt and wait nearly half a minute before anything appears. The MoE architecture means the full model weight file is enormous even though inference is fast once it's loaded.&lt;/p&gt;
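
&lt;p&gt;These TTFT numbers are straightforward to reproduce. Here's a minimal sketch that measures TTFT and total wall-clock time against a local Ollama-style streaming endpoint; the URL, model name, and prompt are illustrative assumptions, not the exact benchmark harness used here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch: measure time-to-first-token (TTFT) and total wall-clock
# time for one prompt against a local Ollama-style streaming endpoint.
# The URL, model name, and prompt are assumptions for illustration.
import json
import time

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"


def measure(model, prompt):
    start = time.perf_counter()
    first_token_at = None
    with requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # The first chunk carrying text marks the TTFT moment; on a
            # cold start this includes loading the model into VRAM.
            if chunk.get("response") and first_token_at is None:
                first_token_at = time.perf_counter()
            if chunk.get("done"):
                break
    total = time.perf_counter() - start
    ttft = (first_token_at or start) - start
    return {"ttft_s": round(ttft, 2), "total_s": round(total, 2)}


print(measure("devstral", "Write a Python CLI todo app."))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running it twice back-to-back makes the cold-start effect obvious: the first call pays the load-from-disk cost, the second hits a model that's already warm in VRAM.&lt;/p&gt;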

&lt;p&gt;&lt;strong&gt;For batch processing and code generation&lt;/strong&gt;, total time and throughput matter more than TTFT. And here, the picture flips. Devstral at 10.26 seconds total beat both cloud models. Once the local models are loaded and generating, their token throughput is competitive:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Tok/s&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5B&lt;/td&gt;
&lt;td&gt;1,510.2&lt;/td&gt;
&lt;td&gt;MoE architecture, 3B active params&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1&lt;/td&gt;
&lt;td&gt;191.7&lt;/td&gt;
&lt;td&gt;Includes reasoning tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet 4.6&lt;/td&gt;
&lt;td&gt;104.2&lt;/td&gt;
&lt;td&gt;Cloud, shared infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codestral 22B&lt;/td&gt;
&lt;td&gt;98.5&lt;/td&gt;
&lt;td&gt;Full 22B model on single GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devstral&lt;/td&gt;
&lt;td&gt;90.2&lt;/td&gt;
&lt;td&gt;24B model, balanced speed/quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6&lt;/td&gt;
&lt;td&gt;74.3&lt;/td&gt;
&lt;td&gt;Cloud, larger model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
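
&lt;p&gt;The throughput column comes from the runtime's own counters rather than a stopwatch. As a sketch, assuming an Ollama-style runtime: the final streamed chunk reports &lt;code&gt;eval_count&lt;/code&gt; (tokens generated) and &lt;code&gt;eval_duration&lt;/code&gt; (generation time in nanoseconds), so tok/s falls out directly.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: deriving tok/s from Ollama's final streamed chunk. The field
# names are Ollama's; the numbers are made up to land near the Devstral row.
def tokens_per_second(final_chunk):
    eval_count = final_chunk["eval_count"]                # tokens generated
    eval_duration_s = final_chunk["eval_duration"] / 1e9  # ns to seconds
    return eval_count / eval_duration_s


print(tokens_per_second({"eval_count": 925, "eval_duration": 10_250_000_000}))
# about 90.2 tok/s, matching the Devstral row above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;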

&lt;p&gt;Devstral found the sweet spot: fast enough TTFT to feel responsive, fast enough generation to beat the cloud on wall-clock time, and high enough quality to score perfectly. It's the model that made me stop thinking of local inference as a compromise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For production coding tasks&lt;/strong&gt;: Sonnet 4.6 or Devstral. Sonnet if you're already in the Anthropic ecosystem and want sub-second TTFT. Devstral if you want the same quality with zero API costs, zero rate limits, and total data privacy. Both scored 100. Devstral was actually faster end-to-end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opus 4.6&lt;/strong&gt; is capable but slower and more expensive for no quality gain on this task. Its strengths show on harder problems: multi-file refactors, complex debugging, architectural decisions. For straightforward code generation, you're paying a premium for capability you don't need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codestral 22B and DeepSeek R1&lt;/strong&gt; aren't bad models. They wrote valid, working code. The "failure" was a prompt interpretation issue that a single clarifying word would have fixed. In a conversational coding session where you can follow up, both would have corrected course immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 3.5 MoE&lt;/strong&gt; is a speed demon trapped by token limits. At 1,510 tok/s it's the fastest generator by an order of magnitude, but that speed is wasted if you cap output too low. With proper &lt;code&gt;max_tokens&lt;/code&gt; settings and the right tasks (short functions, completions, refactors), it could be the best option for high-throughput local work. We'll retest with higher limits.&lt;/p&gt;

&lt;p&gt;The real takeaway isn't about which model "won." It's that &lt;strong&gt;the prompt matters as much as the model&lt;/strong&gt;. Two models scored 60 because of a single ambiguous word in the prompt. One model scored 28 because of a configuration parameter. The gap between cloud and local quality has effectively closed for focused coding tasks. The remaining differences are in speed characteristics, token economics, and how forgiving the model is when your prompt isn't perfectly specific.&lt;/p&gt;

&lt;p&gt;Local LLMs on consumer hardware aren't a compromise anymore. They're a legitimate option. Devstral proved it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next: Round 2
&lt;/h2&gt;

&lt;p&gt;This was a useful first benchmark, but it was also a simple one. A single-file todo app with a clear spec is the kind of task where every model should do well. The interesting question is what happens when you make it harder.&lt;/p&gt;

&lt;p&gt;Round 2 will use a more complex task: multi-file, with tests, with ambiguous requirements that force the model to make architectural decisions. We'll also adjust based on what we learned here. The prompt will be more explicit (no more "CLI or web" ambiguity that tripped up two models), and we'll give every model a larger context window and higher token limits so no one gets cut off mid-line.&lt;/p&gt;

&lt;p&gt;We're also adding two models requested by the Coder dev team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kimi K2&lt;/strong&gt; (Moonshot AI). A 1T-parameter MoE model with 32B active parameters and 256K context. It's getting strong benchmark scores and has native tool-calling support. The catch: even the most aggressively quantized version needs ~240 GB of memory, which is well beyond what the homelab can handle locally. We'll need to test this one via API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4&lt;/strong&gt; (Google). We need to research the available sizes and quantizations to see what fits on 32 GB of VRAM. If there's a version in the 14B-27B range, it could slot in alongside Devstral and Qwen as another local contender.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both additions will be interesting tests of whether the "local models are good enough" conclusion holds with a harder prompt, and whether the newer model generation has closed the gap further.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;6 models benchmarked&lt;/li&gt;
&lt;li&gt;1 prompt, identical across all models&lt;/li&gt;
&lt;li&gt;3 perfect scores (Sonnet 4.6, Opus 4.6, Devstral)&lt;/li&gt;
&lt;li&gt;2 models that built the wrong kind of app&lt;/li&gt;
&lt;li&gt;1 model that ran out of tokens mid-f-string&lt;/li&gt;
&lt;li&gt;10.26 seconds for Devstral to write a complete, working todo app&lt;/li&gt;
&lt;li&gt;1,510 tokens per second from Qwen 3.5 MoE (fastest local generation)&lt;/li&gt;
&lt;li&gt;0.87 seconds for Sonnet 4.6's first token (fastest TTFT)&lt;/li&gt;
&lt;li&gt;28.2 seconds for Qwen's first token (slowest TTFT)&lt;/li&gt;
&lt;li&gt;4,096 token limit that killed an otherwise promising run&lt;/li&gt;
&lt;li&gt;32 GB of VRAM making all of this possible on a single GPU&lt;/li&gt;
&lt;li&gt;0 API costs for the three local models&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>benchmark</category>
      <category>homelab</category>
    </item>
    <item>
      <title>Model Showdown Round 2: Adding Gemma, Kimi, and 579 GB of Stubborn Optimism</title>
      <dc:creator>Rob</dc:creator>
      <pubDate>Thu, 07 May 2026 23:28:23 +0000</pubDate>
      <link>https://dev.to/carryologist/model-showdown-round-2-adding-gemma-kimi-and-579-gb-of-stubborn-optimism-40jp</link>
      <guid>https://dev.to/carryologist/model-showdown-round-2-adding-gemma-kimi-and-579-gb-of-stubborn-optimism-40jp</guid>
      <description>&lt;p&gt;At the end of Round 1, we promised a rematch. More models. Fixed settings. Harder questions about what "local inference" really means when you push past what fits in VRAM.&lt;/p&gt;

&lt;p&gt;This is that rematch.&lt;/p&gt;

&lt;p&gt;We added two models that the Coder dev team specifically requested: &lt;strong&gt;Gemma 4&lt;/strong&gt; from Google (27B parameters, fits comfortably on the RTX 5090) and &lt;strong&gt;Kimi K2&lt;/strong&gt; from Moonshot AI (1 trillion parameters, does not fit in anything reasonable). We also reran every model from Round 1 with fixes for the configuration issues that tripped up three of them.&lt;/p&gt;

&lt;p&gt;The results changed the leaderboard significantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Fixed from Round 1
&lt;/h2&gt;

&lt;p&gt;Round 1 had three avoidable failures:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen hit the token limit&lt;/strong&gt; — scored 28/100 because the output was capped at 4,096 tokens and the code got truncated mid-f-string. The model was generating at 1,510 tok/s. It wasn't slow. We just cut it off.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Codestral and DeepSeek built interactive menus&lt;/strong&gt; — both interpreted "commands: add, list, complete, delete" as &lt;code&gt;while True: input()&lt;/code&gt; loops instead of CLI argument parsers. The code worked perfectly if you used it interactively. Our automated test suite couldn't.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context windows varied&lt;/strong&gt; — each model had different settings, making the comparison uneven.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For Round 2:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Round 1&lt;/th&gt;
&lt;th&gt;Round 2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;num_predict&lt;/code&gt; (max output tokens)&lt;/td&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16,384&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;num_ctx&lt;/code&gt; (context window)&lt;/td&gt;
&lt;td&gt;Varied&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16,384 for all&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt clarity&lt;/td&gt;
&lt;td&gt;"Commands: add, list, complete, delete"&lt;/td&gt;
&lt;td&gt;"using argparse or sys.argv, &lt;strong&gt;NOT interactive input&lt;/strong&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model management&lt;/td&gt;
&lt;td&gt;Random loading&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Auto-unload previous, preload next&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same prompt. Same task. Same validation. Just fair settings this time.&lt;/p&gt;
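
&lt;p&gt;Both limits map directly to per-request options on Ollama's API. Here's a minimal sketch of what a Round 2 run looks like, assuming Ollama on its default port (the endpoint, the &lt;code&gt;options&lt;/code&gt; keys, and &lt;code&gt;keep_alive&lt;/code&gt; are Ollama's real API surface; the helper itself is illustrative, not our actual harness):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"  # Ollama's native chat endpoint

def run_benchmark(model: str, prompt: str):
    """Send one benchmark prompt with the Round 2 limits applied per request."""
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {
            "num_predict": 16384,  # max output tokens (4,096 in Round 1)
            "num_ctx": 16384,      # identical context window for every model
        },
        "keep_alive": 0,  # unload immediately so the next model gets the whole GPU
    }, timeout=600)
    resp.raise_for_status()
    return resp.json()["message"]["content"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
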

&lt;h2&gt;
  
  
  Adding Gemma 4
&lt;/h2&gt;

&lt;p&gt;Google released Gemma 4 while we were writing the Round 1 results. The 27B-parameter model downloads as a 9.6 GB file through Ollama — one of the smallest of our serious contenders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Model pulled, loaded onto the 5090 in seconds, registered in Coder's admin panel as another OpenAI-compatible model on the existing Ollama provider. The entire setup was one command and two form fields.&lt;/p&gt;
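
&lt;p&gt;Since Coder talks to Ollama through that OpenAI-compatible surface, a quick smoke test against the same interface confirms the registration. A hypothetical check, assuming Ollama's default port and the &lt;code&gt;gemma4&lt;/code&gt; tag pulled above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests

# Ollama exposes an OpenAI-compatible endpoint alongside its native API.
resp = requests.post(
    "http://127.0.0.1:11434/v1/chat/completions",
    json={
        "model": "gemma4",  # the tag pulled above
        "messages": [{"role": "user", "content": "Reply with one word: ready?"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
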

&lt;p&gt;After Round 1's configuration adventure with five different models, this felt almost anticlimactic. In the best possible way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Kimi K2 (The Hard Way)
&lt;/h2&gt;

&lt;p&gt;Kimi K2 is a different story entirely.&lt;/p&gt;

&lt;p&gt;The numbers: 1 trillion total parameters, 32 billion active per token (Mixture of Experts architecture), 256K context window. The quantized model (Q4_K_M) is &lt;strong&gt;579 GB across 13 shard files&lt;/strong&gt;. Our RTX 5090 has 32 GB of VRAM.&lt;/p&gt;

&lt;p&gt;We knew this going in. Round 1's post explicitly said Kimi would need API testing because it's too large for local. But this blog is about pushing boundaries with consumer hardware, and "it probably won't work" isn't a reason not to try. It's the reason &lt;em&gt;to&lt;/em&gt; try.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Getting llama.cpp Built
&lt;/h3&gt;

&lt;p&gt;Ollama doesn't offer Kimi K2 for local inference — only a cloud-hosted variant. So we went to llama.cpp, the C++ inference engine that supports loading models larger than VRAM via memory-mapped NVMe offloading.&lt;/p&gt;

&lt;p&gt;Building it required installing half of Ubuntu's dev toolchain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; cmake build-essential nvidia-cuda-toolkit
&lt;span class="nb"&gt;cd&lt;/span&gt; ~ &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git clone https://github.com/ggerganov/llama.cpp.git
&lt;span class="nb"&gt;cd &lt;/span&gt;llama.cpp
cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build &lt;span class="nt"&gt;-DGGML_CUDA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON
cmake &lt;span class="nt"&gt;--build&lt;/span&gt; build &lt;span class="nt"&gt;--config&lt;/span&gt; Release &lt;span class="nt"&gt;-j&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;nproc&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;First roadblock&lt;/strong&gt;: &lt;code&gt;cmake&lt;/code&gt; wasn't installed. Fixed with apt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second roadblock&lt;/strong&gt;: CUDA toolkit not found. Fixed with &lt;code&gt;nvidia-cuda-toolkit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third roadblock&lt;/strong&gt;: &lt;code&gt;nvcc fatal: Unsupported gpu architecture 'compute_120a'&lt;/code&gt;. The RTX 5090 is Blackwell (compute capability 12.0), but the CUDA toolkit in Ubuntu's apt repos is release 12.0, which shipped before Blackwell and doesn't know the architecture exists. The fix was targeting an older compatible architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build &lt;span class="nt"&gt;-DGGML_CUDA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="nt"&gt;-DCMAKE_CUDA_ARCHITECTURES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;89
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compute capability 8.9 (Ada Lovelace) code still runs on the 5090 because the driver JIT-compiles the embedded PTX forward for the newer architecture. Not ideal, but it builds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Downloading 579 GB
&lt;/h3&gt;

&lt;p&gt;Next: the Hugging Face CLI. Which required pip. Which was externally managed. Which required &lt;code&gt;--break-system-packages&lt;/code&gt;. Which installed but wasn't on PATH. Which turned out to be deprecated in favor of the &lt;code&gt;hf&lt;/code&gt; CLI. Which required &lt;code&gt;python3.12-venv&lt;/code&gt;. Which left behind a broken virtual environment that needed manual cleanup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python3-pip python3.12-venv
pip &lt;span class="nb"&gt;install &lt;/span&gt;huggingface-hub[cli] &lt;span class="nt"&gt;--break-system-packages&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; ~/.hf-cli
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://hf.co/cli/install.sh | bash
&lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the actual download:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.local/bin/hf download unsloth/Kimi-K2-Instruct-GGUF &lt;span class="nt"&gt;--include&lt;/span&gt; &lt;span class="s2"&gt;"*Q4_K_M*"&lt;/span&gt; &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ~/models/kimi-k2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The download started reporting 384 GB, then revised upward to 432 GB, then 481 GB, then settled at &lt;strong&gt;579 GB&lt;/strong&gt;. The HF CLI discovers shards progressively — it didn't know the full file list upfront.&lt;/p&gt;

&lt;p&gt;&lt;a href="/images/model-showdown-round-2/kimi-k2-download-progress.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/model-showdown-round-2/kimi-k2-download-progress.png" alt="Terminal showing Kimi K2 download progress at 327 GB of 432 GB with multiple shard progress bars"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Kimi K2 mid-download — 327 GB down, revising the total upward as new shards are discovered.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3 hours and 27 minutes later&lt;/strong&gt;, 13 shard files totaling 579 GB sat on the NVMe, at a sustained throughput of ~370 Mbps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The VRAM Math
&lt;/h3&gt;

&lt;p&gt;First attempt: 10 GPU layers. Tried to allocate 94 GB on a 32 GB card. Dead.&lt;/p&gt;

&lt;p&gt;The math: 94 GB / 10 layers ≈ 9.4 GB per layer. With 32 GB of VRAM, that's roughly 3 layers maximum. MoE architectures make each layer massive because every expert's weights live in the same layer.&lt;/p&gt;

&lt;p&gt;We settled on &lt;strong&gt;2 GPU layers&lt;/strong&gt; (confirmed working, 3 was borderline). That means ~18 GB on the GPU, the remaining ~560 GB paging from NVMe via memory-mapped I/O. The OS's virtual memory system handles the page faults — when inference needs weights that aren't in RAM, it reads them from the NVMe on demand.&lt;/p&gt;
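
&lt;p&gt;The same back-of-envelope math in a few lines of Python. The numbers are from this run; real allocations also include the KV cache and compute buffers, which is why 3 layers was borderline in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Layer budget for NVMe-offloaded Kimi K2 (Q4_K_M), back-of-envelope only.
attempted_alloc_gb = 94   # what 10 GPU layers tried to allocate
layers_attempted = 10
vram_gb = 32              # RTX 5090

per_layer_gb = attempted_alloc_gb / layers_attempted  # ~9.4 GB per MoE layer
max_layers = int(vram_gb // per_layer_gb)             # ~3 in theory

print(f"{per_layer_gb:.1f} GB/layer fits at most {max_layers} layers in {vram_gb} GB")
# KV cache and buffers eat the rest; 2 layers was the stable setting.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
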
&lt;h3&gt;
  
  
  Step 4: The Conversation Mode Bug
&lt;/h3&gt;

&lt;p&gt;Here's where it got interesting. llama.cpp's &lt;code&gt;llama-cli&lt;/code&gt; has a &lt;code&gt;--no-conversation&lt;/code&gt; flag that's supposed to run a single prompt and exit. It doesn't work. Every run dropped into an interactive &lt;code&gt;&amp;gt;&lt;/code&gt; prompt, waiting for input. Our benchmark script would hang indefinitely.&lt;/p&gt;

&lt;p&gt;We tried:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--no-conversation&lt;/code&gt; flag (ignored)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--no-display-prompt&lt;/code&gt; flag (still conversational)&lt;/li&gt;
&lt;li&gt;Piping prompt via &lt;code&gt;-p&lt;/code&gt; with &lt;code&gt;-e&lt;/code&gt; flag (still conversational)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="/images/model-showdown-round-2/kimi-k2-conversation-mode-bug.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/model-showdown-round-2/kimi-k2-conversation-mode-bug.png" alt="Terminal showing llama-cli loading Kimi K2 with conversation flag and dropping into interactive prompt"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;llama-cli ignoring --no-conversation and dropping into an interactive prompt, hanging the benchmark script.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three benchmark attempts. Three hangs. The script captured zero timing data from Kimi because it was waiting for a conversation that would never end.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: The Fix — llama-server
&lt;/h3&gt;

&lt;p&gt;Instead of fighting the CLI, we ditched it. llama.cpp ships with &lt;code&gt;llama-server&lt;/code&gt;, which exposes an &lt;strong&gt;OpenAI-compatible HTTP API&lt;/strong&gt; — the exact same interface Ollama uses. We wrote a standalone benchmark script that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Starts &lt;code&gt;llama-server&lt;/code&gt; as a background process&lt;/li&gt;
&lt;li&gt;Polls &lt;code&gt;/health&lt;/code&gt; until the 579 GB model finishes loading&lt;/li&gt;
&lt;li&gt;Sends the benchmark prompt to &lt;code&gt;/v1/chat/completions&lt;/code&gt; with streaming&lt;/li&gt;
&lt;li&gt;Captures every metric programmatically — TTFT, total time, tokens, tok/s&lt;/li&gt;
&lt;li&gt;Runs the full validation suite&lt;/li&gt;
&lt;li&gt;Shuts down the server&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No conversation mode. No stopwatch. No manual intervention.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;server_cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;LLAMA_SERVER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MODEL_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--n-gpu-layers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N_GPU_LAYERS&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--mmap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CTX_SIZE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;server_proc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Popen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;

&lt;span class="c1"&gt;# Wait for 579 GB to load into memory
&lt;/span&gt;&lt;span class="nf"&gt;wait_for_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Hit the same API as Ollama
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
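
&lt;p&gt;The two pieces that excerpt leaves out are the health poll and the streaming timer. A condensed sketch of both, assuming llama-server's &lt;code&gt;/health&lt;/code&gt; endpoint and OpenAI-style SSE chunks (&lt;code&gt;wait_for_server&lt;/code&gt; and &lt;code&gt;timed_completion&lt;/code&gt; here are simplified stand-ins, not the full script):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import time

import requests

def wait_for_server(port: int, timeout: int = 900):
    """Poll /health until llama-server reports the model is loaded."""
    for _ in range(timeout // 5):
        try:
            if requests.get(f"http://127.0.0.1:{port}/health", timeout=5).ok:
                return
        except requests.ConnectionError:
            pass  # server process still starting up
        time.sleep(5)
    raise TimeoutError("model never finished loading")

def timed_completion(port: int, prompt: str):
    """Stream one completion, capturing TTFT, total time, and the text."""
    start = time.time()
    ttft = None
    chunks = []
    resp = requests.post(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        json={"messages": [{"role": "user", "content": prompt}], "stream": True},
        stream=True,
        timeout=3600,  # Kimi K2 needs most of an hour end to end
    )
    for line in resp.iter_lines():
        if not line.startswith(b"data: ") or line == b"data: [DONE]":
            continue
        delta = json.loads(line[6:])["choices"][0]["delta"].get("content", "")
        if delta and ttft is None:
            ttft = time.time() - start  # first token on the wire
        chunks.append(delta)
    return {"ttft": ttft, "total": time.time() - start, "text": "".join(chunks)}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;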



&lt;p&gt;It worked on the first try. The model loaded in &lt;strong&gt;375 seconds&lt;/strong&gt; (6.3 minutes), then generation began.&lt;/p&gt;


&lt;p&gt;&lt;em&gt;Kimi K2 generating code at 0.6 tokens per second. Every character is paging through 579 GB of weights on an NVMe drive.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results: Performance
&lt;/h2&gt;

&lt;p&gt;&lt;a href="/images/model-showdown-round-2/benchmark-results-terminal.png" class="article-body-image-wrapper"&gt;&lt;img src="/images/model-showdown-round-2/benchmark-results-terminal.png" alt="Terminal output showing Qwen benchmark results completing and Kimi K2 benchmark starting"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Raw benchmark output — Qwen finishing its run and Kimi K2 kicking off next.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;TTFT&lt;/th&gt;
&lt;th&gt;Total Time&lt;/th&gt;
&lt;th&gt;Output Tokens&lt;/th&gt;
&lt;th&gt;Tok/s&lt;/th&gt;
&lt;th&gt;Lines&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Codestral 22B&lt;/td&gt;
&lt;td&gt;1.75s&lt;/td&gt;
&lt;td&gt;10.01s&lt;/td&gt;
&lt;td&gt;826&lt;/td&gt;
&lt;td&gt;82.5&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devstral 24B&lt;/td&gt;
&lt;td&gt;2.11s&lt;/td&gt;
&lt;td&gt;9.97s&lt;/td&gt;
&lt;td&gt;703&lt;/td&gt;
&lt;td&gt;70.5&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 27B&lt;/td&gt;
&lt;td&gt;3.92s&lt;/td&gt;
&lt;td&gt;11.77s&lt;/td&gt;
&lt;td&gt;1,966&lt;/td&gt;
&lt;td&gt;167.1&lt;/td&gt;
&lt;td&gt;171&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1 14B&lt;/td&gt;
&lt;td&gt;7.21s&lt;/td&gt;
&lt;td&gt;12.44s&lt;/td&gt;
&lt;td&gt;1,451&lt;/td&gt;
&lt;td&gt;116.7&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5 MoE 35B&lt;/td&gt;
&lt;td&gt;27.00s&lt;/td&gt;
&lt;td&gt;35.23s&lt;/td&gt;
&lt;td&gt;5,020&lt;/td&gt;
&lt;td&gt;142.5&lt;/td&gt;
&lt;td&gt;144&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2 1T&lt;/td&gt;
&lt;td&gt;68.90s&lt;/td&gt;
&lt;td&gt;1,140.94s&lt;/td&gt;
&lt;td&gt;686&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;87&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemma 4 immediately stands out. 167 tok/s is the fastest generation speed of any model we've tested that also scored perfectly — faster than Sonnet 4.6's 104 tok/s from Round 1. It wrote 1,966 tokens (171 lines) in under 12 seconds.&lt;/p&gt;

&lt;p&gt;Devstral remains the wall-clock champion at 9.97 seconds total, though Codestral edges it on TTFT (1.75s vs 2.11s).&lt;/p&gt;

&lt;p&gt;Kimi K2 is in a different universe. 68.9 seconds before the first token appeared (that's prompt evaluation at 1.5 tok/s across 579 GB of weights). Then 19 minutes of generation at 0.6 tok/s. Total wall clock including model load: &lt;strong&gt;25 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results: Quality
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Syntax&lt;/th&gt;
&lt;th&gt;Features (X/10)&lt;/th&gt;
&lt;th&gt;Functional (X/7)&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 27B&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;7/7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devstral 24B&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;7/7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1 14B&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;7/7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5 MoE 35B&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;7/7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codestral 22B&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;7/7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;94&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2 1T&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;6/7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;94&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Four out of six models scored 100.&lt;/strong&gt; That's up from three in Round 1, and the two misses both landed at 94.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fixes Worked
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Qwen: 28 → 100.&lt;/strong&gt; With the token limit raised from 4,096 to 16,384, Qwen wrote 5,020 tokens — a complete 144-line program with ANSI color codes, proper error handling, and clean argparse subparsers. The speed is still absurd (142.5 tok/s with a 27-second cold start), but now it finishes what it starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek R1: 60 → 100.&lt;/strong&gt; The clarified prompt ("NOT interactive input") worked. DeepSeek built an argparse-based CLI with proper subparsers, colorama integration, and structured error handling. It still uses &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; blocks (the 1,451 tokens include reasoning), but the final code is correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codestral: 60 → 94.&lt;/strong&gt; Also switched to argparse, passing all 7 functional tests. But it missed error handling entirely — no &lt;code&gt;try/except&lt;/code&gt; blocks, no input validation. Its &lt;code&gt;complete&lt;/code&gt; command also silently deletes the record instead of marking it done. Functional but sloppy.&lt;/p&gt;
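
&lt;p&gt;One harness detail worth noting for reasoning models: DeepSeek's raw completion wraps its chain of thought in &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags, so a validator has to strip those before extracting code. A minimal, hypothetical version of that step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Reasoning block emitted by DeepSeek R1 before its final answer.
THINK_RE = re.compile(r"&amp;lt;think&amp;gt;.*?&amp;lt;/think&amp;gt;", re.DOTALL)
# First fenced code block in a markdown-style reply.
CODE_FENCE_RE = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)

def extract_code(completion: str):
    """Drop the reasoning block, then pull the first fenced code block."""
    answer = THINK_RE.sub("", completion).strip()
    match = CODE_FENCE_RE.search(answer)
    return match.group(1) if match else answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;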

&lt;h3&gt;
  
  
  The New Models
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Gemma 4&lt;/strong&gt; wrote the most polished code of any model in either round. 171 lines with a dedicated &lt;code&gt;Colors&lt;/code&gt; class for ANSI escape codes, emoji status indicators (✅, ⏳, 🎉, 🗑️), full &lt;code&gt;try/except/finally&lt;/code&gt; blocks on every database operation, and a clean argparse architecture. It writes like a senior developer who actually cares about user experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kimi K2&lt;/strong&gt; wrote clean, minimal code — 87 lines with &lt;code&gt;with&lt;/code&gt; context managers for database connections (the most Pythonic approach of any model), proper &lt;code&gt;sys.exit(1)&lt;/code&gt; on errors, and a formatted table output. It scored 94 instead of 100 because one functional test failed: the delete command reported "Task 2 not found" due to the model storing its database at &lt;code&gt;~/.todo.db&lt;/code&gt; (a global path) instead of a relative path. Stale data from an earlier test run interfered. The code logic is correct — it's a test isolation issue, not a bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  Style Comparison: How Each Model Writes
&lt;/h2&gt;

&lt;p&gt;The code style differences are telling:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemma 4&lt;/strong&gt; (171 lines): Enterprise polish. ANSI color class, emoji, docstrings on every function, defensive error handling everywhere. The code you'd put in a demo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 3.5&lt;/strong&gt; (144 lines): Also polished — ANSI codes, structured table output, exit-on-error patterns. More defensive than Gemma but less decorative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Devstral&lt;/strong&gt; (98 lines): Minimal and correct. Flat functions, no class, CURRENT_TIMESTAMP in SQL. The code you'd actually ship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kimi K2&lt;/strong&gt; (87 lines): Even more minimal. Context managers everywhere, zero waste. Reads like it was written by someone who's read a lot of production Python.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek R1&lt;/strong&gt; (84 lines): Compact with colorama dependency — the only model that imported an external library. Risky in an isolated test environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codestral&lt;/strong&gt; (80 lines): The shortest, and it shows. No error handling, buggy complete command. Brevity at the cost of correctness.&lt;/p&gt;
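
&lt;p&gt;For flavor, here's the kind of pattern the Kimi K2 description is pointing at. A hypothetical fragment in that style, not Kimi's actual output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

DB_PATH = "todo.db"  # hypothetical; a relative path avoids the ~/.todo.db isolation issue

def complete_task(task_id: int):
    # sqlite3's connection context manager commits on success, rolls back on error
    with sqlite3.connect(DB_PATH) as conn:
        cur = conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
        return cur.rowcount == 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;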

&lt;h2&gt;
  
  
  The Speed Tiers
&lt;/h2&gt;

&lt;p&gt;Round 2 reveals two distinct performance tiers for local inference:&lt;/p&gt;

&lt;h3&gt;
  
  
  Tier 1: VRAM-Native (~10-35 seconds)
&lt;/h3&gt;

&lt;p&gt;Models that fit entirely in the RTX 5090's 32 GB VRAM. Response times competitive with cloud APIs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Total Time&lt;/th&gt;
&lt;th&gt;Tok/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Devstral 24B&lt;/td&gt;
&lt;td&gt;14 GB&lt;/td&gt;
&lt;td&gt;9.97s&lt;/td&gt;
&lt;td&gt;70.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codestral 22B&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;td&gt;10.01s&lt;/td&gt;
&lt;td&gt;82.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 27B&lt;/td&gt;
&lt;td&gt;9.6 GB&lt;/td&gt;
&lt;td&gt;11.77s&lt;/td&gt;
&lt;td&gt;167.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1 14B&lt;/td&gt;
&lt;td&gt;9 GB&lt;/td&gt;
&lt;td&gt;12.44s&lt;/td&gt;
&lt;td&gt;116.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5 MoE 35B&lt;/td&gt;
&lt;td&gt;23 GB&lt;/td&gt;
&lt;td&gt;35.23s&lt;/td&gt;
&lt;td&gt;142.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Tier 2: NVMe-Offloaded (~19 minutes)
&lt;/h3&gt;

&lt;p&gt;Models too large for VRAM, paging from NVMe via mmap. Functional but glacial.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Total Time&lt;/th&gt;
&lt;th&gt;Tok/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2 1T&lt;/td&gt;
&lt;td&gt;579 GB&lt;/td&gt;
&lt;td&gt;1,141s&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap between tiers is &lt;strong&gt;more than 100x&lt;/strong&gt;. Gemma 4 at 167 tok/s vs Kimi K2 at 0.6 tok/s. Both wrote correct code. One took 12 seconds, the other took 19 minutes.&lt;/p&gt;

&lt;p&gt;This isn't a criticism of Kimi K2 — it's a 1 trillion parameter model running on hardware that costs less than a month of cloud API credits. The fact that it works at all is the story. The fact that it wrote correct, clean, well-structured code is the punchline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 1 vs Round 2: Combined Leaderboard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Round&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Tok/s&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 27B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;9.6 GB&lt;/td&gt;
&lt;td&gt;167.1&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet 4.6&lt;/td&gt;
&lt;td&gt;R1&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;104.2&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devstral 24B&lt;/td&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;14 GB&lt;/td&gt;
&lt;td&gt;70.5&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6&lt;/td&gt;
&lt;td&gt;R1&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;74.3&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5 MoE 35B&lt;/td&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;23 GB&lt;/td&gt;
&lt;td&gt;142.5&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1 14B&lt;/td&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;9 GB&lt;/td&gt;
&lt;td&gt;116.7&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codestral 22B&lt;/td&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;td&gt;82.5&lt;/td&gt;
&lt;td&gt;94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2 1T&lt;/td&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;579 GB&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;94&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemma 4 is now the fastest model with a perfect score — local or cloud. A 9.6 GB model running on consumer hardware, outperforming Anthropic's Sonnet 4.6 on raw throughput while matching it on code quality.&lt;/p&gt;

&lt;p&gt;The local-vs-cloud gap hasn't just closed. On this task, local won.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Configuration matters more than model selection.&lt;/strong&gt; Three models went from failing to perfect with two setting changes. If your local models are underperforming, check your token limits and prompt clarity before blaming the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The prompt is still the variable.&lt;/strong&gt; Round 1's "ambiguous CLI" issue was a prompt problem, not a model problem. Six words ("NOT interactive input") fixed two models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VRAM is the cliff.&lt;/strong&gt; The performance difference between "fits in VRAM" and "doesn't fit in VRAM" is over 100x. There's no gradual degradation — you're either generating at 70-167 tok/s or you're at 0.6. If your model fits, you're competitive with cloud. If it doesn't, you're watching paint dry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Big models can still write good code slowly.&lt;/strong&gt; Kimi K2 at 0.6 tok/s is impractical for interactive coding. But for batch processing, overnight code generation, or "I need an answer and I don't care when" use cases, a 1T model on consumer NVMe is a real option that didn't exist a year ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemma 4 is the new default.&lt;/strong&gt; Fastest throughput, perfect score, a 9.6 GB download, most polished output. If you're running a homelab with a single GPU, it's the model to install first.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next: Gemma vs Opus — A Real Fight
&lt;/h2&gt;

&lt;p&gt;Round 1 tested a toy todo app. Round 2 fixed the settings and added models. Both rounds answered a useful question: can local models write correct code for a well-defined task?&lt;/p&gt;

&lt;p&gt;The answer is yes. Four out of six scored perfect, and the other two still shipped working code. That question is settled.&lt;/p&gt;

&lt;p&gt;The next question is harder: &lt;strong&gt;can a local model replace my daily driver on a real task?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My daily driver is Opus 4.6. It's what I use for everything on &lt;a href="https://vibescoder.dev" rel="noopener noreferrer"&gt;vibescoder.dev&lt;/a&gt; — features, refactors, debugging, the works. It's also a cloud model with per-token costs, rate limits, and a dependency on someone else's infrastructure.&lt;/p&gt;

&lt;p&gt;Gemma 4 just beat every model in the benchmark on throughput and matched the best on quality. It runs locally on my 5090 at 167 tok/s with zero API costs. The obvious question: can it actually do the job?&lt;/p&gt;

&lt;p&gt;Round 3 will be a head-to-head. Gemma 4 vs Opus 4.6, same task, but not a toy. We're going to pick a real feature from the vibescoder.dev backlog — something that touches multiple files, requires architectural decisions, and has enough ambiguity to separate a good model from a great one. The kind of task I'd normally hand to Opus without thinking.&lt;/p&gt;

&lt;p&gt;If Gemma holds up, local-first AI coding isn't just viable for benchmarks. It's viable for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;6&lt;/strong&gt; local models benchmarked (up from 4 local + 2 cloud in Round 1)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4&lt;/strong&gt; perfect scores (up from 3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;579 GB&lt;/strong&gt; downloaded over 3 hours 27 minutes for Kimi K2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;375 seconds&lt;/strong&gt; to memory-map 579 GB of weights from NVMe&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;68.9 seconds&lt;/strong&gt; for Kimi K2's first token&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,140 seconds&lt;/strong&gt; (19 minutes) for Kimi K2's total generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9.6 GB&lt;/strong&gt; for Gemma 4 — second-smallest download, highest score and speed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;167.1 tok/s&lt;/strong&gt; from Gemma 4 — fastest perfect-scoring model across both rounds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.6 tok/s&lt;/strong&gt; from Kimi K2 — slowest, but correct&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16,384&lt;/strong&gt; token limit that saved Qwen from another truncation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2 GPU layers&lt;/strong&gt; out of the model's ~60+ were all that fit in VRAM for Kimi K2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt; Round 1 bugs fixed by configuration changes, not model changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; llama-cli conversation mode bug worked around with llama-server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; API costs for everything&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>benchmark</category>
      <category>homelab</category>
    </item>
  </channel>
</rss>
