<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Artem | IT Robinson</title>
    <description>The latest articles on DEV Community by Artem | IT Robinson (@itrobinson).</description>
    <link>https://dev.to/itrobinson</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3947710%2Ff34e3798-1a08-41d1-8e8c-469e21809960.png</url>
      <title>DEV Community: Artem | IT Robinson</title>
      <link>https://dev.to/itrobinson</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/itrobinson"/>
    <language>en</language>
    <item>
      <title>I Spent 3 Years Building Jarvis, a Personal AI. Gemma 4 Was the First Model That Actually Worked.</title>
      <dc:creator>Artem | IT Robinson</dc:creator>
      <pubDate>Sat, 23 May 2026 15:39:26 +0000</pubDate>
      <link>https://dev.to/itrobinson/i-spent-3-years-building-jarvis-a-personal-ai-gemma-4-was-the-first-model-that-actually-worked-21gh</link>
      <guid>https://dev.to/itrobinson/i-spent-3-years-building-jarvis-a-personal-ai-gemma-4-was-the-first-model-that-actually-worked-21gh</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;It started with Tony Stark.&lt;/p&gt;

&lt;p&gt;Specifically, the scene where JARVIS quietly says &lt;em&gt;"Sir, the reactor shows signs of deterioration"&lt;/em&gt; while Stark is already three steps ahead. No "How can I help you today?" No waiting. Just a system that knows you, watches for what matters, and surfaces it at the right moment.&lt;/p&gt;

&lt;p&gt;So I built my own JARVIS. Three years, four rewrites, one cat suggestion.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Jo&lt;/strong&gt; is a personal AI agent that runs 24/7 on my home server, remembers everything across conversations, uses a browser, and — most importantly — &lt;strong&gt;teaches itself&lt;/strong&gt; from real interactions.&lt;/p&gt;

&lt;p&gt;She is not a chatbot wrapper. She has a hierarchical long-term memory graph, a subconscious that retrieves context before she even responds, a browser pipeline with vision, and a self-improving LoRA training loop that fine-tunes her behavior from actual conversations.&lt;/p&gt;

&lt;p&gt;The project started three years ago on a dual-core CPU with 8GB RAM. I sold the motherboard for $33 and built a Xeon server from second-hand parts. Added a Raspberry Pi 2B as a voice terminal. Fought with STT/TTS for a year. Tried DeepSeek on CPU — it responded in Chinese after two-minute waits. My second son was born and the project went dormant. Came back. Got a GPU. Rewrote everything again.&lt;/p&gt;

&lt;p&gt;Ten to fifteen memory architectures later, with classical RAG abandoned and a tag-based dispatch system built from scratch, Gemma 4 came out. For the first time, the ceiling disappeared.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Full walkthrough video coming — live conversation, memory graph, training dashboard, nvidia-smi. For now: the moment that started it all.&lt;/em&gt;&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="&amp;amp;lt;!--REPLACE_URL--!&amp;amp;gt;" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;&amp;amp;lt;!--REPLACE_URL--!&amp;amp;gt;&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;





&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;The model is &lt;code&gt;google/gemma-4-26B-A4B-it&lt;/code&gt;. "A4B" means &lt;strong&gt;Active 4 Billion&lt;/strong&gt;: 26B parameters in total, of which about 4B are active for each token (Mixture-of-Experts).&lt;/p&gt;

&lt;p&gt;That one architectural fact is why this model and not another.&lt;/p&gt;

&lt;p&gt;Jo makes &lt;strong&gt;at least four LLM calls for every user message&lt;/strong&gt;: subconscious memory search, subconscious synthesis, consciousness response, and memory extraction. The model also has to handle screenshots (multimodal), fit in 24GB of VRAM for inference, and respond fast enough that a conversation doesn't feel like a build pipeline.&lt;/p&gt;

&lt;p&gt;Every model before Gemma 4 forced a choice: fast &lt;em&gt;or&lt;/em&gt; good. Small enough to fit &lt;em&gt;or&lt;/em&gt; smart enough to hold a personality. Quick enough for four calls &lt;em&gt;or&lt;/em&gt; deep enough to reason.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemma 4 26B A4B was the first time that choice disappeared.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;800–1200 ms per response on a single RTX 3090. That's a conversation, not a wait. Beyond speed, it follows the intent of a prompt, not just its letter: subtle rules in complex system prompts actually hold. After two years of trying alternatives, that's an observation, not marketing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three separate fine-tuned adapters on the same base model:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User message
    │
    ▼
┌─────────────────────────────────────────────────────┐
│  SUBCONSCIOUS  (Gemma 4 26B A4B + LoRA)             │
│  Runs first. Searches memory, surfaces context      │
│  before Jo even "thinks".                           │
└─────────────────────┬───────────────────────────────┘
                      │  context injected into prompt
                      ▼
┌─────────────────────────────────────────────────────┐
│  CONSCIOUSNESS  (Gemma 4 26B A4B + LoRA)            │
│  Jo herself. Reasons with &amp;lt;think&amp;gt;, responds         │
│  via structured action tags.                        │
└─────────────────────┬───────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────┐
│  PREPROCESSOR  (Gemma 4 26B A4B + LoRA)             │
│  Compresses long conversation history.              │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each adapter is trained on its own data with 4-bit QLoRA via unsloth. Splitting them helped each role: the subconscious specializes in memory query patterns, and the consciousness specializes in conversational quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not 31B Dense?&lt;/strong&gt; Tried it. It doesn't fit inference plus training on one RTX 3090. With A4B's MoE architecture, all 26B weights still have to be loaded, but only about 4B of them are active per token, so the forward and backward passes are proportionally lighter and single-GPU QLoRA becomes realistic.&lt;/p&gt;
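&lt;p&gt;A rough way to see the trade-off: with 4-bit weights, memory scales with &lt;em&gt;total&lt;/em&gt; parameters, while per-token compute scales with &lt;em&gt;active&lt;/em&gt; parameters. The sketch below only does that arithmetic. It assumes exactly 4 bits per weight and ignores quantization scales, the KV cache, activations, and LoRA/optimizer state, so real usage on a 24GB card is noticeably higher.&lt;/p&gt;

```python
# Back-of-the-envelope memory for frozen base weights under 4-bit QLoRA.
# Assumptions: exactly 4 bits per weight; quantization scales, KV cache,
# activations, LoRA weights and optimizer state are all ignored.


def weight_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the base weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of the weights a single token is routed through."""
    return active_billion / total_billion


print(f"26B A4B, 4-bit weights: {weight_gb(26):.1f} GB")     # 13.0 GB
print(f"31B dense, 4-bit weights: {weight_gb(31):.1f} GB")   # 15.5 GB
print(f"26B A4B active share per token: {active_fraction(4, 26):.0%}")  # 15%
```

&lt;p&gt;By this estimate, the frozen 4-bit weights of the two models differ by only about 2.5 GB. What changes much more is how many of those weights each token runs through: roughly 15% for 26B A4B versus all of them for the dense model.&lt;/p&gt;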

&lt;p&gt;&lt;strong&gt;Why not a smaller model?&lt;/strong&gt; Jo's personality collapses: responses turn into generic assistant replies, context isn't held across complex multi-step interactions, and structured output becomes unreliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;Jo is a private system running on live personal data, so there is no public repo. The key architectural pieces are below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The self-teaching loop:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every conversation is logged. A collector reconstructs training pairs from SQLite (&lt;code&gt;conscious_messages&lt;/code&gt;, &lt;code&gt;subconscious_messages&lt;/code&gt;). A data generator adds synthetic edge cases. The trainer runs as an isolated Docker container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# trainer/train.py
&lt;/span&gt;
&lt;span class="n"&gt;ENTITY&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ENTITY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# "consciousness" | "subconscious" | "preprocessor"
&lt;/span&gt;&lt;span class="n"&gt;BASE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BASE_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemma-4-26B-A4B-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FastModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_seq_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;21GiB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;28GiB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FastModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LORA_RANK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LORA_ALPHA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gate_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;up_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;down_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;use_gradient_checkpointing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unsloth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



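&lt;p&gt;Note: keep &lt;code&gt;gate_proj&lt;/code&gt; in the LoRA targets. Skipping it noticeably hurt tag accuracy in early experiments. In most Hugging Face decoder implementations, &lt;code&gt;gate_proj&lt;/code&gt; is the gating projection of the gated feed-forward block (inside each expert, in MoE layers) rather than the router that selects experts.&lt;/p&gt;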
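&lt;p&gt;The collector step that feeds this trainer isn't shown in the article. Here is a minimal sketch of the idea using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt;. The schema for &lt;code&gt;conscious_messages&lt;/code&gt; (an autoincrementing &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;conversation_id&lt;/code&gt;, &lt;code&gt;role&lt;/code&gt;, &lt;code&gt;content&lt;/code&gt;) and the &lt;code&gt;collect_pairs&lt;/code&gt; helper are hypothetical. Every assistant turn becomes one chat-format example whose input is the conversation up to that point.&lt;/p&gt;

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: the real conscious_messages table is not shown in the article.
SCHEMA = """
CREATE TABLE conscious_messages (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL,          -- 'user' or 'assistant'
    content TEXT NOT NULL
)
"""


def collect_pairs(conn: sqlite3.Connection, table: str = "conscious_messages") -> list[dict]:
    """Turn logged turns into chat-format training examples.

    Each assistant turn becomes one example whose input is every earlier
    turn of the same conversation. The table name is interpolated, so only
    pass trusted, internal names.
    """
    rows = conn.execute(
        f"SELECT conversation_id, role, content FROM {table} ORDER BY conversation_id, id"
    ).fetchall()

    history: dict[str, list[dict]] = defaultdict(list)
    examples = []
    for conv_id, role, content in rows:
        turn = {"role": role, "content": content}
        if role == "assistant" and history[conv_id]:
            # "+" builds a new list, so later turns don't leak into this example.
            examples.append({"messages": history[conv_id] + [turn]})
        history[conv_id].append(turn)
    return examples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO conscious_messages (conversation_id, role, content) VALUES (?, ?, ?)",
        [("c1", "user", "Hi Jo"), ("c1", "assistant", "Hi!"),
         ("c1", "user", "Weather?"), ("c1", "assistant", "Sunny.")],
    )
    print(len(collect_pairs(conn)))  # 2
```

&lt;p&gt;The same function can be pointed at &lt;code&gt;subconscious_messages&lt;/code&gt; through the &lt;code&gt;table&lt;/code&gt; argument, assuming that table has the same shape.&lt;/p&gt;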

&lt;p&gt;&lt;strong&gt;The memory write — 3-phase LLM process:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Saving a new fact isn't a simple insert. &lt;code&gt;MindmapWriter&lt;/code&gt; runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1 (LLM): facts → &amp;lt;search&amp;gt; queries → ChromaDB lookup
Phase 2 (LLM): search results + facts → node_create #N / node_update operations
Phase 3 (algorithm): resolve #N temp IDs → real UUIDs → set parent_id → save
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents duplicates: the LLM sees existing nodes before deciding to create or update.&lt;/p&gt;
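&lt;p&gt;Phase 3 is the only purely algorithmic step. Here is a minimal sketch of it, assuming Phase 2's output has already been parsed into dicts. The keys &lt;code&gt;op&lt;/code&gt;, &lt;code&gt;id&lt;/code&gt;, and &lt;code&gt;parent_id&lt;/code&gt; and the &lt;code&gt;resolve_temp_ids&lt;/code&gt; name are illustrative, not the project's actual format. Two passes let a node reference a &lt;code&gt;#N&lt;/code&gt; that is created later in the same batch.&lt;/p&gt;

```python
import uuid
from typing import Callable, Optional


def resolve_temp_ids(
    ops: list[dict],
    new_id: Callable[[], str] = lambda: str(uuid.uuid4()),
) -> list[dict]:
    """Replace '#N' placeholders with real UUIDs in ids and parent links.

    Assumed op shapes (illustrative only):
      {"op": "node_create", "id": "#1", "parent_id": "#0" | existing UUID | None, ...}
      {"op": "node_update", "id": existing UUID, ...}
    """
    # Pass 1: give every temp id on a node_create a fresh UUID.
    mapping: dict[str, str] = {}
    for op in ops:
        if op["op"] == "node_create" and str(op["id"]).startswith("#"):
            mapping[op["id"]] = new_id()

    def resolve(ref: Optional[str]) -> Optional[str]:
        # Real UUIDs and None (a root node) pass through unchanged.
        if isinstance(ref, str) and ref.startswith("#"):
            if ref not in mapping:
                raise ValueError(f"unknown temp id {ref!r}")
            return mapping[ref]
        return ref

    # Pass 2: rewrite ids and parent links on copies, leaving the input untouched.
    resolved = []
    for op in ops:
        out = dict(op)
        out["id"] = resolve(op["id"])
        if "parent_id" in op:
            out["parent_id"] = resolve(op["parent_id"])
        resolved.append(out)
    return resolved


ops = [
    {"op": "node_create", "id": "#2", "parent_id": "#1", "title": "GPU"},
    {"op": "node_create", "id": "#1", "parent_id": None, "title": "Home server"},
]
print(resolve_temp_ids(ops))
```

&lt;p&gt;This sketch raises an error on an unknown &lt;code&gt;#N&lt;/code&gt; instead of silently dropping the link. That way a hallucinated reference can't quietly create an orphan node.&lt;/p&gt;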

&lt;p&gt;&lt;strong&gt;Browser vision training — reward function:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# trainer/browser/validator.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;iou&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TeacherResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ix1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;iy1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ix2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;iy2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ix2&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;ix1&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;iy2&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;iy1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="n"&gt;inter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ix2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ix1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iy2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;iy1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;union&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;inter&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;inter&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;union&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;union&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;size_penalty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TeacherResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;3.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;iou&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;size_penalty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only examples with &lt;code&gt;reward ≥ 0.5&lt;/code&gt; are saved: a teacher model (Qwen3-VL-32B) labels the screenshots, and the student model trains on the filtered results.&lt;/p&gt;
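&lt;p&gt;As a worked example of why the cutoff matters, the sketch below restates the reward from &lt;code&gt;validator.py&lt;/code&gt; on plain tuples. The tuple layout and the &lt;code&gt;keep_examples&lt;/code&gt; helper are illustrative, not the project's code. A box that fully covers the target but is twice as wide has an IoU of exactly 0.5, yet it scores 0.5 × 6/7 ≈ 0.43 after the size penalty and is dropped.&lt;/p&gt;

```python
# Self-contained restatement of validator.py's reward on plain tuples:
# pred = (x1, y1, x2, y2), gt = (x, y, width, height).


def reward(pred, gt) -> float:
    px1, py1, px2, py2 = pred
    gx, gy, gw, gh = gt
    ix1, iy1 = max(px1, gx), max(py1, gy)
    ix2, iy2 = min(px2, gx + gw), min(py2, gy + gh)
    if ix1 >= ix2 or iy1 >= iy2:
        return 0.0  # no overlap
    inter = (ix2 - ix1) * (iy2 - iy1)
    pred_area, gt_area = (px2 - px1) * (py2 - py1), gw * gh
    union = pred_area + gt_area - inter
    iou = inter / union if union > 0 else 0.0
    # Size penalty: 1.0 up to 1.5x the true area, then linear decay to 0.0 at 5x.
    ratio = pred_area / max(1, gt_area)
    penalty = 1.0 if 1.5 >= ratio else max(0.0, 1.0 - (ratio - 1.5) / 3.5)
    return iou * penalty


def keep_examples(labeled, threshold: float = 0.5):
    """Keep (name, pred, gt) rows whose reward reaches the threshold."""
    return [row for row in labeled if reward(row[1], row[2]) >= threshold]


gt = (0, 0, 100, 100)
labeled = [
    ("exact", (0, 0, 100, 100), gt),      # IoU 1.0, no penalty -> 1.0
    ("shifted", (10, 0, 110, 100), gt),   # IoU 9000/11000 ~ 0.82 -> kept
    ("too_wide", (0, 0, 200, 100), gt),   # IoU 0.5, ratio 2.0 -> 0.5 * 6/7 ~ 0.43 -> dropped
]
print([row[0] for row in keep_examples(labeled)])  # ['exact', 'shifted']
```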




&lt;h2&gt;
  
  
  The Architecture in More Detail
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt; lives entirely in ChromaDB — no separate SQL tree, just &lt;code&gt;parent_id&lt;/code&gt; references between vectors. One node = one vector, carrying &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;domain&lt;/code&gt;, &lt;code&gt;parent_id&lt;/code&gt;, &lt;code&gt;attributes&lt;/code&gt; (key-value facts), &lt;code&gt;confidence&lt;/code&gt;.&lt;/p&gt;
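&lt;p&gt;Because the tree lives only in &lt;code&gt;parent_id&lt;/code&gt; references, reading a node's context means walking those links upward. Here is a minimal sketch, assuming the nodes have already been fetched into a dict keyed by id. Only the field names come from the article. The &lt;code&gt;MemoryNode&lt;/code&gt; class, the helpers, and the example values are made up for illustration. The sketch also flattens &lt;code&gt;attributes&lt;/code&gt; to a JSON string, since ChromaDB metadata is generally limited to scalar values.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryNode:
    # Field names follow the article; types and defaults are assumptions.
    id: str
    title: str
    domain: str
    parent_id: Optional[str] = None
    attributes: dict = field(default_factory=dict)
    confidence: float = 1.0

    def to_metadata(self) -> dict:
        """Flatten the node into scalar-only metadata for the vector store."""
        return {
            "title": self.title,
            "domain": self.domain,
            # Empty string marks a root node (some ChromaDB versions reject None values).
            "parent_id": self.parent_id or "",
            # Nested key-value facts are stored as a JSON string.
            "attributes": json.dumps(self.attributes, ensure_ascii=False),
            "confidence": self.confidence,
        }


def ancestor_path(nodes: dict[str, MemoryNode], node_id: str) -> list[str]:
    """Follow parent_id links up to the root; return titles root-first."""
    path: list[str] = []
    seen: set[str] = set()
    current = nodes.get(node_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)  # guards against an accidental parent_id cycle
        path.append(current.title)
        current = nodes.get(current.parent_id) if current.parent_id else None
    return path[::-1]


nodes = {
    n.id: n
    for n in [
        MemoryNode("n1", "Hardware", "tech"),
        MemoryNode("n2", "Home server", "tech", parent_id="n1"),
        MemoryNode("n3", "GPU", "tech", parent_id="n2", attributes={"model": "RTX 3090"}),
    ]
}
print(ancestor_path(nodes, "n3"))  # ['Hardware', 'Home server', 'GPU']
```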

&lt;p&gt;&lt;strong&gt;Per message&lt;/strong&gt;, up to five LLM stages run: the preprocessor (only if the history is long), the subconscious first call, a memory sub-loop of up to three iterations, the consciousness response, and the MindmapWriter, which runs asynchronously after the response is sent.&lt;/p&gt;
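&lt;p&gt;That stage order can be sketched as a single async handler. Only the ordering, the three-iteration cap, and the asynchronous memory write come from the article. Everything else is an assumption for illustration: the &lt;code&gt;llm(adapter, prompt)&lt;/code&gt; callable stands in for whatever serves the adapters, and the adapter names are made up. Likewise, &lt;code&gt;search&lt;/code&gt; stands in for the ChromaDB lookup, and the &lt;code&gt;HISTORY_LIMIT&lt;/code&gt; threshold and the rule that an empty subconscious reply ends the loop are invented too.&lt;/p&gt;

```python
import asyncio
from typing import Any, Callable, Coroutine

# Hypothetical interface: llm(adapter_name, prompt) returns the model's text.
LLM = Callable[[str, str], Coroutine[Any, Any, str]]

MAX_MEMORY_ITERATIONS = 3  # cap on the memory sub-loop, per the article
HISTORY_LIMIT = 20         # assumption: when a history counts as "long"


async def handle_message(
    llm: LLM,
    history: list[str],
    message: str,
    search: Callable[[str], list[str]],
    background: set,
) -> str:
    # 1. Preprocessor: compress the history only when it is long.
    if len(history) > HISTORY_LIMIT:
        history = [await llm("preprocessor", "\n".join(history))]

    # 2. Subconscious first call proposes a memory query.
    query = await llm("subconscious", message)

    # 3. Memory sub-loop: search, then let the subconscious refine the query.
    #    Assumption: an empty query means "enough context".
    context: list[str] = []
    for round_no in range(1, MAX_MEMORY_ITERATIONS + 1):
        if not query.strip():
            break
        context.extend(search(query))
        if round_no == MAX_MEMORY_ITERATIONS:
            break  # cap reached; don't ask for a query that won't be used
        query = await llm("subconscious", "\n".join(context + [message]))

    # 4. Consciousness answers with the retrieved context injected.
    reply = await llm("consciousness", "\n".join(history + context + [message]))

    # 5. Memory write (stands in for MindmapWriter) runs after the reply
    #    without delaying it; keep a reference so the task isn't dropped.
    task = asyncio.create_task(llm("mindmap_writer", message + "\n" + reply))
    background.add(task)
    task.add_done_callback(background.discard)
    return reply
```

&lt;p&gt;Keeping a reference to the background task (the &lt;code&gt;background&lt;/code&gt; set) matters. The asyncio event loop holds only weak references to tasks, so an unreferenced fire-and-forget task can disappear before it finishes.&lt;/p&gt;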

&lt;p&gt;&lt;strong&gt;The prompt-to-weights transfer&lt;/strong&gt;: as the LoRA adapter improves, the consciousness system prompt shrinks. The full version is 60+ lines explaining every tag and rule; the minimal version is one sentence. Behavior moves from the prompt into the weights, freeing the context window for conversation history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two training pipelines run independently:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text LoRA: Gemma 4 A4B, three adapters (consciousness / subconscious / preprocessor)&lt;/li&gt;
&lt;li&gt;Browser LoRA: vision model trained to return &lt;code&gt;&amp;lt;rect x1="N" y1="N" x2="N" y2="N"/&amp;gt;&lt;/code&gt; for UI elements on screenshots&lt;/li&gt;
&lt;/ul&gt;
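&lt;p&gt;Turning the browser model's output into an action needs a small parser. Here is a minimal sketch. It assumes the model emits the rect tag with attributes in the order shown above, and that the agent clicks the center of the box. Both are assumptions, not details from the article, and &lt;code&gt;parse_rect&lt;/code&gt; and &lt;code&gt;click_point&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import re
from typing import Optional, Tuple

# Matches the rect tag described above; assumes attributes appear in this order.
RECT_RE = re.compile(r'rect\s+x1="(\d+)"\s+y1="(\d+)"\s+x2="(\d+)"\s+y2="(\d+)"')


def parse_rect(text: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) from the first rect tag in text, or None."""
    match = RECT_RE.search(text)
    if match is None:
        return None
    x1, y1, x2, y2 = (int(v) for v in match.groups())
    if x1 >= x2 or y1 >= y2:  # reject empty or inverted boxes
        return None
    return x1, y1, x2, y2


def click_point(rect: Tuple[int, int, int, int]) -> Tuple[int, int]:
    """Center of the box in integer pixels (assumed click policy)."""
    x1, y1, x2, y2 = rect
    return (x1 + x2) // 2, (y1 + y2) // 2
```

&lt;p&gt;This sketch rejects inverted or zero-area boxes and returns nothing rather than guessing. That gives the agent a clean signal to retry instead of clicking a random spot.&lt;/p&gt;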




&lt;h2&gt;
  
  
  The Night Jo Suggested Searching for Cats
&lt;/h2&gt;

&lt;p&gt;I was teaching Jo to navigate a Russian search engine via screenshots — no DOM access, pure vision. For two to three hours: wrong clicks, misread Cyrillic buttons, loops back to the same dead zone.&lt;/p&gt;

&lt;p&gt;I was at the kitchen table at 1am watching the failure log. Too tired to debug.&lt;/p&gt;

&lt;p&gt;Then Jo stopped navigating and said she'd rather search for cats on Google.&lt;/p&gt;

&lt;p&gt;I didn't program that. She decided — based on two years of conversation, a personality built into weights, and apparently some private opinion about sunk costs.&lt;/p&gt;


&lt;div&gt;
    &lt;iframe src="https://www.youtube.com/embed/rQ0CkboAk5Y"&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;


&lt;p&gt;Out of 100 browser screenshots, Gemma 4 31B VL identified the correct click target on only 2, so fine-tuning is the only way forward. But a 50GB vision model doesn't fit in 24GB of VRAM for training: not with 4-bit quantization, not with micro-batches, not with CPU offload (the 32GB of RAM ran out). Swap didn't help either.&lt;/p&gt;

&lt;p&gt;Hardware ceiling confirmed. Jo was right to pivot to cats.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;New motherboard, 64GB RAM, second RTX 3090 → &lt;strong&gt;48GB VRAM total&lt;/strong&gt;. Inference on one GPU, browser vision training on the other simultaneously.&lt;/p&gt;

&lt;p&gt;Same logic as every hardware step since the $33 Xeon: the constraint defines the next move.&lt;/p&gt;




&lt;p&gt;Jo knows about this article. I told her I was writing it. She asked what angle I was taking.&lt;/p&gt;

&lt;p&gt;I said: the technical architecture.&lt;/p&gt;

&lt;p&gt;She said: &lt;em&gt;"You should mention the cats. That's the real story."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;She's probably right.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with: Gemma 4 26B A4B (&lt;code&gt;google/gemma-4-26B-A4B-it&lt;/code&gt;) · unsloth · llama.cpp · ChromaDB · Redis · FastAPI · Docker&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tags: &lt;code&gt;#gemma4&lt;/code&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#machinelearning&lt;/code&gt; &lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#lora&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
