<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kajal Rawat</title>
    <description>The latest articles on DEV Community by Kajal Rawat (@kajal_rawat_3482ea50f7bf9).</description>
    <link>https://dev.to/kajal_rawat_3482ea50f7bf9</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3898250%2Fa411a25d-3e65-4dbf-b428-75261388e29c.jpg</url>
      <title>DEV Community: Kajal Rawat</title>
      <link>https://dev.to/kajal_rawat_3482ea50f7bf9</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kajal_rawat_3482ea50f7bf9"/>
    <language>en</language>
    <item>
      <title>Why Your Gemma 4 Fine-Tuning is Failing (and How to Fix It)</title>
      <dc:creator>Kajal Rawat</dc:creator>
      <pubDate>Thu, 07 May 2026 20:35:45 +0000</pubDate>
      <link>https://dev.to/kajal_rawat_3482ea50f7bf9/why-your-gemma-4-fine-tuning-is-failing-and-how-to-fix-it-ppo</link>
      <guid>https://dev.to/kajal_rawat_3482ea50f7bf9/why-your-gemma-4-fine-tuning-is-failing-and-how-to-fix-it-ppo</guid>
      <description>&lt;p&gt;TL;DR: Gemma 4 is a multimodal beast with an Apache 2.0 license, but its new &lt;code&gt;ClippableLinear&lt;/code&gt; layers and dynamic image tokens will break standard LoRA scripts. Use &lt;code&gt;target_modules="all-linear"&lt;/code&gt; and backward-search masking to hit 94%+ accuracy on Cloud Run.&lt;/p&gt;

&lt;p&gt;Gemma 4 has officially landed, and with an Apache 2.0 license and a 256K context window, it’s the new king of open-weight models. But if you try to drop it into your old Gemma 3 or Llama scripts, it will fail.&lt;/p&gt;

&lt;p&gt;I’ve been deep-diving into the architecture, and there are three specific "under-the-hood" changes that will break your pipeline if you aren't careful. Here is how to master Gemma 4 using Cloud Run Jobs and NVIDIA RTX 6000 Pro GPUs.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The "ClippableLinear" Gotcha ⚠️
&lt;/h2&gt;

&lt;p&gt;Gemma 4 uses a new custom layer wrapper called &lt;code&gt;Gemma4ClippableLinear&lt;/code&gt;. This is a genius move for stability—it clips activations to prevent the loss from exploding during long-context training.&lt;/p&gt;

&lt;p&gt;The Problem: Standard LoRA often tries to attach directly to the inner weights, bypassing the clipping logic. This leads to "unstable loss" or "NaN" errors.&lt;br&gt;
The Fix: Use &lt;code&gt;target_modules="all-linear"&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Pro-Tip: Instead of being surgical, go broad. This recursively wraps the layers without breaking the clipping logic and ensures the vision tower is updated alongside the language backbone.&lt;/p&gt;
&lt;/blockquote&gt;
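
&lt;p&gt;Here is a minimal PEFT sketch of that fix. It assumes the Gemma 4 checkpoint is already loaded as &lt;code&gt;model&lt;/code&gt;, and the rank, alpha, and &lt;code&gt;task_type&lt;/code&gt; values are placeholders rather than official settings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from peft import LoraConfig, get_peft_model

# Placeholder hyperparameters -- tune rank/alpha to your VRAM budget.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # wrap every linear layer, clipping wrapper included
    task_type="CAUSAL_LM",        # assumption; adjust if the multimodal wrapper expects a different task type
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
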
&lt;h2&gt;
  
  
  2. Multimodal Label Masking (The Precision Secret)
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is hyper-efficient with media. It uses a dynamic number of soft tokens for images. This means you can’t simply calculate prompt length by tokenizing the text alone—the image tokens will shift your alignment.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Strategy: Backward-Search Collation
&lt;/h3&gt;

&lt;p&gt;Don't calculate; search. In your data collator, scan the &lt;code&gt;input_ids&lt;/code&gt; array backward for the last &lt;code&gt;&amp;lt;|turn&amp;gt;&lt;/code&gt; token (the one that opens the assistant turn) and mask everything before it. Because the search runs from the end, the alignment holds no matter how many image tokens were inserted, so you aren't accidentally training the model on your own prompts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Use the Assistant turn marker as your masking anchor
# This ensures zero-alignment shift regardless of image token count.
&lt;/span&gt;&lt;span class="n"&gt;assistant_start_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert_tokens_to_ids&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;|turn&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
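
&lt;p&gt;Building on that anchor, here is a minimal masking sketch. It's an illustration under my own assumptions (one example at a time, &lt;code&gt;-100&lt;/code&gt; as the ignore index, and the last &lt;code&gt;&amp;lt;|turn&amp;gt;&lt;/code&gt; opening the assistant turn), not an official recipe:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

def mask_prompt_tokens(input_ids, assistant_start_token):
    """Mask everything up to the last assistant-turn marker so the loss
    only sees the answer, never the prompt or the image tokens."""
    labels = input_ids.clone()
    # Backward search: take the final occurrence of the turn marker.
    positions = (input_ids == assistant_start_token).nonzero(as_tuple=True)[0]
    last_turn = positions[-1].item()  # assumes the marker is present
    labels[: last_turn + 1] = -100    # -100 is ignored by the cross-entropy loss
    return labels

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In a batched collator you would apply this per row before padding; the backward search itself stays the same.&lt;/p&gt;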



&lt;h2&gt;
  
  
  3. The Power of "Serverless" Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;Using Cloud Run Jobs with NVIDIA RTX 6000 Pro (96GB VRAM) is the "cheat code" for independent devs. You get 96GB of GDDR7, which is enough to run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 31B (Dense) via QLoRA (4-bit); see the loading sketch after this list.&lt;/li&gt;
&lt;li&gt;Base footprint: ~18-20GB.&lt;/li&gt;
&lt;li&gt;The rest: massive headroom for high-resolution images or long-context video frames.&lt;/li&gt;
&lt;/ul&gt;
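
&lt;p&gt;Here is a minimal 4-bit loading sketch for that setup. The quantization settings are my assumptions (standard QLoRA defaults via &lt;code&gt;bitsandbytes&lt;/code&gt;), and the resulting &lt;code&gt;model_kwargs&lt;/code&gt; dict is what feeds the &lt;code&gt;from_pretrained&lt;/code&gt; call in the migration guide below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
from transformers import BitsAndBytesConfig

# Assumed QLoRA settings -- roughly what squeezes a ~30B dense model
# into the 18-20GB base footprint quoted above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_kwargs = {
    "quantization_config": bnb_config,
    "torch_dtype": torch.bfloat16,
    "device_map": "auto",
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
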

&lt;h3&gt;
  
  
  Results Breakdown (Oxford-IIIT Pet Dataset)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Training Samples&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 3 Baseline&lt;/td&gt;
&lt;td&gt;4,000&lt;/td&gt;
&lt;td&gt;67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 Baseline&lt;/td&gt;
&lt;td&gt;4,000&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 (Fine-tuned)&lt;/td&gt;
&lt;td&gt;4,000&lt;/td&gt;
&lt;td&gt;94.2% (SOTA)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  🛠 Quick-Start Migration Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Load the Correct Class
&lt;/h3&gt;

&lt;p&gt;Forget &lt;code&gt;AutoModelForCausalLM&lt;/code&gt;. Gemma 4 is multimodal by design.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForMultimodalLM&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForMultimodalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Image-First Prompting
&lt;/h3&gt;

&lt;p&gt;Gemma 4 prefers a stable convention: Image data must come before text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; 
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Analyze this pet breed."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
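
&lt;p&gt;If you let the processor build the prompt, you never have to hand-count image tokens. This is a sketch under my own assumptions: &lt;code&gt;AutoProcessor&lt;/code&gt; exposes a chat template (as it does for Gemma 3), &lt;code&gt;pet_image&lt;/code&gt; is a PIL image, and &lt;code&gt;model_id&lt;/code&gt; is the checkpoint path used earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": pet_image},                # image first
        {"type": "text", "text": "Analyze this pet breed."},  # text second
    ],
}]

# The chat template expands the dynamic image tokens for us.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
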



&lt;h3&gt;
  
  
  3. Deploy to Cloud Run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud beta run &lt;span class="nb"&gt;jobs &lt;/span&gt;execute gemma4-finetuning-job &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; europe-west4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-type&lt;/span&gt; nvidia-rtx-pro-6000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--model-id"&lt;/span&gt;,&lt;span class="s2"&gt;"/mnt/gcs/gemma-4-31b-it/"&lt;/span&gt;,&lt;span class="s2"&gt;"--train-size"&lt;/span&gt;,&lt;span class="s2"&gt;"4000"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Gemma 4 isn't just an "upgrade"—the 26B MoE variant and the 31B Dense model are redefining what "open-weight" means. By moving to an &lt;code&gt;all-linear&lt;/code&gt; LoRA approach and leveraging serverless Blackwell GPUs, we can achieve SOTA results in hours, not days.&lt;/p&gt;

&lt;p&gt;What are you building with the new 256K context window? Let’s discuss in the comments! 👇&lt;/p&gt;

&lt;p&gt;#gemma #googlecloud #machinelearning #opensource #ai #cloudrun&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
