<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Olivier Lacombe</title>
    <description>The latest articles on DEV Community by Olivier Lacombe (@olivier_lacombe).</description>
    <link>https://dev.to/olivier_lacombe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3294575%2F21b02150-1e5e-4b0e-8666-02cae6d0a7c9.jpg</url>
      <title>DEV Community: Olivier Lacombe</title>
      <link>https://dev.to/olivier_lacombe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/olivier_lacombe"/>
    <language>en</language>
    <item>
      <title>Introducing Gemma 4 12B: a unified, encoder-free multimodal model</title>
      <dc:creator>Olivier Lacombe</dc:creator>
      <pubDate>Fri, 05 Jun 2026 16:51:47 +0000</pubDate>
      <link>https://dev.to/googleai/introducing-gemma-4-12b-a-unified-encoder-free-multimodal-model-3ge5</link>
      <guid>https://dev.to/googleai/introducing-gemma-4-12b-a-unified-encoder-free-multimodal-model-3ge5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Gemma 4 12B is designed to bring high-performance multimodal intelligence directly to your laptop, combining mobile-first efficiency with advanced reasoning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today, we are introducing Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops. Bridging the gap between our edge-friendly E4B and our more advanced 26B Mixture of Experts (MoE), Gemma 4 12B packages powerful capabilities inside a reduced memory footprint. It is also our first mid-sized model to feature native audio inputs.&lt;/p&gt;

&lt;p&gt;Thanks to the developer community, &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/" rel="noopener noreferrer"&gt;Gemma 4&lt;/a&gt; models have now crossed 150 million downloads. You've built everything from &lt;a href="https://www.youtube.com/watch?v=OhaIA3bYwmg" rel="noopener noreferrer"&gt;wearable robotic arms&lt;/a&gt; for physical assistance to &lt;a href="https://deepmind.google/models/gemma/gemmaverse/hirundo/" rel="noopener noreferrer"&gt;enterprise-grade AI security&lt;/a&gt;. We're excited to see what you build with this latest addition.&lt;/p&gt;

&lt;p&gt;Here's an overview of what makes Gemma 4 12B unique:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Novel unified architecture:&lt;/strong&gt; No multimodal encoders. The vision and audio inputs flow directly into the LLM backbone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced reasoning:&lt;/strong&gt; Benchmark performance nearing our 26B model, unlocking powerful multi-step reasoning and agentic workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Laptop ready:&lt;/strong&gt; Small enough to run locally with just 16GB of VRAM or unified memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open and accessible:&lt;/strong&gt; Released under an Apache 2.0 license with support across the developer ecosystem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drafter-ready:&lt;/strong&gt; Gemma 4 12B comes equipped with Multi-Token Prediction (MTP) drafters to reduce latency.&lt;/li&gt;
&lt;/ul&gt;
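
&lt;p&gt;To make the drafter idea concrete, here is a minimal sketch of greedy draft-and-verify decoding, the general pattern that drafters plug into. It is a toy illustration, not Gemma 4 12B's MTP implementation: &lt;code&gt;target_next&lt;/code&gt; and &lt;code&gt;drafter_next&lt;/code&gt; are hypothetical stand-ins for the large model and the drafter, and the bonus token that real systems emit after a fully accepted draft is left out for brevity.&lt;/p&gt;

```python
def target_next(tokens):
    """Toy 'large model': the next token is the sum of the last two, modulo 10."""
    return (tokens[-1] + tokens[-2]) % 10


def drafter_next(tokens):
    """Toy drafter (hypothetical): agrees with the target unless the last token is 5."""
    if tokens[-1] == 5:
        return 0
    return (tokens[-1] + tokens[-2]) % 10


def greedy_generate(prompt, n_new):
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_generate(prompt, n_new, k=4):
    """Greedy draft-and-verify; the output matches plain greedy decoding."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(tokens) != goal:
        budget = min(k, goal - len(tokens))
        # 1. The drafter cheaply proposes up to `budget` tokens.
        draft = []
        for _ in range(budget):
            draft.append(drafter_next(tokens + draft))
        # 2. The target checks the draft; a real system scores all
        #    drafted positions in one batched forward pass.
        verify_passes += 1
        for proposed in draft:
            expected = target_next(tokens)
            tokens.append(expected)  # the target's token is what gets emitted
            if proposed != expected:
                break  # first mismatch: discard the rest of the draft
    return tokens, verify_passes


out, passes = speculative_generate([1, 1], n_new=12)
print(out[2:], passes)
```

&lt;p&gt;In this toy run the 12 generated tokens are identical to plain greedy decoding, but they take 4 verification passes instead of 12 sequential target calls. The latency saving comes from the target confirming several drafted tokens per pass.&lt;/p&gt;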

&lt;p&gt;Together, these features bring advanced multimodal capabilities to everyday hardware without sacrificing speed or reasoning. Let's now take a closer look at how Gemma 4 12B achieves this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run state-of-the-art agents locally
&lt;/h2&gt;

&lt;p&gt;Gemma 4 12B delivers performance nearing our larger 26B MoE model on standard benchmarks, at less than half the total memory footprint. Small enough to run locally on consumer laptops with 16GB of VRAM or unified memory, it unlocks powerful multimodal and agentic experiences right on your machine.&lt;/p&gt;
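
&lt;p&gt;As a rough back-of-the-envelope check on the 16GB figure, the sketch below estimates the memory needed for the weights alone at a few common precisions. It assumes a round 12 billion parameters and ignores the KV cache, activations, and runtime overhead. This post does not state which precision the 16GB figure refers to, so treat the numbers as illustrative only.&lt;/p&gt;

```python
# Assumption: "12B" is taken as exactly 12e9 parameters (a round figure).
PARAMS = 12_000_000_000


def weight_gigabytes(params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: {weight_gigabytes(PARAMS, bits):.1f} GB")
```

&lt;p&gt;Under these assumptions, 16-bit weights alone (24 GB) would not fit in 16GB, while 8-bit (12 GB) and 4-bit (6 GB) weights would, which suggests that a quantized checkpoint is the practical choice on a laptop.&lt;/p&gt;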

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ddtwuug5uwrkzwlakig.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ddtwuug5uwrkzwlakig.webp" alt="Gemma 4 12B Benchmark" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Experience a uniquely efficient, unified architecture
&lt;/h2&gt;

&lt;p&gt;What makes Gemma 4 12B stand out is its streamlined approach to processing visual and audio inputs. Traditional multimodal models typically rely on separate encoders to translate images and audio before passing those representations to the language model. Because these split encoders add latency and increase memory usage, we trained Gemma 4 12B with an encoder-free architecture to integrate audio and vision input directly.&lt;/p&gt;

&lt;p&gt;Here is how Gemma 4 12B processes multimodal inputs natively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vision:&lt;/strong&gt; We replaced Gemma 4's vision encoder with a lightweight embedding module consisting of a single matrix multiplication, positional embedding and normalizations. This allows the LLM backbone to take over visual processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio:&lt;/strong&gt; We simplified audio processing even further. We removed the audio encoder entirely and projected the raw audio signal into the same dimensional space as text tokens.&lt;/li&gt;
&lt;/ul&gt;
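
&lt;p&gt;The two bullets above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the actual Gemma 4 12B code: the hidden size, patch size, audio frame length, weight shapes, and the choice of RMS normalization before and after the projection are all hypothetical, and the weights are random.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 64   # hypothetical hidden size of the LLM backbone
PATCH = 8      # hypothetical square patch size in pixels
FRAME = 160    # hypothetical audio frame length in samples


def rms_norm(x, eps=1e-6):
    """Scale each row to unit root-mean-square."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def embed_image(image, w_patch, pos_emb):
    """Cut an (H, W, C) image into patches, then apply one matmul, positions, and norms."""
    h, w, c = image.shape
    rows, cols = h // PATCH, w // PATCH
    patches = image[: rows * PATCH, : cols * PATCH].reshape(rows, PATCH, cols, PATCH, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(rows * cols, PATCH * PATCH * c)
    tokens = rms_norm(patches) @ w_patch  # the single matrix multiplication
    return rms_norm(tokens + pos_emb[: rows * cols])


def embed_audio(waveform, w_audio):
    """Split a raw 1-D waveform into fixed-size frames and project each frame linearly."""
    n_frames = len(waveform) // FRAME
    frames = waveform[: n_frames * FRAME].reshape(n_frames, FRAME)
    return frames @ w_audio


# Random stand-ins for learned parameters.
w_patch = rng.normal(size=(PATCH * PATCH * 3, D_MODEL)) * 0.02
pos_emb = rng.normal(size=(256, D_MODEL)) * 0.02
w_audio = rng.normal(size=(FRAME, D_MODEL)) * 0.02

image_tokens = embed_image(rng.random((32, 48, 3)), w_patch, pos_emb)
audio_tokens = embed_audio(rng.normal(size=16000), w_audio)
text_tokens = rng.normal(size=(5, D_MODEL))  # stand-in for text token embeddings

# All three modalities share one width, so they form a single input sequence.
sequence = np.concatenate([image_tokens, audio_tokens, text_tokens], axis=0)
print(image_tokens.shape, audio_tokens.shape, sequence.shape)
```

&lt;p&gt;The point of the sketch is that no modality passes through a separate deep encoder stack. Each input is reshaped, projected once into the token width, and handed to the backbone, which does the remaining processing.&lt;/p&gt;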

&lt;p&gt;For a deeper technical breakdown, head over to our companion Gemma 4 12B &lt;a href="https://developers.googleblog.com/gemma-4-12b-the-developer-guide/" rel="noopener noreferrer"&gt;Developer Guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt; &lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/Q5a7dAREbXM"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;



&lt;center&gt;&lt;small&gt;See native audio processing in action: Watch Gemma 4 12B transcribe, format, and translate voice inputs entirely offline using the Google AI Edge Eloquent app.&lt;/small&gt;&lt;/center&gt;




&lt;h2&gt;
  
  
  Get started today
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Try it yourself&lt;/strong&gt;: Experiment with a couple of clicks in &lt;a href="https://lmstudio.ai/models/gemma-4" rel="noopener noreferrer"&gt;LM Studio&lt;/a&gt;, &lt;a href="https://ollama.com/library/gemma4" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, the &lt;a href="https://developers.google.com/edge/gallery" rel="noopener noreferrer"&gt;Google AI Edge Gallery App&lt;/a&gt;, the &lt;a href="https://ai.google.dev/edge/eloquent" rel="noopener noreferrer"&gt;Google AI Edge Eloquent&lt;/a&gt; app, and the &lt;a href="https://ai.google.dev/edge/litert-lm/cli" rel="noopener noreferrer"&gt;LiteRT-LM CLI&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Download the weights&lt;/strong&gt;: Download the pre-trained and instruction-tuned checkpoints directly from &lt;a href="https://huggingface.co/collections/google/gemma-4" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; and &lt;a href="https://www.kaggle.com/models/google/gemma-4" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate &amp;amp; learn:&lt;/strong&gt; Review the &lt;a href="https://ai.google.dev/gemma/docs/core" rel="noopener noreferrer"&gt;developer documentation&lt;/a&gt; and the &lt;a href="https://ai.google.dev/gemma/docs/capabilities/text/basic" rel="noopener noreferrer"&gt;quick start notebook&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use your favorite development tools&lt;/strong&gt;: Implement local inference pipelines with &lt;a href="https://huggingface.co/google/gemma-4-12B-it" rel="noopener noreferrer"&gt;Hugging Face Transformers&lt;/a&gt;, &lt;a href="https://huggingface.co/collections/ggml-org/gemma-4" rel="noopener noreferrer"&gt;llama.cpp&lt;/a&gt;, &lt;a href="https://huggingface.co/collections/mlx-community/gemma-4" rel="noopener noreferrer"&gt;MLX&lt;/a&gt;, &lt;a href="https://docs.sglang.io/cookbook/autoregressive/Google/Gemma4" rel="noopener noreferrer"&gt;SGLang&lt;/a&gt;, and &lt;a href="https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;, or fine-tune with efficiency using &lt;a href="https://unsloth.ai/docs/models/gemma-4" rel="noopener noreferrer"&gt;Unsloth&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unlock agentic development with Gemma Skills:&lt;/strong&gt; To help agents build with the latest Gemma advancements, we are releasing our official &lt;a href="https://github.com/google-gemma/gemma-skills" rel="noopener noreferrer"&gt;Skills Repository&lt;/a&gt;, a library of skills designed specifically for building with Gemma models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy your way:&lt;/strong&gt; Spin up production endpoints on Google Cloud through &lt;a href="https://console.cloud.google.com/agent-platform/publishers/google/model-garden/gemma4;publisherModelVersion=gemma-4-12b-it" rel="noopener noreferrer"&gt;Gemini Enterprise Agent Platform Model Garden&lt;/a&gt;, &lt;a href="https://codelabs.developers.google.com/codelabs/cloud-run/cloud-run-gpu-rtx-pro-6000-gemma4-vllm" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;, and &lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-vllm" rel="noopener noreferrer"&gt;GKE&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>gemma</category>
      <category>google</category>
    </item>
  </channel>
</rss>
