Gemma 4: Google's Open-Weight AI That Actually Runs on Your Machine
#ai #machinelearning #opensource #gemma
If you've been watching the open-weight AI space, April 2025 was a big month. Google dropped Gemma 4 — and it's not just another incremental update. It's the most capable open model family Google has shipped yet, and it comes with something developers have been waiting for: native audio and vision, right out of the box.
Let's break down what's actually new, what it means for developers, and whether it's worth your attention.
What Is Gemma 4?
Gemma 4 is Google DeepMind's fourth-generation family of open-weight language models, released under the Apache 2.0 license. That means you can download the weights, fine-tune them, and deploy them commercially — no licensing fees, no usage restrictions, no vendor lock-in.
The family spans four sizes:
| Model | Architecture | Best For |
|---|---|---|
| E2B | Dense (effective 2B) | Mobile / browser (Pixel, Chrome) |
| E4B | Dense (effective 4B) | Edge / on-device |
| 26B A4B | Mixture-of-Experts | High-throughput servers |
| 31B | Dense | Server-grade + local workstations |
The "E" in E2B/E4B stands for effective parameters — Google uses a technique called Per-Layer Embeddings (PLE) that squeezes more capability out of smaller parameter counts, making them unusually powerful for on-device use.
What's Actually New
🎙️ Native Multimodality (Audio + Vision)
Previous Gemma releases were text-only or had limited image support bolted on. Gemma 4 ships with native support for text, images (variable aspect ratio), video, and audio — with audio natively supported on the E2B and E4B models. This isn't a wrapper; it's baked into the architecture.
🧠 Built-in Thinking Mode
All Gemma 4 models support configurable reasoning/thinking modes — the model can think step-by-step before answering. This is a big deal for tasks like math, code debugging, and agentic workflows where chain-of-thought makes a real difference.
📖 Massive Context Windows
- Small models (E2B, E4B): 128K token context
- Medium/large models (26B, 31B): 256K token context
That's enough to feed entire codebases, long documents, or multi-turn conversation histories in a single call.
🔧 Function Calling + Agentic Support
Gemma 4 includes native function calling and a system prompt role — meaning you can build proper tool-using agents without hacks. Google's own Agent Development Kit (ADK) has first-class Gemma 4 support if you want a framework to build on.
🌍 140+ Languages
The pre-training data covers more than 140 languages, with a knowledge cutoff of January 2025.
How Does It Compare to Llama 4?
Both dropped around the same time. Key differences:
- Architecture: Llama 4 uses MoE across the board for efficiency; Gemma 4 mixes dense and MoE depending on the size tier.
- Multimodality: Both support it natively; Gemma 4's audio support on small models is a notable edge for on-device use cases.
- License: Both Apache 2.0 — roughly equivalent freedom.
Neither is universally "better" — it depends on your task and deployment target.
Where Can You Run It?
- Locally: Hugging Face + Ollama + LM Studio all support Gemma 4 weights
- Cloud: Google Cloud Vertex AI (Model Garden), Cloud Run with NVIDIA Blackwell GPUs
- On-device: Pixel phones, Chrome browser (E2B/E4B)
- Fine-tuning: Vertex AI has an end-to-end guide for fine-tuning the 31B on TPUs
My Take
Gemma 4 is the first time I've felt like Google is genuinely competing in the open-weight space rather than just participating. The E4B hitting 128K context with native audio/vision on a phone is kind of wild when you think about it.
For developers, the Apache 2.0 license and the range of sizes mean you can prototype locally on a laptop with the 4B, then scale to the 26B MoE in production without changing your code. That workflow is actually practical now.
The built-in thinking mode and function calling make it a real candidate for agentic applications — not just chat. If you've been building with closed APIs for cost or capability reasons, Gemma 4 is worth a serious eval.
Get Started
What are you building with Gemma 4? Drop a comment — I'm especially curious if anyone's tried the audio features on-device yet. 👋
Top comments (0)