Puram Arjun

Posted on May 15

Gemma 4: Google's Open-Weight AI That Actually Runs on Your Machine

#webdev #programming #productivity #javascript

Gemma 4: Google's Open-Weight AI That Actually Runs on Your Machine

#ai #machinelearning #opensource #gemma

If you've been watching the open-weight AI space, April 2025 was a big month. Google dropped Gemma 4 — and it's not just another incremental update. It's the most capable open model family Google has shipped yet, and it comes with something developers have been waiting for: native audio and vision, right out of the box.

Let's break down what's actually new, what it means for developers, and whether it's worth your attention.

What Is Gemma 4?

Gemma 4 is Google DeepMind's fourth-generation family of open-weight language models, released under the Apache 2.0 license. That means you can download the weights, fine-tune them, and deploy them commercially — no licensing fees, no usage restrictions, no vendor lock-in.

The family spans four sizes:

Model	Architecture	Best For
E2B	Dense (effective 2B)	Mobile / browser (Pixel, Chrome)
E4B	Dense (effective 4B)	Edge / on-device
26B A4B	Mixture-of-Experts	High-throughput servers
31B	Dense	Server-grade + local workstations

The "E" in E2B/E4B stands for effective parameters — Google uses a technique called Per-Layer Embeddings (PLE) that squeezes more capability out of smaller parameter counts, making them unusually powerful for on-device use.

What's Actually New

🎙️ Native Multimodality (Audio + Vision)

Previous Gemma releases were text-only or had limited image support bolted on. Gemma 4 ships with native support for text, images (variable aspect ratio), video, and audio — with audio natively supported on the E2B and E4B models. This isn't a wrapper; it's baked into the architecture.

🧠 Built-in Thinking Mode

All Gemma 4 models support configurable reasoning/thinking modes — the model can think step-by-step before answering. This is a big deal for tasks like math, code debugging, and agentic workflows where chain-of-thought makes a real difference.

📖 Massive Context Windows

Small models (E2B, E4B): 128K token context
Medium/large models (26B, 31B): 256K token context

That's enough to feed entire codebases, long documents, or multi-turn conversation histories in a single call.

🔧 Function Calling + Agentic Support

Gemma 4 includes native function calling and a system prompt role — meaning you can build proper tool-using agents without hacks. Google's own Agent Development Kit (ADK) has first-class Gemma 4 support if you want a framework to build on.

🌍 140+ Languages

The pre-training data covers more than 140 languages, with a knowledge cutoff of January 2025.

How Does It Compare to Llama 4?

Both dropped around the same time. Key differences:

Architecture: Llama 4 uses MoE across the board for efficiency; Gemma 4 mixes dense and MoE depending on the size tier.
Multimodality: Both support it natively; Gemma 4's audio support on small models is a notable edge for on-device use cases.
License: Both Apache 2.0 — roughly equivalent freedom.

Neither is universally "better" — it depends on your task and deployment target.

Where Can You Run It?

Locally: Hugging Face + Ollama + LM Studio all support Gemma 4 weights
Cloud: Google Cloud Vertex AI (Model Garden), Cloud Run with NVIDIA Blackwell GPUs
On-device: Pixel phones, Chrome browser (E2B/E4B)
Fine-tuning: Vertex AI has an end-to-end guide for fine-tuning the 31B on TPUs

My Take

Gemma 4 is the first time I've felt like Google is genuinely competing in the open-weight space rather than just participating. The E4B hitting 128K context with native audio/vision on a phone is kind of wild when you think about it.

For developers, the Apache 2.0 license and the range of sizes mean you can prototype locally on a laptop with the 4B, then scale to the 26B MoE in production without changing your code. That workflow is actually practical now.

The built-in thinking mode and function calling make it a real candidate for agentic applications — not just chat. If you've been building with closed APIs for cost or capability reasons, Gemma 4 is worth a serious eval.

Get Started

What are you building with Gemma 4? Drop a comment — I'm especially curious if anyone's tried the audio features on-device yet. 👋

DEV Community

Gemma 4: Google's Open-Weight AI That Actually Runs on Your Machine

Gemma 4: Google's Open-Weight AI That Actually Runs on Your Machine

What Is Gemma 4?

What's Actually New

🎙️ Native Multimodality (Audio + Vision)

🧠 Built-in Thinking Mode

📖 Massive Context Windows

🔧 Function Calling + Agentic Support

🌍 140+ Languages

How Does It Compare to Llama 4?

Where Can You Run It?

My Take

Get Started

Top comments (0)