Audio ASR in 3 languages, image understanding, full-stack app generation, coding, and agentic behavior -- all running on a MacBook M4 Pro with 24GB RAM.
Interactive version with playable audio, live charts, and the working React app: gemma4-benchmark.pages.dev
Google just released Gemma 4 -- their new family of open-source multimodal models. Four sizes, Apache-2.0 licensed, supports text + image + audio.
I spent a day testing every variant. Real audio files. Real images. Code that has to compile and run. Here is my honest report.
The Gemma 4 Family
- E2B -- Dense 2.3B, Text/Image/Audio, 4 GB at 4-bit. Phones and edge.
- E4B -- Dense 4.5B, Text/Image/Audio, 5.5 GB at 4-bit. Laptops.
- 26B-A4B -- MoE 4B active/26B total, Text/Image, 16-18 GB at 4-bit.
- 31B -- Dense 31B, Text/Image, 17-20 GB at 4-bit. Maximum quality.
Speed Benchmarks
| Runtime | E2B | E4B | 26B-A4B | 31B |
|---|---|---|---|---|
| Ollama | 95 tok/s | 57 tok/s | ~2 tok/s (swapping) | won't fit |
| Unsloth MLX | 81 tok/s (3.6 GB) | 49 tok/s (5.6 GB) | -- | -- |

Ollama is 15-20% faster; Unsloth MLX uses about 40% less memory.
Audio ASR: 3 Languages
Tested via Ollama's OpenAI-compatible endpoint. Only E2B and E4B support audio input.
Listen to all test audio samples: Audio Player
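For reference, here is a minimal sketch of how these ASR requests were shaped. The `input_audio` content part follows the OpenAI chat-completions audio format; whether Ollama forwards it to `gemma4:e4b` exactly this way is an assumption based on this test setup, not documented behavior.

```python
import base64

def build_asr_request(audio_bytes, audio_format="wav",
                      prompt="Transcribe this audio verbatim."):
    """Build an OpenAI-style chat payload with inline base64 audio."""
    return {
        "model": "gemma4:e4b",  # only E2B/E4B accept audio
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "input_audio",
                 "input_audio": {
                     "data": base64.b64encode(audio_bytes).decode("ascii"),
                     "format": audio_format,
                 }},
            ],
        }],
    }

# With a local server running (ollama serve), send it like this:
# import requests
# payload = build_asr_request(open("english_sample.wav", "rb").read())
# r = requests.post("http://localhost:11434/v1/chat/completions", json=payload)
# print(r.json()["choices"][0]["message"]["content"])
```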
English ASR
E4B (1.0s): Perfect transcription. Every word correct with punctuation.
E2B (2.8s): Garbled -- missing words, no punctuation.
French ASR
E4B (1.6s): Perfect transcription with all French accents correct.
E2B (4.1s): Fragmented, missing most of the sentence.
Arabic ASR
E4B (6.0s): Perfect Arabic transcription -- every word correct.
E2B (6.0s): Garbled -- wrong words, disordered.
Speech Translation (E4B)
French to English: "All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience..."
Arabic to English: "Hello, I am an artificial intelligence model. Today we will test speech recognition in the Arabic language..."
E4B is dramatically better than E2B for audio across all 3 languages.
Image Understanding
Test 1: Thai Temple -- Landmark Identification
E4B (54 tok/s): Thailand, Bangkok, Wat Phra Kaew (Temple of the Emerald Buddha) within the Grand Palace.
E2B (88 tok/s): Thailand, Bangkok, Grand Palace (less specific).
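The image tests use the same endpoint with a vision-style payload. The data-URI `image_url` content part below is the standard OpenAI format; the model name and prompt mirror this setup and are assumptions, not official Gemma 4 documentation.

```python
import base64

def build_image_request(image_bytes, mime="image/jpeg",
                        prompt="Which country, city, and landmark is this?"):
    """Build an OpenAI-style vision payload with an inline data-URI image."""
    data_uri = (f"data:{mime};base64,"
                + base64.b64encode(image_bytes).decode("ascii"))
    return {
        "model": "gemma4:e4b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
    }
```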
Test 2: AI-Generated Tokyo + Japanese OCR
AI-generated with nano-banana / Gemini
Both models correctly read Japanese kanji: 新宿ラーメン通り (Shinjuku Ramen Street)
Test 3: Venice Seagull
E4B: "A magnificent seagull perches watchfully atop a sculpted pedestal. The backdrop is a rich study in contrasting architectural styles..."
Full-Stack App Generation
E4B generated a 155-line working React + Tailwind Task Manager:
Try it live: gemma4-benchmark.pages.dev/task_manager.html
E2B failed -- code fragments instead of single file.
Coding: Compile and Run
| Script | E2B | E4B |
|---|---|---|
| Fibonacci | PASS | PASS |
| Sieve of Eratosthenes | PASS | PASS |
| JSON processor | PASS | PASS |
| HTTP request | PASS | PASS |
| React single file | FAIL | PASS |
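For a sense of the difficulty level, here is a reference solution for the sieve task, of the kind both models produced (my own reconstruction, not verbatim model output):

```python
def sieve_of_eratosthenes(limit):
    """Return all primes <= limit using the classic sieve."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Mark multiples starting at p*p; smaller ones are already marked.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n in range(limit + 1) if is_prime[n]]

print(sieve_of_eratosthenes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```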
Agentic Multi-Step Reasoning
6-step blog platform design. Both completed 6/6 steps. E4B output was 57% longer with more detail.
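The multi-step test can be driven with a simple harness: each step is sent together with the full conversation so far, so the model builds on its own earlier answers. This is a hypothetical sketch; `ask` is a stand-in for the actual endpoint call, and the step prompts are illustrative.

```python
STEPS = [
    "Step 1: List the core entities of a blog platform.",
    "Step 2: Design the database schema for those entities.",
    # ... steps 3-6 elided
]

def run_agentic_task(steps, ask):
    """Feed steps sequentially, carrying the full history each time."""
    history = [{"role": "system",
                "content": "You are designing a blog platform step by step."}]
    replies = []
    for step in steps:
        history.append({"role": "user", "content": step})
        reply = ask(history)  # e.g. POST history to /v1/chat/completions
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies
```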
Why 26B Fails on 24GB
Community reports from r/LocalLLaMA suggest Gemma 4 has a KV cache memory issue (not verified on our hardware):
- 31B at 262K context: ~22GB just for KV cache (on top of model)
- Google did not adopt KV-reducing techniques from Qwen 3.5
- Workaround (llama.cpp-style flags): `--ctx-size 8192 --cache-type-k q4_0 --parallel 1`
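The community estimate can be sanity-checked with back-of-envelope arithmetic. The layer and head counts below are illustrative placeholders for a ~31B dense model, not published Gemma 4 config values:

```python
def kv_cache_gib(layers, kv_heads, head_dim, ctx_tokens, bytes_per_elem=2):
    """KV cache size in GiB: keys + values, all layers, full context."""
    total_bytes = 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_elem
    return total_bytes / 2**30

# Assumed config: 48 layers, 4 grouped-query KV heads of dim 128,
# fp16 cache (2 bytes/elem), 262K context.
print(kv_cache_gib(48, 4, 128, 262144))  # 24.0 GiB, near the ~22 GB report
# Quantizing the cache to 4-bit (q4_0) cuts the per-element cost ~4x.
```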
Official Benchmarks
(Google's published benchmark chart is reproduced in the interactive version: gemma4-benchmark.pages.dev)
Final Verdict
E4B -- The Sweet Spot -- 8.5/10
Perfect ASR in 3 languages. Working React app. Japanese OCR. 57 tok/s. 5.6 GB.
E2B -- Speed Demon -- 7/10
95 tok/s. 3.6 GB. Python works. Audio garbled. Failed complex HTML gen.
26B-A4B -- Heartbreaker -- 2/10 on 24GB
Amazing benchmarks (88.3% AIME). ~2 tok/s on 24GB. Needs 32GB+.
Quick Start
```shell
brew install ollama
ollama pull gemma4:e4b
ollama run gemma4:e4b
```
For a 24GB MacBook, `ollama run gemma4:e4b` is the answer.
Tested April 3, 2026. MacBook Pro M4 Pro, 24GB, macOS Sequoia.
Interactive version: gemma4-benchmark.pages.dev
Sources: Google Model Card | HuggingFace Blog | Ollama | Unsloth Guide