DEV Community

Cover image for I Ran Every Gemma 4 Model on My Home Lab. E4B Crushes E2B. Here's the Data.
Shane Castile
Shane Castile

Posted on

I Ran Every Gemma 4 Model on My Home Lab. E4B Crushes E2B. Here's the Data.

Gemma 4 Challenge: Write about Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Write About Gemma 4


Google released four Gemma 4 variants. Everyone's comparing them on synthetic benchmarks nobody actually cares about. I ran all four on my home lab hardware with real tasks. The results surprised me.

Test machine: Ryzen 7 5700X, RTX 1060 6GB, 32GB RAM. LM Studio, 4-bit quantization.


The Models

Model Effective Params 4-bit Size Architecture
E2B ~2.3B 1.5GB Dense
E4B ~4.5B 2.1GB Dense
26B MoE ~4B active / 26B total 13GB Mixture of Experts
31B ~31B 16GB Dense

Test 1: Vision — Book Spine Reading

Point a camera at a bookshelf. Can it read the titles?

Model Time Books Found Quality
E2B 83s 0 — returned "NONE" ❌ Can't read spines
E4B 25s 6 titles, correctly identified ✅ Reliable
26B MoE OOM on 12GB ❌ Doesn't fit
31B OOM on 12GB ❌ Doesn't fit

This is the whole story. For multimodal tasks, E2B is not a smaller version of E4B — it's a fundamentally less capable vision model. It couldn't read a single book spine. E4B found 6.

If you're building anything with images, E2B is not an option. Period.


Test 2: Text — Technical Explanation

"Explain TCP vs UDP in 3 sentences."

Model Time Tokens Speed Answer Quality
E2B 93s 256 (hit limit) 2.8 t/s Mediocre — rambling
E4B 20s 113 5.7 t/s Concise and accurate

E4B was 4.6x faster and produced a better answer in fewer tokens. This flips the "smaller = faster" assumption — E4B's reasoning is more efficient, so it finishes sooner.


Test 3: Structured Output — JSON Generation

"Return a JSON array of 10 programming languages with year created and creator."

Model Valid JSON? Correct fields? Time
E2B ✅ Yes ❌ 3/10 wrong years 45s
E4B ✅ Yes ✅ All correct 12s

E2B hallucinated creation dates. E4B nailed every one.


Test 4: Vision + Reasoning Shelfie Pipeline

The real test. Run my Shelfie app — detect books from a photo → enrich with metadata → generate recommendations.

Model Detection Enrichment Total Works?
E2B Found 0 books N/A
E4B 16 books, 106s 2 batches, 280s ~8 min
26B/31B OOM

Only E4B completes the full pipeline on consumer hardware. Eight minutes for a full shelf catalog with recommendations isn't instant — but it costs $0 and stays local.


The Memory Wall

Here's what "runs on consumer hardware" actually means for each model on my RTX 1060 6GB:

Model VRAM Needed (4-bit) Fits 12GB? Room for Context?
E2B ~1.5GB ✅ Yes ✅ Ton of room
E4B ~2.1GB ✅ Yes ✅ Plenty of room
26B MoE ~13GB ❌ No
31B ~16GB ❌ No

The two big models literally don't fit on a 3200-class GPU. You need a 3090 (24GB) minimum for 31B, and even then you'll have barely any context window left.

For reference, the 31B dense model requires ~800MB more VRAM per million tokens of context. That 24GB 3090? It fits the model plus maybe 30K context. Not the advertised 256K.


The Decision Tree I Wish I'd Had

Ask yourself these questions in order:

1. Does it need to process images?

  • Yes → E4B minimum. E2B's vision is unusably bad.
  • No → Continue to Q2.

2. Does it fit in 6GB VRAM?

  • Yes → E4B 4-bit (~2.1GB) gives you room for context.
  • No → E2B or you need a bigger GPU.

3. Is it a one-off task or a repeated workload?

  • One-off → Cloud API (OpenRouter free tier has E4B).
  • Repeated → Local E4B. No per-token cost.

4. Do you need maximum reasoning quality?

  • Yes → 31B dense, but you need 24GB+ VRAM.
  • No → E4B is fine. I honestly couldn't tell the difference on book identification.

The Brutal Truth

E2B is marketing. "Runs on your phone!" Yeah, and it can't read a book spine. The gap between E2B and E4B for multimodal tasks isn't incremental — it's the difference between "works" and "doesn't work."

E4B is the model that makes local AI actually useful. It fits on a 3060, runs vision tasks reliably, generates structured output, and is faster than E2B because it reasons more efficiently.

26B MoE and 31B are for people with server GPUs. If you have a 4090 or an A100, they're incredible. If you have a gaming GPU, they're paperweights.

I picked E4B for Shelfie and it was the right call. Sixteen books, full metadata, personalized recommendations — all running on my home lab for free.

E4B is the unsung hero of the Gemma 4 family. The benchmarks won't tell you this. Real usage will.


Try Shelfie: github.com/scastile/shelfie

Top comments (0)