I run five websites from Sydney and use AI models daily — for blog drafts, code fixes, SEO analysis, quick research. Most of my workflow runs on Claude Sonnet because it's consistent and doesn't need babysitting. So when Google dropped Gemma 4 on April 2, 2026 under Apache 2.0, I figured I'd stress-test it over a weekend before forming any opinions.
Short version: it's genuinely impressive in places, mildly annoying in others, and the license alone changes a lot of the math.
## What Gemma 4 Actually Is
Gemma 4 is Google's latest open-weight model family. Released April 2, 2026. Apache 2.0 license, which means you can use it commercially, modify it, redistribute it — no royalties, no restrictions on derivative works. That's meaningful.
The family ships in several sizes: 4B, 12B, 27B, and a new 96B variant. The 27B is the one most people will actually run locally (needs roughly 20GB VRAM in full precision, or 12GB quantized to Q4).
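As a rough sanity check on those VRAM figures, weight memory is approximately parameter count times bits per weight, divided by eight. This is a back-of-envelope sketch only — real footprints add KV cache and runtime overhead, and common quant formats like Q4_K_M average slightly more than 4 bits per weight:

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GB."""
    return params * bits_per_weight / 8 / 1e9

# 27B at 4-bit: weights alone land in the same ballpark as the quoted ~12GB
print(round(weight_memory_gb(27e9, 4), 1))  # 13.5
# 12B at 4-bit: why the 12B model fits on an 8GB card
print(round(weight_memory_gb(12e9, 4), 1))  # 6.0
```

The gap between the 13.5GB estimate and the quoted 12GB comes down to quantization format details; treat the formula as a ballpark, not a spec.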
It's multimodal — image understanding built in, not bolted on. And there's genuine agentic scaffolding baked into the instruction-tuned variants, meaning it handles multi-step tool use more coherently than Gemma 3 did.
## What I Actually Tested
### Test 1: Code generation for a Next.js component
I gave it a prompt I regularly use with Claude: build me a React component that fetches data from a Supabase table, handles loading/error states, and renders a responsive table.
Gemma 4 27B (via Ollama, quantized) produced working code on the second attempt. First attempt had a minor type error in TypeScript. Second attempt fixed it without me explaining what was wrong.
Claude Sonnet would have nailed this on the first try. But Claude costs money per token. Gemma 4 running locally costs electricity.
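For reference, "running locally" here just means talking to Ollama's HTTP API on the default port. A minimal sketch — note the `gemma4:27b` model tag is my guess at what the Ollama tag would look like (check `ollama list` for the real one), and this assumes a stock install listening on 11434:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    """Serialize a non-streaming generate request for Ollama's REST API."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama daemon and return the reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With an Ollama daemon running, you'd call something like:
#   generate("gemma4:27b", "Build a React component that fetches ...")
print(json.loads(build_request("gemma4:27b", "ping"))["model"])  # gemma4:27b
```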
### Test 2: Document analysis (multimodal)
I threw a screenshot of a GA4 analytics dashboard at it and asked it to summarize traffic trends. Gemma 4 read the numbers correctly but its interpretation was generic. It told me sessions were down 14% without offering any hypothesis about why. Claude tends to make inferences. Gemma 4 reports rather than reasons.
### Test 3: SEO content editing
I fed it a 1,200-word blog post and asked it to identify thin sections. This went better than expected. It flagged two genuinely weak paragraphs, suggested adding a comparison table, and offered three alternative headline options that were actually good.
## The Surprise (Good and Bad)
Good surprise: The 12B model is more capable than it has any right to be. I ran it on a machine with 8GB VRAM and it handled most single-turn tasks at a quality level I'd compare to GPT-3.5 era.
Bad surprise: Agentic tasks with multi-step tool use hit context length issues faster than expected. Around step four of a five-step workflow, it started losing track of earlier context.
Also: it's verbose by default. Ask it a yes/no question with any nuance and it writes three paragraphs.
## How It Compares
| | Gemma 4 27B | Claude Sonnet 4.5 | GPT-4o |
|---|---|---|---|
| Cost | Free (local) or ~$0.10 per M tokens via API | ~$3/$15 per M tokens | ~$2.50/$10 per M tokens |
| Context | 128K | 200K | 128K |
| Code quality | Good, 2nd attempt | Excellent, 1st attempt | Very good |
| License | Apache 2.0 (fully open) | Proprietary | Proprietary |
The license column is doing more work than it appears to. If you need AI costs that don't scale with usage, or on-prem deployment for compliance reasons, Gemma 4 is now a serious option.
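To make the cost column concrete, here's a quick break-even sketch. The monthly token volumes are purely illustrative (not my actual usage), applied to the per-million rates from the table:

```python
# Illustrative monthly volumes: 5M input tokens, 2M output tokens
IN_TOKENS_M, OUT_TOKENS_M = 5, 2

claude = IN_TOKENS_M * 3 + OUT_TOKENS_M * 15     # $3 in / $15 out per M tokens
gpt4o = IN_TOKENS_M * 2.5 + OUT_TOKENS_M * 10    # $2.50 in / $10 out per M tokens
gemma_api = (IN_TOKENS_M + OUT_TOKENS_M) * 0.10  # ~$0.10 per M tokens via API

print(f"Claude: ${claude}, GPT-4o: ${gpt4o}, Gemma 4 API: ${gemma_api:.2f}")
# Claude: $45, GPT-4o: $32.5, Gemma 4 API: $0.70
```

Running locally drops the marginal cost to roughly zero, which is why even modest workloads change the math.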
## Who Should Use This
Worth it if: you're self-hosting, have compliance requirements, want to run fine-tuning experiments, or are on a tight budget.
Stick with Claude/GPT if: you need top-tier multi-step reasoning or heavy document inference, or you don't want to manage infrastructure.
I'm not switching my main workflow off Claude. But I've moved quick classification tasks and a couple of internal scripts to a local Gemma 4 12B instance. That's probably $30-40/month in API calls I won't be making.
Not a revolution, but a genuine shift in what's viable to run without a credit card.