
Michael Egberts


Which Gemma 4 Variant Should Power Your MCP Agent?

This is a submission for the Gemma 4 Challenge: Write About Gemma 4


I’m writing this from a phone, on vacation. That’s not a flex — it’s the point.

I run an MCP server in production: WebsitePublisher.ai, 55+ tools, 9 AI platforms connected. This afternoon I opened Google AI Studio on my phone, selected Gemma 4 26B, gave it our tool schemas, and asked it to build a bakery website. It returned six structured tool calls. I executed them. The site went live.

No laptop. No terminal. No IDE. Just a phone, a model, and a protocol.

That experience crystallized something I’ve been thinking about: Gemma 4 doesn’t ship one model — it ships four, each sized for a different deployment reality. The question isn’t whether open-weight models can power MCP agents. It’s which Gemma 4 variant fits which part of your agent stack.


What is MCP?

Model Context Protocol is an open standard (JSON-RPC based) that lets AI models call external tools through a universal interface. Instead of each AI platform building proprietary integrations, you build one MCP server and every compatible AI can use it.

Our MCP server exposes tools like create_page, upload_asset, create_record, configure_form, and execute_integration. When a model connects, it can create web pages, manage structured data, handle form submissions, and trigger third-party services — all through standardized tool calls.
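
To make "standardized tool calls" concrete, here's a minimal sketch of what one call looks like on the wire. MCP requests are JSON-RPC 2.0 with a `tools/call` method; the `create_page` arguments shown are illustrative, not our exact schema.

```python
import json

# A minimal sketch of one MCP tool call: a JSON-RPC 2.0 request
# carrying the tool name and its arguments. The create_page
# arguments here are illustrative, not our production schema.
tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "create_page",
        "arguments": {
            "title": "Home",
            "html": "<h1>Welcome</h1>",
        },
    },
}

print(json.dumps(tool_call, indent=2))
```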

Any model that can produce structured output can be an MCP client. The question is: how well does it handle the work once connected?


The Gemma 4 Lineup

|               | E2B        | E4B    | 26B A4B               | 31B Dense  |
|---------------|------------|--------|-----------------------|------------|
| Active params | ~2B        | ~4B    | 3.8B (26B total, MoE) | 31B        |
| Context       | 128K       | 128K   | 256K                  | 256K       |
| Audio input   | Yes        | Yes    | No                    | No         |
| Min RAM       | ~4 GB      | ~8 GB  | ~16 GB                | ~24 GB     |
| Runs on       | Phone, RPi | Laptop | Dev workstation       | GPU server |

All Apache 2.0. No usage caps, no MAU thresholds.


The Test: Gemma 4 26B Builds a Website from a Phone

Here’s exactly what happened.

I opened Google AI Studio on my iPhone, selected Gemma 4 26B A4B IT, and pasted a prompt containing five of our MCP tool schemas (create_page, create_entity, create_record, configure_form, list_pages) along with this instruction:

“A user says: Build me a simple landing page for my bakery called ‘Golden Crust’. Include a short intro, three signature breads, and a contact form. Respond with the exact sequence of MCP tool calls.”

Gemma 4 returned six tool calls in valid JSON:

  1. create_entity — defined a “bread” data model with name and description fields
  2. create_record x3 — added Sourdough, French Baguette, and Honey Whole Wheat
  3. create_page — generated a full HTML landing page with inline CSS, product listings, and a contact form
  4. configure_form — set up the contact form with name, email, and message fields

Every tool call used the correct parameter structure. The ordering was logical: data model first, then records, then the page that references them, then the form configuration. The HTML included sensible styling, warm bakery colors, and properly structured sections.
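
For a sense of the shape, here's roughly what step 1 looked like. The field names are assumptions for illustration; our real `create_entity` schema differs in detail.

```python
# Illustrative reconstruction of the first call (create_entity).
# Field names are assumed for this sketch, not our production schema.
create_entity_call = {
    "name": "create_entity",
    "arguments": {
        "entity": "bread",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "description", "type": "string"},
        ],
    },
}
```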

I copied the tool calls into Claude (the AI assistant I use for MCP execution), ran them against our server, and the site went live at gemma-test.websitepublisher.ai.

Total time from prompt to live website: under 10 minutes. From a phone. Over 5G.


What This Test Actually Proves

Let me be precise about what this demonstrates and what it doesn’t.

It proves: Gemma 4 26B can parse MCP tool schemas, reason about task decomposition, produce correctly structured tool calls, and sequence them in a logical order — all without any fine-tuning on our specific tools. This is zero-shot tool use on a real production API.

It doesn’t prove: That Gemma 4 can handle a live MCP connection autonomously. In this test, I manually copied the tool calls and executed them. The model generated the plan; I was the middleware.
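
Concretely, I played the role a few lines of glue code would normally play. Here's a minimal sketch of that middleware, assuming a plain HTTP transport and a placeholder endpoint (a real MCP client also negotiates a session first):

```python
import requests

MCP_ENDPOINT = "https://example.com/mcp"  # placeholder, not our real URL

# Stand-ins for the calls Gemma 4 returned (abbreviated to two;
# arguments are illustrative, not the model's actual output).
model_output_calls = [
    {"name": "create_entity", "arguments": {"entity": "bread"}},
    {"name": "list_pages", "arguments": {}},
]

def execute_tool(call: dict, call_id: int) -> dict:
    """Wrap one model-generated call in JSON-RPC and POST it."""
    payload = {
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": call,
    }
    response = requests.post(MCP_ENDPOINT, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()

for i, call in enumerate(model_output_calls, start=1):
    print(call["name"], "->", execute_tool(call, call_id=i))
```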

That distinction matters, and it’s where the variant comparison gets interesting.


Mapping Variants to MCP Agent Roles

Based on running MCP across 9 AI platforms and watching models of every size class interact with our tools, here’s how I’d think about placing each Gemma 4 variant:

E2B: The Front Door

With ~2B active parameters, E2B fits as the trigger: the component that understands intent and dispatches a single tool call. A voice command on a phone — “publish my latest blog post” — parsed and routed to the right MCP tool. One intent, one call, one response.

The native audio input is the differentiator. For voice-triggered MCP agents on battery-constrained devices, this is the size class that makes sense.

Likely sweet spot: Single-tool dispatcher. Voice-triggered agent entry point.
Likely limitation: Multi-step chains where context from earlier calls matters.
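
As a sketch, the front door is one prompt and one parse: the small model's only job is to emit a single tool call. `small_model_json` is a placeholder for however you run E2B locally (llama.cpp, MLX, and so on), not a real API:

```python
import json

DISPATCH_PROMPT = """You route voice commands to exactly one MCP tool.
Available tools: create_page, upload_asset, list_pages.
Command: {command}
Reply with one JSON object: {{"name": ..., "arguments": {{...}}}}"""

def small_model_json(prompt: str) -> str:
    """Placeholder: bind this to your local E2B runtime of choice."""
    raise NotImplementedError

def dispatch(command: str) -> dict:
    # One intent in, one tool call out; no multi-step state to carry.
    raw = small_model_json(DISPATCH_PROMPT.format(command=command))
    return json.loads(raw)
```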

E4B: The Local Workhorse

This is where local MCP agents become genuinely useful: E4B runs on any modern laptop and handles single-step tool calls with good reliability.

Based on what I’ve seen in this parameter range, straightforward create-and-deploy loops work well. Where models this size hit their limits is in context-dependent sequences: “build a five-page site with consistent navigation” requires carrying shared state across multiple creation steps.

Likely sweet spot: Local development agent. Content creation. Moderate single-step tool calls.
Likely limitation: Multi-page orchestration requiring consistency across 4+ sequential calls.

26B A4B: The Efficiency Sweet Spot

This is the variant I tested. And it delivered.

Six sequential tool calls, all correctly structured, logically ordered, with a coherent HTML output that referenced the data model it had just created. That’s not trivial — it requires the model to hold its own plan in context and execute against it consistently.

The MoE architecture (activating only 3.8B parameters per token while drawing on 26B total) and the 256K context window make this variant particularly suited for MCP work. Tool schemas are large — our 55+ tools consume significant context before the model even starts reasoning. The 256K window gives comfortable headroom.
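
A rough budget check shows why that headroom matters. The ~400 tokens per schema is a guess for illustration, not a measurement of our actual schemas:

```python
# Back-of-envelope context budget for the 26B A4B variant.
TOOLS = 55
TOKENS_PER_SCHEMA = 400       # assumed average, for illustration only
CONTEXT_WINDOW = 256_000      # per the lineup table above

schema_budget = TOOLS * TOKENS_PER_SCHEMA   # 22,000 tokens
headroom = CONTEXT_WINDOW - schema_budget   # 234,000 tokens
print(f"Schemas: {schema_budget:,} tokens; headroom: {headroom:,} tokens")
```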

But the bakery test was deliberately simple. Our MCP server also exposes 13 e-commerce integrations — product catalogs, shopping carts, checkout flows, payment processing via Stripe or Mollie, invoice generation, inventory tracking, and more. Building a full webshop means orchestrating these proven software building blocks in sequence: the AI picks the right pieces and combines them into a working application.

We call this wave coding — not prompting and praying like vibe coding, but riding deliberate waves of AI-assembled, production-tested components. Each wave builds on the last. That’s where a model like the 26B earns its place: enough reasoning depth to orchestrate 6-8 integration calls reliably, enough context to hold the full picture.
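
Here's a sketch of what one wave looks like in code, reusing the `execute_tool` helper sketched earlier. The result fields (`entity_id`, `page_id`) and the `provider` argument are assumptions; the point is the threading, where each call consumes output from the one before:

```python
# One "wave": sequential integration calls, each building on the last.
# Result field names and arguments are assumed for this sketch.
def run_wave(execute_tool) -> None:
    catalog = execute_tool({"name": "create_entity",
                            "arguments": {"entity": "product"}})
    for name in ["Sourdough", "French Baguette", "Honey Whole Wheat"]:
        execute_tool({"name": "create_record",
                      "arguments": {"entity": "product",
                                    "values": {"name": name}}})
    # Later calls reference IDs returned by earlier ones:
    page = execute_tool({"name": "create_page",
                         "arguments": {"lists": [catalog["entity_id"]]}})
    execute_tool({"name": "execute_integration",
                  "arguments": {"provider": "stripe",
                                "page_id": page["page_id"]}})
```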

Proven sweet spot: Multi-step tool orchestration. Production agent server. The “right answer” for most self-hosted MCP deployments.
Likely limitation: Highly creative or ambiguous tasks where raw reasoning power matters more than efficiency.

31B Dense: The Precision Architect

Every token touches all 31B parameters — no routing, no sparsity. Slower, heavier, but the strongest reasoner in the family.

For MCP agent work, this class earns its compute in two scenarios: architecture-level planning where the sequence of tool calls matters as much as individual calls, and fine-tuning for domain-specific tool patterns. The dense architecture makes fine-tuning more predictable than MoE.

Where 31B pulls ahead of 26B is full wave coding sessions — building an entire webshop from brief to live, orchestrating 15+ sequential integration calls while maintaining consistency across product data, payment configuration, email templates, and frontend pages. That’s the kind of sustained, multi-layer orchestration where every additional parameter matters.

Likely sweet spot: Complex project planning. Full wave coding orchestration. Fine-tuned domain agents.
Likely limitation: Cost and latency. For tasks where 26B delivers equivalent results, you’re burning compute you don’t need.


What I Learned About Model Size and Tool Calling

From running MCP across 9 platforms, one pattern stands out: for simple tool calls, model size barely matters. A “create this page” request succeeds with roughly the same reliability across model classes.

Where model size becomes decisive is orchestration depth — the number of sequential, context-dependent tool calls a model can chain before losing coherence. At two to three calls, almost anything works. Past six calls, only the stronger reasoners maintain consistency.

Open-weight models give you something closed APIs never will: the ability to match model weight to task weight. Route simple status checks to E4B and complex builds to 31B. Your agent gets smarter and cheaper at the same time.
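
That routing can be as simple as a depth estimate in front of the model pool. A sketch, with thresholds matching the decision framework below (the estimator and the model identifier strings are assumptions):

```python
# Route a task to a Gemma 4 variant by estimated orchestration depth.
# estimate_call_depth() is a stand-in for your own heuristic or a
# lightweight classifier; the model names are illustrative.
def pick_variant(task: str, estimate_call_depth) -> str:
    depth = estimate_call_depth(task)
    if depth <= 3:
        return "gemma-4-e4b"      # fast, light, capable
    if depth <= 8:
        return "gemma-4-26b-a4b"  # the efficiency sweet spot
    return "gemma-4-31b"          # when orchestration quality justifies it
```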

That’s the real unlock of open-weight + MCP: you own both the brain and the hands.


The Decision Framework

Need voice or audio input?
If yes: E2B (phone/IoT) or E4B (laptop)

How many sequential tool calls per task?
1-3: E4B — fast, light, capable
4-8: 26B A4B — tested and proven
8+: 31B Dense — when orchestration quality justifies compute

Fine-tuning for a specific domain?
31B Dense — dense fine-tunes more predictably than MoE

Budget-constrained?
26B A4B. Almost always the answer.


What’s Next

While testing, I discovered that MCP Playground — an online tool for testing MCP servers — lists both Gemma 4 26B and 31B as available models. Our server connects and authenticates successfully. Once we resolve a token compatibility issue on our end, this will enable fully automated testing: type a prompt, Gemma 4 calls our MCP tools directly, website appears. No copy-paste middleware needed.

That’s the trajectory: from “model generates a plan I execute manually” to “model executes the plan autonomously through MCP.” Gemma 4’s native function calling support, combined with MCP’s standardized tool protocol, makes this path viable on fully open-source infrastructure.

If you want to start experimenting:

Pick the variant that fits your hardware. Connect it to a real MCP server. The benchmarks start mattering a lot less once you’re watching a model build something real.

Built and tested entirely from a phone. On vacation. Because that’s what open protocols and open models make possible.
