DEV Community: Manglesh

I Spent 3 Days Fighting Gemma 4's API So You Don't Have To: The Honest Developer Guide

Manglesh — Sun, 24 May 2026 18:17:55 +0000

This is a submission for the Gemma 4 Challenge:
Write About Gemma 4

The Honest Truth Nobody Tells You About Building With Gemma 4

I just spent 3 days building a full-stack app
(Bondmap — a relationship network mapper) with
Gemma 4 as its AI brain.

I hit every wall possible.

Wrong model names. Thinking mode leaking hundreds of words
of internal reasoning into my UI. System instructions being
analyzed instead of followed. 404s, 400s, and a lot of confusion.

This post is everything I wish I'd known on Day 1.

First — Which Gemma 4 Model Do You Actually Need?

The official docs list these variants:

Model ID	Size	Best For
`gemma-4-e2b-it`	2B	Edge, mobile, Raspberry Pi
`gemma-4-e4b-it`	4B	Browser, lightweight apps
`gemma-4-31b-it`	31B Dense	Server, complex reasoning ✅
`gemma-4-26b-a4b-it`	26B MoE	High throughput + thinking mode

The mistake I made: I used gemma-4-9b-it — a model
that doesn't exist. The API returned a 404 and I spent
an hour debugging the wrong thing.

Rule 1: Copy model names from the official docs exactly.
There is no 9B. There is no 7B. The four above are it.
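
One cheap guard against this kind of mistake is to validate the model ID before any request goes out. The sketch below hard-codes the four IDs from the table above. `assertKnownModel` is a hypothetical helper name, not part of any SDK, and the list would need updating if the official docs change.

```javascript
// The four model IDs listed in the table above (copied verbatim).
const GEMMA4_MODEL_IDS = new Set([
  "gemma-4-e2b-it",
  "gemma-4-e4b-it",
  "gemma-4-31b-it",
  "gemma-4-26b-a4b-it",
]);

// Hypothetical helper: throws early instead of letting the API return a 404.
function assertKnownModel(modelId) {
  if (!GEMMA4_MODEL_IDS.has(modelId)) {
    throw new Error(
      `Unknown model "${modelId}". Expected one of: ${[...GEMMA4_MODEL_IDS].join(", ")}`
    );
  }
  return modelId;
}
```

Calling `assertKnownModel("gemma-4-9b-it")` fails immediately with a message listing the valid names, which is a lot faster to debug than a 404.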

Setting Up The API (The Right Way)

Get your free API key at aistudio.google.com.
No credit card. No setup. Just a Google account.

The base endpoint:

https://generativelanguage.googleapis.com/v1beta/models/{MODEL_ID}:generateContent?key={YOUR_KEY}

Minimal working request:

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemma-4-31b-it:generateContent?key=${API_KEY}`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [{
        role: "user",
        parts: [{ text: "Hello Gemma 4!" }]
      }]
    })
  }
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);

That's literally all you need to get started.
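
The snippet above assumes the call succeeds. Since a wrong model name or a bad field comes back as a 404 or 400, it helps to check the status before reading `candidates`. A minimal sketch, assuming error responses carry a `{ error: { message } }` JSON body; `extractText` is a hypothetical helper name:

```javascript
// Hypothetical helper: turns a parsed generateContent response into text,
// or throws with the HTTP status so 404s and 400s are not silently ignored.
function extractText(status, data) {
  if (status < 200 || status >= 300) {
    // Assumption: error bodies look like { error: { message: "..." } }.
    const message = data?.error?.message ?? "no error message in response body";
    throw new Error(`Gemma API request failed (HTTP ${status}): ${message}`);
  }
  const parts = data?.candidates?.[0]?.content?.parts ?? [];
  // Join all text parts; a response may contain more than one.
  return parts.map((part) => part.text ?? "").join("");
}
```

With the request above, this would be used as `extractText(response.status, await response.json())`.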

The Biggest Gotcha: Thinking Mode

The gemma-4-26b-a4b-it model (the MoE variant) has
thinking mode enabled by default.

This means instead of responding with:

"Rahul is your 1st-degree connection — he's your
brother!"

It responds with several hundred words of internal reasoning like:

"* Persona: Bondmap AI. * Core Task: Explain connection.

Draft 1: Rahul is... * Tone Check: Warm? Yes.

Word Count: 68. Perfect. * Final answer: Rahul is..."

All of that showed up in my app's UI. Users would have
seen the model's entire thought process.

Two ways to fix this:

Fix 1 — Use gemma-4-31b-it instead
This model doesn't have thinking mode. Clean output
every time. This is what I ultimately switched to.

Fix 2 — Disable thinking budget (only for 26b-a4b)

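// In your generation config: thinkingConfig nests under generationConfig
Map<String, Object> thinkingConfig = new LinkedHashMap<>();
thinkingConfig.put("thinkingBudget", 0);
body.put("generationConfig", Map.of("thinkingConfig", thinkingConfig));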

Note: This only works on gemma-4-26b-a4b-it.
Using it on gemma-4-31b-it throws a 400 error.
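
For the JavaScript `fetch` request shown earlier, the same setting could look like the sketch below. It assumes `thinkingConfig` sits inside `generationConfig`, as it does in the Gemini API request format, and it only adds the field for the MoE model because, per the note above, the 31B model rejects it. `buildBody` is a hypothetical helper name.

```javascript
// Model that, per this post, has thinking mode on by default.
const THINKING_MODEL = "gemma-4-26b-a4b-it";

// Hypothetical helper: builds a generateContent body and disables thinking
// only for the model where the setting is accepted.
function buildBody(modelId, userText) {
  const body = {
    contents: [{ role: "user", parts: [{ text: userText }] }],
  };
  if (modelId === THINKING_MODEL) {
    // Assumption: thinkingConfig nests under generationConfig.
    body.generationConfig = { thinkingConfig: { thinkingBudget: 0 } };
  }
  return body;
}
```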

systemInstruction: Separate Field, Not Combined Text

This one took me a while. I was combining my system
prompt and user query into one message like this:

// ❌ WRONG — model treats instructions as conversation
userPart.put("text", systemPrompt + "\n\n" + userQuery);

The model would then analyze the instructions
instead of following them. It would respond to
"Keep responses under 100 words" as if it were a
question to answer.

The fix is to use systemInstruction as a
completely separate field in the request body:

// ✅ CORRECT — model treats this as binding rules
Map<String, Object> systemInstruction = new LinkedHashMap<>();
Map<String, Object> systemPart = new LinkedHashMap<>();
systemPart.put("text", "Your rules here...");
systemInstruction.put("parts", List.of(systemPart));
body.put("systemInstruction", systemInstruction);

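// User message is ONLY the question
Map<String, Object> userPart = new LinkedHashMap<>();
userPart.put("text", userQuery); // Just the question
Map<String, Object> userContent = new LinkedHashMap<>();
userContent.put("role", "user");
userContent.put("parts", List.of(userPart));
body.put("contents", List.of(userContent));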

When structured this way, Gemma 4 follows the system
instructions reliably and only responds to the
actual user question.
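
The same request shape in JavaScript, for readers using `fetch` rather than Java maps. A sketch only: `buildChatBody` is a hypothetical helper name, and the field names mirror the Java snippet above.

```javascript
// Hypothetical helper: keeps rules and the user's question in separate fields.
function buildChatBody(systemPrompt, userQuery) {
  return {
    // Rules go here, not into the user message.
    systemInstruction: { parts: [{ text: systemPrompt }] },
    // The user turn contains only the question.
    contents: [{ role: "user", parts: [{ text: userQuery }] }],
  };
}

const body = buildChatBody(
  "Keep responses under 100 words.",
  "How am I connected to Rahul?"
);
console.log(JSON.stringify(body, null, 2));
```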

The 128K Context Window Is The Real Superpower

Everyone talks about multimodal. The feature that
actually changed how I architect apps is the
128,000 token context window.

For my relationship network app, I load the user's
entire social graph — every person, every
relationship, every label — directly into the
context window. Then Gemma 4 reasons across the
whole graph in one shot.

No RAG. No vector database. No chunking.

Just:

systemPrompt = rules + ENTIRE network graph as text
userMessage = "How am I connected to Rahul?"

Gemma 4 traces multi-hop paths (A knows B who knows C)
and explains them in warm natural language.
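
How the graph becomes text is left open above. One possible serialization, as a sketch: one line per relationship, appended to the rules. The record shape (`from`, `to`, `type`, `label`) and the helper name `graphToText` are assumptions for illustration, not Bondmap's actual schema.

```javascript
// Hypothetical relationship records; Bondmap's real schema may differ.
const relationships = [
  { from: "Me", to: "Rahul", type: "family", label: "brother" },
  { from: "Rahul", to: "Anjali", type: "work", label: "colleague" },
];

// Renders each edge as one plain-text line the model can read.
function graphToText(edges) {
  return edges
    .map((e) => `${e.from} -> ${e.to} [${e.type}: ${e.label}]`)
    .join("\n");
}

const rules = "Explain connections warmly. Keep responses under 100 words.";
const systemPrompt = `${rules}\n\nNETWORK:\n${graphToText(relationships)}`;
console.log(systemPrompt);
```

A flat edge list like this keeps the token cost proportional to the number of relationships, and a multi-hop path is just a chain of lines that share a name.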

For reference — 128K tokens fits roughly:

An entire novel (90,000 words)
A small-to-medium codebase (roughly 10,000 lines of code)
Months of conversation history
Your entire relationship network

This changes what's architecturally possible. You don't
need a search layer for many use cases — just load
the data and let the model reason.
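
Whether a given dataset actually fits is worth checking before sending it. A rough sketch, assuming about 4 characters per token for English text (a common rule of thumb, not Gemma 4's real tokenizer) and reserving room for the question and the reply. `estimateTokens`, `fitsInContext`, and the 8,000-token reserve are illustrative choices.

```javascript
const CONTEXT_WINDOW_TOKENS = 128000; // figure quoted in this post
const CHARS_PER_TOKEN = 4; // rough heuristic, not the real tokenizer

// Estimates the token count of a string from its length.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Leaves headroom for the question and the model's answer.
function fitsInContext(text, reservedTokens = 8000) {
  return estimateTokens(text) + reservedTokens <= CONTEXT_WINDOW_TOKENS;
}
```

For exact numbers, a real tokenizer or the API's token-counting method (if one is available for your model) is the better source. The heuristic is only good for a quick go/no-go decision.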

Multimodal: It Just Works

Sending an image to Gemma 4 is straightforward:

body: JSON.stringify({
  contents: [{
    parts: [
      { text: "What's in this image?" },
      {
        inline_data: {
          mime_type: "image/jpeg",
          data: base64ImageString // remove the data:image/jpeg;base64, prefix
        }
      }
    ]
  }]
})

I used this for a "photograph your group photo"
feature. Gemma 4 reads body language, setting, and
context to suggest what relationships the people
in the photo might have. No extra model needed —
same API, same endpoint.
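
The prefix removal mentioned in the code comment is easy to get wrong. Browser APIs such as `FileReader.readAsDataURL` return a string like `data:image/jpeg;base64,...`, while `inline_data.data` expects only the base64 payload. A small sketch; `toInlineData` is a hypothetical helper name:

```javascript
// Splits a data URL into the mime type and the bare base64 payload,
// returning a part object in the shape used by the request above.
function toInlineData(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64 data URL such as data:image/jpeg;base64,...");
  }
  return { inline_data: { mime_type: match[1], data: match[2] } };
}

const part = toInlineData("data:image/png;base64,iVBORw0KGgo=");
console.log(part.inline_data.mime_type); // image/png
```

Taking the mime type from the data URL itself also avoids hard-coding `image/jpeg` when the user uploads a PNG.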

Which Model Should YOU Use?

After building with all of them, here's my honest take:

Use gemma-4-e2b-it (2B) if:

You're building for Raspberry Pi or mobile edge
Latency matters more than response quality
Simple Q&A or classification tasks

Use gemma-4-e4b-it (4B) if:

Browser-based deployment
Moderate reasoning tasks
You want fast responses on a laptop (via Ollama)

Use gemma-4-31b-it (31B) if:

Server-side application ← This is probably you
Complex reasoning, multi-hop logic
You need clean output without thinking mode
Best balance of quality and reliability

Use gemma-4-26b-a4b-it (26B MoE) if:

You specifically want thinking/reasoning mode
High-throughput use cases
You don't mind managing the thinkingBudget setting

What Open-Source At This Level Actually Means

Gemma 4 31B runs on a single high-end GPU.
The 4B model runs on a laptop.

That means:

User data never leaves your own hardware (or the user's device, for the smaller models)
No per-token cost at scale
No vendor lock-in
Full control over the model behavior
Deploy in countries with data residency requirements

For my relationship network app, the privacy angle
is real — people's family and social connections are
sensitive data. Running Gemma 4 locally means that
data stays local. That's a genuine competitive
advantage over apps that send everything to OpenAI.

We're at a point where open-source models are
genuinely competitive with proprietary ones for
real production use cases. Gemma 4 31B isn't
"almost as good as GPT-4." For focused tasks with
good prompting, I couldn't tell the difference in my own testing.

That changes the calculus for every developer
building AI-powered products.

Bondmap: AI-Powered Relationship Network That Maps How You're Connected to Everyone Using Gemma 4

Manglesh — Sun, 24 May 2026 18:14:29 +0000

This is a submission for the Gemma 4 Challenge:
Build with Gemma 4

What I Built

Bondmap is a social relationship network where anyone
can map their real-world connections — family, friends,
colleagues, romantic partners, long-distance relationships
— as a beautiful interactive visual graph.

The core idea: you add people and define how you know them.
Bondmap's AI (powered by Gemma 4) then reasons across your
entire network to explain hidden connections, trace
relationship paths, and suggest people you may know through
others.

The problem it solves: Most people have no clear picture
of how everyone in their life is actually connected.
LinkedIn shows work connections. WhatsApp shows contacts.
But nobody shows you that your college friend's brother
works with your current colleague — until now.

Key Features:

Interactive D3.js network map with color-coded connections (purple = family, green = friends, blue = work, pink = romantic)
Add people and define relationship types and labels (brother, mentor, childhood friend, long-distance, etc.)
Ask Gemma 4 in plain English: "How am I connected to Rahul?"
Upload a group photo — Gemma 4 vision analyzes who might be in it and suggests relationship types from context
1st, 2nd, and 3rd degree connection discovery
Spring Boot backend + React frontend

Demo

[🔗 Live App — bondmap.web.app]


Key demo moments:

Adding people with different relationship types
Asking "How am I connected to [person]?" and getting a warm natural language answer from Gemma 4
Uploading a group photo and watching Gemma 4 analyze the context

Code

[🐙 GitHub Repository — github.com/MangleshKumar1/bondmap]

Tech Stack:

Frontend: React + Vite + D3.js
Backend: Java Spring Boot
Database: Firebase Firestore
AI: Gemma 4 via Google AI Studio API
Hosting: Firebase Hosting + Railway (backend)

How I Used Gemma 4

Model chosen: gemma-4-31b-it

I specifically chose the Gemma 4 31B Dense model for
three reasons:

1. Relationship Path Reasoning
When a user asks "How am I connected to Anjali?", Gemma 4
receives the entire relationship network as context and
traces the path across multiple hops — A knows B who knows C
— in a single inference call. This multi-hop graph reasoning
in natural language is exactly where the 31B model's
reasoning depth matters. A smaller model gave vague answers;
31B gave precise, warm, human-readable explanations every time.

2. The 128K Context Window
The entire relationship network — every person, every
connection, every label — is loaded into Gemma 4's context
window in one shot. No RAG, no chunking, no vector database.
Gemma 4 holds the whole graph in memory and reasons across
it holistically. For a social graph with dozens of people
and relationships, this is only possible with a large
context window.

3. Multimodal Vision for Photo Analysis
When a user uploads a group photo, Gemma 4 analyzes the
image — reads the occasion, body language, and positioning
— and suggests who these people might be and what
relationship types they have. This multimodal capability
is native to Gemma 4 and required no additional models.

Why not a smaller Gemma 4 model?
I tested gemma-4-e4b-it (4B) for the connection reasoning
task. The responses were correct but shallow — it couldn't
reliably trace 2nd and 3rd degree connections across a
larger network. The 31B model handled complex multi-hop
paths accurately every time, which is the core AI feature
of the app.
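
Because degrees of connection are a plain graph property, the model's answers can be spot-checked deterministically. The sketch below is not part of Bondmap as described here. It is a breadth-first search over a hypothetical edge list, which could serve as a test oracle when comparing model sizes.

```javascript
// Returns the shortest chain of names from start to target, or null.
// Edges are treated as undirected: if A knows B, B knows A.
function findPath(edges, start, target) {
  const neighbors = new Map();
  for (const [a, b] of edges) {
    if (!neighbors.has(a)) neighbors.set(a, []);
    if (!neighbors.has(b)) neighbors.set(b, []);
    neighbors.get(a).push(b);
    neighbors.get(b).push(a);
  }
  // previous maps each visited node to the node it was reached from.
  const previous = new Map([[start, null]]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === target) {
      const path = [];
      for (let node = target; node !== null; node = previous.get(node)) {
        path.unshift(node);
      }
      return path;
    }
    for (const next of neighbors.get(current) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}

const edges = [["Me", "Rahul"], ["Rahul", "Anjali"], ["Anjali", "Vikram"]];
const path = findPath(edges, "Me", "Vikram");
console.log(path.join(" -> "), `(degree ${path.length - 1})`);
// Me -> Rahul -> Anjali -> Vikram (degree 3)
```

If the model's explanation names a different chain or a different degree than the search does, that is a concrete failure to count, which is more useful than judging answers as "shallow" by eye.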

Architecture:
User asks question
↓
React frontend → Spring Boot API
↓
AIService.java builds prompt:
  systemInstruction = rules + full network graph
  user message = the question only
↓
Gemma 4 31B reasons across entire graph
↓
Clean natural language response
↓
Displayed in UI
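
In JavaScript, the middle of that pipeline could be sketched as below. This is not the code in `AIService.java`, which is Java and not shown in this post. `askGemma` and its parameters are hypothetical, and `fetchImpl` is injected so the function can be exercised without a network call.

```javascript
const MODEL_ID = "gemma-4-31b-it";
const BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models";

// Hypothetical helper mirroring the flow above: rules + graph go into
// systemInstruction, and the user's question is the only user message.
async function askGemma(fetchImpl, apiKey, rules, graphText, question) {
  const response = await fetchImpl(
    `${BASE_URL}/${MODEL_ID}:generateContent?key=${encodeURIComponent(apiKey)}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        systemInstruction: { parts: [{ text: `${rules}\n\n${graphText}` }] },
        contents: [{ role: "user", parts: [{ text: question }] }],
      }),
    }
  );
  if (!response.ok) {
    throw new Error(`Gemma API request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.candidates[0].content.parts[0].text;
}
```

In production the call would be `await askGemma(fetch, API_KEY, rules, graphText, "How am I connected to Anjali?")`, made from the backend so the API key never reaches the browser.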

Gemma 4 is not a wrapper here — it IS the relationship
intelligence engine. Every insight, every path explanation,
every photo analysis runs through Gemma 4.