Haji Rufai

Gemma 4 in the Browser: Why Zero-Backend AI Apps Are the Future (And How to Build One)

This is a submission for the Gemma 4 Challenge: Write about Gemma 4.

Most AI apps follow the same pattern: build a backend, proxy API calls through your server, manage user sessions in a database, deploy to the cloud, pay for hosting.

But what if you skipped all of that?

What if your entire AI application was a single HTML file that talks directly to Gemma 4 from the browser — no backend, no server, no database, no hosting costs?

This isn't a thought experiment. I built one. And the experience taught me things about Gemma 4 that change how I think about AI application architecture.


The Zero-Backend Pattern

Here's the architecture in its entirety:

User's Browser ──(HTTPS)──> Google AI Studio API
                                    │
                              Gemma 4 31B Dense
                                    │
User's Browser <──(JSON)────────────┘

That's it. No Express server. No AWS Lambda. No Supabase. No Vercel. The browser makes direct API calls to Google's Generative Language API, which runs Gemma 4.

The user provides their own API key — free from Google AI Studio. Their data never touches a third-party server. The app itself is hosted as a static file on GitHub Pages.
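The bring-your-own-key flow is a few lines of client code. A minimal sketch, assuming a `gemma_api_key` storage name of my own choosing (the injectable `storage` parameter is a testing convention, not part of the app — in the browser it defaults to `localStorage`):

```javascript
// Persist the user's API key on their device only.
// `storage` is injectable so these run outside a browser;
// in the app it defaults to localStorage.
function saveApiKey(key, storage = localStorage) {
  storage.setItem('gemma_api_key', key.trim());
}

function loadApiKey(storage = localStorage) {
  return storage.getItem('gemma_api_key') || null;
}
```

The key never leaves the user's device except in requests to Google's API — there is no server of yours for it to pass through.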

This sounds limiting. It's actually liberating.


Why Gemma 4 Makes This Possible

Not every model works for zero-backend apps. You need a specific combination of capabilities:

1. Free API Access with No Credit Card

Google AI Studio offers Gemma 4 with a generous free tier — no credit card required. This is critical. If your zero-backend app asks users to bring their own key, that key needs to be free to obtain. Any friction (billing setup, waitlists, approval) kills adoption.

Gemma 4's free tier delivers:

  • 31B Dense and 27B MoE models
  • 128K context window
  • Multimodal input (text + images)
  • Rate limits generous enough for personal use

2. 128K Context That Actually Works

Zero-backend means no database. No Redis cache. No conversation history stored on a server. Everything lives in the browser's memory.

This makes the context window your only storage mechanism for conversation state. And Gemma 4's 128K window is enormous:

A typical chat session:
- System prompt:     ~800 tokens
- 20 message pairs:  ~10,000 tokens
- Working memory:    ~10,800 tokens total

Available context:   128,000 tokens
Utilization:         ~8.5%

You could run a conversation with hundreds of exchanges before hitting the limit. For comparison, GPT-3.5's 4K context would overflow after roughly six exchanges at the same token rate.

This matters because zero-backend apps can't implement sliding window memory, RAG retrieval, or conversation summarization without a server. The model's context window IS your database.
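Even with 128K, a long-lived session eventually needs to shed its oldest turns, and that too has to happen client-side. A sketch under my own assumptions (the ~4 characters-per-token estimate is a rough heuristic, and `trimHistory` is a name I made up):

```javascript
// Rough token estimate: ~4 characters per token for English text.
const estimateTokens = text => Math.ceil(text.length / 4);

// Drop the oldest message pairs until the history fits the budget.
// Assumes history is an array of { role, parts: [{ text }] } objects
// in alternating user/model order.
function trimHistory(history, maxTokens) {
  const tokens = msg =>
    msg.parts.reduce((n, p) => n + estimateTokens(p.text || ''), 0);
  let total = history.reduce((n, m) => n + tokens(m), 0);
  const trimmed = [...history];
  // Remove a user/model pair at a time so roles keep alternating.
  while (total > maxTokens && trimmed.length > 2) {
    total -= tokens(trimmed.shift()) + tokens(trimmed.shift());
  }
  return trimmed;
}
```

With a budget set well below 128K you would rarely trim at all — but having the guard means a marathon session degrades gracefully instead of erroring out.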

3. Built-in Reasoning (Thinking Tokens)

Gemma 4 has native chain-of-thought reasoning. The API returns special tokens marked thought: true — the model's internal deliberation before it responds.

Why does this matter for zero-backend apps? Because you can't add reasoning on the server side. There's no backend to implement chain-of-thought prompting, self-consistency checking, or multi-step reasoning pipelines.

With Gemma 4, the reasoning happens inside the model:

[
  {
    "text": "Let me analyze this answer...\n- Structure: uses STAR framework ✓\n- Specificity: mentions numbers ✓\n- Gap: no mention of lessons learned",
    "thought": true
  },
  {
    "text": "Strong answer! Your use of the STAR framework was excellent..."
  }
]

Your browser-side code just filters out thought: true parts and displays the final response. The heavy cognitive lifting happens in the model, not your infrastructure.
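That split can also keep the reasoning around instead of discarding it — for example, to show it in a collapsible "thinking" panel. A small sketch (`splitParts` is my own name for it):

```javascript
// Separate Gemma 4's internal reasoning (thought: true parts)
// from the text the user should actually see.
function splitParts(parts) {
  const pick = flag => parts
    .filter(p => Boolean(p.thought) === flag)
    .map(p => p.text)
    .join('');
  return { thoughts: pick(true), answer: pick(false) };
}
```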

4. Multi-Provider Availability

Here's something underappreciated about Gemma 4: because it's an open model, it's served by multiple API providers. If Google AI Studio is overloaded (500/503 errors), you can fall back to:

  • OpenRouter — aggregates multiple model providers
  • NVIDIA NIM — optimized inference
  • HuggingFace — the open-source hub

All serving the same Gemma 4 model. All accessible from the browser. This gives zero-backend apps a resilience layer that proprietary models can't match.


The Practical Benefits

$0 Operating Costs — Forever

No server means no hosting bill. GitHub Pages is free. Cloudflare Pages is free. Even opening index.html from your desktop works. Your AI app costs nothing to run regardless of how many users it has.

This isn't just cost savings — it's a fundamentally different business model. You never have to worry about a viral moment bankrupting you with compute costs. The users bring their own API credits.

Privacy by Architecture

When there's no backend, there's literally nowhere for user data to leak. API keys stay in the browser. Conversation content goes directly to the model API and back. No logs, no analytics database, no third-party tracking.

This is privacy by design, not privacy by policy. You don't need a privacy policy because you never have access to user data in the first place.

Instant Deployment, Zero DevOps

git push origin main
# That's it. Your app is live.

No Docker containers. No CI/CD pipelines. No environment variables to configure. No database migrations. No SSL certificates to manage. Just static files served by a CDN.

Works Offline (Mostly)

The HTML/CSS/JS loads once and is cached. If you're using a local Gemma 4 instance via Ollama, the entire stack works without an internet connection. Even the Tailwind CSS is loaded from CDN on first visit and cached.
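Pointing the same chat loop at a local server is mostly a request-format change. A sketch against Ollama's `/api/chat` endpoint (that endpoint is real; the `gemma` model tag is an assumption — use whatever `ollama list` shows on your machine):

```javascript
// Build a request for a local Ollama server (default port 11434).
// Ollama expects { role, content } messages rather than Google's
// { role, parts } format, so we convert, mapping 'model' to 'assistant'.
function buildOllamaRequest(history, model = 'gemma') {
  return {
    url: 'http://localhost:11434/api/chat',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        stream: false,
        messages: history.map(m => ({
          role: m.role === 'model' ? 'assistant' : m.role,
          content: m.parts.map(p => p.text).join('')
        }))
      })
    }
  };
}
```

Swap this in for the Google URL builder and the rest of the app — history management, rendering, retries — stays untouched.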


The Real Tradeoffs

Let's be honest about what you give up:

No User Accounts

Without a backend, there's no authentication system. You can use LocalStorage for persistence, but it's device-specific and clearable. For many use cases (tools, utilities, practice apps), this is fine. For SaaS products, it's a dealbreaker.
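The LocalStorage fallback is only a few lines. A sketch with an injectable `storage` so it runs outside a browser (the `chat_history` key name is my own convention):

```javascript
// Persist the conversation across page reloads on this device.
// Best-effort: quota errors or disabled storage are swallowed.
function saveHistory(history, storage = localStorage) {
  try {
    storage.setItem('chat_history', JSON.stringify(history));
  } catch {
    // Storage full or unavailable — the in-memory copy still works.
  }
}

function loadHistory(storage = localStorage) {
  try {
    return JSON.parse(storage.getItem('chat_history')) || [];
  } catch {
    return [];
  }
}
```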

Rate Limits Are Per-User

Since each user has their own API key, they hit their own rate limits. You can't pool capacity or implement queuing. If the free tier's rate limit is too low for a specific use case, the user needs to upgrade their own Google AI account.

No Server-Side Processing

You can't do background jobs, scheduled tasks, or heavy computation. Everything must happen in the browser during the user's session. For most AI chat applications, this is perfectly fine. For data pipelines or batch processing, look elsewhere.

API Key UX

Asking users to "bring your own API key" adds friction. You need clear instructions, a test button, and graceful error handling. This is manageable but not zero-effort.
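The test button can be a single cheap, read-only call to the models list endpoint (`/v1beta/models` is a real Generative Language API endpoint; the injectable `fetchImpl` is my testing convention):

```javascript
// Validate a key by listing available models — no tokens consumed.
async function testApiKey(key, fetchImpl = fetch) {
  const url = 'https://generativelanguage.googleapis.com/' +
              `v1beta/models?key=${encodeURIComponent(key)}`;
  try {
    const res = await fetchImpl(url);
    return res.ok
      ? { ok: true, message: 'Key works!' }
      : { ok: false, message: `Key rejected (HTTP ${res.status})` };
  } catch {
    return { ok: false, message: 'Network error — check your connection.' };
  }
}
```

Wiring the returned `message` into the UI gives users immediate, specific feedback instead of a mysterious failure on their first real prompt.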


How to Build One: The Minimal Recipe

Here's everything you need for a zero-backend Gemma 4 app:

1. The API Call (Under 20 Lines)

async function askGemma(conversationHistory, apiKey) {
  const response = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/` +
    `models/gemma-4-31b-it:generateContent?key=${apiKey}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        contents: conversationHistory,
        generationConfig: {
          temperature: 0.75,
          maxOutputTokens: 2048,
          topP: 0.95
        }
      })
    }
  );
  const data = await response.json();
  // Surface API errors instead of crashing on a missing field
  if (data.error) throw new Error(data.error.message);
  // Filter out thinking tokens, return the visible response
  return data.candidates[0].content.parts
    .filter(p => !p.thought)
    .map(p => p.text)
    .join('');
}

2. Conversation History (Array in Memory)

const history = [];

// Add user message
history.push({
  role: 'user',
  parts: [{ text: userMessage }]
});

// Get response
const reply = await askGemma(history, apiKey);

// Add model response
history.push({
  role: 'model',
  parts: [{ text: reply }]
});

3. Multi-Provider Fallback

const PROVIDERS = {
  google: {
    url: (model, key) =>
      `https://generativelanguage.googleapis.com/` +
      `v1beta/models/${model}:generateContent?key=${key}`,
    format: 'google'
  },
  openrouter: {
    url: () => 'https://openrouter.ai/api/v1/chat/completions',
    format: 'openai'
  }
};
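The object above is just a lookup table; the actual fallback is a loop over it. A sketch with an injectable `fetchImpl` (my convention) that tries providers in order — note that OpenRouter's OpenAI-style request body differs from Google's, so a real version would also translate formats per provider:

```javascript
// Try each provider in order until one responds cleanly.
// `buildOptions` maps a provider entry to fetch options; it is a
// placeholder a real app would flesh out per provider format.
async function askWithFallback(providers, buildOptions, fetchImpl = fetch) {
  let lastError;
  for (const provider of providers) {
    try {
      const res = await fetchImpl(provider.url(), buildOptions(provider));
      if (res.ok) return res.json();
      lastError = new Error(`HTTP ${res.status} from provider`);
    } catch (err) {
      lastError = err; // network failure — try the next provider
    }
  }
  throw lastError;
}
```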

4. Retry Logic for Reliability

async function fetchWithRetry(url, options, retries = 3) {
  let data;
  for (let i = 0; i < retries; i++) {
    const res = await fetch(url, options);
    data = await res.json();
    // Only retry transient server errors (500/503)
    if (!data.error || ![500, 503].includes(data.error.code))
      return data;
    // Linear backoff: 1.5s, 3s, 4.5s
    await new Promise(r => setTimeout(r, 1500 * (i + 1)));
  }
  // Retries exhausted — return the last error payload to the caller
  return data;
}

That's it. Four patterns. Copy these into any HTML file, add a UI, and you have a zero-backend AI app powered by Gemma 4.


When This Pattern Fits

Great for:

  • Developer tools and utilities
  • Educational apps and tutors
  • Practice and training tools
  • Personal productivity assistants
  • Prototypes and hackathon projects
  • Privacy-sensitive applications

Not ideal for:

  • Multi-user collaborative apps
  • Apps requiring server-side secrets
  • Heavy background processing
  • Production SaaS with billing

The Bigger Picture

Zero-backend AI apps aren't a compromise — they're a category. They trade server-side control for:

  • Zero operational cost
  • Perfect privacy
  • Instant deployment
  • Infinite scalability (each user brings their own compute)

Gemma 4 is uniquely suited for this pattern because it combines free API access, massive context windows, built-in reasoning, and multi-provider availability. No other model family checks all four boxes simultaneously.

The best tool for the job is the one people will actually use. And "free, private, works instantly" removes every barrier except motivation.

If you're building something with Gemma 4, consider whether you really need that backend. You might be surprised how far a single HTML file can take you.


This article draws on lessons learned building Interview Coach, a zero-backend AI interview practice tool powered by Gemma 4. Source code on GitHub (MIT License).
