Haji Rufai

Posted on • Originally published at hajirufai.github.io

I Built an AI Interview Coach with Gemma 4 — Zero Backend, 100% Free

What I Built

Interview Coach — an AI-powered interview practice tool that uses Google Gemma 4 to conduct realistic mock interviews, evaluate your answers in real-time, and generate detailed performance reports.

It runs entirely in the browser. No backend, no server, no accounts. Just Gemma 4's brain and your ambition.

Demo

Live Demo: hajirufai.github.io/gemma4-interview-coach (bring your own free API key from Google AI Studio)

GitHub Repo: github.com/hajirufai/gemma4-interview-coach


The Problem

91% of candidates who fail online assessments never practiced under timed conditions. Interview prep tools exist, but they're either:

  • Expensive ($30-50/month for premium tiers)
  • Generic (the same questions for everyone)
  • Passive (you read sample answers instead of practicing your own)

What if you had a personal interview coach that:

  • Adapts to YOUR experience level
  • Gives feedback on YOUR specific answers
  • Is completely free, forever?

That's what Gemma 4 makes possible.

6 Practice Modes

| Mode | What It Does |
| --- | --- |
| 🗣️ Behavioral | STAR-method questions on leadership, conflict, teamwork |
| 💻 Technical | Coding problems, algorithms, data structures |
| 🏗️ System Design | "Design Twitter" style architecture challenges |
| 📝 Assessment | Simulated OA with aptitude + coding + logic |
| 🏆 Certification | Exam-style questions (AWS, Azure, GCP, etc.) |
| 📊 Case Study | Business cases with structured frameworks |

Each mode has a unique system prompt that shapes how Gemma 4 behaves — asking follow-ups, evaluating with specific criteria, and calibrating difficulty to entry/mid/senior/lead levels.
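The actual prompts live in the repo and aren't reproduced here, but the per-mode prompt selection can be sketched like this (the mode names and wording below are illustrative placeholders, not the repo's real prompts):

```javascript
// Illustrative sketch only: the real system prompts are in the repo.
const MODE_PROMPTS = {
  behavioral: "You are an interview coach. Ask STAR-method behavioral questions and push for specifics.",
  technical: "You are a technical interviewer. Pose coding, algorithm, and data-structure problems.",
  systemDesign: "You are a system design interviewer. Present open-ended architecture challenges.",
};

// Builds the session's system prompt, calibrated to role and experience level.
function buildSystemPrompt(mode, role, level) {
  const base = MODE_PROMPTS[mode];
  if (!base) throw new Error(`Unknown mode: ${mode}`);
  return `${base}\nTarget role: ${role}.\nCalibrate difficulty to a ${level}-level candidate.`;
}
```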

Why Gemma 4 Was the Right Model

This wasn't "I needed an LLM and Gemma 4 was there." Every architectural decision traces back to specific Gemma 4 capabilities:

1. 128K Context Window → Multi-Turn Coaching

Interview practice isn't a one-shot Q&A. It's a 15-20 turn conversation where the coach needs to:

  • Remember your answer to Q1 when evaluating Q8
  • Notice patterns ("You keep avoiding specifics — let me push harder")
  • Generate a final report that references the entire session

Gemma 4's 128K context means the full conversation — system prompt + 15 questions + 15 answers + 15 feedback blocks — fits comfortably in a single context window. No chunking, no summarization, no lost context.

```text
System prompt:     ~800 tokens
Per Q&A round:     ~500 tokens (question + answer + feedback)
15 rounds:         ~7,500 tokens
Final report:      ~2,000 tokens
Total:             ~10,300 tokens ← well within 128K
```

2. Native Chain-of-Thought → Better Evaluations

Gemma 4 has built-in "thinking" tokens (the API returns them with thought: true). When the model evaluates your answer, it first reasons internally:

```json
{
  "text": "The user's answer mentions leading a team...\n- Did they follow STAR? Partially...\n- Specificity? Low...\n- Selected response: provide feedback on adding metrics",
  "thought": true
},
{
  "text": "Good start! You mentioned leading the migration, but I'd love more specifics..."
}
```

This produces dramatically better feedback than models that generate evaluations in a single pass. The thinking tokens ensure the model actually considers what was good AND what was missing before responding.
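Since the thinking parts are for the model's benefit, not the user's, the UI only renders the non-thought parts. Assuming the response parts are shaped like the JSON snippet above (an assumption; adjust to the real API payload), a minimal filter looks like:

```javascript
// Extracts only user-visible text from a response whose parts may include
// internal reasoning marked `thought: true`. The part shape is assumed
// from the snippet above.
function visibleText(parts) {
  return parts
    .filter((part) => part.thought !== true)
    .map((part) => part.text)
    .join("");
}
```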

3. 26B MoE Architecture → Fast Conversational UX

Interview coaching is conversational. Every second of latency breaks the "interview feel." The 26B MoE variant activates only ~4B parameters per token, delivering:

  • 1-3 second response times on Google AI Studio free tier
  • Near-31B quality for reasoning tasks
  • Lower compute costs if self-hosted

For comparison, the 31B Dense model takes 5-10 seconds per response — fine for deep analysis but disruptive for rapid-fire interview Q&A.

4. Open & Free → Accessible to Everyone

This was non-negotiable. Interview prep should not be gated behind a paywall. Gemma 4 runs on:

  • Google AI Studio free tier (no credit card)
  • Locally via Ollama on a decent laptop
  • Hugging Face for research
  • Even the 2B/4B variants run on phones and Raspberry Pi

Architecture: Why Zero Backend?

```text
Browser ──(HTTPS)──> Google AI Studio API
                          │
                    Gemma 4 26B MoE
                   or 31B Dense
```

The entire app is one HTML file (~560 lines). No React build, no Node.js server, no database.

Why?

  1. Privacy: Your API key and interview responses never touch a third-party server. Everything stays in the browser.
  2. Cost: $0 hosting. Put it on any CDN, GitHub Pages, or just open the file locally.
  3. Speed: No proxy server round-trip. Browser → Gemma 4 → Browser.
  4. Simplicity: git clone && open index.html — that's the full setup.

The tradeoff: Users need their own API key. I chose this intentionally — it keeps the tool free forever and teaches users about AI APIs in the process.
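The browser-to-API flow can be sketched as below. This is a sketch under assumptions, not the repo's code: the Generative Language API's `generateContent` endpoint takes a JSON body of `contents`, but the model id is a placeholder (check AI Studio for the exact name), and the exact request shape the app uses may differ:

```javascript
// Builds a generateContent request body from the running conversation.
// `history` is an array of { role: "user" | "model", text: "..." }.
function buildRequest(systemPrompt, history) {
  return {
    systemInstruction: { parts: [{ text: systemPrompt }] },
    contents: history.map((turn) => ({
      role: turn.role,
      parts: [{ text: turn.text }],
    })),
  };
}

// Browser -> API -> browser, no proxy in between. MODEL_ID is a
// placeholder; substitute whatever AI Studio lists for your account.
async function askCoach(apiKey, systemPrompt, history) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    "MODEL_ID:generateContent?key=" + encodeURIComponent(apiKey);
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildRequest(systemPrompt, history)),
  });
  if (!res.ok) throw new Error(`API error: ${res.status}`);
  return res.json();
}
```

Because the key lives only in the browser and the request goes straight to Google, there is nothing server-side to host, log, or secure.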

Scoring System

At any point during a session, you can hit "Score Me" for a mid-session evaluation:

```text
📊 Session Scorecard

1. Communication Clarity: 7/10
2. Technical Depth: 6/10
3. Problem-Solving Approach: 8/10
4. Self-Awareness: 7/10
5. Overall Readiness: 7/10

Top Strengths:
• Strong structured thinking
• Good use of STAR method
• Honest about knowledge gaps

Areas to Improve:
• Add specific metrics and numbers
• Reduce filler words
• Practice time management

Overall Score: 35/50
Verdict: Almost Ready — one more session should do it
```

The end-of-session report includes a personalized study plan with 3 specific actions for the coming week.

How to Use It

  1. Get a free API key from aistudio.google.com/apikey
  2. Open the app
  3. Choose your mode, role, and experience level
  4. Practice!

It works on desktop and mobile. No installation needed.

Tech Stack

  • Frontend: Vanilla HTML + Tailwind CSS (CDN)
  • AI: Google Gemma 4 26B MoE / 31B Dense via Generative Language API
  • Markdown: Custom lightweight renderer
  • State: In-memory (browser)
  • Deploy: Static file (anywhere)
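The post mentions a custom lightweight Markdown renderer. The repo's version isn't reproduced here, but a minimal sketch that covers bold, inline code, bullets, and HTML escaping (illustrative only; the real renderer likely handles more) could look like:

```javascript
// Minimal Markdown-to-HTML sketch: handles **bold**, `code`, and "- " bullets.
// Escapes HTML first so model output can't inject markup into the page.
function renderMarkdown(md) {
  const escape = (s) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return md
    .split("\n")
    .map((line) => {
      const html = escape(line)
        .replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>")
        .replace(/`(.+?)`/g, "<code>$1</code>");
      return line.startsWith("- ") ? `<li>${html.slice(2)}</li>` : `<p>${html}</p>`;
    })
    .join("");
}
```

A renderer this size is why the whole app fits in one HTML file: no parser dependency, just a few regex passes.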

What I Learned

  1. Gemma 4's thinking tokens are game-changing for evaluation tasks. The model genuinely considers multiple aspects before responding, producing feedback that feels like a real interviewer's assessment.

  2. 128K context is overkill for most apps — but perfect for coaching. The ability to reference earlier answers creates a coherent coaching experience that shorter-context models can't match.

  3. The MoE architecture is underappreciated for interactive apps. The speed difference between 26B MoE and 31B Dense is night-and-day for conversational UX. Choose MoE for chat, Dense for analysis.

  4. Zero-backend AI apps are viable and powerful. Browser → API → Browser eliminates 90% of infrastructure complexity. The main cost is that users bring their own key — but for free tools, that's a feature, not a bug.

What's Next

  • Image input: Upload screenshots of coding challenges for visual analysis (Gemma 4 supports multimodal)
  • Voice mode: Speak your answers for a more realistic interview feel
  • Session history: LocalStorage persistence so you can track improvement over time
  • Community question banks: Curated questions per role/company
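Session history isn't built yet, but the persistence layer could be as simple as the sketch below. The storage object is injected so it can be tested outside a browser; in the app you would pass `window.localStorage` (the key name is an assumption):

```javascript
// Sketch of the planned session-history feature: persist finished sessions
// so improvement can be tracked over time. `storage` is any object with
// getItem/setItem (pass window.localStorage in the browser).
const HISTORY_KEY = "interview-coach-sessions"; // assumed key name

function saveSession(storage, session) {
  const sessions = JSON.parse(storage.getItem(HISTORY_KEY) || "[]");
  sessions.push(session);
  storage.setItem(HISTORY_KEY, JSON.stringify(sessions));
}

function loadSessions(storage) {
  return JSON.parse(storage.getItem(HISTORY_KEY) || "[]");
}
```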

Try It

Live: hajirufai.github.io/gemma4-interview-coach
Code: github.com/hajirufai/gemma4-interview-coach
License: MIT — fork it, improve it, ship it.


Built by Haji Rufai — creator of Interview Buddy, an AI-powered interview preparation platform.
