<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sameer Ali khan</title>
    <description>The latest articles on DEV Community by Sameer Ali khan (@sameer_alikhan).</description>
    <link>https://dev.to/sameer_alikhan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2629442%2F3d575141-f0a7-4381-9fa6-d5ad3fb37a46.png</url>
      <title>DEV Community: Sameer Ali khan</title>
      <link>https://dev.to/sameer_alikhan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sameer_alikhan"/>
    <language>en</language>
    <item>
      <title>InterviewAce: Real-Time AI Mock Interviews with Gemini Live API &amp; Google ADK</title>
      <dc:creator>Sameer Ali khan</dc:creator>
      <pubDate>Mon, 16 Mar 2026 13:02:51 +0000</pubDate>
      <link>https://dev.to/sameer_alikhan/interviewace-real-time-ai-mock-interviews-with-gemini-live-api-google-adk-2nmf</link>
      <guid>https://dev.to/sameer_alikhan/interviewace-real-time-ai-mock-interviews-with-gemini-live-api-google-adk-2nmf</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📢 &lt;strong&gt;Disclosure:&lt;/strong&gt; I created this blog post as part of my submission to the &lt;strong&gt;Gemini Live Agent Challenge&lt;/strong&gt; hackathon. The project — InterviewAce — was built specifically for this competition using Google AI models and Google Cloud. &lt;strong&gt;#GeminiLiveAgentChallenge&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;blockquote&gt;
&lt;p&gt;🔗 &lt;a href="https://interviewace-117780891544.us-central1.run.app/" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt; · 💻 &lt;a href="https://github.com/SameerAliKhan-git/IntyerviewBit" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · 📹 &lt;a href="https://youtu.be/JrjhgB5Ib_0" rel="noopener noreferrer"&gt;Demo Video&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Everyone knows technical interviews are hard. But here is what nobody says out loud — &lt;strong&gt;most people never actually practice them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because they are lazy. Because real practice is expensive and inaccessible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Professional mock interview services charge &lt;strong&gt;$150–$300 per session&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Asking friends to interview you is awkward and rarely useful&lt;/li&gt;
&lt;li&gt;AI chatbots give you text responses — but real interviews are not text conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And here is the deeper issue: the things that fail candidates are &lt;strong&gt;not the answers&lt;/strong&gt; — they are the delivery. The "um"s and "uh"s. The slouched posture. The rambling answer that never reaches a conclusion. The eye contact that breaks every time you think.&lt;/p&gt;

&lt;p&gt;No text-based AI tool has ever addressed this. Until now.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built: InterviewAce
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;InterviewAce&lt;/strong&gt; is a real-time, multimodal AI interview coach that puts you in a &lt;strong&gt;pixel-perfect Google Meet replica&lt;/strong&gt; with an AI hiring manager called &lt;strong&gt;Coach Ace&lt;/strong&gt; — who simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🗣️ &lt;strong&gt;Speaks&lt;/strong&gt; to you with sub-500ms voice latency via Gemini 2.5 Flash Native Audio&lt;/li&gt;
&lt;li&gt;👀 &lt;strong&gt;Watches&lt;/strong&gt; your body language live through your webcam&lt;/li&gt;
&lt;li&gt;🎤 &lt;strong&gt;Detects&lt;/strong&gt; filler words in real time ("um", "uh", "like", "you know")&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Scores&lt;/strong&gt; your answers across Confidence, Clarity, Content and STAR structure&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Searches Google&lt;/strong&gt; live to give hallucination-free company-specific context&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;Generates&lt;/strong&gt; a full performance report with downloadable transcript&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No typing. No text boxes. Just a real conversation with an AI that actually watches and listens.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI Agent&lt;/td&gt;
&lt;td&gt;Google ADK + Gemini 2.5 Flash Native Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Live Streaming&lt;/td&gt;
&lt;td&gt;Gemini Live API (bidiGenerateContent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Python, FastAPI, Uvicorn, WebSockets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Vanilla JavaScript, Web Audio API, MediaDevices API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grounding&lt;/td&gt;
&lt;td&gt;ADK built-in google_search + local knowledge base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Google Cloud Run, Docker, Cloud Build&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;Here is the complete picture of how every component connects — from your microphone all the way to Gemini and back:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7rbn7ssy0kzaidrtit8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7rbn7ssy0kzaidrtit8.png" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────────┐
│                      🖥️  BROWSER (Vanilla JS)                       │
│                                                                      │
│  🎤 Microphone (PCM 16kHz)  ──┐                                     │
│  📷 Camera (JPEG 1fps)  ──────┼──▶  WebSocket Client ◀────────┐    │
│                                │         │                      │    │
│  🔊 Audio Player  ◀────────────┼─────────┘                     │    │
│  💬 Closed Captions ◀──────────┤      Audio + Images + JSON    │    │
│  📊 Live Analytics ◀───────────┘                               │    │
└────────────────────────────────────────────────────────────────┼────┘
                                                                  │ WebSocket
┌─────────────────────────────────────────────────────────────────▼───┐
│                    ⚙️  FASTAPI BACKEND (Python)                      │
│                                                                      │
│   WebSocket Server (main.py)                                         │
│          │                                                           │
│          ▼                                                           │
│   LiveRequestQueue ──▶ ADK Runner ──▶ InMemorySessionService        │
└──────────────────────────────────────────────────────────────────────┘
                    │                        ▲
           Bidi Stream                  Audio + Tool Results
                    │                        │
┌───────────────────▼────────────────────────┴────────────────────────┐
│                      🤖  GOOGLE ADK AGENT                            │
│                                                                      │
│     Gemini 2.5 Flash Native Audio + Vision                           │
│          │                                                           │
│          ▼  Autonomous Tool Calls (silent, every 2-3 answers)        │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │                   🔧 11 CUSTOM TOOLS                         │   │
│   │                                                              │   │
│   │  TIER 1 — Core Analysis:                                     │   │
│   │   save_session_feedback  │  detect_filler_words              │   │
│   │   analyze_body_language  │  evaluate_star_method             │   │
│   │                                                              │   │
│   │  TIER 2 — Deep Coaching:                                     │   │
│   │   analyze_voice_confidence  │  get_improvement_tips          │   │
│   │   fetch_grounding_data      │  adjust_difficulty_level       │   │
│   │                                                              │   │
│   │  TIER 3 — Session Reporting:                                 │   │
│   │   get_session_history  │  save_session_recording             │   │
│   │   generate_session_report                                    │   │
│   │                                                              │   │
│   │  GROUNDING:  google_search (ADK built-in)                    │   │
│   └──────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────────┐
│                        ☁️  GOOGLE CLOUD                              │
│                                                                      │
│      Cloud Run (Serverless Container)  +  Container Registry        │
└──────────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Data Flow — Step by Step
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;STEP 1 ── User speaks
          Mic → PCM audio (16kHz) → WebSocket → FastAPI → LiveRequestQueue

STEP 2 ── Camera streams
          Webcam → JPEG frame (1fps, 320×240) → WebSocket → FastAPI → LiveRequestQueue

STEP 3 ── Gemini responds
          LiveRequestQueue → bidiGenerateContent → Gemini 2.5 Flash
          Gemini audio bytes → WebSocket → Browser AudioPlayer → User hears voice

STEP 4 ── Background tools fire (silently, every 2-3 answers)
          Gemini calls → detect_filler_words()
          Gemini calls → analyze_body_language()
          Gemini calls → evaluate_star_method()
          Gemini calls → save_session_feedback()
          Tool results → JSON side-channel → WebSocket → Sidebar updates live

STEP 5 ── Transcription
          Gemini → Input + Output transcription → Closed Captions rendered in UI

STEP 6 ── Session ends
          User clicks End Interview
          → generate_session_report()
          → Full modal: scores, breakdown, downloadable transcript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Project File Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IntyerviewBit/
├── README.md
├── cloudbuild.yaml                     ← Google Cloud Build CI/CD
└── interviewace/
    ├── Dockerfile                      ← Cloud Run container
    ├── .env.example
    ├── requirements.txt
    └── app/
        ├── main.py                     ← FastAPI + WebSocket server
        └── interview_coach_agent/
        │   ├── agent.py                ← ADK Agent + 11 tools registered
        │   ├── prompts.py              ← Coach Ace persona + instructions
        │   ├── tools.py                ← All 10 custom tool implementations
        │   └── grounding_data.py       ← Verified local coaching knowledge base
        └── static/
            ├── index.html              ← Single-page Google Meet replica
            ├── css/
            │   └── style.css           ← Complete Meet-style CSS
            └── js/
                ├── app.js              ← Main app logic + WebSocket client
                ├── audio-player.js     ← PCM audio playback engine
                ├── audio-recorder.js   ← Mic capture + 48kHz→16kHz downsample
                └── camera.js           ← Adaptive webcam frame capture
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Agent — Coach Ace
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COACH ACE — FULL TOOL MAP
────────────────────────────────────────────────────────────────────

TIER 1 — Core Analysis  (fires silently every 2-3 answers)
┌───────────────────────────┬──────────────────────────────────────┐
│ Tool                      │ What It Does                         │
├───────────────────────────┼──────────────────────────────────────┤
│ save_session_feedback     │ Scores 4 dimensions 0-100:           │
│                           │ Confidence, Clarity, Content,        │
│                           │ Body Language                        │
├───────────────────────────┼──────────────────────────────────────┤
│ detect_filler_words       │ Counts um / uh / like / you know.    │
│                           │ Updates live sidebar counter + tips  │
├───────────────────────────┼──────────────────────────────────────┤
│ analyze_body_language     │ Rates posture, eye contact,          │
│                           │ expression from live camera frame    │
├───────────────────────────┼──────────────────────────────────────┤
│ evaluate_star_method      │ Checks S-T-A-R answer structure.     │
│                           │ Lights up S T A R badges in real time│
└───────────────────────────┴──────────────────────────────────────┘

TIER 2 — Deep Coaching
┌───────────────────────────┬──────────────────────────────────────┐
│ analyze_voice_confidence  │ Pace, volume, tone, pause analysis   │
│ get_improvement_tips      │ Targeted coaching per weakness       │
│ fetch_grounding_data      │ Pulls from verified local KB         │
│ adjust_difficulty_level   │ Scales question difficulty up/down   │
└───────────────────────────┴──────────────────────────────────────┘

TIER 3 — Session Management
┌───────────────────────────┬──────────────────────────────────────┐
│ get_session_history       │ Retrieves scores from past sessions  │
│ save_session_recording    │ Persists transcript + all metrics    │
│ generate_session_report   │ Builds full post-interview breakdown │
└───────────────────────────┴──────────────────────────────────────┘

GROUNDING
┌───────────────────────────┬──────────────────────────────────────┐
│ google_search             │ ADK built-in. Live web search for    │
│ (ADK built-in)            │ company-specific interview facts     │
└───────────────────────────┴──────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Architechture -
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F748o1jlevyms0m3gfv61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F748o1jlevyms0m3gfv61.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dual Grounding System
&lt;/h2&gt;

&lt;p&gt;Early in development, Coach Ace would confidently hallucinate Amazon Leadership Principles or invent Google interview formats. I fixed it with two grounding layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CANDIDATE ASKS: "What is Google's interview process like?"
                              │
                              ▼
                ┌─────────────────────────┐
                │     GROUNDING ROUTER    │
                └────────┬────────────────┘
                         │
            ┌────────────┴────────────┐
            ▼                         ▼
┌─────────────────────┐   ┌─────────────────────┐
│ fetch_grounding_    │   │   google_search()    │
│ data()              │   │   ADK built-in       │
│                     │   │                     │
│ LOCAL KNOWLEDGE     │   │  LIVE WEB SEARCH    │
│ BASE                │   │                     │
│ (grounding_data.py) │   │  Searches for real, │
│                     │   │  current company    │
│ Covers:             │   │  interview info     │
│ • STAR method       │   │                     │
│ • Body language     │   │  Prevents all       │
│ • Voice delivery    │   │  hallucination of   │
│ • Common mistakes   │   │  company-specific   │
│ • Coaching tips     │   │  facts              │
└──────────┬──────────┘   └──────────┬──────────┘
           │                         │
           └────────────┬────────────┘
                        ▼
             GROUNDED RESPONSE
             (accurate + verified)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Code Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The WebSocket Bridge (&lt;code&gt;main.py&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.websocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;websocket_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;session_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemorySessionService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;interviewace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;live_request_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LiveRequestQueue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Start ADK runner — talks to Gemini Live API in background
&lt;/span&gt;    &lt;span class="n"&gt;runner_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;live_request_queue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_bytes&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;live_request_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_realtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="nc"&gt;Blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio/pcm;rate=16000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;live_request_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_realtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="nc"&gt;Blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frame&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;runner_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. The ADK Agent (&lt;code&gt;agent.py&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google_search&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;save_session_feedback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detect_filler_words&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;analyze_body_language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;evaluate_star_method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;analyze_voice_confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_improvement_tips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fetch_grounding_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adjust_difficulty_level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;get_session_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;save_session_recording&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;generate_session_report&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;coach_ace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coach_ace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash-preview-native-audio-dialog&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Senior AI hiring manager — real-time mock interviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;COACH_ACE_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;save_session_feedback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detect_filler_words&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;analyze_body_language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;evaluate_star_method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;analyze_voice_confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_improvement_tips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;fetch_grounding_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adjust_difficulty_level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;get_session_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;save_session_recording&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;generate_session_report&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# ADK built-in grounding tool
&lt;/span&gt;    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Silent Background Tool (Example)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_filler_words&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Fires autonomously every 2-3 answers.
    User never hears a pause — runs between turns.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;filler_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;um&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;like&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;you know&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;basically&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;literally&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filler_patterns&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="nf"&gt;update_session_analytics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filler_words&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;breakdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coaching_tip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;get_filler_tip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filler_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;breakdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. PCM Audio Engine (&lt;code&gt;audio-recorder.js&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AudioRecorder&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;onAudioData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onAudioData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;onAudioData&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;targetSampleRate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Gemini Live expects 16kHz&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mediaDevices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUserMedia&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AudioContext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Native rate ~48kHz&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scriptProcessor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createScriptProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;scriptProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onaudioprocess&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputBuffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getChannelData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="c1"&gt;// Downsample 48kHz → 16kHz before sending to Gemini&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;downsampled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;downsample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sampleRate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pcm16&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floatTo16BitPCM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;downsampled&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onAudioData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pcm16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// → WebSocket → FastAPI → Gemini&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scriptProcessor&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;scriptProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;downsample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fromRate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;toRate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fromRate&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;toRate&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;ratio&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;ratio&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Adaptive Webcam Capture (&lt;code&gt;camera.js&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CameraCapture&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;captureFrame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;canvas&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;320&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Low-res — body language doesn't need 1080p&lt;/span&gt;
        &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;240&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2d&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drawImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoElement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;320&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;240&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// JPEG 60% quality — sufficient for vision, minimal bandwidth&lt;/span&gt;
        &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image/jpeg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="mf"&gt;0.6&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;adaptFrameRate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;networkQuality&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Drop to 0.33fps (1 frame/3 sec) under poor network&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.33&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;networkQuality&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The UI — Google Meet Replica
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────────────────┐
│  InterviewAce  [Google logo]          ⏱ 12:34      [Participants]    │
├─────────────────────────────────────────────┬────────────────────────┤
│                                             │   📊 LIVE ANALYTICS    │
│  ┌─────────────────┐  ┌─────────────────┐   │                        │
│  │                 │  │                 │   │  Confidence  ████▓░    │
│  │  COACH ACE      │  │  ELENA          │   │  Clarity     ███░░░    │
│  │  AI Interviewer │  │  AI Notetaker   │   │  STAR Score  ████░░    │
│  │                 │  │                 │   │  Body Lang.  ███▓░░    │
│  │ [Volume Rings]  │  │ [Volume Rings]  │   │                        │
│  └─────────────────┘  └─────────────────┘   │  🎤 Filler Words: 3   │
│                                             │  um(2)  uh(1)          │
│  ┌──────────────────────────────────────┐   │                        │
│  │                                      │   │  👁 Eye Contact  ●●    │
│  │            YOU                       │   │  🧍 Posture      ●○    │
│  │        (Live Webcam)                 │   │  😊 Expression   ●●    │
│  │                                      │   │                        │
│  │  [Equalizer bars animate when mic]   │   │  STAR Badges:          │
│  └──────────────────────────────────────┘   │  [S✓][T✓][A○][R○]     │
│                                             │                        │
│  💬 CC: "...tell me about a time you had    │  💡 Use 'I' not 'we'   │
│          to debug a production issue..."    │                        │
├─────────────────────────────────────────────┴────────────────────────┤
│  [🎤 Mic]  [📷 Cam]  [CC]  [Chat]  [People]    [🔴 End Interview]   │
└──────────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Built in &lt;strong&gt;100% Vanilla JavaScript&lt;/strong&gt; — no React, no Vue, no Angular. Frameworks add event loop overhead that impacts PCM audio timing. At 16kHz, every millisecond of jitter is audible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cloud Deployment Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer pushes to GitHub
         │
         ▼
Google Cloud Build (cloudbuild.yaml)
         │
         ├── docker build -t gcr.io/PROJECT/interviewace .
         ├── docker push → Container Registry
         └── gcloud run deploy interviewace
                  │
                  ├── Region:         us-central1
                  ├── Memory:         1Gi
                  ├── Port:           8080
                  ├── session-affinity: TRUE  ← critical for WebSockets
                  └── allow-unauthenticated:  TRUE
         │
         ▼
Cloud Run Serverless Container
         ├── Scales to zero when idle  (cost: $0)
         ├── Handles WebSocket connections persistently
         └── Scales instantly on demand
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key lesson learned the hard way:&lt;/strong&gt; Cloud Run requires &lt;code&gt;--session-affinity&lt;/code&gt; for any WebSocket-based app. Without it, the load balancer routes mid-session requests to a different container instance, breaking your persistent connection. This cost us hours to debug.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Challenges I Ran Into
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Real-time audio + vision sync&lt;/strong&gt;&lt;br&gt;
Streaming 16kHz PCM audio and JPEG frames simultaneously over one WebSocket without dropped frames required bandwidth-adaptive throttling and decoupled queues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Barge-in handling&lt;/strong&gt;&lt;br&gt;
When the user speaks mid-response, the agent must stop cleanly without corrupting the audio buffer. Getting reliable barge-in via the ADK streaming layer took multiple iterations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Invisible tool calls&lt;/strong&gt;&lt;br&gt;
All background analysis tools need to feel completely invisible. I tuned them to fire silently between turns and stream analytics to the UI via a JSON side-channel on the same WebSocket.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Company hallucination&lt;/strong&gt;&lt;br&gt;
Early versions confidently invented interview formats. Fixed with two-layer grounding: verified local knowledge base + ADK's built-in Google Search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Audio latency in production&lt;/strong&gt;&lt;br&gt;
Getting end-to-end voice latency under 500ms on Cloud Run required tuning buffer sizes, optimising the PCM pipeline, and keeping ASGI async throughout.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Lessons
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native audio models are fundamentally different from text models.&lt;/strong&gt; Design your system for async bidirectional streaming from the ground up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grounding is non-negotiable for agentic apps.&lt;/strong&gt; Prompt engineering alone is not enough — even capable models hallucinate domain facts without grounding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vanilla JS outperforms frameworks for latency-sensitive audio/video.&lt;/strong&gt; Full control over the audio pipeline timing matters at the PCM level.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ADK LiveRequestQueue needs careful queue management.&lt;/strong&gt; Keep audio, vision, and tool-call result streams strictly decoupled.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always add session-affinity to Cloud Run WebSocket services.&lt;/strong&gt; Stateless load balancing breaks persistent connections.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://interviewace-117780891544.us-central1.run.app/" rel="noopener noreferrer"&gt;https://interviewace-117780891544.us-central1.run.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No API key. No signup. No cost. Click the link, allow mic + camera, and start your mock interview.&lt;/p&gt;

&lt;p&gt;💻 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/SameerAliKhan-git/IntyerviewBit" rel="noopener noreferrer"&gt;https://github.com/SameerAliKhan-git/IntyerviewBit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📹 &lt;strong&gt;Demo Video:&lt;/strong&gt; &lt;a href="https://youtu.be/JrjhgB5Ib_0" rel="noopener noreferrer"&gt;https://youtu.be/JrjhgB5Ib_0&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built for the Gemini Live Agent Challenge using Google ADK · Gemini Live API · Google Cloud Run&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;#GeminiLiveAgentChallenge #GoogleAI #Gemini #ADK #GoogleCloud #Python #WebDev&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gemini</category>
      <category>geminiliveagentchallenge</category>
      <category>ai</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
