Parthiban Marimuthu

Posted on Mar 16

Building InterVU: A Real-Time AI Interview Agent with Google's Gemini Live API

#geminiliveagentchallenge #gemini

This post was created for the Gemini Live Agent Challenge.

Gemini Live Agent Challenge: Redefining Interaction: From Static Chatbots to Immersive Experiences - Devpost

geminiliveagentchallenge.devpost.com

Inspiration

Technical interview preparation is broken.

Most candidates rehearse answers in isolation or practice with
text-based mock interview tools. But real interviews evaluate much more
than just the correctness of an answer.

Interviewers observe:

Eye contact
Confidence
Communication clarity
Body language
Ability to think under pressure

Most existing mock interview tools are chatbots. They cannot see you,
hear your tone, or interrupt when you ramble.

Real interviewers do.

So I built InterVU, a real‑time AI interview agent that can:

👀 See you through your webcam
🎤 Hear your voice in real time
🗣️ Speak naturally like an interviewer
📊 Analyze your interview performance

The goal was to simulate the experience of sitting across from a real
engineering hiring manager.

What is InterVU?

InterVU is a real‑time AI mock interview platform powered by
Google's Gemini Live API.

It can:

Analyze job descriptions
Parse candidate resumes
Conduct live interviews
Monitor body language and communication
Generate structured interview reports

Instead of practicing alone, candidates receive real‑time coaching
during the interview and a structured feedback report afterward.

System Architecture

Browser (SPA) ←→ WebSocket ←→ FastAPI (Cloud Run) ←→ Gemini Live API
                                    ↓                      ↓
                               SQLite DB            Gemini Chat API
                                    ↓                      ↓
                              GCS Storage           Structured Reports

Design Goals

The architecture focuses on three core principles.

Low latency

Real‑time voice conversations only feel natural when responses arrive
with minimal delay.

Minimal frontend overhead

The frontend uses vanilla JavaScript instead of heavy frameworks to
minimize latency.

Async backend

FastAPI with asyncio manages multiple concurrent streaming
pipelines.

Tech Stack

Layer	Technology
Frontend	HTML, CSS, Vanilla JavaScript
Backend	FastAPI
Streaming	WebSockets
AI	Gemini Live API
Structured Extraction	LangChain
Deployment	Google Cloud Run
Storage	Google Cloud Storage

Phase 1 --- Smart Setup: JD + Resume Parsing

Before the interview begins, InterVU analyzes both the job description
and the candidate's resume.

This ensures the interview is grounded in the actual job
requirements.

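# Parse the job description and the resume concurrently.
skills, parsed_resume = await asyncio.gather(
    parse_job_description(request.job_description),
    parse_resume(request.resume_text),
)

# Compare the two results to find what the interview should probe.
skill_gap = await analyze_skill_gap(skills, parsed_resume)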

The system extracts:

Required skills
Preferred skills
Seniority level
Domain
Responsibilities
Programming languages
Frameworks
Projects

This allows the AI interviewer to ask highly relevant questions
instead of generic ones.
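
To make the skill-gap idea concrete, here is a minimal sketch of what such a comparison could compute. The `JobProfile` and `ResumeProfile` shapes and the set-difference logic are assumptions for illustration only. The post does not show how `analyze_skill_gap` is implemented, and the real version may well rely on the model rather than exact string matching.

```python
from dataclasses import dataclass, field


@dataclass
class JobProfile:
    # Hypothetical shape for the parsed job description.
    required_skills: set[str] = field(default_factory=set)
    preferred_skills: set[str] = field(default_factory=set)


@dataclass
class ResumeProfile:
    # Hypothetical shape for the parsed resume.
    skills: set[str] = field(default_factory=set)


def normalize(skills: set[str]) -> set[str]:
    """Lower-case and trim skill names so 'Docker ' matches 'docker'."""
    return {s.strip().lower() for s in skills}


def skill_gap(job: JobProfile, resume: ResumeProfile) -> dict[str, list[str]]:
    """Split the job's skills into those the resume covers and those it lacks."""
    have = normalize(resume.skills)
    required = normalize(job.required_skills)
    preferred = normalize(job.preferred_skills)
    return {
        "matched": sorted((required | preferred) & have),
        "missing_required": sorted(required - have),
        "missing_preferred": sorted(preferred - have),
    }


job = JobProfile({"Python", "FastAPI", "SQL"}, {"Docker"})
resume = ResumeProfile({"python", "Docker ", "React"})
gap = skill_gap(job, resume)
```

An interviewer prompt could then prioritize the `missing_required` entries when choosing questions.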

Phase 2 --- The 3‑State Interview Engine

The AI interviewer persona, Wayne, operates on a three‑state
behavior engine.

Interrogation Mode

Wayne cross‑references the resume against the job description.

Example:

"I see Python listed on your resume. Can you describe a project where
you used asynchronous programming?"

Visual Evaluation

During the interview the system analyzes:

Eye contact
Head orientation
Posture
Speaking duration

Wayne can interrupt if the candidate loses focus or rambles.

Tutor Mode

If a candidate struggles with a question:

Wayne explains the concept briefly
Asks a simplified follow‑up
Returns to interview mode

This makes the interview educational instead of discouraging.
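
One way to picture the three states is as a small state machine. The sketch below is an assumption, not the project's code: the post does not say how the states are implemented (they may live entirely in the system prompt), and the `struggling` and `distracted` signals are hypothetical inputs invented for illustration.

```python
from enum import Enum, auto


class Mode(Enum):
    INTERROGATION = auto()
    VISUAL_EVALUATION = auto()
    TUTOR = auto()


def next_mode(mode: Mode, *, struggling: bool, distracted: bool) -> Mode:
    """Pick the next mode from two hypothetical per-turn signals."""
    if mode is Mode.TUTOR:
        # Tutor mode is a short detour: always return to questioning.
        return Mode.INTERROGATION
    if struggling:
        # The candidate is stuck: explain briefly and simplify.
        return Mode.TUTOR
    if distracted:
        # Lost focus or rambling: comment on delivery before continuing.
        return Mode.VISUAL_EVALUATION
    return Mode.INTERROGATION
```

Keeping the transition logic in one pure function makes it easy to test each rule in isolation.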

Phase 3 --- Real‑Time Conversations with Gemini Live API

The core component is the GeminiLiveSession class managing a
bidirectional streaming connection.

config_kwargs = {
    "system_instruction": types.Content(
        parts=[types.Part(text=system_text)]
    ),
}

Voice interaction is enabled with:

config_kwargs["response_modalities"] = ["AUDIO"]

This allows natural spoken conversations with the AI interviewer.

The Hardest Problem: Turn‑Taking

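A real‑time conversation only works if the system knows when the
candidate has finished speaking and when the AI may talk. InterVU
handles this in three layers.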

Layer 1 --- Voice Activity Detection

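// While the user is silent and the frame is quiet (RMS below 0.03),
// fold it into the noise-floor estimate with an exponential moving average.
if (!this._isUserSpeaking && rms < 0.03) {
    this._noiseFloor = this._noiseFloor * 0.95 + rms * 0.05;
}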
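
The snippet above only shows the noise-floor update. A fuller picture of an energy-based detector, written in Python for readability, might look like the sketch below. The 0.03 ceiling and the 0.95/0.05 weights come from the snippet; the `ratio` threshold and the starting floor are assumptions, since the post does not show how speech is actually detected.

```python
import math


class AdaptiveVad:
    """Energy-based voice activity detector with an adaptive noise floor."""

    def __init__(self, quiet_rms: float = 0.03, ratio: float = 3.0) -> None:
        self.quiet_rms = quiet_rms  # same 0.03 ceiling as the snippet above
        self.ratio = ratio          # assumed: speech must exceed floor * ratio
        self.noise_floor = 0.005    # assumed starting estimate
        self.is_speaking = False

    @staticmethod
    def rms(samples: list[float]) -> float:
        """Root-mean-square level of one frame of samples in [-1, 1]."""
        if not samples:
            return 0.0
        return math.sqrt(sum(s * s for s in samples) / len(samples))

    def process(self, samples: list[float]) -> bool:
        """Update the noise floor and return True if the frame looks like speech."""
        level = self.rms(samples)
        if not self.is_speaking and level < self.quiet_rms:
            # Exponential moving average: 95% old estimate, 5% new frame.
            self.noise_floor = self.noise_floor * 0.95 + level * 0.05
        self.is_speaking = level > self.noise_floor * self.ratio
        return self.is_speaking
```

Freezing the floor while the user is speaking keeps loud speech from inflating the estimate of the room's background noise.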

Layer 2 --- Echo Suppression

Microphone input is disabled when the AI speaks to prevent feedback
loops.
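
In its simplest form this is a gate in front of the microphone stream. The sketch below is a hypothetical illustration; the `MicGate` name and its methods are invented here, and the post does not show where the real gating happens (browser or server).

```python
from typing import Optional


class MicGate:
    """Drops microphone frames while the AI's audio is playing."""

    def __init__(self) -> None:
        self.ai_speaking = False

    def set_ai_speaking(self, speaking: bool) -> None:
        """Call when AI audio playback starts (True) or ends (False)."""
        self.ai_speaking = speaking

    def filter(self, frame: bytes) -> Optional[bytes]:
        """Return the frame if it may be forwarded, otherwise None."""
        return None if self.ai_speaking else frame
```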

Layer 3 --- Silence Watchdog

If the candidate stops speaking for several seconds, the server signals
the model to respond.
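
A watchdog like this can be built as a resettable timer. The following is a minimal sketch under stated assumptions: the `SilenceWatchdog` class, the `nudge_model` callback, and the short timeouts are all hypothetical and chosen so the demo runs quickly. The post does not give the actual timeout or show how the server signals the model.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SilenceWatchdog:
    """Calls `on_silence` once if no activity is reported for `timeout` seconds."""

    def __init__(self, timeout: float, on_silence: Callable[[], Awaitable[None]]) -> None:
        self.timeout = timeout
        self.on_silence = on_silence
        self._task: Optional[asyncio.Task] = None

    def touch(self) -> None:
        """Report user activity: cancel the pending timer and start a new one."""
        self.stop()
        self._task = asyncio.create_task(self._wait())

    def stop(self) -> None:
        """Cancel the pending timer, if any."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _wait(self) -> None:
        await asyncio.sleep(self.timeout)
        await self.on_silence()


async def demo() -> list[str]:
    events: list[str] = []

    async def nudge_model() -> None:
        # Stand-in for signalling the model to take its turn.
        events.append("model_turn")

    dog = SilenceWatchdog(timeout=0.2, on_silence=nudge_model)
    dog.touch()                # candidate speaks
    await asyncio.sleep(0.05)
    dog.touch()                # still speaking: the timer restarts
    await asyncio.sleep(0.05)
    events.append("still_waiting")
    await asyncio.sleep(0.4)   # candidate stays quiet past the timeout
    return events
```

Each audio frame that contains speech would call `touch()`, so the callback fires only after an uninterrupted stretch of silence.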

Phase 4 --- Concurrent WebSocket Pipelines

The interview session runs three concurrent async tasks.

await asyncio.gather(
    forward_to_gemini(),
    forward_to_browser(),
    timer_task(),
)

Each pipeline handles a different stream of events, enabling
non‑blocking real‑time communication.
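
To show how such pipelines can run side by side, here is a self-contained sketch that replaces the browser and the Live API with in-memory queues. Everything in it is an assumption for illustration: `run_session`, `fake_model`, and the queue names are invented. Unlike the snippet above, it awaits the timer and then cancels the other tasks, which is only one of several ways to end a session; the post does not show how shutdown is handled.

```python
import asyncio


async def run_session(mic_frames: list[str], time_limit: float) -> dict[str, list[str]]:
    to_model: asyncio.Queue = asyncio.Queue()
    to_browser: asyncio.Queue = asyncio.Queue()
    log: dict[str, list[str]] = {"model_received": [], "browser_received": []}

    async def forward_to_gemini() -> None:
        # Browser -> model: push each microphone frame upstream.
        for frame in mic_frames:
            await to_model.put(frame)
            await asyncio.sleep(0)  # yield so the other tasks can run

    async def fake_model() -> None:
        # Stand-in for the Live API: answers every frame it receives.
        while True:
            frame = await to_model.get()
            log["model_received"].append(frame)
            await to_browser.put(f"reply:{frame}")

    async def forward_to_browser() -> None:
        # Model -> browser: relay replies as they arrive.
        while True:
            log["browser_received"].append(await to_browser.get())

    async def timer_task() -> None:
        # Ends the session when the time limit is reached.
        await asyncio.sleep(time_limit)

    workers = (forward_to_gemini, fake_model, forward_to_browser)
    tasks = [asyncio.create_task(worker()) for worker in workers]
    await timer_task()
    for task in tasks:
        task.cancel()
    # Collect the cancellations so no task is left pending.
    await asyncio.gather(*tasks, return_exceptions=True)
    return log
```

Because each direction has its own task, a slow reply from the model never blocks incoming microphone frames.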

Phase 5 --- AI Interview Report

After the interview ends, the transcript is analyzed to generate a
structured report.

response = await client.aio.models.generate_content(
    model=settings.gemini_chat_model,
    contents=prompt,
)

The report includes:

Skill scores
Communication clarity analysis
Body language feedback
Resume accuracy verification
Strengths and improvement areas
Personalized learning plan

Reports are stored in Google Cloud Storage.
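
Before a report is stored, the model's reply has to be turned into structured data. The sketch below shows one possible way to do that, assuming the prompt asks for a JSON object. The `InterviewReport` fields are a hypothetical subset of the list above, and the 0 to 10 score range is an assumption; the post does not describe the real schema.

```python
import json
from dataclasses import dataclass


@dataclass
class InterviewReport:
    # Hypothetical subset of the report fields listed above.
    skill_scores: dict[str, int]
    strengths: list[str]
    improvements: list[str]


def parse_report(raw: str) -> InterviewReport:
    """Extract and validate the JSON object contained in a model reply."""
    # Tolerate extra prose around the JSON by slicing from the first
    # opening brace to the last closing brace.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(raw[start : end + 1])

    scores = {skill: int(score) for skill, score in data["skill_scores"].items()}
    for skill, score in scores.items():
        if not 0 <= score <= 10:  # assumed scale
            raise ValueError(f"score for {skill!r} out of range: {score}")
    return InterviewReport(scores, list(data["strengths"]), list(data["improvements"]))
```

Validating the reply before it reaches storage keeps malformed model output from ending up in a candidate's report.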

Google Cloud Services Used

Service	Purpose
Gemini Live API	Real-time audio/video interview conversation
Gemini Chat API	Resume parsing and interview report generation
Google GenAI SDK	Client library for interacting with Gemini APIs
Cloud Run	Hosting the FastAPI backend
Cloud Storage	Storage for generated interview reports
Cloud Build	Container build and deployment pipeline

GitHub Repository

paartheee / InterVU-AI

InterAI — AI-Powered Mock Interview Platform

InterAI is a real-time AI mock interview agent powered by Google's Gemini Live API. It conducts natural, voice-and-video interviews tailored to a candidate's job description and resume, evaluates technical and soft skills in real-time, and generates structured performance reports.

Category: Live Agents

The Problem

Preparing for interviews is stressful and inefficient. Candidates lack access to realistic, personalized mock interviews that adapt in real-time. Traditional prep tools are static — they can't see you, hear your tone, or respond to your body language.

The Solution

InterAI creates a live AI interviewer ("Wayne") that:

Sees your video feed and analyzes body language and confidence in real-time
Hears your spoken answers with natural turn-taking and barge-in support
Speaks back with a natural voice, asking follow-up questions adapted to your responses
Evaluates you against the actual job description and your resume
Generates a structured scorecard report with…

The repository includes:

Full source code
Deployment scripts
Gemini Live API integration
Architecture documentation

Conclusion

InterVU demonstrates how real‑time multimodal AI agents can
transform interview preparation.

By combining:

live audio and video streaming
structured reasoning
behavioral analysis
conversational AI

we can create systems that simulate real human interview
experiences, not just chatbot conversations.

The future of AI interfaces isn't chat.

It's conversation.
