
Parthiban Marimuthu

Building InterVU: A Real-Time AI Interview Agent with Google's Gemini Live API

This post was created for the Gemini Live Agent Challenge.

Gemini Live Agent Challenge: Redefining Interaction: From Static Chatbots to Immersive Experiences (geminiliveagentchallenge.devpost.com)


Inspiration

Technical interview preparation is broken.

Most candidates rehearse answers in isolation or practice with
text-based mock interview tools. But real interviews evaluate much more
than just the correctness of an answer.

Interviewers observe:

  • Eye contact
  • Confidence
  • Communication clarity
  • Body language
  • Ability to think under pressure

Existing mock interview tools are chatbots. They cannot see you,
hear your tone, or interrupt when you ramble.

Real interviewers do.

So I built InterVU, a real‑time AI interview agent that can:

  • 👀 See you through your webcam
  • 🎤 Hear your voice in real time
  • 🗣️ Speak naturally like an interviewer
  • 📊 Analyze your interview performance

The goal was to simulate the experience of sitting across from a real
engineering hiring manager.


What is InterVU?

InterVU is a real‑time AI mock interview platform powered by the
Google Gemini Live API.

It can:

  • Analyze job descriptions
  • Parse candidate resumes
  • Conduct live interviews
  • Monitor body language and communication
  • Generate structured interview reports

Instead of practicing alone, candidates receive real‑time coaching and
a structured feedback report after the interview.


System Architecture

Architecture diagram: the browser streams audio and video over a
WebSocket to a FastAPI backend running on Cloud Run, which communicates
with the Gemini Live API for real‑time interaction and stores interview
reports in Google Cloud Storage.

Browser (SPA) ←→ WebSocket ←→ FastAPI (Cloud Run) ←→ Gemini Live API
                                    ↓                      ↓
                               SQLite DB            Gemini Chat API
                                    ↓                      ↓
                              GCS Storage           Structured Reports

Design Goals

The architecture focuses on three core principles.

Low latency

A spoken conversation feels broken when replies lag, so audio is
streamed in both directions instead of being sent as whole
request/response turns.

Minimal frontend overhead

The frontend uses vanilla JavaScript instead of a heavy framework, which
keeps the page small and avoids extra layers between the microphone, the
WebSocket, and audio playback.

Async backend

FastAPI with asyncio manages multiple concurrent streaming
pipelines.


Tech Stack

Layer                  Technology
Frontend               HTML, CSS, Vanilla JavaScript
Backend                FastAPI
Streaming              WebSockets
AI                     Gemini Live API
Structured Extraction  LangChain
Deployment             Google Cloud Run
Storage                Google Cloud Storage

Phase 1 --- Smart Setup: JD + Resume Parsing

Before the interview begins, InterVU analyzes both the job description
and the candidate's resume.

This ensures the interview is grounded in the actual job requirements.

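# The job description and the resume are independent inputs,
# so both parsers run concurrently.
skills, parsed_resume = await asyncio.gather(
    parse_job_description(request.job_description),
    parse_resume(request.resume_text),
)

# The gap analysis needs both results, so it runs afterwards.
skill_gap = await analyze_skill_gap(skills, parsed_resume)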

The system extracts:

  • Required skills
  • Preferred skills
  • Seniority level
  • Domain
  • Responsibilities
  • Programming languages
  • Frameworks
  • Projects

This allows the AI interviewer to ask highly relevant questions
instead of generic ones.
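
To make the skill-gap step concrete, here is a minimal sketch of the comparison it implies. This is not InterVU's `analyze_skill_gap` (which presumably delegates to the model); `SkillGap` and `compute_skill_gap` are hypothetical names, and the matching is a deliberately crude case-insensitive exact match.

```python
from dataclasses import dataclass, field


@dataclass
class SkillGap:
    """Result of comparing job requirements against a resume (hypothetical shape)."""
    matched: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)


def compute_skill_gap(required: list[str], resume_skills: list[str]) -> SkillGap:
    """Split required skills into those the resume mentions and those it does not.

    Assumption: skills are compared as case-insensitive exact strings,
    which is far simpler than an LLM-based analysis.
    """
    have = {skill.strip().lower() for skill in resume_skills}
    gap = SkillGap()
    for skill in required:
        bucket = gap.matched if skill.strip().lower() in have else gap.missing
        bucket.append(skill)
    return gap


# Illustrative inputs only.
gap = compute_skill_gap(
    required=["Python", "FastAPI", "Kubernetes"],
    resume_skills=["python", "fastapi", "Docker"],
)
print(gap.matched)  # ['Python', 'FastAPI']
print(gap.missing)  # ['Kubernetes']
```

The `missing` list is the kind of signal an interviewer prompt could use to decide which topics to probe first.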


Phase 2 --- The 3‑State Interview Engine

The AI interviewer persona, Wayne, operates as a three‑state behavior
engine.

Interrogation Mode

Wayne cross‑references the resume against the job description.

Example:

"I see Python listed on your resume. Can you describe a project where
you used asynchronous programming?"

Visual Evaluation

During the interview the system analyzes:

  • Eye contact
  • Head orientation
  • Posture
  • Speaking duration

Wayne can interrupt if the candidate loses focus or rambles.

Tutor Mode

If a candidate struggles with a question:

  1. Wayne explains the concept briefly
  2. Asks a simplified follow‑up
  3. Returns to interview mode

This makes the interview educational instead of discouraging.
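
The three modes can be pictured as a small state machine. The sketch below is an assumption about how such an engine might be wired, not Wayne's actual logic: the article only states that Tutor Mode returns to interview mode, so the other transitions, the `next_state` function, and its `struggling` and `lost_focus` flags are all hypothetical.

```python
from enum import Enum, auto


class InterviewState(Enum):
    INTERROGATION = auto()      # asking resume- and JD-grounded questions
    VISUAL_EVALUATION = auto()  # commenting on focus, posture, or rambling
    TUTOR = auto()              # explaining a concept, then a simpler follow-up


def next_state(
    state: InterviewState,
    *,
    struggling: bool = False,
    lost_focus: bool = False,
) -> InterviewState:
    """Pick the next mode from the current mode and two observed signals."""
    if state is not InterviewState.INTERROGATION:
        # Assumed: tutor and visual-feedback detours last one step,
        # then the interview resumes.
        return InterviewState.INTERROGATION
    if struggling:
        return InterviewState.TUTOR
    if lost_focus:
        return InterviewState.VISUAL_EVALUATION
    return InterviewState.INTERROGATION


state = InterviewState.INTERROGATION
state = next_state(state, struggling=True)
print(state.name)  # TUTOR
state = next_state(state)
print(state.name)  # INTERROGATION
```

In a prompt-driven agent these transitions would more likely live in the system instruction than in code, but writing them out makes the intended behavior easy to check.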


Phase 3 --- Real‑Time Conversations with Gemini Live API

The core component is the GeminiLiveSession class, which manages a
bidirectional streaming connection to the Gemini Live API.

from google.genai import types

# system_text holds the interviewer's instructions (built elsewhere).
config_kwargs = {
    "system_instruction": types.Content(
        parts=[types.Part(text=system_text)]
    ),
}

Voice interaction is enabled with:

config_kwargs["response_modalities"] = ["AUDIO"]

This allows natural spoken conversations with the AI interviewer.


The Hardest Problem: Turn‑Taking

Real‑time conversations require correct turn detection.

Layer 1 --- Voice Activity Detection

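// Update the noise floor only on quiet frames while the user is silent,
// using an exponential moving average (95% old value, 5% new sample).
if (!this._isUserSpeaking && rms < 0.03) {
    this._noiseFloor = this._noiseFloor * 0.95 + rms * 0.05;
}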
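
For readers who want to experiment outside the browser, here is a Python sketch of the same noise-floor update wrapped in a complete detector. The 0.03 cutoff and the 0.95/0.05 smoothing come from the snippet above; the speech decision rule (`speech_ratio` times the noise floor) and the `NoiseFloorVAD` class itself are assumptions, not the project's frontend code.

```python
import math


class NoiseFloorVAD:
    """Energy-based voice activity detector with an adaptive noise floor."""

    def __init__(self, quiet_rms: float = 0.03, speech_ratio: float = 3.0):
        self.noise_floor = 0.0
        self.quiet_rms = quiet_rms        # same 0.03 cutoff as the snippet above
        self.speech_ratio = speech_ratio  # assumed: speech is 3x the noise floor
        self.is_user_speaking = False

    @staticmethod
    def rms(samples: list[float]) -> float:
        """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
        if not samples:
            return 0.0
        return math.sqrt(sum(s * s for s in samples) / len(samples))

    def process(self, samples: list[float]) -> bool:
        """Feed one audio frame; return True while speech is detected."""
        level = self.rms(samples)
        if not self.is_user_speaking and level < self.quiet_rms:
            # Exponential moving average, updated on quiet frames only.
            self.noise_floor = self.noise_floor * 0.95 + level * 0.05
        # Assumed decision rule: clearly above both the cutoff and the floor.
        threshold = max(self.quiet_rms, self.noise_floor * self.speech_ratio)
        self.is_user_speaking = level >= threshold
        return self.is_user_speaking


vad = NoiseFloorVAD()
for _ in range(10):
    vad.process([0.01] * 160)        # background hiss
print(vad.process([0.5] * 160))      # True: a loud frame counts as speech
print(vad.process([0.01] * 160))     # False: back to silence
```

Freezing the floor while the user speaks keeps the candidate's own voice from being learned as background noise.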

Layer 2 --- Echo Suppression

Microphone input is disabled when the AI speaks to prevent feedback
loops.
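
A minimal sketch of that gate, assuming the playback code can report when the AI's audio starts and stops. `MicGate` and its method names are hypothetical.

```python
from typing import Optional


class MicGate:
    """Drops microphone frames while the AI's audio is playing."""

    def __init__(self) -> None:
        self.ai_speaking = False

    def on_ai_audio_start(self) -> None:
        self.ai_speaking = True

    def on_ai_audio_end(self) -> None:
        self.ai_speaking = False

    def filter(self, frame: bytes) -> Optional[bytes]:
        """Return the frame if it may be sent upstream, else None."""
        return None if self.ai_speaking else frame


gate = MicGate()
gate.on_ai_audio_start()
print(gate.filter(b"\x00\x01"))  # None: suppressed while the AI talks
gate.on_ai_audio_end()
print(gate.filter(b"\x00\x01"))  # b'\x00\x01'
```

A hard gate like this is simple but trades away interruption: a candidate cannot cut in while frames are being dropped.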

Layer 3 --- Silence Watchdog

If the candidate stops speaking for several seconds, the server signals
the model to respond.
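
One way to build such a watchdog with asyncio is sketched below. The class name, the polling approach, and the `on_silence` callback are assumptions; the real server may track silence differently, and the timeout here is shortened so the demo finishes quickly.

```python
import asyncio
import time


class SilenceWatchdog:
    """Awaits `on_silence` once no speech has been reported for `timeout` seconds."""

    def __init__(self, timeout: float, on_silence):
        self.timeout = timeout
        self.on_silence = on_silence  # hypothetical async callback
        self._last_speech = time.monotonic()

    def note_speech(self) -> None:
        """Call whenever the candidate is heard; resets the silence timer."""
        self._last_speech = time.monotonic()

    async def run(self, poll_interval: float = 0.05) -> None:
        """Poll until the silence window elapses, then fire the callback once."""
        while time.monotonic() - self._last_speech < self.timeout:
            await asyncio.sleep(poll_interval)
        await self.on_silence()


async def demo():
    events = []

    async def signal_model():
        events.append("respond")  # stand-in for telling the model to answer

    dog = SilenceWatchdog(timeout=0.3, on_silence=signal_model)
    task = asyncio.create_task(dog.run())
    await asyncio.sleep(0.2)
    dog.note_speech()             # candidate is still talking
    await asyncio.sleep(0.2)
    fired_early = bool(events)    # the reset should have prevented firing
    await task                    # now wait out the real silence
    return fired_early, events


fired_early, events = asyncio.run(demo())
print(fired_early, events)
```

The caller would start a fresh `run()` for each candidate turn.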


Phase 4 --- Concurrent WebSocket Pipelines

The interview session runs three concurrent async tasks.

await asyncio.gather(
    forward_to_gemini(),
    forward_to_browser(),
    timer_task(),
)

Each pipeline handles a different stream of events, enabling
non‑blocking real‑time communication.
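
The pattern can be tried in isolation with queues standing in for the two network connections. This is a sketch under that assumption, not the project's handlers: `forward`, `run_session`, and the queue names are hypothetical, and only `timer_task` borrows its name from the snippet above.

```python
import asyncio


async def forward(source: asyncio.Queue, sink: asyncio.Queue) -> None:
    """Relay items from one queue to another until cancelled."""
    while True:
        item = await source.get()
        await sink.put(item)


async def timer_task(limit_seconds: float) -> None:
    """End the session by raising once the time limit is reached."""
    await asyncio.sleep(limit_seconds)
    raise TimeoutError("interview time limit reached")


async def run_session(limit_seconds: float = 0.05):
    browser_in, to_gemini = asyncio.Queue(), asyncio.Queue()
    gemini_out, to_browser = asyncio.Queue(), asyncio.Queue()
    await browser_in.put(b"mic-frame")
    await gemini_out.put(b"model-audio")

    tasks = [
        asyncio.create_task(forward(browser_in, to_gemini)),
        asyncio.create_task(forward(gemini_out, to_browser)),
        asyncio.create_task(timer_task(limit_seconds)),
    ]
    try:
        await asyncio.gather(*tasks)
    except TimeoutError:
        pass  # normal end of the interview
    finally:
        # gather() does not cancel sibling tasks when one fails,
        # so the forwarders are stopped explicitly.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return to_gemini.get_nowait(), to_browser.get_nowait()


relayed = asyncio.run(run_session())
print(relayed)  # (b'mic-frame', b'model-audio')
```

The explicit cancellation in `finally` matters in practice: without it, a finished interview would leave both forwarders running against closed connections.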


Phase 5 --- AI Interview Report

After the interview ends, the transcript is analyzed to generate a
structured report.

response = await client.aio.models.generate_content(
    model=settings.gemini_chat_model,
    contents=prompt,
)

The report includes:

  • Skill scores
  • Communication clarity analysis
  • Body language feedback
  • Resume accuracy verification
  • Strengths and improvement areas
  • Personalized learning plan

Reports are stored in Google Cloud Storage.
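
If the model is asked to return the report as JSON, a small schema makes malformed output fail loudly before anything is stored. The sketch below assumes that setup; the `InterviewReport` fields mirror the list above, but the field names, the `parse_report` helper, and the sample values are illustrative and not taken from the project.

```python
import json
from dataclasses import dataclass


@dataclass
class InterviewReport:
    """Hypothetical report shape based on the sections listed above."""
    skill_scores: dict[str, int]
    communication_clarity: str
    body_language: str
    resume_accuracy: str
    strengths: list[str]
    improvement_areas: list[str]
    learning_plan: list[str]


def parse_report(raw_json: str) -> InterviewReport:
    """Build a report from model output; raises TypeError on missing or extra keys."""
    data = json.loads(raw_json)
    return InterviewReport(**data)


# Made-up sample standing in for a model response.
sample = json.dumps({
    "skill_scores": {"Python": 7, "System Design": 5},
    "communication_clarity": "Clear, but answers ran long.",
    "body_language": "Steady eye contact.",
    "resume_accuracy": "Claims matched the answers given.",
    "strengths": ["Async programming"],
    "improvement_areas": ["Structuring design answers"],
    "learning_plan": ["Practice one system design question per day"],
})

report = parse_report(sample)
print(report.skill_scores["Python"])  # 7
```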


Google Cloud Services Used

Service           Purpose
Gemini Live API   Real-time audio/video interview conversation
Gemini Chat API   Resume parsing and interview report generation
Google GenAI SDK  Client library for interacting with Gemini APIs
Cloud Run         Hosting the FastAPI backend
Cloud Storage     Storage for generated interview reports
Cloud Build       Container build and deployment pipeline

GitHub Repository

InterAI — AI-Powered Mock Interview Platform

InterAI is a real-time AI mock interview agent powered by Google's Gemini Live API. It conducts natural, voice-and-video interviews tailored to a candidate's job description and resume, evaluates technical and soft skills in real-time, and generates structured performance reports.

Category: Live Agents

The Problem

Preparing for interviews is stressful and inefficient. Candidates lack access to realistic, personalized mock interviews that adapt in real-time. Traditional prep tools are static — they can't see you, hear your tone, or respond to your body language.

The Solution

InterAI creates a live AI interviewer ("Wayne") that:

  • Sees your video feed and analyzes body language and confidence in real-time
  • Hears your spoken answers with natural turn-taking and barge-in support
  • Speaks back with a natural voice, asking follow-up questions adapted to your responses
  • Evaluates you against the actual job description and your resume
  • Generates a structured scorecard report with…

The repository includes:

  • Full source code
  • Deployment scripts
  • Gemini Live API integration
  • Architecture documentation

Conclusion

InterVU demonstrates how real‑time multimodal AI agents can
transform interview preparation.

By combining:

  • live audio and video streaming
  • structured reasoning
  • behavioral analysis
  • conversational AI

we can create systems that simulate real human interview experiences,
not just chatbot conversations.

The future of AI interfaces isn't chat.

It's conversation.
