This post was created for the Gemini Live Agent Challenge.
Demo
Inspiration
Technical interview preparation is broken.
Most candidates rehearse answers in isolation or practice with
text-based mock interview tools. But real interviews evaluate much more
than just the correctness of an answer.
Interviewers observe:
- Eye contact
- Confidence
- Communication clarity
- Body language
- Ability to think under pressure
Existing mock interview tools are chatbots. They cannot see you,
hear your tone, or interrupt when you ramble.
Real interviewers do.
So I built InterVU --- a real‑time AI interview agent that can:
- 👀 See you through your webcam
- 🎤 Hear your voice in real time
- 🗣️ Speak naturally like an interviewer
- 📊 Analyze your interview performance
The goal was to simulate the experience of sitting across from a real
engineering hiring manager.
What is InterVU?
InterVU is a real‑time AI mock interview platform powered by
Google Gemini Live API.
It can:
- Analyze job descriptions
- Parse candidate resumes
- Conduct live interviews
- Monitor body language and communication
- Generate structured interview reports
Instead of practicing alone, candidates receive real‑time coaching and
a structured feedback report after the interview.
System Architecture
Browser (SPA) ←→ WebSocket ←→ FastAPI (Cloud Run) ←→ Gemini Live API
                                     ↓                     ↓
                                 SQLite DB          Gemini Chat API
                                     ↓                     ↓
                                GCS Storage       Structured Reports
Design Goals
The architecture focuses on three core principles.
Low latency
Real‑time voice conversation requires near‑instant responses; any buffering in the pipeline is heard as awkward silence.
Minimal frontend overhead
The frontend uses vanilla JavaScript instead of heavy frameworks to
minimize latency.
Async backend
FastAPI with asyncio manages multiple concurrent streaming
pipelines.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | HTML, CSS, Vanilla JavaScript |
| Backend | FastAPI |
| Streaming | WebSockets |
| AI | Gemini Live API |
| Structured Extraction | LangChain |
| Deployment | Google Cloud Run |
| Storage | Google Cloud Storage |
Phase 1 --- Smart Setup: JD + Resume Parsing
Before the interview begins, InterVU analyzes both the job description
and the candidate's resume.
This ensures the interview is grounded in the actual job
requirements.
# Parse the job description and resume concurrently
skills, parsed_resume = await asyncio.gather(
    parse_job_description(request.job_description),
    parse_resume(request.resume_text),
)
# Compare what the role requires against what the candidate has
skill_gap = await analyze_skill_gap(skills, parsed_resume)
The system extracts:
- Required skills
- Preferred skills
- Seniority level
- Domain
- Responsibilities
- Programming languages
- Frameworks
- Projects
This allows the AI interviewer to ask highly relevant questions
instead of generic ones.
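At its core, the gap analysis reduces to comparing the two extracted skill sets. Here is a minimal sketch, assuming skills arrive as plain strings; the names and the simple set comparison are illustrative, and the real analyze_skill_gap may delegate this work to Gemini:

```python
def skill_gap(required: list[str], preferred: list[str],
              resume_skills: list[str]) -> dict[str, list[str]]:
    """Illustrative sketch: split JD skills into matched / missing buckets.

    The comparison is case-insensitive; a production version would also
    handle synonyms (e.g. "JS" vs. "JavaScript").
    """
    have = {s.lower() for s in resume_skills}
    return {
        "matched": [s for s in required if s.lower() in have],
        "missing_required": [s for s in required if s.lower() not in have],
        "missing_preferred": [s for s in preferred if s.lower() not in have],
    }
```

The "missing_required" bucket is what makes Wayne's questions pointed: gaps between the JD and the resume become the first topics he probes.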
Phase 2 --- The 3‑State Interview Engine
The AI interviewer persona, Wayne, operates on a three‑state
behavior engine.
Interrogation Mode
Wayne cross‑references the resume against the job description.
Example:
"I see Python listed on your resume. Can you describe a project where
you used asynchronous programming?"
Visual Evaluation
During the interview the system analyzes:
- Eye contact
- Head orientation
- Posture
- Speaking duration
Wayne can interrupt if the candidate loses focus or rambles.
Tutor Mode
If a candidate struggles with a question:
- Wayne explains the concept briefly
- Asks a simplified follow‑up
- Returns to interview mode
This makes the interview educational instead of discouraging.
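The three modes can be pictured as a small state machine. The sketch below is purely illustrative (in InterVU the transitions are driven by the system prompt, not hard‑coded Python, and the trigger flags here are hypothetical):

```python
from enum import Enum, auto

class InterviewState(Enum):
    INTERROGATION = auto()   # cross-reference resume vs. job description
    VISUAL_EVAL = auto()     # monitor eye contact, posture, rambling
    TUTOR = auto()           # explain briefly, simplify, then return

def next_state(state: InterviewState, *, struggling: bool,
               rambling: bool) -> InterviewState:
    """Toy transition rules for illustration only."""
    if struggling:
        return InterviewState.TUTOR        # teach before judging
    if rambling:
        return InterviewState.VISUAL_EVAL  # interrupt and refocus
    return InterviewState.INTERROGATION    # default: keep probing
```

Keeping the states explicit, even if only in the prompt, is what lets Wayne feel consistent across a 30‑minute session instead of drifting.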
Phase 3 --- Real‑Time Conversations with Gemini Live API
The core component is the GeminiLiveSession class, which manages a
bidirectional streaming connection to Gemini.
# The interviewer persona and rules are injected as the system instruction
config_kwargs = {
    "system_instruction": types.Content(
        parts=[types.Part(text=system_text)]
    ),
}
Voice interaction is enabled with:
config_kwargs["response_modalities"] = ["AUDIO"]
This allows natural spoken conversations with the AI interviewer.
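Putting the two settings together, the connect config might be assembled like this. This is a sketch, assuming the google-genai SDK's types module; the persona text is a placeholder:

```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],          # spoken replies, not text
    system_instruction=types.Content(
        parts=[types.Part(text="You are Wayne, a rigorous interviewer...")]
    ),
)
```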
The Hardest Problem: Turn‑Taking
Natural conversation depends on knowing exactly when the candidate has
finished speaking and when the model should reply. InterVU handles this
with three layers.
Layer 1 --- Voice Activity Detection
// Adapt the noise floor only while the user is silent
if (!this._isUserSpeaking && rms < 0.03) {
  this._noiseFloor = this._noiseFloor * 0.95 + rms * 0.05;
}
Layer 2 --- Echo Suppression
Microphone input is disabled when the AI speaks to prevent feedback
loops.
Layer 3 --- Silence Watchdog
If the candidate stops speaking for several seconds, the server signals
the model to respond.
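The watchdog can be a small polling coroutine. A minimal sketch, assuming the audio pipeline refreshes a shared timestamp on every mic chunk; on_silence stands in for the real "respond now" signal to Gemini:

```python
import asyncio
import time

async def silence_watchdog(last_audio: dict, on_silence,
                           timeout: float = 2.0) -> None:
    """Fire on_silence once the mic has been quiet for `timeout` seconds.

    `last_audio["t"]` is a monotonic timestamp updated elsewhere on every
    incoming audio chunk.
    """
    while True:
        await asyncio.sleep(0.1)   # polling granularity
        if time.monotonic() - last_audio["t"] > timeout:
            await on_silence()     # e.g. signal end-of-turn to the model
            return
```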
Phase 4 --- Concurrent WebSocket Pipelines
The interview session runs three concurrent async tasks.
await asyncio.gather(
    forward_to_gemini(),
    forward_to_browser(),
    timer_task(),
)
Each pipeline handles a different stream of events, enabling
non‑blocking real‑time communication.
Phase 5 --- AI Interview Report
After the interview ends, the transcript is analyzed to generate a
structured report.
# Summarize the full transcript into a structured report
response = await client.aio.models.generate_content(
    model=settings.gemini_chat_model,
    contents=prompt,
)
The report includes:
- Skill scores
- Communication clarity analysis
- Body language feedback
- Resume accuracy verification
- Strengths and improvement areas
- Personalized learning plan
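A structured report is easiest to render and store when it follows a fixed schema. Here is a hedged sketch of what such a schema could look like; the field names are illustrative, and InterVU's actual LangChain extraction schema may differ:

```python
from dataclasses import dataclass, field

@dataclass
class InterviewReport:
    """Illustrative report schema covering the fields listed above."""
    skill_scores: dict[str, int] = field(default_factory=dict)  # e.g. 0-10
    communication_clarity: str = ""
    body_language_feedback: str = ""
    resume_accuracy: str = ""
    strengths: list[str] = field(default_factory=list)
    improvement_areas: list[str] = field(default_factory=list)
    learning_plan: list[str] = field(default_factory=list)
```

Validating the model's output against a schema like this is what turns a free‑text summary into something the frontend can chart and compare across sessions.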
Reports are stored in Google Cloud Storage.
Google Cloud Services Used
| Service | Purpose |
|---|---|
| Gemini Live API | Real-time audio/video interview conversation |
| Gemini Chat API | Resume parsing and interview report generation |
| Google GenAI SDK | Client library for interacting with Gemini APIs |
| Cloud Run | Hosting the FastAPI backend |
| Cloud Storage | Storage for generated interview reports |
| Cloud Build | Container build and deployment pipeline |
GitHub Repository
InterAI — AI-Powered Mock Interview Platform
InterAI is a real-time AI mock interview agent powered by Google's Gemini Live API. It conducts natural, voice-and-video interviews tailored to a candidate's job description and resume, evaluates technical and soft skills in real-time, and generates structured performance reports.
Category: Live Agents
The repository includes:
- Full source code
- Deployment scripts
- Gemini Live API integration
- Architecture documentation
Conclusion
InterVU demonstrates how real‑time multimodal AI agents can
transform interview preparation.
By combining:
- live audio and video streaming
- structured reasoning
- behavioral analysis
- conversational AI
we can create systems that simulate real human interview
experiences, not just chatbot conversations.
The future of AI interfaces isn't chat.
It's conversation.