πŸŽ™οΈ Building an AI-Powered Interview Analyzer on GCP

A production-grade AI project that listens, scores, and coaches, built over a single weekend as part of my Weekend AI Project Series: Adventures in Vibe Coding.

Interviews are messy. Human feedback is subjective.

So I built a system that listens, transcribes, analyzes, and mentors.

In this deep dive, I'll show you how I:

  • Deployed a FastAPI backend with Whisper ASR for transcription
  • Integrated 3 NLP models (RoBERTa, Toxic-BERT, mDeBERTa) for sentiment and competency scoring
  • Added Gemini 2.0 Flash for human-like feedback
  • Migrated from Cloud Run to Compute Engine for production workloads

By the end, you'll see how to turn a weekend experiment into a fully functional, production-ready AI application, the kind of build that gets noticed by both engineers and hiring managers.


🚀 Project Overview

This project demonstrates how to build a production-ready AI interview analysis system, one that evaluates communication quality, professionalism, and competency in recorded interviews.

It combines:

πŸŽ™οΈ Speech-to-text (ASR) using Whisper

🧠 NLP scoring with RoBERTa, Toxic-BERT, and mDeBERTa

πŸ€– Feedback generation with Gemini 2.0 Flash

The system produces quantitative scores, segment-level analytics, and contextual AI feedback, the kind that turns interview recordings into actionable coaching data.

βš™οΈ Architecture Overview

The pipeline runs on Google Cloud Compute Engine (n1-standard-16) with the following key components:

Audio Upload → Whisper ASR → NLP Scoring → Ensemble Aggregation → Gemini Feedback → UI Visualization

Components:

Frontend (HTML + JS): Handles uploads and displays scores and feedback.

FastAPI Backend (Python 3.11): Routes uploads and manages model inference requests (a minimal sketch follows this list).

Whisper Models (ASR): Supports tiny → medium variants for speed/accuracy tradeoffs.

NLP Models (Hugging Face):

cardiffnlp/twitter-roberta-base-sentiment

unitary/toxic-bert

MoritzLaurer/mDeBERTa-v3-base-mnli-xnli

LLM Feedback: Powered by Google Gemini 2.0 Flash for summarization and recommendations.
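
To make the wiring concrete, here's a minimal sketch of the upload-to-transcript route. The /analyze path, response shape, and temp-file handling are my assumptions for illustration, not the project's actual API:

import shutil
import tempfile

from fastapi import FastAPI, UploadFile
from faster_whisper import WhisperModel

app = FastAPI()
model = WhisperModel("tiny")  # device="cuda" when a GPU is available

@app.post("/analyze")
async def analyze(file: UploadFile):
    # Persist the upload to disk so faster-whisper can read it by path
    with tempfile.NamedTemporaryFile(suffix=".m4a", delete=False) as tmp:
        shutil.copyfileobj(file.file, tmp)
        path = tmp.name
    segments, _ = model.transcribe(path)
    transcript = " ".join(s.text.strip() for s in segments)
    return {"transcript": transcript}

In the full pipeline this handler would go on to call the NLP scoring and Gemini steps shown below.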

🧠 Core ML Pipeline

Here's how each component works together:

  1. Transcription (ASR)

The Whisper model transcribes the uploaded interview audio (MP3/WAV/M4A).

from faster_whisper import WhisperModel

# "tiny" is the fastest variant; swap in "base" through "medium" for more accuracy
model = WhisperModel("tiny", device="cuda")

# transcribe() yields segments lazily, plus metadata we don't need here
segments, _ = model.transcribe("interview.m4a")
transcript = " ".join(s.text.strip() for s in segments)

  2. NLP Scoring

Each transcript segment is passed through three different models:

from transformers import pipeline

# Three Hugging Face pipelines: sentiment, toxicity, and zero-shot competency
sentiment = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")
competency = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")

result = {
    "sentiment": sentiment(transcript[:512])[0]["score"],
    "toxicity": 1 - toxicity(transcript[:512])[0]["score"],  # inverted: higher = less toxic
    "competency": competency(transcript[:512], ["leadership", "communication", "technical skill"]),
}

  3. Ensemble Scoring System

The scores are normalized and weighted across five dimensions:

| Component | Weight | Purpose |
| --- | --- | --- |
| Sentiment | 0.25 | Emotional tone |
| Toxicity | 0.20 | Professionalism |
| Competency | 0.25 | Skill fit |
| Keywords | 0.15 | Domain-specific terms |
| Filler Words | 0.15 | Clarity of expression |

This produces an overall "Interview Fit Score" from 0 to 100.
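
As a rough illustration of that aggregation, here's a minimal sketch. The weights come from the table above; the normalized component scores (including the keyword and filler-word components, which aren't shown earlier) are assumed inputs:

# Weights from the table above; assumed: each score is already normalized to [0, 1]
WEIGHTS = {
    "sentiment": 0.25,
    "toxicity": 0.20,
    "competency": 0.25,
    "keywords": 0.15,
    "filler_words": 0.15,
}

def interview_fit_score(scores: dict[str, float]) -> float:
    # Weighted sum of component scores, scaled to the 0-100 "Interview Fit Score"
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(total * 100, 1)

print(interview_fit_score({
    "sentiment": 0.8, "toxicity": 0.9, "competency": 0.7,
    "keywords": 0.6, "filler_words": 0.9,
}))  # 78.0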

  4. AI Feedback (Gemini Integration)

After scoring, Gemini 2.0 Flash generates structured feedback:

import os

import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # assumes the key is set in the environment
gemini = genai.GenerativeModel("gemini-2.0-flash")

prompt = f"""
You are an AI interviewer. Based on the following transcript and scores:
{transcript[:2000]}
Scores: {result}
Provide:
1. 3 Strengths
2. 3 Areas for Improvement
3. 2 Next Steps
"""

response = gemini.generate_content(prompt)
feedback = response.text

Output Example:

Strengths: Excellent communication and positive tone.

Improvement: Needs stronger technical examples.

Next Steps: Practice STAR method; refine domain language.

🧩 Visualization

The frontend visualizes scores with color-coded progress bars and an NLP-driven performance timeline:

📊 Dashboard Example

📸 Figure 1: Scoring dashboard with live component breakdowns (example dashboard showing sentiment, toxicity, and competency breakdowns)

📋 AI Feedback Example

📸 Figure 2: AI-generated interview feedback and improvement plan (Gemini-powered strengths, improvement areas, and next steps)

🧱 Deployment Details

Cloud: Google Cloud Compute Engine

Machine: n1-standard-16 (16 vCPUs, 64GB RAM)

Environment: Dockerized FastAPI service

Storage: Local + Cloud Storage (optional for large files)

Monitoring: Basic logging via Cloud Logging

Note: The system originally ran on Cloud Run, but its 32MB request size limit blocked larger audio uploads, so the service was migrated to Compute Engine.
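
Since the service is Dockerized, deployment on the VM reduces to building and running one image. A hypothetical Dockerfile along these lines (the actual file in the repo may differ):

# Hypothetical sketch; the repo's Dockerfile may differ
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Serve the FastAPI app object defined in app.py
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]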

⚠️ Challenges & Fixes

| Issue | Root Cause | Resolution |
| --- | --- | --- |
| Cloud Run upload limit | 32MB request cap | Migrated to Compute Engine VM |
| Long Whisper inference | Model size vs. time | Added model selection (tiny → medium) |
| Flat score ranges | Heuristic-only scoring | Replaced with NLP-based segment scoring |
| Dependency errors | Missing faster_whisper | Pinned requirements + venv isolation |
| Frontend API mismatch | Response schema drift | Added unified response format + error handling (see the sketch below) |
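
On the schema-drift fix: one way to pin the contract is a single Pydantic response model that every endpoint returns. The field names here are my guess from the article, not the project's actual schema:

from typing import Optional

from pydantic import BaseModel

# Hypothetical unified response schema; field names assumed for illustration
class AnalysisResponse(BaseModel):
    transcript: str
    scores: dict[str, float]
    feedback: str
    error: Optional[str] = None

Declaring the route with response_model=AnalysisResponse makes FastAPI validate every response against one shape, so the frontend can't silently drift out of sync.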
πŸ” Key Learnings

Infrastructure matters β€” Serverless is not always production-friendly.

Speed/Accuracy tradeoff β€” Tiny vs. Medium Whisper can be 8Γ— faster for 90% of the accuracy.

Heuristics β‰  ML β€” Real models make insights meaningful.

UX is part of ML β€” Users need visible progress and clear outcomes.

🧭 Future Roadmap

WhisperX Word-Level Analysis
→ Enables clickable word-level scoring visualization.

Role-Aware Rubrics
→ Zero-shot matching between candidate responses and job descriptions.

Real-Time SSE Updates
→ Show live progress of transcription and analysis in the UI (a sketch follows this list).
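
For the SSE item, here's a minimal sketch of how live progress events could be streamed from FastAPI. The endpoint, stage names, and timing are illustrative only:

import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/progress")
async def progress():
    async def events():
        # Stand-in stages; real events would come from the analysis pipeline
        for stage in ("transcribing", "scoring", "generating_feedback", "done"):
            yield f"data: {json.dumps({'stage': stage})}\n\n"
            await asyncio.sleep(1)
    return StreamingResponse(events(), media_type="text/event-stream")

The browser side can consume this with a plain EventSource, which fits the HTML + JS frontend.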

🧰 Tech Stack Summary

| Category | Tools / Services |
| --- | --- |
| Cloud | GCP Compute Engine |
| Backend | FastAPI, Python 3.11 |
| ML | Whisper, RoBERTa, Toxic-BERT, mDeBERTa |
| LLM | Gemini 2.0 Flash |
| Frontend | HTML + JS |
| Infra | Docker, venv, Cloud Logging |
📂 Project Structure

interview-predictor/
├── app.py                   # FastAPI backend
├── utils/
│   ├── asr_processor.py     # Whisper transcription
│   ├── nlp_analyzer.py      # NLP model scoring
│   ├── ensemble_scorer.py   # Weighted aggregation
│   ├── timeline_analyzer.py # Segment analysis
│   └── llm_feedback.py      # Gemini integration
├── static/
│   └── index.html           # Frontend UI
├── requirements.txt         # Dependencies
└── Dockerfile               # Deployment setup

πŸ“ GitHub

🧠 Main portfolio: https://github.com/marcusmayo/machine-learning-portfolio

🚀 AI/ML portfolio (new repo): https://github.com/marcusmayo/ai-ml-portfolio-2

(New projects will be added here as the portfolio expands.)

💬 Closing Thoughts

This project taught me that the hardest part of AI engineering isn't model tuning; it's designing systems that work under real-world constraints.

If you're an ML engineer, data scientist, or product builder exploring AI system design, this project is a great blueprint to start from.

Connect & Collaborate
I'm open to:

🤝 AI Product Coaching

🧠 Consulting on AI/ML System Design

💼 Collaborations with startups & innovation teams

Follow my work:
🔗 LinkedIn

🔗 GitHub
