A production-grade AI project that listens, scores, and coaches, built over a single weekend as part of my Weekend AI Project Series: Adventures in Vibe Coding.
AI interviews are messy. Human feedback is subjective.
So I built a system that listens, transcribes, analyzes, and mentors.
In this deep dive, I'll show you how I:
- Deployed a FastAPI backend with Whisper ASR for transcription
- Integrated 3 NLP models (RoBERTa, Toxic-BERT, mDeBERTa) for sentiment and competency scoring
- Added Gemini 2.0 Flash for human-like feedback
- Migrated from Cloud Run to Compute Engine for production workloads
By the end, you'll see how to turn a weekend experiment into a fully functional, production-ready AI application: the kind of build that gets noticed by both engineers and hiring managers.
Project Overview
This project demonstrates how to build a production-ready AI interview analysis system, one that evaluates communication quality, professionalism, and competency in recorded interviews.
It combines:
- Speech-to-text (ASR) using Whisper
- NLP scoring with RoBERTa, Toxic-BERT, and mDeBERTa
- Feedback generation with Gemini 2.0 Flash
The system produces quantitative scores, segment-level analytics, and contextual AI feedback, the kind that turns interview recordings into actionable coaching data.
Architecture Overview
The pipeline runs on Google Cloud Compute Engine (n1-standard-16) with the following key components:
Audio Upload → Whisper ASR → NLP Scoring → Ensemble Aggregation → Gemini Feedback → UI Visualization
Components:
Frontend (HTML + JS): Handles uploads and displays scores and feedback.
FastAPI Backend (Python 3.11): Routes processing and manages inference requests.
Whisper Models (ASR): Supports tiny → medium variants for speed/accuracy tradeoffs.
NLP Models (Hugging Face):
cardiffnlp/twitter-roberta-base-sentiment
unitary/toxic-bert
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli
LLM Feedback: Powered by Google Gemini 2.0 Flash for summarization and recommendations.
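To make the flow concrete, here is a minimal sketch of how the FastAPI layer could chain these stages. The helper names are hypothetical placeholders for the utils/ modules described later, not the project's actual signatures:

from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/analyze")
async def analyze(file: UploadFile):
    # Hypothetical helpers standing in for utils/asr_processor,
    # utils/nlp_analyzer, utils/ensemble_scorer, and utils/llm_feedback
    audio_path = save_upload(file)                      # Audio Upload
    transcript, segments = run_asr(audio_path)          # Whisper ASR
    segment_scores = run_nlp(segments)                  # NLP Scoring
    overall = aggregate(segment_scores)                 # Ensemble Aggregation
    feedback = generate_feedback(transcript, overall)   # Gemini Feedback
    return {"score": overall, "feedback": feedback}     # rendered by the UI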
Core ML Pipeline
Here's how each component works together:
1. Transcription (ASR)
The Whisper model transcribes the uploaded interview audio (MP3/WAV/M4A).
from faster_whisper import WhisperModel

model = WhisperModel("tiny", device="cuda")  # "tiny" favors speed over accuracy
segments, _ = model.transcribe("interview.m4a")
segments = list(segments)  # transcribe() returns a generator; materialize it
transcript = " ".join(s.text for s in segments)
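Each segment also carries timestamps, which is what feeds the segment-level analytics. A minimal sketch (faster-whisper segments expose .start, .end, and .text):

# Per-segment timing and text for the performance timeline
timeline = [{"start": s.start, "end": s.end, "text": s.text} for s in segments]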
2. NLP Scoring
Each transcript segment is passed through three different models (shown here on the full transcript for brevity; a per-segment sketch follows below):
from transformers import pipeline

# Three independent scorers: emotional tone, professionalism, skill relevance
sentiment = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")
competency = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")

labels = ["leadership", "communication", "technical skill"]
result = {
    "sentiment": sentiment(transcript[:512])[0]["score"],
    "toxicity": 1 - toxicity(transcript[:512])[0]["score"],  # inverted so higher = more professional
    "competency": competency(transcript[:512], labels)["scores"][0],  # top-label confidence
}
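In the full pipeline, the same three scorers run once per segment, which is what drives the performance timeline. A simplified sketch, reusing the pipelines above and the timeline list from the transcription step:

def score_segment(text: str) -> dict:
    text = text[:512]  # stay within the models' input limits
    return {
        "sentiment": sentiment(text)[0]["score"],
        "toxicity": 1 - toxicity(text)[0]["score"],
        "competency": competency(text, labels)["scores"][0],
    }

segment_scores = [score_segment(seg["text"]) for seg in timeline]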
3. Ensemble Scoring System
The scores are normalized and weighted across five dimensions:
| Component | Weight | Purpose |
| --- | --- | --- |
| Sentiment | 0.25 | Emotional tone |
| Toxicity | 0.20 | Professionalism |
| Competency | 0.25 | Skill fit |
| Keywords | 0.15 | Domain-specific terms |
| Filler Words | 0.15 | Clarity of expression |
This produces an overall "Interview Fit Score" between 0 and 100.
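A minimal sketch of the aggregation, assuming each component score has already been normalized to the 0-1 range:

WEIGHTS = {
    "sentiment": 0.25,
    "toxicity": 0.20,
    "competency": 0.25,
    "keywords": 0.15,
    "filler_words": 0.15,
}

def interview_fit_score(scores: dict) -> float:
    # Weighted sum of normalized component scores, scaled to 0-100
    return 100 * sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)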
4. AI Feedback (Gemini Integration)
After scoring, Gemini 2.0 Flash generates structured feedback:
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
gemini = genai.GenerativeModel("gemini-2.0-flash")

prompt = f"""
You are an AI interviewer. Based on the following transcript and scores:
{transcript[:2000]}
Scores: {result}
Provide:
- 3 Strengths
- 3 Areas for Improvement
- 2 Next Steps
"""
response = gemini.generate_content(prompt)
feedback = response.text
Output Example:
Strengths: Excellent communication and positive tone.
Improvement: Needs stronger technical examples.
Next Steps: Practice STAR method; refine domain language.
Visualization
The frontend visualizes scores with color-coded progress bars and an NLP-driven performance timeline:
Dashboard Example (screenshot)
AI Feedback Example (screenshot)
Deployment Details
Cloud: Google Cloud Compute Engine
Machine: n1-standard-16 (16 vCPUs, 60 GB RAM)
Environment: Dockerized FastAPI service
Storage: Local + Cloud Storage (optional for large files)
Monitoring: Basic logging via Cloud Logging
Note: The system originally ran on Cloud Run, but Cloud Run's 32 MB HTTP request size limit blocked larger audio uploads, so it was migrated to a Compute Engine VM, which imposes no such cap.
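On the VM, large uploads can be streamed to disk in chunks rather than buffered in memory. A minimal sketch of an upload handler (the endpoint name and response shape are illustrative, not the project's actual API):

import shutil
import tempfile
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/upload")
async def upload(file: UploadFile):
    # Copy the upload to a temp file in chunks to keep memory usage flat
    with tempfile.NamedTemporaryFile(delete=False, suffix=".m4a") as tmp:
        shutil.copyfileobj(file.file, tmp)
    return {"path": tmp.name}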
Challenges & Fixes
| Issue | Root Cause | Resolution |
| --- | --- | --- |
| Cloud Run upload limit | 32 MB request cap | Migrated to Compute Engine VM |
| Long Whisper inference | Model size vs. time | Added model selection (tiny → medium) |
| Flat score ranges | Heuristic-only scoring | Replaced with NLP-based segment scoring |
| Dependency errors | Missing faster_whisper | Pinned requirements + venv isolation |
| Frontend API mismatch | Response schema drift | Added unified response format + error handling |
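The model-selection fix above is small in practice. A sketch, assuming the desired size arrives as a request parameter:

from faster_whisper import WhisperModel

ALLOWED_SIZES = {"tiny", "base", "small", "medium"}

def load_whisper(size: str = "tiny") -> WhisperModel:
    # Fall back to the fastest model if an unknown size is requested
    if size not in ALLOWED_SIZES:
        size = "tiny"
    return WhisperModel(size, device="cuda")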
Key Learnings
Infrastructure matters: serverless is not always production-friendly.
Speed/accuracy tradeoff: tiny Whisper can be roughly 8× faster than medium while keeping about 90% of the accuracy.
Heuristics ≠ ML: real models make insights meaningful.
UX is part of ML: users need visible progress and clear outcomes.
Future Roadmap
WhisperX Word-Level Analysis
→ Enables clickable word-level scoring visualization.
Role-Aware Rubrics
→ Zero-shot matching between candidate responses and job descriptions (see the sketch after this list).
Real-Time SSE Updates
→ Show live progress of transcription and analysis in the UI.
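For the role-aware rubrics idea, the existing zero-shot pipeline could score answers directly against rubric items derived from a job description. A hypothetical sketch (the rubric items and answer_text are illustrative; competency is the mDeBERTa pipeline defined earlier):

rubric = ["system design", "stakeholder communication", "technical depth"]
match = competency(
    answer_text[:512],
    rubric,
    hypothesis_template="The candidate demonstrates {}.",
)
rubric_scores = dict(zip(match["labels"], match["scores"]))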
Tech Stack Summary
| Category | Tools / Services |
| --- | --- |
| Cloud | GCP Compute Engine |
| Backend | FastAPI, Python 3.11 |
| ML | Whisper, RoBERTa, Toxic-BERT, mDeBERTa |
| LLM | Gemini 2.0 Flash |
| Frontend | HTML + JS |
| Infra | Docker, venv, Cloud Logging |
Project Structure
interview-predictor/
├── app.py                   # FastAPI backend
├── utils/
│   ├── asr_processor.py     # Whisper transcription
│   ├── nlp_analyzer.py      # NLP model scoring
│   ├── ensemble_scorer.py   # Weighted aggregation
│   ├── timeline_analyzer.py # Segment analysis
│   └── llm_feedback.py      # Gemini integration
├── static/
│   └── index.html           # Frontend UI
├── requirements.txt         # Dependencies
└── Dockerfile               # Deployment setup
GitHub
Main portfolio: https://github.com/marcusmayo/machine-learning-portfolio
AI/ML portfolio (new repo): https://github.com/marcusmayo/ai-ml-portfolio-2
(New projects will be added here as the portfolio expands.)
Closing Thoughts
This project taught me that the hardest part of AI engineering isn't model tuning; it's designing systems that work under real-world constraints.
If you're an ML engineer, data scientist, or product builder exploring AI system design, this project is a great blueprint to start from.
Connect & Collaborate
I'm open to:
- AI Product Coaching
- Consulting on AI/ML System Design
- Collaborations with startups & innovation teams
Follow my work:
LinkedIn
GitHub
If you enjoyed this deep dive, follow for more weekend AI projects, from cost-optimized MLOps deployments to production-grade LLM builds.