A production-grade AI project that listens, scores, and coaches, built over a single weekend as part of my Weekend AI Project Series: Adventures in Vibe Coding.
AI interviews are messy. Human feedback is subjective.
So I built a system that listens, transcribes, analyzes, and mentors.
In this deep dive, I'll show you how I:
- Deployed a FastAPI backend with Whisper ASR for transcription
- Integrated 3 NLP models (RoBERTa, Toxic-BERT, mDeBERTa) for sentiment and competency scoring
- Added Gemini 2.0 Flash for human-like feedback
- Migrated from Cloud Run to Compute Engine for production workloads
By the end, you'll see how to turn a weekend experiment into a fully functional, production-ready AI application: the kind of build that gets noticed by both engineers and hiring managers.
Project Overview
This project demonstrates how to build a production-ready AI interview analysis system, one that evaluates communication quality, professionalism, and competency in recorded interviews.
It combines:
- Speech-to-text (ASR) using Whisper
- NLP scoring with RoBERTa, Toxic-BERT, and mDeBERTa
- Feedback generation with Gemini 2.0 Flash
The system produces quantitative scores, segment-level analytics, and contextual AI feedback, the kind that turns interview recordings into actionable coaching data.
Architecture Overview
The pipeline runs on Google Cloud Compute Engine (n1-standard-16) with the following key components:
Audio Upload → Whisper ASR → NLP Scoring → Ensemble Aggregation → Gemini Feedback → UI Visualization
Components:
Frontend (HTML + JS): Handles uploads and displays scores and feedback.
FastAPI Backend (Python 3.11): Routes processing and manages inference requests.
Whisper Models (ASR): Supports tiny → medium variants for speed/accuracy tradeoffs.
NLP Models (Hugging Face):
cardiffnlp/twitter-roberta-base-sentiment
unitary/toxic-bert
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli
LLM Feedback: Powered by Google Gemini 2.0 Flash for summarization and recommendations.
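To make the flow concrete, here is a minimal sketch of how the FastAPI layer could chain these stages. The helper names are hypothetical placeholders for the utils/ modules described later, not the project's actual signatures:

from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/analyze")
async def analyze(file: UploadFile):
    # Hypothetical helpers standing in for utils/asr_processor,
    # utils/nlp_analyzer, utils/ensemble_scorer, and utils/llm_feedback
    audio_path = save_upload(file)                      # Audio Upload
    transcript, segments = run_asr(audio_path)          # Whisper ASR
    segment_scores = run_nlp(segments)                  # NLP Scoring
    overall = aggregate(segment_scores)                 # Ensemble Aggregation
    feedback = generate_feedback(transcript, overall)   # Gemini Feedback
    return {"score": overall, "feedback": feedback}     # rendered by the UI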
Core ML Pipeline
Here's how each component works together:
1. Transcription (ASR)
The Whisper model transcribes the uploaded interview audio (MP3/WAV/M4A).
from faster_whisper import WhisperModel

model = WhisperModel("tiny", device="cuda")  # "tiny" favors speed over accuracy
segments, _ = model.transcribe("interview.m4a")
segments = list(segments)  # transcribe() returns a generator; materialize it
transcript = " ".join(s.text for s in segments)
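Each segment also carries timestamps, which is what feeds the segment-level analytics. A minimal sketch (faster-whisper segments expose .start, .end, and .text):

# Per-segment timing and text for the performance timeline
timeline = [{"start": s.start, "end": s.end, "text": s.text} for s in segments]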
2. NLP Scoring
Each transcript segment is passed through three different models (shown here on the full transcript for brevity; a per-segment sketch follows below):
from transformers import pipeline

# Three independent scorers: emotional tone, professionalism, skill relevance
sentiment = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")
competency = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")

labels = ["leadership", "communication", "technical skill"]
result = {
    "sentiment": sentiment(transcript[:512])[0]["score"],
    "toxicity": 1 - toxicity(transcript[:512])[0]["score"],  # inverted so higher = more professional
    "competency": competency(transcript[:512], labels)["scores"][0],  # top-label confidence
}
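In the full pipeline, the same three scorers run once per segment, which is what drives the performance timeline. A simplified sketch, reusing the pipelines above and the timeline list from the transcription step:

def score_segment(text: str) -> dict:
    text = text[:512]  # stay within the models' input limits
    return {
        "sentiment": sentiment(text)[0]["score"],
        "toxicity": 1 - toxicity(text)[0]["score"],
        "competency": competency(text, labels)["scores"][0],
    }

segment_scores = [score_segment(seg["text"]) for seg in timeline]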
3. Ensemble Scoring System
The scores are normalized and weighted across five dimensions:
| Component | Weight | Purpose |
| --- | --- | --- |
| Sentiment | 0.25 | Emotional tone |
| Toxicity | 0.20 | Professionalism |
| Competency | 0.25 | Skill fit |
| Keywords | 0.15 | Domain-specific terms |
| Filler Words | 0.15 | Clarity of expression |
This produces an overall "Interview Fit Score" between 0 and 100.
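A minimal sketch of the aggregation, assuming each component score has already been normalized to the 0-1 range:

WEIGHTS = {
    "sentiment": 0.25,
    "toxicity": 0.20,
    "competency": 0.25,
    "keywords": 0.15,
    "filler_words": 0.15,
}

def interview_fit_score(scores: dict) -> float:
    # Weighted sum of normalized component scores, scaled to 0-100
    return 100 * sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)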
4. AI Feedback (Gemini Integration)
After scoring, Gemini 2.0 Flash generates structured feedback:
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
gemini = genai.GenerativeModel("gemini-2.0-flash")

prompt = f"""
You are an AI interviewer. Based on the following transcript and scores:
{transcript[:2000]}
Scores: {result}
Provide:
- 3 Strengths
- 3 Areas for Improvement
- 2 Next Steps
"""
response = gemini.generate_content(prompt)
feedback = response.text
Output Example:
Strengths: Excellent communication and positive tone.
Improvement: Needs stronger technical examples.
Next Steps: Practice STAR method; refine domain language.
Visualization
The frontend visualizes scores with color-coded progress bars and an NLP-driven performance timeline:
Dashboard Example (screenshot)
AI Feedback Example (screenshot)
Deployment Details
Cloud: Google Cloud Compute Engine
Machine: n1-standard-16 (16 vCPUs, 60 GB RAM)
Environment: Dockerized FastAPI service
Storage: Local + Cloud Storage (optional for large files)
Monitoring: Basic logging via Cloud Logging
Note: The system originally ran on Cloud Run, but Cloud Run's 32 MB HTTP request size limit blocked larger audio uploads, so it was migrated to a Compute Engine VM, which imposes no such cap.
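On the VM, large uploads can be streamed to disk in chunks rather than buffered in memory. A minimal sketch of an upload handler (the endpoint name and response shape are illustrative, not the project's actual API):

import shutil
import tempfile
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/upload")
async def upload(file: UploadFile):
    # Copy the upload to a temp file in chunks to keep memory usage flat
    with tempfile.NamedTemporaryFile(delete=False, suffix=".m4a") as tmp:
        shutil.copyfileobj(file.file, tmp)
    return {"path": tmp.name}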
Challenges & Fixes
| Issue | Root Cause | Resolution |
| --- | --- | --- |
| Cloud Run upload limit | 32 MB request cap | Migrated to Compute Engine VM |
| Long Whisper inference | Model size vs. time | Added model selection (tiny → medium) |
| Flat score ranges | Heuristic-only scoring | Replaced with NLP-based segment scoring |
| Dependency errors | Missing faster_whisper | Pinned requirements + venv isolation |
| Frontend API mismatch | Response schema drift | Added unified response format + error handling |
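The model-selection fix above is small in practice. A sketch, assuming the desired size arrives as a request parameter:

from faster_whisper import WhisperModel

ALLOWED_SIZES = {"tiny", "base", "small", "medium"}

def load_whisper(size: str = "tiny") -> WhisperModel:
    # Fall back to the fastest model if an unknown size is requested
    if size not in ALLOWED_SIZES:
        size = "tiny"
    return WhisperModel(size, device="cuda")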
Key Learnings
Infrastructure matters: serverless is not always production-friendly.
Speed/accuracy tradeoff: tiny Whisper can be roughly 8× faster than medium while keeping about 90% of the accuracy.
Heuristics ≠ ML: real models make insights meaningful.
UX is part of ML: users need visible progress and clear outcomes.
Future Roadmap
WhisperX Word-Level Analysis
→ Enables clickable word-level scoring visualization.
Role-Aware Rubrics
→ Zero-shot matching between candidate responses and job descriptions (see the sketch after this list).
Real-Time SSE Updates
→ Show live progress of transcription and analysis in the UI.
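For the role-aware rubrics idea, the existing zero-shot pipeline could score answers directly against rubric items derived from a job description. A hypothetical sketch (the rubric items and answer_text are illustrative; competency is the mDeBERTa pipeline defined earlier):

rubric = ["system design", "stakeholder communication", "technical depth"]
match = competency(
    answer_text[:512],
    rubric,
    hypothesis_template="The candidate demonstrates {}.",
)
rubric_scores = dict(zip(match["labels"], match["scores"]))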
Tech Stack Summary
| Category | Tools / Services |
| --- | --- |
| Cloud | GCP Compute Engine |
| Backend | FastAPI, Python 3.11 |
| ML | Whisper, RoBERTa, Toxic-BERT, mDeBERTa |
| LLM | Gemini 2.0 Flash |
| Frontend | HTML + JS |
| Infra | Docker, venv, Cloud Logging |
Project Structure
interview-predictor/
├── app.py                   # FastAPI backend
├── utils/
│   ├── asr_processor.py     # Whisper transcription
│   ├── nlp_analyzer.py      # NLP model scoring
│   ├── ensemble_scorer.py   # Weighted aggregation
│   ├── timeline_analyzer.py # Segment analysis
│   └── llm_feedback.py      # Gemini integration
├── static/
│   └── index.html           # Frontend UI
├── requirements.txt         # Dependencies
└── Dockerfile               # Deployment setup
GitHub
Main portfolio: https://github.com/marcusmayo/machine-learning-portfolio
AI/ML portfolio (new repo): https://github.com/marcusmayo/ai-ml-portfolio-2
(New projects will be added here as the portfolio expands.)
Closing Thoughts
This project taught me that the hardest part of AI engineering isn't model tuning; it's designing systems that work under real-world constraints.
If you're an ML engineer, data scientist, or product builder exploring AI system design, this project is a great blueprint to start from.
Connect & Collaborate
I'm open to:
- AI Product Coaching
- Consulting on AI/ML System Design
- Collaborations with startups & innovation teams
Follow my work:
LinkedIn
GitHub
If you enjoyed this deep dive, follow for more weekend AI projects, from cost-optimized MLOps deployments to production-grade LLM builds.