Career counseling is expensive, inaccessible, and often based on generic, outdated advice. I wanted to change that. So I built CareerForge — a voice-first AI career coaching agent that can see your resume, talk to you in a natural conversation, and research the live job market to build you a personalized career roadmap in minutes.
Here's how I built the entire thing using Google's AI models and Google Cloud.
The Architecture: Multi-Agent with Google ADK
CareerForge is powered by a multi-agent architecture built on the Google Agent Development Kit (ADK). Instead of one monolithic LLM call, I split the workload across specialized agents:
forge_live — The live voice agent using Gemini 2.5 Flash's native audio model via the Gemini Live API. It handles bidirectional audio streaming for natural, interruptible conversations.
report_generator — A text-based agent that runs post-session. It has direct access to all analytical tools and synthesizes the conversation into a structured career plan.
Each agent has a distinct personality and toolset, which keeps the system modular and debuggable.
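In spirit, the split looks like the following plain-Python sketch. This is not the actual ADK API — in the real app each entry is an ADK agent object bound to a model — and the model ids and tool names here are illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentSpec:
    """Stand-in for an ADK agent definition (illustrative, not the ADK API)."""
    name: str
    model: str
    instruction: str
    tools: list[Callable] = field(default_factory=list)

# Placeholder tools — the real app wires in grounded search and analysis tools.
def search_market(query: str) -> str: ...
def analyze_skill_gap(profile: dict) -> dict: ...

forge_live = AgentSpec(
    name="forge_live",
    model="gemini-2.5-flash-native-audio",  # assumption: the exact model id may differ
    instruction="Coach the user in a natural, interruptible voice conversation.",
)

report_generator = AgentSpec(
    name="report_generator",
    model="gemini-2.5-flash",
    instruction="Synthesize the session transcript into a structured career plan.",
    tools=[search_market, analyze_skill_gap],
)
```

Keeping each agent's instruction and toolset separate is what makes the system debuggable: a bad report is a report_generator problem, a stilted conversation is a forge_live problem.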
"See": Multimodal Resume Analysis
The first step is the "See" capability. Users upload their resume as a PDF or image, and Gemini 2.5 Flash's multimodal vision analyzes it instantly — extracting skills, job history, education, and certifications. This parsed data pre-populates the onboarding form and gives the live agent full context before the conversation even begins.
```python
# Vision-based resume extraction using Gemini
resume_data = await analyze_resume_vision(file_b64, content_type)
```
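Under the hood, a multimodal request of this kind boils down to sending the file as base64 inline data next to a text prompt. Here's a minimal sketch of the request-body shape (field names follow the Gemini `generateContent` REST convention; the helper name and prompt are illustrative):

```python
import base64
import json

def build_resume_request(file_bytes: bytes, content_type: str, prompt: str) -> str:
    # The file rides along as base64 "inline data" next to the text prompt.
    body = {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": content_type,  # e.g. "application/pdf" or "image/png"
                    "data": base64.b64encode(file_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }
    return json.dumps(body)

payload = build_resume_request(
    b"%PDF-1.4 sample bytes", "application/pdf",
    "Extract skills, job history, education, and certifications as JSON.",
)
```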
"Hear & Speak": Live Voice Coaching with the Gemini Live API
This is the heart of CareerForge. The user enters a live, bidirectional audio session powered by the Gemini Live API's native audio streaming.
On the frontend, I built a custom AudioWorklet that captures raw PCM audio from the microphone at 16kHz and streams it over a WebSocket to the FastAPI backend. The backend pipes it into the ADK's run_live() method, which maintains a persistent bidirectional connection with Gemini.
Audio responses stream back from Gemini as PCM chunks, which are decoded and queued for seamless playback using AudioBufferSourceNode in the browser.
```typescript
// Frontend: Mic capture → WebSocket binary frames
worklet.port.onmessage = (e) => {
  ws.send(e.data); // Raw PCM bytes
};
```
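Web Audio hands the worklet Float32 samples in [-1.0, 1.0], while the Live API consumes 16-bit little-endian PCM, so each sample gets clamped and scaled on the way out. In Python terms — a sketch of the conversion, which in CareerForge actually lives in the TypeScript worklet:

```python
import struct

def float32_to_pcm16(samples: list[float]) -> bytes:
    """Clamp Float32 samples to [-1, 1] and pack as 16-bit little-endian PCM."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))
        out += struct.pack("<h", int(s * 32767))
    return bytes(out)

chunk = float32_to_pcm16([0.0, 0.5, -1.0])  # 3 samples -> 6 bytes on the wire
```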
The result is a conversation that feels natural — you can interrupt the agent mid-sentence (barge-in), and it responds with emotional intelligence, adapting its coaching style based on your burnout level, risk tolerance, and timeline.
Real-Time Market Data with Google Search Grounding
Generic career advice is useless. That's why CareerForge uses Google Search grounding to pull live job market data during the report generation phase. When you say "I want to transition into AI Product Management," the agent searches for:
Current salary ranges for that role in your location
In-demand skills employers are hiring for right now
Industry growth trends and top employers
This means every career plan is grounded in today's reality, not month-old training data.
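In outline, the grounding step fans one stated goal out into several targeted searches. The templates below are illustrative, not the exact prompts the agent uses:

```python
def build_market_queries(role: str, location: str) -> list[str]:
    # Illustrative query templates for the three research angles above.
    return [
        f"{role} salary range in {location}",
        f"in-demand skills employers want for {role} roles right now",
        f"{role} industry growth trends and top employers",
    ]

queries = build_market_queries("AI Product Manager", "Austin, TX")
```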
Deploying to Google Cloud Run
The entire application runs on Google Cloud Run:
Backend: FastAPI server with WebSocket support, containerized with Docker, deployed with session affinity for persistent live connections.
Frontend: React 19 + Vite static build served via nginx.
I wrote a fully automated `deploy.sh` script that handles the entire deployment pipeline — building containers, setting environment variables, configuring CORS, and routing traffic — with a single command:
```bash
./deploy.sh my-project-id us-central1
```
The Hardest Challenges
Building this was not smooth sailing:
The 1008 Error: The Gemini Live API's native audio model would crash with APIError: 1008 when combined with function calling. I discovered that Python type hints had to be extremely strict (`list[str]`, not bare `list`) to generate valid JSON schemas. One wrong type and the entire session would drop.
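The reason bare `list` breaks things: a JSON schema for an array is only valid with an `items` field, and `items` can only be filled in when the annotation carries its type argument. A toy schema generator makes the failure mode visible (a sketch, not ADK's actual schema code):

```python
import typing

def recommend_courses(skills: list[str], budget: int) -> dict:
    """Example tool: fully parameterized hints let a valid schema be generated."""
    return {skill: f"Intro to {skill}" for skill in skills}

def schema_type(annotation) -> dict:
    # Toy annotation -> JSON-schema mapping, just enough to show the trap.
    if annotation is str:
        return {"type": "string"}
    if annotation is int:
        return {"type": "integer"}
    if annotation is list or typing.get_origin(annotation) is list:
        args = typing.get_args(annotation)
        if not args:  # bare `list`: no way to fill in the required `items`
            raise ValueError("bare `list` produces an invalid JSON schema")
        return {"type": "array", "items": schema_type(args[0])}
    raise ValueError(f"unsupported annotation: {annotation!r}")

hints = typing.get_type_hints(recommend_courses)
print(schema_type(hints["skills"]))  # {'type': 'array', 'items': {'type': 'string'}}
```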
Event Loop Starvation: Long-running Google Search grounding calls were blocking the async event loop, killing the WebSocket mid-conversation. I had to refactor every tool to be fully async and carefully configure the ADK's thread pool.
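The fix pattern, in miniature: anything synchronous gets pushed off the event loop with `asyncio.to_thread`, so heartbeats and audio frames keep flowing while the search runs. (The blocking function here is a stand-in for the real grounded-search call.)

```python
import asyncio
import time

def blocking_search(query: str) -> str:
    # Stand-in for a long synchronous grounding call.
    time.sleep(0.2)
    return f"results for {query}"

async def search_tool(query: str) -> str:
    # Off-load to a worker thread so the loop keeps servicing the WebSocket.
    return await asyncio.to_thread(blocking_search, query)

async def heartbeat() -> list[str]:
    # Simulated keep-alive traffic that must not be starved mid-search.
    ticks = []
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append("ping")
    return ticks

async def main() -> list:
    return await asyncio.gather(search_tool("AI PM salaries"), heartbeat())

result, ticks = asyncio.run(main())  # both complete; no starvation
```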
Context Loss Between Agents: The nuanced details shared verbally — "I'm burned out," "I can only study 5 hours a week" — were getting lost before report generation. I engineered a transcript extraction pipeline that pulls a rich user profile from the conversation and injects it into every downstream tool call.
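A toy version of that extraction step — the real pipeline uses an LLM pass over the transcript, but regex stand-ins show the shape of the profile that gets injected downstream:

```python
import re

BURNOUT = re.compile(r"\bburned?\s+out\b", re.IGNORECASE)
WEEKLY_HOURS = re.compile(r"(\d+)\s*hours?\s*(?:a|per)\s*week", re.IGNORECASE)

def extract_profile(transcript: str) -> dict:
    """Pull coaching-relevant signals out of the raw conversation transcript."""
    profile: dict = {}
    if BURNOUT.search(transcript):
        profile["burnout"] = True
    if (m := WEEKLY_HOURS.search(transcript)):
        profile["weekly_hours"] = int(m.group(1))
    return profile

profile = extract_profile("I'm burned out, and I can only study 5 hours a week.")
# {'burnout': True, 'weekly_hours': 5}
```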
The Final Product
After a 10–20 minute voice session, CareerForge generates:
✅ A personalized skill gap analysis
✅ A timeline-aware career roadmap adapted to your constraints
✅ Course and certification recommendations
✅ A downloadable, branded PDF career plan
All powered by Gemini 2.5 Flash, the Gemini Live API, Google Search grounding, Google ADK, and Google Cloud Run.
🔗 GitHub: github.com/VPAI-Grok/careerforge_live
If you're interested in building with the Gemini Live API or Google ADK, I highly recommend diving in. The native audio capabilities are genuinely impressive — and the multi-agent patterns enabled by ADK make complex AI applications much more manageable.
Thanks for reading! Feel free to reach out if you have questions about the implementation.