DEV Community

sanjay kumar
Building a Real-Time Voice AI System for a YC-Style Assessment

Freya Voice AI Agent Console (Forward-Deployed Engineering Deep Dive)

As part of a YC-style technical assessment, I built Freya Voice AI Agent Console — a production-grade, real-time voice-to-voice AI system focused on low latency, clean architecture, and real-world constraints.

The goal wasn’t just to “make it work”, but to design something that feels forward-deployable: a system you could realistically ship, iterate on, and operate.

👉 Source code:
https://github.com/05sanjaykumar/Freya-Voice-YC25-Assessment


🎯 What Was Built

Freya is a real-time conversational voice AI that supports:

  • Bidirectional voice-to-voice conversations
  • Sub-200ms latency streaming
  • Prompt versioning via a web console
  • Session metrics and analytics
  • Fully Dockerized deployment

The project was built in three days, prioritizing the voice pipeline over a chat UI — the same emphasis real YC engineering evaluations tend to have.


🔊 Voice AI Pipeline (End-to-End)

The core pipeline looks like this:

  1. User speaks → audio streamed via WebRTC
  2. Speech-to-Text (STT) → Groq Whisper
  3. LLM reasoning → Groq LLaMA-3.1-8B
  4. Text-to-Speech (TTS) → Cartesia Sonic
  5. Audio streamed back to the user in real time
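A minimal sketch of this turn loop, with the external services stubbed out (the real system calls Groq for STT/LLM and Cartesia for TTS over their streaming APIs — the function names here are illustrative, not the actual agent code):

```python
import asyncio

async def transcribe(audio_chunk: bytes) -> str:
    """Stand-in for Groq Whisper STT."""
    return "hello freya"

async def generate_reply(transcript: str):
    """Stand-in for the Groq LLaMA token stream."""
    for token in ["Hi", " there", "!"]:
        yield token

async def synthesize(text: str) -> bytes:
    """Stand-in for Cartesia Sonic TTS."""
    return text.encode()

async def handle_turn(audio_chunk: bytes) -> list[bytes]:
    """One user turn: STT -> streamed LLM tokens -> incremental TTS.
    Synthesizing per token/phrase lets playback start before the full
    reply is generated (streaming, not buffering)."""
    transcript = await transcribe(audio_chunk)
    audio_out = []
    async for token in generate_reply(transcript):
        audio_out.append(await synthesize(token))
    return audio_out

chunks = asyncio.run(handle_turn(b"\x00\x01"))
print(b"".join(chunks).decode())  # → Hi there!
```

The key property is that each stage yields partial results downstream instead of waiting for the previous stage to finish — that is where most of the perceived latency win comes from.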

To improve accuracy and latency:

  • Silero VAD detects speech boundaries
  • Audio chunks are processed incrementally
  • Responses are streamed, not buffered

This architecture mirrors how modern voice assistants are built in production.
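To make the boundary-detection idea concrete, here is a crude energy-threshold VAD over 16-bit PCM frames. This is only a stand-in for Silero VAD, which uses a small neural model rather than raw amplitude, but the segmentation logic — accumulate speech frames, close a segment on silence — is the same shape:

```python
import struct

def is_speech(frame: bytes, threshold: int = 500) -> bool:
    """Energy-based VAD: mean absolute amplitude of 16-bit PCM samples.
    (Illustrative stand-in for Silero VAD's neural classifier.)"""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    energy = sum(abs(s) for s in samples) / max(len(samples), 1)
    return energy > threshold

def speech_segments(frames):
    """Group consecutive speech frames into utterances (speech boundaries)."""
    segment, segments = [], []
    for frame in frames:
        if is_speech(frame):
            segment.append(frame)
        elif segment:
            segments.append(b"".join(segment))
            segment = []
    if segment:
        segments.append(b"".join(segment))
    return segments

silence = struct.pack("<4h", 0, 0, 0, 0)
speech = struct.pack("<4h", 4000, -3500, 3800, -4200)
print(len(speech_segments([silence, speech, speech, silence])))  # → 1
```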


🏗️ System Architecture

[ Browser Client ]
     ↓ WebRTC (LiveKit)
[ Next.js Frontend ]
     ↓ HTTP / WS
[ Python Voice Agent ]
     ↓
[ Groq (STT + LLM) ] → [ Cartesia (TTS) ]

Why LiveKit?

LiveKit handles:

  • WebRTC negotiation
  • Low-latency audio routing
  • Session lifecycle management

This avoids re-implementing complex real-time networking logic — a key forward-deployed engineering decision.
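One piece the backend still owns is minting the access token the browser uses to join a room. In practice you would use the official LiveKit server SDK for this; the stdlib-only sketch below just shows the shape of that token — a standard HS256 JWT whose claim names (`iss`, `sub`, `video.roomJoin`) follow LiveKit's documented grant format:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_livekit_token(api_key: str, api_secret: str,
                       identity: str, room: str, ttl: int = 3600) -> str:
    """Mint a LiveKit-style JWT access token (HS256, stdlib only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": api_key,                  # LiveKit API key
        "sub": identity,                 # participant identity
        "exp": int(time.time()) + ttl,
        "video": {"room": room, "roomJoin": True},  # room grant
    }
    signing_input = (f"{b64url(json.dumps(header).encode())}."
                     f"{b64url(json.dumps(payload).encode())}")
    sig = hmac.new(api_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

token = mint_livekit_token("devkey", "secret", "user-1", "freya-demo")
print(token.count("."))  # → 2 (header.payload.signature)
```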


🧰 Tech Stack

Frontend

  • Next.js 15 (App Router)
  • TypeScript
  • Tailwind CSS
  • LiveKit Client SDK

Backend

  • Python 3.11
  • LiveKit Agents
  • Groq APIs (Whisper + LLaMA)
  • Cartesia TTS
  • Silero VAD

Infrastructure

  • Docker & Docker Compose
  • Multi-service orchestration
  • Health checks & env-based config

🧠 Prompt Management & Observability

Freya includes a lightweight prompt management system:

  • Create & version prompts
  • Edit prompts without restarting sessions
  • Track which prompt version was used per conversation

Session metrics include:

  • Duration
  • Latency
  • Active connections

These are essential when deploying AI agents into real customer environments.
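A minimal in-memory tracker for those three metrics might look like this (illustrative sketch, assuming latency is recorded per turn in milliseconds):

```python
import time

class SessionMetrics:
    """Minimal in-memory metrics for voice sessions."""

    def __init__(self):
        self.active = {}       # session_id -> monotonic start time
        self.latencies = []    # per-turn latency samples (ms)

    def start(self, session_id):
        self.active[session_id] = time.monotonic()

    def record_latency(self, ms):
        self.latencies.append(ms)

    def end(self, session_id) -> float:
        """Close a session and return its duration in seconds."""
        return time.monotonic() - self.active.pop(session_id)

    @property
    def active_connections(self) -> int:
        return len(self.active)

    @property
    def p50_latency(self):
        s = sorted(self.latencies)
        return s[len(s) // 2] if s else None

m = SessionMetrics()
m.start("call-1")
m.record_latency(180)
m.record_latency(140)
print(m.active_connections, m.p50_latency)  # → 1 180
```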


🚀 Running Locally

git clone https://github.com/05sanjaykumar/Freya-Voice-YC25-Assessment
cd Freya-Voice-YC25-Assessment
cp .env.example .env
docker compose up --build

Visit: http://localhost:3000

You’ll need:

  • LiveKit credentials
  • Groq API key
  • Cartesia API key
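A hypothetical `.env` layout for those credentials is below — check `.env.example` in the repo for the exact variable names it expects:

```shell
# LiveKit (WebRTC routing)
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...

# Groq (Whisper STT + LLaMA LLM)
GROQ_API_KEY=...

# Cartesia (Sonic TTS)
CARTESIA_API_KEY=...
```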

🧩 Engineering Tradeoffs (YC-Style)

Some deliberate decisions:

  • Voice-first over chat: harder, but closer to real-world systems
  • In-memory storage: faster iteration, easier debugging
  • Service boundaries early: frontend, agent, and infra separated from day one
  • Docker everywhere: reproducibility > local hacks

These choices optimize for clarity, deployability, and iteration speed — not just demos.


🛣️ What I’d Build Next

If this were going to production:

  • PostgreSQL + Redis for state
  • Horizontal agent scaling
  • CI/CD (GitHub Actions)
  • OpenTelemetry for tracing
  • Session replay & recording
  • Rate limiting & abuse prevention

🎤 Final Thoughts

This project was less about “AI magic” and more about engineering judgment:
latency, reliability, real-time systems, and clean abstractions.

If you’re preparing for YC-style interviews, Forward Deployed Engineer roles, or real-time AI systems, I hope this serves as a practical reference.

Happy building 🚀
