Freya Voice AI Agent Console (Forward-Deployed Engineering Deep Dive)
As part of a YC-style technical assessment, I built Freya Voice AI Agent Console — a production-grade, real-time voice-to-voice AI system focused on low latency, clean architecture, and real-world constraints.
The goal wasn’t just to “make it work”, but to design something that feels forward-deployable: a system you could realistically ship, iterate on, and operate.
👉 Source code:
https://github.com/05sanjaykumar/Freya-Voice-YC25-Assessment
🎯 What Was Built
Freya is a real-time conversational voice AI that supports:
- Bidirectional voice-to-voice conversations
- Sub-200 ms streaming latency
- Prompt versioning via a web console
- Session metrics and analytics
- Fully Dockerized deployment
The project was built in three days, prioritizing the voice pipeline over a chat UI, mirroring the constraints of real YC engineering evaluations.
🔊 Voice AI Pipeline (End-to-End)
The core pipeline looks like this:
- User speaks → audio streamed via WebRTC
- Speech-to-Text (STT) → Groq Whisper
- LLM reasoning → Groq LLaMA-3.1-8B
- Text-to-Speech (TTS) → Cartesia Sonic
- Audio streamed back to the user in real time
To improve accuracy and latency:
- Silero VAD detects speech boundaries
- Audio chunks are processed incrementally
- Responses are streamed, not buffered
This architecture mirrors how modern voice assistants are built in production.
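The incremental flow above can be sketched in plain Python. This is a simplified stand-in, not the repo's code: a naive energy-threshold check replaces Silero VAD, and the "response" generator just yields tokens one at a time to show streaming instead of buffering.

```python
from typing import Iterator, List

SPEECH_THRESHOLD = 0.5  # toy energy threshold standing in for Silero VAD


def is_speech(chunk: List[float]) -> bool:
    """Naive VAD: treat a chunk as speech if its mean energy is high enough."""
    energy = sum(abs(s) for s in chunk) / len(chunk)
    return energy > SPEECH_THRESHOLD


def segment_utterance(chunks: Iterator[List[float]]) -> List[List[float]]:
    """Accumulate chunks incrementally until the VAD sees silence (a speech boundary)."""
    utterance: List[List[float]] = []
    for chunk in chunks:
        if is_speech(chunk):
            utterance.append(chunk)
        elif utterance:
            break  # end of utterance: hand off to STT without waiting for more audio
    return utterance


def stream_response(text: str) -> Iterator[str]:
    """Yield response tokens as they are produced instead of buffering the full reply."""
    for token in text.split():
        yield token  # in the real pipeline, each token would be forwarded to TTS immediately


# Two speech chunks followed by a silent chunk form one utterance.
chunks = iter([[0.9, 0.8], [0.7, 0.9], [0.0, 0.1]])
utterance = segment_utterance(chunks)
tokens = list(stream_response("hello from freya"))
```

Streaming at every stage is what keeps perceived latency low: the user starts hearing audio before the full response has even been generated.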
🏗️ System Architecture
[ Browser Client ]
↓ WebRTC (LiveKit)
[ Next.js Frontend ]
↓ HTTP / WS
[ Python Voice Agent ]
↓
[ Groq (STT + LLM) ] → [ Cartesia (TTS) ]
Why LiveKit?
LiveKit handles:
- WebRTC negotiation
- Low-latency audio routing
- Session lifecycle management
This avoids re-implementing complex real-time networking logic — a key forward-deployed engineering decision.
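One piece the backend still owns is minting the access token a browser needs before it can join a LiveKit room. The sketch below signs that JWT with only the standard library; the `video` grant structure reflects my understanding of LiveKit's token format, not code from the repo.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def livekit_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    """Build an HS256-signed JWT granting join access to a single room."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,   # LiveKit API key identifies the issuer
        "sub": identity,  # participant identity shown in the room
        "exp": now + 3600,  # token valid for one hour
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


token = livekit_token("devkey", "secret", "user-1", "freya-room")
payload = token.split(".")[1]
decoded = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
```

In practice you would use LiveKit's server SDK for this, but seeing the raw claims makes it clear what the frontend is actually handed.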
🧰 Tech Stack
Frontend
- Next.js 15 (App Router)
- TypeScript
- Tailwind CSS
- LiveKit Client SDK
Backend
- Python 3.11
- LiveKit Agents
- Groq APIs (Whisper + LLaMA)
- Cartesia TTS
- Silero VAD
Infrastructure
- Docker & Docker Compose
- Multi-service orchestration
- Health checks & env-based config
🧠 Prompt Management & Observability
Freya includes a lightweight prompt management system:
- Create & version prompts
- Edit prompts without restarting sessions
- Track which prompt version was used per conversation
Session metrics include:
- Duration
- Latency
- Active connections
These are essential when deploying AI agents into real customer environments.
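The three metrics above can be tracked with a small in-memory aggregator. This is a hypothetical sketch (names and structure are mine, not the repo's), with injectable timestamps so it stays testable:

```python
import statistics
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionMetrics:
    """Tracks live sessions plus simple duration and latency statistics."""
    started: Dict[str, float] = field(default_factory=dict)
    durations: List[float] = field(default_factory=list)
    latencies_ms: List[float] = field(default_factory=list)

    def start(self, session_id: str, now: Optional[float] = None) -> None:
        """Mark a session as active; `now` is injectable for testing."""
        self.started[session_id] = time.time() if now is None else now

    def end(self, session_id: str, now: Optional[float] = None) -> float:
        """Close a session and record its duration in seconds."""
        t = time.time() if now is None else now
        duration = t - self.started.pop(session_id)
        self.durations.append(duration)
        return duration

    def record_latency(self, ms: float) -> None:
        """Record one end-to-end response latency sample in milliseconds."""
        self.latencies_ms.append(ms)

    @property
    def active_connections(self) -> int:
        return len(self.started)

    def p50_latency(self) -> float:
        """Median latency across all recorded samples."""
        return statistics.median(self.latencies_ms)


m = SessionMetrics()
m.start("s1", now=100.0)
m.start("s2", now=101.0)
m.record_latency(180.0)
m.record_latency(120.0)
d = m.end("s1", now=130.0)
```

Swapping this for a real metrics backend later only means changing where the samples go, not how the agent records them.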
🚀 Running Locally
git clone https://github.com/05sanjaykumar/Freya-Voice-YC25-Assessment
cd Freya-Voice-YC25-Assessment
cp .env.example .env
docker compose up --build
Visit: http://localhost:3000
You’ll need:
- LiveKit credentials
- Groq API key
- Cartesia API key
🧩 Engineering Tradeoffs (YC-Style)
Some deliberate decisions:
- Voice-first over chat: harder, but closer to real-world systems
- In-memory storage: faster iteration, easier debugging
- Service boundaries early: frontend, agent, and infra separated from day one
- Docker everywhere: reproducibility > local hacks
These choices optimize for clarity, deployability, and iteration speed — not just demos.
🛣️ What I’d Build Next
If this were going to production:
- PostgreSQL + Redis for state
- Horizontal agent scaling
- CI/CD (GitHub Actions)
- OpenTelemetry for tracing
- Session replay & recording
- Rate limiting & abuse prevention
🎤 Final Thoughts
This project was less about “AI magic” and more about engineering judgment:
latency, reliability, real-time systems, and clean abstractions.
If you’re preparing for YC-style interviews, Forward Deployed Engineer roles, or real-time AI systems, I hope this serves as a practical reference.
Happy building 🚀