DEV Community

sanjay kumar
Building a Real-Time Voice AI System for a YC-Style Assessment

Freya Voice AI Agent Console (Forward-Deployed Engineering Deep Dive)

As part of a YC-style technical assessment, I built Freya Voice AI Agent Console — a production-grade, real-time voice-to-voice AI system focused on low latency, clean architecture, and real-world constraints.

The goal wasn’t just to “make it work”, but to design something that feels forward-deployable: a system you could realistically ship, iterate on, and operate.

👉 Source code:
https://github.com/05sanjaykumar/Freya-Voice-YC25-Assessment


🎯 What Was Built

Freya is a real-time conversational voice AI that supports:

  • Bidirectional voice-to-voice conversations
  • Sub-200ms latency streaming
  • Prompt versioning via a web console
  • Session metrics and analytics
  • Fully Dockerized deployment

The project was built in three days, prioritizing the voice pipeline over a chat UI — the same emphasis real YC engineering evaluations tend to have.


🔊 Voice AI Pipeline (End-to-End)

The core pipeline looks like this:

  1. User speaks → audio streamed via WebRTC
  2. Speech-to-Text (STT) → Groq Whisper
  3. LLM reasoning → Groq LLaMA-3.1-8B
  4. Text-to-Speech (TTS) → Cartesia Sonic
  5. Audio streamed back to the user in real time
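A minimal sketch of this turn loop, with the external services stubbed out (the real system calls Groq for STT/LLM and Cartesia for TTS over their streaming APIs — the function names here are illustrative, not the actual agent code):

```python
import asyncio

async def transcribe(audio_chunk: bytes) -> str:
    """Stand-in for Groq Whisper STT."""
    return "hello freya"

async def generate_reply(transcript: str):
    """Stand-in for the Groq LLaMA token stream."""
    for token in ["Hi", " there", "!"]:
        yield token

async def synthesize(text: str) -> bytes:
    """Stand-in for Cartesia Sonic TTS."""
    return text.encode()

async def handle_turn(audio_chunk: bytes) -> list[bytes]:
    """One user turn: STT -> streamed LLM tokens -> incremental TTS.
    Synthesizing per token/phrase lets playback start before the full
    reply is generated (streaming, not buffering)."""
    transcript = await transcribe(audio_chunk)
    audio_out = []
    async for token in generate_reply(transcript):
        audio_out.append(await synthesize(token))
    return audio_out

chunks = asyncio.run(handle_turn(b"\x00\x01"))
print(b"".join(chunks).decode())  # → Hi there!
```

The key property is that each stage yields partial results downstream instead of waiting for the previous stage to finish — that is where most of the perceived latency win comes from.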

To improve accuracy and latency:

  • Silero VAD detects speech boundaries
  • Audio chunks are processed incrementally
  • Responses are streamed, not buffered

This architecture mirrors how modern voice assistants are built in production.
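To make the boundary-detection idea concrete, here is a crude energy-threshold VAD over 16-bit PCM frames. This is only a stand-in for Silero VAD, which uses a small neural model rather than raw amplitude, but the segmentation logic — accumulate speech frames, close a segment on silence — is the same shape:

```python
import struct

def is_speech(frame: bytes, threshold: int = 500) -> bool:
    """Energy-based VAD: mean absolute amplitude of 16-bit PCM samples.
    (Illustrative stand-in for Silero VAD's neural classifier.)"""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    energy = sum(abs(s) for s in samples) / max(len(samples), 1)
    return energy > threshold

def speech_segments(frames):
    """Group consecutive speech frames into utterances (speech boundaries)."""
    segment, segments = [], []
    for frame in frames:
        if is_speech(frame):
            segment.append(frame)
        elif segment:
            segments.append(b"".join(segment))
            segment = []
    if segment:
        segments.append(b"".join(segment))
    return segments

silence = struct.pack("<4h", 0, 0, 0, 0)
speech = struct.pack("<4h", 4000, -3500, 3800, -4200)
print(len(speech_segments([silence, speech, speech, silence])))  # → 1
```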


🏗️ System Architecture

[ Browser Client ]
     ↓ WebRTC (LiveKit)
[ Next.js Frontend ]
     ↓ HTTP / WS
[ Python Voice Agent ]
     ↓
[ Groq (STT + LLM) ] → [ Cartesia (TTS) ]

Why LiveKit?

LiveKit handles:

  • WebRTC negotiation
  • Low-latency audio routing
  • Session lifecycle management

This avoids re-implementing complex real-time networking logic — a key forward-deployed engineering decision.
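One piece the backend still owns is minting the access token the browser uses to join a room. In practice you would use the official LiveKit server SDK for this; the stdlib-only sketch below just shows the shape of that token — a standard HS256 JWT whose claim names (`iss`, `sub`, `video.roomJoin`) follow LiveKit's documented grant format:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_livekit_token(api_key: str, api_secret: str,
                       identity: str, room: str, ttl: int = 3600) -> str:
    """Mint a LiveKit-style JWT access token (HS256, stdlib only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": api_key,                  # LiveKit API key
        "sub": identity,                 # participant identity
        "exp": int(time.time()) + ttl,
        "video": {"room": room, "roomJoin": True},  # room grant
    }
    signing_input = (f"{b64url(json.dumps(header).encode())}."
                     f"{b64url(json.dumps(payload).encode())}")
    sig = hmac.new(api_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

token = mint_livekit_token("devkey", "secret", "user-1", "freya-demo")
print(token.count("."))  # → 2 (header.payload.signature)
```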


🧰 Tech Stack

Frontend

  • Next.js 15 (App Router)
  • TypeScript
  • Tailwind CSS
  • LiveKit Client SDK

Backend

  • Python 3.11
  • LiveKit Agents
  • Groq APIs (Whisper + LLaMA)
  • Cartesia TTS
  • Silero VAD

Infrastructure

  • Docker & Docker Compose
  • Multi-service orchestration
  • Health checks & env-based config

🧠 Prompt Management & Observability

Freya includes a lightweight prompt management system:

  • Create & version prompts
  • Edit prompts without restarting sessions
  • Track which prompt version was used per conversation

Session metrics include:

  • Duration
  • Latency
  • Active connections

These are essential when deploying AI agents into real customer environments.
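A minimal in-memory tracker for those three metrics might look like this (illustrative sketch, assuming latency is recorded per turn in milliseconds):

```python
import time

class SessionMetrics:
    """Minimal in-memory metrics for voice sessions."""

    def __init__(self):
        self.active = {}       # session_id -> monotonic start time
        self.latencies = []    # per-turn latency samples (ms)

    def start(self, session_id):
        self.active[session_id] = time.monotonic()

    def record_latency(self, ms):
        self.latencies.append(ms)

    def end(self, session_id) -> float:
        """Close a session and return its duration in seconds."""
        return time.monotonic() - self.active.pop(session_id)

    @property
    def active_connections(self) -> int:
        return len(self.active)

    @property
    def p50_latency(self):
        s = sorted(self.latencies)
        return s[len(s) // 2] if s else None

m = SessionMetrics()
m.start("call-1")
m.record_latency(180)
m.record_latency(140)
print(m.active_connections, m.p50_latency)  # → 1 180
```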


🚀 Running Locally

git clone https://github.com/05sanjaykumar/Freya-Voice-YC25-Assessment
cd Freya-Voice-YC25-Assessment
cp .env.example .env
docker compose up --build

Visit: http://localhost:3000

You’ll need:

  • LiveKit credentials
  • Groq API key
  • Cartesia API key
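A hypothetical `.env` layout for those credentials is below — check `.env.example` in the repo for the exact variable names it expects:

```shell
# LiveKit (WebRTC routing)
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...

# Groq (Whisper STT + LLaMA LLM)
GROQ_API_KEY=...

# Cartesia (Sonic TTS)
CARTESIA_API_KEY=...
```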

🧩 Engineering Tradeoffs (YC-Style)

Some deliberate decisions:

  • Voice-first over chat: harder, but closer to real-world systems
  • In-memory storage: faster iteration, easier debugging
  • Service boundaries early: frontend, agent, and infra separated from day one
  • Docker everywhere: reproducibility > local hacks

These choices optimize for clarity, deployability, and iteration speed — not just demos.


🛣️ What I’d Build Next

If this were going to production:

  • PostgreSQL + Redis for state
  • Horizontal agent scaling
  • CI/CD (GitHub Actions)
  • OpenTelemetry for tracing
  • Session replay & recording
  • Rate limiting & abuse prevention

🎤 Final Thoughts

This project was less about “AI magic” and more about engineering judgment:
latency, reliability, real-time systems, and clean abstractions.

If you’re preparing for YC-style interviews, Forward Deployed Engineer roles, or real-time AI systems, I hope this serves as a practical reference.

Happy building 🚀
