Trupt Patel

🧠 Build a Real-Time Voice Assistant in the Browser Using LiveKit + Deepgram + OpenAI (+ Cartesia)

What if your app could talk back, like a real person?

This weekend I built a browser-based AI Voice Assistant that listens, understands, and responds, all in real time. No browser plugins, no command syntax, no install: just a mic and a modern web browser.

It's like talking to ChatGPT, but with your voice and without any UI friction.

๐Ÿ› ๏ธ Tech Stack Overview

  • ๐ŸŽค LiveKit โ€“ WebRTC-based real-time audio streaming
  • โœ๏ธ Deepgram โ€“ ASR (speech-to-text) transcription
  • ๐Ÿง  OpenAI โ€“ LLM for understanding and generating replies
  • ๐Ÿ‘๏ธ Cartesia (optional) โ€“ for visual context-aware logic (DOM/UI understanding)

โš™๏ธ Flow Architecture
Hereโ€™s the high-level pipeline:

  • A[User Speaks] --> B[LiveKit Streams Audio]
  • B --> C[Deepgram Transcribes Speech]
  • C --> D[OpenAI Interprets + Responds]
  • D --> E[Response Returned (and Spoken)]

LiveKit carries the mic audio reliably and with low latency. Deepgram transcribes the voice input in real time. OpenAI then processes the transcript and generates a contextual reply. You can optionally use the Web Speech API for speech synthesis, so the assistant actually “talks back.”
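
The pipeline above can be sketched as one conversational turn, with each stage injected as a function. This is a minimal sketch rather than the actual implementation: `Stages`, `runTurn`, and `stubStages` are hypothetical names, and the real LiveKit, Deepgram, and OpenAI calls would sit behind these signatures.

```typescript
// Hypothetical stage signatures; real vendor SDK calls would live behind these.
type Transcribe = (audio: Uint8Array) => Promise<string>;
type GenerateReply = (transcript: string) => Promise<string>;
type Speak = (text: string) => Promise<void>;

interface Stages {
  transcribe: Transcribe;       // e.g. backed by Deepgram
  generateReply: GenerateReply; // e.g. backed by OpenAI
  speak?: Speak;                // optional speech synthesis
}

interface TurnResult {
  transcript: string;
  reply: string | null; // null when nothing intelligible was said
}

// Runs one turn: audio -> transcript -> reply -> (optional) speech.
async function runTurn(audio: Uint8Array, stages: Stages): Promise<TurnResult> {
  const transcript = (await stages.transcribe(audio)).trim();
  if (transcript.length === 0) {
    // Silence or noise: skip the LLM call entirely.
    return { transcript, reply: null };
  }
  const reply = await stages.generateReply(transcript);
  if (stages.speak) {
    await stages.speak(reply);
  }
  return { transcript, reply };
}

// Stub stages for a dry run without any network access.
const stubStages: Stages = {
  transcribe: async () => "  hello there ",
  generateReply: async (t) => `You said: ${t}`,
};
```

Keeping the stages injectable means any one provider can be swapped (for example, a different TTS voice) without touching the turn logic.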

📦 Dev Setup
You'll need API keys for:

  • LiveKit Cloud or self-hosted server
  • Deepgram API
  • OpenAI Platform
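
A small startup check can make a missing key obvious before the first call fails. This is a minimal sketch, assuming hypothetical environment variable names (`LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`, `DEEPGRAM_API_KEY`, `OPENAI_API_KEY`); use whatever names your own deployment defines.

```typescript
// Hypothetical variable names; adjust to your own deployment.
const REQUIRED_KEYS = [
  "LIVEKIT_API_KEY",
  "LIVEKIT_API_SECRET",
  "DEEPGRAM_API_KEY",
  "OPENAI_API_KEY",
] as const;

type KeyName = (typeof REQUIRED_KEYS)[number];
type ApiKeys = Record<KeyName, string>;

// Returns the validated keys, or throws an error listing every missing one.
function loadApiKeys(env: Record<string, string | undefined>): ApiKeys {
  const missing = REQUIRED_KEYS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing API keys: ${missing.join(", ")}`);
  }
  const keys = {} as ApiKeys;
  REQUIRED_KEYS.forEach((name) => {
    keys[name] = env[name] as string;
  });
  return keys;
}
```

On a server this would typically be called with `process.env`. The keys themselves should stay server-side rather than ship in browser code.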

A basic setup looks like this:

  1. Start mic capture and publish the audio to a LiveKit room
  2. Pipe the LiveKit audio to Deepgram (WebSocket or media pipeline)
  3. On each final transcript, send the text to OpenAI
  4. Get the response and optionally speak it with the SpeechSynthesis API
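
Steps 3 and 4 can be sketched as three small helpers. This is an illustration under assumptions, not the final code: the helper names are hypothetical, `gpt-4o-mini` is only a placeholder model choice, and the Deepgram message fields (`is_final`, `channel.alternatives[0].transcript`) and the OpenAI Chat Completions endpoint should be checked against the current docs.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Step 3a: pull the text out of a Deepgram streaming message, but only when
// the segment is final. Interim results and non-transcript messages yield null.
function extractFinalTranscript(rawMessage: string): string | null {
  let msg: any;
  try {
    msg = JSON.parse(rawMessage);
  } catch {
    return null; // not JSON, ignore
  }
  if (!msg || msg.is_final !== true) return null;
  const text = msg.channel?.alternatives?.[0]?.transcript;
  if (typeof text !== "string") return null;
  const trimmed = text.trim();
  return trimmed === "" ? null : trimmed;
}

// Step 3b: describe the OpenAI request as plain data, so a server-side route
// can pass it to fetch(request.url, request.init).
function buildChatRequest(apiKey: string, history: ChatMessage[], userText: string) {
  const messages: ChatMessage[] = [
    { role: "system", content: "You are a concise voice assistant." },
    ...history,
    { role: "user", content: userText },
  ];
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // Placeholder model name; pick whichever model fits your latency budget.
      body: JSON.stringify({ model: "gpt-4o-mini", messages }),
    },
  };
}

// Step 4: speak the reply with the browser's SpeechSynthesis API when present.
// Returns false outside a browser (for example under Node).
function speak(text: string): boolean {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return false;
  g.speechSynthesis.speak(new g.SpeechSynthesisUtterance(text));
  return true;
}
```

Waiting for `is_final` before calling the LLM avoids sending a new request for every interim result, at the cost of a slightly later first response.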

I'll be open-sourcing a simple implementation soon. Let me know if you're interested and I'll drop the repo here.

๐ŸŒ Real-World Use Cases

  • AI customer support agents
  • Accessibility tools for hands-free apps
  • Internal smart copilots
  • Voice-controlled AI tutors or dashboards
  • Lightweight browser-based companions

This is a super flexible base, especially if you swap in Cartesia to voice the assistant's replies.

🧵 Docs & Links
LiveKit: https://docs.livekit.io

Deepgram: https://developers.deepgram.com/docs/quickstart

OpenAI: https://platform.openai.com/docs

Cartesia: https://cartesia.ai

#ai #webdev #openai #livekit #voiceassistant #deepgram #realtimedev #webrtc #llm #hackproject
