I built a real-time Voice AI assistant that listens, thinks, and talks back โ using entirely open-source tools and APIs.
No ChatGPT wrappers.
No expensive SDKs.
Just raw engineering.
๐ Live Demo
๐ Try it here:
https://huggingface.co/spaces/Kailashalgo/voice-ai-chat
Press and hold the mic button โ speak โ AI replies out loud.
๐ง What This Project Does
The app creates a full voice conversation pipeline:
You speak into the browser
Whisper converts speech โ text
LLaMA 3.3 70B generates a response
gTTS converts text โ speech
Audio plays back instantly
It feels surprisingly natural and fast.
๐ ๏ธ Tech Stack
Layer Tool
๐ค Speech to Text Whisper Large V3 Turbo (Groq API)
๐ง LLM LLaMA 3.3 70B
๐ Text to Speech gTTS
โก Backend FastAPI + Python
๐ Frontend Vanilla HTML/CSS/JS
๐ณ Deployment Docker
โ๏ธ Hosting HuggingFace Spaces
โก Why I Built This
Most AI voice demos online are:
expensive,
closed-source,
or heavily abstracted.
I wanted to understand how real-time voice AI systems actually work under the hood.
This project helped me explore:
streaming workflows,
latency optimization,
speech pipelines,
browser audio APIs,
and LLM orchestration.
๐งฉ System Architecture
The complete flow:
User Voice
โ Whisper STT
โ LLaMA Processing
โ gTTS Voice Generation
โ Browser Playback
Simple architecture โ but extremely powerful.
๐ Project Structure
voice-ai-chat/
โโโ backend/
โ โโโ main.py
โ โโโ stt.py
โ โโโ tts.py
โ โโโ requirements.txt
โโโ frontend/
โ โโโ index.html
โโโ Dockerfile
โโโ .env.example
โโโ README.md
โ๏ธ Running Locally
Clone the repository
git clone https://github.com/kailashv2/voice-ai-chat.git
cd voice-ai-chat
Create virtual environment
python -m venv venv
Install dependencies
pip install -r requirements.txt
Add Groq API key
GROQ_API_KEY=your_key_here
Start FastAPI server
uvicorn main:app --reload
๐ณ Docker Support
docker build -t voice-ai-chat .
docker run -p 7860:7860 -e GROQ_API_KEY=your_key voice-ai-chat
๐ธ Cost
Completely free to build and deploy.
Groq free tier
Whisper via Groq
gTTS
HuggingFace Spaces free hosting
๐ฅ What I Learned
The hardest part wasn't the AI.
It was reducing latency and making conversations feel natural.
Voice interfaces are fundamentally different from text chat:
response speed matters more,
interruptions matter,
audio processing matters,
UX matters a lot.
This project gave me a much deeper understanding of production-grade AI interaction systems.
๐ Live Project
Demo:
https://huggingface.co/spaces/Kailashalgo/voice-ai-chat
GitHub:
https://github.com/kailashv2/voice-ai-chat
๐จโ๐ป Built By
Kailash
Building AI systems, full-stack products, and agentic workflows.
If you found this useful, consider starring the repo โญ
Top comments (1)
github.com/kailashv2