🎤 Building a Real-Time Voice AI Assistant Using Open Source Tools

#ai #python #opensource #showdev

I built a real-time Voice AI assistant that listens, thinks, and talks back — using entirely open-source tools and APIs.

No ChatGPT wrappers.
No expensive SDKs.
Just raw engineering.

🚀 Live Demo

🌐 Try it here:
https://huggingface.co/spaces/Kailashalgo/voice-ai-chat

Press and hold the mic button → speak → AI replies out loud.

🧠 What This Project Does

The app creates a full voice conversation pipeline:

You speak into the browser
Whisper converts speech → text
LLaMA 3.3 70B generates a response
gTTS converts text → speech
Audio plays back instantly

It feels surprisingly natural and fast.

🛠️ Tech Stack
Layer Tool
🎤 Speech to Text Whisper Large V3 Turbo (Groq API)
🧠 LLM LLaMA 3.3 70B
🔊 Text to Speech gTTS
⚡ Backend FastAPI + Python
🌐 Frontend Vanilla HTML/CSS/JS
🐳 Deployment Docker
☁️ Hosting HuggingFace Spaces
⚡ Why I Built This

Most AI voice demos online are:

expensive,
closed-source,
or heavily abstracted.

I wanted to understand how real-time voice AI systems actually work under the hood.

This project helped me explore:

streaming workflows,
latency optimization,
speech pipelines,
browser audio APIs,
and LLM orchestration.
🧩 System Architecture

The complete flow:

User Voice
→ Whisper STT
→ LLaMA Processing
→ gTTS Voice Generation
→ Browser Playback

Simple architecture — but extremely powerful.

📂 Project Structure
voice-ai-chat/
├── backend/
│ ├── main.py
│ ├── stt.py
│ ├── tts.py
│ └── requirements.txt
├── frontend/
│ └── index.html
├── Dockerfile
├── .env.example
└── README.md
⚙️ Running Locally
Clone the repository
git clone https://github.com/kailashv2/voice-ai-chat.git
cd voice-ai-chat
Create virtual environment
python -m venv venv
Install dependencies
pip install -r requirements.txt
Add Groq API key
GROQ_API_KEY=your_key_here
Start FastAPI server
uvicorn main:app --reload
🐳 Docker Support
docker build -t voice-ai-chat .
docker run -p 7860:7860 -e GROQ_API_KEY=your_key voice-ai-chat
💸 Cost