This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
I built a Domain Expert Voice Agent that acts as a real-time STEM tutor. It listens to live microphone input, transcribes it as the user speaks, and answers through a responsive LLM-based tutor, helping users work through complex concepts in math, physics, and computer science.
The project addresses the Domain Expert prompt category by:
- Demonstrating deep domain expertise via a Hugging Face LLM (zephyr-7b-beta)
- Supporting multi-turn conversations with context preservation (see the sketch after this list)
- Integrating the Universal-Streaming API for low-latency, real-time speech-to-text conversion
- Providing a meaningful, natural tutoring experience through dialogue refinement and clarification handling
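The full implementation lives in the repository below; as a minimal sketch of how multi-turn context can be preserved across the Router API's OpenAI-style chat endpoint (the `history` list, `ask_tutor` helper, and `HF_TOKEN` variable are illustrative names, not necessarily the project's own):

```python
import os
import requests

HF_API = "https://router.huggingface.co/v1/chat/completions"  # assumed Router endpoint
HF_TOKEN = os.environ["HF_TOKEN"]  # Hugging Face access token

# The running conversation: every turn is appended, so each LLM call
# sees the full dialogue and can resolve follow-ups like "why that step?"
history = [{"role": "system", "content": "You are a helpful STEM tutor."}]

def ask_tutor(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    resp = requests.post(
        HF_API,
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        json={"model": "HuggingFaceH4/zephyr-7b-beta:featherless-ai",
              "messages": history},
    )
    answer = resp.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer
```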
GitHub Repository
https://github.com/aravind048/AI-ML/tree/Assembly_AI
Technical Implementation & AssemblyAI Integration
This project uses the AssemblyAI Universal-Streaming API to deliver sub-300ms latency transcription for real-time voice interaction.
🧩 Core Tech Stack
- Python 3.10
- AssemblyAI Universal-Streaming (WebSocket)
- HuggingFaceH4/zephyr-7b-beta:featherless-ai via HuggingFace Router API
- pyttsx3 for text-to-speech feedback
- asyncio, websockets, sounddevice, numpy (mic capture sketched after this list)
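Before the pipeline below can stream anything, the microphone has to be captured and chunked. A minimal sketch of how sounddevice can feed an asyncio queue (the frame size and the `mic_frames` name are assumptions; Universal-Streaming is typically fed 16 kHz, 16-bit PCM mono):

```python
import asyncio
import sounddevice as sd

SAMPLE_RATE = 16_000   # 16 kHz, 16-bit PCM mono, typical for streaming STT
FRAME_MS = 50          # assumed chunk size: ~50 ms of audio per frame

async def mic_frames(queue: asyncio.Queue) -> None:
    """Capture raw PCM16 frames from the default mic into an asyncio queue."""
    loop = asyncio.get_running_loop()

    def callback(indata, frames, time_info, status):
        # Runs in PortAudio's thread; hand the bytes to the event loop safely
        loop.call_soon_threadsafe(queue.put_nowait, bytes(indata))

    with sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16",
                           blocksize=SAMPLE_RATE * FRAME_MS // 1000,
                           callback=callback):
        await asyncio.Event().wait()  # keep the stream open until cancelled
```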
🧠 Voice-to-LLM Pipeline
```python
import asyncio
import requests
import websockets

# Connect to AssemblyAI's Universal-Streaming WebSocket
async with websockets.connect(ASSEMBLY_WS_URL, extra_headers=headers) as ws:
    # Stream mic audio and read transcripts concurrently, not one after the other
    await asyncio.gather(
        send_audio(ws),           # stream microphone audio
        receive_transcripts(ws),  # get live transcription
    )

# Forward the finished user query to the LLM for STEM tutoring
response = requests.post(HF_API, headers=hf_headers, json={  # HF auth, separate from the AssemblyAI headers
    "model": "HuggingFaceH4/zephyr-7b-beta:featherless-ai",
    "messages": [
        {"role": "system", "content": "You are a helpful STEM tutor..."},
        {"role": "user", "content": transcript},
    ],
})
```
✅ AssemblyAI Highlights
- Ultra-low latency streaming
- Intelligent endpointing for clear turn detection, which also triggers the spoken reply (see the sketch after this list)
- Works seamlessly with domain-specific terminology (e.g., "parabola", "Big O notation", "Planck constant")
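Once endpointing marks a turn as finished and the LLM answers, the reply is spoken back with pyttsx3. A minimal sketch; running the blocking call off the event loop via `asyncio.to_thread` is my suggestion, not necessarily the project's exact approach:

```python
import asyncio
import pyttsx3

engine = pyttsx3.init()

def speak(text: str) -> None:
    """Blocking TTS: pyttsx3 plays through the default output device."""
    engine.say(text)
    engine.runAndWait()

# Inside the async pipeline, keep the event loop responsive:
# await asyncio.to_thread(speak, answer)
```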