DEV Community

aravind048

STEM Tutor Voice Agent

AssemblyAI Voice Agents Challenge: Domain Expert

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

I built a Domain Expert Voice Agent that acts as a real-time STEM tutor. It helps users understand complex concepts in math, physics, and computer science by taking live microphone input and answering through an LLM-based tutor.

The project addresses the Domain Expert prompt category by:

  • Demonstrating deep domain expertise via a Hugging Face LLM (zephyr-7b-beta)
  • Supporting multi-turn conversations
  • Integrating the AssemblyAI Universal-Streaming API for low-latency, real-time speech-to-text conversion
  • Providing meaningful, natural tutoring experiences through dialogue refinement, clarification handling, and context preservation
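
Multi-turn support and context preservation can be sketched as a rolling message history that is resent with every LLM request. This is a minimal illustration, not the repository's code: the `ConversationHistory` class and its `max_turns` parameter are hypothetical names introduced here.

```python
from collections import deque


class ConversationHistory:
    """Keeps the system prompt plus the most recent user/assistant turns.

    Hypothetical helper for illustration; not taken from the project.
    """

    def __init__(self, system_prompt: str, max_turns: int = 6):
        self.system_prompt = system_prompt
        # Each entry is a (user, assistant) pair; the oldest pair is
        # dropped automatically once max_turns is exceeded.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, user_text: str, assistant_text: str) -> None:
        self._turns.append((user_text, assistant_text))

    def build_messages(self, new_user_text: str) -> list:
        """Return the chat-style message list for the next LLM request."""
        messages = [{"role": "system", "content": self.system_prompt}]
        for user_text, assistant_text in self._turns:
            messages.append({"role": "user", "content": user_text})
            messages.append({"role": "assistant", "content": assistant_text})
        messages.append({"role": "user", "content": new_user_text})
        return messages
```

Capping the history keeps the prompt within the model's context window while still letting the tutor refer back to recent questions.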

GitHub Repository

https://github.com/aravind048/AI-ML/tree/Assembly_AI

Technical Implementation & AssemblyAI Integration

This project uses the AssemblyAI Universal-Streaming API to transcribe speech with sub-300 ms latency for real-time voice interaction.

🧩 Core Tech Stack

  • Python 3.10
  • AssemblyAI Universal-Streaming (WebSocket)
  • HuggingFaceH4/zephyr-7b-beta:featherless-ai via the Hugging Face Router API
  • pyttsx3 for text-to-speech feedback
  • asyncio, websockets, sounddevice, numpy
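
Microphone libraries commonly hand back floating-point samples, while streaming speech-to-text endpoints typically expect raw 16-bit PCM. The sketch below shows that conversion and the chunk-size arithmetic using only the standard library so it runs without the audio stack. The 16 kHz sample rate, the 50 ms chunk length, and both function names are assumptions for illustration, not values taken from the project.

```python
import struct

SAMPLE_RATE = 16_000  # assumed sample rate in Hz
CHUNK_MS = 50         # assumed chunk duration in milliseconds


def samples_per_chunk(sample_rate: int = SAMPLE_RATE, chunk_ms: int = CHUNK_MS) -> int:
    """Number of mono samples in one chunk of the given duration."""
    return sample_rate * chunk_ms // 1000


def float_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    # Clip first so out-of-range input cannot overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)
```

With these assumed values, each chunk holds 800 samples, or 1600 bytes on the wire. With numpy in the stack, the same conversion is usually written as a clip, a multiply, and a cast to a 16-bit integer dtype.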

🧠 Voice-to-LLM Pipeline

# Connect to AssemblyAI's Universal-Streaming WebSocket
# (websockets >= 14 names this keyword additional_headers)
async with websockets.connect(ASSEMBLY_WS_URL, extra_headers=headers) as ws:
    # Run both concurrently; awaiting send_audio first would delay reading
    await asyncio.gather(
        send_audio(ws),           # Stream microphone audio
        receive_transcripts(ws),  # Get live transcription
    )

# Forward user query to LLM for STEM tutoring
response = requests.post(HF_API, headers=headers, timeout=30, json={
    "model": "HuggingFaceH4/zephyr-7b-beta:featherless-ai",
    "messages": [
        {"role": "system", "content": "You are a helpful STEM tutor..."},
        {"role": "user", "content": transcript}
    ]
})
response.raise_for_status()  # Fail loudly on HTTP errors
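
Once the LLM request returns, the tutor's answer still has to be pulled out of the JSON body before it can be spoken. The sketch below assumes the endpoint returns an OpenAI-style chat-completions payload (`choices[0].message.content`); that response shape and the `extract_reply` helper name are assumptions, so adjust them to whatever the endpoint actually returns.

```python
def extract_reply(payload: dict) -> str:
    """Pull the assistant text out of a chat-completions style response.

    Assumes the shape {"choices": [{"message": {"content": "..."}}]}.
    """
    choices = payload.get("choices") or []
    if not choices:
        # Surface the provider's error message if one is present.
        raise ValueError(f"No choices in response: {payload.get('error', payload)}")
    content = choices[0].get("message", {}).get("content")
    if not content:
        raise ValueError("Response contained an empty message")
    return content.strip()
```

A call such as `reply = extract_reply(response.json())` would then give the text to hand to pyttsx3 for speech output.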

✅ AssemblyAI Highlights

  • Ultra-low latency streaming
  • Intelligent endpointing for clear turn detection
  • Works seamlessly with domain-specific terminology (e.g., "parabola", "Big O notation", "Planck constant")
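
Endpointing matters because the tutor should only query the LLM once the speaker has finished a thought. A rough sketch of that gate is shown below. It assumes each WebSocket message is a JSON object with a `type` field, and that turn messages carry `end_of_turn` and `transcript` fields; those field names and the `final_transcript` helper are assumptions here and should be checked against the streaming API reference.

```python
import json
from typing import Optional


def final_transcript(raw_message: str) -> Optional[str]:
    """Return the transcript once a turn has ended, otherwise None."""
    message = json.loads(raw_message)
    # Assumed shape: {"type": "Turn", "end_of_turn": bool, "transcript": str}
    if message.get("type") != "Turn":
        return None  # Session or control message; nothing to forward.
    if not message.get("end_of_turn"):
        return None  # Still a partial transcript; keep listening.
    text = message.get("transcript", "").strip()
    return text or None
```

Only a non-`None` result would be forwarded to the LLM, so partial transcripts never trigger a premature answer.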
