🩺 HealthMate – A Voice Agent That Thinks and Reasons Before Answering for Medical Awareness and Decision Support
Submission for AssemblyAI Voice Agents Challenge – July 2025
🚀 What is HealthMate?
HealthMate is more than a voice assistant—it’s your trusted health confidant that thinks and reasons like a clinician to deliver safe, reliable medical knowledge. Designed to empower everyone, especially underserved communities, it provides instant, ethical health guidance through voice interaction, bridging gaps in health literacy and accessibility.
- 🎙️ Voice-First Queries: Ask health questions naturally, no typing required.
- 🧠 Reasoned Responses: Simulates clinical reasoning for clear, evidence-based answers.
- 🌍 Global Impact: Targets rural, low-literacy, and non-English-speaking users.
- 🚨 Ethical Core: Never diagnoses, always escalates emergencies, and refers to professionals.
💡 Imagine a medical mentor in your pocket—available 24/7, powered by AI, and grounded in trust.
🎯 Why HealthMate?
The world is grappling with a health information crisis:
- An estimated 3.6 billion people lack access to essential health services.
- Much of the health information found online is misleading or inaccurate.
- Rural areas remain heavily underserved, with language and literacy barriers widening the gap.
- Patients often delay care or self-medicate, putting lives at risk.
HealthMate’s Mission: To democratize health literacy with voice-first, ethical, and accessible medical guidance, powered by AI that thinks and reasons before responding, ensuring safety and clarity for all.
Demos & Links
- Live Link: https://healthmate-ai-voice-agent-frontend.vercel.app/
- Demo Video: DEMO_VIDEO_LINK
- Frontend Repository: Frontend Git REPO
- Backend Repository: Backend Git REPO
⚙️ How HealthMate Works
🧭 System Workflow
HealthMate’s brilliance lies in its ability to think and reason like a clinician, ensuring every response is safe, accurate, and helpful. Here’s the flow:
```mermaid
graph TD
    A[User Speaks Query] --> B[LiveKit: Streams Audio]
    B --> C[AssemblyAI: Speech-to-Text]
    C --> D[RAG + LLM Reasoning Engine]
    D --> E[ChromaDB: Vector Database]
    E --> F[Clinical Reasoning Layer]
    F --> G[Safe, Ethical Voice Response]
```
Step-by-Step Breakdown
Step | Component | What It Does |
---|---|---|
1 | LiveKit | Captures, streams real-time voice from browser |
2 | AssemblyAI | Converts speech to accurate, medical-aware text |
3 | RAG + LLM | Interprets user query and retrieves clinical context |
4 | ChromaDB | Performs vector search in curated medical knowledge base |
5 | Reasoning | Simulates safe, step-wise clinical thinking |
6 | Voice Output | Returns AI response with red flag checks and explanations |
Tech Stack
Tech | Role |
---|---|
LiveKit | Real-time voice streaming (WebRTC, ~300 ms latency) |
AssemblyAI | Universal-Streaming ASR tuned for accurate medical speech input |
Gemini / GPT | Interprets clinical language and logic; grounds answers via RAG |
ChromaDB | Fast vector search over a curated medical knowledge base |
FastAPI | Python backend handling core logic, API routing, and security |
React + Tailwind | Clean, responsive, user-friendly frontend interface |
Railway | Cloud deployment and auto-scaling for backend services |
.env / Vercel | Secures environment variables and config for safe deployment |
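The retrieval step ChromaDB performs (rank curated documents by vector similarity to the query) can be illustrated in pure Python. The toy bag-of-words `embed` below stands in for a real embedding model; `retrieve` is a hypothetical helper, not ChromaDB's API.

```python
# Pure-Python sketch of the vector-search idea behind the ChromaDB step:
# embed documents, then rank them by cosine similarity to the query vector.
# A real system uses learned embeddings; a toy bag-of-words stands in here.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (illustrative only)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

ChromaDB does the same thing at scale, with real embeddings and an approximate-nearest-neighbor index instead of a full sort.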
Core Logic: LiveKit + AssemblyAI + Gemini LLM
Purpose:
Enable real-time voice streaming, detect end of speech (VAD), convert voice to text with AssemblyAI, and route the transcript to Gemini through the FastAPI backend.
Key Components
```python
import requests
from livekit.agents import Agent, AgentSession
from livekit.plugins import assemblyai

# Set up AssemblyAI streaming STT with end-of-turn (VAD) tuning
stt = assemblyai.STT(
    api_key=ASSEMBLYAI_API_KEY,
    end_of_turn_confidence_threshold=0.7,        # confidence needed to close a turn
    min_end_of_turn_silence_when_confident=160,  # ms of silence when confident
    max_turn_silence=2400,                       # ms hard cap on in-turn silence
)

# LLM function that forwards transcripts to FastAPI's /api/query
def webhook_llm_function(prompt: str) -> str:
    response = requests.post(
        "http://localhost:8000/api/query",
        json={"query": prompt},
        timeout=30,
    )
    return response.json().get("answer", "No answer received.")

# LiveKit agent setup (FunctionLLM is a project wrapper that adapts a
# plain Python function to LiveKit's LLM interface)
llm = FunctionLLM(func=webhook_llm_function)
agent = Agent(
    name="HealthMate Voice Agent",
    session_factory=lambda: AgentSession(
        stt=stt,   # real-time transcription
        llm=llm,   # calls the backend for the Gemini response
        tts=None,  # no text-to-speech (yet)
    ),
)
```
Challenges Faced & How We Solved Them
Challenge | Description | Solution |
---|---|---|
Voice Cutoff Timing | User speech was getting cut too early or too late. | Tuned end_of_turn_confidence_threshold and silence timings in AssemblyAI. |
Audio Sync | Voice stream sometimes lagged between LiveKit and AssemblyAI. | Optimized buffer settings and ensured proper threading in voice stream. |
Slow LLM Response | Gemini API responses created noticeable lags in conversation. | Implemented loading states on frontend and added response caching. |
CORS Errors | Frontend couldn’t connect to FastAPI backend due to CORS policy blocks. | Used fastapi.middleware.cors with permissive settings during dev. |
API Key Leaks | Accidentally committed .env with secrets. | Added .env to .gitignore and rotated all leaked API keys immediately. |
Screenshots & Demo Flow
Screen | Preview |
---|---|
Home / Intro | ![]() |
How it Works | ![]() |
Ethical & Safe Guards | ![]() |
Impact Potential | ![]() |
Voice Activation | ![]() |
AssemblyAI Transcript & LiveKit | ![]() |
Clinical Reasoning & Output | ![]() |
👥 Team
👨‍💻 Manoj Kumar Pendem
Solo builder, driven to bridge health gaps through voice-first AI solutions.
Built from scratch with 💪 and ☕ during the AssemblyAI Voice Agents Challenge.
🤝 Let’s Connect!
- 🛠️ GitHub – Explore the code, file feedback, or contribute ideas
- 🔗 LinkedIn – Let’s connect professionally
- 🌍 Collaborations: Open to NGOs, health orgs, and language localization partners
🚀 Conclusion
HealthMate isn’t just another AI project—it’s a step toward making trusted health guidance accessible to every voice, everywhere.
Built with purpose, designed for impact. Let’s reimagine healthcare, one conversation at a time.