This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
GlobalCare AI is a specialized Domain Expert Voice Agent for medical consultations. Users speak their medical concerns, and the system provides AI-powered medical analysis with voice responses, supporting multiple languages and country-specific emergency protocols.
Domain Expert Voice Agent Category
This submission addresses the Domain Expert Voice Agent prompt with:
- Medical Domain Expertise: Recognition of medical terminology and symptoms
- Emergency Triage: Automated urgency assessment (Critical/High/Medium/Low)
- Cultural Adaptation: Country-specific emergency numbers and medical protocols
- Voice-to-Voice Workflow: Complete audio input to audio response system
- Multi-language Support: 7 languages across 6 countries
Core Problem Addressed
Many people face language barriers and accessibility issues when seeking medical guidance. GlobalCare AI provides voice-first medical consultation that works across languages and cultures.
Demo
GitHub Repository
🏥 International AI Medical Voice Assistant
Advanced voice-to-voice medical consultation with emergency triage across 6 countries in 7 languages
A complete medical AI system that processes voice input, analyzes symptoms using advanced AI, and provides spoken medical guidance with emergency triage capabilities. Built for global healthcare accessibility with support for multiple languages and country-specific emergency protocols.
🖼️ System Overview
Complete medical voice consultation interface with country selection, voice recording, and real-time medical analysis
Emergency alert system with country-specific protocols and multilingual support
System performance dashboard showing real-time metrics and component status
Advanced medical analysis results with urgency assessment and voice response generation
✨ Features
- 🚨 Emergency Triage - Sub-5 second critical symptom detection
- 🎙️ Voice-to-Voice - Complete audio workflow with AI voice responses
- 🌍 International - 6 countries with localized emergency protocols
- 🗣️ Multilingual - 7 languages with cultural medical adaptation
- 💊 Drug Safety - Real-time medication interaction checking
- ⚡…
Technical Implementation & AssemblyAI Integration
1. Medical Vocabulary Boosting for Domain Expertise
Medical terminology requires exceptional accuracy: a misrecognized medication name could have serious consequences. We implemented vocabulary boosting using AssemblyAI's word_boost feature:
import assemblyai as aai


class MedicalSpeechProcessor:
    def __init__(self, knowledge_base):
        self.knowledge_base = knowledge_base
        self.medical_terms = self._load_medical_terminology()

    def _load_medical_terminology(self):
        """Load comprehensive medical vocabulary for AssemblyAI boosting"""
        return {
            "symptoms": [
                "chest pain", "shortness of breath", "nausea", "vomiting",
                "dizziness", "headache", "fever", "cough", "fatigue",
                "abdominal pain", "palpitations", "syncope"
            ],
            "medications": [
                "acetaminophen", "ibuprofen", "aspirin", "amoxicillin",
                "metformin", "lisinopril", "atorvastatin", "amlodipine",
                "metoprolol", "sertraline", "fluoxetine", "alprazolam"
            ],
            "body_parts": [
                "heart", "lungs", "stomach", "liver", "kidney", "brain",
                "spine", "shoulder", "knee", "ankle", "throat", "chest"
            ],
            "medical_conditions": [
                "diabetes", "hypertension", "asthma", "heart disease",
                "pneumonia", "bronchitis", "arthritis", "depression"
            ]
        }

    def configure_transcriber_settings(self, language_code="en", medical_focus=True):
        """Configure AssemblyAI with medical-specific vocabulary boosting"""
        # Flatten all medical terms for vocabulary boosting
        all_medical_terms = []
        if medical_focus:
            for category in self.medical_terms.values():
                all_medical_terms.extend(category)
        # AssemblyAI supports up to 1000 boosted terms
        config = aai.TranscriptionConfig(
            language_code=language_code,
            punctuate=True,
            format_text=True,
            speech_model=aai.SpeechModel.best,  # Use highest quality model
            word_boost=all_medical_terms[:1000],  # Boost medical vocabulary
            boost_param="high"  # Maximum boosting for critical terms
        )
        return config
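The flattening step above can be isolated and checked without calling the API. The following is a minimal sketch, not the project's code: `build_boost_list` and `MAX_BOOST_TERMS` are hypothetical names, and the 1000-term cap is taken from the comment in the snippet above. Unlike the plain `extend`, it also drops duplicate terms so the cap is not spent on repeats.

```python
from typing import Dict, List

MAX_BOOST_TERMS = 1000  # cap taken from the comment in the post's code


def build_boost_list(terms_by_category: Dict[str, List[str]],
                     limit: int = MAX_BOOST_TERMS) -> List[str]:
    """Flatten term categories into one de-duplicated list, keeping order."""
    seen = set()
    flattened = []
    for terms in terms_by_category.values():
        for term in terms:
            cleaned = term.strip()
            key = cleaned.lower()  # compare case-insensitively
            if key and key not in seen:
                seen.add(key)
                flattened.append(cleaned)
    return flattened[:limit]


sample = {
    "symptoms": ["chest pain", "fever"],
    "body_parts": ["chest", "Fever "],  # "Fever " duplicates "fever"
}
print(build_boost_list(sample))
```

The resulting list could then be passed as the `word_boost` argument in place of `all_medical_terms[:1000]`.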
2. Multilingual Medical Processing
GlobalCare AI supports 7 languages across 6 countries. We mapped user-friendly language names to AssemblyAI's language codes and configured each for medical accuracy:
def process_medical_audio(audio_data, country, language):
    """Process medical audio with language-specific configuration"""
    # Map user language selection to AssemblyAI language codes
    language_mapping = {
        "English": "en_us", "Spanish": "es", "Hindi": "hi",
        "Telugu": "te", "Japanese": "ja", "Arabic": "ar", "Mandarin": "zh"
    }
    language_code = language_mapping.get(language, "en")
    # Configure AssemblyAI for the specific language with medical focus
    # (speech_processor is a MedicalSpeechProcessor instance)
    config = speech_processor.configure_transcriber_settings(
        language_code, medical_focus=True
    )
    transcriber = aai.Transcriber(config=config)
    # Write the recorded audio to a WAV file (see section 5) before transcribing
    audio_file_path = convert_gradio_audio_for_assemblyai(audio_data)
    # Process audio with medical vocabulary boosting
    transcript = transcriber.transcribe(audio_file_path)
    return transcript
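A silent fallback in the language lookup can hide a mis-selected language from the user. The sketch below shows one way to surface it. `resolve_language_code` and `DEFAULT_LANGUAGE_CODE` are hypothetical names, and the mapping is copied from the snippet above.

```python
LANGUAGE_CODES = {
    "English": "en_us", "Spanish": "es", "Hindi": "hi",
    "Telugu": "te", "Japanese": "ja", "Arabic": "ar", "Mandarin": "zh",
}
DEFAULT_LANGUAGE_CODE = "en_us"  # assumption: fall back to US English


def resolve_language_code(language_name):
    """Return (language_code, used_fallback) for a UI language name."""
    # Tolerate stray whitespace and case differences from the UI layer
    normalized = (language_name or "").strip().title()
    code = LANGUAGE_CODES.get(normalized)
    if code is None:
        return DEFAULT_LANGUAGE_CODE, True
    return code, False


print(resolve_language_code("  hindi "))
print(resolve_language_code("Klingon"))
```

The second element of the tuple lets the caller warn the user that the consultation will be processed in the default language.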
3. Real-Time Audio Processing Pipeline
Our system processes live audio recordings from Gradio's microphone input. We implemented a robust pipeline that converts numpy audio arrays to AssemblyAI-compatible formats:
def process_audio_file(self, audio_file_path, language="en", country="USA"):
    """Complete audio processing pipeline with AssemblyAI"""
    try:
        # Step 1: Configure AssemblyAI for medical accuracy
        config = self.configure_transcriber_settings(language, medical_focus=True)
        transcriber = aai.Transcriber(config=config)
        # Step 2: Transcribe the recorded audio file (blocks until the transcript is ready)
        print(f"Processing audio with AssemblyAI: {audio_file_path}")
        transcript = transcriber.transcribe(audio_file_path)
        # Step 3: Handle AssemblyAI response
        if transcript.status == aai.TranscriptStatus.error:
            return {
                "success": False,
                "error": transcript.error,
                "text": "",
                "confidence": 0.0
            }
        # Step 4: Extract medical information from transcript
        result = self._process_medical_transcript(transcript, language, country)
        return result
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "text": "",
            "confidence": 0.0
        }
def _process_medical_transcript(self, transcript, language, country):
    """Process AssemblyAI transcript for medical analysis"""
    # transcript.text can be None when no speech was recognized
    text = (transcript.text or "").lower()
    # Extract medical entities from transcribed text
    medical_entities = self._extract_medical_entities(text)
    # Detect emergency keywords in transcript
    emergency_detected = self._detect_emergency_keywords(text)
    # Assess urgency based on transcript content
    urgency_level = self._assess_urgency(text, emergency_detected)
    # Get country-specific medical information
    country_info = self.knowledge_base.countries_data.get(country, {})
    return {
        "success": True,
        "text": transcript.text,
        "processed_text": text,
        "confidence": getattr(transcript, 'confidence', 0.85),
        "language": language,
        "country": country,
        "emergency_detected": emergency_detected,
        "urgency_level": urgency_level,
        "medical_entities": medical_entities,
        "country_info": country_info,
        "requires_immediate_attention": urgency_level in ["Critical", "High"]
    }
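The post calls `_extract_medical_entities` but does not show its body. As a rough illustration only, one simple approach is to reuse the vocabulary categories from section 1 and report which terms occur in the transcript. The function below is a hypothetical stand-in, not the project's implementation.

```python
def extract_medical_entities(text, terms_by_category):
    """Group vocabulary terms found in the text by their category."""
    lowered = text.lower()
    found = {}
    for category, terms in terms_by_category.items():
        # Plain substring matching, mirroring the keyword checks in the post
        matches = [term for term in terms if term in lowered]
        if matches:
            found[category] = matches
    return found


vocabulary = {
    "symptoms": ["chest pain", "fever"],
    "medications": ["aspirin", "ibuprofen"],
    "body_parts": ["chest", "knee"],
}
print(extract_medical_entities("I have chest pain and took aspirin", vocabulary))
```

Categories with no matches are omitted, so an empty dictionary signals that no known medical term was recognized.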
4. Emergency Detection with AssemblyAI Transcripts
We implemented intelligent emergency detection that analyzes AssemblyAI's transcription output for critical medical keywords:
def _detect_emergency_keywords(self, text):
    """Detect emergency indicators in AssemblyAI transcript"""
    emergency_keywords = [
        "chest pain", "can't breathe", "difficulty breathing",
        "heart attack", "stroke", "unconscious", "severe bleeding",
        "overdose", "poisoning", "choking", "seizure"
    ]
    detected_keywords = []
    for keyword in emergency_keywords:
        if keyword in text:
            detected_keywords.append(keyword)
    return detected_keywords

def _assess_urgency(self, text, emergency_keywords):
    """Assess medical urgency from AssemblyAI transcript"""
    critical_indicators = [
        "can't breathe", "chest pain", "heart attack", "stroke",
        "unconscious", "severe bleeding", "overdose", "choking"
    ]
    high_indicators = [
        "severe pain", "difficulty breathing", "bleeding",
        "high fever", "rapid heartbeat", "confusion"
    ]
    # Check for critical emergencies first
    for indicator in critical_indicators:
        if indicator in text:
            return "Critical"
    # Check for high urgency symptoms
    for indicator in high_indicators:
        if indicator in text:
            return "High"
    # Medium urgency if any emergency keywords detected
    if emergency_keywords:
        return "Medium"
    return "Low"
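A plain `keyword in text` check also matches inside longer words (for example "stroke" inside "keystroke") and misses phrases written with a typographic apostrophe. The sketch below shows one possible refinement using word-boundary matching. The function names are hypothetical and this is not the project's code.

```python
import re


def normalize_transcript(text):
    """Lowercase the text and replace typographic apostrophes with ASCII ones."""
    return text.lower().replace("\u2019", "'").replace("\u2018", "'")


def contains_phrase(text, phrase):
    """True if the phrase occurs in the text and is not part of a longer word."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text) is not None


def detect_keywords(text, keywords):
    """Return the keywords that appear as whole phrases in the text."""
    normalized = normalize_transcript(text)
    return [keyword for keyword in keywords if contains_phrase(normalized, keyword)]


keywords = ["can't breathe", "stroke"]
print(detect_keywords("I can\u2019t breathe", keywords))
print(detect_keywords("a keystroke logger", keywords))
```

Neither version handles negation ("no chest pain"), so keyword matching of this kind is best treated as a first-pass filter.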
5. Audio Format Handling for Gradio Integration
We implemented robust audio processing to handle Gradio's numpy audio format and convert it for AssemblyAI processing:
import tempfile
import wave

import numpy as np


def convert_gradio_audio_for_assemblyai(audio_data):
    """Convert Gradio numpy audio to AssemblyAI-compatible format"""
    if not (isinstance(audio_data, tuple) and len(audio_data) == 2):
        return None
    sample_rate, audio_array = audio_data
    # Convert to 16-bit PCM once for both writers
    # (assumes non-int16 input is float audio in [-1.0, 1.0])
    if audio_array.dtype != np.int16:
        audio_array = np.clip(audio_array, -1.0, 1.0)
        audio_array = (audio_array * 32767).astype(np.int16)
    # Reserve a temporary WAV path; close the handle before writing by name
    temp_audio_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
    temp_audio_file.close()
    try:
        import scipy.io.wavfile as wavfile
        # Write WAV file compatible with AssemblyAI
        wavfile.write(temp_audio_file.name, sample_rate, audio_array)
    except ImportError:
        # Fallback using the standard-library wave module
        channels = 1 if audio_array.ndim == 1 else audio_array.shape[1]
        with wave.open(temp_audio_file.name, 'wb') as wav_file:
            wav_file.setnchannels(channels)  # Mono unless the array has a channel axis
            wav_file.setsampwidth(2)  # 16-bit
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(audio_array.tobytes())
    return temp_audio_file.name
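The float-to-PCM conversion is easy to verify in isolation. The following dependency-free sketch uses plain Python lists instead of numpy arrays, purely for illustration. `float_to_int16` and `write_mono_wav` are hypothetical helpers, and the round trip at the end reads the file back to confirm what was written.

```python
import os
import struct
import tempfile
import wave


def float_to_int16(samples):
    """Clip samples to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    return [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]


def write_mono_wav(samples, sample_rate):
    """Write float samples to a temporary 16-bit mono WAV file; return its path."""
    pcm = float_to_int16(samples)
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # release the handle so the file can be reopened by name
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return path


path = write_mono_wav([0.0, 0.5, -1.0, 2.0], 16000)
with wave.open(path, "rb") as wav_in:
    frames = struct.unpack("<4h", wav_in.readframes(4))
print(frames)
os.remove(path)
```

The out-of-range sample (2.0) is clipped to full scale rather than wrapping around, which is the purpose of the clipping step in the conversion.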
6. Integration Architecture
# Complete AssemblyAI integration workflow
def complete_voice_medical_workflow(audio_data, country, language):
    """End-to-end workflow using AssemblyAI Universal-Streaming"""
    # Step 1: Convert audio format
    audio_file = convert_gradio_audio_for_assemblyai(audio_data)
    # Step 2: Configure AssemblyAI for medical domain
    language_code = get_language_code(language)
    config = configure_medical_transcription(language_code)
    # Step 3: Transcribe with AssemblyAI
    transcriber = aai.Transcriber(config=config)
    transcript = transcriber.transcribe(audio_file)
    # Step 4: Process medical content
    medical_analysis = analyze_transcript_for_medical_content(transcript)
    # Step 5: Generate response
    return create_medical_response(medical_analysis, country, language)
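Because the conversion step creates its temporary file with `delete=False`, each consultation leaves a WAV file on disk unless something removes it. The sketch below shows one way to guarantee cleanup with a context manager. `temporary_wav_path` and `run_with_cleanup` are hypothetical names, and the `write_audio` and `transcribe` callables stand in for the conversion and transcription steps.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_wav_path():
    """Yield a temporary .wav path and delete the file when the block exits."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


def run_with_cleanup(write_audio, transcribe):
    """Write audio to a temp file, transcribe it, and always remove the file."""
    with temporary_wav_path() as path:
        write_audio(path)
        return transcribe(path)


# Stand-in callables: write four bytes, then report the file size
seen_paths = []
size = run_with_cleanup(
    lambda p: open(p, "wb").close(),
    lambda p: seen_paths.append(p) or os.path.exists(p),
)
print(size, os.path.exists(seen_paths[0]))
```

The file is removed even if transcription raises, which matters for recordings that contain sensitive medical information.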
AssemblyAI Universal-Streaming serves as the critical foundation that enables GlobalCare AI to accurately understand medical terminology across multiple languages, making voice-first healthcare accessible globally. The vocabulary boosting feature is particularly crucial for medical applications where precision in recognizing drug names and symptoms can have significant implications for patient care.
Teammate: @tanmaiyeevadloori