DEV Community

Zakeer


GlobalCare AI: International Medical Voice Assistant

AssemblyAI Voice Agents Challenge: Domain Expert

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

GlobalCare AI is a specialized Domain Expert Voice Agent for medical consultations. Users speak their medical concerns, and the system provides AI-powered medical analysis with voice responses, supporting multiple languages and country-specific emergency protocols.

Domain Expert Voice Agent Category
This submission addresses the Domain Expert Voice Agent prompt with:

  • Medical Domain Expertise: Recognition of medical terminology and symptoms
  • Emergency Triage: Automated urgency assessment (Critical/High/Medium/Low)
  • Cultural Adaptation: Country-specific emergency numbers and medical protocols
  • Voice-to-Voice Workflow: Complete audio input to audio response system
  • Multi-language Support: 7 languages across 6 countries

Core Problem Addressed
Many people face language barriers and accessibility issues when seeking medical guidance. GlobalCare AI provides voice-first medical consultation that works across languages and cultures.

Demo

GitHub Repository

🏥 International AI Medical Voice Assistant

Advanced voice-to-voice medical consultation with emergency triage across 6 countries in 7 languages

A complete medical AI system that processes voice input, analyzes symptoms using advanced AI, and provides spoken medical guidance with emergency triage capabilities. Built for global healthcare accessibility with support for multiple languages and country-specific emergency protocols.


🖼️ System Overview


Complete medical voice consultation interface with country selection, voice recording, and real-time medical analysis


Emergency alert system with country-specific protocols and multilingual support


System performance dashboard showing real-time metrics and component status


Advanced medical analysis results with urgency assessment and voice response generation

✨ Features

  • 🚨 Emergency Triage - Sub-5 second critical symptom detection
  • 🎙️ Voice-to-Voice - Complete audio workflow with AI voice responses
  • 🌍 International - 6 countries with localized emergency protocols
  • 🗣️ Multilingual - 7 languages with cultural medical adaptation
  • 💊 Drug Safety - Real-time medication interaction checking

Technical Implementation & AssemblyAI Integration

1. Medical Vocabulary Boosting for Domain Expertise

Medical terminology requires exceptional accuracy: a misrecognized medication name could have serious consequences. We implemented vocabulary boosting with AssemblyAI's word_boost parameter:

import assemblyai as aai


class MedicalSpeechProcessor:
    def __init__(self, knowledge_base):
        self.knowledge_base = knowledge_base
        self.medical_terms = self._load_medical_terminology()

    def _load_medical_terminology(self):
        """Load the medical vocabulary used for AssemblyAI word boosting"""
        return {
            "symptoms": [
                "chest pain", "shortness of breath", "nausea", "vomiting",
                "dizziness", "headache", "fever", "cough", "fatigue",
                "abdominal pain", "palpitations", "syncope"
            ],
            "medications": [
                "acetaminophen", "ibuprofen", "aspirin", "amoxicillin",
                "metformin", "lisinopril", "atorvastatin", "amlodipine",
                "metoprolol", "sertraline", "fluoxetine", "alprazolam"
            ],
            "body_parts": [
                "heart", "lungs", "stomach", "liver", "kidney", "brain",
                "spine", "shoulder", "knee", "ankle", "throat", "chest"
            ],
            "medical_conditions": [
                "diabetes", "hypertension", "asthma", "heart disease",
                "pneumonia", "bronchitis", "arthritis", "depression"
            ]
        }

    def configure_transcriber_settings(self, language_code="en", medical_focus=True):
        """Configure AssemblyAI with medical-specific vocabulary boosting"""

        # Flatten all medical terms, dropping duplicates while preserving order
        all_medical_terms = []
        if medical_focus:
            for category in self.medical_terms.values():
                all_medical_terms.extend(category)
            all_medical_terms = list(dict.fromkeys(all_medical_terms))

        # AssemblyAI accepts up to 1,000 boosted terms; an empty list disables boosting
        config = aai.TranscriptionConfig(
            language_code=language_code,
            punctuate=True,
            format_text=True,
            speech_model=aai.SpeechModel.best,  # Use highest quality model
            word_boost=all_medical_terms[:1000],  # Boost medical vocabulary
            boost_param="high"  # Strongest boost weight for critical terms
        )

        return config
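Slicing the flattened list with [:1000] keeps whichever terms happen to come first, so as the vocabulary grows, the most safety-critical terms (such as medication names) could be the ones cut off. The sketch below is a hypothetical helper, not part of the project: it builds the boost list in an explicit priority order, de-duplicates case-insensitively, and applies the limits as we understand AssemblyAI's custom vocabulary documentation (up to 1,000 phrases of at most 6 words each; verify against the current docs).

```python
# Limits assumed from AssemblyAI's custom vocabulary docs; verify before relying on them
MAX_BOOST_TERMS = 1000
MAX_WORDS_PER_TERM = 6


def build_word_boost(categories, priority):
    """Flatten term categories into a de-duplicated, limit-respecting boost list.

    categories: dict mapping category name -> list of terms
    priority: category names in the order they should claim boost slots
    """
    seen = set()
    terms = []
    # Priority categories first, then the rest, so truncation drops the least critical terms
    ordered = list(priority) + [name for name in categories if name not in priority]
    for name in ordered:
        for term in categories.get(name, []):
            cleaned = " ".join(term.split())  # collapse stray whitespace
            key = cleaned.lower()             # compare case-insensitively
            if not cleaned or key in seen:
                continue  # skip blanks and duplicates
            if len(cleaned.split()) > MAX_WORDS_PER_TERM:
                continue  # skip phrases longer than the per-phrase limit
            seen.add(key)
            terms.append(cleaned)
    return terms[:MAX_BOOST_TERMS]


vocab = {
    "body_parts": ["chest", "heart"],
    "medications": ["metformin", "Lisinopril", "metformin"],
    "symptoms": ["chest pain", "shortness of breath"],
}
print(build_word_boost(vocab, priority=["medications", "symptoms"]))
# ['metformin', 'Lisinopril', 'chest pain', 'shortness of breath', 'chest', 'heart']
```

Inside configure_transcriber_settings, the result could replace the slice, e.g. word_boost=build_word_boost(self.medical_terms, ["medications", "symptoms"]).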

2. Multilingual Medical Processing

GlobalCare AI supports 7 languages across 6 countries. We map each user-facing language name to an AssemblyAI language code and apply the same medical configuration to every language:

import assemblyai as aai


def process_medical_audio(audio_data, country, language):
    """Process medical audio with language-specific configuration"""

    # Map user language selection to AssemblyAI language codes
    language_mapping = {
        "English": "en_us", "Spanish": "es", "Hindi": "hi",
        "Telugu": "te", "Japanese": "ja", "Arabic": "ar", "Mandarin": "zh"
    }

    language_code = language_mapping.get(language, "en")

    # Write the Gradio recording to a temporary WAV file (see section 5)
    audio_file_path = convert_gradio_audio_for_assemblyai(audio_data)
    if audio_file_path is None:
        raise ValueError("No recorded audio received")

    # Configure AssemblyAI for the specific language with medical focus;
    # speech_processor is the module-level MedicalSpeechProcessor instance
    config = speech_processor.configure_transcriber_settings(
        language_code, medical_focus=True
    )

    transcriber = aai.Transcriber(config=config)

    # Transcribe the audio with medical vocabulary boosting
    transcript = transcriber.transcribe(audio_file_path)

    return transcript
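The language_mapping.get(language, "en") lookup silently falls back to English for any unrecognized name, including a simple casing mismatch. In a medical context, transcribing Hindi or Arabic speech with an English configuration would yield a misleading transcript rather than an error. A minimal sketch of a stricter lookup, assuming the same mapping; resolve_language_code is a hypothetical helper name:

```python
# Language names and codes copied from the mapping above
LANGUAGE_MAPPING = {
    "English": "en_us", "Spanish": "es", "Hindi": "hi",
    "Telugu": "te", "Japanese": "ja", "Arabic": "ar", "Mandarin": "zh",
}
DEFAULT_CODE = "en"


def resolve_language_code(language):
    """Return (code, used_fallback) for a user-facing language name.

    Matching ignores case and surrounding whitespace. Unknown names still get
    the English default, but used_fallback=True lets the UI warn the user
    instead of silently switching languages.
    """
    lookup = {name.lower(): code for name, code in LANGUAGE_MAPPING.items()}
    code = lookup.get((language or "").strip().lower())
    if code is None:
        return DEFAULT_CODE, True
    return code, False


print(resolve_language_code("spanish "))  # ('es', False)
print(resolve_language_code("French"))    # ('en', True)
```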

3. Real-Time Audio Processing Pipeline

Our system processes audio recorded through Gradio's microphone input. Once a recording has been saved as a WAV file (see section 5), this pipeline configures AssemblyAI, transcribes the file, handles API errors, and passes the transcript to the medical analysis step:

# Methods of MedicalSpeechProcessor (assemblyai is imported as aai, as above)
def process_audio_file(self, audio_file_path, language="en", country="USA"):
    """Complete audio processing pipeline with AssemblyAI"""
    try:
        # Step 1: Configure AssemblyAI for medical accuracy
        config = self.configure_transcriber_settings(language, medical_focus=True)
        transcriber = aai.Transcriber(config=config)

        # Step 2: Transcribe the recorded file (blocks until the transcript is ready)
        print(f"Processing audio with AssemblyAI: {audio_file_path}")
        transcript = transcriber.transcribe(audio_file_path)

        # Step 3: Handle AssemblyAI response
        if transcript.status == aai.TranscriptStatus.error:
            return {
                "success": False,
                "error": transcript.error,
                "text": "",
                "confidence": 0.0
            }

        # Step 4: Extract medical information from transcript
        result = self._process_medical_transcript(transcript, language, country)
        return result

    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "text": "",
            "confidence": 0.0
        }

def _process_medical_transcript(self, transcript, language, country):
    """Process AssemblyAI transcript for medical analysis"""
    # Silent recordings can produce no text, so guard against None
    text = (transcript.text or "").lower()

    # Extract medical entities from transcribed text
    medical_entities = self._extract_medical_entities(text)

    # Detect emergency keywords in transcript
    emergency_detected = self._detect_emergency_keywords(text)

    # Assess urgency based on transcript content
    urgency_level = self._assess_urgency(text, emergency_detected)

    # Get country-specific medical information
    country_info = self.knowledge_base.countries_data.get(country, {})

    return {
        "success": True,
        "text": transcript.text or "",
        "processed_text": text,
        # Report the real confidence; do not substitute a made-up default
        "confidence": transcript.confidence if transcript.confidence is not None else 0.0,
        "language": language,
        "country": country,
        "emergency_detected": emergency_detected,
        "urgency_level": urgency_level,
        "medical_entities": medical_entities,
        "country_info": country_info,
        "requires_immediate_attention": urgency_level in ["Critical", "High"]
    }
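Because the post-processing only reads a few attributes from the transcript (status, text, confidence, error), it can be exercised offline without API calls by passing stand-in objects. The sketch below uses a simplified, hypothetical summarize_transcript that mirrors the result shape above; it also assumes aai.TranscriptStatus is a string enum, so plain strings stand in for its values here.

```python
from types import SimpleNamespace

# Subset of the critical indicators used in _assess_urgency
CRITICAL = ["can't breathe", "chest pain", "heart attack", "stroke",
            "unconscious", "severe bleeding", "overdose", "choking"]


def summarize_transcript(transcript, language, country, countries_data):
    """Simplified stand-in for _process_medical_transcript (hypothetical helper)."""
    if transcript.status == "error":
        return {"success": False, "error": transcript.error, "text": "", "confidence": 0.0}
    # Guard against transcripts with no text, e.g. from silent recordings
    text = (transcript.text or "").lower()
    urgency = "Critical" if any(k in text for k in CRITICAL) else "Low"
    return {
        "success": True,
        "text": transcript.text or "",
        "confidence": transcript.confidence if transcript.confidence is not None else 0.0,
        "language": language,
        "country": country,
        "urgency_level": urgency,
        "country_info": countries_data.get(country, {}),
        "requires_immediate_attention": urgency in ["Critical", "High"],
    }


# Fake transcripts: only the attributes the pipeline reads are provided
ok = SimpleNamespace(status="completed", text="I have chest pain", confidence=0.93, error=None)
silent = SimpleNamespace(status="completed", text=None, confidence=None, error=None)
failed = SimpleNamespace(status="error", text=None, confidence=None, error="simulated failure")

print(summarize_transcript(silent, "en", "USA", {})["urgency_level"])  # Low
```

The same stand-in objects can drive tests for the real class methods, which keeps triage logic testable without network access or API costs.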

4. Emergency Detection with AssemblyAI Transcripts

We implemented rule-based emergency detection that scans the lowercased AssemblyAI transcript for critical medical keywords:

def _detect_emergency_keywords(self, text):
    """Detect emergency indicators in AssemblyAI transcript"""
    emergency_keywords = [
        "chest pain", "can't breathe", "difficulty breathing", 
        "heart attack", "stroke", "unconscious", "severe bleeding",
        "overdose", "poisoning", "choking", "seizure"
    ]

    detected_keywords = []
    for keyword in emergency_keywords:
        if keyword in text:
            detected_keywords.append(keyword)

    return detected_keywords

def _assess_urgency(self, text, emergency_keywords):
    """Assess medical urgency from AssemblyAI transcript"""
    critical_indicators = [
        "can't breathe", "chest pain", "heart attack", "stroke",
        "unconscious", "severe bleeding", "overdose", "choking"
    ]

    high_indicators = [
        "severe pain", "difficulty breathing", "bleeding", 
        "high fever", "rapid heartbeat", "confusion"
    ]

    # Check for critical emergencies first
    for indicator in critical_indicators:
        if indicator in text:
            return "Critical"

    # Check for high urgency symptoms
    for indicator in high_indicators:
        if indicator in text:
            return "High"

    # Medium urgency if any emergency keywords detected
    if emergency_keywords:
        return "Medium"

    return "Low"
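Plain substring matching has two failure modes worth noting: "no chest pain" still triggers a Critical result, and "stroke" matches inside "heatstroke". The sketch below is one possible refinement, not the project's method: it matches whole words only, normalizes typographic apostrophes, and skips a match when a negation cue appears in the few preceding words. The cue list and window size are illustrative assumptions and would need clinical review before any real use.

```python
import re

EMERGENCY_KEYWORDS = [
    "chest pain", "can't breathe", "difficulty breathing",
    "heart attack", "stroke", "unconscious", "severe bleeding",
    "overdose", "poisoning", "choking", "seizure",
]

# Hypothetical negation cues; a production list would need clinical review
NEGATION_CUES = {"no", "not", "without", "denies", "never"}


def detect_emergency_keywords(text, keywords=EMERGENCY_KEYWORDS, window=3):
    """Return keywords found as whole words and not preceded by a negation cue."""
    # Lowercase and normalize typographic apostrophes so "can’t" matches "can't"
    normalized = text.lower().replace("\u2019", "'")
    detected = []
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        for match in re.finditer(pattern, normalized):
            # Inspect the last few words before this occurrence
            preceding = re.findall(r"[a-z']+", normalized[:match.start()])[-window:]
            if not NEGATION_CUES.intersection(preceding):
                detected.append(keyword)
                break  # one non-negated occurrence is enough
    return detected


print(detect_emergency_keywords("I had a seizure, not a stroke"))  # ['seizure']
```

A fixed word window is a blunt heuristic (it can miss long-range negation or over-negate), so it is safer to bias toward flagging than toward suppressing a critical symptom.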

5. Audio Format Handling for Gradio Integration

Gradio's microphone component returns audio as a (sample_rate, numpy_array) tuple, so we convert it to a temporary 16-bit WAV file before sending it to AssemblyAI:

import tempfile
import wave

import numpy as np


def convert_gradio_audio_for_assemblyai(audio_data):
    """Convert Gradio numpy audio (sample_rate, array) to a temporary 16-bit WAV file"""
    if not (isinstance(audio_data, tuple) and len(audio_data) == 2):
        return None

    sample_rate, audio_array = audio_data
    audio_array = np.asarray(audio_array)

    # Convert to 16-bit PCM; float samples are assumed to lie in [-1.0, 1.0]
    if np.issubdtype(audio_array.dtype, np.floating):
        audio_array = np.clip(audio_array, -1.0, 1.0)
        audio_array = (audio_array * 32767).astype(np.int16)
    elif audio_array.dtype == np.int32:
        audio_array = (audio_array >> 16).astype(np.int16)  # keep the top 16 bits
    else:
        audio_array = audio_array.astype(np.int16)  # int16 input is unchanged

    # Create the temp file and close it first so it can be reopened on any platform
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_audio_file:
        wav_path = temp_audio_file.name

    try:
        import scipy.io.wavfile as wavfile

        # scipy handles mono (samples,) and stereo (samples, channels) arrays
        wavfile.write(wav_path, sample_rate, audio_array)

    except ImportError:
        # Fallback using the standard-library wave module
        channels = 1 if audio_array.ndim == 1 else audio_array.shape[1]
        with wave.open(wav_path, "wb") as wav_file:
            wav_file.setnchannels(channels)
            wav_file.setsampwidth(2)  # 16-bit
            wav_file.setframerate(sample_rate)
            # Row-major bytes of a (samples, channels) array are interleaved frames
            wav_file.writeframes(audio_array.tobytes())

    return wav_path
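A cheap way to check the float-to-int16 conversion locally, before spending an API call, is to write a synthetic tone the same way and read the header back with Python's standard-library wave module. The two helpers below are hypothetical test utilities, not part of the project:

```python
import os
import tempfile
import wave

import numpy as np


def write_test_tone(path, sample_rate=16000, seconds=0.5, freq=440.0):
    """Write a mono tone using the same float -> int16 conversion as above."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    tone = 0.5 * np.sin(2 * np.pi * freq * t)  # float64 samples in [-0.5, 0.5]
    pcm = (np.clip(tone, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


def describe_wav(path):
    """Return the header fields worth checking before uploading a recording."""
    with wave.open(path, "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate": wav_file.getframerate(),
            "duration_s": wav_file.getnframes() / wav_file.getframerate(),
        }


with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    path = tmp.name
write_test_tone(path)
info = describe_wav(path)
os.remove(path)
print(info)
# {'channels': 1, 'sample_width_bytes': 2, 'sample_rate': 16000, 'duration_s': 0.5}
```

The same describe_wav check can also catch zero-length recordings (for example, when the user stops recording immediately) before they reach the transcription step.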

6. Integration Architecture

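import os

import assemblyai as aai


# Complete AssemblyAI integration workflow.
# get_language_code, configure_medical_transcription,
# analyze_transcript_for_medical_content and create_medical_response
# are project helpers that are not shown in this post.
def complete_voice_medical_workflow(audio_data, country, language):
    """End-to-end voice consultation workflow built on AssemblyAI transcription"""

    # Step 1: Convert Gradio audio to a temporary WAV file
    audio_file = convert_gradio_audio_for_assemblyai(audio_data)
    if audio_file is None:
        raise ValueError("No recorded audio received")

    try:
        # Step 2: Configure AssemblyAI for medical domain
        language_code = get_language_code(language)
        config = configure_medical_transcription(language_code)

        # Step 3: Transcribe with AssemblyAI
        transcriber = aai.Transcriber(config=config)
        transcript = transcriber.transcribe(audio_file)
        if transcript.status == aai.TranscriptStatus.error:
            raise RuntimeError(f"Transcription failed: {transcript.error}")
    finally:
        # The temporary WAV file is no longer needed after transcription
        os.remove(audio_file)

    # Step 4: Process medical content
    medical_analysis = analyze_transcript_for_medical_content(transcript)

    # Step 5: Generate response
    return create_medical_response(medical_analysis, country, language)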
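The final step, create_medical_response, is not shown here. One design choice worth considering is to put the urgency-driven safety instruction at the start of the spoken response deterministically, rather than relying on the AI analysis to mention it. The sketch below is hypothetical: the function name, messages, and the three-country table are illustrative (911, 112, and 119 are the public emergency numbers for the USA, India, and Japan), and the project's own knowledge base would be the real source of country data.

```python
# Hypothetical subset of country data; the project keeps this in its knowledge base
EMERGENCY_NUMBERS = {"USA": "911", "India": "112", "Japan": "119"}

VALID_URGENCY_LEVELS = {"Low", "Medium", "High", "Critical"}


def build_emergency_preamble(urgency_level, country, emergency_numbers=EMERGENCY_NUMBERS):
    """Return the text spoken before the AI analysis, based on urgency and country."""
    if urgency_level not in VALID_URGENCY_LEVELS:
        raise ValueError(f"Unknown urgency level: {urgency_level}")
    number = emergency_numbers.get(country)
    if urgency_level == "Critical":
        if number:
            return f"This may be an emergency. Call {number} now."
        # Fall back to a generic instruction when the country is not in the table
        return "This may be an emergency. Contact your local emergency services now."
    if urgency_level == "High":
        return "Please seek medical care today."
    return ""  # Low and Medium urgency: no preamble


print(build_emergency_preamble("Critical", "USA"))
# This may be an emergency. Call 911 now.
```

Keeping this step rule-based means the emergency instruction is delivered even if the downstream model output is slow, truncated, or fails.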

AssemblyAI's speech-to-text is the foundation that lets GlobalCare AI understand medical terminology across multiple languages, making voice-first healthcare guidance accessible globally. Vocabulary boosting is particularly important for medical applications, where correctly recognizing drug names and symptoms has direct implications for patient care.

Teammate: @tanmaiyeevadloori
