Building provider-independent AI software: From Azure to Gemini to Local Whisper with zero code changes
The Problem: Latin America's Forensic Crisis
Latin America faces a silent humanitarian crisis. According to investigative journalism and government reports:
- 52,000+ unidentified bodies in Mexico alone (2006-2023)
- A shortage of 15,000 forensic specialists in Peru
- 700+ municipalities in Colombia without permanent forensic coverage
Medical examiners spend hours on manual documentation when they should be investigating. The administrative overhead creates "administrative disappearances" — bodies that enter the system but are never matched with missing persons reports.
I built CoronerIA to solve this. Here's how.
The Key Design Decision: AI-Agnostic Architecture
Before diving into features, let me explain the most important architectural decision: the system is completely AI-provider independent.
Why This Matters
| Provider | Pros | Cons |
|---|---|---|
| Azure AI Speech | Best accuracy, enterprise support | Paid, requires stable internet |
| Google Gemini | Free tier, multimodal capabilities | Rate limits on free tier |
| OpenAI Whisper | Open source, runs locally | Requires GPU, slower |
| AWS Transcribe | Good for AWS shops | Paid, another vendor lock-in |
We designed the system to support all of them with a single environment-variable switch. We currently develop on Gemini (free tier); moving to Azure for production is just a matter of swapping which keys are set:
```bash
# Development (free)
GEMINI_API_KEY=your_key_here

# Production (enterprise)
AZURE_SPEECH_KEY=your_azure_key
AZURE_OPENAI_KEY=your_openai_key
```
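These variables are read by a small settings object. Here is a minimal sketch of what that might look like with pydantic-settings; the names `GEMINI_API_KEY`, `AZURE_SPEECH_KEY`, and `get_effective_mode()` match what the speech service below expects, everything else is illustrative:

```python
# backend/core/config.py -- illustrative sketch, not the actual file
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Whichever credentials are set decide the active provider
    GEMINI_API_KEY: str = ""
    AZURE_SPEECH_KEY: str = ""
    AZURE_OPENAI_KEY: str = ""
    SPEECH_MODE: str = "auto"  # "auto", "azure", "gemini", or "edge"

    def get_effective_mode(self) -> str:
        """Return the explicitly requested mode, or let the service decide."""
        return self.SPEECH_MODE

settings = Settings()  # values are read from environment variables
```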
Architecture Overview
In one sentence: a FastAPI backend exposes a provider-agnostic speech service (Strategy Pattern), a React/TypeScript frontend renders the live transcript and an interactive SVG anatomy model, and every AI call goes through a single interface so providers can be swapped by configuration alone.
Challenge 1: Provider-Agnostic AI with Graceful Fallback
The Problem
Different deployment scenarios need different AI providers:
- Development: Free tier (Gemini, local Whisper)
- Staging: Low-cost cloud (Gemini, OpenAI)
- Production: Enterprise-grade (Azure AI, AWS Transcribe)
- Offline/Rural: Local models only (Whisper)
We needed a single codebase that works with any provider via configuration.
The Solution: Strategy Pattern
```python
# backend/services/speech_service.py
import logging
from enum import Enum

from backend.core.config import settings  # project config (path illustrative)
from backend.services.gemini_service import GeminiService

logger = logging.getLogger(__name__)


class SpeechMode(Enum):
    AZURE = "azure"
    EDGE = "edge"      # local Whisper
    GEMINI = "gemini"


class SpeechService:
    """Unified Speech-to-Text service with Strategy Pattern."""

    def __init__(self):
        self._mode = self._determine_mode()
        self._azure_recognizer = None
        self._whisper_model = None
        self._gemini_service = None
        if self._mode == SpeechMode.GEMINI:
            self._gemini_service = GeminiService()
        logger.info(f"SpeechService initialized in mode: {self._mode.value}")

    def _determine_mode(self) -> SpeechMode:
        """Determines mode based on config and availability.

        Priority: Gemini > Azure > local Whisper.
        """
        effective = settings.get_effective_mode()
        if settings.GEMINI_API_KEY:
            return SpeechMode.GEMINI
        if effective == "azure" and settings.AZURE_SPEECH_KEY:
            return SpeechMode.AZURE
        return SpeechMode.EDGE

    async def transcribe_file(self, audio_path: str) -> str:
        """Transcribes an audio file using the selected strategy."""
        if self._mode == SpeechMode.AZURE:
            return await self._transcribe_azure(audio_path)
        if self._mode == SpeechMode.GEMINI:
            return await self._gemini_service.transcribe_audio(audio_path)
        return await self._transcribe_whisper(audio_path)
```
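Call sites never know which provider is active. A hedged usage sketch (the endpoint context and file name are hypothetical):

```python
# Inside an async FastAPI endpoint, for example:
service = SpeechService()
transcript = await service.transcribe_file("uploads/autopsy_001.wav")
```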
Why This Matters
| Benefit | Description |
|---|---|
| Zero downtime | If the primary provider fails, the next one takes over; local Whisper is the last resort. |
| Cost optimization | Local Whisper is free but slower; cloud providers are faster but metered. |
| Easy to extend | Adding a new provider = one new method + one enum value. |
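For example, wiring in AWS Transcribe would look roughly like this; `_transcribe_aws` is hypothetical and not in the current codebase:

```python
class SpeechMode(Enum):
    AZURE = "azure"
    EDGE = "edge"
    GEMINI = "gemini"
    AWS = "aws"  # the one new enum value

# ...plus one new method on SpeechService:
async def _transcribe_aws(self, audio_path: str) -> str:
    """Hypothetical AWS Transcribe strategy (sketch only)."""
    raise NotImplementedError
```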
Challenge 2: Structured Output from Unstructured Speech
The Problem
Medical examiners dictate freely:
"The victim Juan Pérez García, male, 32 years old, presents a contusion in the thoracic region. Heart weight: 320 grams, congestive appearance..."
We needed to map this to 13 structured protocol sections with 100% JSON consistency.
The Solution: Schema-Enforced Prompting
````python
# backend/services/gemini_service.py (excerpt: method of GeminiService)
import json

async def extract_entities(self, text: str) -> dict:
    """Extract medico-legal entities using Gemini with structured output."""
    prompt = f"""
    Act as a Peruvian forensic expert from IMLCF. Analyze this autopsy text
    and extract structured information.

    DICTATION TEXT:
    "{text}"

    INSTRUCTIONS:
    1. Extract "entities": list of objects with "text" and "type"
       (ORGAN, WEIGHT, MEASUREMENT, LESION_TYPE, CONDITION, PERSON, AGE, SEX)
    2. Extract "mapped_fields": dictionary with field paths and values

    FIELD STRUCTURE (use exact paths):
    - "datos_generales.fallecido.nombre": deceased name
    - "datos_generales.fallecido.edad": age (number)
    - "datos_generales.fallecido.sexo": "M" or "F"
    - "examen_interno_torax.corazon.peso": weight in grams (number)
    - "examen_interno_torax.corazon.descripcion": description
    - "causas_muerte.diagnostico_presuntivo.causa_final.texto": final cause

    EXAMPLE response:
    {{
        "entities": [
            {{"text": "Juan Rodríguez", "type": "PERSON"}},
            {{"text": "23 años", "type": "AGE"}}
        ],
        "mapped_fields": {{
            "datos_generales.fallecido.nombre": "Juan",
            "datos_generales.fallecido.edad": 23,
            "examen_interno_torax.corazon.peso": 320
        }}
    }}

    Respond ONLY with valid JSON, no markdown.
    """
    response = self.model.generate_content(prompt)
    # Strip markdown code fences in case the model adds them anyway
    clean_text = response.text.replace("```json", "").replace("```", "").strip()
    return json.loads(clean_text)
````
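The `mapped_fields` keys are dot-paths into the nested protocol document. A minimal sketch of how they can be applied to a report dict (the helper name `apply_mapped_fields` is mine, not from the repo):

```python
def apply_mapped_fields(report: dict, mapped_fields: dict) -> dict:
    """Write {"a.b.c": value} pairs into a nested dict, creating levels as needed."""
    for path, value in mapped_fields.items():
        node = report
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return report

report = apply_mapped_fields({}, {
    "datos_generales.fallecido.edad": 23,
    "examen_interno_torax.corazon.peso": 320,
})
# {'datos_generales': {'fallecido': {'edad': 23}},
#  'examen_interno_torax': {'corazon': {'peso': 320}}}
```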
The 70% Result
In pilot testing with a medical professional:
| Metric | Manual | With CoronerIA | Improvement |
|---|---|---|---|
| Time per case | ~45 min | ~13 min | -71% |
| Typos/errors | Variable | Near-zero | ✓ |
| Field completeness | 70-80% | 95%+ | ✓ |
Challenge 3: Interactive SVG Anatomical Model
The Problem
Text-only documentation is error-prone. We needed visual feedback showing where on the body each finding was detected.
The Solution: Real-Time Organ Detection
```tsx
// frontend/src/pages/Dictation.tsx
// Detect organs mentioned in the transcription. Accented and unaccented
// spellings are both checked because ASR output does not always preserve diacritics.
const detectedOrgans = useMemo(() => {
  if (!transcript) return []
  const text = transcript.toLowerCase()
  const organs: string[] = []
  if (text.includes('encéfalo') || text.includes('cerebro'))
    organs.push('encefalo')
  if (text.includes('pulmón derecho') || text.includes('pulmon derecho'))
    organs.push('pulmon_derecho')
  if (text.includes('corazón') || text.includes('corazon'))
    organs.push('corazon')
  if (text.includes('hígado') || text.includes('higado'))
    organs.push('higado')
  if (text.includes('bazo'))
    organs.push('bazo')
  return organs
}, [transcript])
```
SVG Highlighting with CSS Variables
```tsx
// frontend/src/components/AnatomyModel.tsx
const getOrganStyle = (organ: string): React.CSSProperties => {
  const highlighted = highlightedOrgans.includes(organ)
  const hovered = hoveredOrgan === organ
  return {
    fill: highlighted
      ? 'var(--organ-highlighted)' // Red glow
      : hovered
        ? 'var(--organ-hover)' // Light blue
        : 'var(--organ-normal)', // Gray
    stroke: highlighted ? 'var(--accent-danger)' : 'var(--border-secondary)',
    strokeWidth: highlighted ? 2 : 1,
    opacity: highlighted ? 1 : 0.7,
    cursor: 'pointer',
    transition: 'all 0.2s ease',
  }
}
```
Audio Processing Pipeline
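Putting the pieces together, the pipeline reads as follows. This is a sketch that reuses the `apply_mapped_fields` helper above; the repo's actual wiring may differ:

```python
# End-to-end sketch: audio file -> transcript -> entities -> structured protocol
async def process_dictation(audio_path: str) -> dict:
    speech = SpeechService()                       # provider chosen from config
    transcript = await speech.transcribe_file(audio_path)
    extraction = await GeminiService().extract_entities(transcript)
    report = apply_mapped_fields({}, extraction["mapped_fields"])
    return {"transcript": transcript, "report": report}
```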
AI Provider Fallback Flow
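As promised in the "Zero downtime" row earlier, providers cascade on failure, in the same priority order the mode detection uses. A minimal sketch of that cascade; the repo's actual error handling may be more granular:

```python
async def transcribe_with_fallback(service: SpeechService, audio_path: str) -> str:
    """Try Gemini, then Azure, then local Whisper. Sketch only."""
    attempts = []
    if service._gemini_service is not None:
        attempts.append(("gemini", service._gemini_service.transcribe_audio))
    attempts.append(("azure", service._transcribe_azure))
    attempts.append(("edge", service._transcribe_whisper))  # offline last resort
    for name, transcribe in attempts:
        try:
            return await transcribe(audio_path)
        except Exception as exc:  # rate limit, outage, missing credentials
            logger.warning(f"{name} failed ({exc}); trying next provider")
    raise RuntimeError("All speech providers failed")
```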
Lessons Learned
1. Build AI-Agnostic from Day One
Don't hard-code your AI provider. We designed for Azure but developed with Gemini (free). Switching is one config change:
```bash
# Current: Gemini (free for development)
GEMINI_API_KEY=AIza...

# Future: Azure (production)
# AZURE_SPEECH_KEY=xxx
# AZURE_OPENAI_KEY=xxx
# AZURE_OPENAI_ENDPOINT=https://xxx.openai.azure.com/
```
2. Supported Providers (Tested)
| Provider | Speech-to-Text | NER/Extraction | Status |
|---|---|---|---|
| Google Gemini | ✅ Gemini 2.0 Flash | ✅ Gemini 2.0 Flash | Currently using |
| Azure AI | ✅ Azure Speech | ✅ Azure OpenAI (GPT-4) | Ready for production |
| OpenAI | ✅ Whisper API | ✅ GPT-4o | Compatible |
| Local | ✅ faster-whisper | ✅ Regex fallback | Offline mode |
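The "Regex fallback" in the Local row is the offline stand-in for LLM extraction. A toy sketch of what a rule for organ weights could look like; the pattern and function are illustrative, not the repo's actual rules:

```python
import re

# Match phrases like "corazón ... 320 gramos" and capture organ + grams
WEIGHT_PATTERN = re.compile(
    r"(coraz[oó]n|h[ií]gado|bazo|enc[eé]falo)\D{0,40}?(\d+)\s*(?:g\b|gramos)",
    re.IGNORECASE,
)

def extract_weights_offline(text: str) -> dict:
    """Regex-based fallback for when no AI provider is reachable (sketch)."""
    return {organ.lower(): int(grams) for organ, grams in WEIGHT_PATTERN.findall(text)}

print(extract_weights_offline("Peso del corazón: 320 gramos"))  # {'corazón': 320}
```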
3. Start Offline-First
It's infinitely easier to add cloud features to an offline-capable app than to retrofit offline support to a cloud-first app.
4. Validate with Real Users Early
The 70% time reduction came from a real pilot test with a medical professional, not assumptions. This number is defensible in any interview.
Tech Stack Summary
| Layer | Technology | LOC |
|---|---|---|
| Backend | Python, FastAPI, SQLite | 2,240 |
| Frontend | React, TypeScript, Zustand | 4,191 |
| AI | Gemini 2.0, Azure Speech, Whisper | - |
| DevOps | Docker, docker-compose | - |
| Total | | ~6,400 |
What's Next?
CoronerIA was submitted to Microsoft Imagine Cup 2026. Whether we advance or not, the project will be open-sourced to help forensic teams globally.
If you're building medical software with AI, I'd love to connect.
GitHub: CoronerIA Repository
Tags: #ai #python #react #fastapi #opensource #healthtech #microsoftimaginecup