Santiago Palma

How I Reduced Forensic Documentation Time by 70% with Hybrid AI

Building provider-independent AI software: From Azure to Gemini to Local Whisper with zero code changes


The Problem: Latin America's Forensic Crisis

Latin America faces a silent humanitarian crisis. According to investigative journalism and government reports:

  • 52,000+ unidentified bodies in Mexico alone (2006-2023)
  • 15,000 forensic specialist deficit in Peru
  • 700+ municipalities in Colombia without permanent forensic coverage

Medical examiners spend hours on manual documentation when they should be investigating. The administrative overhead creates "administrative disappearances" — bodies that enter the system but are never matched with missing persons reports.

I built CoronerIA to solve this. Here's how.


The Key Design Decision: AI-Agnostic Architecture

Before diving into features, let me explain the most important architectural decision: the system is completely AI-provider independent.

Why This Matters

| Provider | Pros | Cons |
| --- | --- | --- |
| Azure AI Speech | Best accuracy, enterprise support | Paid, requires stable internet |
| Google Gemini | Free tier, multimodal capabilities | Rate limits on free tier |
| OpenAI Whisper | Open source, runs locally | Requires GPU, slower |
| AWS Transcribe | Good for AWS shops | Paid, another vendor lock-in |

We designed the system to support ALL of them with a single environment variable change. Currently, we use Gemini for development (free tier), but switching to Azure for production requires changing one config line:

# Development (free)
GEMINI_API_KEY=your_key_here

# Production (enterprise)
AZURE_SPEECH_KEY=your_azure_key
AZURE_OPENAI_KEY=your_openai_key
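
The service code shown under Challenge 1 reads these values through a small settings module (settings.GEMINI_API_KEY, settings.get_effective_mode(), and so on). That module isn't shown in the post; here is a minimal sketch, assuming plain os.getenv (the real project may use pydantic or another config library):

# backend/config.py — hypothetical sketch; the actual module is not shown in the post
import os


class Settings:
    """Reads provider credentials and mode from environment variables."""

    GEMINI_API_KEY: str = os.getenv("GEMINI_API_KEY", "")
    AZURE_SPEECH_KEY: str = os.getenv("AZURE_SPEECH_KEY", "")
    AZURE_OPENAI_KEY: str = os.getenv("AZURE_OPENAI_KEY", "")

    def get_effective_mode(self) -> str:
        """Derive a provider mode purely from which keys are present."""
        if self.AZURE_SPEECH_KEY:
            return "azure"
        if self.GEMINI_API_KEY:
            return "gemini"
        return "edge"  # local Whisper, no keys required


settings = Settings()

Because only the environment changes, the same image runs in development, staging, and production.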

Architecture Overview
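
(The original post embeds a diagram here.) In rough terms, the pieces fit together like this:

React/TypeScript frontend (dictation UI, SVG anatomy model)
        |
        v  audio + transcript (HTTP)
FastAPI backend
        |
        v
SpeechService (Strategy Pattern)
    ├── Google Gemini   (development, free tier)
    ├── Azure AI Speech (production)
    └── Local Whisper   (offline / rural deployments)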


Challenge 1: Provider-Agnostic AI with Graceful Fallback

The Problem

Different deployment scenarios need different AI providers:

  • Development: Free tier (Gemini, local Whisper)
  • Staging: Low-cost cloud (Gemini, OpenAI)
  • Production: Enterprise-grade (Azure AI, AWS Transcribe)
  • Offline/Rural: Local models only (Whisper)

We needed a single codebase that works with any provider via configuration.

The Solution: Strategy Pattern

# backend/services/speech_service.py
import logging
from enum import Enum

# `settings` and `GeminiService` are imported from the project's own
# config and services modules (not shown in this excerpt).

logger = logging.getLogger(__name__)


class SpeechMode(Enum):
    AZURE = "azure"
    EDGE = "edge"      # local Whisper
    GEMINI = "gemini"


class SpeechService:
    """Unified Speech-to-Text service with Strategy Pattern."""

    def __init__(self):
        self._mode = self._determine_mode()
        self._azure_recognizer = None   # created lazily on first Azure call
        self._whisper_model = None      # loaded lazily for offline mode
        self._gemini_service = None

        if self._mode == SpeechMode.GEMINI:
            self._gemini_service = GeminiService()

        logger.info(f"SpeechService initialized in mode: {self._mode.value}")

    def _determine_mode(self) -> SpeechMode:
        """Determines the mode based on config and key availability."""
        effective = settings.get_effective_mode()

        # Priority: Gemini > Azure > local Whisper
        if settings.GEMINI_API_KEY:
            return SpeechMode.GEMINI
        if effective == "azure" and settings.AZURE_SPEECH_KEY:
            return SpeechMode.AZURE
        return SpeechMode.EDGE

    async def transcribe_file(self, audio_path: str) -> str:
        """Transcribes an audio file using the selected strategy."""
        if self._mode == SpeechMode.AZURE:
            return await self._transcribe_azure(audio_path)
        elif self._mode == SpeechMode.GEMINI:
            return await self._gemini_service.transcribe_audio(audio_path)
        else:
            return await self._transcribe_whisper(audio_path)
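
The _transcribe_azure and _transcribe_whisper strategies are not shown in the post. For the local path, a minimal sketch using faster-whisper (the library listed in the provider table below) might look like this; treat it as an illustration, not the project's actual code:

    async def _transcribe_whisper(self, audio_path: str) -> str:
        """Offline strategy: transcribe locally with faster-whisper (sketch)."""
        import asyncio
        from faster_whisper import WhisperModel

        if self._whisper_model is None:
            # int8 on CPU keeps memory modest for offline/rural machines
            self._whisper_model = WhisperModel("small", device="cpu", compute_type="int8")

        def run() -> str:
            segments, _info = self._whisper_model.transcribe(audio_path, language="es")
            return " ".join(segment.text.strip() for segment in segments)

        # transcribe() is blocking and CPU-bound; keep it off the event loop
        return await asyncio.to_thread(run)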

Why This Matters

| Benefit | Description |
| --- | --- |
| Zero downtime | If Azure fails, Gemini takes over. If Gemini fails, local Whisper runs. |
| Cost optimization | Whisper is free but slower. Azure/Gemini are fast but paid. |
| Easy to extend | Adding a new provider = one new method + one enum value. |

Challenge 2: Structured Output from Unstructured Speech

The Problem

Medical examiners dictate freely:

"The victim Juan Pérez García, male, 32 years old, presents a contusion in the thoracic region. Heart weight: 320 grams, congestive appearance..."

We needed to map this to 13 structured protocol sections with 100% JSON consistency.

The Solution: Schema-Enforced Prompting

# backend/services/gemini_service.py
import json

# extract_entities is a method of GeminiService; self.model is the
# configured Gemini model instance.

async def extract_entities(self, text: str) -> dict:
    """Extract medico-legal entities using Gemini with structured output."""

    prompt = f"""
    Act as a Peruvian forensic expert from IMLCF. Analyze this autopsy text and extract structured information.

    DICTATION TEXT:
    "{text}"

    INSTRUCTIONS:
    1. Extract "entities": list of objects with "text" and "type" 
       (ORGAN, WEIGHT, MEASUREMENT, LESION_TYPE, CONDITION, PERSON, AGE, SEX)
    2. Extract "mapped_fields": dictionary with field paths and values

    FIELD STRUCTURE (use exact paths):
    - "datos_generales.fallecido.nombre": deceased name
    - "datos_generales.fallecido.edad": age (number)
    - "datos_generales.fallecido.sexo": "M" or "F"
    - "examen_interno_torax.corazon.peso": weight in grams (number)
    - "examen_interno_torax.corazon.descripcion": description
    - "causas_muerte.diagnostico_presuntivo.causa_final.texto": final cause

    EXAMPLE response:
    {{
      "entities": [
        {{"text": "Juan Rodríguez", "type": "PERSON"}},
        {{"text": "23 años", "type": "AGE"}}
      ],
      "mapped_fields": {{
        "datos_generales.fallecido.nombre": "Juan",
        "datos_generales.fallecido.edad": 23,
        "examen_interno_torax.corazon.peso": 320
      }}
    }}

    Respond ONLY with valid JSON, no markdown.
    """

    response = self.model.generate_content(prompt)
    clean_text = response.text.replace("```json", "").replace("```", "").strip()
    return json.loads(clean_text)
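
The mapped_fields keys are dotted paths into the nested protocol document. The post doesn't show how they are applied to the report; a small hypothetical helper makes the idea concrete:

# Hypothetical helper (not shown in the post): write dotted paths like
# "examen_interno_torax.corazon.peso" into the nested report dict.
def apply_mapped_fields(report: dict, mapped_fields: dict) -> dict:
    for path, value in mapped_fields.items():
        node = report
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return report


apply_mapped_fields({}, {"datos_generales.fallecido.edad": 23,
                         "examen_interno_torax.corazon.peso": 320})
# -> {"datos_generales": {"fallecido": {"edad": 23}},
#     "examen_interno_torax": {"corazon": {"peso": 320}}}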

The 70% Result

In pilot testing with a medical professional:

| Metric | Manual | With CoronerIA | Improvement |
| --- | --- | --- | --- |
| Time per case | ~45 min | ~13 min | -71% |
| Typos/errors | Variable | Near-zero | |
| Field completeness | 70-80% | 95%+ | |

Challenge 3: Interactive SVG Anatomical Model

The Problem

Text-only documentation is error-prone. We needed visual feedback showing where on the body each finding was detected.

The Solution: Real-Time Organ Detection

// frontend/src/pages/Dictation.tsx

// Detect organs mentioned in transcription
const detectedOrgans = useMemo(() => {
    if (!transcript) return []
    const text = transcript.toLowerCase()
    const organs: string[] = []

    if (text.includes('encéfalo') || text.includes('cerebro')) 
        organs.push('encefalo')
    if (text.includes('pulmón derecho') || text.includes('pulmon derecho')) 
        organs.push('pulmon_derecho')
    if (text.includes('corazón') || text.includes('corazon')) 
        organs.push('corazon')
    if (text.includes('hígado') || text.includes('higado')) 
        organs.push('higado')
    if (text.includes('bazo')) 
        organs.push('bazo')

    return organs
}, [transcript])

SVG Highlighting with CSS Variables

// frontend/src/components/AnatomyModel.tsx

const getOrganStyle = (organ: string): React.CSSProperties => {
    const highlighted = highlightedOrgans.includes(organ)
    const hovered = hoveredOrgan === organ

    return {
        fill: highlighted
            ? 'var(--organ-highlighted)'  // Red glow
            : hovered
                ? 'var(--organ-hover)'    // Light blue
                : 'var(--organ-normal)',  // Gray
        stroke: highlighted ? 'var(--accent-danger)' : 'var(--border-secondary)',
        strokeWidth: highlighted ? 2 : 1,
        opacity: highlighted ? 1 : 0.7,
        cursor: 'pointer',
        transition: 'all 0.2s ease',
    }
}

Audio Processing Pipeline
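
(The original post embeds a diagram here.) In rough terms:

Microphone -> browser recording -> upload to FastAPI -> SpeechService.transcribe_file()
  -> GeminiService.extract_entities() -> mapped_fields -> protocol form + SVG organ highlights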


AI Provider Fallback Flow
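
(The original post embeds a diagram here.) In code terms, the Azure -> Gemini -> local Whisper chain described in Challenge 1 could look like this hypothetical method on SpeechService:

    async def transcribe_with_fallback(self, audio_path: str) -> str:
        """Try cloud providers in order; local Whisper is the last resort."""
        if self._mode == SpeechMode.AZURE:
            try:
                return await self._transcribe_azure(audio_path)
            except Exception as exc:  # network error, quota, expired key...
                logger.warning(f"Azure failed, falling back to Gemini: {exc}")

        if self._gemini_service is not None:
            try:
                return await self._gemini_service.transcribe_audio(audio_path)
            except Exception as exc:
                logger.warning(f"Gemini failed, falling back to Whisper: {exc}")

        # Local Whisper needs no network connection or API key
        return await self._transcribe_whisper(audio_path)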


Lessons Learned

1. Build AI-Agnostic from Day One

Don't hard-code your AI provider. We designed for Azure but developed with Gemini (free). Switching is one config change:

# Current: Gemini (free for development)
GEMINI_API_KEY=AIza...

# Future: Azure (production)
# AZURE_SPEECH_KEY=xxx
# AZURE_OPENAI_KEY=xxx
# AZURE_OPENAI_ENDPOINT=https://xxx.openai.azure.com/

2. Supported Providers (Tested)

| Provider | Speech-to-Text | NER/Extraction | Status |
| --- | --- | --- | --- |
| Google Gemini | ✅ Gemini 2.0 Flash | ✅ Gemini 2.0 Flash | Currently using |
| Azure AI | ✅ Azure Speech | ✅ Azure OpenAI (GPT-4) | Ready for production |
| OpenAI | ✅ Whisper API | ✅ GPT-4o | Compatible |
| Local | ✅ faster-whisper | ✅ Regex fallback (sketched below) | Offline mode |
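
In the fully offline mode, the regex fallback replaces the LLM extraction step. A minimal sketch of what such a fallback might look like (hypothetical patterns; the project's actual rules are not shown):

import re

WEIGHT_RE = re.compile(r"(coraz[oó]n|h[ií]gado|bazo)[^\d]*(\d+)\s*gramos", re.IGNORECASE)
AGE_RE = re.compile(r"(\d+)\s*años", re.IGNORECASE)


def extract_entities_offline(text: str) -> dict:
    """Pull a few structured values out of Spanish dictation with no AI provider."""
    mapped: dict = {}
    if match := AGE_RE.search(text):
        mapped["datos_generales.fallecido.edad"] = int(match.group(1))
    for organ, grams in WEIGHT_RE.findall(text):
        if "coraz" in organ.lower():
            mapped["examen_interno_torax.corazon.peso"] = int(grams)
    return {"entities": [], "mapped_fields": mapped}


extract_entities_offline("Corazón con peso de 320 gramos. Varón de 32 años.")
# -> {"entities": [], "mapped_fields": {"datos_generales.fallecido.edad": 32,
#                                       "examen_interno_torax.corazon.peso": 320}}

It covers far fewer fields than Gemini or GPT-4, but it keeps the pipeline usable with no connectivity at all.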

3. Start Offline-First

It's infinitely easier to add cloud features to an offline-capable app than to retrofit offline support to a cloud-first app.

4. Validate with Real Users Early

The 70% time reduction came from a real pilot test with a medical professional, not assumptions. This number is defensible in any interview.


Tech Stack Summary

| Layer | Technology | LOC |
| --- | --- | --- |
| Backend | Python, FastAPI, SQLite | 2,240 |
| Frontend | React, TypeScript, Zustand | 4,191 |
| AI | Gemini 2.0, Azure Speech, Whisper | - |
| DevOps | Docker, docker-compose | - |
| Total | | ~6,400 |

What's Next?

CoronerIA was submitted to Microsoft Imagine Cup 2026. Whether we advance or not, the project will be open-sourced to help forensic teams globally.

If you're building medical software with AI, I'd love to connect.

GitHub: CoronerIA Repository


Tags: #ai #python #react #fastapi #opensource #healthtech #microsoftimaginecup
