Building provider-independent AI software: From Azure to Gemini to Local Whisper with zero code changes
The Problem: Latin America's Forensic Crisis
Latin America faces a silent humanitarian crisis. According to investigative journalism and government reports:
- 52,000+ unidentified bodies in Mexico alone (2006-2023)
- A shortage of 15,000 forensic specialists in Peru
- 700+ municipalities in Colombia without permanent forensic coverage
Medical examiners spend hours on manual documentation when they should be investigating. The administrative overhead creates "administrative disappearances" — bodies that enter the system but are never matched with missing persons reports.
I built CoronerIA to solve this. Here's how.
The Key Design Decision: AI-Agnostic Architecture
Before diving into features, let me explain the most important architectural decision: the system is completely AI-provider independent.
Why This Matters
| Provider | Pros | Cons |
|---|---|---|
| Azure AI Speech | Best accuracy, enterprise support | Paid, requires stable internet |
| Google Gemini | Free tier, multimodal capabilities | Rate limits on free tier |
| OpenAI Whisper | Open source, runs locally | Requires GPU, slower |
| AWS Transcribe | Good for AWS shops | Paid, another vendor lock-in |
We designed the system to support all of them with a single environment-variable switch. We currently develop on Gemini (free tier); moving to Azure for production is just a matter of swapping which keys are set:
```bash
# Development (free)
GEMINI_API_KEY=your_key_here

# Production (enterprise)
AZURE_SPEECH_KEY=your_azure_key
AZURE_OPENAI_KEY=your_openai_key
```
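These variables are read by a small settings object. Here is a minimal sketch of what that might look like with pydantic-settings; the names `GEMINI_API_KEY`, `AZURE_SPEECH_KEY`, and `get_effective_mode()` match what the speech service below expects, everything else is illustrative:

```python
# backend/core/config.py -- illustrative sketch, not the actual file
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Whichever credentials are set decide the active provider
    GEMINI_API_KEY: str = ""
    AZURE_SPEECH_KEY: str = ""
    AZURE_OPENAI_KEY: str = ""
    SPEECH_MODE: str = "auto"  # "auto", "azure", "gemini", or "edge"

    def get_effective_mode(self) -> str:
        """Return the explicitly requested mode, or let the service decide."""
        return self.SPEECH_MODE

settings = Settings()  # values are read from environment variables
```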
Architecture Overview
In one sentence: a FastAPI backend exposes a provider-agnostic speech service (Strategy Pattern), a React/TypeScript frontend renders the live transcript and an interactive SVG anatomy model, and every AI call goes through a single interface so providers can be swapped by configuration alone.
Challenge 1: Provider-Agnostic AI with Graceful Fallback
The Problem
Different deployment scenarios need different AI providers:
- Development: Free tier (Gemini, local Whisper)
- Staging: Low-cost cloud (Gemini, OpenAI)
- Production: Enterprise-grade (Azure AI, AWS Transcribe)
- Offline/Rural: Local models only (Whisper)
We needed a single codebase that works with any provider via configuration.
The Solution: Strategy Pattern
```python
# backend/services/speech_service.py
import logging
from enum import Enum

from backend.core.config import settings  # project config (path illustrative)
from backend.services.gemini_service import GeminiService

logger = logging.getLogger(__name__)


class SpeechMode(Enum):
    AZURE = "azure"
    EDGE = "edge"      # local Whisper
    GEMINI = "gemini"


class SpeechService:
    """Unified Speech-to-Text service with Strategy Pattern."""

    def __init__(self):
        self._mode = self._determine_mode()
        self._azure_recognizer = None
        self._whisper_model = None
        self._gemini_service = None
        if self._mode == SpeechMode.GEMINI:
            self._gemini_service = GeminiService()
        logger.info(f"SpeechService initialized in mode: {self._mode.value}")

    def _determine_mode(self) -> SpeechMode:
        """Determines mode based on config and availability.

        Priority: Gemini > Azure > local Whisper.
        """
        effective = settings.get_effective_mode()
        if settings.GEMINI_API_KEY:
            return SpeechMode.GEMINI
        if effective == "azure" and settings.AZURE_SPEECH_KEY:
            return SpeechMode.AZURE
        return SpeechMode.EDGE

    async def transcribe_file(self, audio_path: str) -> str:
        """Transcribes an audio file using the selected strategy."""
        if self._mode == SpeechMode.AZURE:
            return await self._transcribe_azure(audio_path)
        if self._mode == SpeechMode.GEMINI:
            return await self._gemini_service.transcribe_audio(audio_path)
        return await self._transcribe_whisper(audio_path)
```
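Call sites never know which provider is active. A hedged usage sketch (the endpoint context and file name are hypothetical):

```python
# Inside an async FastAPI endpoint, for example:
service = SpeechService()
transcript = await service.transcribe_file("uploads/autopsy_001.wav")
```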
Why This Matters
| Benefit | Description |
|---|---|
| Zero downtime | If the primary provider fails, the next one takes over; local Whisper is the last resort. |
| Cost optimization | Local Whisper is free but slower; cloud providers are faster but metered. |
| Easy to extend | Adding a new provider = one new method + one enum value. |
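For example, wiring in AWS Transcribe would look roughly like this; `_transcribe_aws` is hypothetical and not in the current codebase:

```python
class SpeechMode(Enum):
    AZURE = "azure"
    EDGE = "edge"
    GEMINI = "gemini"
    AWS = "aws"  # the one new enum value

# ...plus one new method on SpeechService:
async def _transcribe_aws(self, audio_path: str) -> str:
    """Hypothetical AWS Transcribe strategy (sketch only)."""
    raise NotImplementedError
```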
Challenge 2: Structured Output from Unstructured Speech
The Problem
Medical examiners dictate freely:
"The victim Juan Pérez García, male, 32 years old, presents a contusion in the thoracic region. Heart weight: 320 grams, congestive appearance..."
We needed to map this to 13 structured protocol sections with 100% JSON consistency.
The Solution: Schema-Enforced Prompting
````python
# backend/services/gemini_service.py (excerpt: method of GeminiService)
import json

async def extract_entities(self, text: str) -> dict:
    """Extract medico-legal entities using Gemini with structured output."""
    prompt = f"""
    Act as a Peruvian forensic expert from IMLCF. Analyze this autopsy text
    and extract structured information.

    DICTATION TEXT:
    "{text}"

    INSTRUCTIONS:
    1. Extract "entities": list of objects with "text" and "type"
       (ORGAN, WEIGHT, MEASUREMENT, LESION_TYPE, CONDITION, PERSON, AGE, SEX)
    2. Extract "mapped_fields": dictionary with field paths and values

    FIELD STRUCTURE (use exact paths):
    - "datos_generales.fallecido.nombre": deceased name
    - "datos_generales.fallecido.edad": age (number)
    - "datos_generales.fallecido.sexo": "M" or "F"
    - "examen_interno_torax.corazon.peso": weight in grams (number)
    - "examen_interno_torax.corazon.descripcion": description
    - "causas_muerte.diagnostico_presuntivo.causa_final.texto": final cause

    EXAMPLE response:
    {{
        "entities": [
            {{"text": "Juan Rodríguez", "type": "PERSON"}},
            {{"text": "23 años", "type": "AGE"}}
        ],
        "mapped_fields": {{
            "datos_generales.fallecido.nombre": "Juan",
            "datos_generales.fallecido.edad": 23,
            "examen_interno_torax.corazon.peso": 320
        }}
    }}

    Respond ONLY with valid JSON, no markdown.
    """
    response = self.model.generate_content(prompt)
    # Strip markdown code fences in case the model adds them anyway
    clean_text = response.text.replace("```json", "").replace("```", "").strip()
    return json.loads(clean_text)
````
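The `mapped_fields` keys are dot-paths into the nested protocol document. A minimal sketch of how they can be applied to a report dict (the helper name `apply_mapped_fields` is mine, not from the repo):

```python
def apply_mapped_fields(report: dict, mapped_fields: dict) -> dict:
    """Write {"a.b.c": value} pairs into a nested dict, creating levels as needed."""
    for path, value in mapped_fields.items():
        node = report
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return report

report = apply_mapped_fields({}, {
    "datos_generales.fallecido.edad": 23,
    "examen_interno_torax.corazon.peso": 320,
})
# {'datos_generales': {'fallecido': {'edad': 23}},
#  'examen_interno_torax': {'corazon': {'peso': 320}}}
```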
The 70% Result
In pilot testing with a medical professional:
| Metric | Manual | With CoronerIA | Improvement |
|---|---|---|---|
| Time per case | ~45 min | ~13 min | -71% |
| Typos/errors | Variable | Near-zero | ✓ |
| Field completeness | 70-80% | 95%+ | ✓ |
Challenge 3: Interactive SVG Anatomical Model
The Problem
Text-only documentation is error-prone. We needed visual feedback showing where on the body each finding was detected.
The Solution: Real-Time Organ Detection
```tsx
// frontend/src/pages/Dictation.tsx
// Detect organs mentioned in the transcription. Accented and unaccented
// spellings are both checked because ASR output does not always preserve diacritics.
const detectedOrgans = useMemo(() => {
  if (!transcript) return []
  const text = transcript.toLowerCase()
  const organs: string[] = []
  if (text.includes('encéfalo') || text.includes('cerebro'))
    organs.push('encefalo')
  if (text.includes('pulmón derecho') || text.includes('pulmon derecho'))
    organs.push('pulmon_derecho')
  if (text.includes('corazón') || text.includes('corazon'))
    organs.push('corazon')
  if (text.includes('hígado') || text.includes('higado'))
    organs.push('higado')
  if (text.includes('bazo'))
    organs.push('bazo')
  return organs
}, [transcript])
```
SVG Highlighting with CSS Variables
```tsx
// frontend/src/components/AnatomyModel.tsx
const getOrganStyle = (organ: string): React.CSSProperties => {
  const highlighted = highlightedOrgans.includes(organ)
  const hovered = hoveredOrgan === organ
  return {
    fill: highlighted
      ? 'var(--organ-highlighted)' // Red glow
      : hovered
        ? 'var(--organ-hover)' // Light blue
        : 'var(--organ-normal)', // Gray
    stroke: highlighted ? 'var(--accent-danger)' : 'var(--border-secondary)',
    strokeWidth: highlighted ? 2 : 1,
    opacity: highlighted ? 1 : 0.7,
    cursor: 'pointer',
    transition: 'all 0.2s ease',
  }
}
```
Audio Processing Pipeline
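Putting the pieces together, the pipeline reads as follows. This is a sketch that reuses the `apply_mapped_fields` helper above; the repo's actual wiring may differ:

```python
# End-to-end sketch: audio file -> transcript -> entities -> structured protocol
async def process_dictation(audio_path: str) -> dict:
    speech = SpeechService()                       # provider chosen from config
    transcript = await speech.transcribe_file(audio_path)
    extraction = await GeminiService().extract_entities(transcript)
    report = apply_mapped_fields({}, extraction["mapped_fields"])
    return {"transcript": transcript, "report": report}
```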
AI Provider Fallback Flow
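As promised in the "Zero downtime" row earlier, providers cascade on failure, in the same priority order the mode detection uses. A minimal sketch of that cascade; the repo's actual error handling may be more granular:

```python
async def transcribe_with_fallback(service: SpeechService, audio_path: str) -> str:
    """Try Gemini, then Azure, then local Whisper. Sketch only."""
    attempts = []
    if service._gemini_service is not None:
        attempts.append(("gemini", service._gemini_service.transcribe_audio))
    attempts.append(("azure", service._transcribe_azure))
    attempts.append(("edge", service._transcribe_whisper))  # offline last resort
    for name, transcribe in attempts:
        try:
            return await transcribe(audio_path)
        except Exception as exc:  # rate limit, outage, missing credentials
            logger.warning(f"{name} failed ({exc}); trying next provider")
    raise RuntimeError("All speech providers failed")
```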
Lessons Learned
1. Build AI-Agnostic from Day One
Don't hard-code your AI provider. We designed for Azure but developed with Gemini (free). Switching is one config change:
```bash
# Current: Gemini (free for development)
GEMINI_API_KEY=AIza...

# Future: Azure (production)
# AZURE_SPEECH_KEY=xxx
# AZURE_OPENAI_KEY=xxx
# AZURE_OPENAI_ENDPOINT=https://xxx.openai.azure.com/
```
2. Supported Providers (Tested)
| Provider | Speech-to-Text | NER/Extraction | Status |
|---|---|---|---|
| Google Gemini | ✅ Gemini 2.0 Flash | ✅ Gemini 2.0 Flash | Currently using |
| Azure AI | ✅ Azure Speech | ✅ Azure OpenAI (GPT-4) | Ready for production |
| OpenAI | ✅ Whisper API | ✅ GPT-4o | Compatible |
| Local | ✅ faster-whisper | ✅ Regex fallback | Offline mode |
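The "Regex fallback" in the Local row is the offline stand-in for LLM extraction. A toy sketch of what a rule for organ weights could look like; the pattern and function are illustrative, not the repo's actual rules:

```python
import re

# Match phrases like "corazón ... 320 gramos" and capture organ + grams
WEIGHT_PATTERN = re.compile(
    r"(coraz[oó]n|h[ií]gado|bazo|enc[eé]falo)\D{0,40}?(\d+)\s*(?:g\b|gramos)",
    re.IGNORECASE,
)

def extract_weights_offline(text: str) -> dict:
    """Regex-based fallback for when no AI provider is reachable (sketch)."""
    return {organ.lower(): int(grams) for organ, grams in WEIGHT_PATTERN.findall(text)}

print(extract_weights_offline("Peso del corazón: 320 gramos"))  # {'corazón': 320}
```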
3. Start Offline-First
It's infinitely easier to add cloud features to an offline-capable app than to retrofit offline support to a cloud-first app.
4. Validate with Real Users Early
The 70% time reduction came from a real pilot test with a medical professional, not assumptions. This number is defensible in any interview.
Tech Stack Summary
| Layer | Technology | LOC |
|---|---|---|
| Backend | Python, FastAPI, SQLite | 2,240 |
| Frontend | React, TypeScript, Zustand | 4,191 |
| AI | Gemini 2.0, Azure Speech, Whisper | - |
| DevOps | Docker, docker-compose | - |
| Total | | ~6,400 |
What's Next?
CoronerIA was submitted to Microsoft Imagine Cup 2026. Whether we advance or not, the project will be open-sourced to help forensic teams globally.
If you're building medical software with AI, I'd love to connect.
GitHub: CoronerIA Repository
Tags: #ai #python #react #fastapi #opensource #healthtech #microsoftimaginecup