Building an Offline-First AI Medical Triage App That Runs 100% On-Device

Moses Sunday — Mon, 18 May 2026 22:55:50 +0000

The Problem

Community health workers (CHWs) in remote areas face a brutal choice when they encounter a patient with a concerning wound or rash: send a photo to a distant doctor and wait days for a reply, or make a decision with limited training and no specialist backup.

1 billion people live in areas with fewer than 1 physician per 10,000. Smartphone penetration in these regions is surprisingly high (often >60%), but reliable internet is scarce. Cloud-dependent AI tools simply don't work here.

Trij solves this: a progressive web app that runs Google DeepMind's Gemma 4 entirely on-device, delivering AI-assisted triage in under 10 seconds — no internet required, no patient data ever leaves the phone.

Technical Architecture
The Inference Stack
The core challenge was running a meaningful LLM in the browser. We chose WebLLM with WebGPU as the primary path, with two fallbacks:

Auto-detect: WebGPU available? → WebLLM (Gemma 4 E2B, ~1.5B params, quantized)
Ollama running locally? → Ollama API
Neither? → Demo mode (mock data, no real model)
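
The fallback order above can be sketched as a small pure function. This is a minimal illustration, not Trij's actual code: the names `Backend`, `Capabilities`, and `chooseBackend` are hypothetical, and the two capability flags are assumed to come from probes done elsewhere (for example, checking for `navigator.gpu` and trying to reach a local Ollama server).

```typescript
// Hypothetical names; mirrors the three-step fallback described above.
type Backend = "webllm" | "ollama" | "demo";

interface Capabilities {
  webgpu: boolean; // assumed: set by a WebGPU probe such as checking navigator.gpu
  ollama: boolean; // assumed: set when a local Ollama server answered a probe request
}

// Prefer in-browser inference, then local Ollama, then demo mode with mock data.
function chooseBackend(caps: Capabilities): Backend {
  if (caps.webgpu) return "webllm";
  if (caps.ollama) return "ollama";
  return "demo";
}
```

Keeping the decision separate from the probes makes the ordering easy to unit test without a GPU or a running Ollama instance.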

WebLLM loads the Gemma 4 model into the browser's GPU via WebGPU. The first load takes ~30 seconds to download and compile ~1.5GB of model weights (cached for subsequent visits). Subsequent inferences complete in under 10 seconds on devices with 4GB+ RAM.

Offline-First Storage
All patient data is stored in IndexedDB via Dexie.js. The schema:

// Patients
interface Patient {
  id: string;
  chwUserId: string;
  identifier: string;
  ageYears?: number;
  sex: 'M' | 'F' | 'other';
  locationLat?: number;
  locationLng?: number;
  createdAt: string;
  updatedAt: string;
}

// Assessments
interface Assessment {
  id: string;
  patientId: string;
  images: string[]; // base64 locally, URLs when synced
  condition: string;
  confidence: number;
  urgency: 'green' | 'yellow' | 'red';
  possibleConditions: Array<{ name: string; probability: number }>;
  recommendation: string;
  referralStatus: 'none' | 'pending' | 'active' | 'resolved';
  language: string;
  createdAt: string;
}

A background sync engine processes a queue when connectivity returns, uploading records to Supabase with last-write-wins conflict resolution.
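
As a rough sketch of what last-write-wins means here, the function below merges local and remote copies of records by comparing an `updatedAt` timestamp like the one on `Patient`. The names `SyncRecord`, `resolveLastWriteWins`, and `mergeRecords` are hypothetical, and the tie-breaking rule (remote wins on equal timestamps) is an assumption, not something the real sync engine is known to do.

```typescript
// Hypothetical shape: any record with an id and an ISO 8601 updatedAt timestamp.
interface SyncRecord {
  id: string;
  updatedAt: string;
}

// Keep whichever copy was written last; on a tie, keep the remote copy (assumption).
function resolveLastWriteWins<T extends SyncRecord>(local: T, remote: T): T {
  return Date.parse(local.updatedAt) > Date.parse(remote.updatedAt) ? local : remote;
}

// Merge two record sets by id, resolving conflicts with last-write-wins.
function mergeRecords<T extends SyncRecord>(local: T[], remote: T[]): T[] {
  const merged = new Map<string, T>();
  for (const r of remote) merged.set(r.id, r);
  for (const l of local) {
    const existing = merged.get(l.id);
    merged.set(l.id, existing ? resolveLastWriteWins(l, existing) : l);
  }
  return Array.from(merged.values());
}
```

Last-write-wins is simple, but it trusts device clocks, so an edit made on a phone with a wrong clock can overwrite a newer one.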

Voice-Guided Assessments
Using the Web Speech API for both synthesis and recognition, the app supports 7 languages. The voice flow is a dynamic conversation tree:

"Who is the patient? Say the ID number."
Patient responds → parsed for identifier
"Frame the affected area in the camera."
Photo captured → analyzed
"The assessment shows [condition] with [confidence]% confidence."
Dynamic follow-ups based on the result

The conversation state is persistable: if the CHW is interrupted, they can resume where they left off.
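
One way to make such a flow resumable is to keep the whole conversation in a plain serializable object and advance it with a pure function. The sketch below assumes this design; the step names, the `VoiceEvent` shape, and the digits-only identifier parsing are all hypothetical and not taken from Trij's code.

```typescript
type Step = "askPatientId" | "framePhoto" | "reportResult" | "followUp";

interface ConversationState {
  step: Step;
  patientIdentifier?: string;
  condition?: string;
  confidence?: number;
}

type VoiceEvent =
  | { kind: "heard"; transcript: string }
  | { kind: "photoAnalyzed"; condition: string; confidence: number }
  | { kind: "acknowledged" };

// Assumption: identifiers are numeric, so keep only the digits that were heard.
function parseIdentifier(transcript: string): string | undefined {
  const digits = transcript.replace(/\D/g, "");
  return digits.length > 0 ? digits : undefined;
}

// Pure transition function: events that do not fit the current step are ignored.
function advance(state: ConversationState, event: VoiceEvent): ConversationState {
  switch (state.step) {
    case "askPatientId": {
      if (event.kind !== "heard") return state;
      const id = parseIdentifier(event.transcript);
      return id ? { ...state, step: "framePhoto", patientIdentifier: id } : state;
    }
    case "framePhoto":
      return event.kind === "photoAnalyzed"
        ? { ...state, step: "reportResult", condition: event.condition, confidence: event.confidence }
        : state;
    case "reportResult":
      return event.kind === "acknowledged" ? { ...state, step: "followUp" } : state;
    default:
      return state;
  }
}

// Because the state is plain data, persisting and resuming is a JSON round trip.
const saveConversation = (s: ConversationState): string => JSON.stringify(s);
const restoreConversation = (raw: string): ConversationState =>
  JSON.parse(raw) as ConversationState;
```

The saved string could be written to IndexedDB after every transition, so an interrupted session resumes at the last completed step.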

Medical Document Analysis
The same Gemma 4 model handles document analysis through carefully crafted prompts. The system prompt instructs the model to:

Extract structured findings from lab reports
Highlight abnormal values
Generate plain-language explanations
Classify the document type (lab report, prescription, referral letter)
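
To illustrate how those four instructions might translate into a prompt plus a defensive parser, here is a minimal sketch. The prompt wording, the `DocumentAnalysis` shape, and `parseAnalysis` are assumptions made for this example; the actual Trij prompts are not shown in this post.

```typescript
type DocumentType = "lab_report" | "prescription" | "referral_letter";

// Hypothetical output shape the prompt asks the model to produce.
interface DocumentAnalysis {
  documentType: DocumentType;
  findings: Array<{ name: string; value: string; abnormal: boolean }>;
  explanation: string;
}

// Illustrative prompt text covering the four tasks listed above.
const DOCUMENT_SYSTEM_PROMPT = [
  "You analyze medical documents for community health workers.",
  "1. Classify the document as lab_report, prescription, or referral_letter.",
  "2. Extract structured findings (name and value).",
  "3. Mark each finding as abnormal: true or false.",
  "4. Explain the document in plain language.",
  'Reply with JSON only: {"documentType": ..., "findings": [...], "explanation": ...}',
].join("\n");

// Models sometimes wrap JSON in extra prose, so parse only the outermost braces
// and reject anything that does not match the expected shape.
function parseAnalysis(raw: string): DocumentAnalysis | undefined {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;
  try {
    const parsed = JSON.parse(raw.slice(start, end + 1));
    const validTypes = ["lab_report", "prescription", "referral_letter"];
    if (validTypes.indexOf(parsed.documentType) === -1) return undefined;
    if (!Array.isArray(parsed.findings)) return undefined;
    if (typeof parsed.explanation !== "string") return undefined;
    return parsed as DocumentAnalysis;
  } catch {
    return undefined;
  }
}
```

Returning `undefined` on malformed output lets the UI fall back to showing the raw text rather than a half-parsed result.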

Key Challenges & Solutions
Challenge 1: Model Loading Time
The 1.5GB model download on first visit is painful. We show a progress bar with percentage and estimated time, and cache aggressively via the service worker. On subsequent visits, the model loads from cache in ~5 seconds.
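
The percentage-plus-ETA label can be derived from two numbers: the fraction completed and the time elapsed so far. The sketch below assumes the model loader reports both; `estimateRemainingSeconds` and `formatProgress` are hypothetical helper names, and the estimate assumes a roughly constant download rate.

```typescript
// Linear extrapolation: if `progress` of the work took `elapsedSeconds`,
// the remaining (1 - progress) should take proportionally as long.
function estimateRemainingSeconds(progress: number, elapsedSeconds: number): number | undefined {
  if (progress <= 0 || progress > 1 || elapsedSeconds < 0) return undefined;
  return (elapsedSeconds * (1 - progress)) / progress;
}

// Build the label shown next to the progress bar.
function formatProgress(progress: number, elapsedSeconds: number): string {
  const percent = Math.round(Math.min(Math.max(progress, 0), 1) * 100);
  const eta = estimateRemainingSeconds(progress, elapsedSeconds);
  return eta === undefined ? `${percent}%` : `${percent}% (about ${Math.ceil(eta)}s left)`;
}
```

For example, `formatProgress(0.25, 10)` yields `25% (about 30s left)`, and no estimate is shown until some progress has been reported.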

Challenge 2: WebGPU Availability
WebGPU support still varies by browser and platform, and we currently rely on Chrome on desktop and Android. We detect availability upfront and, if WebGPU is missing, show a clear message that guides users to Ollama or demo mode.

Challenge 3: Voice Recognition Accuracy
Speech recognition in rural environments (noisy, varying accents) is unreliable. We implemented a hybrid approach: voice input plus a text fallback. Users can speak or type answers.
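
A hybrid input like this needs a rule for which answer to trust. The sketch below shows one possible rule, under stated assumptions: typed text always wins, and a spoken answer is accepted only above a confidence threshold. `SpeechGuess` mirrors the `transcript` and `confidence` fields of the Web Speech API's `SpeechRecognitionAlternative`; `resolveAnswer` and the 0.6 default threshold are hypothetical.

```typescript
// Mirrors the transcript/confidence pair of SpeechRecognitionAlternative.
interface SpeechGuess {
  transcript: string;
  confidence: number; // 0 to 1
}

type Answer = { value: string; source: "typed" | "voice" };

// Typed input takes priority; low-confidence speech is rejected so the UI can re-ask.
function resolveAnswer(
  speech: SpeechGuess | undefined,
  typed: string,
  minConfidence = 0.6 // hypothetical threshold, would need tuning in the field
): Answer | undefined {
  const typedValue = typed.trim();
  if (typedValue.length > 0) return { value: typedValue, source: "typed" };
  if (speech && speech.confidence >= minConfidence && speech.transcript.trim().length > 0) {
    return { value: speech.transcript.trim(), source: "voice" };
  }
  return undefined;
}
```

When the function returns `undefined`, the app can repeat the question or prompt the user to type instead.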

Challenge 4: Hallucination Risk
Like all LLMs, Gemma 4 can hallucinate. Every assessment includes:

A confidence score (0-100%)
Differential diagnoses with probability breakdown
An auto-suggestion to refer when confidence < 70%
A prominent medical disclaimer
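
The referral rule in the list above reduces to a single comparison. This sketch assumes `confidence` is stored on the same 0-100 scale the score is displayed in; `TriageResult`, `REFERRAL_CONFIDENCE_THRESHOLD`, and `shouldSuggestReferral` are hypothetical names.

```typescript
// Minimal subset of an assessment needed for the rule (hypothetical type).
interface TriageResult {
  condition: string;
  confidence: number; // assumed to be on a 0-100 scale
}

const REFERRAL_CONFIDENCE_THRESHOLD = 70;

// Suggest a referral whenever the model is less than 70% confident.
function shouldSuggestReferral(result: TriageResult): boolean {
  return result.confidence < REFERRAL_CONFIDENCE_THRESHOLD;
}
```

Keeping the threshold in a named constant makes it easy to adjust if clinical validation suggests a different cut-off.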
The Stack

Frontend: React 19, TanStack Start (SSR), Vite
AI: WebLLM + WebGPU (Gemma 4 E2B), Ollama fallback
Styling: Tailwind CSS v4, shadcn/ui
Offline Storage: Dexie.js (IndexedDB)
Voice: Web Speech API
Backend: Supabase (Auth, PostgreSQL, Storage, RLS)
PWA: vite-plugin-pwa (installable on Android/iOS)
PDF Generation: jsPDF + qrcode

Try It
The app is live and free to use:

🔗 https://trij.vercel.app

No account needed — you can use demo mode immediately.

Open Source
Trij is Apache 2.0. Contributions welcome:

🐙 https://github.com/Mosss-OS/trij

We need help with:

Additional language packs
UI/UX polish
Real-world testing
Clinical validation
Performance optimization

What's Next

Offline map tiles for the supervisor view
PWA background sync for iOS
More granular condition classification
Integration with common health record systems

📧 triij.app@gmail.com 🐦 https://x.com/Trij_app

Built for the Gemma 4 Good Hackathon (Kaggle x Google DeepMind). Track: Health & Sciences / Global Resilience.

DEV Community: Moses Sunday