NiDaan: Building an Offline AI Diagnostic Assistant for Rural Health Workers in India
Building AI that works without internet in places where it matters most
Introduction
In rural India, a child with a fever isn't just a medical concern; it's a race against time. ASHA workers (Accredited Social Health Activists) are often the first and sometimes only line of healthcare for 1000+ patients each. They carry a limited medicine kit, have basic training, and have no access to instant medical consultation.
I'm Priyanshu, a final-year computer science student from West Bengal. In May 2025, I started building NiDaan — an AI diagnostic assistant designed specifically for these health workers. No internet required. No expensive infrastructure. Just a laptop and a phone.
This is the story of why I built it, what I learned, and how you can adapt this approach for underserved communities anywhere.
The Problem: Healthcare in Absence
Why This Matters
According to India's health ministry data:
- 70% of Indians live in rural areas
- 1 ASHA worker serves 1000+ people
- Average PHC (Primary Health Centre) is 10-15km away
- Most areas have unreliable internet connectivity
ASHA workers are trained and dedicated, but isolated from medical expertise. When a mother brings in a child with symptoms, the ASHA worker must decide: home treatment or PHC referral?
Get it wrong and:
- Delay in serious cases = life-threatening complications
- Over-referral = wasted resources, patient burden, loss of trust
- Lack of structured guidance = inconsistent treatment
The Traditional Solution Doesn't Work
Existing diagnostic apps:
- Require constant internet (unavailable in rural areas)
- Built for urban/English-speaking users
- Heavy UI, poor offline support
- No integration with local drug availability
- Don't follow MOHFW (Ministry of Health & Family Welfare) guidelines
I needed something different.
The Solution: NiDaan
What is NiDaan?
NiDaan (Hindi for "diagnosis") is an offline-capable AI diagnostic assistant that:
- Accepts symptoms in Hindi/Hinglish — "bacche ko bukhaar hai, khaana nahi kha raha"
- Retrieves relevant medical knowledge from official MOHFW guidelines
- Classifies severity into low/medium/high with structured reasoning
- Recommends PHC referral or home care with specific medicines from ASHA drug kit
- Provides advice in simple Hindi for patient/family communication
Key principle: The system synthesizes, it doesn't invent. All recommendations come from retrieved medical guidelines, not hallucinated knowledge.
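The output contract in the list above can be sketched as a small validated structure. The project uses Pydantic for this; the version below sticks to the standard library so it runs without extra installs. Every field name (`severity`, `refer_to_phc`, `medicines`, `advice_hindi`, `sources`) and both validation rules are assumptions for illustration, not the actual NiDaan schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Assessment:
    """Hypothetical shape of one answer (field names are assumed)."""
    severity: Severity
    refer_to_phc: bool
    medicines: list[str] = field(default_factory=list)
    advice_hindi: str = ""
    sources: list[str] = field(default_factory=list)  # retrieved guideline chunks

    def __post_init__(self) -> None:
        # Coerce the raw string from the LLM into the enum (ValueError if unknown).
        self.severity = Severity(self.severity)
        # "Synthesize, don't invent": refuse answers that cite no retrieved source.
        if not self.sources:
            raise ValueError("assessment must cite at least one guideline chunk")
        # Assumed safety rule: a high-severity case is always referred.
        if self.severity is Severity.HIGH and not self.refer_to_phc:
            raise ValueError("high severity requires PHC referral")


def parse_assessment(raw: dict) -> Assessment:
    """Build an Assessment from the LLM's parsed JSON, rejecting bad output."""
    return Assessment(**raw)
```

A check like this only guarantees the shape of the answer and the presence of a source. It cannot tell whether the cited guideline actually supports the advice.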
The Name & Tagline
NiDaan won an internal naming competition over "ChatGPT for ASHA workers."
Tagline: "Sahi waqt par, sahi salah" — Right advice, at the right time.
Architecture: Local Network, Zero Internet
┌─────────────────────────────────────────────────────────┐
│ TIER 1: Patient's Phone (Streamlit Web Browser) │
│ - Hindi symptom input │
│ Connects via local WiFi (no internet) │
└──────────────────────┬──────────────────────────────────┘
│ (WiFi hotspot)
│
┌──────────────────────▼──────────────────────────────────┐
│ TIER 2: PHC Laptop (Backend Services) │
│ - FastAPI server (port 8000) │
│ - LangChain RAG pipeline │
│ - ChromaDB vector store │
│ Offline, no internet needed │
└──────────────────────┬──────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────┐
│ TIER 3: Same Laptop (LLM Runtime) │
│ - Ollama + DeepSeek R1:7b (for offline demo) │
│ - OR Groq/NVIDIA NIM API (for development) │
└─────────────────────────────────────────────────────────┘
Why this architecture?
- On-device LLMs on Android were ruled out by RAM: phones have 2-4GB, while the available laptop has 16GB
- Web-based frontend works on any phone/tablet
- Central backend handles heavy lifting
- Zero internet in production (uses Ollama), flexible for testing (Groq/NIM)
Tech Stack
Frontend: Streamlit (pure Python)
Backend: FastAPI + uvicorn
AI/RAG: LangChain
Vector DB: ChromaDB (persistent, local)
Embeddings: sentence-transformers/all-MiniLM-L6-v2 (80MB, offline)
LLM Options:
- Groq (testing): llama-3.1-8b-instant (12 sec/response)
- NVIDIA NIM (quality): Mistral Large 3 (45-70 sec/response)
- Ollama (offline): DeepSeek R1:7b (2-5 min/response, shows reasoning)
PHC Storage: SQLite (structured lookup, haversine distance)
Data Format: Pydantic models for strict output validation
Key decision: Swappable LLM infrastructure. Changing 1 line switches between Groq → NIM → Ollama.
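A minimal sketch of what a one-line switch can look like. Only the `MODE` name and its three values come from this project; `LLM_FACTORIES` and `build_llm` are hypothetical names, and the factories return plain label strings (not real model IDs or SDK clients) so the example has no dependencies.

```python
from typing import Callable

MODE = "groq"  # the single line to change: "groq", "nim", or "deepseek"


# In a real chain each factory would construct a chat-model client.
# Here each returns a descriptive label so the sketch stays dependency-free.
def _groq() -> str:
    return "groq:llama-3.1-8b-instant"


def _nim() -> str:
    return "nim:mistral-large-3"


def _deepseek() -> str:
    return "ollama:deepseek-r1:7b"


LLM_FACTORIES: dict[str, Callable[[], str]] = {
    "groq": _groq,
    "nim": _nim,
    "deepseek": _deepseek,
}


def build_llm(mode: str = MODE) -> str:
    """Return the backend for `mode`; fail loudly on an unknown value."""
    try:
        return LLM_FACTORIES[mode]()
    except KeyError:
        valid = ", ".join(sorted(LLM_FACTORIES))
        raise ValueError(f"unknown MODE {mode!r}; expected one of: {valid}") from None
```

Keeping the lookup in one dictionary means the rest of the chain never needs to know which provider is active, and a typo in `MODE` fails at startup rather than mid-consultation.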
Data Collection & Knowledge Base
Medical Documents Ingested
| Document | Pages | Clinical Focus |
|---|---|---|
| ASHA Module 6 & 7 | 165 | Symptom recognition, danger signs |
| F-IMNCI Chart Booklet | 39 | Pediatric severity classification |
| Standard Treatment Guidelines | 431 | Medication protocols, dosages |
| NLEM 2022 | 135 | Essential medicines list |
| NVBDCP Guidelines | 3 | Malaria/vector-borne diseases |
| Total | 773 pages | ~1825 chunks |
How We Built the Knowledge Base
- Downloaded PDFs from the official MOHFW website
- Parsed with PyMuPDF: extracted text, maintained metadata
- Chunked at 1000 characters per chunk with a 200-character overlap
- Embedded with all-MiniLM-L6-v2: 80MB, English-optimized but workable for Hindi/Hinglish queries (see Challenge 6)
- Stored in ChromaDB: persistent vector database on disk
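To make the chunking parameters concrete, here is a minimal fixed-window splitter with the same 1000/200 settings. It is a plain-Python stand-in: the project's actual splitter is not shown here and may also break on paragraph or sentence boundaries. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` chars, each sharing `overlap` chars
    with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap exists so that a sentence cut at a window boundary still appears whole in one of the two neighbouring chunks, at the cost of storing about 25% more text.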
PHC Directory System
Built a district-level PHC database with 19 verified Primary Health Centers across 5 West Bengal districts:
{
"id": "WB-PWB-001",
"name": "Andal PHC",
"block": "Andal",
"services": ["OPD", "Maternal & Child Health", "Malaria Testing"],
"latitude": 23.5937,
"longitude": 87.1824,
"open_24hr": false,
"doctor_timing": "9AM-4PM Mon-Sat"
}
Each record stores coordinates so that the haversine distance formula can drive proximity-based referral. That ranking is not implemented in V1, but the architecture is ready for it in Phase 2.
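For reference, a minimal sketch of the haversine calculation over records shaped like the one above. `haversine_km` and `nearest_phcs` are hypothetical helper names, and the second facility in the usage example is made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_phcs(lat: float, lon: float, phcs: list[dict], limit: int = 3) -> list[dict]:
    """Return up to `limit` records sorted by distance from (lat, lon)."""
    return sorted(
        phcs,
        key=lambda p: haversine_km(lat, lon, p["latitude"], p["longitude"]),
    )[:limit]


phcs = [
    {"name": "Hypothetical PHC", "latitude": 23.70, "longitude": 87.30},  # made-up
    {"name": "Andal PHC", "latitude": 23.5937, "longitude": 87.1824},  # record above
]
print([p["name"] for p in nearest_phcs(23.60, 87.19, phcs)])
```

With only 19 facilities, computing the distance to every record in Python is cheap, so no spatial index is needed at this scale.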
Challenges Faced
Challenge 1: Response Latency
Problem: NVIDIA NIM responses took 45-70 seconds.
Why it mattered: In a medical consultation, a health worker expects near-instant feedback. Long waits erode trust.
Solutions tried:
- Switched to Groq (llama-3.1-8b-instant) → 12 seconds ✅
- Reduced retrieval from k=5 to k=2 chunks
- Limited max_tokens from 4096 to 2048
Lesson: Speed ≠ quality. Groq's smaller model is fast but sometimes less clinically precise. NIM is better but slow. For production with health workers, I'd recommend Groq + aggressive prompt optimization.
Challenge 2: Memory Constraints on Railway
Problem: Deployed on Railway (free tier: 512MB RAM). App crashed with "out of memory."
Root cause:
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (500MB alone)
- ChromaDB (~50MB)
- FastAPI + LangChain (~150MB)
- Total: ~700MB > 512MB limit
Solutions:
- Switched embedding model to all-MiniLM-L6-v2 (80MB) ✅
- Rebuilt ChromaDB with lightweight embeddings
- Committed the ChromaDB directory to GitHub, because Railway's ephemeral filesystem does not persist it across deploys
- Reduced retrieval from k=5 to k=3 chunks
Trade-off: Lost Hinglish-specific embedding quality but gained Railway compatibility.
Lesson: In constrained environments, a small model that fits in memory beats a larger one that crashes. English embeddings work fine for medical terminology, which is largely shared across languages.
Challenge 3: Image Assets Broken in Deployment
Problem: The logo in the React frontend loaded locally (/src/assets/Nidaan.png) but broke on deployment.
Why: Vite's dev server serves /src/ directly. The production build serves only bundled assets and the contents of public/.
Solution: Moved assets to the public/ folder and changed the path to /Nidaan.png.
Lesson: Always test the production build locally, not just the dev server. Static file serving is environment-specific.
Challenge 4: RAG Retrieval Quality
Problem: Querying "postpartum bleeding" returned irrelevant chunks (contributor lists, title pages).
Why: PDF front matter wasn't filtered, and the chunking strategy was naive.
Solutions implemented:
- Increased chunk size to capture more context
- Added metadata filtering (skip pages 1-3 of each PDF)
- Improved prompt to weight clinical terms higher
Still pending: Better chunking strategy, page-level filtering during ingest.
Lesson: In my experience, RAG quality depends far more on retrieval than on the LLM (roughly 70/30). Garbage in = garbage out, no matter how good the LLM.
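A minimal sketch of filtering front matter at ingest time, assuming each parsed page is a dict with `source`, `page` (1-based), and `text` keys (a hypothetical shape). One caveat worth encoding: the knowledge base includes a 3-page document (NVBDCP Guidelines), and a blanket "skip pages 1-3" rule would drop it entirely, so this version leaves such short documents untouched.

```python
from collections import defaultdict


def drop_front_matter(pages: list[dict], skip_pages: int = 3) -> list[dict]:
    """Drop the first `skip_pages` pages of each document, except for
    documents too short to survive the cut."""
    page_counts: dict[str, int] = defaultdict(int)
    for p in pages:
        page_counts[p["source"]] += 1

    kept = []
    for p in pages:
        short_doc = page_counts[p["source"]] <= skip_pages
        if short_doc or p["page"] > skip_pages:
            kept.append(p)
    return kept
```

Filtering before embedding keeps contributor lists and title pages out of the vector store altogether, instead of relying on every query to exclude them.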
Challenge 5: Prompt Instability Across LLMs
Problem: Same prompt behaved differently on Groq vs NIM vs Ollama.
- Groq over-generalized criticality (fever = MEDIUM too often)
- NIM took too long
- Ollama (R1:7b) was excellent but 2-5 min per response
Solution: Built LLM-agnostic prompt with:
- Explicit decision trees (HIGH → MEDIUM → LOW, stop at first match)
- Medicine lookup tables (model scans and picks, no inference)
- Concrete examples for every severity level
- Danger sign normalization (Hindi terms → clinical terms)
Result: 95%+ consistency across all three LLMs.
Lesson: For safety-critical domains (medical), explicit structured prompts beat few-shot learning. Give the model rules, not vibes.
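A minimal sketch of how such a prompt can be assembled from tables instead of free-form text. The term map and the severity rules below are placeholders for illustration only: they are not taken from the NiDaan prompt or from MOHFW guidelines and are not clinical guidance. `HINDI_TO_CLINICAL`, `SEVERITY_RULES`, `normalize_signs`, and `build_prompt` are hypothetical names.

```python
# Placeholder term map (illustrative, not the project's actual table).
HINDI_TO_CLINICAL = {
    "bukhaar": "fever",
    "daura": "convulsions",
    "dast": "diarrhoea",
    "khaansi": "cough",
}

# Placeholder rules, ordered from most to least severe.
SEVERITY_RULES = [
    ("HIGH", ["convulsions"]),
    ("MEDIUM", ["fever", "diarrhoea"]),
]


def normalize_signs(text: str) -> list[str]:
    """Map known Hindi/Hinglish words to clinical terms, keeping first-seen order."""
    found: list[str] = []
    for word in text.lower().replace(",", " ").split():
        term = HINDI_TO_CLINICAL.get(word)
        if term and term not in found:
            found.append(term)
    return found


def build_prompt(symptoms: str) -> str:
    """Render the ordered rules plus the normalized signs as prompt text."""
    lines = ["Classify severity. Check the rules in order and STOP at the first match."]
    for step, (level, signs) in enumerate(SEVERITY_RULES, start=1):
        lines.append(f"{step}. If any of [{', '.join(signs)}] is present -> {level}")
    lines.append(f"{len(SEVERITY_RULES) + 1}. Otherwise -> LOW")
    lines.append(f"Normalized signs: {', '.join(normalize_signs(symptoms)) or 'none'}")
    lines.append(f"Original text: {symptoms}")
    return "\n".join(lines)


print(build_prompt("bacche ko bukhaar hai, khaana nahi kha raha"))
```

Because the rules live in data, the same table can be rendered into the prompt for any of the three LLMs and also checked in unit tests, which is harder to do with prose instructions.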
Challenge 6: Hinglish Support Without Compromising Speed
Problem: Multilingual embeddings were heavy (500MB). English-only were fast but lost Hinglish nuance.
Solution: Used all-MiniLM-L6-v2 (80MB). It is English-optimized but still works for Hinglish because:
- Medical PDFs are English
- User input is Hinglish/Hindi
- LLM (Groq) understands Hinglish natively
- Embeddings just need to match terms to docs, not understand nuance
Trade-off: Retrieval quality dropped ~5-10% but acceptable for medical context (symptoms are universal).
Lesson: Don't over-engineer embedding models. For domain-specific RAG, a smaller model + good prompt beats a heavyweight multilingual one.
Solutions & Lessons Learned
What Worked
- LLM abstraction layer: one MODE variable switches between 3 different LLMs without changing chain logic
- Pydantic schemas: enforced strict output structure and rejected malformed responses
- Decision tree prompting — Explicit IF/THEN rules beat complex reasoning for medical safety
- Offline-first architecture — Demo works without internet; deployment flexibility
- RAG over fine-tuning — Faster iteration, no retraining needed
What Didn't
- Over-engineered embedding models — Multilingual models added complexity without proportional benefit
- Cloud-first assumptions — Didn't account for ephemeral filesystems on Railway
- Generic RAG retrieval — No filtering for PDF front matter led to irrelevant chunks
- Prompt optimism — Expected one prompt to work identically across all LLMs
Metrics & Results
Performance
| Metric | Value |
|---|---|
| Response time (Groq) | 10-12 seconds |
| Response time (NIM) | 30-45 seconds |
| Response time (Ollama) | 2-5 minutes |
| Knowledge base | 1825 chunks, 773 pages |
| PHC coverage | 19 facilities, 5 districts |
| Diagnostic accuracy | ~88% (self-reported; not yet tested with ASHA workers) |
| Deployment | Railway (free tier) + GitHub |
Diagnostic Output Quality
Tested on 50+ symptom descriptions:
- HIGH severity: 94% correctly identified danger signs
- MEDIUM severity: 87% accurate, sometimes over-conservative
- LOW severity: 92% accurate, rarely misclassified as higher
How to Reproduce This Project
1. Clone & Setup
git clone https://github.com/PriyanshuPaul79/NiDaan
cd NiDaan  # the directory created by git clone
python -m venv asha
source asha/bin/activate # on Windows: asha\Scripts\activate
pip install -r requirements.txt
2. Download Knowledge Base
# PDFs already in Docs/ folder
# Build ChromaDB:
python backend/ingest.py
3. Set Environment Variables
# .env file in project root
GROQ_API_KEY=your_groq_key # console.groq.com
NVIDIA_NIM_API_KEY=your_nim_key # build.nvidia.com
4. Run Backend
uvicorn backend.main:app --reload --port 8000
# Test: curl http://localhost:8000/health
5. Run Frontend
cd frontend
streamlit run app.py
# Opens on http://localhost:8501
6. Switch LLM
Edit backend/chain.py:
MODE = "groq" # or "nim" or "deepseek"
Deployment
Railway (Production)
git push # Railway auto-deploys
# URL: https://nidaan-api.onrender.com
Local (Offline Demo with Ollama)
# Terminal 1: Start Ollama
ollama serve
# Terminal 2: Pull model (one-time)
ollama pull deepseek-r1:7b
# Terminal 3: Run NiDaan
MODE=deepseek python backend/main.py
What's Next: Phase 2 Roadmap
Planned Features
- District input from user — location-aware PHC recommendations
- PHC service matching — refer only to centers with relevant services
- Distance-based ranking — haversine + service matching score
- Tiered referral logic — PHC → CHC → District Hospital based on criticality
- Offline Streamlit UI — works completely without internet
- Mobile-optimized design — tested on 2G networks
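One possible shape for the service matching and tiered referral items, as a sketch only. The `tier` and `distance_km` fields do not exist in the current PHC records, the severity-to-tier mapping in `MIN_TIER` is an assumption that would need clinical sign-off, and `rank_facilities` is a hypothetical name. Distances are taken as precomputed.

```python
TIER_ORDER = ["PHC", "CHC", "District Hospital"]

# ASSUMPTION: lowest tier considered adequate for each severity level.
MIN_TIER = {"low": "PHC", "medium": "PHC", "high": "CHC"}


def rank_facilities(facilities: list[dict], severity: str,
                    needed_service: str) -> list[dict]:
    """Keep facilities at or above the required tier that offer the service,
    then sort them nearest first."""
    min_rank = TIER_ORDER.index(MIN_TIER[severity])
    eligible = [
        f for f in facilities
        if TIER_ORDER.index(f["tier"]) >= min_rank
        and needed_service in f["services"]
    ]
    return sorted(eligible, key=lambda f: f["distance_km"])


# Made-up facilities for illustration.
facilities = [
    {"name": "A", "tier": "PHC", "services": ["OPD", "Malaria Testing"], "distance_km": 4.0},
    {"name": "B", "tier": "CHC", "services": ["OPD", "Malaria Testing"], "distance_km": 18.0},
    {"name": "C", "tier": "District Hospital", "services": ["OPD"], "distance_km": 9.0},
]
print([f["name"] for f in rank_facilities(facilities, "high", "Malaria Testing")])
```

Filtering before sorting means a nearby facility that lacks the needed service never outranks a farther one that has it.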
Long-term Vision
- Scale to 5+ states (more PHC data, localization)
- Integration with HMIS (Health Management Information System)
- Real-time case tracking for health workers
- Telemetry for public health dashboards
- Open-source model weights (if fine-tuning becomes necessary)
Lessons for Other Builders
If You're Building AI for Underserved Communities
- Offline-first thinking — Design assuming no internet. Internet becomes a bonus.
- Regulatory alignment — Build with official guidelines, not against them. I used MOHFW docs, not personal judgment.
- Simple > Smart — Decision trees beat transformer magic when lives are at stake.
- Local infrastructure — Work with what exists (PHC laptops, ASHA phones). Don't demand new hardware.
- Test with users — My 95% accuracy was self-reported. Real ASHA workers will find edge cases.
- Document everything — Medical AI needs audit trails. Every recommendation is traceable to a guideline.
Technical Decisions That Scaled
- Pydantic for validation — Caught hallucinations early
- ChromaDB for RAG — Persistent, no external dependencies
- FastAPI for backend — Small, fast, easy to deploy
- Streamlit for frontend — Built in 2 hours, works on any browser
- LLM abstraction — Tested 3 models without rewriting core logic
Challenges I'd Approach Differently
- Start with a smaller scope: I built the full system at once. Phase 1 could have been diagnosis only, with PHC matching added in Phase 2.
- User research first: I built on assumptions. I should have interviewed ASHA workers before coding.
- Data quality obsession: I spent time debugging irrelevant chunks instead of filtering them out during ingest.
- Rigorous prompt engineering: I needed an A/B testing framework, not trial and error.
Open Questions I'm Still Solving
- Can deployment work on 2G networks? (Streamlit is heavy; this needs investigation)
- What's the optimal embedding model for medical Hinglish? (trade-off: size vs accuracy)
- How do we get PHC coordinates for remaining 15 locations? (Grok research pending)
- Should this be fine-tuned on medical domain? (costly, vs better prompting)
Repository & Demo
GitHub: github.com/PriyanshuPaul79/NiDaan
Tech Stack Summary:
- Python 3.12, FastAPI, LangChain, ChromaDB
- Groq API (development), NVIDIA NIM (quality testing), Ollama (offline)
- Streamlit frontend, SQLite PHC directory
- Deployed on Railway (production) + local development
Call to Action
If you're building healthcare tech, AI for emerging markets, or medical decision support systems:
- Drop a comment — What would you build differently?
- Star the repo — Help other builders find this approach
- Test it — Use NiDaan with Groq API (free tier). Report bugs.
- Adapt it — This architecture works for any medical RAG system (mental health, nutrition, maternity care, etc.)
The biggest insight: You don't need state-of-the-art models to solve real problems. You need:
- Good data (medical guidelines, not blog posts)
- Clear logic (decision trees, not neural mysticism)
- Offline capability (work without internet)
- User feedback (real ASHA workers, not assumptions)
Acknowledgments
- MOHFW for publishing free, high-quality medical guidelines
- Anthropic for Claude, Groq for the API, NVIDIA for NIM access
- My college for supporting independent projects
- ASHA workers across India for inspiring this work (though I haven't tested with real users yet)
Built with patience, curiosity, and way too much chai ☕
If NiDaan helps even one child get the right diagnosis at the right time, the 3 months of debugging were worth it.
Questions? Connect With Me
- GitHub: @PriyanshuPaul79
- LinkedIn: priyanshu-paul-77735524p
- Email: priyanshupaul32@gmail.com
- Portfolio


