MedAI is a Flask-served Python application backed by MongoDB for clinical data persistence. The system combines RAG-powered disease retrieval across 14,000+ Human Disease Ontology entries, rule-based vital risk scoring, medical imaging analysis, and multi-turn AI chat — all running entirely on-device with no cloud dependency.
Team Members
This project was developed by:
- @v_srimukh — V. Srimukh
- @vamshidhar_reddy — M. Vamshidhar Reddy
- @s_narendhar — S. Narendhar
- @pavan_sri_ram — A. Pavan Sri Ram
We would like to express our sincere gratitude to @chanda_rajkumar for their invaluable guidance and support throughout this project. Their insights into system design, architecture, and the RAG pipeline played a key role in shaping MedAI.
The Problem We Set Out to Solve
Clinical decision support tools are expensive. Epic, Cerner, UpToDate — the software most hospitals use comes with licensing fees that only large institutions can justify.
When we sat down to plan our PFSD project, we kept coming back to one question: what can four students actually ship in a semester, using only open-source tools?
Turns out, quite a lot. The goal we settled on was simple: a doctor or student should be able to enter patient vitals, symptoms, and history and get back a structured, grounded risk assessment in under a minute, on their own laptop. No internet required. No external API calls. Nothing that breaks the moment a service goes down.
Our Solution
We built MedAI — a local-first clinical intelligence system. Enter patient vitals, symptoms, and medical history, and MedAI returns a structured risk assessment grounded in over 14,000 standardised disease definitions from the Human Disease Ontology, powered by a locally-running LLM via Ollama.
💡 The core insight: combining RAG with a structured ontology means the LLM never needs to hallucinate disease definitions — they're all already in HumanDO.obo.
| Metric | Value |
|---|---|
| 🟢 Diseases in knowledge base | 14,000+ from Human Disease Ontology |
| API endpoints | 7 (including 2 streaming SSE) |
| Service modules | 10 focused modules |
| Python dependencies | 5 packages total |
Tech Stack
Web layer: Flask 3.x + Flask-CORS
LLM runtime: Ollama (local) · llama3.2 (text) · llava / llama3.2-vision (imaging)
Knowledge base: Human Disease Ontology · HumanDO.obo — 14,000+ diseases, ICD codes, synonyms
Database: MongoDB · pymongo · 6 collections · TTL-indexed analytics cache
Frontend: Jinja2 templates · Dark/light theme · streaming SSE responses
Dependencies: Flask · Flask-Cors · requests · Pillow · pymongo — that's the entire tree
Why We Chose MongoDB
When we started building MedAI, the biggest challenge wasn't the LLM integration — it was managing clinical data that doesn't fit neatly into rigid schemas. A patient assessment document contains vitals, a list of retrieved disease matches with scores, risk percentages, a full LLM-generated report, and metadata about which Ollama model was used. An imaging record looks completely different. A chat conversation is different again.
With a SQL database, we'd be writing migrations every time we added a new field to the assessment schema. MongoDB let us store each record as a self-contained document that matches the shape of the data naturally — and because each collection is cleanly separated, the system is still easy to query and aggregate across.
💡 MongoDB is also entirely optional in MedAI. If it's not installed, the app starts cleanly and all assessment and chat endpoints still work — data just isn't persisted long-term. That decoupling made the dev loop significantly smoother, especially across machines with different setups.
We also used MongoDB's expireAfterSeconds TTL indexing on the analytics_cache collection for automatic cache cleanup — no cron jobs, no scheduled tasks, no manual scripts.
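For reference, the index setup is a one-liner with pymongo. This is a minimal sketch: the database name `medai` and the 24-hour window are illustrative, and `created_at` follows the cache document shown further down.

```python
from pymongo import MongoClient

# Illustrative sketch: the database name and the 24-hour expiry are examples,
# not necessarily MedAI's exact configuration.
client = MongoClient("mongodb://localhost:27017")
cache = client["medai"]["analytics_cache"]

# MongoDB's TTL monitor deletes documents once created_at is older than
# expireAfterSeconds; no cron jobs or scheduled tasks required.
cache.create_index("created_at", expireAfterSeconds=24 * 60 * 60)
```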
Application Architecture
MedAI is a single Flask app — it serves the Jinja-rendered frontend and handles a REST/SSE API under /api/*, all from one process. No microservices, no Kubernetes, no cloud infra to manage. The whole thing runs on a regular laptop without breaking a sweat.
```json
{
  "runtime": "Flask 3.x + Jinja2 templates",
  "llm_serving": "Ollama (local, port 11434)",
  "models": {
    "text": "llama3.2 (recommended, ~2GB)",
    "vision": "llava / llama3.2-vision (optional)"
  },
  "knowledge_base": "HumanDO.obo — Human Disease Ontology (14,000+ entries)",
  "mongodb_collections": [
    "patients",
    "clinical_assessments",
    "ai_conversations",
    "reports",
    "imaging_records",
    "analytics_cache"
  ],
  "total_api_endpoints": 7,
  "real_time": "Server-Sent Events (SSE) for streaming assessment tokens",
  "deployment": "python app.py → http://localhost:5000/dashboard"
}
```
MongoDB Data Model in MedAI
MongoDB sits at the heart of MedAI's persistence layer because clinical data naturally arrives in different shapes. Patient assessments, imaging records, and conversations all have different fields. A rigid SQL design would spread a single assessment across several joined tables and force a migration every time we extended the schema.
With MongoDB, each record stays a self-contained document, grouped into six collections for retrieval and analytics.
Collection 1 — clinical_assessments
Every assessment — risk scores, RAG matches, and the LLM report — is stored as a single document after the stream completes:
```json
{
  "assessment_id": "a3f2b1c4-70cf-423b-8e61-153d63756d43",
  "patient_id": "pt-00142",
  "timestamp": "2026-04-07T11:32:09Z",
  "vitals": {
    "age": 54,
    "glucose": 148,
    "blood_pressure": "142/91",
    "temperature": 37.2
  },
  "chief_complaint": "fatigue and increased thirst",
  "risk_scores": {
    "diabetes": 72,
    "hypertension": 65
  },
  "rag_matches": [
    {
      "doid": "DOID:9352",
      "name": "type 2 diabetes mellitus",
      "score": 27,
      "icd": "E11"
    }
  ],
  "model_used": "llama3.2",
  "assessment_text": "Based on the presented vitals..."
}
```
Collection 2 — imaging_records
Each imaging analysis stores the modality type, model confidence scores, and triage urgency:
```json
{
  "record_id": "img-00089",
  "patient_id": "pt-00142",
  "timestamp": "2026-04-07T12:10:44Z",
  "modality": "chest_xray",
  "vision_model": "llava",
  "confidence_scores": {
    "Pneumonia": 0.72,
    "Pleural effusion": 0.18,
    "Normal": 0.06
  },
  "finding": "Opacity in right lower lobe consistent with consolidation.",
  "triage_urgency": "high"
}
```
Collection 3 — analytics_cache
Analytics aggregations are cached with a TTL so MongoDB auto-expires stale entries — no cron jobs needed:
```json
{
  "cache_key": "dashboard_overview_2026-04-07",
  "created_at": "2026-04-07T00:00:00Z",
  "expires_at": "2026-04-08T00:00:00Z",
  "data": {
    "total_assessments": 341,
    "avg_risk_diabetes": 58.3,
    "top_conditions": ["type 2 diabetes mellitus", "essential hypertension"]
  }
}
```
Collections Summary:
| Collection | Purpose |
|---|---|
| `patients` | Demographics and base patient records |
| `clinical_assessments` | Full assessment documents — vitals, risk scores, RAG matches, LLM report |
| `ai_conversations` | Multi-turn chat history with DOID references |
| `reports` | Structured lab report analysis outputs |
| `imaging_records` | Vision model outputs — confidence scores, triage urgency |
| `analytics_cache` | TTL-indexed aggregation cache for dashboard metrics |
The RAG-to-LLM Pipeline
How a full assessment works, step by step:
Step 1 — Patient data ingested
Age, vitals, symptoms, history, and chief complaint come in via POST /api/assess or the web form. The fields arrive separately but are merged into a single structured patient object.
Step 2 — RAG retrieval
Symptoms become search terms. rag_search() scans all 14,000+ OBO entries using a weighted keyword system and returns the top 8 disease matches:
```python
# rag_search — keyword scoring (api_routes.py)
for w in words:
    if w in name.split(): score += 15   # exact word in name
    elif w in name:       score += 10   # partial name hit
    if w in syns:         score += 8    # synonym match
    if w in defn:         score += 4    # definition match
    if w in pars:         score += 2    # parent category
```
An exact name match earns 15 points. A synonym hit scores 8. A definition hit adds 4. Entries with both ICD codes and written definitions receive a quality bonus of 2 extra points.
Step 3 — Rule-based risk scoring
Before the LLM touches the data, calculate_risk_scores() converts vitals into risk percentages. Glucose feeds a diabetes score; blood pressure feeds a hypertension score. Each caps at 95%.
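The exact thresholds live in the scoring service; a minimal sketch of the idea, with made-up cut-offs rather than MedAI's real ones, looks like this:

```python
# Illustrative sketch of the rule-based scoring. The thresholds and weights
# here are examples, not the exact values used in MedAI's service module.
def calculate_risk_scores(vitals: dict) -> dict:
    scores = {}

    glucose = vitals.get("glucose")
    if glucose is not None:
        # Higher fasting glucose pushes the diabetes score up, capped at 95%.
        scores["diabetes"] = max(0, min(95, int((glucose - 90) * 1.2)))

    bp = vitals.get("blood_pressure")  # e.g. "142/91"
    if bp:
        systolic = int(bp.split("/")[0])
        # Elevated systolic pressure raises the hypertension score, capped at 95%.
        scores["hypertension"] = max(0, min(95, int((systolic - 120) * 2.5)))

    return scores
```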
Step 4 — Prompt construction
A structured 700–1100 word prompt is built with the ontology context, risk scores, and patient snapshot, then sent to Ollama.
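The real template is longer, but its rough shape is easy to sketch. The rag_matches fields follow the document example above, the definition field is an assumption, and the wording is illustrative:

```python
import json

# Rough shape of the prompt sent to Ollama. The section wording is simplified
# and the 'definition' field is assumed; the real template is longer.
def build_prompt(patient: dict, risk_scores: dict, rag_matches: list) -> str:
    context = "\n".join(
        f"- {m['name']} ({m['doid']}, ICD {m.get('icd', 'n/a')}): {m.get('definition', '')}"
        for m in rag_matches
    )
    return (
        "You are a clinical decision-support assistant.\n\n"
        f"PATIENT SNAPSHOT:\n{json.dumps(patient, indent=2)}\n\n"
        f"RULE-BASED RISK SCORES:\n{json.dumps(risk_scores, indent=2)}\n\n"
        f"ONTOLOGY CONTEXT (Human Disease Ontology, top matches):\n{context}\n\n"
        "Write a structured risk assessment grounded only in the context above."
    )
```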
Step 5 — Streamed response
Tokens come back via /api/assess/stream as SSE events. The UI renders them in real time and saves the completed assessment to MongoDB.
Key Features
🟢 Streaming clinical assessment
Responses start showing up almost instantly — tokens stream to the browser via SSE as Ollama generates them. First words appear in under a second, so there's never a blank loading screen.
🔴 RAG disease search
Keyword-scored retrieval across 14,000+ ontology entries, matching disease names, synonyms, ICD codes, and parent categories to surface the most clinically relevant results for each query.
🟡 Medical report analysis
Paste any lab report or upload an image of one. The system picks out key findings, flags abnormal values, suggests possible conditions, and outlines next steps — structured output every time.
🔵 Medical imaging pipeline
Upload chest X-rays, CT scans, or MRIs. If llava is installed, the vision model reads the image directly. If not, an ontology-backed text fallback kicks in automatically.

🟢 Risk scoring engine
A rule-based system converts vitals into risk percentages before the LLM is involved at all. Fast, deterministic, and transparent.
🔵 AI chat assistant
Multi-turn medical Q&A that remembers your conversation and checks live ontology terms in real time. Every response includes sourced DOID references and an explicit AI disclaimer.
Ollama Integration & Resilience
Working with a local LLM means dealing with all the ways it can quietly fail: the model might still be loading, a request might time out on a CPU-only machine, or the configured model might not be installed at all. Early on this burned us during testing, so we built in several layers of resilience:
A background probe thread keeps checking Ollama's health every few seconds and updates a shared status dictionary. An exponential backoff retry loop handles transient connection hiccups. A model fallback chain tries llama3.2 → llama3.1 → phi3:mini → phi3 in sequence. And if everything falls apart, a deterministic fallback_assessment() builds a structured report directly from the ontology matches — no LLM required.
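Condensed into one helper, the retry and fallback path looks roughly like this. It uses Ollama's standard /api/generate endpoint; the function name and retry counts are illustrative, and fallback_assessment() is the deterministic builder mentioned above, defined elsewhere in MedAI:

```python
import time
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_CHAIN = ["llama3.2", "llama3.1", "phi3:mini", "phi3"]

def generate_with_fallback(prompt: str, retries: int = 3) -> str:
    """Try each model in the chain, backing off exponentially on transient errors."""
    for model in MODEL_CHAIN:
        for attempt in range(retries):
            try:
                resp = requests.post(
                    OLLAMA_URL,
                    json={"model": model, "prompt": prompt, "stream": False},
                    timeout=90,
                )
                resp.raise_for_status()
                return resp.json()["response"]
            except requests.RequestException:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s between attempts
    # Last resort: MedAI's fallback_assessment() (defined elsewhere) builds a
    # deterministic report straight from the ontology matches, no LLM needed.
    return fallback_assessment(prompt)
```

The SSE endpoint below then streams the generated tokens to the browser and persists the finished assessment: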
```python
# Streaming SSE endpoint — /api/assess/stream
def generate():
    # 1. Send metadata immediately (risk scores, RAG results)
    yield f"data: {json.dumps({'type': 'meta', 'data': meta})}\n\n"
    # 2. Stream tokens as they arrive from Ollama
    for token in stream_ollama(prompt, timeout=90):
        yield f"data: {json.dumps({'type': 'token', 'text': token})}\n\n"
    # 3. Persist to MongoDB, signal completion
    collection.insert_one({...})
    yield f"data: {json.dumps({'type': 'done'})}\n\n"
```
API Reference
MedAI exposes seven endpoints. Two support streaming via Server-Sent Events:
| Endpoint | Description |
|---|---|
| `POST /api/assess` | Full patient risk assessment (buffered). Returns risk scores, retrieved diseases, RAG reference count, and LLM summary. |
| `POST /api/assess/stream` ⚡ | Same as above via SSE. Streams tokens one by one. Saves to MongoDB on completion. |
| `POST /api/chat` | Multi-turn medical Q&A with conversation history, DOID references, and AI disclaimer. |
| `POST /api/analyze-report` | Structured analysis of lab or clinical report text. |
| `POST /api/analyze-image` | Medical imaging analysis. Falls back to text analysis if no vision model is installed. |
| `GET /api/search-diseases` | Direct ontology search. Returns up to 50 scored matches with definitions, synonyms, ICD codes, and DOID identifiers. |
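For a quick smoke test of the buffered endpoint from Python, something like this works; the payload fields are assumptions based on the stored assessment document, so adjust them to the actual request schema:

```python
import requests

# Field names in the payload are assumptions based on the stored assessment
# document; adjust to the actual request schema if it differs.
payload = {
    "patient_id": "pt-00142",
    "vitals": {"age": 54, "glucose": 148, "blood_pressure": "142/91", "temperature": 37.2},
    "chief_complaint": "fatigue and increased thirst",
}

resp = requests.post("http://localhost:5000/api/assess", json=payload, timeout=120)
resp.raise_for_status()
result = resp.json()
print(list(result.keys()))  # risk scores, retrieved diseases, LLM summary, etc.
```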
Running MedAI Locally
Getting it running takes four steps. Python 3.11+ is recommended. Ollama handles model serving over HTTP on port 11434. MongoDB is fully optional.
```bash
# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Start Ollama and pull a model
ollama serve
ollama pull llama3.2   # ~2GB, recommended
ollama pull llava      # optional: enables image analysis

# 3. Run the Flask app
python app.py

# 4. Open the dashboard
# http://localhost:5000/dashboard
```
Set MONGO_ENABLED=false in your .env file if MongoDB isn't installed — the app keeps running without it. To switch models, set OLLAMA_MODEL in .env to override the default llama3.2.
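For reference, a minimal .env for a machine without MongoDB might look like this (using only the two variables mentioned above):

```bash
# .env: minimal local configuration
# Run without long-term persistence if MongoDB isn't installed
MONGO_ENABLED=false
# Override the default model, e.g. llama3.1 or phi3:mini
OLLAMA_MODEL=llama3.2
```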
Challenges We Faced
Challenge 1: Ollama resilience
We underestimated how differently Ollama behaves across machines. On one machine the model loaded fine; on another it timed out halfway through a response. Getting the retry logic, fallback chain, and background probe working together reliably took a lot of iteration.
Challenge 2: Streaming + MongoDB writes
The SSE streaming endpoint was trickier than it looked — not the streaming itself, but making sure MongoDB writes happened cleanly after the stream completed without blocking the response.
Challenge 3: Vision model output parsing
Different LLaVA versions format confidence scores differently. parse_confidence_scores() ended up with four regex patterns and a keyword fallback before it was reliable across model versions. The keepalive thread (pinging the vision model every 4 minutes) was added after we noticed the first image request of a session was always slow — a cold-start issue.
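The final parser is more involved, but the general approach, regex first and keyword fallback second, can be sketched like this; the patterns and keywords here are illustrative, not the four actual ones:

```python
import re

def parse_confidence_scores(text: str) -> dict:
    """Illustrative version: the real parser tries several regex patterns
    plus a keyword fallback to cope with LLaVA's varying output formats."""
    scores = {}
    # Matches lines like "Pneumonia: 0.72" or "Pleural effusion - 18%"
    for label, value in re.findall(r"([A-Za-z ]+?)\s*[:\-]\s*(\d+(?:\.\d+)?)\s*%?", text):
        num = float(value)
        scores[label.strip()] = num / 100 if num > 1 else num
    if not scores:
        # Keyword fallback: flag conditions mentioned without an explicit score
        for keyword in ("pneumonia", "effusion", "normal"):
            if keyword in text.lower():
                scores[keyword.capitalize()] = 0.5  # neutral default
    return scores
```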
Challenge 4: Team coordination
Four people working on one codebase with strong opinions about route structure meant merge conflicts were a regular part of the process. We eventually settled on a rule: no one touches api_routes.py and a service module in the same branch. That helped a lot.
What's Next
The RAG engine is currently keyword-based. Moving to a hybrid dense + sparse approach — FAISS embeddings alongside BM25 — would meaningfully improve recall for unusual or rare symptoms that don't map cleanly to ontology terms. That's the highest-value technical improvement still on the table.
The imaging pipeline is promising, but llava on CPU alone is too slow for real clinical use. Enabling GPU support via OLLAMA_NUM_GPU is the single biggest improvement we could make to the vision side — the pipeline is already built for it.
The audit log and analytics cache infrastructure is already in place. Wiring the aggregation service to real patient data over time could make the risk trend and cohort charts genuinely useful for small clinics that can't afford enterprise analytics tools — which is exactly the kind of impact this project was always aiming for.
MedAI is a Flask-based Python full-stack project that uses MongoDB for clinical persistence — assessments, imaging records, and conversations — across six collections, with TTL-indexed analytics caching and a 14,000-entry Human Disease Ontology powering its RAG retrieval layer.
Built with Flask · Ollama · Human Disease Ontology · MongoDB · April 2026
Team: V. Srimukh, M. Vamshidhar Reddy, S. Narendhar, A. Pavan Sri Ram · Mentor: Chanda Raj Kumar Sir



