Inna Campo

How Gemma 4 Can Help Solve the Trillion-Dollar Oversight in Women’s Health

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4


By 2030, an estimated 1.2 billion women will be experiencing perimenopause or menopause. The research literature connecting this transition to cardiovascular changes, musculoskeletal symptoms, mood disruption, and metabolic shifts exists, but it is scattered across peer-reviewed papers that no clinician has time to synthesize across specialties. The result is fragmented care: a cardiologist treating elevated LDL, an orthopedist treating frozen shoulder, a GP prescribing an SSRI for anxiety. Each doctor is correct in isolation, but none is aware that they are all looking at the same hormonal root cause.

AXIOM (Automated X-reference Intelligence & Open Medicine) is a local-first scientific evidence engine that closes this gap. It ingests peer-reviewed literature, builds a structured knowledge graph of typed medical relationships, and answers clinical questions grounded entirely in cited evidence, with a privacy-first architecture designed for local deployment.


Demo

The demo covers: system status overview, live PubMed retrieval showing the relationships between adhesive capsulitis and estrogen, a five-symptom hybrid RAG query (perimenopause + frozen shoulder + elevated LDL + anxiety + sleep disruption) with graph triples expanded, and the graph explorer with path-finding between cardiovascular disease and insomnia.

The system has the following integrated layers:

Ingestion. PDFs are chunked, embedded via nomic-embed-text, and stored in ChromaDB. In parallel, each chunk goes to Gemma 4 for named entity recognition and relation extraction, producing typed triples like Estrogen Decline →[REDUCES_RISK]→ Bone Density. These are stored in a directed NetworkX graph with confidence scores.
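
To make the triple store concrete, here is a minimal, dependency-free sketch of typed triples held in a directed graph with confidence scores. It is a stand-in for the NetworkX graph described above, not AXIOM's code: the `Triple` and `TripleGraph` names, the "keep the more confident duplicate" policy, and the direction-agnostic path search are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str      # typed relationship label, e.g. "REDUCES_RISK"
    obj: str
    confidence: float  # extraction confidence in [0, 1]


class TripleGraph:
    """Tiny directed graph keyed by (subject, object); hypothetical stand-in."""

    def __init__(self):
        self.edges = {}  # (subject, obj) -> Triple

    def add(self, triple: Triple) -> None:
        key = (triple.subject, triple.obj)
        current = self.edges.get(key)
        # Assumed policy: if an edge is extracted twice, keep the more confident one.
        if current is None or triple.confidence > current.confidence:
            self.edges[key] = triple

    def path(self, start: str, goal: str):
        """Breadth-first search ignoring edge direction; returns node names or None."""
        neighbours = {}
        for subject, obj in self.edges:
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
        queue, seen = deque([[start]]), {start}
        while queue:
            route = queue.popleft()
            if route[-1] == goal:
                return route
            for nxt in sorted(neighbours.get(route[-1], ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(route + [nxt])
        return None
```

With NetworkX itself, the same idea would map onto a `DiGraph` whose edges carry `relation` and `confidence` attributes.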

Privacy Guard. Every query passes through a two-stage PHI scrubber before any model call: regex strips obvious identifiers, then Gemma rewrites the result into a generalised clinical research question. If Gemma fails, the regex output is the safe fallback.
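
The two-stage flow can be sketched as follows. This is an illustration under stated assumptions, not the project's scrubber: the four regex patterns are a deliberately small sample, and `rewrite` is a hypothetical callable standing in for the Gemma rewrite step.

```python
import re
from typing import Callable

# Illustrative patterns only; a real PHI scrubber needs far broader coverage.
_PHI_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\bMRN[:\s#]*\d+\b", re.IGNORECASE), "[ID]"),
]


def regex_scrub(text: str) -> str:
    """Stage 1: replace obvious identifiers with placeholders."""
    for pattern, placeholder in _PHI_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def scrub_query(raw: str, rewrite: Callable[[str], str]) -> str:
    """Stage 2: ask the model to generalise the already-scrubbed text.

    `rewrite` is a hypothetical wrapper around the model call. If it raises
    or returns nothing, the regex output is returned as the safe fallback.
    """
    scrubbed = regex_scrub(raw)
    try:
        rewritten = rewrite(scrubbed).strip()
    except Exception:
        return scrubbed
    return rewritten or scrubbed
```

The important property is ordering: the model only ever sees the regex-scrubbed text, so a model failure can never leak more than stage 1 already allowed through.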

Hybrid Retrieval. At query time: Gemma 4 extracts medical entities from the question as a JSON array, those entities are fuzzy-matched against graph nodes, and ChromaDB returns top-K chunks re-ranked by a composite score combining semantic similarity, study-type weight (RCTs over case reports), recency, and a graph-entity boost.
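
A rough sketch of the fuzzy match and the composite re-ranking step follows. The post names the four signals but not how they are combined, so the weights, the study-type table, the linear recency decay, and every identifier here (`Chunk`, `match_nodes`, `composite_score`, `rerank`) are assumptions for illustration.

```python
from dataclasses import dataclass
from difflib import get_close_matches

# Hypothetical weights and study-type table.
STUDY_TYPE_WEIGHT = {"rct": 1.0, "cohort": 0.7, "case_report": 0.3}
W_SIM, W_STUDY, W_RECENCY, W_GRAPH = 0.5, 0.2, 0.15, 0.15


@dataclass
class Chunk:
    text: str
    similarity: float    # semantic similarity in [0, 1]
    study_type: str
    year: int
    entities: frozenset  # canonical graph node names found in the chunk


def match_nodes(entities, graph_nodes, cutoff=0.8):
    """Fuzzy-match extracted entity strings against graph node names."""
    lookup = {node.lower(): node for node in graph_nodes}
    matched = set()
    for entity in entities:
        for hit in get_close_matches(entity.lower(), list(lookup), n=1, cutoff=cutoff):
            matched.add(lookup[hit])
    return matched


def composite_score(chunk, graph_entities, *, current_year, horizon=20):
    study = STUDY_TYPE_WEIGHT.get(chunk.study_type, 0.5)
    age = max(0, current_year - chunk.year)
    recency = max(0.0, 1.0 - age / horizon)  # linear decay to zero over `horizon` years
    boost = 1.0 if chunk.entities & graph_entities else 0.0
    return W_SIM * chunk.similarity + W_STUDY * study + W_RECENCY * recency + W_GRAPH * boost


def rerank(chunks, graph_entities, *, current_year, top_k=5):
    """Sort chunks by composite score, best first, and keep the top K."""
    scored = sorted(
        chunks,
        key=lambda c: composite_score(c, graph_entities, current_year=current_year),
        reverse=True,
    )
    return scored[:top_k]
```

Under these assumed weights, a recent RCT that mentions a matched graph entity can outrank an older case report with slightly higher raw similarity, which is the behaviour the paragraph describes.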

Three-Mode Answer Chain. chain.py routes to direct (parametric), vector RAG, or hybrid graph RAG. In hybrid mode the prompt presents structured triples and paper excerpts as explicitly labelled sections with different epistemic weights: graph relationships are hypotheses, excerpts are primary sources. All generation calls are serialised through a single-consumer LLMQueue to prevent concurrent Ollama requests from saturating local hardware.
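
One way the labelled hybrid prompt could be assembled is sketched below. The section headings, the line formats, and the `build_hybrid_prompt` name are assumptions; the post does not show the actual template in chain.py.

```python
from typing import Sequence, Tuple

TripleRow = Tuple[str, str, str, float]  # (subject, relation, object, confidence)
Excerpt = Tuple[str, str]                # (citation label, excerpt text)


def build_hybrid_prompt(
    question: str,
    triples: Sequence[TripleRow],
    excerpts: Sequence[Excerpt],
) -> str:
    """Lay out graph triples and excerpts as separately labelled sections."""
    graph_lines = [
        f"- {s} -[{r}]-> {o} (confidence {c:.2f})" for s, r, o, c in triples
    ]
    excerpt_lines = [
        f"[{i}] {cite}: {text}" for i, (cite, text) in enumerate(excerpts, start=1)
    ]
    return "\n".join([
        "## GRAPH RELATIONSHIPS (hypotheses, not evidence)",
        *(graph_lines or ["(none)"]),
        "",
        "## PAPER EXCERPTS (primary sources; cite as [n])",
        *(excerpt_lines or ["(none)"]),
        "",
        "## QUESTION",
        question,
    ])
```

Keeping the two evidence types in separate, explicitly labelled sections lets the system prompt instruct the model to treat them differently, rather than relying on it to infer provenance from mixed context.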


Code

https://github.com/HARMONI-Lab/axiom

Stack: Python 3.11 · FastAPI · ChromaDB · NetworkX · Ollama · FastMCP · Biopython/NCBI Entrez · React 19 + Vite frontend

Running a 31B model locally behind concurrent FastAPI requests causes queuing failures and dropped responses. To mitigate this, I implemented a single-consumer asyncio queue that serialises all generation calls through one worker. Every endpoint caller awaits a Future while the worker drains the queue one job at a time. This sacrifices parallelism for reliability, which is the right trade-off on a single-GPU machine.

class LLMQueue:
    """Single-consumer async queue in front of the LLM.

    Prevents concurrent Ollama requests from saturating local hardware
    by serialising all generation calls through one worker coroutine.
    Callers await a Future — the queue handles the rest.
    """

    async def submit(self, user_message: str, system_prompt: str = "") -> str:
        self._ensure_init()
        loop = asyncio.get_running_loop()
        fut: asyncio.Future = loop.create_future()
        job = LLMJob(user_message=user_message, system_prompt=system_prompt, future=fut)
        await self._queue.put(job)
        return await fut  # caller suspends here until the worker resolves the future

    async def _run(self):
        while True:
            job: LLMJob = await self._queue.get()
            try:
                result = await llm.generate_async(job.user_message, job.system_prompt)
                if not job.future.done():
                    job.future.set_result(result)
            except Exception as e:
                if not job.future.done():
                    job.future.set_exception(e)
            finally:
                self._queue.task_done()

In production, replacing the lazy _ensure_init() check with an explicit startup/shutdown lifecycle and handling caller cancellation would prevent in-flight work from being wasted when clients disconnect and ensure the queue drains cleanly on container restart.
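
A minimal sketch of what that hardening might look like, assuming an explicit `start()`/`stop()` pair and a check that skips jobs whose caller has already been cancelled. `LifecycleLLMQueue`, the injected `generate` coroutine, and `demo()` are hypothetical names for illustration; the repository's `LLMJob` and `llm` module may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class LLMJob:
    user_message: str
    system_prompt: str
    future: asyncio.Future


class LifecycleLLMQueue:
    """Single-consumer queue with explicit startup and shutdown (a sketch)."""

    def __init__(self, generate: Callable[[str, str], Awaitable[str]]):
        self._generate = generate  # stand-in for the real model call
        self._queue: Optional[asyncio.Queue] = None
        self._worker: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self._queue = asyncio.Queue()
        self._worker = asyncio.create_task(self._run())

    async def stop(self) -> None:
        await self._queue.join()  # let already-queued jobs finish
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass

    async def submit(self, user_message: str, system_prompt: str = "") -> str:
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put(LLMJob(user_message, system_prompt, fut))
        return await fut

    async def _run(self) -> None:
        while True:
            job = await self._queue.get()
            try:
                if job.future.cancelled():
                    continue  # caller disconnected before we started; skip the model call
                result = await self._generate(job.user_message, job.system_prompt)
                if not job.future.done():
                    job.future.set_result(result)
            except Exception as exc:
                if not job.future.done():
                    job.future.set_exception(exc)
            finally:
                self._queue.task_done()


async def demo():
    """Submit five jobs at once and record how many ran concurrently."""
    active = peak = 0

    async def fake_generate(user_message: str, system_prompt: str) -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return user_message.upper()

    queue = LifecycleLLMQueue(fake_generate)
    await queue.start()
    answers = await asyncio.gather(*(queue.submit(f"q{i}") for i in range(5)))
    await queue.stop()
    return answers, peak
```

In a FastAPI app, `start()` and `stop()` would naturally be called from the application's lifespan handler. Note that this sketch only skips jobs that were cancelled before the worker picked them up; interrupting a generation that is already running would need additional handling.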


How I Used Gemma 4

I chose gemma4:31b Dense via Ollama, and here is why it was the right fit for this use case.

Why 31B and not the smaller models. The E2B and E4B models are optimized for edge and mobile deployment. AXIOM's ingestion pipeline asks Gemma to do something genuinely hard: read a paper chunk and determine not just which medical concepts appear, but whether the paper is asserting a causal relationship between them, as opposed to merely mentioning both. Identifying that a sentence claims estrogen decline causes adhesive capsulitis, rather than merely co-occurs with it, is a complex language understanding task that benefits from the reasoning depth of the 31B Dense model. In my testing, relationship-type accuracy degraded significantly with the smaller variants. The dense 31B also outperformed the 26B MoE on structured relation extraction, and was more reliable at maintaining semantic focus and schema adherence across the full context window.

Why local via Ollama. Medical queries are sensitive by definition. You can't claim a system protects user privacy if it's built to send their private data away to the cloud. The design intent of AXIOM is strictly local-first, making it seamlessly deployable in rural clinics, research labs under strict IRB data-use restrictions, or any air-gapped environment where patient queries cannot leave the building. However, the architecture's provider abstraction (src/core/providers/) decouples model orchestration from core business logic, supporting dual-use topologies. For academic researchers doing macro-level literature synthesis where queries contain purely biomedical concepts and zero patient PHI, the framework can be instantly toggled to route requests to a high-throughput, hosted staging endpoint like gemma4:31b-cloud via Ollama. This enables rapid, parallelized multi-concept exploration without local hardware constraints, while keeping the underlying engine completely consistent.

Six core roles (and one experimental). Gemma 4 is invoked at seven points in the pipeline, not just generation:

  1. Privacy guard: rewrites raw queries into generalised clinical research questions, stripping PHI before anything else runs.
  2. Entity extraction at query time: returns a JSON array of medical entities that drives graph lookup.
  3. Knowledge graph construction at ingestion: NER and relation extraction on every paper chunk, normalised to canonical relationship types.
  4. Answer generation: hybrid mode synthesises graph triples and paper excerpts under strict citation constraints. Additionally, every retrieval result passes a live retraction check against NCBI E-utilities before reaching the model: retracted papers are blocked outright, and expressions of concern are flagged and their composite scores halved, so Gemma reasons only over evidence that has survived post-publication scrutiny.
  5. Optional route classification: classifies questions into direct/vector/hybrid when ROUTER_USE_LLM=true.
  6. Autonomous PubMed agent: a ReAct loop in which Gemma decides which MCP tool to call next for multi-step literature retrieval.
  7. Self-evaluation (experimental, optional): Gemma scores answers on topic coverage, source faithfulness, and quality. This is useful as a heuristic for knowledge base iteration, though not a substitute for external validation.
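
The retraction policy in role 4 reduces to a small filter once each result carries a status. The sketch below covers only that policy step; the live NCBI lookup is omitted, and the `Retrieved` type, its status labels, and `apply_retraction_policy` are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Retrieved:
    pmid: str
    score: float          # composite retrieval score
    status: str = "ok"    # assumed labels: "ok", "concern", "retracted"
    flagged: bool = False


def apply_retraction_policy(results: Iterable[Retrieved]) -> List[Retrieved]:
    """Drop retracted papers; halve and flag expressions of concern."""
    kept = []
    for result in results:
        if result.status == "retracted":
            continue  # blocked outright, never reaches the model
        if result.status == "concern":
            result = replace(result, score=result.score / 2, flagged=True)
        kept.append(result)
    # Re-sort, since halving can change the ranking.
    return sorted(kept, key=lambda r: r.score, reverse=True)
```

Re-sorting after the penalty matters: a paper under an expression of concern may still be returned, but it should no longer crowd out unflagged evidence with a lower original score.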

The result: with approximately 50 ingested papers, AXIOM surfaces a structured hormonal narrative linking elevated LDL, adhesive capsulitis, anxiety, sleep disruption, and irregular cycles to a single perimenopausal aetiology. In the fragmented-care scenario this tool addresses, the same symptoms would typically arrive at separate specialist systems with no shared context and no connecting insight surfaced automatically. The same architecture applies to any clinical domain where evidence is fragmented across specialties. The knowledge base, entity schema, and retrieval pipeline are domain-configurable.
