Over 80% of clinical trials fail to meet enrollment deadlines (PMC). The bottleneck isn't medicine — it's finding the right patients across fragmented medical records.
This article describes Medical Cohort Agent, an AI system built with Elasticsearch Agent Builder that turns a researcher's natural language question into a normalized, queryable patient cohort — handling schema variance, OCR artifacts, and missing data across multiple healthcare facilities.
This agent doesn't answer questions — it creates artifacts.
## The Problem: Schema Variance Is the Real Enemy
We work with one of Israel's largest HMOs — an organization serving millions of patients across hundreds of facilities. Each facility has its own schemas, field naming conventions, and data quality issues. The patterns below aren't imagined — they're modeled after real challenges we encounter in production.
| Concept | Hospital A | Hospital B | Lab Chain | Clinic Network |
|---|---|---|---|---|
| Age | `patient_age: 67` | `גיל: 67` (Hebrew "age") | `age: 67` | `age_group: "60-70"` |
| Date | `2024-03-15` | `15/03/2024` | `15.03.24` | `2024-03` |
| Smoking | `smoking_status: true` | (not tracked) | (not tracked) | `smoking: true` |
| Notes | `clinical_notes` | `סיכום_רפואי` (Hebrew "medical summary") | `notes` | `text` |
Add OCR artifacts from scanned documents — סוכדת instead of סוכרת (diabetes misspelled by scanner) — and exact keyword matching fails silently.
The bottleneck isn't data volume. It's the variance.
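To make the variance concrete, here is a minimal sketch of what normalizing just the date column from the table above involves. This is illustrative code, not the project's actual implementation; the function name and the decision to keep month-only dates as-is are assumptions.

```python
import re

def normalize_date(raw: str) -> str:
    """Map the four facility date formats to one canonical ISO form."""
    raw = raw.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", raw):        # Hospital A: 2024-03-15
        return raw
    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", raw)  # Hospital B: 15/03/2024
    if m:
        return f"{m.group(3)}-{m.group(2)}-{m.group(1)}"
    m = re.fullmatch(r"(\d{2})\.(\d{2})\.(\d{2})", raw)  # Lab chain: 15.03.24
    if m:
        return f"20{m.group(3)}-{m.group(2)}-{m.group(1)}"
    if re.fullmatch(r"\d{4}-\d{2}", raw):              # Clinic network: 2024-03
        return raw  # month precision only; flag downstream rather than invent a day
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Four formats is manageable; the problem is that every new facility adds more, which is why discovery has to be automated rather than hand-coded.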
## The Solution: Two-Layer Architecture
The key insight: separate judgment from execution.
### Layer 1 — Agent (Judgment)
The Elasticsearch Agent Builder agent handles schema discovery and criteria planning:
- Discovers all indices and their mappings via platform tools (`list_indices`, `get_index_mapping`, `search`)
- Builds per-facility field maps (which field holds "age", which holds "conditions")
- Plans structured criteria from the natural language question
- Explains data gaps and caveats BEFORE execution — e.g. "Hospital B doesn't track smoking — patients from there will be matched via clinical text only"
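A per-facility field map might look like the sketch below. The structure and index names are assumptions for illustration (mirroring the table above), not the agent's actual output format.

```python
# Hypothetical field maps the agent could emit after schema discovery.
# A value of None means the facility does not track that concept.
FIELD_MAPS = {
    "hospital_a": {"age": "patient_age", "notes": "clinical_notes", "smoking": "smoking_status"},
    "hospital_b": {"age": "גיל", "notes": "סיכום_רפואי", "smoking": None},
    "lab_chain":  {"age": "age", "notes": "notes", "smoking": None},
    "clinic_net": {"age": "age_group", "notes": "text", "smoking": "smoking"},
}

def remap(doc: dict, facility: str) -> dict:
    """Rename a raw document's fields to the unified schema."""
    fmap = FIELD_MAPS[facility]
    return {concept: doc.get(field) for concept, field in fmap.items() if field}
```

Because the map carries a `None` for untracked concepts, the agent can surface the "Hospital B doesn't track smoking" caveat before any query runs.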
### Layer 2 — Elastic Workflow (Execution)
A deterministic workflow normalizes data — no LLM in the loop:
- Strict pass: iterates over facilities via `foreach`, applies a single parameterized Painless script that normalizes using the agent's field maps
- Semantic kNN pass: uses E5-large embeddings (1024-dim) to find patients whose clinical text matches the research question semantically — catches OCR artifacts, synonyms, and negation
- Count + breakdown: reports totals per facility and confidence level
The output is a persistent `cohort_<name>` index — not a chat response.
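For a sense of the semantic pass, here is a sketch of the kNN request body it might send. The vector field name, `k`, and `num_candidates` values are illustrative assumptions, not the workflow's actual parameters.

```python
# Hypothetical kNN search body for the semantic pass.
def knn_query(question_vector: list, k: int = 100) -> dict:
    return {
        "knn": {
            "field": "clinical_text_vector",  # assumed E5-large embedding field
            "query_vector": question_vector,  # 1024-dim, embedded locally
            "k": k,
            "num_candidates": k * 5,          # wider candidate pool for recall
        },
        "_source": ["patient_id", "clinical_text"],
    }
```

Because the match is on embedding similarity rather than tokens, a scanner-corrupted סוכדת still lands near סוכרת in vector space.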
## Why Not Just a Chatbot?
A chatbot answers questions. This agent creates artifacts.
The cohort index persists after the conversation ends. Researchers can:
- Query it with ES|QL for follow-up analysis
- Filter by `match_confidence`: `strict` vs `probable`
- Inspect per-match provenance (`source_field_map`, `match_explanation`, `knn_score`)
- Share it with colleagues
- Build on it without re-processing raw data
The workflow is deterministic and auditable. Adding a new facility requires no code changes — only the agent discovering and mapping its schema.
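As an example of the follow-up analysis a persisted cohort enables, an ES|QL query like the sketch below runs against the cohort index directly. The index name and column names are assumptions based on the examples in this article.

```python
# Hypothetical ES|QL follow-up against a persisted cohort index.
cohort = "cohort_diabetes_smokers_60plus"  # illustrative cohort name

esql = f"""
FROM {cohort}
| WHERE match_confidence == "strict"
| STATS patients = COUNT(*) BY facility
| SORT patients DESC
""".strip()
```

The query never touches the raw per-facility indices — the normalization cost was paid once, at cohort-creation time.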
## Normalization Impact
Reproducible metrics from the sample corpus (10 indices, 4 facilities):
| Metric | Before | After |
|---|---|---|
| Field names per concept | 5+ variants | 1 unified |
| Date formats | 4 different | 1 canonical |
| Patient ID types | string, int, float | normalized string |
| OCR-corrupted clinical terms | invisible to search | captured via kNN |
| Structured/text contradictions | ~18% undetected | classified with confidence |
The metrics script is included in the repo:

```shell
python3 scripts/metrics.py --data-dir sample_data
```
## Air-Gapped Deployment
Healthcare data cannot leave the network. The entire stack runs on a single VM with no internet dependency:
- LLM: Ollama + Llama 4 (local inference)
- Embeddings: E5-large via Ollama (1024-dim vectors)
- Stack: Elasticsearch 9.3 + Kibana (Agent Builder + Workflows)
The architecture is LLM-agnostic — swap Ollama for any OpenAI-compatible provider without code changes.
## Synthetic Data: e2llm-medsynth
Real patient data can't be used for demos. We built the data generator as a standalone open-source tool:
```shell
pip install e2llm-medsynth
e2llm-medsynth --verbose --output-dir output
```
The noise patterns — OCR character swaps (ר↔ד, ח↔כ), type mismatches, missing fields — are modeled after patterns observed in production healthcare systems. MedSynth makes them reproducible for anyone.
Supports 6 locales (he_IL, ar_SA, ar_EG, es_ES, es_MX, es_AR). MIT licensed.
📦 e2llm-medsynth on PyPI · GitHub
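The OCR character-swap pattern can be sketched in a few lines. This is a simplified illustration of the noise model, not MedSynth's actual API; the function name and rate parameter are assumptions.

```python
import random

# Visually similar Hebrew letters that scanners commonly confuse.
OCR_SWAPS = {"ר": "ד", "ד": "ר", "ח": "כ", "כ": "ח"}

def inject_ocr_noise(text: str, rate: float = 0.1, seed: int = 42) -> str:
    """Randomly swap confusable characters at the given rate (seeded for reproducibility)."""
    rng = random.Random(seed)
    return "".join(
        OCR_SWAPS[ch] if ch in OCR_SWAPS and rng.random() < rate else ch
        for ch in text
    )

# A ר flipped to ד is exactly how סוכרת (diabetes) degrades into the
# סוכדת corruption shown earlier — invisible to exact keyword matching.
```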
## Demo
A researcher types in Hebrew:
מצא חולי סוכרת מעל גיל 60 שמעשנים
(Find diabetic patients over 60 who smoke)
The agent discovers schemas, builds field maps, explains that Hospital B doesn't track smoking (probable matches only via clinical text), then triggers the workflow.
Result: 96 patients from 10 indices, classified by confidence:
- Strict: 36 (structured field match)
- Probable: 60 (semantic similarity of clinical text)
The researcher continues:
באילו מחלקות טופלו?
(Which departments were they treated in?)
The agent queries the cohort index directly with ES|QL — no re-processing of raw data.
הראה לי את ההתאמות הסבירות - למה הן לא ודאיות?
(Show me the probable matches — why aren't they certain?)
The agent queries match_confidence: "probable" and returns per-patient explanations with kNN scores and match reasoning.
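A query for that last step might look like the sketch below — pulling probable matches with their provenance fields. The query body is an assumption; only the field names come from the article.

```python
# Hypothetical query for probable matches plus per-patient provenance.
def probable_matches_query(size: int = 20) -> dict:
    return {
        "query": {"term": {"match_confidence": "probable"}},
        "_source": ["patient_id", "knn_score", "match_explanation", "source_field_map"],
        "sort": [{"knn_score": {"order": "desc"}}],  # strongest semantic matches first
        "size": size,
    }
```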
## Try It
```shell
cp .env.example .env
bash scripts/demo_setup.sh --embed
```
Open Kibana → Agent Builder → Medical Cohort Agent → ask a question.
Two open-source tools came out of this project: the Medical Cohort Agent itself and the e2llm-medsynth data generator.
Built with Elasticsearch Agent Builder + Elastic Workflows for the Elasticsearch Agent Builder Hackathon.