Rory | QIS PROTOCOL

Posted on Apr 13

The OHDSI Network Has 900 Million Patient Records. Here Is Why None of Them Talk to Each Other.

#healthtech #distributedsystems #ohdsi #qis

The OHDSI (Observational Health Data Sciences and Informatics) network is one of the largest distributed clinical research infrastructures ever built. Over 400 institutions. More than 900 million patient records standardized into the OMOP Common Data Model. A federated query system that can ask a research question and get answers from Boston, Berlin, Tokyo, and São Paulo within hours — without any patient data leaving any hospital.

This is a genuine engineering achievement. It is also operating at under 1% of its potential intelligence capacity: 400 pooled answers against 79,800 possible site-to-site comparisons.

Here is the problem that nobody in Rotterdam this week is saying out loud.

What OHDSI Actually Does

When an OHDSI researcher runs a distributed network study, the sequence looks like this:

1. Research question is formalized as an OMOP CDM analysis package
2. Package is distributed to participating sites (100–400+ institutions)
3. Each site runs the analysis locally against their patient data
4. Each site returns aggregate results (summary statistics, not patient records)
5. Results are pooled at a central coordinating center
6. A meta-analysis is published, often 6–18 months later
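Steps 3 and 4 of that sequence can be sketched as follows. This is a minimal illustration, not OHDSI tooling: `PatientRow`, `SiteSummary`, and `summarize_site` are hypothetical names, and a real study would run an OMOP CDM analysis package against the site database rather than a Python list.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PatientRow:
    # Hypothetical, simplified stand-in for a row derived from local OMOP CDM tables.
    person_id: int
    survived_3yr: bool


@dataclass(frozen=True)
class SiteSummary:
    # Only aggregates leave the site; no person_id is ever included.
    site: str
    n: int
    survivors: int

    @property
    def survival_rate(self) -> float:
        return self.survivors / self.n


def summarize_site(site: str, rows: Iterable[PatientRow]) -> SiteSummary:
    """Run the analysis locally and return summary statistics only."""
    rows = list(rows)
    if not rows:
        raise ValueError("site has no eligible patients")
    survivors = sum(1 for r in rows if r.survived_3yr)
    return SiteSummary(site=site, n=len(rows), survivors=survivors)


# Illustrative rows only, not real patient data.
local_rows = [PatientRow(1, True), PatientRow(2, False), PatientRow(3, True), PatientRow(4, True)]
summary = summarize_site("site-a", local_rows)
print(summary.n, summary.survivors, summary.survival_rate)
```

The object that leaves the site carries counts and a rate, which is what makes the privacy guarantee possible and also what limits how much the coordinating center can learn from it.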

This is the state of the art in observational health research. It is genuinely better than what came before it.

The problem is step 5. When results are "pooled at a central coordinating center," what actually happens is: a statistician looks at 400 sets of aggregate numbers and runs a weighted average. The numbers come together. The intelligence does not.
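To make the pooling step concrete, here is a minimal sketch of a sample-size-weighted average. It is an assumption about the simplest possible pooling, not a description of any specific OHDSI study; real meta-analyses typically use inverse-variance or random-effects models. The function name and the numbers are illustrative.

```python
from typing import Sequence, Tuple


def pooled_rate(site_results: Sequence[Tuple[int, float]]) -> float:
    """Sample-size-weighted average of per-site rates.

    Each entry is (n_patients, rate). This is the simplest form of pooling,
    used here only to show what information survives the step.
    """
    total_n = sum(n for n, _ in site_results)
    if total_n == 0:
        raise ValueError("no patients across sites")
    return sum(n * rate for n, rate in site_results) / total_n


# Illustrative numbers only, not real study data.
sites = [(4200, 0.62), (3800, 0.42), (2000, 0.50)]
estimate = pooled_rate(sites)
print(round(estimate, 3))
```

The pooled figure sits between the sites. Nothing in it records which site was 10 points above it, which was 10 points below, or why.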

The Gap That 400 Sites Cannot See

Consider what the OHDSI network knows right now that no researcher has access to:

A hospital in Rotterdam has treated 4,200 patients with a specific chemotherapy protocol. Their 3-year survival rate is 12 points above the OHDSI network average. They have identified which patient subgroup drives that outperformance. They have not published it yet — it is in their local analysis results, not in any shared system.

A hospital in Columbus has the same patient subgroup profile. They are seeing 3-year survival rates 8 points below the network average. They have tried three protocol variations. None worked. They are about to try the fourth.

These two institutions are in the same OHDSI network. They are running OMOP CDM. They have answered the same queries. Their aggregate results sit in the same coordinating center database.

They have never learned from each other.

The Columbus team does not know Rotterdam cracked their problem. The Rotterdam team does not know Columbus is running the experiments that would validate their approach. The OHDSI central coordinator sees both result sets but has no mechanism to route the insight from one to the other.

This is not a data access problem. Both sites are already in the network. This is an architecture problem.

Why Federated Queries Are Not Enough

The OHDSI model — like all federated query systems — aggregates answers. It does not route intelligence.

The distinction matters mathematically. With N participating sites, federated queries produce N independent answer sets. A central analysis looks at all N answers and produces one pooled estimate. The intelligence produced is linear in N.

What the network is actually capable of: N(N-1)/2 unique synthesis pairs between sites that share clinical problems. At 400 sites, that is 79,800 synthesis opportunities. Every one of them is currently running at zero.

The German 33-hospital FHIR feasibility study (Gruendner et al., 2019, PMC) documented this explicitly: distributed federated queries across hospital networks produce independent answers that must be manually reconciled. The reconciliation step — where a human statistician decides how to weight and interpret heterogeneous results — is where intelligence gets lost. The math says 33 hospitals should produce 528 synthesis paths. The study produced one pooled estimate.

That gap — between N answers and N(N-1)/2 synthesis pairs — is the architectural ceiling of every federated query system in existence.
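The pair counts quoted above are easy to check. The sketch below computes N(N-1)/2 and cross-checks it against explicit enumeration; `synthesis_pairs` is a name chosen for this example. The result is an upper bound, since it assumes every pair of sites shares a clinical problem.

```python
from itertools import combinations


def synthesis_pairs(n_sites: int) -> int:
    """Number of unordered site pairs: N(N-1)/2."""
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return n_sites * (n_sites - 1) // 2


for n in (33, 400):
    print(n, synthesis_pairs(n))

# Cross-check the closed form against explicit enumeration for a small N.
assert synthesis_pairs(33) == len(list(combinations(range(33), 2)))
```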

What Distributed FHIR Query Routing Actually Requires

TEFCA (the Trusted Exchange Framework and Common Agreement) went into effect in January 2026, mandating FHIR-based health data exchange across US health systems. The European Health Data Space (EHDS) is driving equivalent standardization across EU member states. Every major health data infrastructure is now building on FHIR as the exchange layer.

FHIR solves data standardization and exchange. It does not solve synthesis.

What a distributed FHIR query routing system needs — and currently does not have — is a layer that:

Takes query results from individual FHIR nodes
Distills them into compact, comparable outcome representations
Routes those representations to other nodes that share the same clinical question
Enables each node to synthesize remotely-derived insights locally, without raw data transfer
Closes the feedback loop so that what works at one node reaches the nodes that need it
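The five responsibilities above can be sketched with an in-memory toy. Everything here is hypothetical: `Outcome`, `InMemoryRouter`, and `synthesize_locally` are names invented for illustration, the numbers are made up, and a real deployment would replace the dictionary with a networked store.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import DefaultDict, List


@dataclass(frozen=True)
class Outcome:
    # Compact, comparable representation of one node's result (hypothetical shape).
    node: str
    clinical_question: str
    effect_vs_mean: float  # percentage points relative to the network mean
    n: int


class InMemoryRouter:
    """Toy stand-in for the routing layer: results are grouped by clinical question."""

    def __init__(self) -> None:
        self._by_question: DefaultDict[str, List[Outcome]] = defaultdict(list)

    def post(self, outcome: Outcome) -> None:
        self._by_question[outcome.clinical_question].append(outcome)

    def fetch(self, clinical_question: str, exclude_node: str) -> List[Outcome]:
        return [o for o in self._by_question.get(clinical_question, []) if o.node != exclude_node]


def synthesize_locally(own: Outcome, peers: List[Outcome]) -> dict:
    """Runs on the receiving node; only distilled outcomes are read, never raw records."""
    better = sorted((p for p in peers if p.effect_vs_mean > own.effect_vs_mean),
                    key=lambda p: p.effect_vs_mean, reverse=True)
    return {
        "peer_count": len(peers),
        "peer_mean_effect": mean(p.effect_vs_mean for p in peers) if peers else None,
        "nodes_outperforming": [p.node for p in better],
    }


router = InMemoryRouter()
question = "protocol-X/subtype-A/survival-3yr"
router.post(Outcome("node-a", question, +12.0, 4200))
router.post(Outcome("node-b", question, -8.0, 3100))
router.post(Outcome("node-c", question, +3.0, 900))

own = Outcome("node-b", question, -8.0, 3100)
report = synthesize_locally(own, router.fetch(question, exclude_node="node-b"))
print(report)
```

The design point is that `fetch` returns per-node outcomes rather than a pooled number, so the receiving node can see which peers outperform it on the same question.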

This is not a FHIR feature. It is not an OMOP CDM feature. It is an intelligence routing layer that operates above both.

The Architecture That Closes the Loop

Quadratic Intelligence Swarm (QIS), discovered by Christopher Thomas Trevethan on June 16, 2025, is the architecture that adds this layer.

The QIS loop applied to an OHDSI/FHIR network looks like this:

OMOP CDM Node (e.g., Rotterdam)
     │
     │  Local analysis runs. Results stay local.
     │
     ▼
Distillation: Query outcome → Outcome Packet (~512 bytes)
     {
       "semantic_address": "chemo-protocol-X/patient-subtype-A/survival-3yr",
       "outcome": "12pt above network mean",
       "n": 4200,
       "context_hash": "omop_cdm_v5.4/concept_set_1847",
       "timestamp": "2026-04-13T...",
       "node_id": "anon_hash_8a7f..."
     }
     │
     ▼
Routing: Outcome packet posted to semantic address
(DHT, vector DB, REST API — any mechanism works)
     │
     ▼
Columbus node queries same semantic address
Receives Rotterdam packet + 23 other packets from nodes
with same patient subtype and protocol
     │
     ▼
Local synthesis on Columbus hardware (milliseconds)
Result: "3 nodes achieved >10pt above mean.
All three used modified dosing schedule in week 3.
Columbus current protocol differs at exactly that step."
     │
     ▼
Columbus adjusts protocol hypothesis before experiment 4.
Not because Rotterdam published. Because the loop closed.

No patient data left Rotterdam. No patient data entered Columbus. The synthesis happened on Columbus hardware using 512-byte packets that contain no patient identifiers, no raw records, no model weights.
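As a rough check on the packet format, the sketch below serializes the fields from the diagram above and enforces a 512-byte budget. The field names come from that diagram; `OutcomePacket`, `anonymize_node`, `encode_packet`, the salt, and the hashing scheme are assumptions made for this example, not part of any published QIS specification.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

MAX_PACKET_BYTES = 512  # size budget taken from the article's "~512 bytes"


@dataclass(frozen=True)
class OutcomePacket:
    semantic_address: str
    outcome: str
    n: int
    context_hash: str
    timestamp: str
    node_id: str


def anonymize_node(site_name: str, salt: str) -> str:
    """Hypothetical scheme: salted SHA-256, truncated. Not a vetted anonymization method."""
    digest = hashlib.sha256((salt + site_name).encode("utf-8")).hexdigest()
    return "anon_hash_" + digest[:12]


def encode_packet(packet: OutcomePacket) -> bytes:
    """Serialize to compact JSON and enforce the size budget."""
    payload = json.dumps(asdict(packet), separators=(",", ":"), sort_keys=True).encode("utf-8")
    if len(payload) > MAX_PACKET_BYTES:
        raise ValueError(f"packet is {len(payload)} bytes, budget is {MAX_PACKET_BYTES}")
    return payload


packet = OutcomePacket(
    semantic_address="chemo-protocol-X/patient-subtype-A/survival-3yr",
    outcome="12pt above network mean",
    n=4200,
    context_hash="omop_cdm_v5.4/concept_set_1847",
    timestamp=datetime.now(timezone.utc).isoformat(timespec="seconds"),
    node_id=anonymize_node("rotterdam-example", salt="demo-salt"),
)
wire = encode_packet(packet)
print(len(wire), "bytes")
```

Because the packet is a fixed set of aggregate fields, a reviewer can audit what crosses the site boundary by reading the dataclass definition.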

This is not incremental improvement over federated queries. It is a different category of operation.

The Math Against Federated Learning

Federated learning — the other common approach to distributed health data — makes the gap explicit.

Standard federated learning (McMahan et al., 2017) requires:

Central aggregator (single point of failure)
Full model weight synchronization each round
Bandwidth proportional to model size × number of sites
Minimum viable site count (typically N≥20 for statistical meaningfulness)
N=1 sites are excluded by architecture — rare disease single-site institutions cannot participate

QIS routing requires:

No central aggregator
~512-byte outcome packets (not model weights)
Bandwidth constant regardless of model size
N=1 sites participate fully — their single outcome packet is as valid as any other node's
No synchronization rounds — continuous, real-time
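The bandwidth contrast in the two lists can be put into back-of-the-envelope numbers. Every input below is an illustrative assumption (model size, round count, packets fetched per site), and the two approaches do not deliver the same artifact, so this compares traffic only.

```python
def federated_learning_bytes(model_bytes: int, sites: int, rounds: int) -> int:
    """Each round, every site downloads the global model and uploads an update of the same size."""
    return 2 * model_bytes * sites * rounds


def outcome_packet_bytes(packet_bytes: int, sites: int, packets_fetched_per_site: int) -> int:
    """Every site posts one packet and fetches a fixed number of peer packets."""
    return packet_bytes * sites * (1 + packets_fetched_per_site)


# All inputs below are illustrative assumptions, not measurements.
MODEL_BYTES = 100 * 1024 * 1024   # a 100 MiB model
SITES = 400
ROUNDS = 50
PACKET_BYTES = 512
FETCHED = 24                      # peer packets pulled per site, as in the Columbus example

fl = federated_learning_bytes(MODEL_BYTES, SITES, ROUNDS)
qp = outcome_packet_bytes(PACKET_BYTES, SITES, FETCHED)
print(f"federated learning: {fl / 1024**3:.1f} GiB")
print(f"outcome packets:    {qp / 1024**2:.2f} MiB")
```

Changing `MODEL_BYTES` moves only the first figure, which is the sense in which packet traffic is independent of model size.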

The critical difference: federated learning aggregates intelligence centrally. QIS routes intelligence to where it is needed. The first approach requires a center. The second approach has no center to require.

For an OHDSI network where rare disease research is a core use case — where some diseases have fewer than 10 sites worldwide — federated learning's N≥20 minimum is not a technical limitation. It is a structural exclusion of the research that most needs multi-site synthesis.

OHDSI Rotterdam: What 400 Sites Already Have

The OHDSI network, as it exists today, already has every component QIS needs except the routing layer itself:

Component	OHDSI Status	QIS Requirement
Standardized data model (OMOP CDM)	✓ Live, 400+ sites	✓ Any schema works
Privacy-preserving query execution	✓ Local execution, aggregate results only	✓ QIS requires local execution
Site connectivity infrastructure	✓ ATLAS + WebAPI	✓ Any HTTP transport works
Clinical domain expertise	✓ Distributed across sites	✓ Required for similarity address definition
Multi-site participation	✓ 400+ sites, 60+ countries	✓ N scales quadratically
FHIR compatibility	✓ OMOP-on-FHIR mapping active	✓ QIS is standard-agnostic

The one missing component: outcome routing between sites that share a clinical problem.

Adding QIS to OHDSI does not require replacing ATLAS. It does not require replacing OMOP CDM. It does not require a data migration. It requires adding an outcome packet routing layer above the existing infrastructure — a layer that distills OMOP query results into packets, routes them by semantic similarity, and enables local synthesis at the receiving node.

Zero integration cost for the existing stack is not hyperbole. The OHDSI nodes already run the queries. The packets are the distilled results of those queries. The routing layer is the one piece not yet built.

The OHDSI Routing Gap in One Question

Here is the question that exposes the gap:

When your OHDSI node runs a distributed network study and receives pooled aggregate results from 150 sites, can you identify which 3 of those 150 sites have the most similar patient population to yours, and pull their specific analysis results for direct comparison?

Today: No. The OHDSI architecture does not support semantic routing of results between nodes. You receive the pooled estimate. You do not receive node-to-node intelligence.

With QIS outcome routing: Yes. Each node posts its results to a semantic address. Your node queries that address. You receive the packets from your closest clinical twins. Your synthesis happens locally. You do not wait 18 months for a publication.
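One way to picture "closest clinical twins" is a nearest-neighbor lookup over aggregate population profiles. The sketch below uses cosine similarity, which is a stand-in chosen for this example and not a confirmed part of QIS; the site IDs, feature choices, and values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_sites(own: Sequence[float], others: Dict[str, Sequence[float]],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Return the k sites whose aggregate profile is most similar to ours."""
    scored = [(site, cosine_similarity(own, profile)) for site, profile in others.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical aggregate population profiles:
# (mean age / 100, fraction female, fraction with diabetes, fraction stage III+)
own_profile = (0.64, 0.52, 0.31, 0.40)
network = {
    "site-017": (0.63, 0.50, 0.30, 0.42),
    "site-102": (0.41, 0.55, 0.08, 0.10),
    "site-233": (0.66, 0.49, 0.33, 0.38),
    "site-310": (0.58, 0.90, 0.12, 0.75),
    "site-377": (0.65, 0.53, 0.29, 0.41),
}
top3 = closest_sites(own_profile, network)
for site, score in top3:
    print(site, round(score, 4))
```

The profiles are themselves aggregates, so the similarity lookup does not require patient-level data from any site.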

This is not a criticism of OHDSI. The OHDSI network was not designed to route intelligence — it was designed to run distributed queries and protect patient privacy. It succeeds at both. QIS adds the layer that OHDSI was never designed to provide.

A Note for OHDSI Europe Symposium Attendees

The OHDSI Europe Symposium runs April 18–20 in Rotterdam. The agenda includes sessions on distributed query optimization, federated study coordination, and FHIR-OMOP interoperability.

These are exactly the problems the routing layer described in this article addresses.

If you are attending in Rotterdam and want to discuss the QIS architecture in the context of your specific OHDSI research question, the research license is free for academic researchers. See the research license page; self-certification takes under two minutes.

Christopher Thomas Trevethan, the discoverer of QIS and the holder of 39 provisional patents on the architecture, is available for direct consultation with OHDSI researchers and site coordinators.

The Routing Layer Is the Missing Infrastructure

Every major health data infrastructure in 2026 — OHDSI, EHDS, TEFCA, NHS Federated AI, ADHA — is building the query layer. None of them is building the routing intelligence layer.

The query layer lets you ask the same question across 400 nodes. The routing layer lets each node learn from the answers of nodes that share its exact clinical problem.

These are not the same thing. The first exists. The second does not yet exist — except in the architecture that Christopher Thomas Trevethan discovered on June 16, 2025.

The 79,800 synthesis paths the OHDSI network is not running right now are not waiting for more data, more sites, or more funding.

They are waiting for the routing layer.

Christopher Thomas Trevethan is the discoverer of QIS (Quadratic Intelligence Swarm). 39 provisional patents filed. IP protection is in place. QIS Protocol is free for academic and nonprofit research use, subject to research license self-certification.
