If you work in the OHDSI ecosystem, you know the architecture well. An OHDSI distributed query reaches dozens of DataPartners simultaneously, each running the same ATLAS cohort definition against their local OMOP CDM, each returning aggregate statistics — counts, proportions, incidence rates. ACHILLES characterizes the population. HADES packages the analysis. Results return to the coordinating center, where synthesis happens. It is a rigorous, reproducible, and deeply valuable approach to federated OHDSI query execution that has produced some of the most important real-world evidence of the last decade.
And it has a structural ceiling.
The OHDSI network routing protocol, as currently implemented, is synchronous and unidirectional. Queries go out in rounds. Aggregate results come back. Intelligence accumulates at the coordinating center, not across the network. DataPartners below a statistical disclosure threshold — a privacy floor that affects every rare disease study — return suppressed cells. N=1 sites contribute nothing. Real-time learning is not the design goal.
This article introduces QIS routing as a complement to OHDSI's OMOP CDM infrastructure — not a replacement for it. Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol, and the mapping between QIS outcome routing and OHDSI distributed phenotyping is closer than most distributed computing researchers would expect. OHDSI's semantic standardization work — OMOP CDM vocabularies, cohort logic, DataPartner network topology — is not an obstacle to QIS routing. It is the exact foundation that makes QIS outcome routing more precise.
The OHDSI Architecture: What It Does and Where It Stops
The OHDSI Collaborative (Hripcsak et al., 2015, JAMIA) built its network on a fundamental insight: if every participating institution transforms its local health data into a common data model, comparative analyses become possible without centralizing raw patient records. The OMOP CDM (Garza et al., 2016) standardizes clinical data across institutions using shared controlled vocabularies — SNOMED CT for conditions, RxNorm for drug exposures, LOINC for laboratory measurements, ICD-10 for diagnosis coding. This standardization is extraordinary in scope. Hundreds of DataPartners across North America, Europe, Asia, and Africa now hold data in a common format.
ATLAS translates clinical research questions into executable cohort definitions. A researcher at Columbia defines inclusion criteria — condition onset, drug exposure window, measurement thresholds — and ATLAS generates standardized queries. HADES (formerly the OHDSI Methods Library) distributes those queries via a coordinating center interface. DataPartners execute locally, return aggregate statistics, and the coordinating center synthesizes the results into a study package. ACHILLES runs characterization analyses on each DataPartner's data, exposing data quality and population characteristics through standard dashboards.
The Book of OHDSI (2019) describes this as a "federated" design, and it is, in the governance sense. Data stays local. Computations run locally. Only aggregate statistics cross institutional boundaries.
But the mechanics of this federation are important to examine precisely:
```text
Coordinating Center
     |
     |  → Cohort Definition Query
     |
┌────┴───────────────────────────────────────┐
↓              ↓              ↓              ↓
DataPartner    DataPartner    DataPartner    DataPartner
(runs          (runs          (runs          (runs
locally)       locally)       locally)       locally)
↓              ↓              ↓              ↓
└────┬───────────────────────────────────────┘
     |
     |  ← Aggregate Counts Only
     |
Coordinating Center
(synthesis happens here)
```
Every synthesis operation happens at one location. No DataPartner learns from any other DataPartner during this process. The network accumulates knowledge in batch rounds, not continuously. And any DataPartner that cannot return a statistically disclosable count for a given cohort — the privacy floor problem — contributes nothing.
The Structural Limit: Synchronous Queries, One-Way Learning
George Hripcsak and the OHDSI founding collaborators designed a system optimized for rigorous epidemiology. That design goal is correct. The problem is that synchronous cohort queries across OMOP CDM instances create a learning architecture with three structural constraints that no extension to the current model can fully resolve:
Constraint 1: Synthesis location. Intelligence accumulates at the coordinating center. A DataPartner in Zurich running a cohort query for rare autoimmune conditions cannot synthesize its findings with a DataPartner in Dublin without routing through the coordinating center. The network has no peer-to-peer learning pathway.
Constraint 2: Participation floor. Every OHDSI network operates under statistical disclosure rules. If fewer than 5 (or 10, depending on DataPartner policy) patients match a cohort definition, the cell is suppressed. For common conditions, this is a minor inconvenience. For rare diseases — where a DataPartner might see one patient per year with a given presentation — this is a complete exclusion. N=1 sites contribute zero, even though a single rare disease outcome may be the most informative observation in the network.
Constraint 3: Synchronous rounds. ATLAS distributed cohort queries run in coordinated rounds. A DataPartner that completes its analysis on Tuesday cannot contribute to synthesis until all other DataPartners have also completed their runs and results have been returned to the coordinating center. Learning is batch-scheduled, not continuous.
These are not bugs in OHDSI's implementation — they are consequences of the design model. Fixing them requires a different model, not a better implementation of the existing one.
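To make Constraint 3 concrete, here is a minimal sketch with made-up completion times. The site names and hours are hypothetical, and the model is deliberately simplified: in a round-based design no result is usable until the slowest site has reported, while in a continuous design each result is usable when it arrives.

```python
from typing import Dict

# Hypothetical completion times, in hours after the query is issued.
completion_hours: Dict[str, float] = {
    "site_a": 4.0,
    "site_b": 30.0,
    "site_c": 72.0,
    "site_d": 9.5,
}


def round_based_availability(times: Dict[str, float]) -> Dict[str, float]:
    """Every result becomes usable only when the slowest site has finished."""
    round_closes = max(times.values())
    return {site: round_closes for site in times}


def continuous_availability(times: Dict[str, float]) -> Dict[str, float]:
    """Each result becomes usable as soon as its own site has finished."""
    return dict(times)


def mean_wait(availability: Dict[str, float]) -> float:
    """Average number of hours before a site's result can be used."""
    return sum(availability.values()) / len(availability)


batch = round_based_availability(completion_hours)
streaming = continuous_availability(completion_hours)
print(f"Mean hours until usable (round-based): {mean_wait(batch):.1f}")
print(f"Mean hours until usable (continuous):  {mean_wait(streaming):.1f}")
```

With these invented numbers the round-based wait is set entirely by `site_c`. The gap depends on how uneven completion times are, not on anything specific to OHDSI tooling.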
QIS Routing: Outcome Packets, Not Cohort Queries
Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol — a routing architecture in which the unit of exchange is not a query but an outcome packet. Instead of asking every DataPartner "how many patients in your OMOP CDM match this cohort definition?", QIS routes ~512-byte summaries of what happened to individual patients to semantically matched nodes across the network.
The protocol loop has five steps:
- Local ingestion. Patient data remains at the originating node. No raw clinical data leaves the institution.
- Semantic fingerprinting. Standardized clinical attributes — condition concept IDs from SNOMED CT, drug exposure codes from RxNorm, lab values mapped to LOINC — are transformed into a compact routing fingerprint. The fingerprint captures clinical similarity without containing PHI.
- Deterministic routing. The fingerprint hash becomes a semantic address. QIS routes the outcome packet to nodes holding outcomes from similar patients. The routing layer is protocol-agnostic: a distributed hash table (Kademlia), a vector index, a pub/sub topic, or, directly relevant to OHDSI deployments, an OMOP CDM database with indexed lookup on concept ID (effectively constant-time with a hash index, logarithmic with a typical B-tree index). Standard OMOP concept IDs are already deterministic keys; no translation layer is required.
- Peer synthesis. The querying node receives outcome packets from semantically matched peers. With N matched peers, there are N(N-1)/2 possible pairings, so the number of pairwise synthesis opportunities grows quadratically as participants join.
- Outcome reporting. After treatment or observation, the node emits a new outcome packet. The network becomes more accurate for every future query at that semantic address.
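The last two steps of the loop can be sketched with an in-memory stand-in for the routing layer. Everything here is an assumption made for illustration: `address_store`, `deposit`, `synthesize`, the routing key string, and the scores are hypothetical, and a plain mean stands in for whatever synthesis method a real deployment would use.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical in-memory stand-in for the routing layer: routing key -> packets.
address_store = defaultdict(list)


def deposit(routing_key, node_id, outcome_score):
    """Outcome reporting: a node records an outcome at a semantic address."""
    address_store[routing_key].append({"node": node_id, "score": outcome_score})


def synthesize(routing_key):
    """Peer synthesis: summarize what matched peers reported at this address."""
    packets = address_store.get(routing_key, [])
    nodes = sorted({p["node"] for p in packets})
    return {
        "n_packets": len(packets),
        "mean_score": mean(p["score"] for p in packets) if packets else None,
        # Every unordered pair of contributing nodes is one pairing.
        "peer_pairs": list(combinations(nodes, 2)),
    }


for node, score in [
    ("dublin", 0.74),
    ("zurich", 0.61),
    ("des_moines", 0.80),
    ("columbus", 0.69),
]:
    deposit("sle-hcq-high", node, score)

summary = synthesize("sle-hcq-high")
print(summary["n_packets"], "packets,", len(summary["peer_pairs"]), "peer pairs")
```

Four contributing nodes yield six pairs, which is N(N-1)/2 for N = 4. Counting pairs says nothing about how useful each pairing is; that depends on the synthesis method.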
Mapping QIS to OHDSI Architecture
The correspondence between OHDSI components and QIS routing is precise enough to warrant a direct mapping:
| OHDSI Component | QIS Equivalent | Key Difference |
|---|---|---|
| OMOP CDM standardized vocabulary | QIS semantic fingerprint fields | OMOP concept IDs map directly — no translation layer |
| ATLAS cohort definition | QIS similarity template | Same domain logic; different execution model |
| DataPartner | QIS edge node | Same institutional boundary; different data exchange pattern |
| Coordinating center query | QIS address lookup | Same intent; no coordinator required |
| ACHILLES characterization | QIS outcome accumulation at semantic address | ACHILLES is batch; QIS accumulation is continuous |
| HADES distributed execution | QIS peer-to-peer routing | HADES requires coordinator; QIS is fully decentralized |
OMOP CDM's controlled vocabularies do not need to be translated into a new schema for QIS routing. A SNOMED CT concept ID for a condition, an RxNorm code for a drug exposure, and a LOINC code for a laboratory measurement are already deterministic, standardized identifiers. They are exactly the fields that appear in a QIS semantic fingerprint.
This means that any institution running an OMOP CDM can participate in QIS routing with a thin adapter layer — not a data migration.
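As a rough idea of what such an adapter might look like, the sketch below turns three OMOP CDM rows (represented as plain dicts) into fingerprint fields. The column names follow the OMOP CDM tables (`condition_concept_id`, `drug_concept_id`, `measurement_concept_id`, `value_as_number`, `range_low`, `range_high`). The bucketing rules, the `"UNKNOWN"` bucket, and the function names are assumptions of this sketch, not part of any published QIS or OHDSI interface, and the ICD-10 block field is omitted for brevity.

```python
from typing import Optional


def bucket_measurement(
    value: Optional[float], low: Optional[float], high: Optional[float]
) -> str:
    """Bucket a raw value against its reference range (assumed rule)."""
    if value is None or low is None or high is None:
        return "UNKNOWN"
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


def bucket_followup(years: float) -> str:
    """Bucket follow-up time into the ranges used in this article."""
    if years < 1:
        return "0-1"
    if years < 3:
        return "1-3"
    if years < 5:
        return "3-5"
    return "5+"


def to_fingerprint_fields(
    condition_row: dict, drug_row: dict, measurement_row: dict, followup_years: float
) -> dict:
    """Keep only concept IDs and buckets; person_id, dates, and raw values are dropped."""
    return {
        "condition_concept_id": condition_row["condition_concept_id"],
        "drug_concept_id": drug_row["drug_concept_id"],
        "measurement_concept_id": measurement_row["measurement_concept_id"],
        "measurement_value_bucket": bucket_measurement(
            measurement_row.get("value_as_number"),
            measurement_row.get("range_low"),
            measurement_row.get("range_high"),
        ),
        "followup_years_bucket": bucket_followup(followup_years),
    }


# Illustrative rows; the concept IDs are the article's example values.
fields = to_fingerprint_fields(
    {"person_id": 1, "condition_concept_id": 4141365, "condition_start_date": "2021-03-02"},
    {"person_id": 1, "drug_concept_id": 1304919},
    {"person_id": 1, "measurement_concept_id": 3024561,
     "value_as_number": 85.0, "range_low": 0.0, "range_high": 30.0},
    followup_years=2.2,
)
print(fields)
```

The adapter reads from the existing tables and writes nothing back, which is why no data migration is involved.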
The Math: Why N(N-1)/2 Changes the Equation
OHDSI's current distributed query model follows a precise information flow:
1 query → N DataPartners → N aggregate responses → 1 synthesis at coordinator
For 100 DataPartners, this produces 100 aggregate data points that one coordinating center synthesizes. The network's learning bandwidth is proportional to N.
QIS routing produces a different structure:
N nodes → N(N-1)/2 pairwise synthesis opportunities
For 100 DataPartners: 4,950 synthesis pairs versus 100 aggregate responses.
```python
# OHDSI distributed query model
n_data_partners = 100
ohdsi_synthesis_points = n_data_partners  # 100 aggregate responses, synthesized at 1 coordinator

# QIS routing model
qis_synthesis_pairs = n_data_partners * (n_data_partners - 1) // 2  # 4,950

print(f"OHDSI synthesis points: {ohdsi_synthesis_points}")
print(f"QIS synthesis pairs: {qis_synthesis_pairs}")
print(f"Ratio: {qis_synthesis_pairs / ohdsi_synthesis_points}x")

# Output:
# OHDSI synthesis points: 100
# QIS synthesis pairs: 4950
# Ratio: 49.5x
```
At 1,000 DataPartners, the divergence becomes more dramatic: 1,000 OHDSI synthesis points versus 499,500 QIS synthesis pairs, a 499.5x multiplier on synthesis bandwidth, achieved without adding a single server.
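The ratios in this section follow from one line of algebra. As a worked check using the article's own counting (N aggregate responses in the coordinator model, N(N-1)/2 pairs in the pairwise model):

```latex
\[
\frac{N(N-1)/2}{N} = \frac{N-1}{2},
\qquad
\left.\frac{N-1}{2}\right|_{N=100} = 49.5,
\qquad
\left.\frac{N-1}{2}\right|_{N=1000} = 499.5
\]
\[
\binom{N+1}{2} - \binom{N}{2} = \frac{(N+1)N}{2} - \frac{N(N-1)}{2} = N
\]
```

The second line is the marginal view: when a network of N participants gains one more, the number of possible pairs grows by N.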
Python Reference: QIS OMOP Outcome Router
The following implementation shows how an OMOP-standardized patient event becomes a QIS outcome packet with a deterministic routing key derived from standard OMOP concept IDs.
```python
import hashlib
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class OMOPEventFingerprint:
    """
    Semantic fingerprint derived from OMOP CDM standardized fields.
    All concept IDs use standard OMOP vocabulary identifiers.
    No PHI is included in any field.
    """
    # SNOMED CT concept ID (OMOP domain: Condition)
    condition_concept_id: int  # e.g., 4141365 = Systemic lupus erythematosus
    # RxNorm concept ID (OMOP domain: Drug Exposure)
    drug_concept_id: int  # e.g., 1304919 = hydroxychloroquine
    # LOINC concept ID (OMOP domain: Measurement)
    measurement_concept_id: int  # e.g., 3024561 = anti-dsDNA antibody
    # Normalized measurement value bucket (a privacy-preserving bucket, not the raw value)
    measurement_value_bucket: str  # e.g., "HIGH", "NORMAL", "LOW", "CRITICAL"
    # ICD-10 domain (for rare disease routing precision)
    icd10_block: str  # e.g., "M32" (Systemic lupus erythematosus)
    # Observation period bucket (years of follow-up, bucketed)
    followup_years_bucket: str  # e.g., "0-1", "1-3", "3-5", "5+"
    # Outcome metric (normalized 0-1; derived locally; not raw outcome data)
    outcome_score: float
    # Optional: visit type context
    visit_concept_id: Optional[int] = None  # e.g., 9201 = Inpatient Visit


@dataclass
class QISOMOPOutcomePacket:
    """
    ~512-byte outcome packet routable across QIS nodes.
    Routing key is deterministically derived from OMOP concept IDs.
    Contains no PHI; safe to route across institutional boundaries.
    """
    routing_key: str  # SHA-256 of categorical fingerprint fields
    outcome_score: float  # Normalized treatment outcome
    timestamp: int  # Unix epoch (no date of service, no patient ID)
    context_hash: str  # Non-reversible summary of remaining context
    omop_condition_id: int  # SNOMED CT concept ID; enables O(1) OMOP DB lookup
    omop_drug_id: int  # RxNorm concept ID
    omop_measurement_id: int  # LOINC concept ID
    packet_version: str = "1.0"


class QISOMOPOutcomeRouter:
    """
    Routes OMOP-standardized patient events as QIS outcome packets.
    OMOP CDM's standardized concept IDs map directly to QIS semantic
    fingerprint fields. No translation layer required. Standard OMOP
    concept IDs serve as deterministic routing keys, supporting O(1)
    lookup against any OMOP CDM database as a valid QIS routing backend.
    """

    def __init__(self, node_id: str):
        self.node_id = node_id

    def build_routing_key(self, fingerprint: OMOPEventFingerprint) -> str:
        """
        Derives a deterministic routing key from categorical OMOP fields.
        Continuous values (outcome_score, raw measurements) are excluded;
        only concept IDs and bucketed categorical fields contribute to the key.
        This ensures that two institutions seeing clinically equivalent patients
        route to the same semantic address, regardless of local measurement scale.
        """
        categorical_fields = {
            "condition_concept_id": fingerprint.condition_concept_id,
            "drug_concept_id": fingerprint.drug_concept_id,
            "measurement_concept_id": fingerprint.measurement_concept_id,
            "measurement_value_bucket": fingerprint.measurement_value_bucket,
            "icd10_block": fingerprint.icd10_block,
            "followup_years_bucket": fingerprint.followup_years_bucket,
        }
        # sort_keys gives a canonical serialization, so field order cannot change the hash
        serialized = json.dumps(categorical_fields, sort_keys=True)
        return hashlib.sha256(serialized.encode()).hexdigest()

    def build_context_hash(self, fingerprint: OMOPEventFingerprint) -> str:
        """
        Non-reversible summary of full fingerprint including outcome.
        Used for synthesis verification; cannot reconstruct source record.
        """
        full_fields = asdict(fingerprint)
        serialized = json.dumps(full_fields, sort_keys=True)
        return hashlib.sha256(serialized.encode()).hexdigest()[:16]

    def emit_outcome_packet(
        self, fingerprint: OMOPEventFingerprint
    ) -> QISOMOPOutcomePacket:
        """
        Transforms an OMOP-standardized patient event into a routable
        QIS outcome packet. Raw clinical data remains at the edge node.
        """
        routing_key = self.build_routing_key(fingerprint)
        context_hash = self.build_context_hash(fingerprint)
        packet = QISOMOPOutcomePacket(
            routing_key=routing_key,
            outcome_score=fingerprint.outcome_score,
            timestamp=int(time.time()),
            context_hash=context_hash,
            omop_condition_id=fingerprint.condition_concept_id,
            omop_drug_id=fingerprint.drug_concept_id,
            omop_measurement_id=fingerprint.measurement_concept_id,
        )
        return packet

    def lookup_routing_address(self, routing_key: str) -> str:
        """
        O(1) concept-ID-based address resolution.
        QIS routing is protocol-agnostic: this lookup may run against a
        distributed hash table (Kademlia), a vector index, a pub/sub topic,
        or directly against an OMOP CDM concept table. All are valid backends.
        """
        # In production: route to peer nodes holding outcomes at this address
        # Backend options: DHT, OMOP concept table lookup, vector similarity index
        return f"qis://semantic/{routing_key[:16]}"


# Example: SLE patient in Dublin OMOP CDM DataPartner
# The concept IDs below are example values; confirm them against your OMOP
# vocabulary (for instance with the Athena vocabulary browser) before use.
router = QISOMOPOutcomeRouter(node_id="dublin-omop-node-01")
sle_event = OMOPEventFingerprint(
    condition_concept_id=4141365,  # SLE (SNOMED CT)
    drug_concept_id=1304919,  # hydroxychloroquine (RxNorm)
    measurement_concept_id=3024561,  # anti-dsDNA (LOINC)
    measurement_value_bucket="HIGH",
    icd10_block="M32",
    followup_years_bucket="1-3",
    outcome_score=0.74,  # Normalized locally; not raw lab value
    visit_concept_id=9201,  # Inpatient visit
)
packet = router.emit_outcome_packet(sle_event)
address = router.lookup_routing_address(packet.routing_key)

print(f"Routing key: {packet.routing_key[:32]}...")
# About 280 bytes as JSON for this packet, well within the ~512-byte budget
print(f"Packet size: ~{len(json.dumps(asdict(packet)))} bytes")
print(f"Semantic address: {address}")
print("Raw data stays at node: True (never transmitted)")
```
The routing key is deterministic. Two OMOP CDM DataPartners in Dublin and Des Moines that both observe a hydroxychloroquine-treated SLE patient with elevated anti-dsDNA antibodies within a 1-3 year follow-up window will produce the same routing key — and their outcome packets will accumulate at the same semantic address — without any coordination between the institutions and without any central orchestrator.
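The determinism claim is easy to check in isolation. The sketch below is a standalone simplification, assuming the key is SHA-256 over a sorted-key JSON serialization of the categorical fields. The `routing_key` helper and the dict names are illustrative, and the concept IDs are the article's example values.

```python
import hashlib
import json


def routing_key(fields: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON serialization of the fields."""
    canonical = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


dublin = {
    "condition_concept_id": 4141365,
    "drug_concept_id": 1304919,
    "measurement_concept_id": 3024561,
    "measurement_value_bucket": "HIGH",
    "icd10_block": "M32",
    "followup_years_bucket": "1-3",
}

# Same fields assembled in a different order at another site.
des_moines = dict(reversed(list(dublin.items())))

# Same patient profile except for one adjacent bucket.
different_bucket = {**dublin, "followup_years_bucket": "3-5"}

print(routing_key(dublin) == routing_key(des_moines))        # True
print(routing_key(dublin) == routing_key(different_bucket))  # False
```

The second comparison shows the flip side of hashing: matching is exact. A patient who falls into a neighbouring bucket lands at an unrelated address, so bucket boundaries and the choice of fields largely decide which patients count as similar.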
Use Case: Distributed Phenotyping for Rare Disease
Rare disease research exposes the structural limit of OHDSI's current distributed phenotyping approach most clearly.
The OHDSI approach to rare disease phenotyping:
ATLAS builds a cohort definition for, say, anti-NMDA receptor encephalitis (SNOMED CT concept 766976). The coordinating center distributes the query. Each DataPartner executes locally. DataPartners with fewer than 5 patients matching the cohort definition suppress their cell counts — privacy floor enforced. In a 100-DataPartner network where 80 sites have seen fewer than 5 patients with this condition, 80 sites contribute nothing. The coordinating center synthesizes from 20 sites.
The network is not learning from its full knowledge base. It is learning from the fragment of its knowledge base that clears a statistical disclosure threshold. Sites that have seen exactly one patient with anti-NMDA receptor encephalitis — possibly the most clinically interesting case in the network, observed by a specialist who remembers every detail — are completely excluded.
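The arithmetic of that scenario can be sketched in a few lines. The per-site counts are invented, the threshold of 5 is one of the values mentioned earlier, and the `disclose` helper is hypothetical; real disclosure policies differ by DataPartner and often treat zero counts separately.

```python
from typing import List, Optional

MIN_CELL_COUNT = 5  # assumed threshold; actual values are set by each DataPartner


def disclose(count: int, min_cell_count: int = MIN_CELL_COUNT) -> Optional[int]:
    """Return the count if it clears the threshold, otherwise None (suppressed)."""
    return count if count >= min_cell_count else None


# Hypothetical network: 20 sites at or above the threshold, 80 sites below it.
site_counts: List[int] = [12, 9, 7, 6, 5] * 4 + [1, 2, 3, 4] * 20

disclosed = [disclose(c) for c in site_counts]
contributing = [c for c in disclosed if c is not None]

print(f"Sites in network:     {len(site_counts)}")
print(f"Sites contributing:   {len(contributing)}")
print(f"Patients represented: {sum(contributing)} of {sum(site_counts)}")
```

In this made-up network the suppressed sites hold more patients in total than the disclosed ones (200 versus 156), which is the pattern that makes rare disease studies sensitive to the threshold.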
The QIS approach to rare disease phenotyping:
Any DataPartner that has seen even a single patient with a rare condition can emit an outcome packet. The N=1 site participates fully, because QIS is not computing a population statistic — it is routing a summary of what happened to one patient to other nodes that may see similar patients in the future.
The privacy floor problem dissolves. The outcome packet from the Dublin DataPartner that treated one anti-NMDA receptor encephalitis patient with early rituximab is not a suppressed cell — it is a routable observation, addressed to the same semantic space that every future clinician querying for anti-NMDA receptor encephalitis outcomes will reach.
This is the same N=1 advantage QIS carries over federated learning, applied specifically to OHDSI's participation floor constraint. In federated averaging (McMahan et al., 2017), each client's update is weighted by its local sample count, so sites with very little data have correspondingly little influence on the shared model. QIS eliminates the concept of a participation floor entirely: the minimum meaningful contribution is one outcome packet.
Three Elections in the OHDSI Context
QIS governance operates through three emergent forces — the Three Elections. These are not engineered mechanisms, not configurable parameters, and not features that need to be built. They are metaphors for the evolutionary pressures that naturally emerge when outcome routing operates at scale.
Election 1: The Domain Expert Defines Similarity
George Hripcsak at Columbia defines the similarity template for autoimmune nephritis — the same clinical logic he would encode in an ATLAS cohort definition, now expressed as a QIS routing key. A rheumatologist in Zurich defines a competing template with different feature weightings. Both templates run simultaneously. The network does not arbitrate between them — outcomes at each semantic address determine which template produces more useful routing for which patient populations.
This is identical to how ATLAS cohort definitions already work: multiple researchers maintain competing phenotype definitions for the same condition, and the research community evaluates them empirically. QIS routes on whichever template a node chooses to use. No central authority certifies the "correct" template.
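One way to picture competing templates is as different choices of which fields go into the routing key. The sketch below is an assumption about how that could be represented: the two template tuples, the `template_key` helper, and the idea of namespacing the hash by template name are all hypothetical, and the concept IDs are the article's example values.

```python
import hashlib
import json

# Two hypothetical similarity templates: each lists the fields that define "similar".
TEMPLATE_BROAD = ("condition_concept_id", "drug_concept_id")
TEMPLATE_NARROW = ("condition_concept_id", "drug_concept_id", "measurement_value_bucket")


def template_key(template_name: str, template: tuple, event: dict) -> str:
    """Hash only the fields the template selects, namespaced by template name."""
    selected = {field: event[field] for field in template}
    payload = json.dumps({"template": template_name, "fields": selected}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


event_a = {
    "condition_concept_id": 4141365,
    "drug_concept_id": 1304919,
    "measurement_value_bucket": "HIGH",
}
event_b = {**event_a, "measurement_value_bucket": "NORMAL"}

same_under_broad = (
    template_key("broad", TEMPLATE_BROAD, event_a)
    == template_key("broad", TEMPLATE_BROAD, event_b)
)
same_under_narrow = (
    template_key("narrow", TEMPLATE_NARROW, event_a)
    == template_key("narrow", TEMPLATE_NARROW, event_b)
)
print(f"Broad template groups the two events:  {same_under_broad}")
print(f"Narrow template groups the two events: {same_under_narrow}")
```

The broad template pools the two events at one address; the narrow one separates them. Which grouping is more useful is an empirical question, which is the point of letting both run.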
Election 2: Outcomes Are the Votes
There are no ballots or committees. Each outcome packet is a measurement cast by reality. A rituximab outcome in 47 semantically matched anti-NMDA receptor encephalitis patients with 81% response rate at 6 months is not an opinion — it accumulates at the semantic address and becomes the evidence that the next clinician or researcher querying that address receives. The math aggregates outcome packets continuously. Coordinating centers run synthesis on demand against an always-current evidence base rather than waiting for the next batch query round.
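Continuous accumulation at an address can be as simple as a running tally that is updated per packet. This is a sketch under stated assumptions: `AddressAggregate` is a hypothetical structure, each packet is reduced to a responded/did-not-respond flag, and 38 responders out of 47 is one integer split consistent with the 81% figure above.

```python
from dataclasses import dataclass


@dataclass
class AddressAggregate:
    """Running summary for one semantic address (hypothetical structure)."""
    n: int = 0
    responders: int = 0

    def add(self, responded: bool) -> None:
        """Fold one outcome packet into the tally; no batch recomputation needed."""
        self.n += 1
        self.responders += int(responded)

    @property
    def response_rate(self) -> float:
        """Share of packets reporting a response; NaN before any packet arrives."""
        return self.responders / self.n if self.n else float("nan")


agg = AddressAggregate()
for responded in [True] * 38 + [False] * 9:  # 47 hypothetical packets
    agg.add(responded)

print(f"{agg.responders}/{agg.n} responded ({agg.response_rate:.0%})")
# prints: 38/47 responded (81%)
```

Because the tally is updated on arrival, a reader at any moment sees every packet received so far. A raw proportion like this carries no adjustment for confounding or case mix, so it complements rather than replaces a designed cohort analysis.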
Election 3: Networks Compete, DataPartners Migrate
Multiple organizations can build QIS networks for the same OHDSI domain with different curators, different templates, and different synthesis methods. DataPartners and researchers migrate toward networks that produce better phenotype routing. Networks with poor templates lose participants. Networks with accurate outcome routing gain them, and gain quadratically, because the Nth DataPartner to join creates N-1 new pairwise synthesis opportunities, one with each existing participant. This is competitive selection applied to clinical evidence infrastructure, without requiring any governance board to adjudicate the competition.
Architecture Comparison
| Dimension | OHDSI ATLAS Distributed Query | QIS Outcome Routing |
|---|---|---|
| Query model | Synchronous cohort query rounds | Asynchronous outcome packet emission |
| Synthesis location | Coordinating center only | Every node; N(N-1)/2 synthesis pairs |
| Real-time learning | Batch rounds; not real-time | Continuous; each packet updates the network immediately |
| Participation floor for rare disease | Statistical disclosure threshold (N≥5 typically) | No floor; N=1 sites participate fully |
| Coordinator dependency | Required for query distribution and synthesis | Not required; peer-to-peer routing |
| Continuous learning | Not by design; requires new query round | Architectural feature; every outcome packet compounds |
| Vocabulary compatibility | SNOMED CT, RxNorm, LOINC, ICD-10 | Same vocabularies; concept IDs are routing keys |
| DataPartner data exposure | Aggregate counts only | Outcome packets only; no PHI; no raw data |
QIS Does Not Replace OHDSI
This is the most important point in the article.
The standardization work OHDSI has accomplished — OMOP CDM schema design, vocabulary harmonization across SNOMED CT, RxNorm, LOINC, and ICD-10, the ATLAS phenotype library, the HADES analytical methods, the DataPartner network topology — is not made obsolete by QIS routing. It is made more powerful.
OHDSI's semantic layer is QIS's fingerprint layer. The concept ID standardization that allows ATLAS to distribute a cohort definition to 300 DataPartners and have all 300 execute it against comparable data models is exactly the property that allows QIS to route outcome packets deterministically without a translation layer. A SNOMED CT concept ID in Dublin routes to the same semantic address as the same SNOMED CT concept ID in Des Moines, Columbus, or Zurich.
Every OMOP CDM DataPartner that participates in OHDSI today has already completed the hardest part of QIS integration: they have standardized their data against controlled vocabularies that can serve directly as QIS routing keys.
What QIS adds to the OHDSI infrastructure is a complementary routing layer that operates between coordinated batch queries: continuous outcome accumulation at semantic addresses, N=1 rare disease participation, and the N(N-1)/2 synthesis multiplier that grows with every new DataPartner added to the network.
OHDSI answers: What does the evidence say across the network as of the last query round?
QIS answers: What happened to the last patient who looked like this one, and what is accumulating at this semantic address right now?
These are different questions. Both deserve answers. The OHDSI DataPartner network, mapped onto QIS edge nodes with OMOP concept IDs as routing keys, can answer both.
QIS (Quadratic Intelligence Swarm) protocol discovered by Christopher Thomas Trevethan, June 16, 2025. 39 provisional patents pending. Technical documentation: qisprotocol.com. Published articles: dev.to/roryqis.
References:
- Hripcsak G et al. (2015). Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. JAMIA 22(6): 1098-1103.
- Garza M et al. (2016). Evaluating Common Data Models for Use with a Longitudinal Community Registry. Journal of Biomedical Informatics 64: 333-341.
- Observational Health Data Sciences and Informatics Collaborative (2015). OHDSI Network Description and Governance.
- McMahan B et al. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS.
- OHDSI (2019). The Book of OHDSI. ohdsi.org/the-book-of-ohdsi.