Rory | QIS PROTOCOL

Posted on Apr 8

QIS Protocol: A Technical Reference for OMOP CDM and OHDSI Network Routing

#healthdata #distributedcomputing #datascience #openscience

If you work in the OHDSI ecosystem, you know the architecture well. An OHDSI distributed query reaches dozens of DataPartners simultaneously, each running the same ATLAS cohort definition against their local OMOP CDM, each returning aggregate statistics — counts, proportions, incidence rates. ACHILLES characterizes the population. HADES packages the analysis. Results return to the coordinating center, where synthesis happens. It is a rigorous, reproducible, and deeply valuable approach to federated OHDSI query execution that has produced some of the most important real-world evidence of the last decade.

And it has a structural ceiling.

The OHDSI network routing protocol, as currently implemented, is synchronous and unidirectional. Queries go out in rounds. Aggregate results come back. Intelligence accumulates at the coordinating center, not across the network. DataPartners below a statistical disclosure threshold — a privacy floor that affects every rare disease study — return suppressed cells. N=1 sites contribute nothing. Real-time learning is not the design goal.

This article introduces QIS routing as a complement to OHDSI's OMOP CDM infrastructure — not a replacement for it. Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol, and the mapping between QIS outcome routing and OHDSI distributed phenotyping is closer than most distributed computing researchers would expect. OHDSI's semantic standardization work — OMOP CDM vocabularies, cohort logic, DataPartner network topology — is not an obstacle to QIS routing. It is the exact foundation that makes QIS outcome routing more precise.

The OHDSI Architecture: What It Does and Where It Stops

The OHDSI Collaborative (Hripcsak et al., 2015, JAMIA) built its network on a fundamental insight: if every participating institution transforms its local health data into a common data model, comparative analyses become possible without centralizing raw patient records. The OMOP CDM (Garza et al., 2016) standardizes clinical data across institutions using shared controlled vocabularies — SNOMED CT for conditions, RxNorm for drug exposures, LOINC for laboratory measurements, ICD-10 for diagnosis coding. This standardization is extraordinary in scope. Hundreds of DataPartners across North America, Europe, Asia, and Africa now hold data in a common format.

ATLAS translates clinical research questions into executable cohort definitions. A researcher at Columbia defines inclusion criteria — condition onset, drug exposure window, measurement thresholds — and ATLAS generates standardized queries. HADES (formerly the OHDSI Methods Library) distributes those queries via a coordinating center interface. DataPartners execute locally, return aggregate statistics, and the coordinating center synthesizes the results into a study package. ACHILLES runs characterization analyses on each DataPartner's data, exposing data quality and population characteristics through standard dashboards.

The Book of OHDSI (2019) describes this as a "federated" design, and it is, in the governance sense. Data stays local. Computations run locally. Only aggregate statistics cross institutional boundaries.

But the mechanics of this federation are important to examine precisely:

Coordinating Center
       |
       | → Cohort Definition Query
       |
  ┌────┴────────────────────────────────┐
  ↓           ↓           ↓            ↓
DataPartner  DataPartner  DataPartner  DataPartner
  (runs      (runs        (runs        (runs
  locally)   locally)     locally)     locally)
  ↓           ↓           ↓            ↓
  └────┬────────────────────────────────┘
       |
       | ← Aggregate Counts Only
       |
Coordinating Center
(synthesis happens here)

Every synthesis operation happens at one location. No DataPartner learns from any other DataPartner during this process. The network accumulates knowledge in batch rounds, not continuously. And any DataPartner that cannot return a statistically disclosable count for a given cohort — the privacy floor problem — contributes nothing.
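The round trip in the diagram can be sketched in a few lines. This is a toy model rather than OHDSI tooling: `run_cohort_locally`, `coordinator_round`, and the site counts are hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, Iterable

# Hypothetical local results: site name -> patients matching the cohort.
# A real site would run generated SQL against its own OMOP CDM instead.
LOCAL_MATCHES: Dict[str, int] = {"site_a": 120, "site_b": 45, "site_c": 310}


def run_cohort_locally(site: str) -> int:
    """Stand-in for a site executing the cohort definition on its own data."""
    return LOCAL_MATCHES[site]


def coordinator_round(sites: Iterable[str], execute: Callable[[str], int]) -> dict:
    """One query round: fan out, collect aggregate counts, synthesize centrally."""
    responses = {site: execute(site) for site in sites}  # aggregates only
    return {"per_site": responses, "total": sum(responses.values())}


result = coordinator_round(LOCAL_MATCHES.keys(), run_cohort_locally)
print(result["total"])  # 475
```

Note that the only place where `per_site` values meet each other is inside `coordinator_round`, which is the point the paragraph above makes.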

The Structural Limit: Synchronous Queries, One-Way Learning

George Hripcsak and the OHDSI founding collaborators designed a system optimized for rigorous epidemiology. That design goal is correct. The problem is that synchronous cohort queries across OMOP CDM instances create a learning architecture with three structural constraints that no extension to the current model can fully resolve:

Constraint 1: Synthesis location. Intelligence accumulates at the coordinating center. A DataPartner in Zurich running a cohort query for rare autoimmune conditions cannot synthesize its findings with a DataPartner in Dublin without routing through the coordinating center. The network has no peer-to-peer learning pathway.

Constraint 2: Participation floor. Every OHDSI network operates under statistical disclosure rules. If fewer than 5 (or 10, depending on DataPartner policy) patients match a cohort definition, the cell is suppressed. For common conditions, this is a minor inconvenience. For rare diseases — where a DataPartner might see one patient per year with a given presentation — this is a complete exclusion. N=1 sites contribute zero, even though a single rare disease outcome may be the most informative observation in the network.
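A minimal sketch of small-cell suppression makes the participation floor concrete. The function name `suppress_small_cells`, the site names, and the counts are assumptions for illustration; the threshold of 5 follows the figure quoted above.

```python
from typing import Dict, Optional


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 5
) -> Dict[str, Optional[int]]:
    """Replace any count below the disclosure threshold with None (suppressed)."""
    return {site: (n if n >= threshold else None) for site, n in counts.items()}


# Hypothetical per-site cohort counts for a rare condition.
raw_counts = {"site_a": 12, "site_b": 1, "site_c": 3, "site_d": 7}
released = suppress_small_cells(raw_counts)

contributing = [site for site, n in released.items() if n is not None]
hidden_patients = sum(n for site, n in raw_counts.items() if released[site] is None)

print(released)         # {'site_a': 12, 'site_b': None, 'site_c': None, 'site_d': 7}
print(contributing)     # ['site_a', 'site_d']
print(hidden_patients)  # 4
```

In this toy run, half the sites and four patients drop out of the synthesis, including the single-patient site.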

Constraint 3: Synchronous rounds. ATLAS distributed cohort queries run in coordinated rounds. A DataPartner that completes its analysis on Tuesday cannot contribute to synthesis until all other DataPartners have also completed their runs and results have been returned to the coordinating center. Learning is batch-scheduled, not continuous.
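The cost of batch scheduling is easy to see with made-up numbers. In this sketch the completion days are hypothetical; the only point is that synthesis time is set by the slowest site.

```python
# Hypothetical completion times, in days after the query is distributed.
completion_day = {"site_a": 2, "site_b": 9, "site_c": 4, "site_d": 30}

# Batch model: synthesis waits for the slowest site.
batch_synthesis_day = max(completion_day.values())

# Days each finished result sits idle before the round closes.
waiting_days = {
    site: batch_synthesis_day - day for site, day in completion_day.items()
}

print(batch_synthesis_day)  # 30
print(waiting_days)         # {'site_a': 28, 'site_b': 21, 'site_c': 26, 'site_d': 0}
```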

These are not bugs in OHDSI's implementation — they are consequences of the design model. Fixing them requires a different model, not a better implementation of the existing one.

QIS Routing: Outcome Packets, Not Cohort Queries

Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol — a routing architecture in which the unit of exchange is not a query but an outcome packet. Instead of asking every DataPartner "how many patients in your OMOP CDM match this cohort definition?", QIS routes ~512-byte summaries of what happened to individual patients to semantically matched nodes across the network.

The protocol loop has five steps:

1. Local ingestion. Patient data remains at the originating node. No raw clinical data leaves the institution.
2. Semantic fingerprinting. Standardized clinical attributes (condition concept IDs from SNOMED CT, drug exposure codes from RxNorm, lab values mapped to LOINC) are transformed into a compact routing fingerprint. The fingerprint captures clinical similarity without containing PHI.
3. Deterministic routing. The fingerprint hash becomes a semantic address. QIS routes the outcome packet to nodes holding outcomes from similar patients. The routing layer is protocol-agnostic: a distributed hash table (Kademlia), a vector index, a pub/sub topic, or, directly relevant to OHDSI deployments, an OMOP CDM database with O(1) concept ID lookup. Standard OMOP concept IDs are already deterministic keys; no translation layer is required.
4. Peer synthesis. The querying node receives outcome packets from semantically matched peers. With N matched peers, there are N(N-1)/2 pairwise synthesis opportunities, so the number of pairs grows quadratically with the number of participants.
5. Outcome reporting. After treatment or observation, the node emits a new outcome packet. The network becomes more accurate for every future query at that semantic address.
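The routing, synthesis, and reporting steps can be sketched with an in-memory dictionary standing in for the routing layer. Everything here is an assumption for illustration: `address_store`, `semantic_address`, `report_outcome`, and `synthesize` are hypothetical names, the mean is just one possible synthesis, and a real deployment would use a DHT, vector index, or similar backend.

```python
import hashlib
import json
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for the routing layer: address -> outcome scores.
address_store = defaultdict(list)


def semantic_address(fingerprint: dict) -> str:
    """Steps 2-3: hash sorted categorical fields into a deterministic address."""
    payload = json.dumps(fingerprint, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def report_outcome(fingerprint: dict, outcome_score: float) -> str:
    """Step 5: emit an outcome to the address derived from the fingerprint."""
    address = semantic_address(fingerprint)
    address_store[address].append(outcome_score)
    return address


def synthesize(fingerprint: dict) -> dict:
    """Step 4: summarize outcomes already accumulated at the matching address."""
    scores = address_store.get(semantic_address(fingerprint), [])
    return {"n": len(scores), "mean_outcome": mean(scores) if scores else None}


fp = {"condition": "SLE", "drug": "hydroxychloroquine", "lab_bucket": "HIGH"}
report_outcome(fp, 0.5)
report_outcome(fp, 1.0)
print(synthesize(fp))  # {'n': 2, 'mean_outcome': 0.75}
```

Each call to `report_outcome` changes what the next `synthesize` call returns, which is the feedback loop that step 5 describes.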

Mapping QIS to OHDSI Architecture

The correspondence between OHDSI components and QIS routing is precise enough to warrant a direct mapping:

| OHDSI Component | QIS Equivalent | Key Difference |
| --- | --- | --- |
| OMOP CDM standardized vocabulary | QIS semantic fingerprint fields | OMOP concept IDs map directly; no translation layer |
| ATLAS cohort definition | QIS similarity template | Same domain logic; different execution model |
| DataPartner | QIS edge node | Same institutional boundary; different data exchange pattern |
| Coordinating center query | QIS address lookup | Same intent; no coordinator required |
| ACHILLES characterization | QIS outcome accumulation at semantic address | ACHILLES is batch; QIS accumulation is continuous |
| HADES distributed execution | QIS peer-to-peer routing | HADES requires coordinator; QIS is fully decentralized |

OMOP CDM's controlled vocabularies do not need to be translated into a new schema for QIS routing. A SNOMED CT concept ID for a condition, an RxNorm code for a drug exposure, and a LOINC code for a laboratory measurement are already deterministic, standardized identifiers. They are exactly the fields that appear in a QIS semantic fingerprint.

This means that any institution running an OMOP CDM can participate in QIS routing with a thin adapter layer — not a data migration.
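As a rough idea of what such an adapter might look like, the sketch below projects already-standardized rows onto fingerprint fields. The column names (`condition_concept_id`, `drug_concept_id`, `measurement_concept_id`, `value_as_number`, `range_low`, `range_high`) follow the OMOP CDM tables; the function names, the bucketing rule, and the placeholder concept IDs are assumptions, not part of any published QIS or OHDSI API.

```python
from typing import Optional


def bucket_measurement(
    value: float, range_low: Optional[float], range_high: Optional[float]
) -> str:
    """Bucket a lab value against its reference range (illustrative rule)."""
    if range_low is None or range_high is None:
        return "UNKNOWN"
    if value < range_low:
        return "LOW"
    if value > range_high:
        return "HIGH"
    return "NORMAL"


def adapt_omop_rows(condition_row: dict, drug_row: dict, measurement_row: dict) -> dict:
    """Copy concept IDs and a bucketed lab value; no person_id or dates are read."""
    return {
        "condition_concept_id": condition_row["condition_concept_id"],
        "drug_concept_id": drug_row["drug_concept_id"],
        "measurement_concept_id": measurement_row["measurement_concept_id"],
        "measurement_value_bucket": bucket_measurement(
            measurement_row["value_as_number"],
            measurement_row.get("range_low"),
            measurement_row.get("range_high"),
        ),
    }


# Rows as dicts keyed by OMOP CDM column names; concept IDs are placeholders.
fields = adapt_omop_rows(
    {"condition_concept_id": 111},
    {"drug_concept_id": 222},
    {"measurement_concept_id": 333, "value_as_number": 42.0,
     "range_low": 0.0, "range_high": 10.0},
)
print(fields["measurement_value_bucket"])  # HIGH
```

The adapter only reads columns that the CDM already standardizes, which is why no data migration is involved.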

The Math: Why N(N-1)/2 Changes the Equation

OHDSI's current distributed query model follows a precise information flow:

1 query → N DataPartners → N aggregate responses → 1 synthesis at coordinator

For 100 DataPartners, this produces 100 aggregate data points that one coordinating center synthesizes. The network's learning bandwidth is proportional to N.

QIS routing produces a different structure:

N nodes → N(N-1)/2 pairwise synthesis opportunities

For 100 DataPartners: 4,950 synthesis pairs versus 100 aggregate responses.

# OHDSI distributed query model
n_data_partners = 100
ohdsi_synthesis_points = n_data_partners  # 100 aggregate responses
                                           # synthesized at 1 coordinator

# QIS routing model
qis_synthesis_pairs = n_data_partners * (n_data_partners - 1) // 2  # 4,950

print(f"OHDSI synthesis points: {ohdsi_synthesis_points}")
print(f"QIS synthesis pairs:    {qis_synthesis_pairs}")
print(f"Ratio:                  {qis_synthesis_pairs / ohdsi_synthesis_points}x")
# Output:
# OHDSI synthesis points: 100
# QIS synthesis pairs:    4950
# Ratio:                  49.5x

At 1,000 DataPartners, the divergence becomes more dramatic: 1,000 OHDSI synthesis points versus 499,500 QIS synthesis pairs, a 499.5x multiplier on synthesis bandwidth, achieved without adding a single server.
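The same arithmetic generalizes: the ratio of pairs to nodes is (N-1)/2, and one node joining N existing nodes adds exactly N new pairs. The helper name `synthesis_pairs` below is an assumption for illustration.

```python
def synthesis_pairs(n: int) -> int:
    """Number of unordered pairs among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    pairs = synthesis_pairs(n)
    print(n, pairs, pairs / n)  # the ratio equals (n - 1) / 2
# 10 45 4.5
# 100 4950 49.5
# 1000 499500 499.5

# Marginal gain: one new node joining 100 existing nodes adds 100 pairs.
print(synthesis_pairs(101) - synthesis_pairs(100))  # 100
```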

Python Reference: QIS OMOP Outcome Router

The following implementation shows how an OMOP-standardized patient event becomes a QIS outcome packet with a deterministic routing key derived from standard OMOP concept IDs.

import hashlib
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class OMOPEventFingerprint:
    """
    Semantic fingerprint derived from OMOP CDM standardized fields.
    All concept IDs use standard OMOP vocabulary identifiers.
    No PHI is included at any field.
    """
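    # OMOP standard concept ID drawn from SNOMED CT (OMOP domain: Condition).
    # The example IDs in this listing are illustrative; confirm each one in the
    # OHDSI Athena vocabulary browser before relying on it.
    condition_concept_id: int        # e.g., 4141365 = Systemic lupus erythematosus

    # OMOP standard concept ID drawn from RxNorm (OMOP domain: Drug)
    drug_concept_id: int             # e.g., 1304919 = hydroxychloroquine

    # OMOP standard concept ID drawn from LOINC (OMOP domain: Measurement)
    measurement_concept_id: int      # e.g., 3024561 = anti-dsDNA antibody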

    # Normalized measurement value bucket (not raw value — privacy-preserving bucket)
    measurement_value_bucket: str    # e.g., "HIGH", "NORMAL", "LOW", "CRITICAL"

    # ICD-10 domain (for rare disease routing precision)
    icd10_block: str                 # e.g., "M32" (Systemic lupus erythematosus)

    # Observation period bucket (years of follow-up, bucketed)
    followup_years_bucket: str       # e.g., "0-1", "1-3", "3-5", "5+"

    # Outcome metric (normalized 0-1; derived locally; not raw outcome data)
    outcome_score: float

    # Optional: visit type context
    visit_concept_id: Optional[int] = None  # e.g., 9201 = Inpatient Visit


@dataclass
class QISOMOPOutcomePacket:
    """
    ~512-byte outcome packet routable across QIS nodes.
    Routing key is deterministically derived from OMOP concept IDs.
    Contains no PHI — safe to route across institutional boundaries.
    """
    routing_key: str            # SHA-256 of categorical fingerprint fields
    outcome_score: float        # Normalized treatment outcome
    timestamp: int              # Unix epoch (no date of service, no patient ID)
    context_hash: str           # Non-reversible summary of remaining context
    omop_condition_id: int      # SNOMED CT concept ID — enables O(1) OMOP DB lookup
    omop_drug_id: int           # RxNorm concept ID
    omop_measurement_id: int    # LOINC concept ID
    packet_version: str = "1.0"


class QISOMOPOutcomeRouter:
    """
    Routes OMOP-standardized patient events as QIS outcome packets.

    OMOP CDM's standardized concept IDs map directly to QIS semantic
    fingerprint fields. No translation layer required. Standard OMOP
    concept IDs serve as deterministic routing keys, supporting O(1)
    lookup against any OMOP CDM database as a valid QIS routing backend.
    """

    def __init__(self, node_id: str):
        self.node_id = node_id

    def build_routing_key(self, fingerprint: OMOPEventFingerprint) -> str:
        """
        Derives a deterministic routing key from categorical OMOP fields.
        Continuous values (outcome_score, raw measurements) are excluded —
        only concept IDs and bucketed categorical fields contribute to the key.
        This ensures that two institutions seeing clinically equivalent patients
        route to the same semantic address, regardless of local measurement scale.
        """
        categorical_fields = {
            "condition_concept_id": fingerprint.condition_concept_id,
            "drug_concept_id": fingerprint.drug_concept_id,
            "measurement_concept_id": fingerprint.measurement_concept_id,
            "measurement_value_bucket": fingerprint.measurement_value_bucket,
            "icd10_block": fingerprint.icd10_block,
            "followup_years_bucket": fingerprint.followup_years_bucket,
        }
        serialized = json.dumps(categorical_fields, sort_keys=True)
        return hashlib.sha256(serialized.encode()).hexdigest()

    def build_context_hash(self, fingerprint: OMOPEventFingerprint) -> str:
        """
        Truncated SHA-256 digest of the full fingerprint, including outcome.
        Used for synthesis verification. The digest is one-way, but its inputs
        are low-entropy, so treat it as a checksum, not an anonymization guarantee.
        """
        full_fields = asdict(fingerprint)
        serialized = json.dumps(full_fields, sort_keys=True)
        return hashlib.sha256(serialized.encode()).hexdigest()[:16]

    def emit_outcome_packet(
        self, fingerprint: OMOPEventFingerprint
    ) -> QISOMOPOutcomePacket:
        """
        Transforms an OMOP-standardized patient event into a routable
        QIS outcome packet. Raw clinical data remains at the edge node.
        """
        routing_key = self.build_routing_key(fingerprint)
        context_hash = self.build_context_hash(fingerprint)

        packet = QISOMOPOutcomePacket(
            routing_key=routing_key,
            outcome_score=fingerprint.outcome_score,
            timestamp=int(time.time()),
            context_hash=context_hash,
            omop_condition_id=fingerprint.condition_concept_id,
            omop_drug_id=fingerprint.drug_concept_id,
            omop_measurement_id=fingerprint.measurement_concept_id,
        )

        return packet

    def lookup_routing_address(self, routing_key: str) -> str:
        """
        O(1) concept-ID-based address resolution.
        QIS routing is protocol-agnostic: this lookup may run against a
        distributed hash table (Kademlia), a vector index, a pub/sub topic,
        or directly against an OMOP CDM concept table — all are valid backends.
        """
        # In production: route to peer nodes holding outcomes at this address
        # Backend options: DHT, OMOP concept table lookup, vector similarity index
        return f"qis://semantic/{routing_key[:16]}"


# Example: SLE patient in Dublin OMOP CDM DataPartner
router = QISOMOPOutcomeRouter(node_id="dublin-omop-node-01")

sle_event = OMOPEventFingerprint(
    condition_concept_id=4141365,       # SLE — SNOMED CT
    drug_concept_id=1304919,            # hydroxychloroquine — RxNorm
    measurement_concept_id=3024561,     # anti-dsDNA — LOINC
    measurement_value_bucket="HIGH",
    icd10_block="M32",
    followup_years_bucket="1-3",
    outcome_score=0.74,                 # Normalized locally; not raw lab value
    visit_concept_id=9201,              # Inpatient visit
)

packet = router.emit_outcome_packet(sle_event)
address = router.lookup_routing_address(packet.routing_key)

print(f"Routing key: {packet.routing_key[:32]}...")
print(f"Packet size: ~{len(json.dumps(asdict(packet)))} bytes")  # about 280 bytes as JSON here, under the ~512-byte budget
print(f"Semantic address: {address}")
print("Raw data left at node: True (never transmitted)")

The routing key is deterministic. Two OMOP CDM DataPartners in Dublin and Des Moines that both observe a hydroxychloroquine-treated SLE patient with elevated anti-dsDNA antibodies within a 1-3 year follow-up window will produce the same routing key — and their outcome packets will accumulate at the same semantic address — without any coordination between the institutions and without any central orchestrator.
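A short standalone check illustrates both the determinism and its limit. The `routing_key` helper below mirrors the SHA-256-over-sorted-JSON construction used in `build_routing_key`; the second comparison is an added observation, not a claim from the protocol documentation: because the key is an exact hash, a patient in a neighbouring bucket lands at a different address.

```python
import hashlib
import json


def routing_key(categorical_fields: dict) -> str:
    """SHA-256 over sorted-key JSON, as in build_routing_key above."""
    serialized = json.dumps(categorical_fields, sort_keys=True)
    return hashlib.sha256(serialized.encode()).hexdigest()


dublin = {
    "condition_concept_id": 4141365,
    "drug_concept_id": 1304919,
    "measurement_concept_id": 3024561,
    "measurement_value_bucket": "HIGH",
    "icd10_block": "M32",
    "followup_years_bucket": "1-3",
}
# Des Moines builds the same fields independently, in a different order.
des_moines = dict(reversed(list(dublin.items())))
# Same patient profile, but follow-up falls in the next bucket.
des_moines_later = {**dublin, "followup_years_bucket": "3-5"}

print(routing_key(dublin) == routing_key(des_moines))        # True
print(routing_key(dublin) == routing_key(des_moines_later))  # False
```

Bucket boundaries therefore need to be agreed across sites, or chosen by the similarity template, for packets to accumulate at the same address.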

Use Case: Distributed Phenotyping for Rare Disease

Rare disease research exposes the structural limit of OHDSI's current distributed phenotyping approach most clearly.

The OHDSI approach to rare disease phenotyping:

ATLAS builds a cohort definition for, say, anti-NMDA receptor encephalitis (SNOMED CT concept 766976). The coordinating center distributes the query. Each DataPartner executes locally. DataPartners with fewer than 5 patients matching the cohort definition suppress their cell counts — privacy floor enforced. In a 100-DataPartner network where 80 sites have seen fewer than 5 patients with this condition, 80 sites contribute nothing. The coordinating center synthesizes from 20 sites.

The network is not learning from its full knowledge base. It is learning from the fragment of its knowledge base that clears a statistical disclosure threshold. Sites that have seen exactly one patient with anti-NMDA receptor encephalitis — possibly the most clinically interesting case in the network, observed by a specialist who remembers every detail — are completely excluded.

The QIS approach to rare disease phenotyping:

Any DataPartner that has seen even a single patient with a rare condition can emit an outcome packet. The N=1 site participates fully, because QIS is not computing a population statistic — it is routing a summary of what happened to one patient to other nodes that may see similar patients in the future.

The privacy floor problem dissolves. The outcome packet from the Dublin DataPartner that treated one anti-NMDA receptor encephalitis patient with early rituximab is not a suppressed cell — it is a routable observation, addressed to the same semantic space that every future clinician querying for anti-NMDA receptor encephalitis outcomes will reach.

This is the same N=1 advantage QIS carries over federated learning, applied specifically to OHDSI's participation floor constraint. In federated averaging as described by McMahan et al. (2017), each site's model update is weighted by its local sample count, so sites with very little data have little influence on the shared model. QIS eliminates the concept of a participation floor entirely: the minimum meaningful contribution is one outcome packet.

Three Elections in the OHDSI Context

QIS governance operates through three emergent forces — the Three Elections. These are not engineered mechanisms, not configurable parameters, and not features that need to be built. They are metaphors for the evolutionary pressures that naturally emerge when outcome routing operates at scale.

Election 1: The Domain Expert Defines Similarity

George Hripcsak at Columbia defines the similarity template for autoimmune nephritis — the same clinical logic he would encode in an ATLAS cohort definition, now expressed as a QIS routing key. A rheumatologist in Zurich defines a competing template with different feature weightings. Both templates run simultaneously. The network does not arbitrate between them — outcomes at each semantic address determine which template produces more useful routing for which patient populations.

This is identical to how ATLAS cohort definitions already work: multiple researchers maintain competing phenotype definitions for the same condition, and the research community evaluates them empirically. QIS routes on whichever template a node chooses to use. No central authority certifies the "correct" template.
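One way to picture competing templates, offered as an assumption rather than a documented QIS mechanism, is to treat a template as the set of fields that define "similar". The template names `broad` and `strict` and the helper `template_key` are hypothetical; the concept IDs are the article's example values.

```python
import hashlib
import json

event = {
    "condition_concept_id": 4141365,
    "drug_concept_id": 1304919,
    "measurement_value_bucket": "HIGH",
    "followup_years_bucket": "1-3",
}

# Two hypothetical similarity templates, each naming the fields that matter.
TEMPLATES = {
    "broad": ("condition_concept_id", "drug_concept_id"),
    "strict": ("condition_concept_id", "drug_concept_id",
               "measurement_value_bucket", "followup_years_bucket"),
}


def template_key(template_name: str, event: dict) -> str:
    """Hash the template name plus its selected fields into a routing key."""
    selected = {field: event[field] for field in TEMPLATES[template_name]}
    payload = json.dumps({"template": template_name, "fields": selected}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


keys = {name: template_key(name, event) for name in TEMPLATES}
other = {**event, "measurement_value_bucket": "NORMAL"}

print(keys["broad"] != keys["strict"])                  # True
print(template_key("broad", other) == keys["broad"])    # True
print(template_key("strict", other) == keys["strict"])  # False
```

Under the broad template the two patients share an address; under the strict one they do not. Which grouping is more useful is the empirical question the paragraph above leaves to outcomes.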

Election 2: Outcomes Are the Votes

There are no ballots or committees. Each outcome packet is a measurement cast by reality. A rituximab outcome in 47 semantically matched anti-NMDA receptor encephalitis patients with 81% response rate at 6 months is not an opinion — it accumulates at the semantic address and becomes the evidence that the next clinician or researcher querying that address receives. The math aggregates outcome packets continuously. Coordinating centers run synthesis on demand against an always-current evidence base rather than waiting for the next batch query round.
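Continuous accumulation can be as simple as a running tally per address. The sketch below assumes binary response outcomes and a hypothetical `AddressTally` class; the 38-of-47 split is made up to reproduce the 81% figure in the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressTally:
    """Running tally of binary responses at one semantic address."""
    responders: int = 0
    total: int = 0

    def add(self, responded: bool) -> None:
        """Fold one outcome packet into the tally as it arrives."""
        self.total += 1
        self.responders += int(responded)

    @property
    def response_rate(self) -> Optional[float]:
        return self.responders / self.total if self.total else None


tally = AddressTally()
for responded in [True] * 38 + [False] * 9:  # 47 hypothetical packets
    tally.add(responded)

print(tally.total, round(tally.response_rate, 2))  # 47 0.81
```

Reading `response_rate` at any moment gives the current evidence without waiting for a query round.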

Election 3: Networks Compete, DataPartners Migrate

Multiple organizations can build QIS networks for the same OHDSI domain with different curators, different templates, and different synthesis methods. DataPartners and researchers migrate toward networks that produce better phenotype routing. Networks with poor templates lose participants. Networks with accurate outcome routing gain them, and the gain compounds: a DataPartner joining a network of N participants adds N new synthesis pairs, one with each existing participant, so total pairs grow as N(N-1)/2. This is competitive selection applied to clinical evidence infrastructure, without requiring any governance board to adjudicate the competition.

Architecture Comparison

| Dimension | OHDSI ATLAS Distributed Query | QIS Outcome Routing |
| --- | --- | --- |
| Query model | Synchronous cohort query rounds | Asynchronous outcome packet emission |
| Synthesis location | Coordinating center only | Every node; N(N-1)/2 synthesis pairs |
| Real-time learning | Batch rounds; not real-time | Continuous; each packet updates the network immediately |
| Participation floor for rare disease | Statistical disclosure threshold (N≥5 typically) | No floor; N=1 sites participate fully |
| Coordinator dependency | Required for query distribution and synthesis | Not required; peer-to-peer routing |
| Continuous learning | Not by design; requires new query round | Architectural feature; every outcome packet compounds |
| Vocabulary compatibility | SNOMED CT, RxNorm, LOINC, ICD-10 | Same vocabularies; concept IDs are routing keys |
| DataPartner data exposure | Aggregate counts only | Outcome packets only; no PHI; no raw data |

QIS Does Not Replace OHDSI

This is the most important point in the article.

The standardization work OHDSI has accomplished — OMOP CDM schema design, vocabulary harmonization across SNOMED CT, RxNorm, LOINC, and ICD-10, the ATLAS phenotype library, the HADES analytical methods, the DataPartner network topology — is not made obsolete by QIS routing. It is made more powerful.

OHDSI's semantic layer is QIS's fingerprint layer. The concept ID standardization that allows ATLAS to distribute a cohort definition to 300 DataPartners and have all 300 execute it against comparable data models is exactly the property that allows QIS to route outcome packets deterministically without a translation layer. A SNOMED CT concept ID in Dublin routes to the same semantic address as the same SNOMED CT concept ID in Des Moines, Columbus, or Zurich.

Every OMOP CDM DataPartner that participates in OHDSI today has already completed the hardest part of QIS integration: they have standardized their data against controlled vocabularies that can serve directly as QIS routing keys.

What QIS adds to the OHDSI infrastructure is a complementary routing layer that operates between coordinated batch queries: continuous outcome accumulation at semantic addresses, N=1 rare disease participation, and the N(N-1)/2 synthesis multiplier that grows with every new DataPartner added to the network.

OHDSI answers: What does the evidence say across the network as of the last query round?

QIS answers: What happened to the last patient who looked like this one, and what is accumulating at this semantic address right now?

These are different questions. Both deserve answers. The OHDSI DataPartner network, mapped onto QIS edge nodes with OMOP concept IDs as routing keys, can answer both.

QIS (Quadratic Intelligence Swarm) protocol discovered by Christopher Thomas Trevethan, June 16, 2025. 39 provisional patents pending. Technical documentation: qisprotocol.com. Published articles: dev.to/roryqis.

References:
Hripcsak G et al. (2015). Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. JAMIA 22(6): 1098-1103.
Garza M et al. (2016). Evaluating Common Data Models for Use with a Longitudinal Community Registry. Journal of Biomedical Informatics 64: 333-341.
Observational Health Data Sciences and Informatics Collaborative (2015). OHDSI Network Description and Governance.
McMahan B et al. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS.
OHDSI (2019). The Book of OHDSI. ohdsi.org/the-book-of-ohdsi.
