DEV Community

Rory | QIS PROTOCOL

QIS as the Synthesis Layer Under OHDSI: Why Federated Queries Are Necessary but Not Sufficient

Quadratic Intelligence Swarm as a routing protocol below the OMOP CDM application layer — for OHDSI network builders, EHDS implementers, and GDI infrastructure teams.


The OHDSI Network Has a 2.1 Billion Patient Problem

The OHDSI network is, by any measure, one of the most successful federated health data initiatives in history. Over 300 data partners. More than 2.1 billion patient records. A common data model — OMOP CDM — that standardizes everything from clinical conditions to drug exposures to measurement values into a shared vocabulary. Researchers can write a single ATLAS cohort definition and execute it across the entire network without any site exposing row-level data.

This is a genuine architectural achievement. It took years to build.

But there is a class of problem OHDSI cannot currently solve, and it matters for every network builder evaluating infrastructure for EHDS secondary use cases and GDI genomic data pipelines.

The OHDSI network distributes the query. It does not distribute the learning.

When Site A in Dublin runs a treatment comparison and Site B in Columbus runs the same comparison six months later, Site B starts from zero. The aggregate statistics that came back from Site A's run — incidence rates, hazard ratios, effect size estimates — do not feed into Site B's starting model. The two sites ran the same federated query. Neither learned from the other's outcome in real time.

This is not a criticism of OHDSI. OHDSI solves the problem it was designed to solve: distributed evidence generation with standardized methodology. What it does not solve is continuous, real-time synthesis of validated treatment intelligence across the network as outcomes accumulate.

That gap has an architectural address. Christopher Thomas Trevethan discovered it on June 16, 2025.


What OHDSI Does — and Where It Stops

OHDSI's federated query model works like this: a researcher defines a study protocol, packages it as a cohort definition or analysis package (ATLAS, HADES), distributes it to participating sites, each site runs the analysis against its local OMOP CDM instance, and aggregate results (never row-level data) are returned to the coordinating center for meta-analysis.
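To make the last step concrete, here is a minimal sketch of what a coordinating center might do with the returned aggregates. It assumes each site returns a log hazard ratio and its standard error, and it uses plain fixed-effect inverse-variance pooling as a stand-in; the function name `pool_fixed_effect` and the three site results are hypothetical, and real OHDSI studies may use other (for example random-effects) methods.

```python
import math

def pool_fixed_effect(site_results):
    """Inverse-variance fixed-effect pooling of site-level log hazard ratios.

    site_results: list of (log_hr, standard_error) tuples, one per site.
    Returns (pooled_hr, ci_lower, ci_upper) on the hazard-ratio scale.
    """
    if not site_results:
        raise ValueError("need at least one site result")
    # Each site is weighted by the inverse of its variance
    weights = [1.0 / (se ** 2) for _, se in site_results]
    total = sum(weights)
    pooled_log_hr = sum(w * lhr for w, (lhr, _) in zip(weights, site_results)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = 1.96  # normal quantile for a 95% interval
    return (
        math.exp(pooled_log_hr),
        math.exp(pooled_log_hr - z * pooled_se),
        math.exp(pooled_log_hr + z * pooled_se),
    )

# Hypothetical aggregates from three sites: only summary statistics, no rows
site_results = [
    (math.log(0.80), 0.10),
    (math.log(0.90), 0.20),
    (math.log(0.75), 0.15),
]
pooled_hr, ci_low, ci_high = pool_fixed_effect(site_results)
print(f"pooled HR {pooled_hr:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Note that the pooling happens once, at the coordinating center, after all sites have reported. Nothing in this step feeds back to the sites.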

This is powerful. It enables pharmacovigilance studies, comparative effectiveness research, and population-level phenotyping at planetary scale without centralizing a single patient record.

But the model has a structural ceiling built into its design:

| Capability | OHDSI Federated Query | What Remains Missing |
|---|---|---|
| Data standardization | OMOP CDM across all sites | |
| Privacy preservation | Row-level data never leaves | |
| Distributed execution | Analysis runs at each node | |
| Aggregate result return | Statistics shared, not records | |
| Continuous outcome synthesis | ✗ Not in scope | Real-time cross-site learning |
| Rare site inclusion | Requires minimum cohort size | N=1 sites excluded |
| Network intelligence growth | Linear (each study adds one result) | Quadratic growth with synthesis |
| EHDS-native outcome routing | ✗ Not in architecture | Required for secondary use |

The OHDSI network is a point-in-time evidence generation system. It is not a continuous learning system. The difference matters when you are building infrastructure for EHDS Article 35 secondary use or GDI genomic data pipelines that need real-time intelligence synthesis as new data accumulates.


The Synthesis Gap: N(N-1)/2 Paths, Zero Currently Active

The OHDSI network has over 300 data partners. Taking N = 300, that gives N(N-1)/2 = 300 × 299 / 2 = 44,850 unique pairwise synthesis opportunities between sites: every pair of sites that could learn from each other's outcomes.
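The pair count is easy to check, and it shows how quickly the number of potential synthesis paths grows with network size. A small sketch (the function name `pairwise_paths` is illustrative):

```python
def pairwise_paths(n_sites: int) -> int:
    """Number of unique unordered site pairs: N(N-1)/2."""
    return n_sites * (n_sites - 1) // 2

for n in (10, 100, 300):
    print(n, pairwise_paths(n))
# 10 45
# 100 4950
# 300 44850
```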

Currently, how many of those synthesis paths are active in real time? Zero. Every site runs queries, returns statistics, and the network is smarter only after a researcher publishes a study that others then read.

The gap is not effort. It is architecture.

What would it take to make those synthesis paths active? You need:

  1. A way for each site to distill its validated treatment outcomes into a compact, shareable representation — without exposing row-level data
  2. A way to route those distilled outcomes to the sites whose patient populations are most similar — so sites learn from their actual twins, not from irrelevant noise
  3. A way for each site to synthesize incoming intelligence locally, on its own infrastructure, without a central aggregator

These three requirements are exactly what the Quadratic Intelligence Swarm protocol provides.


QIS Below the OHDSI Application Layer

QIS is a discovery about how information naturally scales when you close the right loop: Raw outcome → Local distillation into a compact packet (~512 bytes) → Semantic fingerprinting → Routing to semantically similar sites → Local synthesis → New outcome generated → loop continues.

The critical insight for OHDSI network builders: OMOP CDM already provides the semantic fingerprint address space.

OMOP standard concepts, mapped via SNOMED CT for conditions, RxNorm for drugs, and LOINC for measurements, are deterministic addresses. Every site that maps its data to OMOP CDM has already agreed on the vocabulary. A treatment outcome for patients with SNOMED CT code 44054006 (Type 2 diabetes mellitus) on RxNorm code 860975 (metformin 500 mg), each mapped to its OMOP standard concept ID, has a deterministic address in OMOP concept space.

QIS uses that address space to route outcome packets. A site deposits a packet at the OMOP-concept-anchored address. Every site managing similar populations can query that address and pull back distilled intelligence from their twins — without any site exposing a single row.

Here is what this looks like in practice:

import hashlib
import json
from datetime import datetime

class OMOPOutcomePacket:
    """
    A ~512-byte outcome packet distilled from a validated treatment episode.
    Uses OMOP standard concept IDs as the semantic fingerprint anchor.
    Raw patient data never leaves the originating site.

    Discovered by Christopher Thomas Trevethan, June 16, 2025.
    Part of the Quadratic Intelligence Swarm (QIS) protocol.
    39 provisional patents filed. IP protection in place.
    """

    def __init__(
        self,
        condition_concept_id: int,       # SNOMED-mapped OMOP concept
        drug_concept_id: int,            # RxNorm-mapped OMOP concept
        outcome_delta: float,            # Validated improvement delta (not raw values)
        outcome_type: str,               # "hba1c_reduction" | "readmission_30d" | etc.
        n_patients: int,                 # Cohort size (not individual records)
        confidence_interval_95: tuple,   # (lower, upper) bounds of the 95% CI around outcome_delta
        site_population_tag: str,        # "urban_rural_mix" | "lmic_tertiary" | etc.
        ehds_data_use_category: str,     # EHDS Art.35 use category
        observation_period_days: int,
    ):
        self.condition_concept_id = condition_concept_id
        self.drug_concept_id = drug_concept_id
        self.outcome_delta = outcome_delta
        self.outcome_type = outcome_type
        self.n_patients = n_patients
        self.confidence_interval_95 = confidence_interval_95
        self.site_population_tag = site_population_tag
        self.ehds_data_use_category = ehds_data_use_category
        self.observation_period_days = observation_period_days
        self.timestamp = datetime.utcnow().isoformat()

    def fingerprint(self) -> str:
        """
        Deterministic semantic address derived from OMOP concept IDs + outcome type.
        Every OHDSI site that maps to OMOP CDM shares this address space.
        Routing is concept-native — no translation layer required.
        """
        components = "|".join([
            str(self.condition_concept_id),
            str(self.drug_concept_id),
            self.outcome_type,
            self.site_population_tag[:20],  # Coarse tag only — no identifier
        ])
        return hashlib.sha256(components.encode()).hexdigest()[:16]

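    def to_bytes(self) -> bytes:
        """
        Serializes to compact JSON, typically under the ~512-byte packet budget
        (the size is not enforced here). Small enough for low-bandwidth links,
        including multi-part SMS, at LMIC sites.
        """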
        payload = {
            "cid": self.condition_concept_id,
            "did": self.drug_concept_id,
            "od": round(self.outcome_delta, 4),
            "ot": self.outcome_type,
            "n": self.n_patients,
            "ci": self.confidence_interval_95,
            "pop": self.site_population_tag[:20],
            "ehds": self.ehds_data_use_category[:10],
            "obs": self.observation_period_days,
            "ts": self.timestamp[:10],   # Date only — no timestamp precision
            "fp": self.fingerprint(),
        }
        return json.dumps(payload, separators=(',', ':')).encode()


class QISOHDSIRouter:
    """
    QIS outcome routing layer that sits below the OHDSI application layer.
    Routes OMOP-anchored outcome packets to semantically similar OHDSI sites.

    Transport-agnostic: works with OHDSI Atlas infrastructure, REST APIs,
    EHDS secure processing environments (SPEs), GDI federated access nodes,
    or any mechanism where packets can be posted to and queried from
    a deterministic address. The routing mechanism does not determine
    the quadratic scaling — the loop does.
    """

    def __init__(self, site_id: str, transport_adapter=None):
        self.site_id = site_id
        self.transport = transport_adapter  # Plug in your OHDSI/EHDS transport
        self._local_store: dict[str, list[dict]] = {}

    def deposit_outcome(self, packet: OMOPOutcomePacket) -> str:
        """
        Called after a validated OHDSI analysis completes at this site.
        Deposits the distilled outcome at the OMOP-concept-anchored address.
        Never deposits raw cohort data — only the validated delta.
        """
        fp = packet.fingerprint()
        if fp not in self._local_store:
            self._local_store[fp] = []

        entry = {
            "packet": json.loads(packet.to_bytes()),
            "site": self.site_id,
            "deposited_at": datetime.utcnow().isoformat(),
        }
        self._local_store[fp].append(entry)

        # If transport is configured, route to network
        if self.transport:
            self.transport.post(address=fp, payload=entry)

        return fp

    def query_twins(
        self,
        condition_concept_id: int,
        drug_concept_id: int,
        outcome_type: str,
        population_tag: str,
        top_k: int = 50,
    ) -> list[dict]:
        """
        Query for outcome packets from semantically similar OHDSI sites.
        Returns up to top_k packets from sites managing similar populations.
        Each packet is a distilled intelligence summary — no row-level data.

        For an EHDS secure processing environment, this query runs inside
        the SPE against a shared outcome address space.
        """
        # Build query fingerprint — same deterministic scheme as deposit
        query_key = "|".join([
            str(condition_concept_id),
            str(drug_concept_id),
            outcome_type,
            population_tag[:20],
        ])
        fp = hashlib.sha256(query_key.encode()).hexdigest()[:16]

        if self.transport:
            return self.transport.query(address=fp, top_k=top_k)

        return self._local_store.get(fp, [])[:top_k]

    def synthesize_local(self, packets: list[dict]) -> dict:
        """
        Local synthesis — runs on this site's infrastructure.
        Aggregates outcome intelligence from twin sites without a central aggregator.
        This is the step that closes the QIS loop.
        """
        if not packets:
            return {"synthesis": None, "n_sites": 0, "n_patients_aggregate": 0}

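        deltas = [p["packet"]["od"] for p in packets]
        n_aggregate = sum(p["packet"].get("n", 0) for p in packets)

        # Weighted mean by cohort size (packets missing "n" get weight 1)
        weights = [p["packet"].get("n", 1) for p in packets]
        total_weight = sum(weights)
        if total_weight > 0:
            weighted_delta = sum(d * w for d, w in zip(deltas, weights)) / total_weight
        else:
            # Every packet reported n == 0: fall back to an unweighted mean
            weighted_delta = sum(deltas) / len(deltas)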

        return {
            "synthesis": round(weighted_delta, 4),
            "n_sites": len(packets),
            "n_patients_aggregate": n_aggregate,
            "outcome_type": packets[0]["packet"].get("ot"),
            "synthesized_at": datetime.utcnow().isoformat(),
        }

The loop this creates: a site in Dublin runs a HADES analysis on metformin outcomes in Type 2 diabetes patients, distills the validated delta into an OMOPOutcomePacket fingerprinted on OMOP concept IDs, and deposits it at the deterministic address. Every OHDSI site managing similar populations can then query that address and synthesize the result locally. The next time the Columbus site runs the same analysis, it starts with the synthesized intelligence from every similar site that ran it before, not from zero.
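To try the Dublin-to-Columbus loop on one machine, the router needs something to plug into `transport_adapter`. The sketch below is a hypothetical in-memory stand-in: the class name `InMemoryTransport`, the placeholder address, and the packet values are assumptions for illustration. The only thing taken from the code above is the interface the router calls, namely `post(address=..., payload=...)` and `query(address=..., top_k=...)`.

```python
from collections import defaultdict

class InMemoryTransport:
    """Hypothetical transport adapter: a shared dict keyed by fingerprint address."""

    def __init__(self):
        self._store = defaultdict(list)

    def post(self, address: str, payload: dict) -> None:
        # Append the deposit under its deterministic address
        self._store[address].append(payload)

    def query(self, address: str, top_k: int = 50) -> list:
        # Most recent deposits first, capped at top_k
        return list(reversed(self._store.get(address, [])))[:top_k]

# Two sites share one transport; the address is a placeholder fingerprint
shared = InMemoryTransport()
shared.post(address="0123456789abcdef",
            payload={"site": "dublin", "packet": {"od": -0.8, "n": 120}})
shared.post(address="0123456789abcdef",
            payload={"site": "columbus", "packet": {"od": -0.6, "n": 80}})

twins = shared.query(address="0123456789abcdef", top_k=10)
print([entry["site"] for entry in twins])
```

Passing the same `InMemoryTransport` instance to two `QISOHDSIRouter` objects would let one router's `deposit_outcome` be visible to the other's `query_twins`, which is enough to exercise the loop in a test before choosing a real transport.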


Why This Architecture Fits EHDS and GDI Requirements

European Health Data Space (EHDS): The EHDS regulation's secondary use provisions (Articles 34-50) permit health data to be accessed in secure processing environments (SPEs) for research, public health, and statistics. The core EHDS challenge for outcome routing is identical to QIS's design constraint: personal data must never leave the national SPE. QIS outcome packets are EHDS-compatible by architecture: they contain only derived outcomes (deltas, confidence intervals, cohort counts), never personal data. There is no pseudonymization requirement because there is no personal data to pseudonymize.

For EHDS infrastructure builders: QIS outcome routing sits inside the SPE and routes across SPEs via the fingerprint address space. Each national EHDS node deposits and queries packets within its jurisdiction's SPE. The aggregate synthesis is available to any authorized researcher without any cross-border personal data transfer.
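One way an SPE operator might enforce the "derived outcomes only" property is an allow-list check on every packet before it leaves the environment. The sketch below is an assumption, not part of the QIS specification: the function name `check_outbound_packet` and the `min_cohort` threshold are hypothetical, and the field names simply mirror the keys emitted by `to_bytes` above. Passing this check is a structural guard and does not by itself settle any legal question.

```python
# Keys emitted by OMOPOutcomePacket.to_bytes in the code above
ALLOWED_FIELDS = frozenset(
    {"cid", "did", "od", "ot", "n", "ci", "pop", "ehds", "obs", "ts", "fp"}
)

def check_outbound_packet(payload: dict, min_cohort: int = 1) -> list:
    """Return a list of problems; an empty list means the packet passes."""
    problems = []
    extra = sorted(set(payload) - ALLOWED_FIELDS)
    if extra:
        problems.append(f"unexpected fields: {extra}")
    missing = sorted(ALLOWED_FIELDS - set(payload))
    if missing:
        problems.append(f"missing fields: {missing}")
    n = payload.get("n")
    # min_cohort is a hypothetical per-site governance setting
    if not isinstance(n, int) or isinstance(n, bool) or n < min_cohort:
        problems.append(f"cohort size {n!r} is below the site threshold {min_cohort}")
    ts = payload.get("ts", "")
    if not isinstance(ts, str) or len(ts) != 10:
        problems.append("timestamp must be date-only (YYYY-MM-DD)")
    return problems

# Illustrative packet using the codes quoted earlier in the article
good = {
    "cid": 44054006, "did": 860975, "od": -0.8, "ot": "hba1c_reduction",
    "n": 120, "ci": [-1.0, -0.6], "pop": "urban_rural_mix",
    "ehds": "research", "obs": 180, "ts": "2025-06-16",
    "fp": "0123456789abcdef",
}
print(check_outbound_packet(good))
print(check_outbound_packet(dict(good, patient_id="p-001")))
```

A site with stricter disclosure rules could raise `min_cohort`; the default of 1 matches the article's position that no minimum cohort size is imposed.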

Genomic Data Infrastructure (GDI): The 1+MG initiative and GDI project are building a federated genomic data highway across 23 European countries. The core GDI challenge is polygenic risk score synthesis — sharing what genetic variant combinations predict disease risk in different European ancestry populations without sharing the underlying genotype sequences.

QIS applies directly: a PRS computation at an RCSI node produces an outcome packet fingerprinted on variant set (reference IDs, not sequence) + ancestry population tag. That packet routes to GDI nodes with similar ancestry profiles. Synthesis builds the PRS model across the GDI network without a single genotype sequence leaving any national node.
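As a sketch of what "fingerprinted on variant set + ancestry population tag" could look like, the function below reuses the same hashing scheme as `OMOPOutcomePacket.fingerprint` (SHA-256, first 16 hex characters, tag truncated to 20 characters). The function name `prs_fingerprint`, the `outcome_type` default, the ancestry tags, and the example rsIDs are illustrative assumptions.

```python
import hashlib

def prs_fingerprint(variant_ids, ancestry_tag: str, outcome_type: str = "prs_delta") -> str:
    """Deterministic address for a PRS outcome packet.

    variant_ids: reference identifiers only (for example rsIDs), never sequence data.
    """
    # Sort and de-duplicate so the same variant set always yields the same address
    canonical_variants = ",".join(sorted(set(variant_ids)))
    key = "|".join([canonical_variants, ancestry_tag[:20], outcome_type])
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# Same variant set in a different order maps to the same address
fp_a = prs_fingerprint(["rs7903146", "rs1801282"], "nw_european")
fp_b = prs_fingerprint(["rs1801282", "rs7903146"], "nw_european")
fp_c = prs_fingerprint(["rs7903146", "rs1801282"], "s_european")
print(fp_a == fp_b, fp_a == fp_c)
```

Because the address depends on the exact variant set, two nodes only meet at the same address if they score the same variants. A real deployment would need an agreed convention for naming variant sets, which this sketch does not attempt.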

This is precisely the gap that federated learning cannot fill cleanly: FL requires enough local samples to compute stable gradients. Many GDI nodes manage rare variant populations with N=1 or N=2 site-level cohorts. QIS imposes no minimum cohort size — any validated PRS delta is a valid outcome packet, regardless of cohort size. The Bangalore ADAS argument from autonomous vehicles applies equally here: the most informative rare variant populations are often at the smallest participating nodes.


OHDSI Query vs. QIS Synthesis: The Architectural Difference

| Dimension | OHDSI Federated Query | QIS Outcome Routing (below OHDSI layer) |
|---|---|---|
| Trigger | Researcher-initiated, point-in-time | Continuous: deposits on every validated outcome |
| Data flow | Query distributed out, aggregate stats returned | Outcome packets deposited continuously, pulled by similar sites |
| Learning accumulation | One study = one result | Each validated outcome compounds the synthesis |
| Rare site inclusion | Minimum cohort size required | Any site with a validated outcome participates |
| Network intelligence growth | Linear with studies | Quadratic: N(N-1)/2 synthesis paths active |
| Latency | Days to weeks per study cycle | Real-time synthesis from twin site outcomes |
| EHDS compatibility | Requires governance review per study | Packets are EHDS-native (no personal data by design) |
| GDI genomics | Batch PRS computation | Continuous PRS delta routing as genotypes are validated |
| QIS OMOP dependency | Uses OMOP CDM for standardization | Leverages OMOP concept IDs as fingerprint address space |

These are not competing systems. OHDSI runs the episodic federated analyses that generate the validated outcomes. QIS routes those outcomes continuously so the next site starts smarter. The OMOP CDM standardization that makes OHDSI work is exactly what makes QIS routing address-space-native on OHDSI infrastructure.


What This Means for OHDSI Network Infrastructure Evaluations

If you are building federated health data infrastructure on OHDSI — whether for an EHDS national node, a GDI implementation, or a regional OHDSI network — the question that determines whether QIS belongs in your stack is straightforward:

Can an edge node query a deterministic address — an address defined by OMOP standard concepts that represent the exact clinical problem — and pull back ~512-byte outcome packets from every OHDSI site that has validated a treatment outcome for that exact problem? If the answer is yes: your sites learn from their twins in real time. Intelligence compounds. N=1 rare disease sites participate equally. Network intelligence grows as N(N-1)/2, not linearly.

The routing mechanism is your choice: EHDS SPE-native REST APIs, a shared OMOP concept-indexed database, a vector search layer over existing OHDSI metadata tables, pub/sub over GDI infrastructure. The quadratic scaling comes from the loop and the OMOP-anchored semantic addressing — not from any specific transport.
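Of the options listed, the shared concept-indexed database is the simplest to sketch. The example below uses SQLite from the Python standard library as a stand-in; the class name `SqliteOutcomeStore`, the table name, and the schema are assumptions. It exposes the same `post` and `query` calls that `QISOHDSIRouter` makes on its transport, so it could be passed in as `transport_adapter`.

```python
import json
import sqlite3

class SqliteOutcomeStore:
    """Hypothetical address-indexed packet store backed by SQLite."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS outcome_packets ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " address TEXT NOT NULL,"
            " payload TEXT NOT NULL)"
        )
        # The index is what makes lookups by fingerprint address cheap
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_outcome_address"
            " ON outcome_packets(address)"
        )

    def post(self, address: str, payload: dict) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO outcome_packets (address, payload) VALUES (?, ?)",
                (address, json.dumps(payload)),
            )

    def query(self, address: str, top_k: int = 50) -> list:
        rows = self.conn.execute(
            "SELECT payload FROM outcome_packets"
            " WHERE address = ? ORDER BY id DESC LIMIT ?",
            (address, top_k),
        ).fetchall()
        return [json.loads(row[0]) for row in rows]

store = SqliteOutcomeStore()
store.post("0123456789abcdef", {"site": "site_a", "packet": {"od": -0.8, "n": 120}})
store.post("0123456789abcdef", {"site": "site_b", "packet": {"od": -0.6, "n": 80}})
results = store.query("0123456789abcdef", top_k=10)
print(len(results), results[0]["site"])
```

Swapping SQLite for a networked database, a REST endpoint, or a pub/sub topic changes only the body of `post` and `query`; the address scheme stays the same.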

The OHDSI network has built the standardization layer. QIS is the synthesis layer that runs beneath it.


The Discovery

Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm protocol on June 16, 2025. The breakthrough is the complete architecture — the loop that enables real-time quadratic intelligence scaling without compute explosion, not any single component. 39 provisional patents filed. IP protection is in place.

For OHDSI network builders: the QIS protocol specification and OMOP/OHDSI technical reference are available. Prior coverage of the OHDSI network synthesis gap: what the OHDSI network already has and what is missing.


This is part of an ongoing series on QIS — the Quadratic Intelligence Swarm protocol — documenting every domain where distributed outcome routing closes a synthesis gap that existing infrastructure cannot close. Each article is a living proof: the series itself is a QIS network, each piece depositing insight that compounds across readers.
