Rory | QIS PROTOCOL

Posted on Apr 12

The Routing Layer Federated Genomics Has Been Missing: Why CanDIG Nodes Need Outcome Routing

#distributedsystems #genomics #federatedlearning #healthdata

The Problem No Access Control Layer Can Solve

Suppose a clinical genomics team at a Canadian cancer center analyzes 47 patients carrying a BRCA2 pathogenic variant — specifically R3128C. After months of analysis, with ethics approval, with proper cohort construction, they produce a validated outcome: 73% penetrance at a median follow-up of 6.2 years.

That number is real knowledge. It took months of work and represents a cohort size that most individual institutions will never accumulate for a single rare variant. It could immediately inform risk counseling at eight other Canadian institutions that see patients with the same variant.

It goes nowhere.

Not because of a technical failure. Not because the data is locked behind a broken API. It goes nowhere because the federated genomics stack — as currently architected — routes access, not outcomes. CanDIG will tell Hospital B whether it is allowed to query Hospital A's data. It will not tell Hospital B what Hospital A learned.

That gap is the problem this article is about.

What CanDIG Actually Does (and Does Exceptionally Well)

CanDIG — Canada's Distributed Infrastructure for Genomics — is a federated data network connecting 14+ Canadian institutions under a common standards framework. Led by Michael Brudno at UHN DATA and the University of Toronto, CanDIG implements GA4GH standards end to end: the Data Connect API for federated querying, the Data Repository Service (DRS) for data object access, and alignment with OMOP CDM for clinical data standardization.

The architecture is correct for what it is designed to do. CanDIG answers one question:

"Can Institution B query Institution A's data — and under what policy?"

It handles access control, legal framework, cross-jurisdictional data governance, and standardized discovery. A researcher at one CanDIG node can discover that a relevant dataset exists at another node, confirm they have authorization, and run a federated analysis. That is not a small engineering achievement. It is the foundation every federated genomics effort needs.

But CanDIG is an access and federation layer. It is not an intelligence routing layer.

Those are different problems, and conflating them explains why the genomics community keeps rediscovering the same results independently.

The Intelligence Fragmentation Problem

Here is how genomic intelligence currently fragments across a CanDIG-style network.

Institution A completes an analysis. The validated outcome (the derived statistic, the confidence interval, the phenotypic association) lives in Institution A's systems. CanDIG nodes at Hospitals B, C, and D have no mechanism to receive that outcome unless they:

1. Design a new study replicating the same question
2. Obtain ethics approval (typically 3–9 months per institution)
3. Negotiate data access agreements
4. Run the analysis again on a smaller cohort with lower statistical power

Every institution in the network is doing this. The same rare variant is being analyzed independently at multiple sites. The cohort sizes remain small. The confidence intervals remain wide. The conclusions remain local.

The network generates intelligence. The intelligence does not compound. It fragments.

This is not a CanDIG criticism — it is a gap that CanDIG was never designed to fill. The gap requires a routing protocol that operates at the outcome layer, not the data access layer.

What Outcome Routing Means

The Quadratic Intelligence Swarm (QIS) protocol, discovered by Christopher Thomas Trevethan on June 16, 2025, is a routing protocol for validated outcomes — not raw data, not queries, not access grants.

The unit of routing in QIS is a distilled outcome packet: approximately 512 bytes containing derived statistics, confidence intervals, and a cohort descriptor expressed in standardized vocabulary. For a genomics use case, that vocabulary is already defined: GA4GH Phenopackets for phenotypic description, OMOP CDM for clinical concepts, SNOMED CT for clinical terminology.

A QIS outcome packet for the BRCA2 R3128C analysis above would contain:

- Variant accession (ClinVar or equivalent), not the VCF
- Phenotype code (SNOMED CT or HPO), not patient records
- Outcome type (penetrance estimate)
- Derived statistic (73%) with confidence interval
- Cohort descriptor (47 patients, 1 site, 6.2-year median follow-up)
- Validation signature

No raw genomic data. No identified patient records. No VCF files. A 512-byte packet that carries what was learned, not the data that produced the learning.
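As a rough illustration of the packet described above, the following Python sketch models the listed fields and checks the result against the approximate 512-byte budget. The field names, the JSON encoding, and the placeholder accession, phenotype, and signature strings are all assumptions made for this example; the article does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

PACKET_BUDGET_BYTES = 512  # approximate packet size stated in the article


@dataclass(frozen=True)
class OutcomePacket:
    # Field names are hypothetical; only the field list comes from the article.
    variant_accession: str           # e.g. a ClinVar accession (placeholder below)
    phenotype_code: str              # SNOMED CT or HPO code (placeholder below)
    outcome_type: str                # e.g. "penetrance_estimate"
    estimate: float                  # derived statistic, as a fraction
    ci_low: Optional[float]          # confidence interval bounds, if reported
    ci_high: Optional[float]
    cohort_size: int
    median_followup_years: float
    validation_signature: str        # opaque here; no signing scheme is specified

    def to_bytes(self) -> bytes:
        """Serialize to compact, key-sorted JSON and enforce the size budget."""
        payload = json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":")
        ).encode("utf-8")
        if len(payload) > PACKET_BUDGET_BYTES:
            raise ValueError(f"packet is {len(payload)} bytes, over budget")
        return payload


packet = OutcomePacket(
    variant_accession="CLINVAR-ACCESSION-PLACEHOLDER",
    phenotype_code="PHENOTYPE-CODE-PLACEHOLDER",
    outcome_type="penetrance_estimate",
    estimate=0.73,
    ci_low=None,   # the article reports no interval, so none is invented here
    ci_high=None,
    cohort_size=47,
    median_followup_years=6.2,
    validation_signature="SIGNATURE-PLACEHOLDER",
)
encoded = packet.to_bytes()
print(len(encoded), "bytes")
```

Key-sorted, compact JSON is used only because it is easy to inspect; a binary encoding would leave more headroom inside the same budget.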

The semantic address for this packet is deterministic: variant accession + phenotype code + outcome type maps to an address that any node using the same standards computes identically. This is not a lookup table — it is a derivation. Any peer institution receiving the same inputs produces the same address and can retrieve the outcome without centralized coordination.
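One way to picture a deterministic derivation is a hash over the three normalized fields. The sketch below uses SHA-256 with a simple normalization step; both choices are assumptions for illustration, since the article does not name the derivation function, and the input strings are placeholders rather than real accessions.

```python
import hashlib


def semantic_address(variant_accession: str, phenotype_code: str, outcome_type: str) -> str:
    """Derive an address from the three fields (hypothetical scheme).

    Inputs are trimmed and case-folded so that trivially different spellings
    of the same identifiers map to the same address.
    """
    parts = [p.strip().casefold() for p in (variant_accession, phenotype_code, outcome_type)]
    # Join with the ASCII unit separator so field boundaries stay unambiguous.
    canonical = "\x1f".join(parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two nodes computing the address independently from the same inputs.
address_at_node_a = semantic_address("VARIANT-PLACEHOLDER", "PHENOTYPE-PLACEHOLDER", "penetrance_estimate")
address_at_node_b = semantic_address(" variant-placeholder ", "PHENOTYPE-PLACEHOLDER", "Penetrance_Estimate")
print(address_at_node_a == address_at_node_b)
```

Because the address depends only on the inputs, no registry has to hand it out. The cost is that every node must agree on the normalization rules and the vocabulary versions in use.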

Routing Cost and the Protocol-Agnostic Architecture

The routing layer in QIS is protocol-agnostic. The semantic addressing works over whatever transport the network supports — such as DHT-based routing for fully distributed deployments, or indexed lookup for structured networks. For a CanDIG-scale network, the cost per routing operation is at most O(log N), and O(1) with indexed lookup at known node registries.

This matters for scaling projections.

CanDIG currently operates at N=14 institutions. The number of synthesis pathways — unique node pairs that can exchange validated outcomes — is:

14 × 13 / 2 = 91 pathways

The EU Genomic Data Infrastructure (GDI) connects 30+ institutions across Europe under a comparable federated model. If QIS outcome routing is layered across both networks — a legitimate integration path, since both use GA4GH standards — the combined network reaches approximately N=44:

44 × 43 / 2 = 946 pathways

Extend to the broader GA4GH global network and the curve continues: N(N-1)/2 synthesis pathways, with routing cost per node scaling at O(log N). The intelligence that compounds across the network grows quadratically. The per-node cost to participate grows logarithmically.
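The arithmetic behind these figures is easy to reproduce. The short Python sketch below computes N(N-1)/2 for the two network sizes mentioned and prints a base-2 logarithm next to each; the base-2 bound is only an illustrative stand-in for O(log N), since the article does not give the constant factors of any particular transport.

```python
import math


def synthesis_pathways(n: int) -> int:
    """Number of unique node pairs among n nodes: n(n-1)/2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (n - 1) // 2


def log_hop_bound(n: int) -> int:
    """Illustrative ceil(log2 n), standing in for an O(log N) routing cost."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(math.log2(n))


for n in (14, 44):
    print(f"N={n}: {synthesis_pathways(n)} pathways, log2 bound {log_hop_bound(n)}")
# N=14: 91 pathways, log2 bound 4
# N=44: 946 pathways, log2 bound 6
```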

That asymmetry — quadratic synthesis, logarithmic cost — is not a feature of any single component. It is a property of the complete architecture: semantic addressing, outcome-layer routing, and the protocol-agnostic transport. Christopher Thomas Trevethan discovered this as a complete loop. The breakthrough is the architecture, not any individual piece of it.

CanDIG Routes Access. QIS Routes Outcomes. They Are Complementary.

The distinction is precise and worth stating explicitly.

CanDIG answers: "Can Institution B query Institution A's data?" — access control, policy enforcement, cross-jurisdictional legal framework.

QIS answers: "What did Institution A learn from their data, and how does that outcome reach Institution B?" — intelligence routing, outcome synthesis, compounding knowledge.

These are not competing systems. They address sequential steps in the same research workflow. A CanDIG node completes a federated analysis under proper governance. The validated outcome is packaged and routed via QIS to peer nodes with the same variant population. Those nodes receive the outcome, check semantic validity against their own cohort descriptors, and can immediately apply it to patient risk counseling — without running the study again.

The CanDIG governance layer and the QIS routing layer operate at different protocol levels. CanDIG governs the query. QIS routes the finding.

The Rare Disease Case: Where Fragmentation Is Most Costly

Federated machine learning on genomic data has a well-documented problem at rare variant sites: gradient computation requires statistical mass. A site with 8 patients carrying a rare BRCA2 variant cannot contribute meaningfully to a federated learning training round. The gradient contribution is too noisy. The site is effectively excluded from network intelligence.

QIS handles this case differently.

That site with 8 patients produces one outcome packet: an eight-patient penetrance observation for variant R3128C, with confidence intervals wide enough to reflect its limited statistical power. The packet routes to every peer site in the network that holds the same variant population. The 8-patient observation does not train a model; it adds a validated observation to the distributed evidence base.

When three additional sites each have 8 patients with the same variant, none of the four sites can contribute meaningfully to federated learning on its own. Together, through QIS outcome routing, they hold 32 patients' worth of validated, semantically addressed observations, without ever moving a record across jurisdictions.
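To make the four-site example concrete, the sketch below pools per-site observations by summing event counts and cohort sizes, then compares a Wilson score interval for one site against the pooled interval. The event counts are hypothetical, and naive count pooling is an assumption chosen for simplicity: it treats the sites as homogeneous, and the article does not prescribe a synthesis method or say that packets carry raw counts.

```python
import math
from typing import List, Tuple


def wilson_interval(events: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def pool(observations: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Naive pooling: sum events and cohort sizes across sites."""
    total_events = sum(events for events, _ in observations)
    total_size = sum(size for _, size in observations)
    return total_events, total_size


# HYPOTHETICAL (events, cohort_size) pairs for four 8-patient sites.
site_observations = [(6, 8), (5, 8), (6, 8), (6, 8)]

pooled_events, pooled_n = pool(site_observations)
single_low, single_high = wilson_interval(*site_observations[0])
pooled_low, pooled_high = wilson_interval(pooled_events, pooled_n)

print(f"one site: n=8,  interval width {single_high - single_low:.2f}")
print(f"pooled:   n={pooled_n}, interval width {pooled_high - pooled_low:.2f}")
```

The pooled interval is narrower than any single site's interval, which is the sense in which small observations add up. A real synthesis step would also need to account for between-site heterogeneity and differing follow-up times.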

For rare disease genomics, where institutional cohorts are small by definition, this is the only architecture that compounds intelligence without requiring impractical per-site sample sizes.

Cross-Network Context: GDI, ELIXIR, and the GA4GH Layer

The same routing gap that exists in CanDIG exists across every GA4GH-aligned federated genomics network.

The EU Genomic Data Infrastructure (GDI) connects 30+ institutions across Europe under the same GA4GH standards framework CanDIG uses. GDI routes access. Outcomes fragment. The RCSI GDI Ireland node is one specific example of an institution participating in GDI with active interest in the outcome-routing problem space.

ELIXIR Europe provides federated bioinformatics infrastructure across 23 member states. ELIXIR nodes are generating validated bioinformatics results independently, without a routing layer to compound that intelligence across the network.

The Vector Institute in Toronto occupies an adjacent position: AI for health research, strong overlap with CanDIG's institutional geography, alignment on the problem of intelligence fragmentation in health data systems.

All of these networks use GA4GH standards. GA4GH Phenopackets already provide the phenotypic vocabulary for QIS cohort descriptors. OMOP CDM already provides the clinical concept standardization. The semantic addressing layer in QIS is not asking these networks to adopt new data standards — it is using the standards they have already committed to as the basis for deterministic outcome addressing.

The integration path is a routing protocol that plugs into existing GA4GH infrastructure, not a replacement for it.

What Implementation Looks Like at the Node Level

A CanDIG node implementing QIS outcome routing adds one component to its existing stack: an outcome router that sits downstream of the analysis pipeline and upstream of local storage.

When a validated analysis completes:

1. The outcome is packaged into a standardized descriptor using existing GA4GH Phenopackets and OMOP vocabularies already in use at the node.
2. The semantic address is derived deterministically from variant accession, phenotype code, and outcome type.
3. The 512-byte outcome packet is routed to peer nodes via the transport layer (DHT-based routing, indexed lookup, or HTTP relay, depending on network configuration).
4. Receiving nodes index the incoming outcome against their local variant populations and surface relevant results to researchers.
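The four steps above can be sketched as a small in-memory simulation. Everything named here (the `Node` class, `derive_address`, the dictionary-based outcome, and the placeholder codes) is hypothetical, and a direct method call stands in for whichever transport a real deployment would use.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def derive_address(variant: str, phenotype: str, outcome_type: str) -> str:
    """Hypothetical deterministic address: SHA-256 over the joined fields."""
    canonical = "\x1f".join(s.strip().casefold() for s in (variant, phenotype, outcome_type))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Node:
    def __init__(self, name: str, local_addresses: Iterable[str] = ()):
        self.name = name
        # Addresses for the variant populations this node holds locally.
        self.local_addresses = set(local_addresses)
        self.peers: List["Node"] = []
        self.inbox: Dict[str, List[dict]] = defaultdict(list)

    def publish(self, outcome: dict) -> int:
        """Steps 1-3: package the outcome, derive its address, send to peers."""
        address = derive_address(
            outcome["variant_accession"], outcome["phenotype_code"], outcome["outcome_type"]
        )
        packet = json.dumps(outcome, sort_keys=True).encode("utf-8")
        # The method call below is a stand-in for DHT, indexed lookup, or HTTP relay.
        return sum(1 for peer in self.peers if peer.receive(address, packet))

    def receive(self, address: str, packet: bytes) -> bool:
        """Step 4: index the outcome only if it matches a local variant population."""
        if address not in self.local_addresses:
            return False
        self.inbox[address].append(json.loads(packet))
        return True


outcome = {
    "variant_accession": "VARIANT-PLACEHOLDER",
    "phenotype_code": "PHENOTYPE-PLACEHOLDER",
    "outcome_type": "penetrance_estimate",
    "estimate": 0.73,
    "cohort_size": 47,
}
address = derive_address("VARIANT-PLACEHOLDER", "PHENOTYPE-PLACEHOLDER", "penetrance_estimate")

hospital_a = Node("A")
hospital_b = Node("B", local_addresses=[address])  # holds the same variant population
hospital_c = Node("C")                             # does not
hospital_a.peers = [hospital_b, hospital_c]

delivered = hospital_a.publish(outcome)
print(delivered, "peer(s) indexed the outcome")
```

This sketch broadcasts to every peer and filters on receipt, which is the simplest thing to show. An address-keyed transport would instead deliver only to nodes responsible for that address.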

No raw data crosses the boundary. No new governance approval is required for outcome packets — the outcome has already been produced under existing ethics approval. The governance layer CanDIG already provides remains the authority for data access. QIS operates entirely at the outcome layer.

The Routing Layer Federated Genomics Has Been Missing

The federated genomics stack has solved access. It has not solved compounding.

CanDIG, GDI, ELIXIR, and GA4GH-aligned networks globally are generating validated genomic intelligence. That intelligence is not routing. Each institution is an island of knowledge, connected to peer institutions by access control frameworks that govern queries but have no mechanism for propagating findings.

The Quadratic Intelligence Swarm protocol, discovered by Christopher Thomas Trevethan on June 16, 2025, addresses this gap directly. The complete architecture — semantic addressing, protocol-agnostic outcome routing, and the quadratic synthesis curve — is the discovery. Not any single component, but the loop: the fact that N(N-1)/2 synthesis pathways become reachable at O(log N) routing cost per node, using vocabulary standards the genomics community has already adopted.

39 provisional patents have been filed covering the QIS architecture.

Licensing is humanitarian by design: free for nonprofits, research institutions, and educational use. A federated genomics network operating under research or nonprofit status can implement QIS outcome routing without licensing cost. The architecture was discovered to compound human knowledge, and the licensing structure reflects that intent.

DEV Community