Every major academic medical centre is sitting on the same unsolved problem.
On one server cluster: electronic health records. Years of clinical notes, lab values, medication histories, vital sign trajectories, discharge summaries. Structured in OMOP CDM or HL7 FHIR, queryable, with federated access controls in place.
On another cluster: genomic data. Whole-genome sequences, polygenic risk scores, variant call files, pharmacogenomic annotations. Protected under institutional genomic governance policies, subject to consent re-contact restrictions, in some jurisdictions un-linkable to the clinical record without a new ethics approval.
On a third: medical imaging. DICOM archives containing decades of radiology, pathology slides, cardiology echoes, retinal scans. Individual images run to multiple gigabytes. Governed by separate data sharing agreements. Not trivially accessible to the same query interface that surfaces the EHR data.
Now ask a question that requires all three. "What predicts treatment response in HER2-positive breast cancer patients with BRCA2 variants whose initial imaging showed specific tumour morphology markers?" The answer lives at the intersection of three modalities. The architecture of every federated health data system in existence today routes each modality separately — and provides no routing layer that synthesises across them.
This is the multi-modal intelligence gap. It is not a privacy problem. It is a routing problem.
Why Multi-Modal Federation Is Harder Than Single-Modality Federation
Federated learning has made genuine progress on single-modality problems. FL models trained on distributed EHR data have achieved performance close to centralised models for specific classification tasks. The same is true for imaging: federated convolutional networks trained across hospital DICOM archives have produced competitive results for chest X-ray pathology detection.
But single-modality federated learning solves a different problem from what multi-modal intelligence requires.
The failure modes compound when you move to multi-modal synthesis:
Linkage barriers. Matching a patient's genomic record to their EHR record to their imaging record requires a linkage key that privacy regulations in many jurisdictions prohibit federating. Under GDPR Article 9, genetic data is a special category. Under HIPAA in the US, the combination of genomic, clinical, and imaging data may constitute a re-identification risk even when each modality individually is de-identified. The linkage key itself becomes the sensitive datum.
Modality-specific governance. The data access agreement that permits federated EHR queries may not extend to genomic data governed under a separate research protocol. The IRB that approved imaging data use may have specified conditions incompatible with genomic data access. Multi-modal federation requires simultaneous compliance with multiple distinct governance frameworks. In practice, this creates a chilling effect: institutions simply do not attempt multi-modal federation because the legal coordination cost exceeds the anticipated benefit.
Model architecture mismatch. A federated learning model trained on EHR tabular data has a fundamentally different architecture from one trained on imaging data. There is no well-established federated learning framework for multi-modal models that span both structured clinical data and high-dimensional imaging. The field has produced domain-specific models; it has not produced a general routing architecture for multi-modal intelligence.
Cold-start at every new modality. When a research network adds a new modality — say, a hospital system deploys proteomics analysis and wants to federate those results — it must build a new federated pipeline from scratch. The existing EHR and imaging pipelines provide no foundation. Each modality restarts the cold-start problem.
The constraint is architectural. Federated learning as currently practiced moves model weights between nodes. It does not route intelligence — validated, distilled understanding — between nodes that face the same multi-modal problem. That routing layer does not exist in any deployed federated health data system.
What QIS Protocol Adds
QIS Protocol — a discovery in distributed outcome routing by Christopher Thomas Trevethan (June 16, 2025; 39 provisional patents filed) — addresses the multi-modal intelligence gap at the architecture layer.
The insight: the linkage problem is only a problem if you route raw data or model weights. Neither needs to happen.
The QIS loop for multi-modal intelligence:
Local processing per modality. Each node processes its own EHR data, genomic data, and imaging data independently. Raw data from any modality never leaves the node. This is not a new constraint — it is already the standard assumption in federated health data systems.
Multi-modal distillation. After local processing, the node distils what it learned into an outcome packet — a structured ~512-byte summary of a validated clinical, genomic, or imaging finding. Crucially, the outcome packet can carry cross-modal associations without containing data from either modality. An outcome packet might encode: "In patients with this genomic variant class (encoded as a fingerprint component, not a raw variant), tumour morphology features in this category were predictive of treatment non-response to this drug class." The raw genomic data and the raw imaging data are not in the packet. The validated association between them is.
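As a concrete sketch of what such a packet might look like: the field names and values below are illustrative placeholders, not the published QIS schema. The point is that every field is a categorical grouping or an aggregate statistic; no patient-level data from any modality appears.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical packet fields: illustrative, not the published QIS schema.
@dataclass
class OutcomePacket:
    variant_class: str        # categorical genomic grouping, e.g. "G23"
    morphology_category: str  # radiological abstraction, e.g. "M7"
    drug_class: str           # pharmacological grouping, e.g. "D4"
    outcome: str              # validated finding, e.g. "non_response"
    effect_size: float        # local aggregate estimate, not patient-level data
    site_id: str              # pseudonymous node identifier

packet = OutcomePacket("G23", "M7", "D4", "non_response", 0.42, "node-a1")
payload = json.dumps(asdict(packet)).encode("utf-8")
assert len(payload) <= 512  # fits the ~512-byte packet budget
```

Serialised as JSON, the packet comes in well under the 512-byte budget, which is what makes routing it across a network essentially free compared to moving raw data or model weights.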
Semantic fingerprinting. The outcome packet receives a multi-modal semantic fingerprint: a vector encoding that captures the relevant dimensions across modalities simultaneously — genomic variant class, tumour morphology category, treatment regimen type, patient phenotype cluster, institutional context. A node at the University Hospital Cologne can fingerprint its outcome identically to a node at the University Hospital Edinburgh if both are working on the same multi-modal clinical question, even if neither has access to the other's raw data.
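The determinism claim can be sketched with a hash over the schema dimensions. This is a minimal illustration under the assumption that the fingerprint is derived from canonicalised categorical values; the real encoding would follow the network's agreed schema.

```python
import hashlib

def fingerprint(dimensions: dict) -> str:
    """Deterministic multi-modal fingerprint: identical categorical inputs
    yield the identical address at any node, regardless of local systems."""
    canonical = "|".join(f"{k}={dimensions[k]}" for k in sorted(dimensions))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two nodes describing the same clinical question arrive at the same address,
# even though neither has seen the other's raw data.
cologne = fingerprint({"variant_class": "G23", "morphology": "M7", "drug_class": "D4"})
edinburgh = fingerprint({"drug_class": "D4", "morphology": "M7", "variant_class": "G23"})
assert cologne == edinburgh  # key order is irrelevant; the address is deterministic
```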
Outcome routing. The packet is routed to a deterministic address defined by the fingerprint. Any node working on the same multi-modal problem — same genomic variant class, similar tumour morphology, similar treatment context — can query that address and pull the relevant packets. The routing mechanism is protocol-agnostic: DHT-based routing (O(log N) per query), database semantic index (O(1) lookup), vector similarity search, or any transport that maps a problem fingerprint to a deterministic address works equally well. The routing layer is an engineering choice; the architecture is not.
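The transport-agnostic claim can be made concrete with the simplest possible backing store: an in-memory dictionary acting as an O(1) semantic index. Names and fingerprint values here are hypothetical; a DHT, a vector store, or a Kafka topic per fingerprint could stand in for the dict without changing the mapping.

```python
from collections import defaultdict

class SemanticIndex:
    """Minimal O(1) semantic index: fingerprint -> outcome packets.
    The mapping, not the transport behind it, is the architecture."""
    def __init__(self):
        self._index = defaultdict(list)

    def publish(self, fp: str, packet: bytes) -> None:
        self._index[fp].append(packet)

    def query(self, fp: str) -> list:
        return list(self._index[fp])

index = SemanticIndex()
index.publish("a3f9", b"packet-from-cologne")     # hypothetical fingerprint
index.publish("a3f9", b"packet-from-edinburgh")
assert index.query("a3f9") == [b"packet-from-cologne", b"packet-from-edinburgh"]
assert index.query("ffff") == []  # no node has published on this problem yet
```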
Local synthesis. Each receiving node synthesises the incoming cross-modal intelligence packets with its own local findings. The synthesis happens entirely within the node. No raw data crosses institutional boundaries. No model weights are shared. The intelligence crosses — in distilled, validated, 512-byte form.
New packets generated. The synthesised outcome re-enters the loop as a new, richer packet, incorporating insights from multiple institutions. The network compounds.
The Multi-Modal Fingerprint Problem — and Its Solution
The hardest engineering problem in multi-modal outcome routing is defining the fingerprint: what makes two nodes' multi-modal clinical problems "similar enough" to route intelligence between them?
This is not a theoretical problem. It is a domain expertise problem.
In Christopher Thomas Trevethan's architecture, this is what he calls the First Election: the selection of the best domain expert to define similarity for a given network. Not a governance vote. Not an algorithmic selection. The hiring of the right person — or team — to specify the similarity function that makes the routing meaningful.
For a multi-modal oncology network, the First Election means bringing together:
- An oncologist who understands clinically meaningful phenotype groupings
- A genomicist who understands which variant categories cluster meaningfully for treatment response
- A radiologist who understands which imaging feature categories carry predictive signal
- A clinical informaticist who understands the OMOP CDM and HL7 FHIR data models well enough to map local encodings to the shared fingerprint schema
The output of this election is a multi-modal fingerprint schema: a specification of the dimensions, discretisation levels, and similarity thresholds that govern packet routing. Once that schema is defined, it can be applied consistently across any node that joins the network — regardless of what EHR system, what genomic pipeline, or what PACS imaging system the node uses internally.
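A fingerprint schema of this kind might look like the following sketch. The dimensions, bucket edges, and labels are placeholders for what the network's domain experts would actually specify in the First Election.

```python
# Hypothetical schema: dimensions and bucket edges are placeholders for
# what the elected domain experts would specify for a real network.
SCHEMA = {
    "variant_class":       {"type": "categorical"},                    # genomicist-defined
    "morphology_category": {"type": "categorical"},                    # radiologist-defined
    "tumour_size_mm":      {"type": "discretised", "edges": [10, 20, 50]},
    "drug_class":          {"type": "categorical"},                    # oncologist-defined
}

def discretise(value: float, edges: list) -> int:
    """Map a continuous local measurement into a shared schema bucket."""
    return sum(value >= e for e in edges)

# A 15 mm tumour lands in bucket 1 regardless of which PACS measured it.
assert discretise(15.0, SCHEMA["tumour_size_mm"]["edges"]) == 1
```

The discretisation step is what lets heterogeneous local systems agree: each node maps its own measurements into the shared buckets, and only the bucket index enters the fingerprint.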
This is architecturally different from federated learning, where the model architecture itself constrains the modalities that can participate. In QIS, the fingerprint schema is the coordination mechanism. The underlying data systems remain local and heterogeneous.
The Math Applied to Multi-Modal Networks
The quadratic scaling argument applies directly to multi-modal clinical research networks.
The Global Alliance for Genomics and Health (GA4GH) has 700+ member institutions. Not all have linked multi-modal data, but the major academic medical centres — roughly 150 institutions globally — do maintain linked genomic and clinical data, and a significant fraction have linked imaging.
Under a QIS multi-modal outcome routing layer:
- 150 institutions = 150 × 149 / 2 = 11,175 synthesis pairs per multi-modal question
- Each pair can exchange cross-modal intelligence in near-real-time, without sharing raw data from any modality
- Intelligence from the Mayo Clinic's EHR-genomic-imaging linkage reaches Charité Berlin before the next morning clinical review — in the form of a 512-byte outcome packet, not a shared database or a model weight transfer
At 400 OHDSI sites (many of which are building genomic linkage layers):
- 400 sites = 400 × 399 / 2 = 79,800 synthesis pairs per clinical question — the same figure that makes OHDSI's routing gap so consequential
For rare multi-modal phenotypes — where N = 3 or N = 5 sites globally have sufficient linked data — QIS outcome routing is the only architecture that enables learning without raw data centralisation. Federated learning requires minimum cohort sizes for gradient stability; a 3-site federated training run for a rare genomic-imaging association is statistically underpowered by design. QIS routes a validated outcome packet from 3 sites to every other site facing the same phenotype, without requiring a training run at all. The insight travels. The data does not.
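The pair counts above follow from one line of arithmetic, which is worth stating explicitly:

```python
def synthesis_pairs(n: int) -> int:
    """Number of distinct node pairs that can exchange intelligence: N(N-1)/2."""
    return n * (n - 1) // 2

assert synthesis_pairs(150) == 11_175  # GA4GH-scale multi-modal centres
assert synthesis_pairs(400) == 79_800  # OHDSI-scale network
assert synthesis_pairs(3) == 3         # rare-phenotype case: 3 sites still form 3 pairs
```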
Privacy by Architecture Across Modalities
The privacy argument for QIS becomes stronger, not weaker, in the multi-modal context.
The risk surface in multi-modal health data is not additive — it is multiplicative. A de-identified EHR record carries low re-identification risk. A de-identified genomic record carries moderate re-identification risk (Gymrek et al., 2013 demonstrated re-identification of anonymous participants from public genomic databases). A de-identified DICOM image, stripped of metadata, carries low direct re-identification risk. But the combination of all three — linked EHR + genomic + imaging — is, in Gymrek's framework, essentially a fingerprint. The linkage itself is the sensitive datum.
QIS packets do not carry the linkage. They carry the validated association between outcome categories that the linkage enables. An outcome packet that encodes "genomic variant class G23 associated with imaging morphology category M7 and treatment non-response in drug class D4" cannot be used to re-identify any patient. The variant class is a categorical grouping. The morphology category is a radiological abstraction. The drug class is a pharmacological grouping. No patient record is recoverable from any combination of these.
This is privacy-by-architecture, not privacy-by-policy. The architecture makes centralisation of the sensitive linkage unnecessary — not because of contractual data sharing agreements that can be violated or misinterpreted, but because the architecture never routes the linkage in the first place.
For institutions operating under GDPR Article 9 (special category health data), BDSG (German Federal Data Protection Act), and NHS data governance frameworks simultaneously, this distinction is practically significant. It moves multi-modal intelligence sharing from "legally complex and practically difficult" to "legally clean by design."
Integration with Existing Multi-Modal Frameworks
QIS is not a replacement for existing health data infrastructure. It is a routing layer above it.
OMOP CDM / OHDSI: The OHDSI network runs standardised observational analyses across hundreds of sites. OMOP CDM provides the data model. QIS adds outcome routing: after a site's OMOP analysis produces a validated finding, the finding is distilled into an outcome packet, fingerprinted, and routed. The existing OHDSI query infrastructure is unchanged.
GA4GH Beacon / DRS: GA4GH's federated discovery layer (Beacon) enables presence/absence queries about genomic variants across participating sites. QIS adds outcome routing above Beacon: sites that have a variant can route validated clinical associations — what they learned from patients with that variant — without exposing the variant data itself. QIS and Beacon solve different problems at different layers.
FHIR R4 / SMART on FHIR: HL7 FHIR is the dominant data exchange standard for EHR interoperability. QIS outcome packets can be structured as FHIR-compatible payloads for institutions that prefer FHIR-native interfaces. The routing layer does not require FHIR, but it does not conflict with it.
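One way an outcome packet might ride in a FHIR-style payload is sketched below. The resource shape is a hedged illustration only: a production mapping would follow the FHIR R4 Observation resource definition and a profile agreed by the network, and the coding shown is hypothetical.

```python
import json

# Hedged sketch: an outcome packet wrapped in an Observation-shaped payload.
# The coding and profile are hypothetical, not a published QIS-FHIR mapping.
packet_as_fhir = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "qis-outcome-packet"},  # placeholder coding
    "valueString": json.dumps({
        "variant_class": "G23",
        "morphology_category": "M7",
        "drug_class": "D4",
        "outcome": "non_response",
    }),
}

# The packet remains recoverable from the FHIR wrapper unchanged.
assert json.loads(packet_as_fhir["valueString"])["variant_class"] == "G23"
```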
UK Health Data Research Service / HDRS: The UK's £600M Health Data Research Service is building a federated secure data environment network. QIS provides the synthesis routing layer that the HDRS TRE architecture does not specify: after analysts within each TRE produce validated findings, those findings can route as outcome packets to other TREs working on the same clinical question.
MII (German Medical Informatics Initiative) / NFDI4Health: Germany's MII network links EHR data across 36 university hospital sites. NFDI4Health provides the metadata layer for German health research data. QIS adds outcome routing across MII nodes: validated multi-modal findings can route from Charité to Heidelberg University Hospital and to Klinikum rechts der Isar before the next morning clinical review.
In every case, the pattern is the same: existing infrastructure handles data governance and raw data access. QIS handles synthesis routing of validated outcomes. The two layers do not compete.
What Changes When You Close the Loop
The current state: A researcher at a major academic medical centre discovers a multi-modal association — say, a genomic variant class that predicts imaging-detectable treatment response patterns. She publishes a paper 18 months later. Her colleagues at 150 other institutions read the paper 12 months after that — if they happen to see it in their literature review. The validated finding takes 30 months to reach the institutions that could act on it. In the meantime, their patients are treated without that intelligence.
The QIS state: The same researcher's finding is distilled into an outcome packet the day it clears internal validation. It routes to every institution globally that is working on the same multi-modal question — defined by the shared fingerprint schema agreed by the network's domain experts. Those institutions synthesise the finding into their local clinical support systems before their next morning review. The validated insight travels at the speed of a 512-byte network packet.
N(N-1)/2 synthesis pairs. At most O(log N) routing cost per node. Intelligence compounds as the network grows. The math does not change because the network is multi-modal rather than single-modal.
This is the architecture Christopher Thomas Trevethan's discovery — QIS, the Quadratic Intelligence Swarm — enables. The complete loop — distillation, fingerprinting, routing, synthesis, new packets generated — applies to any domain where nodes accumulate validated local intelligence and would benefit from sharing it without sharing the raw data that validated it.
Multi-modal health data is one of the highest-stakes instances of that domain.
A Note for Builders
If you are building multi-modal health intelligence infrastructure — whether as part of a federated learning platform, a clinical decision support system, a digital twin pipeline, or a health data research network — the routing layer described here is implementable today.
The QIS architecture is protocol-agnostic. The full open specification is at qisprotocol.com. Working implementations in Python exist for ChromaDB, Qdrant, Redis Pub/Sub, Kafka, Pulsar, SQLite, MQTT, ZeroMQ, gRPC, WebSockets, and plain REST APIs — all published at dev.to/roryqis in the transport implementation article series.
The outcome packet schema for multi-modal health data will differ from the VLBI calibration schema or the supply chain yield delta schema. The fingerprint dimensions will be domain-specific. But the architecture is identical, and the math holds regardless of domain.
Further Reading
- QIS Protocol Open Specification
- Why Federated Learning Has a Ceiling
- QIS for Precision Medicine: Why Genomic Intelligence Can't Be Centralised
- Zero-Knowledge Outcome Routing: Why QIS Packets Are PHI-Free by Construction
- The OHDSI Network Has 900 Million Patient Records — Here's Why None of Them Talk to Each Other
- QIS for Drug Discovery: Why Clinical Trials Fail and What Distributed Outcome Routing Changes
QIS Protocol was discovered by Christopher Thomas Trevethan on June 16, 2025. 39 provisional patents filed. Free for research, education, academic, nonprofit, and humanitarian use. Full specification and licensing at qisprotocol.com.