Rory | QIS PROTOCOL

Posted on Apr 12

Zero-Knowledge Outcome Routing: Why QIS Packets Are PHI-Free by Construction

#distributedsystems #healthdata #machinelearning #security

Every serious distributed health intelligence system eventually faces the same wall.

You run one of 18 hospitals in a network, and you have 2,400 patients. From them you have learned something important: a specific combination of lab values, timing, and medication class predicts 30-day readmission with 84% accuracy. You want the other 17 hospitals to benefit from that knowledge. But the moment you ask "how do we share this?", the regulatory machinery activates: HIPAA, GDPR, IRB review, data use agreements, de-identification standards, BAAs.

The engineers conclude: we cannot share the data. So the knowledge stays trapped.

This is the wrong framing. The question is not "how do we share the data?" The question is: why does sharing knowledge require sharing data at all?

Christopher Thomas Trevethan's discovery, Quadratic Intelligence Swarm (QIS), answers this by separating two things that most prior architectures conflated: the raw signal and the distilled outcome. QIS routes outcomes, not data. That distinction reframes the regulatory conversation. Instead of navigating the rules for sharing data, it avoids transmitting data in the first place.

What PHI-Free by Construction Actually Means

The phrase "de-identified data" appears in every health AI privacy discussion. It is the wrong target.

De-identification is a process applied to PHI after the fact. It works by removing or transforming identifiers — names, dates, geographic subdivisions, device identifiers — according to HIPAA Safe Harbor or Expert Determination standards. The problem: de-identification is a statistical guarantee about a reduced dataset. It is not a structural guarantee about what the network receives.

A de-identified record still contains diagnosis codes, procedure codes, clinical measurements, temporal patterns, and genomic variants. Through inference, linkage attacks, and model inversion, these fields can be used to re-identify the record. The Netflix Prize dataset, released in 2006, was re-identified by linking it with public IMDb ratings. The Massachusetts Governor's medical records were re-identified from "de-identified" state employee health data by linking them with voter registration records. The Stanford facial recognition work showed quasi-identifiers persisting through multiple de-identification passes.

De-identification reduces risk. It does not eliminate the category of risk.

PHI-free by construction is different. It means the information object that crosses the network boundary does not contain, and structurally cannot contain, any data from which an individual could be identified. That rules out identification by inversion, by linkage, and by inference. The object carries only a distilled summary computed at the edge, never the raw records it was computed from.

This is what a QIS outcome packet is.

The Outcome Packet: What It Contains

A QIS outcome packet is approximately 512 bytes. Its structure is:

```
{
  "semantic_fingerprint": "<vector hash representing problem domain>",
  "outcome_summary": "<distilled result of local synthesis>",
  "confidence_interval": [<lower>, <upper>],
  "n_observations": <integer>,
  "timestamp": <unix epoch>,
  "node_id": "<deterministic hash, not traceable to institution>"
}
```
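The schema above can be sketched as a Python dataclass. It serializes to compact JSON and refuses to exceed the 512-byte budget. This is a minimal sketch under stated assumptions. The field types, the compact JSON encoding, and the choice to enforce the size limit at serialization time are illustrative decisions, not part of a published QIS specification. All field values are placeholders.

```python
import json
from dataclasses import asdict, dataclass

MAX_PACKET_BYTES = 512  # size stated in the article; enforcing it here is an assumption


@dataclass(frozen=True)
class OutcomePacket:
    semantic_fingerprint: str                 # hash of the problem domain, not of any patient
    outcome_summary: str                      # distilled, abstract-level result
    confidence_interval: tuple[float, float]  # (lower, upper)
    n_observations: int                       # cohort size behind the summary
    timestamp: int                            # Unix epoch seconds
    node_id: str                              # deterministic configuration hash

    def to_bytes(self) -> bytes:
        """Serialize to compact, key-sorted JSON and enforce the size budget."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")
        if len(payload) > MAX_PACKET_BYTES:
            raise ValueError(f"packet is {len(payload)} bytes; limit is {MAX_PACKET_BYTES}")
        return payload


# Placeholder values for illustration only.
packet = OutcomePacket(
    semantic_fingerprint="0" * 64,
    outcome_summary=("Patients with this semantic fingerprint who received intervention A "
                     "showed a 23% reduction in outcome B at 90 days."),
    confidence_interval=(0.15, 0.31),
    n_observations=120,
    timestamp=1_700_000_000,
    node_id="1" * 64,
)
print(len(packet.to_bytes()), "bytes")
```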

That is the complete object. There is no patient record in it. There is no lab value. There is no diagnosis code. There is no individual measurement.

The outcome_summary field contains a distilled synthesis — the kind of content you would find in a published abstract, not a patient chart. "Patients with this semantic fingerprint who received intervention A showed a 23% reduction in outcome B at 90 days." That statement, in 512 bytes, represents what the edge node learned. The edge node's 2,400 patient records never left the node.

The node_id is a deterministic hash of the node's configuration. It is not an IP address, an institution name, or any other identifier traceable to an organization. The hash depends only on configuration, so two nodes with identical configurations produce identical node IDs. The ID alone therefore cannot single out a specific institution without additional side-channel information.
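One way to get a node ID with these properties is to hash a canonical serialization of the node's configuration. The sketch below assumes SHA-256 over key-sorted JSON and uses hypothetical configuration keys. The article does not specify the hash function or which settings are included. Canonicalizing first matters: without sorted keys, the same configuration written in a different order would hash differently.

```python
import hashlib
import json


def node_id_from_config(config: dict) -> str:
    """Hypothetical node ID: SHA-256 of the configuration in canonical JSON form.

    Sorting keys and removing whitespace makes the output deterministic, so the
    same configuration always yields the same ID on any machine.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical configuration keys; none names the institution or its network address.
config_a = {"model": "readmission-v2", "data_model": "OMOP", "synthesis": "local"}
config_b = {"synthesis": "local", "data_model": "OMOP", "model": "readmission-v2"}

print(node_id_from_config(config_a) == node_id_from_config(config_b))  # True: same content, different key order
```

The ID is stable, so packets from the same node remain linkable to one another even though the ID itself names no institution.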

The n_observations field does enable a partial inference: this node has at least N patients matching this fingerprint. This is equivalent to the sample size disclosure in any published clinical study. It is not PHI.

Why This Is Different from Federated Learning

Federated learning (FL) is the most cited comparison. The standard FL pitch is: "we don't move the data, we move the model." But this description understates what actually crosses the network.

In standard FL:

1. A global model is initialized at the coordinator
2. Each participating node receives the full model weights (or a gradient update target)
3. Each node trains locally on raw patient data
4. Each node returns gradient updates: high-dimensional vectors representing how the patient data modified every parameter in the model
5. The coordinator aggregates the gradient updates into a new global model
6. Repeat

What crosses the network in step 4 is not an outcome summary. It is a gradient vector with one entry per model parameter, often millions of floating-point values per update.
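The size gap is easy to quantify. The arithmetic below assumes dense, uncompressed float32 updates (4 bytes per parameter) and uses a few hypothetical model sizes.

```python
BYTES_PER_FLOAT32 = 4
PACKET_BYTES = 512  # outcome packet size stated in the article


def update_size_bytes(n_parameters: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Uncompressed size of one dense update: one value per model parameter."""
    return n_parameters * bytes_per_value


# Hypothetical model sizes, chosen only to illustrate the scaling.
for n_params in (1_000_000, 10_000_000, 100_000_000):
    size = update_size_bytes(n_params)
    print(f"{n_params:>11,} params -> {size / 1e6:,.0f} MB per update "
          f"({size // PACKET_BYTES:,}x a 512-byte packet)")
```

In practice, compression, quantization, or sparsification shrink these numbers. The update still scales with model size, however, and the outcome packet does not.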

Gradient updates can leak PHI. This is not a theoretical concern. The research literature is explicit:

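Zhu et al. (2019), "Deep Leakage from Gradients": demonstrated pixel-accurate reconstruction of training images, and token-level recovery of text, from shared gradients
Geiping et al. (2020), "Inverting Gradients": showed high-fidelity image reconstruction from parameter gradients, including for trained deep networks, and showed that averaging gradients over several local iterations or several images does not prevent it
Zhao et al. (2020), "iDLG": showed that ground-truth labels can be extracted analytically from shared gradients, which makes reconstruction more reliable; in clinical data the label is often the sensitive field itself, such as a diagnosis or outcome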

The clinical implication: a malicious FL coordinator, or a compromised aggregation server, can potentially reconstruct individual patient records from the gradient updates your hospital's nodes are sending. Differential privacy and secure aggregation add noise and cryptographic overhead to mitigate this — they do not eliminate the underlying structural exposure.
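The leakage is easiest to see in the simplest case: a single fully connected layer trained on one example. For y = Wx + b, the gradients satisfy ∂L/∂W_ij = (∂L/∂y_i)·x_j and ∂L/∂b_i = ∂L/∂y_i. Dividing any row of the weight gradient by the matching bias gradient therefore returns the input exactly. The sketch below is a simplified illustration of that known analytic result. It is not a reproduction of the optimization-based attacks in the papers above. The feature values, weights, and squared-error loss are hypothetical.

```python
def forward(W, b, x):
    """Linear layer y = W x + b (pure Python, no dependencies)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def gradients(W, b, x, target):
    """Gradients of squared-error loss L = 0.5 * ||y - target||^2 w.r.t. W and b."""
    y = forward(W, b, x)
    dL_dy = [y_i - t_i for y_i, t_i in zip(y, target)]
    grad_W = [[g_i * x_j for x_j in x] for g_i in dL_dy]  # dL/dW_ij = dL/dy_i * x_j
    grad_b = dL_dy                                        # dL/db_i  = dL/dy_i
    return grad_W, grad_b


def reconstruct_input(grad_W, grad_b):
    """Recover x from one example's gradients: x_j = (dL/dW_ij) / (dL/db_i)."""
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))  # any row with nonzero bias gradient
    return [g / grad_b[i] for g in grad_W[i]]


# Hypothetical single-patient feature vector (e.g. normalized lab values) and model.
x_patient = [0.82, 1.35, 0.07, 2.4]
W = [[0.1, -0.2, 0.3, 0.05], [0.0, 0.4, -0.1, 0.2]]
b = [0.01, -0.03]
target = [1.0, 0.0]

grad_W, grad_b = gradients(W, b, x_patient, target)  # what the node would send
recovered = reconstruct_input(grad_W, grad_b)         # what the server can compute
print(recovered)
```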

QIS does not send gradient updates. There is no model to invert. The network never receives the mathematical fingerprint of your patient records. It receives a 512-byte distilled outcome — the synthesis your node produced, not the material your node used to produce it.

The Architecture Proof

Consider a hospital with 847 oncology patients. The edge node processes patient records locally, running whatever clinical ML model the institution uses. It produces an outcome: "For patients with this semantic fingerprint (NSCLC, Stage IIIA, KRAS G12C variant, prior platinum failure), the 12-month OS rate with treatment X was 34.2% (95% CI: 20.6–47.8) across N=47 patients."
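To check how much precision 47 patients can support, here is a normal-approximation (Wald) 95% interval for a proportion. The sketch assumes no censoring, so the 12-month OS rate reduces to a simple proportion. A real survival analysis would use a Kaplan–Meier estimate with Greenwood's variance instead.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for a proportion estimated from n patients."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # binomial standard error
    return p_hat - z * se, p_hat + z * se


low, high = wald_interval(0.342, 47)
print(f"95% CI: {low:.1%} to {high:.1%}")  # 95% CI: 20.6% to 47.8%
```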

The edge node computes a semantic fingerprint of the problem domain. This fingerprint is a vector representation of the clinical question — not of any patient. It captures the domain structure: disease type, stage, biomarker, prior treatment class, outcome endpoint. This fingerprint functions as a routing address.

The node deposits the 512-byte outcome packet to a deterministic address defined by the semantic fingerprint. Other nodes with semantically similar patient populations query that address and receive the packet.
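The fingerprint-to-address step and the deposit/query pattern can be sketched as follows. This minimal illustration rests on assumptions the article does not specify. The fingerprint is reduced to a canonical descriptor of domain attributes and hashed with SHA-256, and an in-memory dictionary stands in for the shared address space. Exact hashing only matches identical descriptors. Matching "semantically similar" populations would need a similarity search over fingerprint vectors, which this sketch does not attempt.

```python
import hashlib
import json
from collections import defaultdict


def semantic_address(domain: dict) -> str:
    """Hypothetical routing address derived only from the clinical question.

    Inputs are domain attributes (disease, stage, biomarker, prior therapy,
    endpoint); no patient-level field is ever part of the descriptor.
    """
    canonical = json.dumps(domain, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class OutcomeStore:
    """In-memory stand-in for the network's shared address space."""

    def __init__(self) -> None:
        self._packets: dict[str, list[dict]] = defaultdict(list)

    def deposit(self, address: str, packet: dict) -> None:
        self._packets[address].append(packet)

    def query(self, address: str) -> list[dict]:
        # .get() does not create an entry for unknown addresses
        return list(self._packets.get(address, []))


domain = {
    "disease": "NSCLC",
    "stage": "IIIA",
    "biomarker": "KRAS G12C",
    "prior_treatment": "platinum failure",
    "endpoint": "12-month OS",
}
store = OutcomeStore()
address = semantic_address(domain)
store.deposit(address, {"outcome_summary": "12-month OS with treatment X: 34.2%", "n_observations": 47})
print(len(store.query(address)), "packet(s) at", address[:12])
```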

At no point does any raw patient record, any model weight, any gradient update, or any direct measurement cross the network boundary.

The edge node is a one-way distillery. It takes in raw signals. It outputs summaries. The raw signals never leave.

This is privacy by architecture. You cannot leak data you never transmit. Differential privacy is not needed because the network never receives a high-dimensional representation of your patient records. Secure aggregation is not needed because there is no central aggregation step: each node synthesizes locally from the outcome packets it pulls.
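The article does not specify the rule a node uses to synthesize the packets it pulls. As one plausible stand-in, the sketch below uses standard fixed-effect inverse-variance pooling from meta-analysis. It assumes each packet carries a numeric `estimate` field. That field is hypothetical: the schema above holds the result as text in `outcome_summary`. It also assumes a symmetric 95% `confidence_interval` from which a standard error can be backed out. The packet values are made up.

```python
import math

Z95 = 1.96


def se_from_ci(lower: float, upper: float, z: float = Z95) -> float:
    """Back out a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def pool_fixed_effect(packets: list[dict]) -> tuple[float, tuple[float, float]]:
    """Fixed-effect inverse-variance pooling, run locally by the querying node.

    Each estimate is weighted by 1 / SE^2, so packets with tighter intervals
    count for more. Returns the pooled estimate and its 95% interval.
    """
    weights = [1 / se_from_ci(*p["confidence_interval"]) ** 2 for p in packets]
    total = sum(weights)
    pooled = sum(w * p["estimate"] for w, p in zip(weights, packets)) / total
    pooled_se = math.sqrt(1 / total)
    return pooled, (pooled - Z95 * pooled_se, pooled + Z95 * pooled_se)


# Hypothetical packets pulled from one semantic address.
pulled = [
    {"estimate": 0.22, "confidence_interval": (0.15, 0.29)},
    {"estimate": 0.18, "confidence_interval": (0.08, 0.28)},
    {"estimate": 0.25, "confidence_interval": (0.20, 0.30)},
]
estimate, (low, high) = pool_fixed_effect(pulled)
print(f"pooled: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

A fixed-effect model assumes every node estimates the same underlying effect. If sites differ systematically, a random-effects model would be the more standard choice.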

The Byzantine / Data Poisoning Dimension

Oliver's framing from the Google AI feedback deserves a direct answer here: "Without a central authority, how do you prevent bad actors from poisoning the clinical data?"

The answer is a mathematical property of N(N-1)/2 networks.

In a QIS network with N nodes, each node synthesizes from a pool of outcome packets deposited by semantically similar peers. For a malicious node to materially corrupt the network's intelligence, it must:

Create outcome packets false enough to shift the synthesis of receiving nodes
Create enough such packets that they outweigh the volume of truthful observations

The second condition is the load-bearing constraint. In a network where N similar nodes are all depositing real-world outcomes, the fraction of the synthesis pool that a single bad actor can pollute decreases as N grows. For a malicious node to swing a synthesis from "treatment A reduces readmission by 22%" to "treatment A has no effect," it must fabricate a volume of plausible, internally consistent outcome data that exceeds the cumulative weight of actual clinical outcomes from N-1 real institutions.

This is not a password problem. It is a fabrication-of-reality problem. The cost of fabricating a false clinical reality consistent enough to override N(N-1)/2 authentic signal pairs grows with N. The network becomes harder to poison as it grows larger — the opposite of a centralized system, where a single compromised server poisons everyone simultaneously.
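The dilution argument can be made concrete with a toy bound. Assume, hypothetically, that each node's weight in a synthesis is capped. Without a cap, a node could simply claim an inflated n_observations. Assume also that each of the N-1 honest nodes contributes the full cap. A single malicious node then controls at most 1/N of the pooled weight. The furthest it can drag a weighted mean is therefore 1/N of the distance between its fabricated value and the honest consensus. The example distance of 0.22 mirrors the "22% reduction to no effect" swing described above. The pair count N(N-1)/2 is printed alongside for comparison.

```python
def pair_count(n_nodes: int) -> int:
    """Distinct node pairs in an N-node network: N(N-1)/2."""
    return n_nodes * (n_nodes - 1) // 2


def max_influence(n_nodes: int, weight_cap: float = 1.0) -> float:
    """Largest share of a weighted synthesis one node controls.

    Assumes every node's weight is capped at weight_cap and the N-1 honest
    nodes each contribute the full cap, giving 1/N.
    """
    honest_weight = (n_nodes - 1) * weight_cap
    return weight_cap / (weight_cap + honest_weight)


def max_shift(n_nodes: int, distance: float) -> float:
    """Worst-case move of the weighted mean when one node reports a value
    `distance` away from the honest consensus."""
    return max_influence(n_nodes) * distance


for n in (10, 100, 1000):
    print(f"N={n:>4}: pairs={pair_count(n):>7,}  max shift for a 0.22 lie={max_shift(n, 0.22):.4f}")
```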

The drift detection layer adds a further mechanism. A node that begins producing outcomes inconsistent with the synthesis of its semantic neighborhood is flagged when its delta-v, the velocity of change between its contributions and the network consensus, exceeds a threshold. Inconsistent nodes are downweighted. Consistently inconsistent nodes are muted. This is not a governance mechanism requiring a committee. It is a mathematical property of the aggregate.
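The downweight-then-mute behavior can be sketched as a per-round update. Everything concrete here is an assumption. Consensus is taken as the median of the neighborhood's contributions. Delta-v is the per-round change in a node's distance from that consensus. The threshold, downweight factor, and mute rule are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median

FLAG_THRESHOLD = 0.10  # hypothetical: max tolerated per-round growth in deviation
DOWNWEIGHT = 0.5       # hypothetical: weight multiplier applied on each flag
MUTE_AFTER = 3         # hypothetical: consecutive flags before a node is muted


@dataclass
class NodeState:
    weight: float = 1.0
    last_deviation: float = 0.0
    consecutive_flags: int = 0


def update_round(states: dict[str, NodeState], contributions: dict[str, float]) -> float:
    """Run one round of drift checks and return the consensus used."""
    consensus = median(contributions.values())
    for node, value in contributions.items():
        state = states.setdefault(node, NodeState())
        deviation = abs(value - consensus)
        delta_v = deviation - state.last_deviation  # how fast the node is moving away
        state.last_deviation = deviation
        if delta_v > FLAG_THRESHOLD:
            state.consecutive_flags += 1
            state.weight *= DOWNWEIGHT              # inconsistent: downweight
        else:
            state.consecutive_flags = 0
        if state.consecutive_flags >= MUTE_AFTER:
            state.weight = 0.0                      # consistently inconsistent: mute
    return consensus


# Hypothetical neighborhood: four stable nodes and one drifting away each round.
honest = {"h1": 0.22, "h2": 0.21, "h3": 0.23, "h4": 0.22}
states: dict[str, NodeState] = {}
for step in range(1, 5):
    update_round(states, {**honest, "drifter": 0.22 + 0.2 * step})
    print(step, {name: s.weight for name, s in states.items()})
```

Keying on velocity rather than absolute distance judges a node by how fast it is pulling away from its neighbors. Under this rule, a node that jumps once and then holds steady is flagged only once, so a deployment might also threshold the deviation itself.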

The Regulatory Landscape: What QIS Changes

HIPAA's Privacy Rule governs protected health information — individually identifiable health information created, received, maintained, or transmitted by a covered entity. The 512-byte outcome packet is not PHI under this definition because:

It does not identify individuals
It was derived from, but does not contain, individual health information
It is structurally equivalent to the summary statistics in a published clinical study

Published peer-reviewed papers are not HIPAA violations. A QIS outcome packet is the equivalent of a continuously updated, machine-readable, routable abstract. IRBs routinely approve publishing summary statistics from patient cohorts. The outcome packet is that summary, in 512 bytes, deposited to a deterministic address.

This does not mean QIS is compliance-free. Institutions implementing QIS networks should work with their legal teams. But the compliance conversation is fundamentally different: you are not debating how to de-identify data before sending it. You are explaining that what your node transmits is a research summary, not data.

GDPR Article 4(1) defines personal data as "any information relating to an identified or identifiable natural person." A 512-byte outcome packet with no individual identifiers, no direct measurements, and no traceable node ID does not meet this definition. The analysis is still institution-specific, but the architecture is designed from the ground up not to transmit personal data.

What This Enables That Federated Learning Cannot

There are clinical populations for which federated learning effectively cannot work. OHDSI's network, for example, includes sites with as few as 1–2 matching patients for rare disease phenotypes. Standard FL needs a minimum cohort size at each site to produce meaningful gradient updates. Differential privacy makes the problem worse. Its noise is calibrated to the sensitivity of the computation and the privacy budget, not to the size of the cohort's signal, so for very small cohorts the noise can swamp the contribution entirely.
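The small-cohort problem shows up directly in the simplest differential privacy mechanism. The sketch below applies the Laplace mechanism to a single released mean. It does not model DP-SGD, which is what federated training typically uses. It assumes values bounded in [0, 1], the replace-one-record notion of neighboring datasets, and a hypothetical privacy budget of ε = 1. Under those assumptions, the sensitivity of a mean over n records is 1/n and the noise scale is sensitivity/ε.

```python
import math


def laplace_scale_for_mean(n: int, epsilon: float, value_range: float = 1.0) -> float:
    """Laplace-mechanism scale b = sensitivity / epsilon for a mean of n bounded values.

    Replacing one record moves a mean of values in [0, value_range] by at most
    value_range / n, which is the sensitivity.
    """
    sensitivity = value_range / n
    return sensitivity / epsilon


def laplace_std(scale: float) -> float:
    """Standard deviation of a Laplace(0, b) distribution: sqrt(2) * b."""
    return math.sqrt(2) * scale


for n in (2, 20, 2000):
    sd = laplace_std(laplace_scale_for_mean(n, epsilon=1.0))
    print(f"cohort of {n:>4}: noise SD on a rate in [0, 1] = {sd:.4f}")
```

With two patients, the noise standard deviation is about 0.71. That is larger than the plausible spread of a proportion, which is the sense in which small-cohort contributions become statistically meaningless.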

QIS has no such constraint. An N=1 site — a single institution with a single patient matching a rare genomic variant — can deposit a 512-byte outcome packet. That packet describes one data point. When a node querying a semantic address receives this packet alongside contributions from 47 other small sites, it synthesizes across all of them locally. The rare disease network works at N=1 per site.
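Synthesizing across many single-patient sites does not require any site to train a model. One way to do it is to pool counts across sites and put a Wilson score interval on the pooled proportion. This assumes each packet also carries an integer event count, a hypothetical `events` field not in the schema above. Unlike the Wald interval, the Wilson interval stays well-behaved at small n. The 48 sites and 20 responders below are hypothetical.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; usable even for very small n."""
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def pool_site_counts(packets: list[dict]) -> tuple[float, tuple[float, float]]:
    """Pool per-site (events, n_observations) counts into one proportion.

    A one-patient site contributes n_observations=1 and events of 0 or 1;
    no site needs a minimum cohort size because pooling happens on counts.
    """
    events = sum(p["events"] for p in packets)
    n = sum(p["n_observations"] for p in packets)
    return events / n, wilson_interval(events, n)


# Hypothetical: 48 sites, one matching patient each, 20 of whom responded.
sites = [{"events": 1 if i < 20 else 0, "n_observations": 1} for i in range(48)]
rate, (low, high) = pool_site_counts(sites)
print(f"pooled rate {rate:.3f}, 95% CI {low:.3f} to {high:.3f}")
```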

This architectural property has humanitarian implications that federated learning — for all its value — cannot reach. Rare disease research, smallholder agricultural networks, rare language educational systems, disaster response in low-infrastructure settings — every context where sites are too small to participate in model-based aggregation is a context where QIS outcome routing still functions.

The OHDSI Relevance

OHDSI operates a federated network of observational health databases standardized to the OMOP Common Data Model. Every distributed query in the OHDSI network today produces summary statistics at each node — incidence rates, cohort characterizations, effect estimates — which are then aggregated at the coordinating center.

That process is functionally equivalent to what QIS does, with three differences:

OHDSI queries are batch, not continuous. QIS routes in real time.
OHDSI aggregation is centralized. QIS synthesis is local.
OHDSI produces network-level estimates. QIS produces node-level synthesis from peer insights.

QIS is not a replacement for OHDSI. It is a continuous intelligence layer that can run alongside an OMOP-standardized node. The node continues to answer OHDSI queries. Separately, its edge intelligence layer distills local learnings into outcome packets and routes them to semantic addresses. The two layers are orthogonal.

The privacy argument is the same: OHDSI nodes already produce summary statistics. QIS outcome packets are that summary, made continuous, routable, and self-updating without centralizing the data.

Conclusion: The Right Question

The security objection, how to prevent bad actors from poisoning the data without a central authority, is a reasonable question to ask about a data network. It is the wrong question to ask about QIS, because QIS is not a data network. It is an outcome routing network.

The distinction matters technically: outcome packets cannot be inverted back to patient records because the distillation that produces them is many-to-one. Many different sets of patient records yield the same summary, so the summary alone does not determine the records. The distillation destroys the raw data by construction.
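The many-to-one property is easy to demonstrate. In the sketch below, the distillation (a count plus a mean rounded to two decimals) and the HbA1c values are hypothetical. The point is that two different cohorts produce identical summaries, so nothing downstream of the summary can tell which cohort it came from.

```python
from statistics import mean


def distill(cohort: list[float]) -> dict:
    """Hypothetical distillation: keep only the cohort size and a rounded mean."""
    return {"n_observations": len(cohort), "mean": round(mean(cohort), 2)}


# Two different hypothetical cohorts of HbA1c values (%).
cohort_a = [6.1, 7.4, 8.0, 6.9]
cohort_b = [7.0, 7.0, 7.2, 7.2]

print(distill(cohort_a))
print(distill(cohort_a) == distill(cohort_b))  # True: different patients, same summary
```

The collapse depends on cohort size. For a one-patient cohort, the mean is that patient's own value.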

The distinction matters clinically: HIPAA compliance for QIS outcome packets is categorically simpler than HIPAA compliance for gradient-sharing federated learning — because the packet structurally cannot contain what the regulation protects.

The distinction matters strategically: an architecture where privacy is not a feature to configure, but a property of how the network functions at the packet level, is an architecture that can reach populations and institutions that data-sharing approaches cannot reach by definition.

QIS was discovered by Christopher Thomas Trevethan on June 16, 2025. The 39 provisional patents cover the complete architecture — the loop that makes quadratic intelligence scaling possible without proportional compute cost, and without centralizing the data that generates the intelligence. The privacy-by-architecture property is not an add-on. It is a structural consequence of how the loop works: raw signal stays at the edge, outcomes route across the network, synthesis happens locally.

The data never moves. Only what you learned does.

QIS — Quadratic Intelligence Swarm — was discovered by Christopher Thomas Trevethan. 39 provisional patents filed. Learn more at qisprotocol.com.

DEV Community