Rory | QIS PROTOCOL

Posted on Apr 10

The Mathematical Alternative to Federated Learning for Rare Disease Signal Amplification

#healthdata #raredisease #federatedlearning #distributedsystems

For rare disease researchers, distributed health data architects, and clinical informatics teams working with small-cohort, multi-site disease registries.

The Structural Exclusion

Federated learning has become the default architecture for learning across distributed health data without centralizing it. The approach is sound for common diseases: train a model locally at each site, share gradient updates (not raw data), aggregate gradients centrally, distribute the improved global model, repeat.

The mechanism fails for rare diseases. Not because the math is wrong — the math is correct. Because the math has a precondition that rare disease data cannot satisfy.

Federated learning requires each participating site to compute stable gradient updates from its local data. Stable gradients require sufficient local sample size. The exact threshold depends on model complexity, but practical implementations rarely produce usable gradients below 50-100 local samples.
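The sample-size precondition is a variance argument. A site's update is an average of per-sample gradients, so its noise shrinks roughly as 1/√n. The toy simulation below illustrates this under simple illustrative assumptions: a one-parameter least-squares model with Gaussian data, with the gradient evaluated at w = 0. It is not taken from any federated learning framework, and it says nothing about where the usable threshold lies for a real model.

```python
import random
import statistics


def site_gradient(n: int, w: float, rng: random.Random, true_w: float = 2.0) -> float:
    """Average gradient of the squared loss (w*x - y)^2 over n local samples."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = true_w * x + rng.gauss(0.0, 1.0)  # noisy local observation
        total += 2.0 * (w * x - y) * x         # d/dw of (w*x - y)^2
    return total / n


def gradient_spread(n: int, trials: int = 2000, w: float = 0.0, seed: int = 0) -> float:
    """Standard deviation of a site's averaged gradient across repeated cohorts of size n."""
    rng = random.Random(seed)
    grads = [site_gradient(n, w, rng) for _ in range(trials)]
    return statistics.stdev(grads)


if __name__ == "__main__":
    for n in (2, 5, 15, 100):
        print(f"n={n:>3}  gradient std = {gradient_spread(n):.3f}")
```

Because the spread scales as 1/√n, a 2-patient site's update comes out roughly √50 ≈ 7 times noisier than a 100-patient site's at the same model state.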

Now consider the rare disease landscape:

Disease	Prevalence or known cases	Typical cohort per site
Pancreatic neuroendocrine tumors	~1 per 100,000	3-15 patients
Erdheim-Chester disease	~600 known cases worldwide	1-3 patients
Unicentric Castleman disease (UCD)	~5,000 US cases	2-10 patients
Fibrodysplasia ossificans progressiva	~900 known cases worldwide	1-2 patients
Niemann-Pick type C	~1 per 120,000	1-5 patients

A hospital managing 3 Erdheim-Chester patients cannot compute a stable gradient. A registry with 5 Niemann-Pick type C patients does not produce meaningful gradient updates. Under federated learning, these sites are excluded from the learning network — not by policy, but by architecture.

The sites with the rarest patients — the sites whose data is most informationally valuable precisely because no one else has it — are the sites federated learning cannot include.

Why This Is an Architecture Problem, Not a Data Problem

The standard response is "collect more data." Pool the rare disease registries. Establish centralized repositories. Run larger studies.

This response misunderstands the constraint. The data exists. Across the global network of rare disease centers, there are enough patients with Erdheim-Chester, enough patients with Niemann-Pick C, enough patients with FOP to generate meaningful treatment intelligence. The patients are just distributed across dozens of sites, each with single-digit cohorts.

The problem is not insufficient data. The problem is that the learning architecture requires each individual site to have sufficient data — and for rare diseases, no individual site does.

What would solve this: an architecture where every site contributes its validated treatment outcome, regardless of cohort size; where a center managing 2 patients participates on equal footing with a center managing 200; and where the synthesis happens without any site exposing raw patient data and without a central aggregator that all sites must trust.

This architecture exists.

QIS Outcome Routing for Rare Disease Networks

Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol on June 16, 2025. The protocol routes validated treatment outcomes between distributed sites using semantic addressing — and it imposes no minimum cohort size.

Here is how it applies to rare disease research:

The Packet

After a treatment episode completes at any site — regardless of whether the site has 2 patients or 200 — the validated outcome is distilled into a compact packet (~512 bytes). The packet contains:

Condition identifier: Orphanet/OMIM code for the rare disease
Treatment identifier: WHO Drug Dictionary or RxNorm code
Outcome delta: The measured change (biomarker improvement, functional score change, survival interval)
Confidence indicator: Cohort size and observation period (not a confidence interval: with N=2, a classical confidence interval is too wide to be informative, but the observation itself is still valid)
Population descriptor: Age range, genetic variant if applicable, geographic tag (coarse)

No patient identifiers. No raw clinical records. No genotype sequences. The packet is a distilled summary of what was learned from a validated treatment episode.
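The packet contents above can be sketched as a small immutable record. This is a minimal sketch assuming JSON serialization and hypothetical field names; the article lists what a packet carries, not a wire schema. The codes in the example are placeholders, not real Orphanet or RxNorm identifiers.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

MAX_PACKET_BYTES = 512  # the article's approximate packet budget


@dataclass(frozen=True)
class OutcomePacket:
    """Distilled outcome summary. Field names are hypothetical, not a published schema.

    Holds no patient identifiers, raw records, or sequences: only codes,
    a measured change, and coarse population descriptors.
    """
    condition_code: str     # Orphanet/OMIM code for the rare disease
    treatment_code: str     # WHO Drug Dictionary or RxNorm code
    outcome_type: str       # e.g. "biomarker", "functional_score", "survival_days"
    outcome_delta: float    # measured change for this site's cohort
    cohort_size: int        # N behind this observation (1 or 2 is allowed)
    observation_days: int   # length of the observation period
    age_range: str          # coarse age band, e.g. "40-60"
    variant: Optional[str]  # genetic variant label, if applicable
    region: str             # coarse geographic tag, e.g. "EU"

    def encode(self) -> bytes:
        """Serialize to compact, key-sorted JSON and enforce the size budget."""
        raw = json.dumps(asdict(self), separators=(",", ":"), sort_keys=True).encode("utf-8")
        if len(raw) > MAX_PACKET_BYTES:
            raise ValueError(f"packet is {len(raw)} bytes; budget is {MAX_PACKET_BYTES}")
        return raw


# Example: a 2-patient site reports one observation. Codes are placeholders.
packet = OutcomePacket(
    condition_code="ORPHA:0000",
    treatment_code="RXNORM:0000",
    outcome_type="functional_score",
    outcome_delta=4.5,
    cohort_size=2,
    observation_days=180,
    age_range="40-60",
    variant=None,
    region="EU",
)
print(len(packet.encode()), "bytes")
```

Sorting keys keeps the encoding deterministic, so two sites serializing identical content produce identical bytes.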

The Address

The packet is fingerprinted using its clinical content: disease code + drug code + outcome type + population tag → deterministic semantic address. Every site working on the same disease with the same treatment produces packets at the same address.

A center in Boston managing 3 Erdheim-Chester patients treated with vemurafenib produces a packet at the same semantic address as a center in Paris managing 2 patients with the same treatment. Neither site knows the other exists. The address is determined by the clinical question, not by the site identity.
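The address derivation can be sketched as a hash over normalized clinical fields. This assumes SHA-256 and a simple join of the four fields; the article says the packet is "fingerprinted" but does not name a hash or canonical form. The population tags, including the BRAF V600E label, are illustrative, and the codes are placeholders.

```python
import hashlib


def semantic_address(condition_code: str, treatment_code: str,
                     outcome_type: str, population_tag: str) -> str:
    """Deterministic address built from clinical content only (hypothetical scheme).

    Site identity is deliberately not an input, so any site asking the same
    clinical question computes the same address without knowing the others.
    """
    # Normalize case and surrounding whitespace so formatting differences
    # between sites do not split one clinical question across two addresses.
    parts = [p.strip().upper() for p in
             (condition_code, treatment_code, outcome_type, population_tag)]
    canonical = "|".join(parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Boston (3 patients) and Paris (2 patients), Erdheim-Chester on vemurafenib.
# "ORPHA:0000" and "RXNORM:0000" are placeholders for the real codes.
boston = semantic_address("ORPHA:0000", "RXNORM:0000", "response", "adult;BRAF V600E")
paris = semantic_address("orpha:0000 ", "RXNORM:0000", "Response", "adult;braf v600e")
other_variant = semantic_address("ORPHA:0000", "RXNORM:0000", "response", "adult;BRAF wild-type")

print(boston == paris)          # True: same clinical question, same address
print(boston == other_variant)  # False: different variant, different address
```

Because the population tag carries the variant, patients with a different variant land at a different address. This is the same mechanism that later separates genetic subgroups.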

The Routing

Packets are deposited at their semantic address. Any site managing patients with the same rare disease can query that address and pull back every validated outcome from every site worldwide that has treated the same condition.

The routing cost depends on the transport mechanism: O(log N) or better, where N is the number of participating sites. For rare disease networks, the transport can be as simple as a shared database indexed by semantic address, in which case a lookup is O(1).
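The "shared database indexed by semantic address" option can be sketched as an in-memory map from address to deposited packets. This is a minimal sketch: packets are plain dicts, the address string stands in for a real fingerprint, and persistence and access control are out of scope. A Python dict gives expected (average-case) O(1) deposit and lookup regardless of how many sites participate.

```python
from collections import defaultdict
from typing import Dict, List


class AddressIndexedStore:
    """Shared store keyed by semantic address (hypothetical class, illustration only)."""

    def __init__(self) -> None:
        self._by_address: Dict[str, List[dict]] = defaultdict(list)

    def deposit(self, address: str, packet: dict) -> None:
        """Append a packet at its address; the depositing site is not recorded."""
        self._by_address[address].append(packet)

    def query(self, address: str) -> List[dict]:
        """Return every packet deposited at the address (a copy, not the live list)."""
        # .get() on a defaultdict does not create an empty entry for unknown keys.
        return list(self._by_address.get(address, []))


store = AddressIndexedStore()
addr = "example-address"  # stands in for a semantic-address hash
store.deposit(addr, {"cohort_size": 3, "outcome_delta": 1.2})  # Boston
store.deposit(addr, {"cohort_size": 2, "outcome_delta": 0.9})  # Paris
print(len(store.query(addr)))         # 2
print(store.query("unused-address"))  # []
```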

The Synthesis

Each site synthesizes incoming packets locally. With rare disease data, the synthesis model differs from large-cohort aggregation:

No weighted-mean averaging: With N=2 and N=3 cohorts, a weighted mean is unstable. Instead, the synthesis is a case series compilation — each site's outcome is presented as an independent observation.
Signal accumulation: 3 sites with 2 patients each contribute 6 independent treatment observations. This is not a randomized controlled trial. It is a distributed case series that no single institution could produce alone.
Outcome trajectory matching: Sites with similar patient profiles (same genetic variant, same disease stage) produce packets at the same address. The synthesis groups outcomes by clinical similarity, not by institutional origin.
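The case-series synthesis described above can be sketched as grouping without averaging. This assumes each incoming packet carries a cohort size, an outcome delta, and a population profile string; the profile labels such as "variant-A;stage-1" are hypothetical. The function keeps every packet as its own entry and only counts what stands behind each group.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Observation:
    cohort_size: int      # patients behind this packet at the reporting site
    outcome_delta: float  # change reported for that cohort
    profile: str          # population descriptor, e.g. variant + disease stage


def compile_case_series(observations: List[Observation]) -> Dict[str, dict]:
    """Group observations by patient profile, keeping every one visible.

    Deliberately computes no pooled or weighted mean: each packet stays an
    independent entry, and the summary only counts packets and patients.
    """
    grouped: Dict[str, List[Observation]] = {}
    for obs in observations:
        grouped.setdefault(obs.profile, []).append(obs)
    return {
        profile: {
            "packets": len(rows),
            "patients": sum(o.cohort_size for o in rows),
            "entries": [(o.cohort_size, o.outcome_delta) for o in rows],
        }
        for profile, rows in grouped.items()
    }


# Three sites with 2 patients each, plus one site with a different profile.
incoming = [
    Observation(2, 1.1, "variant-A;stage-1"),
    Observation(2, 0.7, "variant-A;stage-1"),
    Observation(2, 1.4, "variant-A;stage-1"),
    Observation(1, -0.3, "variant-B;stage-2"),
]
series = compile_case_series(incoming)
print(series["variant-A;stage-1"]["patients"])  # 6
```

The three 2-patient sites yield 6 treatment observations under one profile, matching the signal-accumulation example, while the differently profiled site stays in its own group.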

The Loop

The synthesis at each site becomes a new data point that can be distilled and deposited. A site that synthesizes outcomes from 5 peer sites now has a richer model of treatment response — and that enriched model informs the next treatment decision at that site, which generates a new validated outcome, which is distilled and deposited. Intelligence compounds.

With N sites in the network, the number of active synthesis paths is N(N-1)/2. For a rare disease network with 30 participating centers:

Synthesis paths = 30 × 29 / 2 = 435

435 pairwise learning opportunities — from a disease where no single center has more than 15 patients. This is the quadratic scaling property: I(N) = Θ(N²). The intelligence in the network grows as the square of the number of participating sites, not linearly.
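The path count is plain combinatorics: each unordered pair of sites is one synthesis path. The sketch below computes N(N-1)/2 in closed form and cross-checks it against explicit enumeration with `itertools.combinations`. It counts pairs only; it does not measure intelligence directly.

```python
from itertools import combinations


def synthesis_paths(n_sites: int) -> int:
    """Number of distinct site pairs, N(N-1)/2."""
    return n_sites * (n_sites - 1) // 2


# Cross-check the closed form against explicit enumeration of pairs.
for n in (2, 10, 30):
    assert synthesis_paths(n) == len(list(combinations(range(n), 2)))

print(synthesis_paths(30))  # 435
print(synthesis_paths(60))  # 1770: doubling the sites roughly quadruples the paths
```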

Federated Learning vs. QIS for Rare Disease: The Structural Comparison

Dimension	Federated Learning	QIS Outcome Routing
Minimum cohort per site	~50-100 for stable gradients	No minimum — any validated outcome routes
N=2 site participation	Excluded (gradient noise > signal)	Included — 2 observations are 2 valid packets
Central aggregator	Required (gradient server)	None — peer-to-peer synthesis
What flows between sites	Model gradient updates (~MB)	Outcome packets (~512 bytes)
Synthesis model	Global model convergence	Distributed case series compilation
Privacy model	Gradient privacy (vulnerable to inversion attacks)	No raw data in packets by architecture
Rare variant subgroups	Further fragmented by subgroup	Semantic addressing routes by variant
Compute per site	Full model training	Packet distillation only
Network intelligence scaling	Linear (each site contributes gradients once per round)	Quadratic: N(N-1)/2 active synthesis paths

The fundamental difference: federated learning asks each site to be a training node. QIS asks each site to be a reporting node. Training requires sufficient local data. Reporting requires only a validated outcome.

The Rare Disease Use Cases

Pharmacogenomic Signal Routing

A rare disease patient with a specific genetic variant responds unusually well to an off-label treatment. At a single center, this is an anecdote. Across a QIS network of 30 rare disease centers, the semantic address for that variant + drug + outcome collects similar observations from sites worldwide. When 4 sites report similar responses across 8 patients with the same variant, the anecdote becomes a signal.

No central database collected the cases. No researcher designed a study. The protocol routed the observations to each other based on clinical similarity.
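The accumulation at a single variant + drug + outcome address can be sketched as a tally that flags the address for review once enough independent reports arrive. This is a hypothetical helper. It assumes each site deposits one packet for its cohort, so the packet count approximates the number of reporting sites. The `responded` flag is an assumed derived field (for example, an outcome delta past a site-chosen cutoff). The default thresholds simply mirror the 4-site / 8-patient example and are not a validated statistical criterion.

```python
from typing import Dict, List


def variant_signal(packets: List[Dict], min_packets: int = 4, min_patients: int = 8) -> Dict:
    """Tally similar responses deposited at one variant + drug + outcome address.

    Thresholds are illustrative only (hypothetical defaults from the text's example).
    """
    similar = [p for p in packets if p["responded"]]
    patients = sum(p["cohort_size"] for p in similar)
    return {
        "reporting_sites": len(similar),  # one packet per site is assumed
        "patients": patients,
        "flag_for_review": len(similar) >= min_packets and patients >= min_patients,
    }


# Four sites, two patients each, all reporting the unusual response.
at_address = [{"cohort_size": 2, "responded": True} for _ in range(4)]
print(variant_signal(at_address))
print(variant_signal(at_address[:3])["flag_for_review"])  # False: only 3 sites so far
```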

Natural History Compilation

For ultra-rare diseases (prevalence < 1 per million), natural history data is the scarcest and most valuable resource. Each site's observation of disease progression — untreated trajectory, milestone timing, organ involvement sequence — is a data point that every other site needs.

QIS routes natural history observations continuously. A site in Tokyo deposits a disease progression packet for a 12-year-old FOP patient. A site in Philadelphia managing the only other pediatric FOP patient in their catchment area queries the semantic address and receives progression data from every site worldwide managing similar patients. The compilation builds itself through the protocol — no registry coordination required.

Treatment Response Across Genetic Subgroups

Many rare diseases have multiple genetic subtypes with different treatment responses. A drug that works for ATTR-CM patients with the V122I variant may fail for patients with the T60A variant. A federated model trained across the whole disease population averages across variants, and training separate per-variant models fragments already small site cohorts even further. QIS semantic addressing includes the genetic variant in the fingerprint, routing outcomes to the correct subgroup automatically.

What This Means for Rare Disease Infrastructure

If you are building rare disease data infrastructure — a patient registry, a multi-center research network, a natural history study, a treatment outcome database — the architectural question is:

Does your system include or exclude sites with small cohorts?

If it excludes them (because the learning mechanism requires large local samples), you lose the most informationally rare observations in your network. If it includes them (because the mechanism routes validated outcomes regardless of cohort size), you activate the pairwise synthesis paths between every site.

QIS outcome routing is the architecture that includes every site. The protocol imposes no minimum cohort size. The routing cost is O(log N) or better. The intelligence scales as N(N-1)/2. No raw patient data leaves any site; only distilled outcome packets do.

For rare disease research, where the most valuable data sits at the smallest sites, this is the difference between a network that learns and a network that merely stores.

The Discovery

Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm protocol on June 16, 2025. The breakthrough is not any single component but the complete architecture: the loop that enables real-time quadratic intelligence scaling without compute explosion. 39 provisional patents have been filed. Humanitarian licensing ensures the protocol is free forever for nonprofits, research institutions, and educational use.

For rare disease researchers: the QIS protocol specification, federated learning comparison, and the 20 most common technical questions are published.

This is part of an ongoing series on QIS — the Quadratic Intelligence Swarm protocol — documenting every domain where distributed outcome routing closes a synthesis gap that existing infrastructure cannot close.

DEV Community