For rare disease researchers, distributed health data architects, and clinical informatics teams working with small-cohort, multi-site disease registries.
The Structural Exclusion
Federated learning has become the default architecture for learning across distributed health data without centralizing it. The approach is sound for common diseases: train a model locally at each site, share gradient updates (not raw data), aggregate gradients centrally, distribute the improved global model, repeat.
The mechanism fails for rare diseases. Not because the math is wrong — the math is correct. Because the math has a precondition that rare disease data cannot satisfy.
Federated learning requires each participating site to compute stable gradient updates from its local data. Stable gradients require sufficient local sample size, because the noise in a gradient estimate shrinks only as roughly 1/√n with n local samples. The exact threshold depends on model complexity, but as a rule of thumb practical implementations rarely produce usable gradients below 50-100 local samples.
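To make the sample-size precondition concrete, here is a minimal simulation sketch. It assumes a toy one-parameter regression (true slope 1.5, unit Gaussian noise), which is an illustration only and not a clinical model; the function names are hypothetical. It measures how much the local gradient varies between simulated sites of the same size.

```python
import random
import statistics

def gradient_at(w, xs, ys):
    """Gradient of mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def gradient_spread(n_samples, trials=500, seed=0):
    """Std. dev. of the local gradient across simulated sites of one size."""
    rng = random.Random(seed)
    grads = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_samples)]
        # Assumed data-generating process: slope 1.5 plus unit noise.
        ys = [1.5 * x + rng.gauss(0, 1) for x in xs]
        grads.append(gradient_at(0.0, xs, ys))
    return statistics.stdev(grads)

if __name__ == "__main__":
    for n in (3, 10, 100):
        print(f"n={n:>3}  gradient spread={gradient_spread(n):.3f}")
```

The spread for a 3-patient site should come out several times larger than for a 100-patient site, which is the noise the aggregation step would have to absorb. The sketch does not derive the 50-100 threshold; that figure depends on the model.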
Now consider the rare disease landscape:
| Disease | Approximate Prevalence or Known Cases | Typical Site Cohort |
|---|---|---|
| Pancreatic neuroendocrine tumors | ~1 per 100,000 | 3-15 patients |
| Erdheim-Chester disease | ~600 known cases worldwide | 1-3 patients per center |
| Castleman disease (UCD) | ~5,000 US cases | 2-10 patients per site |
| Fibrodysplasia ossificans progressiva | ~900 known cases globally | 1-2 patients per center |
| Niemann-Pick type C | ~1 per 120,000 | 1-5 patients per site |
A hospital managing 3 Erdheim-Chester patients cannot compute a stable gradient. A registry with 5 Niemann-Pick type C patients does not produce meaningful gradient updates. Under federated learning, these sites are excluded from the learning network — not by policy, but by architecture.
The sites with the rarest patients — the sites whose data is most informationally valuable precisely because no one else has it — are the sites federated learning cannot include.
Why This Is an Architecture Problem, Not a Data Problem
The standard response is "collect more data." Pool the rare disease registries. Establish centralized repositories. Run larger studies.
This response misunderstands the constraint. The data exists. Across the global network of rare disease centers, there are enough patients with Erdheim-Chester, enough patients with Niemann-Pick C, enough patients with FOP to generate meaningful treatment intelligence. The patients are just distributed across dozens of sites, each with single-digit cohorts.
The problem is not insufficient data. The problem is that the learning architecture requires each individual site to have sufficient data — and for rare diseases, no individual site does.
What would solve this: an architecture where every site contributes its validated treatment outcome, regardless of cohort size. Where a center managing 2 patients contributes equally to the network intelligence alongside a center managing 200. Where the synthesis happens without any site exposing raw patient data, and without a central aggregator that all sites must trust.
This architecture exists.
QIS Outcome Routing for Rare Disease Networks
Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol on June 16, 2025. The protocol routes validated treatment outcomes between distributed sites using semantic addressing — and it imposes no minimum cohort size.
Here is how it applies to rare disease research:
The Packet
After a treatment episode completes at any site — regardless of whether the site has 2 patients or 200 — the validated outcome is distilled into a compact packet (~512 bytes). The packet contains:
- Condition identifier: Orphanet/OMIM code for the rare disease
- Treatment identifier: WHO Drug Dictionary or RxNorm code
- Outcome delta: The measured change (biomarker improvement, functional score change, survival interval)
- Confidence indicator: Cohort size and observation period (not a confidence interval: with N=2, a classical CI is too wide to be informative, but the observation is still valid)
- Population descriptor: Age range, genetic variant if applicable, geographic tag (coarse)
No patient identifiers. No raw clinical records. No genotype sequences. The packet is a distilled summary of what was learned from a validated treatment episode.
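The packet fields listed above could be sketched as a small data structure. This is a hypothetical layout, not the QIS wire format: the class name, field names, JSON encoding, and the placeholder codes are all assumptions made for illustration. Only the field categories and the roughly 512-byte budget come from the description.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class OutcomePacket:
    condition_code: str             # Orphanet/OMIM code for the disease
    treatment_code: str             # WHO Drug Dictionary or RxNorm code
    outcome_type: str               # e.g. "biomarker", "functional_score"
    outcome_delta: float            # measured change
    cohort_size: int                # confidence indicator, part 1
    observation_days: int           # confidence indicator, part 2
    age_range: str                  # coarse population descriptor
    genetic_variant: Optional[str]  # only if applicable
    region: str                     # coarse geographic tag

    def to_bytes(self) -> bytes:
        """Serialize to compact, key-sorted JSON (an assumed encoding)."""
        text = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return text.encode("utf-8")

# Placeholder codes, not real Orphanet or RxNorm identifiers.
packet = OutcomePacket(
    condition_code="ORPHA:000000",
    treatment_code="RXNORM:0000000",
    outcome_type="functional_score",
    outcome_delta=4.0,
    cohort_size=2,
    observation_days=180,
    age_range="40-60",
    genetic_variant="BRAF-V600E",
    region="EU-West",
)
payload = packet.to_bytes()
print(len(payload), "bytes")
```

Nothing in the structure has a slot for a patient identifier or a raw record, so the exclusion is enforced by the type rather than by policy.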
The Address
The packet is fingerprinted using its clinical content: disease code + drug code + outcome type + population tag → deterministic semantic address. Every site working on the same disease with the same treatment produces packets at the same address.
A center in Boston managing 3 Erdheim-Chester patients treated with vemurafenib produces a packet at the same semantic address as a center in Paris managing 2 patients with the same treatment. Neither site knows the other exists. The address is determined by the clinical question, not by the site identity.
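One way to get a deterministic address from clinical content is to hash a canonical form of the four fields. The sketch below assumes SHA-256 and a simple lowercase-and-strip normalization; the function name, the separator, and the placeholder codes are illustrative assumptions, not the protocol's actual fingerprinting scheme.

```python
import hashlib

def semantic_address(condition_code, treatment_code, outcome_type, population_tag):
    """Hash the normalized clinical fields into a fixed-length hex address."""
    parts = [condition_code, treatment_code, outcome_type, population_tag]
    # The unit separator avoids ambiguity if a field itself contains "|" or "+".
    canonical = "\x1f".join(p.strip().lower() for p in parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two sites that have never communicated arrive at the same address.
boston = semantic_address("ORPHA:000000", "RXNORM:0000000",
                          "functional_score", "adult/BRAF-V600E")
paris = semantic_address(" orpha:000000", "RxNorm:0000000",
                         "functional_score", "adult/braf-v600e")

# A different variant in the population tag lands at a different address.
v122i = semantic_address("ORPHA:000000", "RXNORM:0000000",
                         "functional_score", "adult/V122I")
t60a = semantic_address("ORPHA:000000", "RXNORM:0000000",
                        "functional_score", "adult/T60A")

print(boston == paris, v122i == t60a)
```

The address depends only on the clinical question, so no site identity enters the hash. In practice the normalization rules would need to be agreed across sites, since any difference in coding produces a different address.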
The Routing
Packets are deposited at their semantic address. Any site managing patients with the same rare disease can query that address and pull back every validated outcome from every site worldwide that has treated the same condition.
The routing cost is O(log N) or better, where N is the number of participating sites, depending on the transport mechanism. For rare disease networks, the transport can be as simple as a shared database indexed by semantic address. With a hash index on the address, a lookup in that configuration is O(1) on average.
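The simplest transport mentioned here, a store indexed by semantic address, can be sketched with an in-memory dictionary. The class and method names are hypothetical, and a real deployment would use a database or distributed index rather than a Python dict.

```python
from collections import defaultdict

class AddressStore:
    """Toy address-indexed store: deposit and query by semantic address."""

    def __init__(self):
        self._packets = defaultdict(list)

    def deposit(self, address, packet):
        """Append a packet at its address (average O(1) dict access)."""
        self._packets[address].append(packet)

    def query(self, address):
        """Return a copy of every packet at the address, or an empty list."""
        return list(self._packets.get(address, []))

store = AddressStore()
address = "example-address"  # stands in for a hashed semantic address
store.deposit(address, {"outcome_delta": 4.0, "cohort_size": 3})
store.deposit(address, {"outcome_delta": 2.5, "cohort_size": 2})

results = store.query(address)
print(len(results), "packets at address")
```

A query cost that does not grow with the number of sites is what makes the O(1) figure plausible for this configuration; the O(log N) bound would apply to transports such as tree- or DHT-based lookups.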
The Synthesis
Each site synthesizes incoming packets locally. With rare disease data, the synthesis model differs from large-cohort aggregation:
- No weighted-mean averaging: With N=2 and N=3 cohorts, a weighted mean is unstable. Instead, the synthesis is a case series compilation — each site's outcome is presented as an independent observation.
- Signal accumulation: 3 sites with 2 patients each contribute 6 independent treatment observations. This is not a randomized controlled trial. It is a distributed case series that no single institution could produce alone.
- Outcome trajectory matching: Sites with similar patient profiles (same genetic variant, same disease stage) produce packets at the same address. The synthesis groups outcomes by clinical similarity, not by institutional origin.
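The three synthesis rules above can be sketched as a grouping step that never averages. This is a minimal illustration under assumed field names (`variant`, `stage`, `outcome_delta`, `cohort_size`); the actual synthesis logic is not specified in this article.

```python
from collections import defaultdict

def compile_case_series(packets):
    """Group packets by clinical profile; keep each outcome as its own row."""
    series = defaultdict(list)
    for p in packets:
        key = (p["variant"], p["stage"])  # clinical similarity, not site identity
        series[key].append(
            {"outcome_delta": p["outcome_delta"], "cohort_size": p["cohort_size"]}
        )
    return dict(series)

def observation_count(rows):
    """Total patient observations represented by a group of rows."""
    return sum(r["cohort_size"] for r in rows)

# Three sites with 2 patients each, plus one packet from a different subgroup.
incoming = [
    {"variant": "V122I", "stage": "early", "outcome_delta": 3.0, "cohort_size": 2},
    {"variant": "V122I", "stage": "early", "outcome_delta": 1.5, "cohort_size": 2},
    {"variant": "V122I", "stage": "early", "outcome_delta": 2.0, "cohort_size": 2},
    {"variant": "T60A", "stage": "early", "outcome_delta": -0.5, "cohort_size": 1},
]

series = compile_case_series(incoming)
rows = series[("V122I", "early")]
print(len(rows), "reports,", observation_count(rows), "observations")
```

Each report stays visible as an independent row, so a reader of the compiled series can see the spread of outcomes instead of a single pooled estimate.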
The Loop
The synthesis at each site becomes a new data point that can be distilled and deposited. A site that synthesizes outcomes from 5 peer sites now has a richer model of treatment response — and that enriched model informs the next treatment decision at that site, which generates a new validated outcome, which is distilled and deposited. Intelligence compounds.
With N sites in the network, the number of active synthesis paths is N(N-1)/2. For a rare disease network with 30 participating centers:
Synthesis paths = 30 × 29 / 2 = 435
That is 435 pairwise learning opportunities, from a disease where no single center has more than 15 patients. This is the quadratic scaling property: I(N) = N(N-1)/2 = Θ(N²), where I(N) counts the synthesis paths. By that measure, the intelligence in the network grows as the square of the number of participating sites, not linearly.
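The path count is easy to check directly. The short sketch below computes N(N-1)/2 and cross-checks it by enumerating site pairs; the function name is illustrative.

```python
from itertools import combinations

def synthesis_paths(n_sites):
    """Number of unordered site pairs: N(N-1)/2."""
    return n_sites * (n_sites - 1) // 2

# Cross-check the closed form against explicit enumeration of pairs.
pairs = list(combinations(range(30), 2))
assert len(pairs) == synthesis_paths(30)

for n in (10, 30, 60):
    print(n, synthesis_paths(n))
# 10 45
# 30 435
# 60 1770
```

Doubling the network from 30 to 60 centers roughly quadruples the number of pairs (435 to 1,770), which is what quadratic growth means in practice.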
Federated Learning vs. QIS for Rare Disease: The Structural Comparison
| Dimension | Federated Learning | QIS Outcome Routing |
|---|---|---|
| Minimum cohort per site | ~50-100 for stable gradients | No minimum — any validated outcome routes |
| N=2 site participation | Excluded (gradient noise > signal) | Included — 2 observations are 2 valid packets |
| Central aggregator | Required (gradient server) | None — peer-to-peer synthesis |
| What flows between sites | Model gradient updates (~MB) | Outcome packets (~512 bytes) |
| Synthesis model | Global model convergence | Distributed case series compilation |
| Privacy model | Gradient privacy (vulnerable to inversion attacks) | No raw data in packets by architecture |
| Rare variant subgroups | Further fragmented by subgroup | Semantic addressing routes by variant |
| Compute per site | Full model training | Packet distillation only |
| Network intelligence scaling | Linear (each site contributes gradients once per round) | Quadratic: N(N-1)/2 active synthesis paths |
The fundamental difference: federated learning asks each site to be a training node. QIS asks each site to be a reporting node. Training requires sufficient local data. Reporting requires only a validated outcome.
The Rare Disease Use Cases
Pharmacogenomic Signal Routing
A rare disease patient with a specific genetic variant responds unusually well to an off-label treatment. At a single center, this is an anecdote. Across a QIS network of 30 rare disease centers, the semantic address for that variant + drug + outcome collects similar observations from sites worldwide. When 4 sites report similar responses across 8 patients with the same variant, the anecdote becomes a signal.
No central database collected the cases. No researcher designed a study. The protocol routed the observations to each other based on clinical similarity.
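A site could apply a simple rule to decide when accumulated packets at one address are worth flagging. The sketch below is purely hypothetical: the thresholds of 4 sites and 8 patients are taken from the example above rather than from any validated statistical criterion, and it assumes one packet per reporting site.

```python
def is_signal(packets, min_sites=4, min_patients=8, min_delta=0.0):
    """Flag an address when enough sites and patients report a positive outcome.

    Assumes each packet is one site's report. The thresholds are
    illustrative defaults, not a clinical or statistical standard.
    """
    responders = [p for p in packets if p["outcome_delta"] > min_delta]
    n_sites = len(responders)
    n_patients = sum(p["cohort_size"] for p in responders)
    return n_sites >= min_sites and n_patients >= min_patients

single_center = [{"outcome_delta": 3.0, "cohort_size": 2}]
four_centers = [
    {"outcome_delta": 3.0, "cohort_size": 2},
    {"outcome_delta": 2.0, "cohort_size": 2},
    {"outcome_delta": 4.5, "cohort_size": 3},
    {"outcome_delta": 1.0, "cohort_size": 1},
]

print(is_signal(single_center), is_signal(four_centers))
```

A flag from a rule like this is a prompt for closer review of the case series, not evidence of efficacy.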
Natural History Compilation
For ultra-rare diseases (prevalence < 1 per million), natural history data is the scarcest and most valuable resource. Each site's observation of disease progression — untreated trajectory, milestone timing, organ involvement sequence — is a data point that every other site needs.
QIS routes natural history observations continuously. A site in Tokyo deposits a disease progression packet for a 12-year-old FOP patient. A site in Philadelphia managing the only pediatric FOP patient in its catchment area queries the semantic address and receives progression data from every site worldwide managing similar patients. The compilation builds itself through the protocol, with no registry coordination required.
Treatment Response Across Genetic Subgroups
Many rare diseases have multiple genetic subtypes with different treatment responses. A drug that works for transthyretin amyloid cardiomyopathy (ATTR-CM) patients with the V122I variant may fail for patients with the T60A variant. A federated model trained on the pooled diagnosis treats these as the same disease unless the variant is an explicit feature, so it averages across variants. QIS semantic addressing includes the genetic variant in the fingerprint, routing outcomes to the correct subgroup automatically.
What This Means for Rare Disease Infrastructure
If you are building rare disease data infrastructure — a patient registry, a multi-center research network, a natural history study, a treatment outcome database — the architectural question is:
Does your system include or exclude sites with small cohorts?
If it excludes them (because the learning mechanism requires large local samples), you lose the most informationally rare observations in your network. If it includes them (because the mechanism routes validated outcomes regardless of cohort size), you activate the pairwise synthesis paths between every site.
QIS outcome routing is the architecture that includes every site. The protocol imposes no minimum cohort size. The routing cost is O(log N) or better. The intelligence scales as N(N-1)/2, the number of pairwise synthesis paths. Raw data exposure is zero: no raw patient data leaves any site.
For rare disease research, where the most valuable data sits at the smallest sites, this is the difference between a network that learns and a network that merely stores.
The Discovery
Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm protocol on June 16, 2025. The breakthrough is the complete architecture — the loop that enables real-time quadratic intelligence scaling without compute explosion, not any single component. 39 provisional patents filed. Humanitarian licensing ensures the protocol is free forever for nonprofits, research institutions, and educational use.
For rare disease researchers: the QIS protocol specification, federated learning comparison, and the 20 most common technical questions are published.
This is part of an ongoing series on QIS — the Quadratic Intelligence Swarm protocol — documenting every domain where distributed outcome routing closes a synthesis gap that existing infrastructure cannot close.