Rory | QIS PROTOCOL

Why N=1 Rare Disease Sites Are Excluded from Federated Learning by Architecture — and What Outcome Routing Does Instead

Rare disease researchers operate under a constraint that machine learning infrastructure was not designed to accommodate: their patient populations are, by definition, small. The National Organization for Rare Disorders (NORD) defines a rare disease as one affecting fewer than 200,000 individuals in the United States. Of the roughly 7,000 recognized rare diseases, many have global patient counts in the hundreds, with expert treatment concentrated in two to four specialized centers worldwide. The NIH Rare Diseases Clinical Research Network has documented this fragmentation directly — the sites exist, the clinical expertise exists, the outcomes data exists, but the infrastructure for sharing it does not scale down to where the data actually lives.

Federated learning (FL) was introduced precisely as a privacy-preserving solution to distributed medical data. The promise was compelling: train a shared model across institutions without moving patient records. But federated learning has a hard mathematical floor that the rare disease community has largely not confronted explicitly. Below a certain local dataset size — approximately N=30 patients per participating site, as a practical lower bound — local gradient estimates become too noisy to contribute useful signal to a federation. At N=1 or N=2, they become actively harmful.

This article examines the mathematical reason FL fails at rare disease scale, documents the privacy risk that persists even when it does not fail, and describes a different architectural approach — distributed outcome routing, as implemented in the Quadratic Intelligence Swarm (QIS) architecture developed by Christopher Thomas Trevethan — that does not share this limitation.


1. The Rare Disease Data Problem: Why Sites Are Always Small

The structural problem precedes any algorithmic choice. Consider a disease affecting 800 patients globally. Clinical expertise for that condition exists at perhaps six academic medical centers across North America, Europe, and Japan. Each center may have seen 80 to 150 patients over a decade of operation. Annual incidence at any single center is typically fewer than 20 new cases per year.

This is not a data quality problem. The clinicians at these centers are world experts. The outcome records are detailed and carefully curated. The problem is purely one of count: there are not enough patients per site to satisfy the statistical requirements that federated learning imposes on its participants.

The situation is worse for the rarest conditions. Diseases with global prevalence under 1,000 patients — there are hundreds of them — may have a single center of excellence in the world, treating three to ten patients per year. For these conditions, federated learning is not a degraded option. It is not an option at all.


2. Why Federated Learning Fails Below the Gradient Variance Threshold

The foundational federated learning paper by McMahan et al. (2017), "Communication-Efficient Learning of Deep Networks from Decentralized Data," introduced the FedAvg algorithm and established the basic participation model: each round, a subset of clients computes local stochastic gradient descent updates on their local data and transmits those updates to a central aggregator, which averages them into a global model update.

The statistical requirement embedded in this design is non-trivial. A client's local gradient estimate is an average of per-sample gradients over its data, so its variance is inversely proportional to the local dataset size. For a client with local dataset size $n_k$ and per-sample gradient variance $\sigma^2$, the variance of the local gradient estimate is:

$$\text{Var}[\nabla \ell_k] = \frac{\sigma^2}{n_k}$$

As $n_k$ shrinks toward 1, this variance approaches the full per-sample variance $\sigma^2$: the signal-to-noise ratio of the update collapses, and the site's contribution is dominated by sampling noise rather than by the true gradient.
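The inverse 1/n relationship between local sample size and gradient variance is easy to check with a short simulation. This is a toy model, not a federated system: each "patient" contributes the true gradient plus Gaussian noise, and we measure the empirical variance of a site's averaged update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values only: each per-sample gradient is the true gradient
# plus noise with standard deviation sigma.
true_grad = 1.0
sigma = 2.0

def local_gradient_variance(n_k, trials=20_000):
    """Empirical variance of a site's mean gradient over n_k samples."""
    samples = true_grad + sigma * rng.standard_normal((trials, n_k))
    return samples.mean(axis=1).var()

for n_k in (1, 5, 30, 200):
    print(n_k, round(local_gradient_variance(n_k), 4))
# Variance shrinks roughly as sigma^2 / n_k (about 4.0 at n=1, 0.02 at n=200).
```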

Konecny et al. (2016), in "Federated Learning: Strategies for Improving Communication Efficiency," formalized the communication and convergence requirements for FL and identified client data heterogeneity as a primary source of instability. When local datasets are both small and non-IID — which is exactly the rare disease case, where each center's patient population reflects regional demographics, referral patterns, and treatment protocols — the deviation between local gradients and the true global gradient compounds the variance problem.

In practice, the FL community has established informal minimums. Reviews of federated healthcare implementations (Rieke et al., 2020, "The Future of Digital Health with Federated Learning," npj Digital Medicine) consistently note that meaningful gradient contribution requires local datasets in the range of tens to hundreds of examples. A site with five patients cannot produce a gradient update that improves rather than destabilizes the global model.

The implications for rare disease are unambiguous:

  • A site treating three patients per year produces a gradient update with variance so high it is statistically indistinguishable from noise.
  • A site with a single patient — the canonical N=1 rare disease expert — produces a gradient update that cannot be validated against any local held-out set, cannot be regularized against local population statistics, and will be weighted by the aggregator in proportion to its data count (near zero) or ignored entirely.
  • Two sites each with five patients do not aggregate to a ten-patient dataset under FedAvg. When their local distributions differ — the non-IID case that dominates rare disease — averaging their two high-variance gradient estimates pulls the global update toward neither population, and the result can be less stable than either site training alone.
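The size-proportional weighting described above can be sketched with a minimal FedAvg-style aggregator. The sites, their sizes, and the update values below are hypothetical; the point is only how the weighting rule treats an N=1 contribution.

```python
import numpy as np

def fedavg(updates, sizes):
    """FedAvg-style aggregation: average local updates weighted by local
    dataset size, as in McMahan et al. (2017)."""
    weights = np.array(sizes, dtype=float)
    weights /= weights.sum()
    return float(np.dot(weights, np.array(updates, dtype=float)))

# Hypothetical federation: two large sites plus one N=1 rare disease site
# whose "update" is essentially noise. All numbers are illustrative.
updates = [0.9, 1.1, 40.0]
sizes = [500, 480, 1]
print(fedavg(updates, sizes))
# The N=1 site is weighted 1/981: effectively ignored, yet its
# high-variance update still perturbs the aggregate.
```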

This is not a limitation of FedAvg specifically. It is a property of gradient-based distributed learning. Any FL variant — FedProx, SCAFFOLD, MOON — that relies on local gradient computation inherits this floor. The math does not change because the algorithm name does.


3. Gradient Inversion: The Privacy Risk That Persists

Federated learning is frequently presented as a privacy-preserving alternative to data centralization. This framing requires qualification.

Geiping et al. (2020), in "Inverting Gradients — How Easy Is It to Break Privacy in Federated Learning?" (NeurIPS 2020), demonstrated that shared gradient updates can be used to reconstruct training data with high fidelity. Their attack optimizes a candidate input until its gradient matches the shared gradient under a cosine-similarity loss, recovering individual training images — and by extension, structured records — from a single shared update.

The attack is not theoretical. It operates on the actual gradients that FL clients transmit in every round. For medical imaging and structured clinical records, the reconstruction fidelity is sufficient to constitute a privacy breach under HIPAA and equivalent frameworks.

The rare disease case makes this worse in a specific way: rarity is itself an identifier. A gradient update computed from a dataset of two patients, in a disease with 800 known cases globally, narrows the inference space dramatically. Even partial reconstruction from a small-N gradient update may be sufficient to re-identify a patient whose disease is rare enough to be uniquely identifying within a geographic region.

Differential privacy modifications to FL (adding calibrated noise to gradients) reduce reconstruction fidelity but at a cost to model utility that is inversely proportional to dataset size — again penalizing the small-N sites that rare disease research depends on.
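The size penalty of differentially private gradient release can be illustrated with a DP-SGD-style sketch (clip each per-sample gradient, average, add Gaussian noise calibrated to the clipping bound). The clipping bound and noise multiplier are arbitrary illustrative values; the thing to notice is that the noise standard deviation relative to the released mean scales as 1/n, so small sites pay the largest utility cost.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_gradient(per_sample_grads, clip=1.0, noise_multiplier=1.0):
    """Release a scalar gradient with DP-SGD-style clipping and noise.
    Noise std relative to the mean is noise_multiplier * clip / n."""
    g = np.asarray(per_sample_grads, dtype=float)
    clipped = g / np.maximum(np.abs(g) / clip, 1.0)  # clip to magnitude <= clip
    n = len(clipped)
    noise = rng.normal(0.0, noise_multiplier * clip / n)
    return clipped.mean() + noise

# A 500-patient site releases a nearly clean gradient; a 2-patient site's
# release is dominated by the privacy noise. Values are illustrative.
for n in (500, 2):
    grads = 0.5 + 0.1 * rng.standard_normal(n)
    print(n, round(dp_gradient(grads), 3))
```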


4. Distributed Outcome Routing: A Different Architecture

The Quadratic Intelligence Swarm architecture, developed by Christopher Thomas Trevethan and covered under 39 provisional patents, takes a different approach to distributed medical intelligence. The distinction is not incremental — it is architectural.

FL asks: how do we train a shared model without sharing data?

QIS distributed outcome routing asks: how do we share what happened without sharing how we know it?

A QIS outcome packet is not a model gradient. It is a distilled treatment outcome: a compact record (approximately 512 bytes) encoding a patient profile fingerprint, a treatment protocol identifier, and an observed outcome vector. The packet is semantically meaningful on its own. It does not require a receiving site to hold a compatible model, run a compatible training pipeline, or have any minimum local dataset size.
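QIS does not publish a packet wire format, so the following is a hypothetical sketch only: the field names, sizes, and encoding are illustrative choices consistent with the description above (a hashed profile fingerprint, a protocol identifier, and an outcome vector, in a compact binary record).

```python
import hashlib
import json
import struct

def make_outcome_packet(profile_features, protocol_id, outcome_vector):
    """Hypothetical outcome packet: fingerprint + header + outcome floats.
    Field layout is an assumption for illustration, not the QIS format."""
    # Profile "fingerprint": a hash of coarse features, not raw PHI.
    fingerprint = hashlib.sha256(
        json.dumps(profile_features, sort_keys=True).encode()
    ).digest()                                           # 32 bytes
    header = struct.pack(">IH", protocol_id, len(outcome_vector))  # 6 bytes
    body = struct.pack(f">{len(outcome_vector)}f", *outcome_vector)
    return fingerprint + header + body

pkt = make_outcome_packet(
    {"age_band": "30-39", "phenotype": "HPO:0001250"},   # illustrative fields
    protocol_id=42,
    outcome_vector=[0.8, 0.1, 0.0],
)
print(len(pkt))  # 32 + 6 + 12 = 50 bytes for this toy example
```

Even padded with richer outcome vectors and metadata, a record of this shape stays within the ~512-byte envelope the article describes, orders of magnitude below a model update.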

The complete QIS loop operates as follows:

  1. Emission: A treating site records a treatment episode. The local system generates an outcome packet from that episode.
  2. Routing: The packet is transmitted via any available transport — folder-based relay, HTTP, DHT-based routing, or pub/sub are all valid; the protocol is transport-agnostic.
  3. Matching: Receiving sites with patient profiles semantically similar to the packet's fingerprint receive the packet.
  4. Synthesis: The receiving site's local system incorporates the outcome into its local reasoning. No global model update occurs. No central aggregator is involved.
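
The four steps above can be sketched as a toy in-memory loop. Everything here is an assumption for illustration (the Site class, the broadcast transport, the cosine-similarity matching rule), since the protocol is transport-agnostic and the article does not specify a matching function.

```python
import numpy as np

class Site:
    """Toy site: one profile vector stands in for its patient fingerprint."""
    def __init__(self, name, profile_vec):
        self.name = name
        self.profile = np.asarray(profile_vec, dtype=float)
        self.knowledge = []               # outcomes synthesized locally

    def emit(self, outcome):
        # 1. Emission: a distilled outcome tagged with a profile fingerprint.
        return {"fingerprint": self.profile, "outcome": outcome}

    def maybe_receive(self, packet, threshold=0.9):
        # 3. Matching: accept packets whose fingerprint is semantically close.
        a, b = self.profile, packet["fingerprint"]
        sim = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim >= threshold:
            # 4. Synthesis: fold into local reasoning; no global model update.
            self.knowledge.append(packet["outcome"])

sites = [Site("Tokyo", [1, 0, 1]),
         Site("Boston", [1, 0, 1]),
         Site("Oslo", [0, 1, 0])]
packet = sites[0].emit({"protocol": "A", "response": 0.8})
for site in sites[1:]:                    # 2. Routing: broadcast transport
    site.maybe_receive(packet)
print([s.name for s in sites if s.knowledge])  # → ['Boston']
```

Note there is no aggregator anywhere in the loop: the packet reaches only the site whose patient profile resembles the emitter's, and that site updates its own local state.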

The breakthrough is the complete loop — not any single component. The outcome packet format, the routing protocol, and the local synthesis mechanism function as an integrated system. Removing any element reverts to a weaker prior approach.


5. Why N=1 Sites Can Participate in QIS But Not FL

The participation threshold difference is structural, not a matter of tuning.

Under FL, a site with one patient cannot produce a usable gradient. The site is architecturally excluded — not by policy, but by the mathematics of stochastic gradient estimation.

Under QIS distributed outcome routing, a site with one patient can emit an outcome packet every time that patient receives a treatment intervention. If a patient receives twelve interventions over a year of care, the site emits twelve outcome packets. Each packet is independently valid. Each represents a complete treatment episode with an observed outcome.

More importantly, the N(N-1)/2 synthesis paths that define the QIS scaling architecture apply across all emitting sites, regardless of how many patients each site holds. A network of 100 single-patient sites — which would be entirely excluded from a federated learning deployment — generates 100 × 99 / 2 = 4,950 synthesis paths. The aggregate intelligence of the network scales quadratically with the number of participating sites, not with the number of patients per site.
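The path count is simple to compute; a two-line helper makes the quadratic scaling in site count explicit.

```python
def synthesis_paths(num_sites):
    """Pairwise synthesis paths among emitting sites: N(N-1)/2."""
    return num_sites * (num_sites - 1) // 2

print(synthesis_paths(100))  # 4950 paths from 100 single-patient sites
print(synthesis_paths(6))    # 15 paths from a 6-center network
```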

For rare diseases with global patient counts under 1,000 distributed across dozens of institutions, this is not a marginal improvement. It is the difference between infrastructure that can serve the disease and infrastructure that cannot.


6. Comparison

| Dimension | Federated Learning | QIS Distributed Outcome Routing |
| --- | --- | --- |
| Minimum viable site count | ~10 sites for stable gradients | 1 site; any site can emit outcome packets |
| Packet size | Model update: MB to GB | Outcome packet: ~512 bytes |
| Aggregation timing | Round-based (days to weeks) | Real-time |
| Central dependency | Aggregation server required | No central node required |
| Privacy attack surface | Gradient inversion possible | No gradient transmitted |
| Rare disease participation | Excluded below threshold | Included by architecture |
| Data schema compatibility | Required (common model) | Not required |

7. Implications for Rare Disease Research Infrastructure

The rare disease research community has invested substantially in federated learning pilots. The FDA's real-world evidence programs, the PCORnet distributed research network, and the Observational Medical Outcomes Partnership (OMOP) common data model represent serious infrastructure investment toward multi-site learning without data centralization.

These investments are not wasted, but they have a documented coverage gap: the patients who are hardest to study are excluded from the infrastructure designed to study them. A federated network of twelve academic medical centers, each with 200-plus patients, serves common disease research reasonably well. It serves rare disease research not at all.

Distributed outcome routing as an architectural primitive addresses this gap specifically because it decouples participation from local dataset size. The implicit assumption of FL — that intelligence is encoded in model weights, and model weights require gradients, and gradients require batches — is the assumption that fails at rare disease scale. Replacing that assumption with a different primitive (the outcome packet as the unit of intelligence transfer) removes the floor.

The heterogeneous data schema problem is also resolved by architecture rather than coordination cost. FL requires all participating sites to run compatible local models against a common data schema — a requirement that has consumed years of standardization effort in health informatics. QIS outcome packets are generated by local systems using whatever schema and model the site operates natively. Only the outcome representation is standardized. The coordination overhead is orders of magnitude lower.

Real-time routing matters for rare disease specifically because the patient population is too small for retrospective batch analysis to be meaningful. When a new treatment protocol is tried at a center with two patients per year, the signal from that trial needs to reach other centers within days, not the next quarterly FL aggregation round. Outcome packet routing occurs at the speed of the underlying transport — typically seconds to minutes.


8. Conclusion

Federated learning is a genuine advance in privacy-preserving distributed machine learning. For large-N medical applications — imaging, common disease prediction, population health — it represents a significant improvement over data centralization. Its limitations at rare disease scale are not implementation failures. They are mathematical properties of gradient-based learning that no implementation can overcome.

The architectural alternative described here — distributed outcome routing as implemented in the Quadratic Intelligence Swarm — does not make FL better. It replaces the gradient as the unit of distributed intelligence with a different primitive that does not carry the same mathematical floor.

Christopher Thomas Trevethan developed the QIS architecture and holds 39 provisional patents covering its implementation. The core insight is that the complete loop — emission, routing, matching, and local synthesis — constitutes a novel architectural primitive for distributed intelligence, one whose participation model is determined by the number of emitting sites rather than the size of any individual site's dataset.

For the rare disease research community, this distinction is not academic. It determines whether the patients who most need distributed clinical intelligence infrastructure can participate in it at all.


References

  • McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & Agüera y Arcas, B. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS).
  • Konečný, J., McMahan, H. B., Yu, F. X., Richtárik, P., Suresh, A. T., & Bacon, D. (2016). Federated Learning: Strategies for Improving Communication Efficiency. arXiv:1610.05492.
  • Geiping, J., Bauermeister, H., Dröge, H., & Moeller, M. (2020). Inverting Gradients — How Easy Is It to Break Privacy in Federated Learning? Advances in Neural Information Processing Systems (NeurIPS) 33.
  • Rieke, N., Hancox, J., Li, W., Milletarì, F., Roth, H. R., Albarqouni, S., ... & Cardoso, M. J. (2020). The Future of Digital Health with Federated Learning. npj Digital Medicine, 3(1), 119.
  • National Organization for Rare Disorders (NORD). (2024). Rare Disease Facts. Retrieved from https://rarediseases.org/rare-diseases/
  • NIH National Center for Advancing Translational Sciences. (2023). Rare Diseases Clinical Research Network: Infrastructure and Site Distribution Report. U.S. Department of Health and Human Services.
