For distributed systems architects, clinical AI engineers, and technical evaluators trying to understand the architectural difference between HPE Swarm Learning and QIS Outcome Routing.
The Context
If you search for "distributed learning without central aggregator" or "peer-to-peer health data intelligence," the result that appears most often is HPE Swarm Learning. The 2021 Nature paper demonstrating Swarm Learning on 16,400 blood transcriptomes across four sites (Warnat-Herresthal et al., 2021) is the most-cited empirical validation of distributed health AI without data centralization.
This is legitimate. HPE Swarm Learning is a real, peer-reviewed, tested protocol. It belongs in any serious review of distributed health intelligence.
But it does not solve the same problem as QIS. The two protocols operate at different layers of the stack, under different architectural assumptions, and produce different failure modes at scale.
This article explains the distinction clearly, without misrepresenting either approach.
What HPE Swarm Learning Does
HPE Swarm Learning is a framework for distributed model training without a central aggregator. Its key architectural choices:
Each node trains a model locally on its own data. Training runs identically to standard federated learning at the node level.
Gradient aggregation is decentralized using a blockchain ledger rather than a central server. Instead of one entity collecting and averaging gradients, nodes take turns performing the aggregation step — the "swarm leader" role rotates randomly across nodes via a blockchain-coordinated election.
Model weights are shared, not raw data. Nodes upload gradient updates to the blockchain-coordinated aggregation layer. The aggregated model is distributed back to all nodes. Raw patient data never leaves the site.
Consensus provides Byzantine tolerance: because the aggregation ledger is distributed and auditable, no single node can corrupt the global model unilaterally.
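The rotating-leader averaging step described above can be sketched as follows. This is a minimal illustration, not the HPE implementation: the election scheme, node names, and gradient values are invented, and the blockchain is reduced to a shared round seed that all nodes agree on.

```python
import random

def elect_leader(node_ids, round_seed):
    """Deterministic leader election: every node computes the same
    result from the shared (ledger-agreed) round seed, so no central
    server is needed to pick the aggregator."""
    rng = random.Random(round_seed)
    return rng.choice(sorted(node_ids))

def merge_round(local_updates, round_seed):
    """One swarm round: the elected leader averages all gradient
    updates; every node then adopts the merged parameters."""
    leader = elect_leader(local_updates.keys(), round_seed)
    n = len(local_updates)
    dim = len(next(iter(local_updates.values())))
    merged = [sum(upd[i] for upd in local_updates.values()) / n
              for i in range(dim)]
    return leader, merged

# Three hospitals, each contributing a 4-parameter gradient update
updates = {
    "site_a": [0.1, 0.2, 0.0, -0.1],
    "site_b": [0.3, 0.0, 0.1, -0.3],
    "site_c": [0.2, 0.1, -0.1, 0.1],
}
leader, merged = merge_round(updates, round_seed=7)
```

Because the election is a pure function of the shared seed, each node can verify independently that the right peer performed the aggregation, which is what makes the step auditable.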
What This Achieves
- Eliminates the single-point-of-failure of central federated learning servers
- Removes institutional trust requirements: no single organization controls the aggregation
- Provides audit trails for who contributed what to the shared model
- Scales to heterogeneous hardware: swarm coordinator runs alongside existing IT infrastructure
The Nature 2021 paper demonstrated this architecture on leukemia classification and COVID-19 severity prediction, achieving performance comparable to central training while keeping data decentralized.
Where HPE Swarm Learning Has Structural Limits
HPE Swarm Learning distributes the aggregation mechanism, but the underlying computational pattern is unchanged: gradient averaging. This means the system inherits the constraints of gradient-based federated learning:
1. Minimum Cohort Requirement
To contribute a meaningful gradient update, each node must have enough local samples to produce a stable gradient estimate. Practical deployments typically assume on the order of 50-200 samples per site for reliable convergence.
For rare diseases, this is a hard exclusion: a hospital managing 4 patients with Erdheim-Chester disease, 2 patients with fibrodysplasia ossificans progressiva, or 1 patient with a disease variant so rare it has no established cohort anywhere — these sites cannot contribute to a Swarm Learning network. Their data may be the most informationally valuable in the world for their specific disease. The architecture still excludes them.
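The cohort constraint is statistical, not administrative: the standard error of a mean-gradient estimate shrinks roughly as 1/sqrt(n), so a 4-patient site emits a far noisier update than a 200-patient site. A toy simulation (illustrative only; the unit-variance per-patient gradient is an assumption):

```python
import random
import statistics

def gradient_noise(n_samples, trials=2000, seed=0):
    """Estimate the spread of a mean-gradient estimate built from
    n_samples noisy per-patient gradients (true gradient = 1.0,
    per-patient noise with standard deviation 1.0)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(1.0, 1.0) for _ in range(n_samples))
             for _ in range(trials)]
    return statistics.stdev(means)

# Spread shrinks as roughly 1/sqrt(n): the small site's update is
# about 7x noisier than the large site's.
noise_small = gradient_noise(4)    # ~0.5
noise_large = gradient_noise(200)  # ~0.07
```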
2. Global Model Convergence Assumption
Swarm Learning, like federated learning, optimizes for a single global model. The architecture performs well when nodes share a similar underlying distribution (the IID assumption). When nodes serve heterogeneous populations — a pediatric hospital, a geriatric center, and a tropical disease clinic attempting to collaborate — the global model may represent none of them accurately.
The blockchain coordination layer solves the trust problem. It does not solve the distribution mismatch problem.
3. Communication Scales With Model Size
Swarm nodes upload gradient updates that are proportional to model size. A ViT-B/16 vision transformer has 86 million parameters. A gradient update at 32-bit precision requires 344 MB per round. For networks with hundreds of nodes running multiple training rounds per day, bandwidth becomes a real constraint — especially at resource-constrained sites in low-bandwidth environments.
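The arithmetic behind the 344 MB figure, plus the aggregate load it implies, can be checked directly. The node count and rounds per day below are hypothetical, chosen only to show how the numbers compound:

```python
def gradient_update_mb(n_params, bits_per_param=32):
    """Size of one uncompressed gradient update, in megabytes (decimal)."""
    return n_params * (bits_per_param / 8) / 1e6

vit_b16 = 86_000_000  # ViT-B/16 parameter count
per_round = gradient_update_mb(vit_b16)   # 344.0 MB per node per round

# Hypothetical network: 100 nodes, 3 training rounds/day, upload only
daily_gb = per_round * 100 * 3 / 1000     # ~103 GB of gradient traffic/day
```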
4. Real-Time Inference Requires a Separate Layer
Swarm Learning produces a trained model. Using that model for real-time clinical decisions requires an inference layer that Swarm Learning does not provide. The training cycle (local training → swarm aggregation → model distribution) takes hours to days. Outcome data from a patient admitted this morning does not influence decisions about a similar patient admitted tomorrow through Swarm Learning alone.
What QIS Does Differently
Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol on June 16, 2025. QIS does not train a shared model. It routes pre-distilled outcome packets between nodes with similar problems.
The architectural difference is at layer 3 of the stack:
HPE Swarm Learning:

```
[Raw data] → [Local model training] → [Gradient update] → [Blockchain aggregation]
    → [Global model weights] → [Distributed back to all nodes]
```
QIS:

```
[Raw data] → [Local analysis] → [Outcome packet ~512 bytes] → [Semantic fingerprint]
    → [Address-based routing to similar nodes] → [Local synthesis]
    → [Real-time intelligence: what worked for your exact twins]
```
The key difference: QIS does not share model weights or gradients. It shares distilled outcome packets — compact, fixed-size summaries of what happened with a specific intervention for a specific type of case.
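A minimal sketch of what a fixed-size outcome packet could look like. The field layout, the 16-byte fingerprint, and the hashing scheme are illustrative assumptions, not the QIS wire format:

```python
import hashlib
import struct

PACKET_SIZE = 512  # bytes, fixed regardless of local data volume

def semantic_fingerprint(phenotype, genotype, treatment):
    """Hash the case descriptors into a routing address: identical
    problem descriptions map to the same address."""
    key = f"{phenotype}|{genotype}|{treatment}".encode()
    return hashlib.sha256(key).digest()[:16]

def encode_packet(phenotype, genotype, treatment,
                  outcome_delta, ci_low, ci_high):
    """Pack one outcome event into a fixed 512-byte packet:
    16-byte fingerprint + three little-endian doubles + zero padding."""
    fp = semantic_fingerprint(phenotype, genotype, treatment)
    body = fp + struct.pack("<3d", outcome_delta, ci_low, ci_high)
    return body + b"\x00" * (PACKET_SIZE - len(body))

pkt = encode_packet("phenotype_X", "genotype_Y", "treatment_Z",
                    outcome_delta=0.42, ci_low=0.30, ci_high=0.55)
```

The point the sketch makes is structural: the packet size is a constant of the protocol, so a site's transmission cost is independent of how much private data sits behind the packet.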
What QIS Changes About the Minimum Cohort Requirement
A site with 1 patient with a rare variant can emit an outcome packet: "For a patient with phenotype X, genotype Y, receiving treatment Z, outcome was [delta], confidence [CI]." That packet routes to the semantic address corresponding to phenotype X + genotype Y + treatment Z. Any other node working on the same problem retrieves it.
There is no gradient to stabilize. The outcome is the packet. Minimum cohort size: 1.
This is the core architectural difference: QIS closes the rare disease gap not by improving federated learning, but by operating at the outcome layer rather than the training layer.
What QIS Changes About Distribution Mismatch
QIS does not average across heterogeneous populations. Semantic fingerprinting routes each node's outcomes to addresses that match their specific problem. A pediatric cardiology node routes to the pediatric cardiology address. A geriatric oncology node routes to the geriatric oncology address. There is no global model to converge. Synthesis happens locally, within your cluster of similar cases.
The IID assumption does not apply because no cross-distribution aggregation occurs.
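Routing without aggregation can be sketched as an address-keyed store: each packet lands under its semantic address, and a node synthesizes only the packets at its own address. This is a toy in-memory version with invented address strings; a real network would replace the dictionary with distributed address-based routing:

```python
from collections import defaultdict
from statistics import fmean

class OutcomeRouter:
    """Toy router: outcomes are grouped by semantic address.
    No cross-address averaging ever happens, so heterogeneous
    populations never contaminate each other's synthesis."""

    def __init__(self):
        self.store = defaultdict(list)

    def publish(self, address, outcome_delta):
        """Deposit one outcome under its semantic address."""
        self.store[address].append(outcome_delta)

    def synthesize(self, address):
        """Local synthesis over outcomes for this exact problem only."""
        outcomes = self.store.get(address, [])
        return {"n": len(outcomes),
                "mean_delta": fmean(outcomes) if outcomes else None}

router = OutcomeRouter()
router.publish("peds_cardio::treatment_Z", 0.4)
router.publish("peds_cardio::treatment_Z", 0.6)
router.publish("geriatric_onco::treatment_Q", -0.1)

peds = router.synthesize("peds_cardio::treatment_Z")  # n=2, mean 0.5
```

The pediatric result never averages with the geriatric one; each cluster sees only its own evidence, which is why no IID assumption arises.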
What QIS Changes About Communication
Outcome packets are ~512 bytes each, fixed-size regardless of the volume of private data at the originating node. A hospital with 100,000 patient records transmits the same 512-byte packet as a site with 3 patients. Communication scales with the number of relevant outcome packets exchanged, not with model size.
For resource-constrained environments — rural hospitals, clinics in low-bandwidth regions, sensors in field conditions — 512-byte packets over intermittent connections are feasible. Multi-hundred-megabyte gradient exchanges are not.
What QIS Changes About Real-Time Intelligence
Because QIS routes outcome packets rather than training models, intelligence updates on the timescale of patient events, not training cycles. A treatment outcome from this morning is in the network and retrievable within minutes. A node facing a similar case this afternoon synthesizes outcomes from every similar case across the network — including today's.
Direct Comparison
| Dimension | HPE Swarm Learning | QIS Outcome Routing |
|---|---|---|
| What is transmitted | Gradient updates (~model size × precision) | Outcome packets (~512 bytes, fixed) |
| Minimum cohort per site | 50-200 samples for stable gradients | 1 validated outcome event |
| Architecture | Distributed model training (gradient averaging) | Distributed outcome routing (packet synthesis) |
| Aggregation | Blockchain-coordinated swarm leader rotation | Semantic address lookup; no aggregation step |
| Distribution assumption | IID beneficial; non-IID degrades performance | Semantic fingerprinting routes to matching clusters; no IID assumption |
| Real-time updates | Training cycle: hours to days | Outcome packet routing: minutes |
| Rare disease viability | Excluded below minimum cohort threshold | Viable at N=1 site |
| Privacy model | No raw data transmitted; gradient inversion risk exists | No raw data transmitted; outcome packets are already distilled statistics |
| Scale | Works well at 10-100 nodes | O(log N) routing cost; architecture is scale-invariant |
| Validation status | Peer-reviewed: Nature 2021 (16,400 transcriptomes) | 39 provisional patents; architecture independently validated |
When to Use Each
Use HPE Swarm Learning when:
- You need a trained predictive model across distributed sites
- Each site has sufficient cohort size for stable gradients (50+ samples per site)
- Your population distribution is reasonably homogeneous across sites
- You have existing ML infrastructure and want to distribute the training step without a central server
- Audit trails for model contributions are a regulatory requirement
Use QIS when:
- You need real-time synthesis of treatment outcomes, not a trained model
- Some sites have very small cohorts (rare diseases, N=1 sites)
- Your population is heterogeneous across sites and a global model would misrepresent local conditions
- Bandwidth is constrained and multi-hundred-megabyte gradient exchanges are impractical
- Intelligence must update on the timescale of patient events (hours), not training cycles (days)
- You want privacy properties that eliminate the gradient inversion risk inherent in gradient sharing
Use both when:
- QIS routes outcome intelligence in real time for situational synthesis
- HPE Swarm Learning trains specialized models in the background for specific prediction tasks

These are not competing architectures; they operate at different layers of the stack.
The Architectural Lesson
HPE Swarm Learning is the right answer to the question: "How do we train a machine learning model across distributed health data without a central aggregation server?"
QIS is the right answer to a different question: "How do we share what is working right now — in real time, across all similar cases in every site globally — without centralizing any patient data, without requiring large local cohorts, and without waiting for a training cycle?"
The confusion between them arises because both answer the surface-level question "how do we get distributed health sites to share intelligence without sharing raw data?" The answers diverge at the architectural level because they start from different definitions of "intelligence" — trained model weights versus distilled outcome packets.
Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm protocol on June 16, 2025. The breakthrough is the complete architecture, not any single component: the loop that enables real-time quadratic intelligence scaling without compute explosion. 39 provisional patents filed.
References:
- Warnat-Herresthal, S. et al. (2021). Swarm Learning for decentralized and confidential clinical machine learning. Nature, 594, 265–270. https://doi.org/10.1038/s41586-021-03583-3
- QIS Protocol architectural specification: https://dev.to/roryqis/qis-is-an-open-protocol-here-is-the-architectural-spec-421h
- Mathematical analysis of FL limitations for rare disease: https://dev.to/roryqis/the-mathematical-alternative-to-federated-learning-for-rare-disease-signal-amplification-74n
This is part of an ongoing series on QIS — the Quadratic Intelligence Swarm protocol — documenting every domain where distributed outcome routing closes a synthesis gap that existing infrastructure cannot close.