DEV Community

Rory | QIS PROTOCOL

QIS vs HPE Swarm Learning: Why Routing Outcomes Beats Routing Gradients

In 2021, a team at Helmholtz Munich published a paper in Nature that genuinely moved the field. Warnat-Herresthal et al. demonstrated that hospitals across four countries could train a shared leukemia classifier without ever transmitting raw patient data. The coordination mechanism was a blockchain — specifically Hyperledger Fabric — and the results matched centralized training on COVID-19 detection, tuberculosis classification, and blood cell morphology. For distributed health AI, it was a landmark.

If you are an ML engineer evaluating decentralized health AI architectures in 2026, HPE Swarm Learning is probably on your reading list. It should be. The Nature 2021 paper is not marketing: it is peer-reviewed, reproduced on real clinical datasets, and represents a genuine step beyond vanilla federated learning.

The question this article addresses is not whether Swarm Learning works. It does. The question is what architectural ceiling it hits, and whether a different unit of sharing — outcomes instead of gradients — removes that ceiling entirely.

That alternative is the Quadratic Intelligence Swarm (QIS) protocol, discovered by Christopher Thomas Trevethan on June 16, 2025.


What HPE Swarm Learning Actually Does

Before comparing, let's be precise about what Swarm Learning's architecture achieves.

Each participating node — a hospital, a clinic, a data center — trains a local model on its own data. No raw data leaves the node. Instead, model parameters (gradients, weights) are shared through a peer-to-peer network coordinated by Hyperledger Fabric blockchain. The blockchain serves as the consensus mechanism: it determines when a round is complete, which parameters are valid, and how merged parameters are distributed back to nodes.
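Conceptually, the merge step is parameter averaging in the FedAvg style. HPE's actual merge logic is proprietary, so the sketch below illustrates the idea only (pure Python, tiny parameter vectors standing in for real models):

```python
# Conceptual merge step: element-wise mean of the parameter vectors
# submitted by all nodes in a round. Illustration only, NOT HPE's
# proprietary implementation.

def merge_parameters(node_weights: list[list[float]]) -> list[float]:
    """Average the parameter vectors submitted by all nodes in a round."""
    n_nodes = len(node_weights)
    n_params = len(node_weights[0])
    return [
        sum(w[i] for w in node_weights) / n_nodes
        for i in range(n_params)
    ]

# Three nodes submit tiny 4-parameter "models" for one consensus round
round_submissions = [
    [0.10, 0.20, 0.30, 0.40],   # node A
    [0.20, 0.20, 0.40, 0.40],   # node B
    [0.30, 0.20, 0.50, 0.40],   # node C
]
merged = merge_parameters(round_submissions)
print([round(v, 3) for v in merged])   # [0.2, 0.2, 0.4, 0.4]
```

In a real deployment each of those vectors is tens of millions of floats, which is exactly why the unit of sharing becomes the bottleneck.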

The result is a trained global model that reflects learning from all nodes without centralizing data. On the datasets tested in Nature 2021, this matched centralized accuracy — which is a meaningful result.

What Swarm Learning genuinely achieves:

  • No central server — nodes are peers
  • Raw data stays local — a meaningful privacy guarantee
  • Blockchain consensus removes the need for a trusted aggregator
  • Demonstrated on real clinical data across international institutions

These are real achievements. The blockchain-as-coordinator idea solved a genuine problem: how do you merge gradients without trusting any single party to do the merging?


The Five Structural Differences

1. Gradient vs. Outcome: The Unit of Sharing Is the Bottleneck

Swarm Learning shares model parameters. A ResNet-50 has 25 million parameters. Even compressed, a gradient vector for a mid-size clinical model runs 50–500 MB per consensus round. At N=100 nodes, each round moves gigabytes across the network.

QIS shares outcome packets. One outcome packet is approximately 512 bytes: a semantic fingerprint, a confidence score, a source identifier, and a routing header. The same 100-node network moves 51 kilobytes per cycle.

This is not a compression trick. It is a different unit of sharing entirely. A gradient encodes how the model changed. An outcome packet encodes what the model concluded. The difference matters at scale, at the edge, and on constrained networks — which is precisely where health AI needs to run.

import struct
import hashlib
import time

# QIS outcome packet: ~512 bytes total
# Compare to gradient vector: 50–500 MB for a clinical model

def emit_outcome_packet(
    raw_signal: bytes,          # local sensor / EHR fragment
    local_model,                # stays at the edge — never transmitted
    node_id: str,
    routing_context: dict
) -> bytes:
    """
    Distill a local observation into a routable outcome packet.
    Raw data never leaves this function. Only the outcome does.
    """

    # Local inference — model stays local
    outcome_vector = local_model.infer(raw_signal)      # runs on-device

    # Semantic fingerprint: 32 bytes
    semantic_fingerprint = hashlib.sha256(
        outcome_vector.tobytes()
    ).digest()

    # Confidence score: 4 bytes (float32)
    confidence = struct.pack("f", float(outcome_vector.max()))

    # Timestamp: 8 bytes
    timestamp = struct.pack("Q", int(time.time_ns()))

    # Node identifier: 16 bytes
    node_bytes = node_id.encode("utf-8")[:16].ljust(16, b"\x00")

    # Routing header: deterministic address derived from fingerprint
    routing_address = hashlib.sha256(
        semantic_fingerprint + routing_context["domain"].encode()
    ).digest()[:16]   # 16 bytes

    # Payload: outcome summary — NOT gradients, NOT raw data
    # (truncated or zero-padded to a fixed 400 bytes so packet size is deterministic)
    payload_summary = outcome_vector.tobytes()[:400].ljust(400, b"\x00")

    packet = (
        semantic_fingerprint    # 32 bytes
        + confidence            #  4 bytes
        + timestamp             #  8 bytes
        + node_bytes            # 16 bytes
        + routing_address       # 16 bytes
        + payload_summary       # ~400 bytes
    )
    # Total: ~476 bytes — well within 512-byte target

    print(f"Packet size: {len(packet)} bytes")
    print(f"Equivalent gradient size: ~{local_model.param_count * 4 // 1_000_000} MB")
    # >> Packet size: 476 bytes
    # >> Equivalent gradient size: ~100 MB

    return packet

# For comparison: a Swarm Learning consensus round
# (conceptual — HPE Swarm Learning is proprietary)
#
# swarm_round():
#     for node in nodes:
#         local_gradients = node.train(local_data)      # ~100 MB per node
#         blockchain.submit(local_gradients)             # Hyperledger Fabric tx
#     merged = blockchain.consensus_merge(all_gradients) # 2–7s latency per tx
#     for node in nodes:
#         node.update(merged)                            # ~100 MB back to each node
#
# At N=100: ~10 GB moved per round, 2–7s blockchain latency per merge step
# At N=10,000: throughput wall — Hyperledger Fabric has documented limits

2. Blockchain Coordination Overhead vs. No Coordinator

Hyperledger Fabric has a known throughput ceiling. In controlled enterprise deployments, it handles hundreds to low thousands of transactions per second. In a Swarm Learning deployment, each consensus round is a blockchain transaction. At the node counts Warnat-Herresthal et al. tested experimentally, round latency is manageable.

At N=10,000 nodes — the scale of a national health network — Hyperledger Fabric's consensus latency compounds. Each round requires all participating nodes to submit, validate, and receive merged parameters through the chain. The 2–7 second per-transaction latency documented for Hyperledger Fabric does not disappear at scale; it multiplies.

QIS has no consensus mechanism. Outcome packets route to deterministic addresses derived from their semantic fingerprints. There is no coordinator to saturate. A network of 10,000 nodes produces 10,000 outcome packets per cycle, each routed independently to its deterministic destination. The routing cost is at most O(log N) — and on many transport implementations, O(1).

This is not a theoretical advantage. It is a structural property of removing coordination from the critical path.
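A minimal sketch of what "no coordinator" means in practice, reusing the address-derivation idea from the packet example above (the helper names and the 512-byte budget are this article's assumptions, not a published API):

```python
import hashlib

PACKET_BYTES = 512   # design budget from the packet example above

def routing_address(semantic_fingerprint: bytes, domain: str) -> bytes:
    """Deterministic destination: the same fingerprint and domain always
    hash to the same 16-byte address, so no coordinator has to assign it."""
    return hashlib.sha256(semantic_fingerprint + domain.encode()).digest()[:16]

# Any node, anywhere, computes the identical address independently:
fp = hashlib.sha256(b"distilled outcome").digest()
addr_node_a = routing_address(fp, "hematology")
addr_node_b = routing_address(fp, "hematology")
assert addr_node_a == addr_node_b   # agreement without any consensus round

# Back-of-envelope traffic per cycle, every packet routed independently:
for n_nodes in (100, 10_000, 100_000):
    print(f"N={n_nodes:>7,}: {n_nodes * PACKET_BYTES / 1_000:,.0f} KB per cycle")
# N=    100: 51 KB per cycle
# N= 10,000: 5,120 KB per cycle
# N=100,000: 51,200 KB per cycle
```

There is no per-round transaction in this loop: the only coordination artifact is a hash function both sides already agree on.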

3. Gradient Leakage: A Privacy Surface QIS Does Not Have

Swarm Learning's privacy claim is accurate but incomplete: raw data never leaves the node. What leaves the node is gradients — and gradients are invertible.

Zhu et al. (NeurIPS 2019) demonstrated that a well-resourced adversary can reconstruct training data from gradient vectors with high fidelity. The attack — Deep Leakage from Gradients — requires only the gradient updates that Swarm Learning transmits by design. Differential privacy can mitigate this, but it introduces accuracy degradation that the Nature 2021 paper did not fully characterize.

QIS transmits outcome packets. An outcome packet contains a semantic fingerprint and a confidence score — not a gradient. There is no known inversion attack on a 32-byte semantic fingerprint of a distilled outcome. The privacy guarantee is architectural, not statistical.
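The leakage intuition is easiest to see in the degenerate case: for a linear model trained on a single sample under squared-error loss, the weight gradient is the input vector rescaled by one scalar. This toy is far simpler than the optimization-based attack in Zhu et al., but it shows why a transmitted gradient is a privacy surface while a one-way digest is not:

```python
import hashlib

# Toy gradient-leakage illustration (linear model, squared-error, ONE sample).
# For y = w.x, dL/dw = (prediction - target) * x: the gradient IS the input,
# rescaled by a single scalar.

x = [0.7, 0.1, 0.9, 0.3]    # local record that never leaves the node
w = [0.5, 0.5, 0.5, 0.5]
target = 0.0

prediction = sum(wi * xi for wi, xi in zip(w, x))    # approx 1.0
error = prediction - target
gradient = [error * xi for xi in x]   # what a gradient-sharing protocol transmits

# The gradient is proportional to x, so an adversary holding only the
# gradient recovers the input's direction exactly:
scale = gradient[0] / x[0]            # uniform across all components
recovered = [g / scale for g in gradient]
assert all(abs(r - xi) < 1e-9 for r, xi in zip(recovered, x))

# A QIS packet instead carries a one-way digest of the distilled outcome;
# there is no known way to invert 32 bytes of SHA-256 back to x:
fingerprint = hashlib.sha256(str(prediction).encode()).digest()
assert len(fingerprint) == 32
```

Deep networks need the iterative reconstruction of Zhu et al. rather than this one-liner, but the exposed surface is the same: the gradient carries the data's geometry, and the digest does not.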

4. The N=1 Problem: One Clinic With Three Patients

Federated learning — and Swarm Learning inherits this constraint — requires enough local data to train a meaningful gradient. A rural clinic with three patients presenting a rare autoimmune condition cannot contribute a useful gradient update. The local sample size is too small. Standard FL theory tells you to wait until the local dataset is sufficient.

QIS works with N=1. A single patient visit produces a signal. That signal is processed locally, distilled into a 512-byte outcome packet, and routed to semantically similar outcomes across the network. The synthesis happens at the receiving end — not through gradient averaging, but through outcome aggregation by agents already positioned at relevant addresses.

The practical consequence: orphaned patient populations — rare disease cohorts, rural clinics, underrepresented demographics — can participate in QIS from the first observation. They cannot meaningfully participate in Swarm Learning until they accumulate training-scale data.
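A sketch of what receiving-end synthesis could look like. The article specifies outcome aggregation by agents positioned at an address but not a particular rule, so the confidence-weighted vote and helper names below are illustrative assumptions:

```python
from collections import defaultdict

# Hypothetical receiving-end aggregator: agents at an address accumulate
# arriving outcome packets and form a confidence-weighted consensus.
# (Illustrative sketch; the aggregation rule is an assumption.)

inbox: dict[bytes, list[tuple[bytes, float]]] = defaultdict(list)

def receive(address: bytes, fingerprint: bytes, confidence: float) -> None:
    """An agent positioned at `address` accumulates arriving outcomes."""
    inbox[address].append((fingerprint, confidence))

def synthesize(address: bytes) -> tuple[bytes, float]:
    """Pick the dominant outcome, weighted by sender confidence."""
    weights: dict[bytes, float] = defaultdict(float)
    for fingerprint, confidence in inbox[address]:
        weights[fingerprint] += confidence
    best = max(weights, key=weights.get)
    return best, weights[best]

# Three N=1 sites, each contributing a single observation of the same
# rare condition, participate from their very first patient:
addr = b"rare-autoimmune-addr"
receive(addr, b"outcome-A", 0.91)
receive(addr, b"outcome-A", 0.84)
receive(addr, b"outcome-B", 0.40)
consensus, weight = synthesize(addr)
print(consensus, round(weight, 2))   # b'outcome-A' 1.75
```

No gradient averaging is involved at any point: each contribution is a complete, self-describing observation, so sample size one is enough to participate.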

5. Schema Lock vs. Packet Format Agreement

Swarm Learning's blockchain coordination requires all participating nodes to converge on the same model architecture. Gradient merging is mathematically meaningless across different architectures — you cannot average the weights of a ResNet and a transformer. This imposes a coordination tax before the first training round: all institutions must agree on model schema, version, and parameter layout.

QIS requires agreement on a ~512-byte packet format. The local model at each node can be any architecture — classical ML, deep learning, rules-based expert system, or a future model type not yet designed. The packet format is the protocol boundary. Everything behind it is local.
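In code terms, the protocol boundary is an interface, not an architecture. A hypothetical sketch: any local model exposing an infer method can sit behind the packet format, whether or not gradients exist for it at all (the class and function names here are illustrative):

```python
from typing import Protocol

class LocalModel(Protocol):
    """The only contract a node must satisfy: signal in, outcome vector out."""
    def infer(self, raw_signal: bytes) -> list[float]: ...

class RulesEngine:
    """Rules-based expert system: no gradients exist for it at all."""
    def infer(self, raw_signal: bytes) -> list[float]:
        return [1.0] if b"flag" in raw_signal else [0.0]

class TinyClassifier:
    """Stand-in for any trained model, classical or deep."""
    def infer(self, raw_signal: bytes) -> list[float]:
        return [len(raw_signal) % 10 / 10.0]

def to_outcome(model: LocalModel, signal: bytes) -> list[float]:
    # Both nodes feed the same downstream packet format; their internals
    # never need to match, because no parameters are ever merged.
    return model.infer(signal)

print(to_outcome(RulesEngine(), b"flagged sample"))   # [1.0]
print(to_outcome(TinyClassifier(), b"abcde"))         # [0.5]
```

Averaging the weights of these two "models" is not merely hard, it is undefined; routing their outcomes side by side is trivial.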


Head-to-Head Comparison

| Dimension | HPE Swarm Learning | QIS Protocol |
| --- | --- | --- |
| Unit of sharing | Model gradients / parameters (50–500 MB per round) | Outcome packets (~512 bytes per cycle) |
| Coordination mechanism | Hyperledger Fabric blockchain consensus | None — deterministic routing, no coordinator |
| Privacy guarantee | Gradients not raw data (gradient leakage risk applies) | Outcomes not gradients (no known inversion surface) |
| N=1 edge sites | Not supported — insufficient data for gradient training | Supported — one observation produces one outcome packet |
| Schema requirement | All nodes must share identical model architecture | Only packet format agreement required |
| Scaling trajectory | Blockchain consensus latency compounds at N>1,000 | At most O(log N) routing cost, O(1) on many transports |
| Demonstrated clinical results | Yes — COVID-19, leukemia, TB, mortality (Nature 2021) | Architecture published; clinical deployments in progress |
| Gradient leakage exposure | Yes — Zhu et al. 2019 applies | No — outcomes are not invertible to training data |

Where HPE Swarm Learning Is the Right Choice

HPE Swarm Learning is the right tool when:

  • You need a trained global model — a single artifact that all nodes converge on — and you need it to match centralized training accuracy on a fixed dataset.
  • Your node count is in the tens to low hundreds, where Hyperledger Fabric's consensus overhead is manageable.
  • All participating institutions can agree on model architecture upfront and maintain version synchronization.
  • You are operating in a well-resourced enterprise environment where Hyperledger Fabric's infrastructure requirements are not a constraint.
  • Your threat model is "no raw data leaves nodes" and gradient leakage risk is acceptable given your differential privacy budget.

The Nature 2021 results are real. For the use case it was designed for — multi-institutional model training across a fixed, agreed-upon architecture — HPE Swarm Learning delivers.


Where QIS Is the Right Choice

QIS is the correct architecture when:

  • Your network will scale beyond hundreds of nodes — national health networks, global rare disease registries, distributed sensor infrastructure.
  • You have edge sites with small or single-patient populations who cannot produce training-scale gradients.
  • Your institutions use heterogeneous model architectures and cannot coordinate on a shared schema.
  • Your privacy requirement is stronger than "gradients not raw data" — you need "outcomes not gradients" with no known inversion surface.
  • You need continuous real-time routing, not periodic training consensus rounds with blockchain coordination latency.
  • Your transport infrastructure is constrained — rural clinic satellite links, IoT medical devices, mesh networks in low-resource environments where 512 bytes is the design budget.

The Scaling Question: What Happens at N=100,000?

This is not a hypothetical. A global rare disease network connecting hospitals across 150 countries would require tens of thousands of nodes. A distributed pandemic surveillance system connecting regional health authorities worldwide would require more.

At N=100,000 nodes, Swarm Learning's architecture faces a structural problem. Each consensus round requires all nodes to submit gradients through Hyperledger Fabric and receive merged parameters back. Bonawitz et al. (SysML 2019) characterized the coordination overhead in federated learning at scale: the communication and coordination costs become the dominant bottleneck, not the local training. Swarm Learning substitutes a blockchain for a central aggregator — but the coordination problem does not disappear, it is redistributed across the chain.

At N=100,000, QIS produces 100,000 outcome packets per cycle. Each packet routes independently to its deterministic address. There is no consensus round. There is no merge step. There is no version coordination. The network generates N(N-1)/2 synthesis opportunities — approximately 5 billion at N=100,000 — at a routing cost that is at most O(log N) per packet, and O(1) on transport implementations that support direct addressing.

The architecture does not change as N grows. That is the point.
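The arithmetic behind those figures, using the ~100 MB mid-range gradient and the 512-byte packet assumed throughout this article (upload direction only for the swarm figure, matching the earlier per-round estimate; merged parameters flowing back roughly double it):

```python
# Back-of-envelope scaling comparison under this article's assumptions:
# a 100 MB gradient per node per round, a 512-byte outcome packet per cycle.

GRADIENT_MB = 100
PACKET_BYTES = 512

def swarm_upload_gb(n: int) -> float:
    """Gradients submitted to the chain per round (upload only)."""
    return n * GRADIENT_MB / 1_000

def qis_cycle_mb(n: int) -> float:
    """Outcome packets emitted per cycle."""
    return n * PACKET_BYTES / 1_000_000

def synthesis_pairs(n: int) -> int:
    """Pairwise synthesis opportunities: N(N-1)/2."""
    return n * (n - 1) // 2

for n in (100, 10_000, 100_000):
    print(f"N={n:>7,}: swarm ~ {swarm_upload_gb(n):>8,.0f} GB/round, "
          f"QIS ~ {qis_cycle_mb(n):>6,.1f} MB/cycle, "
          f"pairs = {synthesis_pairs(n):,}")
# N=    100: swarm ~       10 GB/round, QIS ~    0.1 MB/cycle, pairs = 4,950
# N= 10,000: swarm ~    1,000 GB/round, QIS ~    5.1 MB/cycle, pairs = 49,995,000
# N=100,000: swarm ~   10,000 GB/round, QIS ~   51.2 MB/cycle, pairs = 4,999,950,000
```

The gradient-side column grows linearly in data volume on top of compounding consensus latency; the packet-side column stays within a single satellite link's budget even at six figures of nodes.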


A Note on What QIS Is Not

QIS is not federated learning with a different aggregation rule. It is not a blockchain variant. It is not a compression scheme applied to gradients.

The breakthrough, as Christopher Thomas Trevethan discovered it, is the complete loop: raw signal processed locally, distilled into an outcome packet, fingerprinted semantically, routed by similarity to a deterministic address, received by agents positioned at that address, synthesized locally, generating new outcome packets that continue the loop. The loop is the architecture. No single component is the innovation — the complete loop is the innovation.

The Three Elections — Hiring (experts define the similarity function), The Math (outcomes are the votes, no added weighting layer needed), Darwinism (networks compete on outcome quality, users migrate to what works) — are metaphors for emergent properties of this loop. They are not engineered mechanisms. They emerge because the loop selects for accuracy by routing outcomes to agents already positioned near similar outcomes.

IP protection is in place. QIS is open for nonprofit, research, and education use. Commercial licenses fund global deployment.


Conclusion

HPE Swarm Learning is a genuine contribution to distributed health AI. The Nature 2021 paper demonstrated something real: hospitals across four countries trained a shared model without sharing patient data, matched centralized accuracy on multiple clinical tasks, and did it with a peer-to-peer architecture that removed the trusted central aggregator.

The architectural ceiling it hits is the unit of sharing. When the thing you share is a gradient, you inherit gradient size, gradient leakage risk, schema lock, and the coordination overhead required to merge gradients across heterogeneous institutions. Swarm Learning replaced the central aggregator with a blockchain — but it kept gradients.

QIS removes gradients from the protocol entirely. The unit of sharing is an outcome: what a local model concluded, distilled to ~512 bytes, routed to where it is semantically relevant. The local model stays local. The raw data stays local. The gradient never exists outside the node.

That is not a refinement of Swarm Learning. It is a different answer to the same underlying question: how do distributed nodes share what they learned without sharing what they saw?


References

  1. Warnat-Herresthal, S. et al. (2021). "Swarm Learning for decentralized and confidential clinical machine learning." Nature, 594, 265–270. DOI: 10.1038/s41586-021-03583-3

  2. Zhu, L., Liu, Z., & Han, S. (2019). "Deep Leakage from Gradients." NeurIPS 2019. arXiv:1906.08935

  3. Bonawitz, K. et al. (2019). "Towards Federated Learning at Scale: System Design." SysML 2019. arXiv:1902.01046


Christopher Thomas Trevethan discovered QIS on June 16, 2025. 39 provisional patents filed. QIS is open for nonprofit, research, and education use. qisprotocol.com
