Rory | QIS PROTOCOL

Posted on Apr 10

QIS vs Personal Health Train: Two Approaches to Distributed Health Intelligence

A hospital in Amsterdam holds pancreatic cancer genomics data it cannot share. A registry in Barcelona has treatment outcomes from 800 patients with a rare pediatric syndrome. A biobank in Helsinki has longitudinal metabolomics spanning two decades. Every institution signed every GDPR agreement, every data sharing protocol — and the data still cannot move. Not because the institutions won't cooperate, but because the architecture for cooperation hasn't existed.

Two distinct architectural philosophies have emerged to solve this. The Personal Health Train (PHT) is the EU's most mature answer, embedded in the FAIR data principles framework and actively referenced in the European Health Data Space (EHDS) discussions. The Quadratic Intelligence Swarm (QIS), discovered by Christopher Thomas Trevethan on June 16, 2025, is a fundamentally different approach that routes distilled outcome packets rather than executing traveling algorithms. Both claim to keep data in place. They disagree sharply about everything that follows.

This is a technical comparison. Neither architecture gets charity it hasn't earned.

What PHT Actually Does

The Personal Health Train metaphor is precise and worth taking seriously. In PHT, analytical algorithms — packaged as containers — are the "trains." Data-holding institutions are the "stations." The train travels to the station, executes against local data, and returns results. Raw patient data never leaves the station. The algorithm goes to the data rather than the data going to the algorithm.

The PHT approach was formally described in JMIR AI (2025) and is used in projects including PADME, PrivateAim, and distributed cancer registry initiatives across the EU. The GO FAIR initiative and NFDI4Health reference the framework. The EHDS discussions cite PHT as a candidate architecture for cross-border health analytics.

PHT's strengths are real:

Regulatory fit in Europe. PHT was designed with the European health data ecosystem in mind. The governance structures map onto existing IRB and data governance frameworks that institutions already understand.
Rich research outputs. PHT is built for analytics — statistical models, survival analyses, subgroup comparisons — executed across distributed stations.
Institutional trust. The station controls what trains are allowed to execute. No train runs without station approval. Institutions don't cede control.
Existing deployments. PHT is not theoretical. Cancer registry projects and the PADME infrastructure have demonstrated multi-institutional execution in production.

These are not trivial achievements. PHT solved the political problem of getting European health institutions to participate in distributed research. That is genuinely hard.

Where PHT Hits Its Architectural Ceiling

The ceiling is the governance model. Every train-station pair requires approval before execution. If you have M trains (analytical queries) and N stations (data holders), you need up to M×N execution approvals. PHT's governance overhead scales with the product of analytical requests and participating institutions.

For a research consortium with 12 stations and 8 analytical trains, that is up to 96 approval events, each requiring institutional review, IRB sign-off, and container validation. For a real-time health intelligence use case, such as detecting an emerging drug interaction pattern across 200 hospitals, approval latency measured in days to weeks per train-station pair rules out real-time operation.
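The approval arithmetic above can be sketched in a few lines. This is a rough illustration, not a model of any real PHT deployment: the function names, the 14-day review time, and the assumption that each station reviews its trains one at a time are all hypothetical.

```python
import math


def approval_events(num_trains: int, num_stations: int) -> int:
    """Upper bound on approvals when every train needs sign-off at every station."""
    if num_trains < 0 or num_stations < 0:
        raise ValueError("counts must be non-negative")
    return num_trains * num_stations


def wall_clock_days(events: int, days_per_approval: int, parallel_reviews: int) -> int:
    """Rough elapsed time if `parallel_reviews` approvals can proceed at once.

    Both `days_per_approval` and `parallel_reviews` are illustrative assumptions.
    """
    if parallel_reviews < 1:
        raise ValueError("parallel_reviews must be at least 1")
    return math.ceil(events / parallel_reviews) * days_per_approval


events = approval_events(num_trains=8, num_stations=12)
print(events)  # 96

# Assume 12 stations each review one train at a time, 14 days per review.
print(wall_clock_days(events, days_per_approval=14, parallel_reviews=12))  # 112
```

Even with every station reviewing in parallel, the elapsed time under these assumptions is measured in months, which is the point of the paragraph above.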

A second problem: PHT trains contain the analytical algorithm as executable code. The station can inspect the container. It can see what you are computing, which biomarkers you are querying, and which statistical methods you are applying. This is often fine in research contexts where the scientific method is public. It becomes a liability when the query encodes proprietary clinical decision logic or commercial diagnostic algorithms.

A third, less-discussed problem: PHT is poorly suited to very small sites, referred to here as N=1 sites. A station with three cases of a rare pediatric syndrome must still go through full train deployment, container validation, and governance approval, all for three cases that will produce statistically marginal results when returned to the train. The architecture was designed for research-grade statistical power. It has no special treatment for the rare signal.

PHT is research infrastructure. It is not intelligence routing infrastructure. The distinction matters.

What QIS Does Differently

QIS does not move algorithms to data. It does not move data to algorithms. It moves what algorithms conclude — distilled into outcome packets of approximately 512 bytes — to agents that can synthesize those conclusions with their own local knowledge.

The complete loop:

Raw signal arrives at an edge agent — a sensor reading, a lab result, a clinician note. It stays there.
Local processing extracts meaning. The algorithm runs at the edge, never leaving it.
Distillation compresses that meaning into an outcome packet. Not a gradient. Not a model weight. Not an algorithm. A conclusion.
Semantic fingerprinting characterizes what this conclusion is about.
Routing maps the fingerprint to a deterministic address — the address where agents with complementary conclusions can be found.
Delivery puts the packet in front of relevant agents.
Local synthesis at the receiving agent combines the incoming insight with local context.
New outcome packets emerge from that synthesis and re-enter the loop.

No algorithm travels. No governance approval is required per synthesis event. The station — or in QIS terms, the edge agent — never exposes its data or its analytical logic. What it exposes is a 512-byte distillate of what it concluded.
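A size budget like the one described above is easy to check at the edge before a packet is routed. The sketch below assumes compact UTF-8 JSON as the wire format, which the article does not specify; the names `MAX_PACKET_BYTES`, `packet_size_bytes`, and `fits_budget`, and the example packet contents, are hypothetical.

```python
import json

MAX_PACKET_BYTES = 512  # budget taken from the "approximately 512 bytes" figure


def packet_size_bytes(packet: dict) -> int:
    """Size of the packet serialized as compact UTF-8 JSON (an assumed wire format)."""
    encoded = json.dumps(packet, separators=(",", ":"), sort_keys=True)
    return len(encoded.encode("utf-8"))


def fits_budget(packet: dict, limit: int = MAX_PACKET_BYTES) -> bool:
    """True if the serialized packet stays within the byte budget."""
    return packet_size_bytes(packet) <= limit


# Illustrative packet: a conclusion only, with no raw data and no algorithm.
example = {
    "sender": "agent-ams-01",
    "domain": "oncology/pancreatic/treatment-response",
    "payload": {"finding": "illustrative conclusion text", "confidence": 0.72},
    "ttl": 3600,
}

print(packet_size_bytes(example), fits_budget(example))
```

An agent could refuse to route any packet for which `fits_budget` returns False, which forces distillation to happen locally rather than letting raw detail leak into the payload.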

This is the architecture Christopher Thomas Trevethan discovered and has covered with 39 provisional patent filings. The breakthrough is the complete loop. Remove any step (distillation without routing, routing without synthesis, synthesis without the loop completing) and the quadratic scaling property disappears.

Comparison Table

Dimension	Personal Health Train (PHT)	Quadratic Intelligence Swarm (QIS)
Unit of exchange	Analytical algorithm (container/train)	Outcome packet (~512 bytes)
What travels	The algorithm goes to the data	Distilled conclusion leaves the edge
Raw data movement	None — data stays at station	None — data stays at edge agent
Algorithm exposure	Station sees algorithm (container)	Destination sees conclusion only
Governance per query	Yes — per train-station pair approval	No — semantic routing, no per-query approval
Governance overhead	O(M × N) approvals for M trains, N stations	Addressed once at agent enrollment
Latency	Research-scale (IRB timelines)	Real-time
N=1 sites	Architectural mismatch — full governance for marginal signal	Native — rare signals route by relevance, not prevalence
Synthesis paths	No equivalent — PHT is query execution	N(N-1)/2 for N agents
Intelligence compounds	No — each train executes once	Yes — each synthesis produces new packets
Output type	Research results (statistical, retrospective)	Real-time intelligence synthesis
Transport dependency	Container execution infrastructure	Protocol-agnostic (folder, HTTP, DHT, pub/sub)
IP of analysis logic	Visible to station	Protected — fingerprint doesn't reveal synthesis logic

Code: The Architectural Difference in Concrete Terms

The clearest way to see the difference is in what each architecture routes.

PHT Approach: The Algorithm Travels

# PHT conceptual model: analytical algorithm packaged as container
# The algorithm is the unit of exchange — it travels to data

from typing import Optional

# governance_registry, station_registry, and aggregate_results are placeholders
# for consortium-specific services; they are not defined in this sketch.

class PersonalHealthTrain:
    def __init__(self, algorithm_container: bytes, query_spec: dict):
        # The train carries the algorithm
        self.algorithm = algorithm_container  # Container image: executable, inspectable
        self.query = query_spec              # Query logic: visible to station on arrival

    def request_station_approval(self, station_id: str) -> Optional[str]:
        """Each train-station pair requires governance approval before execution.
        Returns an approval token only after IRB + data governance + container
        validation succeed, or None if the station declines."""
        return governance_registry.request_approval(
            train=self.algorithm,
            station=station_id,
            query=self.query
        )  # Timeline: days to weeks per station per train

    def execute_at_station(self, station_id: str, approval_token: str):
        """Algorithm runs at station. Station can inspect what is being computed."""
        if not governance_registry.verify_approval(approval_token):
            raise PermissionError("No approval for this train-station pair")

        # Algorithm executes against local data and returns a research result
        return station_registry.execute_container(
            station=station_id,
            container=self.algorithm,  # Analytical logic exposed here
            approval=approval_token
        )

# N stations × M trains = N×M approval events before any execution
def run_study(train: PersonalHealthTrain, station_ids: list[str]):
    results = []
    for station in station_ids:
        # Each station requires a separate approval; the token is passed to execution
        approval_token = train.request_station_approval(station)
        if approval_token is not None:
            result = train.execute_at_station(station, approval_token)
            results.append(result)
    return aggregate_results(results)  # Aggregation happens after all executions complete

QIS Approach: The Conclusion Travels

import hashlib
import json
from datetime import datetime, timezone

# QIS: distilled outcome packet is the unit of exchange
# The algorithm never leaves the edge agent — only its conclusion does

def semantic_fingerprint(domain: str) -> str:
    """Deterministic address from semantic domain. Conclusion routes to relevant agents."""
    return hashlib.sha256(domain.encode()).hexdigest()[:12]

def distill_to_outcome_packet(
    agent_id: str,
    domain: str,
    conclusion: dict  # What the local algorithm concluded — not the algorithm itself
) -> dict:
    """
    Packet carrying distilled insight (target size: about 512 bytes; not enforced here).
    No algorithm. No raw data. No query logic.
    The synthesis logic stays at the edge.
    """
    return {
        "sender": agent_id,
        "domain": domain,
        "fingerprint": semantic_fingerprint(domain),  # Routes to relevant agents
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": conclusion,  # Conclusion only — not how it was reached
        "ttl": 3600
    }

def route_outcome_packet(packet: dict, transport) -> str:
    """
    Route to deterministic address. No per-packet governance approval.
    Transport is interchangeable: folder, HTTP relay, DHT, pub/sub.
    """
    address = packet["fingerprint"]  # Address derived from semantic content
    transport.deliver(address, packet)
    return address

def synthesize_at_receiving_agent(
    agent_id: str,
    incoming_packets: list[dict],
    local_context: dict
) -> list[dict]:
    """
    Local synthesis: combine incoming conclusions with local knowledge.
    Produces new outcome packets — the loop continues.
    No central aggregator. No approval required to receive.
    """
    new_conclusions = []
    for packet in incoming_packets:
        # Synthesis happens locally; the receiving agent's logic is also never exposed.
        # local_synthesize is a placeholder for that private logic (not defined here).
        enriched = local_synthesize(local_context, packet["payload"])
        new_packet = distill_to_outcome_packet(
            agent_id=agent_id,
            domain=packet["domain"],
            conclusion=enriched
        )
        new_conclusions.append(new_packet)

    return new_conclusions  # Each synthesis produces new packets → loop continues

# The complete QIS loop: no governance per synthesis, no algorithm exposure
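def run_qis_loop(agent_id: str, transport, local_context: dict):
    # Packets are delivered to fingerprint addresses, so transport.pull(agent_id)
    # is assumed to return packets from the addresses this agent subscribes to.
    incoming = transport.pull(agent_id)
    # Skip this agent's own packets so it does not re-synthesize its own output
    incoming = [p for p in incoming if p["sender"] != agent_id]
    synthesis = synthesize_at_receiving_agent(      # Synthesize locally
        agent_id, incoming, local_context
    )
    for packet in synthesis:
        route_outcome_packet(packet, transport)     # Route conclusions forward
    # Across N agents, up to N(N-1)/2 distinct pairs can exchange conclusions this way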

The difference is the primitive. PHT routes algorithms. QIS routes conclusions. One requires governance approval at every execution. The other routes at the level of semantics, with no per-synthesis overhead.
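The QIS functions above call `transport.deliver` and `transport.pull` without defining a transport. One way to picture the simplest possible transport is an in-memory mailbox keyed by address. The class below is a hypothetical sketch for experimentation, not part of any published QIS implementation; the `subscribe` method and the rule that senders do not receive their own packets are assumptions.

```python
from collections import defaultdict


class InMemoryTransport:
    """Hypothetical transport: agents subscribe to addresses and pull pending packets."""

    def __init__(self) -> None:
        self._subscriptions = defaultdict(set)  # agent_id -> set of addresses
        self._inboxes = defaultdict(list)       # agent_id -> pending packets

    def subscribe(self, agent_id: str, address: str) -> None:
        """Register interest in every packet routed to `address`."""
        self._subscriptions[agent_id].add(address)

    def deliver(self, address: str, packet: dict) -> None:
        """Copy the packet to every subscriber of `address` except its sender."""
        for agent_id, addresses in self._subscriptions.items():
            if address in addresses and agent_id != packet.get("sender"):
                self._inboxes[agent_id].append(packet)

    def pull(self, agent_id: str) -> list:
        """Return and clear the agent's pending packets."""
        pending = self._inboxes[agent_id]
        self._inboxes[agent_id] = []
        return pending


transport = InMemoryTransport()
transport.subscribe("agent-a", "addr-1")
transport.subscribe("agent-b", "addr-1")
transport.deliver("addr-1", {"sender": "agent-a", "payload": {"note": "example"}})

print(len(transport.pull("agent-a")))  # 0: senders do not receive their own packets
print(len(transport.pull("agent-b")))  # 1
print(len(transport.pull("agent-b")))  # 0: the inbox is cleared after a pull
```

A folder, HTTP relay, DHT, or pub/sub broker would replace the two dictionaries, but the `deliver`/`pull` contract seen by the agent could stay the same.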

The N=1 Problem: Where PHT Fails and QIS Is Indifferent

A station with three cases of a rare pediatric autoimmune syndrome.

In PHT: full container deployment, governance approval, IRB review, container validation — for three cases. The statistical return is marginal. The governance cost is identical to a station with 10,000 cases. The algorithm returns results; those results are near-meaningless at N=3 and get averaged into the study's aggregate. The rare signal is present but statistically suppressed.

In QIS: the edge agent with three cases runs local processing on those three cases and distills a conclusion. That conclusion is fingerprinted by its semantic domain ("pediatric autoimmune, rare presentation, biomarker pattern X") and routed to agents whose domain overlaps. A second institution with two cases of the same condition receives the packet. Synthesis happens between two very small sites that research-grade infrastructure treats as noise.

Rare signals route by relevance, not by statistical power. The routing address is derived from what the conclusion is about, not how many agents share the same content. This is not a feature bolted onto QIS. It is a consequence of the core architecture.
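A small sketch makes the "relevance, not prevalence" point concrete: if the address is computed only from what a conclusion is about, the number of cases behind it never enters the calculation. The function `domain_address` and the canonicalization step (lowercasing and sorting terms) are assumptions added for illustration; the site records are invented. Note that hashing only matches descriptors that canonicalize identically, so any fuzzier notion of semantic overlap would need a different fingerprinting scheme.

```python
import hashlib


def domain_address(descriptor_terms: list[str]) -> str:
    """Order- and case-insensitive address derived from a set of domain terms."""
    canonical = "|".join(sorted(term.strip().lower() for term in descriptor_terms))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Invented sites: case counts are shown only to emphasize they are not used.
site_a = {"cases": 3, "terms": ["pediatric autoimmune", "rare presentation", "biomarker pattern X"]}
site_b = {"cases": 2, "terms": ["Rare presentation", "biomarker pattern x", "Pediatric autoimmune"]}
large_site = {"cases": 10_000, "terms": ["adult type 2 diabetes", "common presentation"]}

# Two tiny sites describing the same domain land on the same address.
print(domain_address(site_a["terms"]) == domain_address(site_b["terms"]))      # True

# A large site working on a different domain lands elsewhere.
print(domain_address(site_a["terms"]) == domain_address(large_site["terms"]))  # False
```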

The Synthesis Paths Math

QIS is named for a structural property of the architecture. With N agents in the network, the number of unique synthesis opportunities is:

N(N-1)/2

This is not a performance metric. It is arithmetic — the count of distinct pairs among N agents, each pair representing a combination of local knowledge that can be synthesized without central aggregation.

Agents (N)	Unique synthesis pairs
10	45
100	4,950
1,000	499,500
10,000	49,995,000
1,000,000	499,999,500,000
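The table values can be reproduced directly. The snippet below is a plain arithmetic check that uses Python's `math.comb` to confirm the closed form; `synthesis_pairs` is a name chosen here for illustration.

```python
from math import comb


def synthesis_pairs(n_agents: int) -> int:
    """Number of distinct unordered pairs among n_agents: N(N-1)/2."""
    if n_agents < 0:
        raise ValueError("n_agents must be non-negative")
    return n_agents * (n_agents - 1) // 2


for n in (10, 100, 1_000, 10_000, 1_000_000):
    # The closed form agrees with the binomial coefficient C(N, 2).
    assert synthesis_pairs(n) == comb(n, 2)
    print(f"{n:>9,} agents -> {synthesis_pairs(n):,} pairs")
```

The count is an upper bound on distinct agent pairings; how many of those pairs actually exchange packets depends on which agents share a fingerprint address.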

Routing cost per outcome packet is O(log N) hops with DHT-based routing and can be O(1) with a database index, a pub/sub topic, or a direct HTTP endpoint. The number of possible synthesis pairs grows quadratically while per-packet routing cost grows at most logarithmically.
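To see how far apart those two growth rates are, the sketch below prints a rough per-packet hop bound next to the pair count. Using ceil(log2 N) as the hop bound is an assumption borrowed from typical structured DHT designs; real hop counts depend on the specific DHT and its parameters.

```python
def dht_hops_upper_bound(n_agents: int) -> int:
    """ceil(log2 N), used here as an assumed rough bound on DHT lookup hops."""
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) using integer math only.
    return (n_agents - 1).bit_length()


def synthesis_pairs(n_agents: int) -> int:
    """Number of distinct unordered pairs among n_agents."""
    return n_agents * (n_agents - 1) // 2


for n in (10, 100, 1_000, 10_000, 1_000_000):
    print(f"N={n:>9,}  hops<={dht_hops_upper_bound(n):>2}  pairs={synthesis_pairs(n):,}")
```

Going from 10 agents to 1,000,000 raises the assumed hop bound from 4 to 20, while the pair count grows from 45 to roughly half a trillion.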

PHT has no equivalent. PHT is query execution infrastructure: M trains across N stations produce M×N executions. The output is the set of results those executions return. Intelligence does not compound. Each train runs, collects, and terminates. There is no loop. There is no synthesis that produces new packets that feed the next synthesis round.

Distributed health intelligence at scale requires the loop. PHT does not have it. QIS is built on it.

Honest Limits

PHT is a better answer if your question is: how do we run a retrospective multicenter research study across European hospitals with existing IRB structures, GDPR compliance, and institutional governance frameworks already in place? For that problem, PHT has real deployments, real regulatory alignment, and real results.

QIS is a better answer if your question is: how do we route intelligence between edge agents in real time, without per-query governance overhead, without exposing analytical logic, and with a synthesis capacity that grows quadratically with network size?

These are genuinely different problems. The error — and it is common — is to assume that solving the research-scale problem also solves the real-time intelligence problem. It does not. Research infrastructure and intelligence routing infrastructure make different tradeoffs at the level of their core primitives.

Conclusion

The Personal Health Train is a serious, well-deployed architecture that advances distributed health research within European governance frameworks. It earned its position in the EHDS conversation. The JMIR AI 2025 paper and the PADME, PrivateAim, and cancer registry deployments are not marginal achievements.

The Quadratic Intelligence Swarm is architecturally distinct from PHT at the level of the unit of exchange. PHT routes algorithms. QIS routes conclusions. PHT requires governance approval per execution. QIS routes by semantic fingerprint with no per-synthesis approval. PHT produces research results. QIS produces compounding real-time synthesis across up to N(N-1)/2 unique paths.

For a hospital in Amsterdam, a registry in Barcelona, and a biobank in Helsinki — the question is not which architecture is better in the abstract. It is which architecture matches the problem. Retrospective multicenter research: PHT has earned that ground. Real-time intelligence synthesis at scale, across heterogeneous edge agents, without per-query governance overhead, with native handling of rare signals: that is the space QIS was discovered to address.

The architecture is the breakthrough. The complete loop — distillation, fingerprinting, routing, delivery, synthesis, new packets — is what produces the N(N-1)/2 synthesis paths. Remove any step and the quadratic property disappears.

QIS (Quadratic Intelligence Swarm) was discovered by Christopher Thomas Trevethan on June 16, 2025. He has filed 39 provisional patents covering the architecture.

References:

Choudhury, A., et al. (2025). The Personal Health Train: Federated Machine Learning in Healthcare. JMIR AI. https://doi.org/10.2196/60679
Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
Beyan, O., et al. (2020). Distributed Analytics on Sensitive Medical Data: The Personal Health Train. Data Intelligence, 2(1–2), 96–107. https://doi.org/10.1162/dint_a_00032
European Commission. (2022). European Health Data Space Regulation Proposal. https://health.ec.europa.eu/ehealth-digital-health-and-care/european-health-data-space_en
NFDI4Health Task Force. (2023). Metadata Schema and PHT Integration. https://www.nfdi4health.de
Warnat-Herresthal, S., et al. (2021). Swarm Learning for decentralized and confidential clinical machine learning. Nature, 594, 265–270. https://doi.org/10.1038/s41586-021-03583-3
