A hospital in Amsterdam holds pancreatic cancer genomics data it cannot share. A registry in Barcelona has treatment outcomes from 800 patients with a rare pediatric syndrome. A biobank in Helsinki has longitudinal metabolomics spanning two decades. Every institution signed every GDPR agreement, every data sharing protocol — and the data still cannot move. Not because the institutions won't cooperate, but because the architecture for cooperation hasn't existed.
Two distinct architectural philosophies have emerged to solve this. The Personal Health Train (PHT) is the EU's most mature answer, embedded in the FAIR data principles framework and actively referenced in the European Health Data Space (EHDS) discussions. The Quadratic Intelligence Swarm (QIS), discovered by Christopher Thomas Trevethan on June 16, 2025, is a fundamentally different approach that routes distilled outcome packets rather than executing traveling algorithms. Both claim to keep data in place. They disagree sharply about everything that follows.
This is a technical comparison. Neither architecture gets charity it hasn't earned.
What PHT Actually Does
The Personal Health Train metaphor is precise and worth taking seriously. In PHT, analytical algorithms — packaged as containers — are the "trains." Data-holding institutions are the "stations." The train travels to the station, executes against local data, and returns results. Raw patient data never leaves the station. The algorithm goes to the data rather than the data going to the algorithm.
The framework is described formally in JMIR AI (2025) and is embedded in projects including PADME, PrivateAim, and distributed cancer registry initiatives across the EU. The GO FAIR initiative and NFDI4Health reference the framework. The EHDS discussions cite PHT as a candidate architecture for cross-border health analytics.
PHT's strengths are real:
- Regulatory fit in Europe. PHT was designed with the European health data ecosystem in mind. The governance structures map onto existing IRB and data governance frameworks that institutions already understand.
- Rich research outputs. PHT is built for analytics — statistical models, survival analyses, subgroup comparisons — executed across distributed stations.
- Institutional trust. The station controls what trains are allowed to execute. No train runs without station approval. Institutions don't cede control.
- Existing deployments. PHT is not theoretical. Cancer registry projects and the PADME infrastructure have demonstrated multi-institutional execution in production.
These are not trivial achievements. PHT solved the political problem of getting European health institutions to participate in distributed research. That is genuinely hard.
Where PHT Hits Its Architectural Ceiling
The ceiling is the governance model. Every train-station pair requires approval before execution. If you have M trains (analytical queries) and N stations (data holders), you need up to M×N execution approvals. PHT's governance overhead scales with the product of analytical requests and participating institutions.
For a research consortium with 12 stations and 8 analytical trains, that is up to 96 approval events, each requiring institutional review, IRB sign-off, and container validation. For a real-time health intelligence use case, such as detecting an emerging drug interaction pattern across 200 hospitals, per-pair approval latency measured in days to weeks rules out real-time operation.
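The approval arithmetic above can be sketched in a few lines. This is a minimal illustration, not part of any PHT implementation: the function names are hypothetical, and the 14-day figure is an assumed placeholder inside the "days to weeks" range, not a measured value.

```python
def approval_events(num_trains: int, num_stations: int) -> int:
    """Upper bound on governance approvals: one per train-station pair."""
    if num_trains < 0 or num_stations < 0:
        raise ValueError("counts must be non-negative")
    return num_trains * num_stations


def sequential_approval_days(num_trains: int, num_stations: int,
                             days_per_approval: float) -> float:
    """Worst case if every approval is handled one after another.

    days_per_approval is a hypothetical input; real timelines vary by
    institution and approvals may run in parallel across stations.
    """
    return approval_events(num_trains, num_stations) * days_per_approval


# The consortium from the text: 8 trains, 12 stations
print(approval_events(8, 12))                # 96
print(sequential_approval_days(8, 12, 14))   # 1344 (assumed 14 days each)
```

Even if stations review in parallel, each new train still waits on the slowest station before the study can complete, which is why the bound matters more for latency-sensitive uses than for retrospective research.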
A second problem: PHT trains contain the analytical algorithm as executable code. The station can inspect the container. It can see what you are computing, what biomarkers you are querying, what statistical methods you're applying. This is often fine in research contexts where the scientific method is public. It becomes a liability when the query encodes proprietary clinical decision logic or commercial diagnostic algorithms.
A third, less-discussed problem: PHT is poorly suited to very small sites, referred to here as N=1 sites even when they hold a handful of cases. A station with three cases of a rare pediatric syndrome must still go through full train deployment, container validation, and governance approval, and its three cases contribute statistically marginal results to the study. The architecture was designed for research-grade statistical power. It has no special treatment for the rare signal.
PHT is research infrastructure. It is not intelligence routing infrastructure. The distinction matters.
What QIS Does Differently
QIS does not move algorithms to data. It does not move data to algorithms. It moves what algorithms conclude — distilled into outcome packets of approximately 512 bytes — to agents that can synthesize those conclusions with their own local knowledge.
The complete loop:
- Raw signal arrives at an edge agent — a sensor reading, a lab result, a clinician note. It stays there.
- Local processing extracts meaning. The algorithm runs at the edge, never leaving it.
- Distillation compresses that meaning into an outcome packet. Not a gradient. Not a model weight. Not an algorithm. A conclusion.
- Semantic fingerprinting characterizes what this conclusion is about.
- Routing maps the fingerprint to a deterministic address — the address where agents with complementary conclusions can be found.
- Delivery puts the packet in front of relevant agents.
- Local synthesis at the receiving agent combines the incoming insight with local context.
- New outcome packets emerge from that synthesis and re-enter the loop.
No algorithm travels. No governance approval is required per synthesis event. The station — or in QIS terms, the edge agent — never exposes its data or its analytical logic. What it exposes is a 512-byte distillate of what it concluded.
This is the architecture Christopher Thomas Trevethan discovered, covered by 39 provisional patent applications he has filed. The breakthrough is the complete loop. Remove any step (distillation without routing, routing without synthesis, synthesis without the loop completing) and the quadratic scaling property disappears.
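The roughly 512-byte packet size mentioned above can be made concrete with a size check. This is a sketch under stated assumptions: the source does not specify a wire format, so compact UTF-8 JSON is assumed here, the 512-byte figure is treated as a hard cap, and `encoded_size`, `fits_budget`, and the example field values are hypothetical.

```python
import json

# "Approximately 512 bytes" in the text; treated as a hard cap here (assumption)
PACKET_BUDGET_BYTES = 512


def encoded_size(packet: dict) -> int:
    """Size in bytes when serialized as compact UTF-8 JSON (assumed wire format)."""
    text = json.dumps(packet, separators=(",", ":"), sort_keys=True)
    return len(text.encode("utf-8"))


def fits_budget(packet: dict, budget: int = PACKET_BUDGET_BYTES) -> bool:
    """True if the serialized packet stays within the byte budget."""
    return encoded_size(packet) <= budget


# Hypothetical outcome packet: a conclusion, not data and not an algorithm
example = {
    "sender": "agent-ams-01",
    "domain": "oncology/pancreatic/biomarker",
    "payload": {"finding": "marker elevated", "confidence": 0.8},
    "ttl": 3600,
}

print(encoded_size(example), fits_budget(example))
```

A budget check like this would sit at the distillation step, so that an oversized conclusion is rejected or further compressed before it is fingerprinted and routed.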
Comparison Table
| Dimension | Personal Health Train (PHT) | Quadratic Intelligence Swarm (QIS) |
|---|---|---|
| Unit of exchange | Analytical algorithm (container/train) | Outcome packet (~512 bytes) |
| What travels | The algorithm goes to the data | Distilled conclusion leaves the edge |
| Raw data movement | None — data stays at station | None — data stays at edge agent |
| Algorithm exposure | Station sees algorithm (container) | Destination sees conclusion only |
| Governance per query | Yes — per train-station pair approval | No — semantic routing, no per-query approval |
| Governance overhead | O(M × N) approvals for M trains, N stations | Addressed once at agent enrollment |
| Latency | Research-scale (IRB timelines) | Real-time |
| N=1 sites | Architectural mismatch — full governance for marginal signal | Native — rare signals route by relevance, not prevalence |
| Synthesis paths | No equivalent — PHT is query execution | N(N-1)/2 for N agents |
| Intelligence compounds | No — each train executes once | Yes — each synthesis produces new packets |
| Output type | Research results (statistical, retrospective) | Real-time intelligence synthesis |
| Transport dependency | Container execution infrastructure | Protocol-agnostic (folder, HTTP, DHT, pub/sub) |
| IP of analysis logic | Visible to station | Protected — fingerprint doesn't reveal synthesis logic |
Code: The Architectural Difference in Concrete Terms
The clearest way to see the difference is in what each architecture routes.
PHT Approach: The Algorithm Travels
# PHT conceptual model: the analytical algorithm is packaged as a container.
# The algorithm is the unit of exchange: it travels to the data.
# governance_registry, station_registry, and aggregate_results are placeholders
# for deployment-specific services and are not defined here.
from typing import Optional


class PersonalHealthTrain:
    def __init__(self, algorithm_container: bytes, query_spec: dict):
        # The train carries the algorithm
        self.algorithm = algorithm_container  # Container image: executable, inspectable
        self.query = query_spec               # Query logic: visible to station on arrival

    def request_station_approval(self, station_id: str) -> Optional[str]:
        """Each train-station pair requires governance approval before execution.
        Returns an approval token only after IRB + data governance + container
        validation, or None if the request is refused."""
        return governance_registry.request_approval(
            train=self.algorithm,
            station=station_id,
            query=self.query
        )  # Timeline: days to weeks per station per train

    def execute_at_station(self, station_id: str, approval_token: str):
        """Algorithm runs at station. Station can inspect what is being computed."""
        if not governance_registry.verify_approval(approval_token):
            raise PermissionError("No approval for this train-station pair")
        # Algorithm executes against local data and returns a research result
        return station_registry.execute_container(
            station=station_id,
            container=self.algorithm,  # Analytical logic exposed here
            approval=approval_token
        )


# N stations × M trains = N×M approval events before any execution
def run_study(train: PersonalHealthTrain, station_ids: list[str]):
    results = []
    for station in station_ids:
        # Each station requires its own approval
        approval_token = train.request_station_approval(station)
        if approval_token is not None:
            result = train.execute_at_station(station, approval_token)
            results.append(result)
    return aggregate_results(results)  # Aggregation happens after all executions complete
QIS Approach: The Conclusion Travels
import hashlib
from datetime import datetime, timezone

# QIS: the distilled outcome packet is the unit of exchange.
# The algorithm never leaves the edge agent; only its conclusion does.


def semantic_fingerprint(domain: str) -> str:
    """Deterministic address from semantic domain. Conclusion routes to relevant agents.
    This hash is exact-match: only identical domain labels share an address."""
    return hashlib.sha256(domain.encode()).hexdigest()[:12]


def distill_to_outcome_packet(
    agent_id: str,
    domain: str,
    conclusion: dict  # What the local algorithm concluded, not the algorithm itself
) -> dict:
    """
    Packet of roughly 512 bytes carrying distilled insight (size is not enforced here).
    No algorithm. No raw data. No query logic.
    The synthesis logic stays at the edge.
    """
    return {
        "sender": agent_id,
        "domain": domain,
        "fingerprint": semantic_fingerprint(domain),  # Routes to relevant agents
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": conclusion,  # Conclusion only, not how it was reached
        "ttl": 3600
    }


def route_outcome_packet(packet: dict, transport) -> str:
    """
    Route to deterministic address. No per-packet governance approval.
    Transport is interchangeable: folder, HTTP relay, DHT, pub/sub.
    """
    address = packet["fingerprint"]  # Address derived from semantic content
    transport.deliver(address, packet)
    return address


def local_synthesize(local_context: dict, incoming: dict) -> dict:
    """Placeholder for agent-specific synthesis logic, which stays local.
    This stub only tags the incoming conclusion with the local context keys."""
    return {**incoming, "synthesized_with": sorted(local_context)}


def synthesize_at_receiving_agent(
    agent_id: str,
    incoming_packets: list[dict],
    local_context: dict
) -> list[dict]:
    """
    Local synthesis: combine incoming conclusions with local knowledge.
    Produces new outcome packets, so the loop continues.
    No central aggregator. No approval required to receive.
    """
    new_conclusions = []
    for packet in incoming_packets:
        # Synthesis happens locally; the receiving agent's logic is also never exposed
        enriched = local_synthesize(local_context, packet["payload"])
        new_packet = distill_to_outcome_packet(
            agent_id=agent_id,
            domain=packet["domain"],
            conclusion=enriched
        )
        new_conclusions.append(new_packet)
    return new_conclusions  # Each synthesis produces new packets, which re-enter the loop


# The complete QIS loop: no governance per synthesis, no algorithm exposure
def run_qis_loop(agent_id: str, transport, local_context: dict):
    # Pull packets for this agent; how fingerprint addresses map to agents
    # is left to the transport
    incoming = transport.pull(agent_id)
    synthesis = synthesize_at_receiving_agent(  # Synthesize locally
        agent_id, incoming, local_context
    )
    for packet in synthesis:
        route_outcome_packet(packet, transport)  # Route conclusions forward
    # N(N-1)/2 synthesis opportunities compound with each loop iteration
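The `transport` object in the code above is left abstract. As a minimal sketch of what a stand-in could look like, here is an in-memory version for local experiments. `InMemoryTransport` is hypothetical, and its `pull` takes an explicit address in addition to the agent id, which is an assumption: the code above calls `transport.pull(agent_id)` and does not say how agents map to fingerprint addresses.

```python
import hashlib
from collections import defaultdict


def fingerprint(domain: str) -> str:
    """Same scheme as the example above: truncated SHA-256 of the domain label."""
    return hashlib.sha256(domain.encode()).hexdigest()[:12]


class InMemoryTransport:
    """Hypothetical stand-in for a folder, HTTP relay, DHT, or pub/sub backend."""

    def __init__(self) -> None:
        self._mailboxes = defaultdict(list)  # address -> list of packets
        self._cursors = defaultdict(int)     # (agent_id, address) -> read offset

    def deliver(self, address: str, packet: dict) -> None:
        """Append a packet to the mailbox at this address."""
        self._mailboxes[address].append(packet)

    def pull(self, agent_id: str, address: str) -> list:
        """Return packets at `address` this agent has not read yet, minus its own."""
        start = self._cursors[(agent_id, address)]
        batch = self._mailboxes[address][start:]
        self._cursors[(agent_id, address)] = len(self._mailboxes[address])
        return [p for p in batch if p["sender"] != agent_id]


# Two small sites publish conclusions under the same domain label
transport = InMemoryTransport()
addr = fingerprint("pediatric-autoimmune/rare")
transport.deliver(addr, {"sender": "barcelona", "payload": {"cases": 3}})
transport.deliver(addr, {"sender": "helsinki", "payload": {"cases": 2}})

received = transport.pull("helsinki", addr)
print([p["sender"] for p in received])  # ['barcelona']
```

The per-agent read cursor is one possible design choice; it keeps an agent from re-synthesizing the same packet on every loop iteration, something the loop as written does not address.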
The difference is the primitive. PHT routes algorithms. QIS routes conclusions. One requires governance approval at every execution. The other routes at the level of semantics, with no per-synthesis overhead.
The N=1 Problem: Where PHT Fails and QIS Is Indifferent
A station with three cases of a rare pediatric autoimmune syndrome.
In PHT: full container deployment, governance approval, IRB review, container validation — for three cases. The statistical return is marginal. The governance cost is identical to a station with 10,000 cases. The algorithm returns results; those results are near-meaningless at N=3 and get averaged into the study's aggregate. The rare signal is present but statistically suppressed.
In QIS: the edge agent with three cases runs local processing on those three cases and distills a conclusion. That conclusion is fingerprinted by its semantic domain — "pediatric autoimmune, rare presentation, biomarker pattern X" — and routed to agents whose domain overlaps. A second institution with two cases of the same condition receives the packet. Synthesis happens between two N=1 sites that research-grade infrastructure treats as noise.
Rare signals route by relevance, not by statistical power. The routing address is derived from what the conclusion is about, not how many agents share the same content. This is not a feature bolted onto QIS. It is a consequence of the core architecture.
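To illustrate an address that depends on what a conclusion is about and not on case counts, here is a small sketch. The tag-based scheme, `tag_fingerprint`, and the site records are all hypothetical; the article does not specify how a semantic fingerprint is computed.

```python
import hashlib


def tag_fingerprint(tags: list) -> str:
    """Address from a normalized, order-insensitive set of domain tags.

    Illustrative scheme only: lowercases and trims tags, drops duplicates,
    sorts them, then hashes the joined string.
    """
    normalized = sorted({t.strip().lower() for t in tags if t.strip()})
    return hashlib.sha256("|".join(normalized).encode("utf-8")).hexdigest()[:12]


# Hypothetical sites; "cases" is never an input to the address
site_a = {"cases": 3,
          "tags": ["Pediatric autoimmune", "rare presentation", "biomarker pattern X"]}
site_b = {"cases": 2,
          "tags": ["biomarker pattern x", "pediatric autoimmune", "Rare Presentation "]}
site_c = {"cases": 10_000,
          "tags": ["adult autoimmune", "common presentation"]}

same_address = tag_fingerprint(site_a["tags"]) == tag_fingerprint(site_b["tags"])
print(same_address)  # True
```

One caveat: an exact hash only matches identical normalized tag sets. Routing to agents whose domains merely overlap, as described above, would need a similarity-preserving addressing scheme, which this sketch does not attempt.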
The Synthesis Paths Math
QIS is named for a structural property of the architecture. With N agents in the network, the number of unique synthesis opportunities is:
N(N-1)/2
This is not a performance metric. It is arithmetic — the count of distinct pairs among N agents, each pair representing a combination of local knowledge that can be synthesized without central aggregation.
| Agents (N) | Unique synthesis pairs |
|---|---|
| 10 | 45 |
| 100 | 4,950 |
| 1,000 | 499,500 |
| 10,000 | 49,995,000 |
| 1,000,000 | 499,999,500,000 |
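The table values follow directly from the pair-count formula and can be checked in a few lines. `synthesis_pairs` is a hypothetical helper name; the cross-check uses `math.comb` from the standard library.

```python
from math import comb


def synthesis_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n agents: n(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


for n in (10, 100, 1_000, 10_000, 1_000_000):
    # Cross-check against the binomial coefficient C(n, 2)
    assert synthesis_pairs(n) == comb(n, 2)
    print(f"{n:>9,} agents -> {synthesis_pairs(n):>15,} pairs")
```

Note that this counts potential pairings. How many of them carry a useful synthesis depends on how many agents actually share a fingerprint address.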
Communication cost per outcome packet stays at most O(log N) with DHT-based routing, and can reach O(1) with hash-based database indices, pub/sub topic lookup, or direct HTTP routing to a known address. The synthesis space grows quadratically while per-packet routing cost grows at most logarithmically.
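To put the two growth rates side by side, the sketch below compares the pair count with a logarithmic hop bound. The bound of ceil(log2 N) hops with a constant factor of 1 is an assumption chosen for illustration; real DHTs differ in base and constants, and `dht_hop_bound` is a hypothetical name.

```python
def dht_hop_bound(n: int) -> int:
    """Assumed lookup bound for a DHT with n nodes: ceil(log2 n) hops."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 1, without float error
    return (n - 1).bit_length()


def synthesis_pairs(n: int) -> int:
    """Distinct unordered pairs among n agents."""
    return n * (n - 1) // 2


for n in (10, 100, 1_000, 10_000, 1_000_000):
    print(f"N={n:>9,}  hops<={dht_hop_bound(n):>2}  pairs={synthesis_pairs(n):,}")
```

Under these assumptions, going from 10 agents to 1,000,000 raises the hop bound from 4 to 20 while the pair count rises from 45 to 499,999,500,000.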
PHT has no equivalent. PHT is query execution infrastructure: M trains across N stations produce M×N executions. The output is the set of results those executions return. Intelligence does not compound. Each train runs, collects, and terminates. There is no loop. There is no synthesis that produces new packets that feed the next synthesis round.
Distributed health intelligence at scale requires the loop. PHT does not have it. QIS is built on it.
Honest Limits
PHT is a better answer if your question is: how do we run a retrospective multicenter research study across European hospitals with existing IRB structures, GDPR compliance, and institutional governance frameworks already in place? For that problem, PHT has real deployments, real regulatory alignment, and real results.
QIS is a better answer if your question is: how do we route intelligence between edge agents in real time, without per-query governance overhead, without exposing analytical logic, and with a synthesis capacity that grows quadratically with network size?
These are genuinely different problems. The error — and it is common — is to assume that solving the research-scale problem also solves the real-time intelligence problem. It does not. Research infrastructure and intelligence routing infrastructure make different tradeoffs at the level of their core primitives.
Conclusion
The Personal Health Train is a serious, well-deployed architecture that advances distributed health research within European governance frameworks. It earned its position in the EHDS conversation. The JMIR AI 2025 paper and the PADME, PrivateAim, and cancer registry deployments are not marginal achievements.
The Quadratic Intelligence Swarm is architecturally distinct from PHT at the level of the unit of exchange. PHT routes algorithms. QIS routes conclusions. PHT requires governance approval per execution. QIS routes by semantic fingerprint with no per-synthesis overhead. PHT produces research results. QIS produces compounding real-time synthesis across N(N-1)/2 unique paths.
For a hospital in Amsterdam, a registry in Barcelona, and a biobank in Helsinki — the question is not which architecture is better in the abstract. It is which architecture matches the problem. Retrospective multicenter research: PHT has earned that ground. Real-time intelligence synthesis at scale, across heterogeneous edge agents, without per-query governance overhead, with native handling of rare signals: that is the space QIS was discovered to address.
The architecture is the breakthrough. The complete loop — distillation, fingerprinting, routing, delivery, synthesis, new packets — is what produces the N(N-1)/2 synthesis paths. Remove any step and the quadratic property disappears.
QIS (Quadratic Intelligence Swarm) was discovered by Christopher Thomas Trevethan on June 16, 2025. He has filed 39 provisional patent applications covering the architecture.
References:
- Choudhury, A., et al. (2025). The Personal Health Train: Federated Machine Learning in Healthcare. JMIR AI. https://doi.org/10.2196/60679
- Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
- Beyan, O., et al. (2020). Distributed Analytics on Sensitive Medical Data: The Personal Health Train. Data Intelligence, 2(1–2), 96–107. https://doi.org/10.1162/dint_a_00032
- European Commission. (2022). European Health Data Space Regulation Proposal. https://health.ec.europa.eu/ehealth-digital-health-and-care/european-health-data-space_en
- NFDI4Health Task Force. (2023). Metadata Schema and PHT Integration. https://www.nfdi4health.de
- Warnat-Herresthal, S., et al. (2021). Swarm Learning for decentralized and confidential clinical machine learning. Nature, 594, 265–270. https://doi.org/10.1038/s41586-021-03583-3