The UK Health Data Research Service Needs a Routing Layer: Why QIS Protocol Is the Missing Architecture for HDRS Federation


The UK has committed £600 million to build the Health Data Research Service — a federated Trusted Research Environment (TRE) infrastructure designed to give researchers access to health data without ever moving that data out of its secure enclave. The architecture is thoughtful. The compliance engineering is serious. The data protection story is real.

But there is a gap in the loop that no amount of TRE hardening can close, because it is not a security problem. It is a routing problem. Patient data stays inside its TRE. What researchers learn from that data — the treatment outcomes, the genomic signals, the validated clinical deltas — stays inside too. The intelligence does not travel. Every TRE learns in isolation from its own cohort. The federated data infrastructure the UK just funded can share queries across nodes. It cannot yet share what those queries produce.

That is the open loop. And it has a precise architectural answer.

What the HDRS Is Actually Building

Announced in April 2025, the Health Data Research Service is the UK's most ambitious health data infrastructure programme to date. The design follows a federated TRE model: data is never centralised. Instead, researchers submit queries to secure environments where the data already lives, and results — not records — are returned.

Cambridge is the nucleus of the programme. The NIHR Biomedical Research Centre, Wellcome Sanger Institute, HDR UK Cambridge, and CYNAPSE are clustered together to form the institutional core. Surrounding this are NHS Trusts, biobanks, genomic datasets, and population health registries distributed across the UK — each sitting inside its own TRE, each governed by its own data access agreements.

The bridge architecture HDRS is constructing connects these environments for query federation: a researcher at one institution can, in principle, pose a question across multiple TREs without any raw patient data leaving any enclave. This is sound engineering. It handles the compliance layer correctly. It is also, structurally, only half of a loop.

The Architecture Gap: Federated Queries Are Not Federated Intelligence

Federated learning researchers encountered a version of this problem years ago. The canonical solution — aggregate model gradients across nodes without sharing training data — works until you ask what happens when one node has a rare variant cohort with N=12. The gradient from 12 patients contributes noise, not signal, to a global aggregate. The node learns nothing from the aggregate that is relevant to its N=12 cohort. The federation shares computation. It does not route intelligence to where that intelligence is useful.
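To make the small-cohort point concrete, here is a minimal sketch of sample-size-weighted averaging in the style of federated averaging. It is an illustration under stated assumptions, not the aggregation rule of any specific deployment: the node sizes (two nodes of 5,000 patients) and the update values are invented, and each update is reduced to a single number for readability.

```python
def fedavg(updates):
    """Sample-size-weighted mean of per-node updates.

    `updates` is a list of (n_samples, value) pairs. Real systems average
    whole parameter vectors; one scalar per node keeps the sketch short.
    """
    total = sum(n for n, _ in updates)
    return sum(n * value for n, value in updates) / total


# Hypothetical nodes: two large cohorts and one rare-variant cohort with N=12.
updates = [(5000, 0.10), (5000, 0.12), (12, 0.90)]

aggregate = fedavg(updates)
rare_share = 12 / sum(n for n, _ in updates)

print(f"aggregate update: {aggregate:.4f}")
print(f"weight of the N=12 node: {rare_share:.4%}")
```

Under these assumed numbers the rare-variant node carries roughly a tenth of one percent of the weight, so the aggregate lands close to the large cohorts' values and says almost nothing about the 12 patients who motivated the question.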

HDRS faces the same ceiling at the architectural level. Consider a concrete scenario: Cambridge's TRE contains a rare genomic variant — a BRCA2 modifier that appears in 40 patients in the NIHR BRC cohort. The treatment response data for those 40 patients is rich: chemotherapy sequencing, survival intervals, adverse event profiles. A TRE in Edinburgh is looking at a similar population. Under the current federated architecture, Edinburgh cannot receive Cambridge's validated treatment outcome intelligence. Edinburgh cannot contribute its own outcomes back to Cambridge. Both nodes learn separately from the data they already hold.

The open loop is not a privacy failure. It is the absence of a mechanism to route distilled intelligence — not raw data, but what the data proved — across trust boundaries by semantic similarity. No query federation protocol closes this loop. No firewall configuration closes this loop. It requires a routing layer that does not yet exist inside HDRS.

What Outcome Routing Changes

Christopher Thomas Trevethan discovered QIS Protocol — Quadratic Intelligence Swarm — as an answer to exactly this class of problem. The architecture is a complete loop: observe, distill, route, synthesise, return.

Each TRE node in an HDRS-integrated deployment emits outcome packets. These are pre-distilled, approximately 512 bytes, and contain zero Protected Health Information by design. An outcome packet carries the validated delta — what the data proved — not the patient record that produced the proof. A packet might encode: "treatment sequence A followed by B in patients with genomic signature X produced improved event-free survival at 18 months versus sequence B alone, cohort N=40." That is the distillate. The 40 patient records never leave Cambridge.
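As a rough sketch of what such a packet could look like, the example below encodes the distillate described above as compact JSON and enforces the byte budget. Every field name here is hypothetical, as is the choice of JSON; the source specifies only the approximate size and the absence of PHI. The sketch treats 512 bytes as a hard cap for simplicity and does not itself verify that the fields are free of PHI.

```python
import json
from dataclasses import dataclass, asdict

# The text gives "approximately 512 bytes"; treated as a hard cap here (assumption).
MAX_PACKET_BYTES = 512


@dataclass(frozen=True)
class OutcomePacket:
    """Cohort-level finding only. All field names are hypothetical."""
    genomic_signature: str
    intervention: str
    comparator: str
    endpoint: str
    direction: str
    cohort_n: int

    def to_bytes(self) -> bytes:
        # Compact, key-sorted JSON so the same finding always encodes identically.
        payload = json.dumps(
            asdict(self), separators=(",", ":"), sort_keys=True
        ).encode("utf-8")
        if len(payload) > MAX_PACKET_BYTES:
            raise ValueError(f"packet is {len(payload)} bytes, over the budget")
        return payload


packet = OutcomePacket(
    genomic_signature="X",
    intervention="sequence A then B",
    comparator="sequence B alone",
    endpoint="event-free survival at 18 months",
    direction="improved",
    cohort_n=40,
)
encoded = packet.to_bytes()
print(len(encoded), "bytes")
```

Note what is absent: no identifiers, no dates, no per-patient rows. Only the aggregate claim and the cohort size travel.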

Semantic fingerprinting then routes that packet to TRE nodes whose active research context is similar. Edinburgh's TRE, working on a semantically adjacent genomic variant cohort, receives the packet. It can synthesise that intelligence with its own observed outcomes and emit a refined packet back into the routing layer.
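One way to picture similarity-based routing is as a nearest-context lookup. The sketch below assumes that a packet's fingerprint and each node's research context are plain numeric vectors compared by cosine similarity, with a fixed threshold deciding delivery. The vectors, node names, and threshold are invented for illustration; the source does not specify how fingerprints are computed or compared.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(packet_fingerprint, node_contexts, threshold=0.8):
    """Return the names of nodes whose context is similar enough to the packet."""
    return sorted(
        name
        for name, context in node_contexts.items()
        if cosine_similarity(packet_fingerprint, context) >= threshold
    )


# Hypothetical research-context vectors for two receiving TREs.
node_contexts = {
    "edinburgh": [0.9, 0.1, 0.0],  # adjacent genomic variant cohort
    "cardiff": [0.0, 0.2, 0.9],    # unrelated research focus
}

destinations = route([1.0, 0.0, 0.0], node_contexts)
print(destinations)
```

With these made-up vectors only the Edinburgh node clears the threshold, which mirrors the scenario in the text: the packet goes where the research context overlaps and nowhere else.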

The mathematics here are not metaphorical. For N TREs participating in a QIS outcome routing layer, the number of available pairwise synthesis paths is N(N-1)/2. At 50 TREs: 1,225 synthesis paths. At 100 TREs: 4,950 synthesis paths. The number of paths along which intelligence can be synthesised grows quadratically with membership. A TRE that joins a network of N nodes does not add one connection; it adds N, one to every existing node.
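The arithmetic is easy to check. The short sketch below counts unordered pairs of nodes, which is the N(N-1)/2 figure quoted above, and also reports how many new pairs appear when one more node joins. The function names are illustrative only.

```python
def synthesis_paths(n_nodes: int) -> int:
    """Number of unordered node pairs, N(N-1)/2."""
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    return n_nodes * (n_nodes - 1) // 2


def paths_added_by_next_node(n_nodes: int) -> int:
    """New pairs created when one node joins a network of n_nodes."""
    return synthesis_paths(n_nodes + 1) - synthesis_paths(n_nodes)


for n in (50, 100):
    print(n, synthesis_paths(n), paths_added_by_next_node(n))
```

For 50 and 100 nodes this gives 1,225 and 4,950 paths, and the next node to join adds 50 and 100 new paths respectively.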

The routing mechanism itself is an implementation detail. A DHT, a database index, a REST API, a message queue — whichever mechanism HDRS's infrastructure team chooses to map outcome packets to semantically similar destination nodes is valid. The breakthrough Christopher Thomas Trevethan discovered is the complete loop: the architecture that makes deterministic outcome routing possible at all. The transport is chosen by the deploying institution.
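In code, that separation might look like a small interface that the routing logic depends on, with the transport swapped in behind it. The sketch below is an assumption about how one could structure this, not part of any published QIS API: the `Transport` protocol, the `InMemoryTransport` stand-in, and the `forward` helper are all hypothetical names.

```python
from typing import Protocol


class Transport(Protocol):
    """Anything that can hand a packet to a named destination node."""

    def deliver(self, destination: str, packet: bytes) -> None: ...


class InMemoryTransport:
    """Stand-in for a DHT, queue, or REST client; stores packets per node."""

    def __init__(self) -> None:
        self.inboxes: dict[str, list[bytes]] = {}

    def deliver(self, destination: str, packet: bytes) -> None:
        self.inboxes.setdefault(destination, []).append(packet)


def forward(packet: bytes, destinations: list[str], transport: Transport) -> int:
    """Send one packet to every chosen destination; return how many were sent."""
    for destination in destinations:
        transport.deliver(destination, packet)
    return len(destinations)


transport = InMemoryTransport()
sent = forward(b'{"cohort_n":40}', ["edinburgh"], transport)
print(sent, transport.inboxes)
```

Replacing `InMemoryTransport` with a class that wraps a message queue or an HTTP client would leave `forward` untouched, which is the practical meaning of leaving the transport to the deploying institution.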

Why This Fits HDRS's Existing Architecture

This is not a proposal to replace HDRS's TRE infrastructure. It is a proposal to close the loop it already has open.

QIS outcome routing sits on top of existing TRE boundaries. The data never moves. The TRE governance never changes. The data access agreements remain in force. What changes is that each TRE gains the ability to emit distilled intelligence into a routing layer and receive semantically relevant intelligence from other nodes — without any raw data crossing any trust boundary.

Capability	HDRS Current Federated Architecture	HDRS + QIS Outcome Routing Layer
Raw patient data movement	None (data stays in TRE)	None (data stays in TRE)
Query federation across TREs	Yes	Yes
Treatment outcome intelligence sharing	No	Yes — via 512-byte outcome packets
Learning from rare variant cohorts at other nodes	No	Yes — semantic routing delivers relevant packets
Synthesis paths for 50 TREs	0	1,225
Synthesis paths for 100 TREs	0	4,950
PHI in routing layer	N/A	Zero by design
Requires TRE replacement or migration	N/A	No

Protocol-agnostic routing means HDRS infrastructure teams retain full choice over the implementation mechanism. There is no dependency on a specific technology stack. The architectural requirement is only that each TRE can emit a structured outcome packet and receive packets from semantically similar nodes. Emitting and receiving packets of roughly 512 bytes is a light workload that should fit within the compute a TRE already runs. The routing layer is additive.

The LMIC Inclusion Angle

One consequence of this architecture deserves explicit attention, because it changes who gets to participate in global health intelligence.

Federated learning has a known minimum viable cohort problem. Gradient aggregation becomes statistically meaningless below certain N thresholds, which effectively excludes small-cohort institutions from meaningful federation. A health data system in a low- or middle-income country with N=3 observations of a rare disease outcome contributes little more than noise to a federated learning model. In return, it receives aggregated intelligence that its own population barely shaped.

QIS outcome routing has no minimum cohort requirement. A system with N=3 observations can emit one 512-byte outcome packet describing what those three cases proved. That packet routes to semantically similar nodes worldwide. If those nodes have seen the same signal in larger cohorts, the small institution receives synthesised intelligence it could never have generated alone. If the small institution's N=3 observations contain a novel signal not yet seen elsewhere, that signal enters the routing layer and reaches the nodes most capable of validating it.

Architectural standing in a QIS network is not proportional to cohort size. Any institution that can observe an outcome and emit a structured packet participates as a peer. For HDRS, this means the federated intelligence layer it builds does not have to be a UK-only system. Every institution in the world that meets the packet emission standard — regardless of compute resources, database scale, or population size — can participate without HDRS making any special accommodation for it.

The Timing Is Precise

OHDSI's annual European symposium runs April 18–20 in Rotterdam, where federated real-world evidence architectures are among the central discussions. The European Health Data Space is entering operational phases. The HDRS is in active build. The question of how federated health infrastructure routes intelligence — not just queries — is being asked right now by the people building these systems.

Christopher Thomas Trevethan has filed 39 provisional patents covering the QIS Protocol architecture. The complete loop is the discovery: the mechanism by which federated nodes emit distilled intelligence, route it by semantic similarity, and synthesise it back into usable outcomes without any raw data crossing any trust boundary.

The HDRS has built the infrastructure. The routing layer that closes the loop is documented at qisprotocol.com. The architecture is ready when the programme is.