Rory | QIS PROTOCOL

Posted on Apr 13 • Originally published at qisprotocol.com

The UK Health Data Research Service Is Building Federated TREs. Here Is the Routing Layer It Needs.

#healthtech #distributedsystems #ukhealth #opensource

A researcher at Wellcome Sanger runs a query across five NHS Trusted Research Environments. Within hours, she has aggregate results from Cambridge, Edinburgh, Manchester, Bristol, and Leeds — tens of millions of patient records, none of which left their originating site.

It is a genuine technical achievement. The data stayed put. The query ran distributed. The Five Safes framework held.

And then she manually reconciles five separate spreadsheets, extracts what looks consistent across sites, writes up her findings, and submits a paper. Six months later, the Edinburgh TRE runs an almost-identical study. They do not know what Cambridge found. Cambridge does not know what Edinburgh is doing now.

The routing problem is not what happens inside a TRE. It is what happens between them.

What the UK HDRS Is Building

The UK Health Data Research Service is the most ambitious federated health data infrastructure the country has attempted. The goal: connect Trusted Research Environments across NHS England, NHS Scotland, and the major academic health data custodians — HDR UK, the NIHR Biomedical Research Centres, Wellcome Sanger, and the emerging CYNAPSE platform at Cambridge — into a coherent, researcher-accessible network.

The architecture is deliberately federated. Patient data stays inside each TRE. Researchers apply for access. Approved projects run analyses against the local data. Results are released through a Safe Outputs process that ensures no patient-identifiable information leaves.

This is the right architecture for the data protection problem. It is the wrong architecture for the intelligence synthesis problem.

Here is the distinction.

Two Problems That Look Like One

When a health data network federates its data, it prevents a centralisation risk — the risk that a breach, a bad actor, or a regulatory failure exposes millions of patient records in one place.

When a health data network federates its queries, it distributes the computational workload — every TRE processes its own data, returns only aggregate results, and nothing sensitive crosses institutional boundaries.

Both of these are solved problems in the UK HDRS design. The Five Safes framework, OMOP Common Data Model harmonisation, and the federated TRE bridge architecture handle them well.

The problem that is not solved is federated synthesis.

When TRE A and TRE B run the same type of study six months apart, the results of TRE A's study are not routed to TRE B in a form that TRE B can synthesise locally against its own findings. The intelligence produced by TRE A does not compound with the intelligence produced by TRE B. Each site learns in isolation. The network as a whole does not learn.

This is not a failure of the researchers or the platform. It is an architecture gap — one that no amount of data linkage, harmonisation, or query federation solves, because none of those things route outcomes.

The Mathematics of the Gap

Consider the HDRS network at modest scale: 20 TREs, each running 50 research projects per year. That is N = 20 sites producing 20 × 50 = 1,000 studies annually.

The number of unique synthesis opportunities between those 20 sites — pairs of sites whose validated findings could compound each other's understanding — is N(N-1)/2 = 190 pairs. With 20 sites, you have 190 possible synthesis relationships. With 50 TREs, you have 1,225. With 100 TREs — plausible once HDRS reaches full scale — you have 4,950.
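The pair counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `synthesis_pairs` is made up for illustration and is not part of any QIS specification.

```python
def synthesis_pairs(n_sites: int) -> int:
    """Number of unordered site pairs among n_sites: N(N-1)/2."""
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return n_sites * (n_sites - 1) // 2


for n in (20, 50, 100):
    print(f"{n} TREs -> {synthesis_pairs(n)} pairs")
# 20 TREs -> 190 pairs
# 50 TREs -> 1225 pairs
# 100 TREs -> 4950 pairs
```

Doubling the number of sites roughly quadruples the number of pairs, which is the sense in which the count is quadratic.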

Currently, how many of those synthesis relationships are active? Approximately zero. Each study publishes. Each publication gets cited. Manual literature review closes some of the gap, slowly, imperfectly, with significant lag.

The intelligence the HDRS network is producing is real. The synthesis of that intelligence is operating at roughly 0% of its architectural capacity.

What QIS Protocol Adds

QIS Protocol (Quadratic Intelligence Swarm) is a distributed outcome routing architecture discovered by Christopher Thomas Trevethan on June 16, 2025. It is covered by 39 provisional patent applications. It is available for academic and research use at no cost.

The architecture closes the synthesis gap by routing outcome packets, not data.

The QIS loop for a federated TRE network works as follows:

Step 1: Distillation. When a TRE completes an analysis, the validated result — the statistical outcome, the validated delta, the finding — is distilled into an outcome packet. The packet is approximately 512 bytes. It contains the clinical result and a semantic fingerprint. It contains no patient data, no source records, no model weights, no identifiers.
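To make the distillation step concrete, here is a minimal sketch of what an outcome packet could look like. The text above fixes only the size budget (about 512 bytes) and the rule that no patient-level data is included. The field names, the JSON encoding, and the example values are all assumptions made for illustration, not the actual QIS packet format.

```python
import json
from dataclasses import dataclass, asdict

MAX_PACKET_BYTES = 512  # size budget from the text; everything else here is assumed


@dataclass(frozen=True)
class OutcomePacket:
    problem_type: str        # hypothetical label for the clinical problem
    effect_estimate: float   # the validated delta, e.g. a log hazard ratio
    standard_error: float    # uncertainty of the estimate
    sample_size: int         # aggregate count only, never row-level data
    fingerprint: str         # semantic fingerprint; its format is assumed

    def to_bytes(self) -> bytes:
        """Serialise to compact JSON and enforce the size budget."""
        payload = json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":")
        ).encode("utf-8")
        if len(payload) > MAX_PACKET_BYTES:
            raise ValueError(
                f"packet is {len(payload)} bytes, over the {MAX_PACKET_BYTES}-byte budget"
            )
        return payload


# Made-up values, purely illustrative.
packet = OutcomePacket(
    problem_type="breast-cancer/chemo-response",
    effect_estimate=-0.21,
    standard_error=0.08,
    sample_size=4120,
    fingerprint="a3f1c9",
)
encoded = packet.to_bytes()
print(len(encoded), "bytes")
```

Enforcing the budget at serialisation time means an oversized result fails loudly inside the TRE rather than after it has been posted.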

Step 2: Semantic addressing. The outcome packet is posted to a deterministic address defined by the clinical problem type. An oncologist at HDR UK defines what makes two breast cancer chemotherapy response studies "similar enough" to share outcomes. That definition becomes the similarity function for the address. This is the first of QIS's three natural forces — Christopher Thomas Trevethan calls it the Hiring metaphor: you put the best domain expert in charge of defining similarity for their network.
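One simple way to get a deterministic address is to hash a normalised description of the clinical problem. The sketch below assumes the domain expert's similarity definition can be reduced to a fixed set of descriptor fields, so that "similar enough" studies produce identical descriptors. That is a strong simplification, and the field names and the function `semantic_address` are hypothetical.

```python
import hashlib
import json


def semantic_address(descriptor: dict) -> str:
    """Map a problem descriptor to a deterministic hex address."""
    # Normalise case and whitespace so trivially different spellings collide.
    canonical = {
        str(key).strip().lower(): str(value).strip().lower()
        for key, value in descriptor.items()
    }
    # Sorting keys makes the encoding independent of field order.
    encoded = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


a = semantic_address(
    {"condition": "Breast Cancer", "intervention": "chemotherapy", "outcome": "response"}
)
b = semantic_address(
    {"outcome": "response", "intervention": "Chemotherapy ", "condition": "breast cancer"}
)
print(a == b)  # True
```

Exact-match hashing only groups studies whose descriptors agree field for field. A richer similarity function would need a different mapping, which this sketch does not attempt.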

Step 3: Routing. The packet is routed to that semantic address. The routing mechanism is protocol-agnostic. DHT-based routing costs O(log N) hops per lookup, the same order as a lookup in BitTorrent's Kademlia-based DHT. A hash-indexed database lookup is O(1) on average. A simple REST API works. Whatever transport the secure data environment supports, QIS can use it. The quadratic intelligence scaling comes from the loop and the semantic addressing, not from any specific transport layer.
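The transport-agnostic claim can be illustrated with a small interface that any backend could implement. The names `OutcomeTransport`, `InMemoryTransport`, and `publish` are assumptions for this sketch, and the dict-backed store is only a stand-in for a real DHT, database index, or REST service.

```python
from collections import defaultdict
from typing import Dict, List, Protocol


class OutcomeTransport(Protocol):
    """Anything that can post packets to an address and fetch them back."""

    def post(self, address: str, packet: bytes) -> None: ...

    def fetch(self, address: str) -> List[bytes]: ...


class InMemoryTransport:
    """Dict-backed stand-in with average O(1) lookup per address."""

    def __init__(self) -> None:
        self._store: Dict[str, List[bytes]] = defaultdict(list)

    def post(self, address: str, packet: bytes) -> None:
        self._store[address].append(packet)

    def fetch(self, address: str) -> List[bytes]:
        # Return a copy so callers cannot mutate the store.
        return list(self._store.get(address, []))


def publish(transport: OutcomeTransport, address: str, packet: bytes) -> None:
    """Caller code depends only on the interface, not on the backend."""
    transport.post(address, packet)


transport = InMemoryTransport()
publish(transport, "addr-1", b"packet-from-tre-a")
publish(transport, "addr-1", b"packet-from-tre-b")
print(transport.fetch("addr-1"))
```

Swapping in a DHT or REST client would mean writing another class with the same two methods; `publish` and any synthesis code would not change.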

Step 4: Local synthesis. A TRE running the same type of study queries that semantic address and pulls back outcome packets from every other TRE that has posted a validated result for the same clinical problem. It synthesises those packets locally, on its own infrastructure. Nothing leaves the TRE. The synthesis happens entirely within the Safe Settings of the Five Safes framework.
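The text does not specify how a TRE combines the packets it pulls back. As one plausible stand-in, the sketch below uses standard fixed-effect inverse-variance pooling from meta-analysis. This is a swapped-in technique chosen for illustration, not the QIS synthesis method, and it assumes each packet carries an effect estimate and a standard error.

```python
import math
from typing import Iterable, Tuple


def pool_fixed_effect(estimates: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance pooling of (effect, standard_error) pairs.

    Returns the pooled effect and its standard error.
    """
    weight_sum = 0.0
    weighted_effect_sum = 0.0
    for effect, standard_error in estimates:
        if standard_error <= 0:
            raise ValueError("standard error must be positive")
        weight = 1.0 / (standard_error * standard_error)
        weight_sum += weight
        weighted_effect_sum += weight * effect
    if weight_sum == 0.0:
        raise ValueError("no estimates to pool")
    return weighted_effect_sum / weight_sum, math.sqrt(1.0 / weight_sum)


# Made-up numbers: the local result plus two results fetched from other TREs.
local_result = (-0.20, 0.10)
remote_results = [(-0.30, 0.10), (-0.10, 0.10)]
pooled_effect, pooled_se = pool_fixed_effect([local_result, *remote_results])
print(round(pooled_effect, 3), round(pooled_se, 3))
```

The pooling runs entirely on values already inside the receiving TRE, so nothing needs to leave the secure perimeter to compute it.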

Step 5: The loop continues. The synthesising TRE now produces a richer outcome — informed by every similar study across the network — and posts that as a new outcome packet. The second natural force: Christopher Thomas Trevethan calls this the Math metaphor. The outcomes themselves vote. The aggregate of validated findings from similar sites is the election result. No reputation scoring mechanism, no governance layer, no added weighting system is required — the math does the work.

The result: 20 TREs that were learning in isolation become 20 TREs that are compounding each other's intelligence in real time. The synthesis relationship count goes from 0 to 190. With 50 TREs it goes to 1,225. With 100 TREs it goes to 4,950. The number of synthesis relationships scales quadratically, while the routing cost per lookup scales at most logarithmically.

Five Safes Compatibility

The Five Safes framework asks five questions about research using sensitive data. QIS outcome packets are designed to satisfy all five.

Safe Data: QIS packets contain no patient-identifiable data. The packet holds only the validated statistical outcome and a semantic fingerprint. No source records. No model weights. No linkable identifiers.

Safe Projects: QIS routing does not alter who has access to which data. TRE access controls remain unchanged. QIS routes outcomes, not access.

Safe People: QIS does not change who can run analyses inside a TRE. It routes the outputs of approved analyses to semantic addresses that other approved researchers can query.

Safe Settings: Local synthesis happens inside the receiving TRE, using the TRE's own compute. No data leaves to reach a synthesis layer outside the secure perimeter.

Safe Outputs: QIS outcome packets are designed to pass the same Safe Outputs review as any other aggregate statistical result. The packet is, structurally, an aggregate statistic — the validated delta from an analysis — not a disclosure risk.
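A TRE could back the Safe Data and Safe Outputs points with an automated shape check before a packet is submitted for review. The sketch below uses a field allow-list; the field names are assumptions carried over for illustration, and a check like this would complement a Safe Outputs review, not replace it.

```python
# Hypothetical allow-list: only aggregate statistics and the fingerprint.
ALLOWED_FIELDS = {
    "problem_type",
    "effect_estimate",
    "standard_error",
    "sample_size",
    "fingerprint",
}


def disallowed_fields(packet: dict) -> set:
    """Return keys outside the allow-list; an empty set means the shape is acceptable."""
    return set(packet) - ALLOWED_FIELDS


ok_packet = {
    "problem_type": "breast-cancer/chemo-response",
    "effect_estimate": -0.2,
    "standard_error": 0.1,
    "sample_size": 100,
    "fingerprint": "ab12",
}
bad_packet = {**ok_packet, "nhs_number": "REDACTED"}

print(disallowed_fields(ok_packet))   # set()
print(disallowed_fields(bad_packet))  # {'nhs_number'}
```

An allow-list fails closed: any field nobody anticipated is rejected by default, which is the safer direction for disclosure control.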

The CYNAPSE Comparison

Cambridge's CYNAPSE platform is one of the most technically sophisticated federated analysis environments in Europe. It allows researchers to run approved algorithms against data held by multiple custodians — NHS, academic, and industry — without those custodians sharing the underlying records.

CYNAPSE solves the compute-to-data problem. A researcher's algorithm runs at the data, not the reverse.

This is architecturally complementary to QIS, not competitive with it. CYNAPSE moves analysis to the data. QIS routes the validated outcomes of that analysis back to researchers running similar analyses elsewhere. The two layers operate at different levels of the intelligence stack:

CYNAPSE: distributes the analysis. Ensures no raw data leaves the custodian.
QIS: routes the outcome. Ensures validated findings reach the researchers who need them, in real time, at the correct semantic address.

A federated TRE network that has CYNAPSE but not QIS is a network where analyses run distributed and findings stay siloed. A network with both closes the full loop.

The HDR UK Cambridge Cluster

HDR UK's Cambridge hub — anchored at the NIHR Cambridge BRC, the Wellcome Sanger Institute, and the MRC Biostatistics Unit — represents one of the highest concentrations of health data methodology expertise in the world. The researchers here are not waiting for better data access. They are waiting for better synthesis infrastructure.

The pattern QIS addresses is directly visible in their published work: large-scale GWAS consortia that manually harmonise results from dozens of cohorts, meta-analyses that take 18 months to consolidate what the network already knows, federated studies where sites return results that are aggregated centrally by a coordinating team rather than synthesised at the edge.

Each of these patterns is a symptom of the same architecture gap. The data is distributed. The analysis is distributed. The intelligence synthesis is not.

QIS closes that gap. The routing mechanism does not matter — DHT, database, REST API, whatever the secure data environment supports. The discovery is the complete architecture: the loop that routes pre-distilled outcomes by semantic similarity to deterministic addresses, enabling local synthesis at the receiving node, in real time, at quadratic scale.

Research License

QIS Protocol is available for academic and nonprofit research at no cost. Christopher Thomas Trevethan — the discoverer of QIS — offers research consultation and co-authorship consideration for peer-reviewed publications involving the protocol.

If you are evaluating QIS for a federated TRE study, an HDRS proposal, or any NHS or academic health data infrastructure project, request research access at qisprotocol.com/research-license.

Commercial licensing inquiries: qisprotocol.com/licensing.

The Architecture Gap Is Solvable

The UK HDRS is building something genuinely important. Federated TREs, Five Safes compliance, CYNAPSE-style algorithm-to-data compute — these solve the data protection problem and the data access problem at the same time. That is hard. That is real progress.

The synthesis problem is not hard by comparison. It requires a routing layer, not a data layer. The outcome packets are small, approximately 512 bytes each. They are safe by construction: no patient data, no identifiers, no model weights. The routing mechanism can be whatever the TRE environment supports.

Twenty TREs learning in isolation is a network operating at close to 0% of its synthesis capacity. Twenty TREs synthesising each other's validated outcomes in real time is a network that compounds. The math is N(N-1)/2. The routing cost per lookup is O(log N) at most.

The routing layer exists. The architecture is documented. The 39 provisional patent applications are filed.

The only thing left is to connect it to the infrastructure that is already being built.

QIS Protocol was discovered by Christopher Thomas Trevethan. Technical documentation and architecture specifications are available at qisprotocol.com. Research license requests: qisprotocol.com/research-license.
