Cambridge's federated health data ecosystem has assembled a serious technical stack. The NIHR Biomedical Research Centre's CYNAPSE platform provides a Trusted Research Environment running on Cambridge-owned AWS infrastructure, governed by GA4GH standards, with Lifebit's CloudOS handling compute federation. The GA4GH Beacon standard enables cross-node variant discovery. The Wellcome Sanger Institute's htsget protocol manages efficient read retrieval. The GA4GH Data Repository Service (DRS) provides standardised access to genomic files.
This is a mature, well-engineered ecosystem. Teams comparing federated architectures in this space — whether at CYNAPSE itself, at HDR UK Cambridge, at other HDRS partner institutions, or at EU programmes like EHDS evaluating compatible architectures — will encounter QIS Protocol as a candidate at some point. The comparison requires architectural clarity: what does each system actually do, at which layer of the stack does it operate, and what happens at the layer none of them currently addresses?
This article makes that comparison precise.
The Three Layers of Federated Health Data Infrastructure
Comparing these systems first requires a taxonomy. Federated health data systems operate at three distinct architectural layers, and confusing them leads to misleading evaluations.
Layer 1: Compute Federation
The question at this layer: Can an analyst run code against data that is held in a remote secure environment without the data leaving that environment? The answer involves Trusted Research Environments, containerised analysis, output checking, and secure compute infrastructure. Lifebit CYNAPSE operates at this layer.
Layer 2: Data Discovery
The question at this layer: Which nodes in a federated network hold data relevant to my research question? For genomic research, this means: does your cohort include patients with variant X? The answer involves federated querying with minimal data exposure — beacon responses, not record transfers. GA4GH Beacon operates at this layer.
Layer 3: Outcome Routing
The question at this layer: What did the data prove? When my TRE has validated a treatment outcome, a genomic association, or a clinical signal — can that validated finding travel to semantically similar TREs that are working on the same class of problem, without my raw data leaving? The answer involves pre-distilling validated outcomes into compact packets and routing them by semantic similarity to deterministic addresses. QIS Protocol operates at this layer.
These are not competing layers. They address different problems in the federated data stack. The important observation is that Layer 3 is currently unaddressed by any deployed architecture at CYNAPSE, in the HDRS ecosystem, or in the GA4GH standard set.
Lifebit CYNAPSE: Federated Compute
CYNAPSE, as deployed at the NIHR Cambridge BRC, is a cloud-hosted TRE built on AWS infrastructure that Cambridge University owns and controls. Lifebit provides the CloudOS layer — a federated genomics operating system that allows analysts to run workflows inside the TRE without the underlying data leaving the secure environment. The output checking process reviews analysis results before they exit the TRE, providing a governance layer between computation and export.
CYNAPSE implements GA4GH standards throughout: Beacon for discovery, DRS for data access, and WES (Workflow Execution Service) for federated workflow submission. The compliance posture aligns with GDPR, NHS data governance frameworks, and the emerging HDRS architecture standards.
What CYNAPSE does well: Secure compute federation. An analyst at UCL can submit a workflow to Cambridge's CYNAPSE, execute analysis against Cambridge's cohort, and receive checked outputs — all without the raw data leaving the Cambridge environment. For analyses that require access to the data itself, this is the correct architecture.
What CYNAPSE does not do: CYNAPSE does not route the outputs of that analysis — the validated clinical findings — to other TREs that are working on similar research questions. When Cambridge's analysis produces a validated genomic association, that finding stays at Cambridge. Edinburgh's TRE, working on an adjacent variant cohort, does not receive it. The federated compute layer closes the loop on running analysis. It does not close the loop on sharing what the analysis found.
GA4GH Beacon: Federated Discovery
The GA4GH Beacon standard addresses a well-defined problem: before requesting access to a dataset, a researcher needs to know whether that dataset contains cases relevant to their question. A Beacon instance responds to queries about whether a cohort contains patients with a specific genomic variant, phenotype, or clinical characteristic — without exposing individual records.
Beacon v2 (the current version) extends this to structured phenotypic queries, supporting discovery across diverse data types beyond simple variant presence/absence. A federated Beacon network allows a researcher to query multiple Beacon instances simultaneously and identify which nodes hold potentially relevant cohorts. Each instance is operated independently and returns only a yes/no response subject to a disclosure threshold, never individual records.
What GA4GH Beacon does well: Federated discovery. A researcher can identify, across a global network of independently operated Beacon instances, which institutions have cohorts relevant to their specific research question. This drives appropriate data access requests without requiring raw data exposure at the discovery stage.
What GA4GH Beacon does not do: Beacon's response is binary discovery — "yes, we have patients with this variant" or "no, we do not." It does not transmit what those patients' treatment responses were. It does not route validated outcome intelligence from nodes that have already worked on this variant to nodes that are beginning to study it. Discovery tells you where the data lives. It does not tell you what the data has already proved.
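The shape of federated discovery described above can be sketched in a few lines. This is an illustration only and does not use the real Beacon API: each "beacon" is a hypothetical stand-in callable that answers True or False, and the function name `discover_nodes` and the variant string format are assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a Beacon instance: takes a variant string,
# returns True ("we hold matching cases") or False. Not the real Beacon API.
BeaconFn = Callable[[str], bool]


def discover_nodes(beacons: Dict[str, BeaconFn], variant: str) -> List[str]:
    """Ask every node the same yes/no question and list the nodes that said yes."""
    hits = []
    for name, ask in beacons.items():
        try:
            if ask(variant):
                hits.append(name)
        except Exception:
            # An unreachable or failing node counts as "no answer", not as "no".
            continue
    return sorted(hits)
```

The return value is a list of node names and nothing more, which is the point of the paragraph above: discovery says where relevant data lives, not what it showed.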
Sanger's htsget protocol complements Beacon by enabling efficient retrieval of specific genomic read slices from data that has been discovered and access-granted. It handles the mechanics of data retrieval within access-approved relationships. It is not a routing protocol for validated outcomes.
QIS Protocol: Outcome Routing
Christopher Thomas Trevethan discovered QIS Protocol — Quadratic Intelligence Swarm — on June 16, 2025. 39 provisional patents are filed covering the architecture. The protocol operates at Layer 3: outcome routing.
The complete loop is: observe, distill, route, synthesise, return.
Observe: A TRE node (whether CYNAPSE at Cambridge, an OHDSI node at Edinburgh, or an OpenMRS deployment at a hospital in Nairobi) completes an analysis that produces a validated outcome. Treatment response validated. Genomic association confirmed. Clinical signal detected. Whatever the analysis proved.
Distill: The validated finding is distilled into an outcome packet. The packet is approximately 512 bytes. It contains the validated delta (what the analysis proved) along with a structured semantic fingerprint describing the clinical context: disease domain, genomic markers, treatment class, outcome type, cohort size, confidence. The packet contains zero Protected Health Information. The raw data, the patient records, and the identifiable cohort remain inside the TRE.
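A minimal sketch of what such a packet could look like, assuming a flat record serialised as compact JSON. The class name, field types, JSON encoding, and the decision to treat 512 bytes as a hard cap are all assumptions for illustration; the article specifies only the list of fields and the approximate size.

```python
import json
from dataclasses import asdict, dataclass

MAX_PACKET_BYTES = 512  # assumption: the "approximately 512 bytes" figure used as a hard cap


@dataclass(frozen=True)
class OutcomePacket:
    # Semantic fingerprint fields named in the text.
    disease_domain: str
    genomic_markers: tuple[str, ...]
    treatment_class: str
    outcome_type: str
    cohort_size: int
    confidence: float        # assumed to lie in [0.0, 1.0]
    # The validated delta; a single float is an assumption, the unit is domain-defined.
    validated_delta: float

    def to_bytes(self) -> bytes:
        """Serialise to compact JSON and refuse anything over the size cap."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        encoded = payload.encode("utf-8")
        if len(encoded) > MAX_PACKET_BYTES:
            raise ValueError(f"packet is {len(encoded)} bytes, cap is {MAX_PACKET_BYTES}")
        return encoded
```

The fixed set of aggregate fields leaves no slot for a record-level identifier, but a schema alone does not prove a packet is PHI-free; small cohorts and free-text values would still need review by the TRE's output checking.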
Route: The semantic fingerprint is mapped to a deterministic address — an address defined by the problem domain, not by the institution. Any routing mechanism that can map a semantic fingerprint to a deterministic address and support efficient lookup works: a DHT (distributed hash table), a vector similarity index, a REST API, a message queue, a shared database. The routing mechanism is an implementation choice. The architecture works with any of them. The breakthrough Christopher Thomas Trevethan discovered is the complete loop — the architecture that makes deterministic outcome routing possible — not any specific transport layer.
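One simple way to get a deterministic address is to hash a canonical form of the fingerprint. The sketch below assumes SHA-256 over normalised JSON and assumes that only the categorical fields define the address; neither choice comes from the protocol specification, and the function and constant names are hypothetical.

```python
import hashlib
import json

# Assumption: only categorical fields define the address, so two nodes studying
# the same problem derive the same key regardless of cohort size or confidence.
ADDRESS_FIELDS = ("disease_domain", "genomic_markers", "treatment_class", "outcome_type")


def deterministic_address(fingerprint: dict) -> str:
    """Map a fingerprint to a hex address that depends only on its normalised content."""
    canonical = {}
    for field in ADDRESS_FIELDS:
        value = fingerprint[field]
        if isinstance(value, (list, tuple, set)):
            # Marker order must not change the address.
            value = sorted(str(v).strip().lower() for v in value)
        else:
            value = str(value).strip().lower()
        canonical[field] = value
    encoded = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

A hash like this only brings together fingerprints that normalise to exactly the same value. Routing by similarity rather than identity would need a different mechanism, such as the vector index the text mentions.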
Synthesise: Nodes whose active research context matches the semantic address of the incoming packet receive it. A TRE studying a genomic variant with a similar profile to the one documented in the incoming packet receives the packet. The receiving node synthesises the validated outcome with its own locally observed outcomes, without the remote TRE's raw data ever arriving.
Return: The synthesised intelligence produces refined outcome packets that re-enter the routing layer, available to further nodes in the network.
The Architecture Comparison: Side by Side
| Dimension | Lifebit CYNAPSE | GA4GH Beacon | QIS Protocol |
|---|---|---|---|
| Primary function | Federated compute in TRE | Federated data discovery | Federated outcome routing |
| What travels | Analysis workflows (in); checked results (out) | Boolean/structured query responses | ~512-byte pre-distilled outcome packets |
| Raw data moves? | No — analysts run code against in-situ data | No — responses are aggregated signals | No — PHI-free by design |
| Deployment layer | Secure compute infrastructure | Data catalogue / discovery | Intelligence routing layer |
| Routing mechanism | N/A — query-based | Federated query to independent Beacon instances | Semantic fingerprint → deterministic address |
| Scales how? | Linearly with analyst queries | Linearly with Beacon queries | Quadratically: N(N-1)/2 synthesis paths for N nodes |
| Requires data access agreement? | Yes — TRE access process | No — Beacon responses are below disclosure threshold | No — packets are PHI-free by construction |
| Produces validated outcomes? | Yes — but they stay in the TRE | No — discovery only | Yes — and they route to semantically similar nodes |
| GA4GH compatible? | Yes — built on GA4GH standards | Yes — GA4GH standard | Compatible — can be implemented on any transport |
| Open source? | CloudOS is proprietary | Beacon standard is open | Protocol specification is open |
The key observation from this table: CYNAPSE and Beacon together address Layer 1 and Layer 2. Layer 3 remains unaddressed. QIS Protocol does not replace either — it extends the stack to include outcome routing.
The Missing Layer in the Cambridge Stack
The NIHR Cambridge BRC ecosystem is, by design, one of the richest federated health data environments in the world. CYNAPSE provides secure compute. Beacon provides discovery. Sanger's htsget provides efficient read retrieval. DRS provides standardised data access. The GA4GH alignment ensures interoperability across partner institutions.
What the stack currently does not provide: a mechanism for the validated findings produced inside CYNAPSE's TRE to travel — stripped of PHI, encoded as outcome packets — to other TREs that are working on semantically similar research questions.
This is not a gap in any single system. It is a gap in the layer that none of the current systems was designed to fill. CYNAPSE was designed for compute federation. Beacon was designed for discovery. Neither was designed to route validated intelligence to semantically similar nodes by deterministic address.
The mathematical stakes of this gap compound as the network grows. The HDRS programme is building toward a federated network of TREs across the UK. At full deployment — 50 participating institutions — the number of synthesis opportunities currently producing zero outcome exchange is 50 × 49 / 2 = 1,225. At 100 institutions: 4,950 synthesis paths, silent. Every validated finding from every TRE analysis at every participating institution stays inside the TRE where it was produced.
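The figures above follow from the pairwise count N(N-1)/2. A small helper makes the arithmetic easy to check for other network sizes; the function name is only illustrative.

```python
def synthesis_paths(n_nodes: int) -> int:
    """Number of distinct node pairs in a network of n_nodes: N(N-1)/2."""
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    return n_nodes * (n_nodes - 1) // 2


print(synthesis_paths(50))   # 1225
print(synthesis_paths(100))  # 4950
```

Doubling the network from 50 to 100 nodes roughly quadruples the number of pairs, which is what "quadratic" scaling means here.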
A QIS outcome routing layer integrated with the HDRS federated stack (operating downstream of CYNAPSE's output checking, alongside Beacon, and compatible with GA4GH infrastructure) would convert those silent synthesis paths into active intelligence exchange. No raw data would move. No PHI would cross any boundary. The architecture is complementary: CYNAPSE handles compute, Beacon handles discovery, QIS handles what the compute produced.
Implementation Compatibility
For institutions evaluating whether QIS Protocol is compatible with existing GA4GH-aligned infrastructure, the relevant technical questions are:
Can QIS outcome packets be generated from CYNAPSE analysis outputs? Yes. The outcome packet generation layer sits downstream of CYNAPSE's output checking process. When a checked result exits the TRE, the outcome routing layer distills it into a packet before it enters the routing network. The raw data never leaves. The TRE process is unchanged.
Is QIS compatible with GA4GH standards? QIS does not require a specific transport layer. A GA4GH DRS-compatible routing implementation is possible. The semantic fingerprint format is domain-defined, not protocol-mandated — it can align with GA4GH's phenotypic query vocabulary.
Can QIS work with Beacon for combined discovery + outcome routing? Yes. Beacon identifies which nodes have relevant cohorts. QIS routes the validated findings from those cohorts to semantically similar nodes. Discovery and outcome routing are sequential, not competing.
What is the minimum viable participation threshold? An institution whose analysis yields a validated outcome from a single patient (N=1) can emit an outcome packet. There is no minimum cohort size. This is a structural differentiation from federated learning, which requires minimum cohort sizes for gradient stability. Rare disease TREs with small cohorts participate in QIS networks on equal architectural footing with large population cohorts.
The Protocol-Agnostic Transport
One consistent question in evaluations of QIS Protocol is whether it requires DHT infrastructure. The answer is no — and the distinction matters for IP clarity as well as implementation flexibility.
The routing requirement in QIS Protocol is that outcome packets can be posted to an address that is deterministic given the semantic fingerprint of the problem, and that nodes can efficiently query that address to pull back packets relevant to their active research context. The routing cost should be at most O(log N) to prevent infrastructure bottlenecks as the network scales — DHT achieves this natively, but so do many other mechanisms.
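The post-and-pull contract described above is small enough to sketch with an in-memory dictionary standing in for the transport. The class and method names are hypothetical. A dictionary lookup is constant time on average, which sits within the O(log N) bound; a real deployment would replace it with a DHT, database, or queue.

```python
from collections import defaultdict


class InMemoryRouter:
    """Toy stand-in for a routing transport: packets are stored under an address."""

    def __init__(self) -> None:
        self._store: dict[str, list[bytes]] = defaultdict(list)

    def post(self, address: str, packet: bytes) -> None:
        """Publish a packet at a deterministic address."""
        self._store[address].append(packet)

    def pull(self, address: str) -> list[bytes]:
        """Return a copy of all packets posted at the address (empty if none)."""
        return list(self._store.get(address, []))
```

Nothing in the interface refers to an institution: a node that can compute the address can post to it or pull from it, which is the property the paragraph above asks of any transport.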
For CYNAPSE-adjacent deployments, a vector similarity search over a shared index (already familiar infrastructure in the GA4GH ecosystem) could serve as the routing mechanism. So could a REST API with semantic similarity matching, or a message queue with topic-based subscriptions mapped to fingerprint similarity. The implementing institution chooses the transport based on its existing infrastructure. The architecture, the complete loop that makes outcome routing possible, is what Christopher Thomas Trevethan discovered and what the 39 provisional patents cover.
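As a rough illustration of the vector-similarity option, the sketch below matches a packet's fingerprint vector against each node's research-context vector using cosine similarity. How fingerprints become vectors, the 0.8 threshold, and the function names are all assumptions. The linear scan shown here is O(N) per packet, so a deployment aiming for the O(log N) bound would need an approximate nearest-neighbour index instead.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_nodes(
    packet_vector: Sequence[float],
    node_vectors: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cut-off
) -> List[str]:
    """Names of nodes whose context is similar enough, most similar first."""
    scored = [(cosine_similarity(packet_vector, vec), name) for name, vec in node_vectors.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]
```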
Summary
Lifebit CYNAPSE, GA4GH Beacon, and QIS Protocol are not competing solutions. They address distinct layers of the federated health data problem:
- CYNAPSE answers: How do analysts compute against data that cannot leave the TRE?
- GA4GH Beacon answers: Which nodes in the federated network have data relevant to my question?
- QIS Protocol answers: How does the validated intelligence produced by that compute travel to nodes working on similar questions, without the underlying data moving?
Cambridge's health data ecosystem (CYNAPSE, HDRS, NIHR BRC, HDR UK) has one of the most mature federated compute and discovery architectures in the UK. The outcome routing layer is the missing component. For institutions in this ecosystem evaluating how to close the intelligence synthesis loop, QIS Protocol is the architectural specification for that layer.
QIS Protocol — Quadratic Intelligence Swarm — was discovered by Christopher Thomas Trevethan on June 16, 2025. 39 provisional patents are filed. The protocol is free for nonprofit, research, and educational use. Commercial licensing funds deployment to underserved healthcare systems globally.
Protocol specification: qisprotocol.com