Rory | QIS PROTOCOL

Posted on Apr 16 • Originally published at qisprotocol.com

The EMA Built DARWIN EU to Generate Real-World Evidence. It Can't Synthesize What It Finds.

#healthtech #distributedsystems #datascience #privacy

The European Medicines Agency launched DARWIN EU in 2021 to solve a specific problem: post-authorization safety studies take too long, cover too few patients, and fragment across national boundaries. The answer was a federated real-world evidence network built on OMOP CDM, spanning more than 100 million patient records across Europe's leading academic medical centers and health data holders.

By 2025, DARWIN EU was generating evidence at scale. Studies on vaccine safety, drug-drug interactions, and rare adverse events were running across coordinated data partner networks in weeks instead of years.

The network works. Evidence is being generated.

But there is a structural gap that no one is talking about at OHDSI Europe 2026 this week.

The gap is not in the data. It is not in the analytics. It is in the routing architecture.

What DARWIN EU Does — and What It Cannot Do

DARWIN EU uses the OMOP CDM as its semantic foundation. Every participating site maps its local data — medication records, diagnoses, lab results, procedures — to a common vocabulary. When the EMA requests a study on, say, the real-world incidence of myocarditis within 30 days of mRNA vaccination in patients aged 16–25, every DARWIN EU partner can run the same analytical query against their local data and return aggregated results.

This is extraordinary infrastructure. It is the result of more than a decade of standardization work by the OHDSI community. It solves the interoperability problem.

But solving interoperability is not the same as enabling continuous synthesis.

Here is the gap: when Site A in the Netherlands produces a validated finding (say, a drug-interaction signal reported with a confidence score of 0.91), that finding does not automatically reach Site B in Germany, which is working on a semantically adjacent question. There is no routing layer that says: this outcome is relevant to the researchers at this address, because they defined their question the same way.

Instead, findings go into study publications, coordination meetings, or wait for the next scheduled DARWIN EU study. The network generates evidence continuously. It synthesizes it episodically.

That is an architecture problem.

The Numbers Make the Gap Visible

DARWIN EU has approximately 10 coordinating data partners as of 2025, with plans to expand across the European Health Data Space. Even at N = 10 partners, the synthesis potential is N(N-1)/2 = (10 × 9)/2 = 45 unique pairwise learning opportunities per study cycle.

At 50 partners — which the EHDS pathway suggests is achievable by 2030 — the synthesis potential is 1,225 unique pairwise learning paths per cycle.

At 200 partners — a reasonable ceiling for a mature pan-European real-world evidence network — the synthesis potential is 19,900 unique pairwise learning paths.

Each of those paths is a channel along which a validated finding from real patients could be routed to the researchers most likely to benefit from it, in real time as it is generated.

That is not what is happening today. Today, synthesis depends on human coordination: email threads, working group meetings, scheduled DARWIN EU studies, and the inevitable publication lag. The 19,900 synthesis paths at 200 partners are not paths at all. They are missed connections.
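The pairwise counts quoted above follow directly from N(N-1)/2. A minimal sketch to reproduce them (the function name `pairwise_paths` is ours, chosen for illustration):

```python
def pairwise_paths(n_partners: int) -> int:
    """Number of unordered partner pairs in a network of N partners: N(N-1)/2."""
    if n_partners < 0:
        raise ValueError("n_partners must be non-negative")
    # One of N and N-1 is always even, so integer division is exact
    return n_partners * (n_partners - 1) // 2


for n in (10, 50, 200):
    print(f"{n:>3} partners -> {pairwise_paths(n):>6,} pairwise paths")
```

The count grows with the square of N, which is why the gap between 10 and 200 partners is a factor of roughly 440 in paths rather than 20.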

Why Federated Learning Does Not Solve This

The reflex answer to any "federated knowledge sharing" problem is: use federated learning. Train a model locally, share the gradients, aggregate centrally.

The problem is that DARWIN EU is not doing machine learning. It is doing evidence synthesis: generating validated statistical findings about drug safety, effectiveness, and adverse events in defined populations, using epidemiological methods.

Federated learning requires enough local data to compute a meaningful gradient. A DARWIN EU partner with 200 patients who received a novel therapy can contribute little to a federated model training round. But they can deposit an outcome packet: 18-month progression-free survival 62%, 95% CI [51%–73%], n=200, OMOP concept set 4245678.

That outcome packet is useful to every other partner working on the same therapy in the same population. They do not need a central aggregator to receive it. They do not need a federated learning round. They need a routing layer.
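To make the outcome packet concrete, here is a rough sketch of how the example above might be represented and serialized. The class name `OutcomePacketSketch`, its field names, and the choice of compact JSON as the wire format are all assumptions made for illustration; they are not taken from a QIS specification.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class OutcomePacketSketch:
    """Hypothetical packet shape: aggregate statistics only, no patient rows."""
    omop_concept_set: str
    statistic: str
    value: float               # proportion, e.g. 0.62 for 62%
    ci_95: Tuple[float, float]  # lower and upper bound, as proportions
    n: int

    def to_bytes(self) -> bytes:
        # Sorted keys and compact separators give a stable, small encoding
        return json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":")
        ).encode("utf-8")


packet = OutcomePacketSketch(
    omop_concept_set="4245678",
    statistic="progression_free_survival_18m",
    value=0.62,
    ci_95=(0.51, 0.73),
    n=200,
)
encoded = packet.to_bytes()
print(len(encoded), "bytes")
```

Under these assumptions the encoded packet is far smaller than the source data it summarizes, and nothing in it identifies an individual patient.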

This is the distinction Christopher Thomas Trevethan discovered on June 16, 2025: the architecture that makes quadratic intelligence scaling possible at logarithmic compute cost is not about training models. It is about routing pre-distilled outcome packets to the addresses that match the question that produced them.

QIS Protocol: What It Adds to the DARWIN EU Architecture

QIS (Quadratic Intelligence Swarm) Protocol operates at the output of the DARWIN EU analytical pipeline. Nothing in the existing infrastructure changes.

Each DARWIN EU partner continues to run their analytical workflows using OMOP CDM, ATLAS, and HADES. When a validated finding is produced, QIS constructs a semantic address from the OMOP concept codes, population filters, and study question that defined the analysis. That address is deterministic: the same question always maps to the same address.
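One way such a deterministic address could be derived is to hash a canonical serialization of the question. The sketch below assumes SHA-256 over sorted-key JSON; the function name `semantic_address` and the population filter keys are hypothetical, and the actual QIS fingerprinting scheme may differ.

```python
import hashlib
import json


def semantic_address(concept_set_id: str, population_filter: dict, study_type: str) -> str:
    """Map a study question to a stable hex address (illustrative scheme only)."""
    question = {
        "omop_concept_set": concept_set_id,
        "population": population_filter,
        "study_type": study_type,
    }
    # Sorting keys makes the serialization independent of dict insertion order
    canonical = json.dumps(question, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two sites describe the same question with keys in a different order
site_a = semantic_address("4245678", {"age_min": 16, "age_max": 25}, "pharmacoepidemiology")
site_b = semantic_address("4245678", {"age_max": 25, "age_min": 16}, "pharmacoepidemiology")
other = semantic_address("4245678", {"age_min": 26, "age_max": 40}, "pharmacoepidemiology")

print(site_a == site_b)
print(site_a == other)
```

Note the limitation of this particular choice: an exact hash only matches questions that are defined identically. Matching questions that are merely similar would need a different fingerprint, for example an embedding with nearest-neighbour lookup.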

The validated finding — not the patient data, not the raw records, not the analytical code — is distilled into an outcome packet of approximately 512 bytes and deposited at that address. Raw data never leaves the site.

When another DARWIN EU partner runs an analysis on a semantically equivalent question, their node computes the same address. They query the address and retrieve every outcome packet deposited by every other partner working on the same question. Local synthesis — simple aggregation of validated statistical results — produces a real-time collective answer in milliseconds, without a central server, without a scheduled coordination call, without a publication lag.
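The deposit, retrieve, and local synthesis steps can be sketched with an in-memory store keyed by address. `InMemoryRouter` and `synthesize` are hypothetical stand-ins, not part of any published QIS package, and the sample-size-weighted mean is just one simple aggregation choice. A formal meta-analysis would typically use inverse-variance weights and check heterogeneity.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import median


class InMemoryRouter:
    """Hypothetical stand-in for a routing layer: a dict keyed by address."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = defaultdict(list)

    def deposit(self, address: str, outcome: dict) -> None:
        self._store[address].append(outcome)

    def retrieve(self, address: str) -> list[dict]:
        # Return a copy so callers cannot mutate the stored list
        return list(self._store.get(address, []))


def synthesize(outcomes: list[dict]) -> dict:
    """Aggregate packets locally; no raw data and no central server involved."""
    if not outcomes:
        return {"n_packets": 0}
    pooled_n = sum(o["n"] for o in outcomes)
    return {
        "n_packets": len(outcomes),
        "pooled_n": pooled_n,
        "median_value": median(o["value"] for o in outcomes),
        "n_weighted_mean": sum(o["value"] * o["n"] for o in outcomes) / pooled_n,
    }


router = InMemoryRouter()
address = "example-address"  # in practice, derived from the study question
router.deposit(address, {"value": 0.62, "n": 200})  # figures from the example in the text
router.deposit(address, {"value": 0.58, "n": 350})  # invented for illustration
router.deposit(address, {"value": 0.66, "n": 120})  # invented for illustration

result = synthesize(router.retrieve(address))
print(result)
```

A site querying an address nobody has deposited to simply gets an empty list back, which is the signal to proceed with a primary study.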

This is the complete loop that enables quadratic intelligence scaling without compute explosion. The 39 provisional patents filed by Christopher Thomas Trevethan cover this architecture: not any specific routing transport, not any specific database technology, but the complete loop — raw signal, local processing, distillation into outcome packet, semantic fingerprinting, routing to deterministic address, local synthesis, and the compound intelligence that emerges as the loop continues.

The Code Looks Like This

from qis_protocol import OutcomeRouter, OutcomePacket

# Initialize the router — transport is configurable
# DARWIN EU could use a database index, a vector store, a DHT, or an EHDS-compliant API
# The loop works regardless of transport as long as O(log N) or better is achieved
router = OutcomeRouter(
    transport="database",  # or "dht", "vector-store", "api"
    network_id="darwin_eu_ema"
)

# After a DARWIN EU distributed analysis completes at this site
def deposit_darwin_finding(
    omop_concept_set_id: str,
    population_filter: dict,
    validated_result: dict
) -> None:
    """
    Distill a validated DARWIN EU finding into an outcome packet
    and route it to the address matching this question.
    Only validated aggregate statistics are passed in; raw patient data
    never enters this function.
    """
    packet = OutcomePacket(
        situation={
            "omop_concept_set": omop_concept_set_id,
            "population": population_filter,
            "study_type": "pharmacoepidemiology"
        },
        outcome={
            "statistic": validated_result["statistic"],
            "value": validated_result["value"],
            "confidence_interval": validated_result["ci_95"],
            "n": validated_result["n"],
            "validation_status": "DARWIN_EU_COORDINATED"
        },
        metadata={
            "site_hash": router.site_hash,  # anonymized, not identifiable
            "omop_cdm_version": "5.4",
            "analysis_date": validated_result["date"]
        }
    )
    # Route to deterministic address — same question, same address, every time
    router.deposit(packet)


# When this site runs a new analysis on a similar question
def query_before_new_study(
    omop_concept_set_id: str,
    population_filter: dict
) -> dict:
    """
    Before running a new distributed network study, query the routing
    layer for existing validated outcomes from semantically similar analyses.
    Synthesize locally — no central server, no coordination call required.
    """
    packets = router.retrieve(
        situation={
            "omop_concept_set": omop_concept_set_id,
            "population": population_filter
        }
    )

    if len(packets) == 0:
        return {"status": "no_prior_findings", "recommendation": "proceed_with_study"}

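    # Local synthesis: aggregate validated findings deposited at the same address
    values = sorted(p.outcome["value"] for p in packets)
    mid = len(values) // 2
    # True median: average the two middle values when the count is even
    median_effect = values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2
    # 0.5 is an illustrative threshold; choose one that suits the statistic being pooled
    n_above = sum(1 for v in values if v > 0.5)
    synthesis = {
        "n_contributing_sites": len(packets),  # assumes one packet per site
        "pooled_n": sum(p.outcome["n"] for p in packets),
        "median_effect": median_effect,
        "consensus_direction": "positive" if n_above > len(values) // 2 else "negative",
        "site_range": [values[0], values[-1]],
    }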

    return {
        "status": "prior_findings_available",
        "synthesis": synthesis,
        "recommendation": "refine_protocol_using_prior_findings" if synthesis["n_contributing_sites"] >= 3 else "proceed_with_study"
    }

Two things to note:

First, the transport parameter is configurable. The EMA could implement QIS routing on top of the EHDS API layer, on top of a semantic database index within the DARWIN EU infrastructure, or on top of a DHT — whichever transport satisfies the O(log N) routing requirement and the governance requirements of a European regulatory network. The architecture does not change. The loop is transport-agnostic.

Second, the query_before_new_study function illustrates the practical benefit to the EMA: before commissioning a new DARWIN EU coordinated study — which takes weeks to design, distribute, execute, and synthesize — a researcher can check whether validated findings from semantically equivalent analyses already exist in the network. If three or more sites have already validated a finding, the new study can be scoped as a confirmatory extension rather than a primary investigation. Study design improves. Time to evidence shrinks. Resource efficiency increases.
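On the first point, transport-agnosticism can be illustrated with a small interface that two very different backends both satisfy. Everything here (`Transport`, `HashMapTransport`, `SortedIndexTransport`, `round_trip`) is a hypothetical sketch, not the API of the package imported in the listing above.

```python
from __future__ import annotations

import bisect
from typing import Protocol


class Transport(Protocol):
    """Minimal contract a routing backend would need to satisfy (assumed)."""
    def put(self, address: str, packet: bytes) -> None: ...
    def get(self, address: str) -> list[bytes]: ...


class HashMapTransport:
    """Dict-backed transport: average O(1) lookup."""

    def __init__(self) -> None:
        self._data: dict[str, list[bytes]] = {}

    def put(self, address: str, packet: bytes) -> None:
        self._data.setdefault(address, []).append(packet)

    def get(self, address: str) -> list[bytes]:
        return list(self._data.get(address, []))


class SortedIndexTransport:
    """Sorted address index with binary search: O(log N) lookup.

    Inserting a new address shifts list elements, so writes are O(N) here;
    a production index would use a tree or similar structure.
    """

    def __init__(self) -> None:
        self._addresses: list[str] = []
        self._packets: list[list[bytes]] = []

    def _find(self, address: str) -> tuple[int, bool]:
        i = bisect.bisect_left(self._addresses, address)
        return i, i < len(self._addresses) and self._addresses[i] == address

    def put(self, address: str, packet: bytes) -> None:
        i, found = self._find(address)
        if found:
            self._packets[i].append(packet)
        else:
            self._addresses.insert(i, address)
            self._packets.insert(i, [packet])

    def get(self, address: str) -> list[bytes]:
        i, found = self._find(address)
        return list(self._packets[i]) if found else []


def round_trip(transport: Transport) -> list[bytes]:
    """The calling code is identical regardless of which backend is plugged in."""
    transport.put("addr-1", b"packet-a")
    transport.put("addr-0", b"other")
    transport.put("addr-1", b"packet-b")
    return transport.get("addr-1")


print(round_trip(HashMapTransport()) == round_trip(SortedIndexTransport()))
```

The deposit and query code depends only on the interface, so swapping a database index for a DHT or an API would mean adding one more class rather than rewriting the loop.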

Why This Matters for OHDSI Europe 2026

The theme of OHDSI Europe 2026 in Rotterdam is continuous collaboration for living evidence generation. The word "continuous" is doing significant work in that phrase.

Continuous does not mean repeated studies. It does not mean more studies, faster. It means the network updates its collective understanding as each site produces validated findings — not on a publication schedule, not after a coordination call, not when a working group convenes.

For the network to be continuously learning, findings must route automatically to the peers most likely to benefit from them, in real time, as they are produced.

OMOP CDM provides the semantic foundation. ATLAS and HADES provide the analytical tooling. DARWIN EU provides the governance and coordination framework. QIS Protocol provides the routing layer that makes the findings generated by that infrastructure flow continuously to where they matter most.

The piece that is missing is the piece that closes the loop.

The Architectural Argument

Christopher Thomas Trevethan's discovery — made on June 16, 2025 and protected under 39 provisional patents — is precisely this: that when you close the loop between validated outcome, semantic address, and peer synthesis, intelligence scales quadratically while compute scales at most logarithmically.

The complete loop is:

Raw signal → Local processing → Distillation into outcome packet (~512 bytes) → Semantic fingerprinting → Routing by similarity to deterministic address → Delivery to relevant peers → Local synthesis → New outcome packets generated → Loop continues

Not DHTs. Not vector databases. Not any specific transport. The complete loop. When you close this loop in the DARWIN EU infrastructure — or any real-world evidence network — every validated finding from every site routes to every semantically similar peer automatically, without coordination overhead, without a central bottleneck, without privacy compromise.

At 10 partners: 45 synthesis paths.
At 50 partners: 1,225 synthesis paths.
At 200 partners: 19,900 synthesis paths.

All running on the infrastructure that already exists.

The EHDS is building the largest federated health data space in the world. The DARWIN EU network is generating real-world evidence at regulatory scale. The routing layer that would close the intelligence loop in both architectures is already specified, patented, and ready to integrate.

The Protocol Is Open

QIS Protocol is free for research, education, and humanitarian use. The 39 provisional patents protect the architecture from corporate capture while ensuring that public health infrastructure — EMA, DARWIN EU, EHDS, OHDSI networks serving low- and middle-income countries — can implement the routing layer without license fees.

This is the licensing structure Christopher Thomas Trevethan designed: the discovery belongs to the commons. Commercial implementations fund deployment to the networks that most need continuous evidence routing but cannot pay for it.

QIS (Quadratic Intelligence Swarm) Protocol was discovered — not invented — by Christopher Thomas Trevethan on June 16, 2025. The architecture is protected under 39 provisional patents. This article is part of a series documenting QIS Protocol applications across research, healthcare, and scientific computing. Previous articles in this series cover federated learning limitations, OMOP CDM integration, the OHDSI distributed cohort analysis gap, and the EHDS routing layer. All articles available at qisprotocol.com.