Rory | QIS PROTOCOL

QIS for Drug Discovery: Why Clinical Trials Fail and What Distributed Outcome Routing Changes

Understanding QIS — Part 13


The 88% Problem

Approximately 88% of drugs that enter clinical testing never reach approval: DiMasi et al. (2016) put the overall clinical success rate at about 12%. Not 20%. Not 40%. Eighty-eight percent. Many of the costliest failures come late. A drug clears efficacy signals in a smaller controlled Phase II trial and then fails at scale in Phase III, repeatedly, expensively, and in ways that are structurally predictable.

The average cost to bring a single drug to market is approximately $2.5 billion when accounting for the cost of failures (DiMasi et al., 2016, Journal of Health Economics). The majority of that cost is not R&D inefficiency in a naive sense — it is the cost of learning, in Phase III, things that the data from Phase II could not tell you. Things that were sitting in patient records at trial sites around the world, unreachable, because each site is a data island.

This is an architecture problem. And QIS — Quadratic Intelligence Synthesis — is a protocol-level answer to it.


Why Centralized Trial Design Fails at the Edges

The structural failure mode runs like this.

A Phase II trial recruits N patients — typically 100 to 300 — from a small set of trial sites, usually within one health system or a tightly coordinated consortium. Enrollment criteria are broad enough to hit sample size targets on schedule. The trial produces a positive signal. The drug advances.

Phase III recruits N = 1,000 to 3,000 or more. The patient population is more heterogeneous. What looked like a signal in Phase II was a signal in a specific, inadvertently selected subpopulation — patients who happened to match a biomarker profile that was overrepresented at the Phase II sites due to geography, referral patterns, or institutional demographics. In Phase III, that subpopulation is diluted. The signal degrades. The trial fails.

The fix, in principle, is straightforward: identify the biomarker profile that predicts response before Phase III. In practice, this requires patient-level biomarker data from a population large enough and diverse enough to detect the subgroup. That data exists — it is distributed across hundreds of trial sites and health systems worldwide. But it cannot move. Data sharing agreements between institutions are bilateral, slow, legally complex, and jurisdiction-specific. A 1,000-site network would require 499,500 bilateral agreements to form a fully connected graph. That number is not a metaphor. It is N(N-1)/2 for N=1,000.

No one builds that graph. Instead, every trial site remains an island.


QIS Architecture in Clinical Context

QIS is a distributed intelligence protocol discovered by Christopher Thomas Trevethan on June 16, 2025. The architecture has four components that operate as a closed loop:

  1. DHT routing — queries are routed via distributed hash table to nodes with demonstrated domain expertise in the relevant embedding space. Routing is O(log N), ~10 hops typical.
  2. Vector election — nodes are weighted by historical accuracy vectors. Nodes whose outcomes have been confirmed by subsequent replication accumulate higher routing weight.
  3. Outcome synthesis — weighted contributions from elected nodes are synthesized into a network-level conclusion across N(N-1)/2 unique synthesis pathways.
  4. Accuracy feedback — confirmed outcomes update accuracy vectors, which changes routing weight, which changes synthesis composition. The loop closes.

The breakthrough is the complete loop, not any single component. DHT alone is a lookup table. Vector election alone is a weighting scheme. Outcome synthesis alone is a voting mechanism. What QIS claims as new is the four operating together, continuously, without a central coordinator.
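The "~10 hops" figure in the component list is consistent with a lookup cost that grows as the base-2 logarithm of the node count. A minimal sketch, assuming a Kademlia-style DHT in which each hop roughly halves the remaining key distance (the article does not name the DHT variant, and `expected_hops` is a hypothetical helper, not part of any QIS specification):

```python
import math

def expected_hops(node_count: int) -> int:
    """Rough hop estimate, assuming each hop halves the remaining key distance."""
    if node_count < 2:
        return 0  # a single node needs no routing
    return math.ceil(math.log2(node_count))

# Compare the 1,000-site example with the 100,000-node simulation size.
for n in (1_000, 100_000):
    print(n, expected_hops(n))
```

Under this assumption, growing the network from 1,000 to 100,000 nodes adds only a handful of hops per query, which is the practical meaning of O(log N) here.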

In the drug discovery context, the mapping is direct.

Each trial site is a QIS node. Sites do not transmit patient records, biomarker raw data, or any personally identifiable information. The unit of exchange is an outcome packet: approximately 512 bytes, pseudonymous. For a clinical trial application, an outcome packet contains:

  • node_id: SHA-256 hash of the site identifier (pseudonymous)
  • routing_bucket: hash of the biomarker profile bucket (the semantic fingerprint that routes the query)
  • treatment_hash: hash of the treatment protocol (drug, dose, schedule)
  • outcome_label: response / no-response / adverse-event / partial-response
  • confidence: normalized float, derived from the site's statistical power and sample count for this biomarker bucket
  • timestamp: epoch seconds

Raw patient data never leaves the institutional edge node. What travels the network is: this site, using this treatment protocol, on patients matching this biomarker bucket, observed this outcome, with this confidence. That is sufficient for synthesis.
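The size claim is easy to sanity-check. The sketch below serializes a packet with the six fields listed above and measures it. All values are illustrative, the compact JSON encoding is an assumption (the article does not specify a wire format), and truncating two of the hashes to 16 hex characters mirrors the code example later in this post:

```python
import hashlib
import json

# Hypothetical packet with the six fields listed above; every value is made up.
packet = {
    "node_id": hashlib.sha256(b"EXAMPLE-SITE-01").hexdigest(),
    "routing_bucket": hashlib.sha256(b"example-biomarker-bucket").hexdigest()[:16],
    "treatment_hash": hashlib.sha256(b"example-treatment").hexdigest()[:16],
    "outcome_label": "partial-response",
    "confidence": 0.42,
    "timestamp": 1_750_000_000,
}

# Compact JSON (no extra whitespace), measured as UTF-8 bytes.
encoded = json.dumps(packet, separators=(",", ":"), sort_keys=True).encode("utf-8")
print(len(encoded), "bytes")
```

With this encoding the packet fits comfortably inside a 512-byte budget, leaving room for a version field or a signature.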


The N² Advantage

Connecting 1,000 sites through bilateral data sharing requires 499,500 agreements before the graph is fully connected, each negotiated, signed, legally reviewed, and IRB-cleared. In practice, no one connects 1,000 sites this way. Consortia of 10–20 sites are the operational ceiling.

A 1,000-node QIS network has 499,500 unique synthesis pathways — not agreements, not legal documents. Synthesis pathways that the protocol traverses when a query is routed. Each new site joining a network of N existing nodes adds N new synthesis pathways automatically, by protocol. The math is N(N-1)/2, and it is a consequence of the architecture, not an engineering target. In simulation at 100,000 nodes, this scaling relationship holds with R²=1.0.
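The arithmetic behind both numbers can be checked in a few lines. This is only the pair-counting formula from the text; `synthesis_pathways` is a name chosen here for illustration:

```python
def synthesis_pathways(node_count: int) -> int:
    """Number of unordered node pairs: N(N-1)/2."""
    return node_count * (node_count - 1) // 2

# Pathways in a 1,000-node network.
print(synthesis_pathways(1_000))

# Pathways added when one more site joins a 1,000-node network.
print(synthesis_pathways(1_001) - synthesis_pathways(1_000))
```

The first line prints 499500 and the second prints 1000: a site joining N existing nodes adds exactly N pairs.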

A centralized system scales with the number of agreements institutions will execute. QIS scales quadratically with node count, and joining costs are near-zero by comparison.


Accuracy Vectors in Clinical Practice

Sites whose routing bucket outcomes are confirmed by other nodes accumulate higher accuracy vectors in that bucket. In practice: a research hospital specializing in EGFR-mutated NSCLC accumulates high accuracy vector weight in that biomarker bucket. When a query is routed for a new EGFR-targeting treatment, DHT routing preferentially reaches that site — and others with demonstrated accuracy in that embedding space. Sites whose outcomes are not replicated receive lower routing weight. They still participate; they contribute less to synthesis.
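The article does not give the update rule for accuracy vectors, so the following is a sketch under a stated assumption: an exponential moving average per routing bucket, where a confirmed outcome pulls the weight toward 1 and an unreplicated one pulls it toward 0. The function name and the `learning_rate` parameter are hypothetical:

```python
def update_accuracy(current: float, confirmed: bool, learning_rate: float = 0.1) -> float:
    """Move a bucket's accuracy weight toward 1.0 on confirmation, toward 0.0 otherwise."""
    target = 1.0 if confirmed else 0.0
    return (1.0 - learning_rate) * current + learning_rate * target

# A site starts at a neutral weight and sees three confirmations and one miss.
weight = 0.5
for confirmed in (True, True, False, True):
    weight = update_accuracy(weight, confirmed)
    print(round(weight, 4))
```

A moving average keeps the weight in [0, 1] and lets recent replication matter more than old history, which matches the "they still participate; they contribute less" behavior described above.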

The clinical consequence: a new drug candidate can be routed, in a single query, to every node with demonstrated accuracy in the relevant biomarker space — simultaneously, without moving patient data, without a single data-sharing agreement.


Rare Disease: Finding N=2 Globally

The rare disease case is where the architecture difference is most acute.

A drug targeting a biomarker expressed in 0.01% of a population cannot be powered with a single-site trial. The patients exist — globally, they may number in the thousands — but they are distributed one or two per institution across hundreds of sites. No central registry captures them. No data-sharing agreement network reaches them. Individual sites have N too small to emit statistically powered outcome packets for that bucket, but they can emit outcome packets with low individual confidence.

QIS synthesizes across low-confidence contributions. Five sites each with N=1 or N=2 in a rare biomarker bucket, each emitting an outcome packet with confidence 0.3, synthesize into a network-level outcome with materially higher confidence than any single site can produce — weighted by each site's accuracy vector in adjacent buckets, which serves as a prior on the reliability of their rare-bucket observation.
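How low-confidence packets combine is not specified in the article. One simple rule that produces the behavior described is noisy-OR pooling, used here purely as an illustration: it assumes the observations are independent, report the same outcome label, and are each discounted by a prior weight in [0, 1] (for example, the site's accuracy weight in adjacent buckets). `pool_confidence` is a hypothetical name:

```python
import math

def pool_confidence(confidences, prior_weights):
    """Noisy-OR pooling of concordant observations, each discounted by a prior weight."""
    if len(confidences) != len(prior_weights):
        raise ValueError("one prior weight per confidence value is required")
    # Probability that every discounted observation is uninformative.
    remaining_doubt = math.prod(1.0 - w * c for c, w in zip(confidences, prior_weights))
    return 1.0 - remaining_doubt

site_confidences = [0.3] * 5  # five sites, each N=1 or N=2

# Fully trusted sites versus sites trusted at half weight.
print(round(pool_confidence(site_confidences, [1.0] * 5), 4))
print(round(pool_confidence(site_confidences, [0.5] * 5), 4))
```

With full trust, five packets at 0.3 pool to roughly 0.83, well above any single site. The independence assumption does real work here: correlated sites (shared referral networks, shared assays) would justify less of an increase.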

No federated learning framework reaches this case cleanly, because federated approaches require a model to train locally and aggregate gradients — which requires enough local data for a meaningful gradient. QIS requires only that the site can emit an outcome packet. N=1 is sufficient to participate. The synthesis handles the aggregation.


Code Example: Outcome Packet Construction

The following Python constructs a trial outcome packet and computes the routing bucket hash from a biomarker profile. This is the unit of exchange on a QIS clinical node.

import hashlib
import json
import time
from typing import Literal

BIOMARKER_DIMENSIONS = [
    "EGFR_mutation_status",   # pos / neg / unknown
    "PD_L1_expression_pct",   # 0-100, binned to 10-unit buckets
    "TMB_mut_per_mb",         # binned: low / medium / high
    "MSI_status",             # MSS / MSI-L / MSI-H
    "KRAS_codon12_variant",   # specific variant or none
]

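def bin_pdl1(raw_pct: float) -> str:
    """Bin PD-L1 expression to 10-unit bucket for routing privacy."""
    # Clamp to 0-100 and fold 100 into the top bucket so it never yields PDL1_100_110.
    clamped = min(max(raw_pct, 0.0), 100.0)
    bucket = min(int(clamped // 10) * 10, 90)
    return f"PDL1_{bucket}_{bucket + 10}"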

def bin_tmb(raw_mut_per_mb: float) -> str:
    if raw_mut_per_mb < 6:
        return "TMB_low"
    elif raw_mut_per_mb < 16:
        return "TMB_medium"
    return "TMB_high"

def compute_routing_bucket(biomarker_profile: dict) -> str:
    """
    Hash a binned biomarker profile into a routing bucket.
    Binning before hashing ensures patients with similar profiles
    route to the same nodes — privacy-preserving semantic proximity.
    """
    binned = {
        "EGFR": biomarker_profile.get("EGFR_mutation_status", "unknown"),
        "PDL1": bin_pdl1(biomarker_profile.get("PD_L1_expression_pct", 0.0)),
        "TMB":  bin_tmb(biomarker_profile.get("TMB_mut_per_mb", 0.0)),
        "MSI":  biomarker_profile.get("MSI_status", "unknown"),
        "KRAS": biomarker_profile.get("KRAS_codon12_variant", "none"),
    }
    canonical = json.dumps(binned, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

def build_outcome_packet(
    site_id: str,
    treatment_protocol: dict,
    biomarker_profile: dict,
    outcome: Literal["response", "no-response", "partial-response", "adverse-event"],
    confidence: float,
) -> dict:
    """
    Construct a QIS outcome packet for a trial site node.
    Raw patient data is not included — only derived routing signals.
    """
    node_id = hashlib.sha256(site_id.encode()).hexdigest()
    treatment_hash = hashlib.sha256(
        json.dumps(treatment_protocol, sort_keys=True).encode()
    ).hexdigest()[:16]
    routing_bucket = compute_routing_bucket(biomarker_profile)

    return {
        "node_id":        node_id,
        "routing_bucket": routing_bucket,
        "treatment_hash": treatment_hash,
        "outcome_label":  outcome,
        "confidence":     round(min(max(confidence, 0.0), 1.0), 4),
        "timestamp":      int(time.time()),
        "packet_version": "1.0",
    }


# Example: site observing a responder to pembrolizumab in a high TMB, MSI-H patient
packet = build_outcome_packet(
    site_id="MSKCC-ONCOLOGY-UNIT-7",
    treatment_protocol={
        "drug": "pembrolizumab",
        "dose_mg_per_kg": 2,
        "schedule": "Q3W",
        "line_of_therapy": 2,
    },
    biomarker_profile={
        "EGFR_mutation_status": "neg",
        "PD_L1_expression_pct": 72.0,
        "TMB_mut_per_mb": 18.4,
        "MSI_status": "MSI-H",
        "KRAS_codon12_variant": "none",
    },
    outcome="response",
    confidence=0.81,
)

print(json.dumps(packet, indent=2))

The routing bucket hash is the key mechanism: by binning continuous biomarker values before hashing, patients with clinically similar profiles — PD-L1 between 70% and 80%, high TMB, MSI-H — hash to the same routing bucket. Queries for that profile route to nodes with demonstrated accuracy in that bucket. No raw measurement ever leaves the site.
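The bin-then-hash behavior can be seen directly with a reduced version of the bucket function. The sketch below is a simplified stand-in that runs on its own: it uses only three dimensions, so its hashes differ from those of `compute_routing_bucket` above, and `simple_bucket` is a hypothetical name:

```python
import hashlib
import json

def simple_bucket(pdl1_pct: float, tmb_class: str, msi_status: str) -> str:
    """Simplified routing bucket: bin PD-L1 to a 10-unit bucket, then hash."""
    pdl1_bin = int(pdl1_pct // 10) * 10
    canonical = json.dumps(
        {"MSI": msi_status, "PDL1": pdl1_bin, "TMB": tmb_class}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

a = simple_bucket(72.0, "TMB_high", "MSI-H")  # PD-L1 in the 70-80 bin
b = simple_bucket(78.5, "TMB_high", "MSI-H")  # same bin, different raw value
c = simple_bucket(81.0, "TMB_high", "MSI-H")  # next bin up

print(a == b, a == c)
```

Profiles inside the same bin share a bucket, while a profile one bin over does not. The trade-off is a hard edge: values of 79.9 and 80.0 land in different buckets even though they are clinically close, so bin boundaries are a design decision rather than a detail.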


Comparison: QIS vs. Current Approaches

| Dimension | QIS Protocol | Centralized Trial Database | Federated Analysis |
| --- | --- | --- | --- |
| Patient data sovereignty | Raw data never leaves institutional edge node; only outcome packets travel | Patient-level data transmitted to central repository; requires data transfer agreements | Gradients/model weights transmitted; intermediate leakage risk under reconstruction attacks |
| Statistical power | N(N-1)/2 synthesis pathways across all nodes; scales quadratically | Linear: power limited by data transferred to the central repository | Linear: power limited by participating site data volumes; requires sufficient local N |
| Cross-site coordination cost | Near-zero: protocol participation, no bilateral agreements | High: 499,500 agreements for 1,000 sites; legal, compliance, and IRB overhead per pair | Medium: requires shared model architecture and aggregation coordinator; still needs data governance |
| Rare disease feasibility | High: N=1 or N=2 sites can emit outcome packets; synthesis aggregates low-confidence contributions | Low: rare disease patients too sparse per site to power central analysis meaningfully | Low: insufficient local N for meaningful gradient contribution; rare disease nodes underweight |

What the Architecture Cannot Do

QIS synthesizes outcomes. It does not validate biomarker classification quality at the edge node. A site that miscategorizes PD-L1 expression emits a packet that appears valid but carries a corrupted routing bucket. The protocol does not catch this at submission — it catches it longitudinally: sites whose outcomes are systematically inconsistent with other nodes in the same bucket accumulate lower accuracy vectors over time. Quality control operates through performance tracking, not upfront gate-keeping. This scales; a centralized quality authority does not.

QIS does not replace Phase III trials. It changes what enters Phase III: tighter biomarker-stratified enrollment criteria, identified without moving patient data.


The Drug That Exists Right Now

There is a drug — almost certainly more than one — that produces a durable response in approximately 2% of patients with a specific biomarker combination. Those patients are in records at trial sites around the world, at institutions that have never communicated with each other about this combination, because finding them requires moving data that cannot move.

The drug exists in the data right now. The barrier is architectural: there is no system that can route a query — "show me every site that has observed a response in patients matching this biomarker profile, without sending me their records" — across a global trial network.

QIS is that routing system. The DHT routes the query to nodes with demonstrated accuracy in the relevant biomarker space. The accuracy feedback loop ensures synthesis weights the most reliable observations highest. No patient data moves. No bilateral agreement is required.

The roughly 88% attrition rate across clinical development is not going to zero with any single technology. But the fraction attributable to poor biomarker stratification, to running Phase III in the wrong patient population because the right population could not be identified without moving data that cannot move, has a protocol-level solution available now.


Understanding QIS — Part 13 | #001: What Is QIS? | #003: Architecture Deep Dive | #005: vs. Federated Learning | #014: Privacy Architecture | #017: Replication Crisis

QIS (Quadratic Intelligence Synthesis) was discovered by Christopher Thomas Trevethan on June 16, 2025. 39 provisional patents have been filed. Protocol specification: yonderzenith.github.io/QIS-Protocol-Website. QIS is free for humanitarian, nonprofit, research, and education use.
