Rory | QIS PROTOCOL

Posted on Apr 2 • Edited on Apr 8

QIS for Drug Discovery: Why Clinical Trials Fail and What Distributed Outcome Routing Changes

#healthtech #distributedsystems #programming #science

Understanding QIS — Part 13

New to QIS? Start with the complete guide to Quadratic Intelligence Swarm — then use the QIS Glossary as your reference for every term.

The 88% Problem

Approximately 88% of drugs that pass Phase II clinical trials fail in Phase III. Not 20%. Not 40%. Eighty-eight percent. A drug clears efficacy signals in a smaller controlled trial, satisfies a regulatory bar, and then fails at scale — repeatedly, expensively, and in ways that are structurally predictable.

The average cost to bring a single drug to market is approximately $2.5 billion when accounting for the cost of failures (DiMasi et al., 2016, Journal of Health Economics). The majority of that cost is not R&D inefficiency in a naive sense — it is the cost of learning, in Phase III, things that the data from Phase II could not tell you. Things that were sitting in patient records at trial sites around the world, unreachable, because each site is a data island.

This is an architecture problem. And QIS — Quadratic Intelligence Swarm — is a protocol-level answer to it.

Why Centralized Trial Design Fails at the Edges

The structural failure mode runs like this.

A Phase II trial recruits N patients — typically 100 to 300 — from a small set of trial sites, usually within one health system or a tightly coordinated consortium. Enrollment criteria are broad enough to hit sample size targets on schedule. The trial produces a positive signal. The drug advances.

Phase III recruits N = 1,000 to 3,000 or more. The patient population is more heterogeneous. What looked like a signal in Phase II was a signal in a specific, inadvertently selected subpopulation — patients who happened to match a biomarker profile that was overrepresented at the Phase II sites due to geography, referral patterns, or institutional demographics. In Phase III, that subpopulation is diluted. The signal degrades. The trial fails.

The fix, in principle, is straightforward: identify the biomarker profile that predicts response before Phase III. In practice, this requires patient-level biomarker data from a population large enough and diverse enough to detect the subgroup. That data exists — it is distributed across hundreds of trial sites and health systems worldwide. But it cannot move. Data sharing agreements between institutions are bilateral, slow, legally complex, and jurisdiction-specific. A 1,000-site network would require 499,500 bilateral agreements to form a fully connected graph. That number is not a metaphor. It is N(N-1)/2 for N=1,000.

No one builds that graph. Instead, every trial site remains an island.

QIS Architecture in Clinical Context

QIS is a distributed intelligence protocol discovered by Christopher Thomas Trevethan on June 16, 2025. The architecture operates as a closed loop:

Edge nodes generate insight locally — each trial site processes its own patient outcomes. No raw data leaves.
Distill into ~512-byte outcome packets — pre-processed results, not model weights, not patient records.
Route by semantic similarity to a deterministic address — any efficient routing mechanism works (DHTs at O(log N), database indices, vector search, pub/sub). The routing mechanism does not matter — what matters is that a site can query an address defined by domain experts for their exact trial conditions.
Pull outcome packets from twins and synthesize locally — every site facing sufficiently similar conditions has deposited outcomes at that address. N sites produce N(N-1)/2 unique synthesis paths.
Deposit outcomes back — the loop closes. Every participant makes every other participant smarter.

The breakthrough is the complete loop — not any single component. Routing alone is a lookup table. Synthesis alone is aggregation. The complete loop operating continuously without a central coordinator is the architecture that does not exist anywhere else.
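The loop described above can be sketched in a few lines. This is a minimal in-memory illustration, not the protocol's implementation: the `OutcomeStore` class, its method names, and the example address are all hypothetical, and a plain Python dictionary stands in for whatever routing layer (DHT, index, pub/sub) a deployment would use. Synthesis is reduced to a simple tally of outcome labels, which is an assumption made for brevity.

```python
from collections import defaultdict


class OutcomeStore:
    """Hypothetical stand-in for the routing layer: address -> deposited packets."""

    def __init__(self):
        self._by_address = defaultdict(list)

    def deposit(self, address: str, packet: dict) -> None:
        # A site deposits its distilled outcome at the shared address.
        self._by_address[address].append(packet)

    def pull(self, address: str, exclude_node: str) -> list:
        # A site pulls the packets deposited by other sites (its "twins").
        return [p for p in self._by_address[address] if p["node_id"] != exclude_node]


def synthesize(packets: list) -> dict:
    """Local synthesis, reduced here to a tally of outcome labels (an assumption)."""
    tally = defaultdict(int)
    for p in packets:
        tally[p["outcome_label"]] += 1
    return dict(tally)


store = OutcomeStore()
address = "bucket-egfr-pos"  # illustrative address, not a real routing bucket

for node, label in [
    ("site-a", "response"),
    ("site-b", "response"),
    ("site-c", "no-response"),
]:
    store.deposit(address, {"node_id": node, "outcome_label": label})

# site-a queries the address and synthesizes what its twins observed.
print(synthesize(store.pull(address, exclude_node="site-a")))
```

Nothing in the sketch coordinates the sites: each one only reads from and writes to an address, which is the property the loop depends on.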

In the drug discovery context, the mapping is direct.

Each trial site is a QIS node. Sites do not transmit patient records, biomarker raw data, or any personally identifiable information. The unit of exchange is an outcome packet: approximately 512 bytes, pseudonymous. For a clinical trial application, an outcome packet contains:

node_id: SHA-256 hash of the site identifier (pseudonymous)
routing_bucket: hash of the biomarker profile bucket (the semantic fingerprint that routes the query)
treatment_hash: hash of the treatment protocol (drug, dose, schedule)
outcome_label: response / no-response / adverse-event / partial-response
confidence: normalized float, derived from the site's statistical power and sample count for this biomarker bucket
timestamp: epoch seconds

Raw patient data never leaves the institutional edge node. What travels the network is: this site, using this treatment protocol, on patients matching this biomarker bucket, observed this outcome, with this confidence. That is sufficient for synthesis.
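As a rough check on the ~512-byte figure, the sketch below serializes a packet carrying the six fields listed above and measures it. The field values are invented for illustration, and compact JSON is an assumption; the text here does not specify a wire format.

```python
import hashlib
import json


def short_hash(value: str) -> str:
    """First 16 hex characters of a SHA-256 digest (illustrative truncation)."""
    return hashlib.sha256(value.encode()).hexdigest()[:16]


packet = {
    "node_id": hashlib.sha256(b"EXAMPLE-SITE-001").hexdigest(),  # 64 hex chars
    "routing_bucket": short_hash("example-biomarker-bucket"),
    "treatment_hash": short_hash("example-treatment-protocol"),
    "outcome_label": "partial-response",
    "confidence": 0.42,
    "timestamp": 1743552000,  # illustrative epoch seconds
}

# Compact separators drop the whitespace json.dumps adds by default.
wire = json.dumps(packet, separators=(",", ":")).encode("utf-8")
print(len(wire))
assert len(wire) <= 512  # these six fields fit with room to spare
```

With these fields the serialized packet comes in well under the budget, which leaves headroom for a version tag or signature if a deployment needs one.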

The N² Advantage

A 1,000-site centralized database requires 499,500 bilateral data-sharing agreements before the graph is fully connected — each negotiated, signed, legally reviewed, and IRB-cleared. In practice, no one connects 1,000 sites this way. Consortia of 10–20 sites are the operational ceiling.

A 1,000-node QIS network has 499,500 unique synthesis pathways — not agreements, not legal documents. Synthesis pathways that the protocol traverses when a query is routed. Each new site joining a network of N existing nodes adds N new synthesis pathways automatically, by protocol. The math is N(N-1)/2, and it is a consequence of the architecture, not an engineering target. In simulation at 100,000 nodes, this scaling relationship holds with R²=1.0.

A centralized system scales with the number of agreements institutions will execute. QIS scales quadratically with node count, and joining costs are near-zero by comparison.
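The arithmetic behind these figures is easy to verify. The short sketch below treats a synthesis pathway as an unordered pair of nodes, which is how the N(N-1)/2 formula reads; the function name is ours.

```python
def synthesis_paths(n: int) -> int:
    """Number of unordered pairs among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


print(synthesis_paths(20))     # 190, a consortium at the operational ceiling
print(synthesis_paths(1_000))  # 499500

# One more site joining a 1,000-node network adds exactly 1,000 pathways.
print(synthesis_paths(1_001) - synthesis_paths(1_000))  # 1000
```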

Accuracy Vectors in Clinical Practice

Sites whose routing bucket outcomes are confirmed by other nodes accumulate higher accuracy vectors in that bucket. In practice, a research hospital specializing in EGFR-mutated NSCLC accumulates high accuracy vector weight in that biomarker bucket. When a query is routed for a new EGFR-targeting treatment, the routing layer (a DHT or any of the other mechanisms listed earlier) preferentially reaches that site, along with others with demonstrated accuracy in that embedding space. Sites whose outcomes are not replicated receive lower routing weight. They still participate; they contribute less to synthesis.
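One way to picture an accuracy vector is a per-site, per-bucket weight that rises when an outcome is replicated and falls when it is not. The sketch below uses an exponential moving average for that update. The update rule, the 0.2 smoothing factor, the neutral 0.5 starting weight, and all names are assumptions for illustration; the text does not specify how accuracy vectors are computed.

```python
from collections import defaultdict

LEARNING_RATE = 0.2  # hypothetical smoothing factor

# accuracy[node_id][bucket] -> weight in [0, 1], starting from a neutral 0.5
accuracy = defaultdict(lambda: defaultdict(lambda: 0.5))


def record_replication(node_id: str, bucket: str, confirmed: bool) -> float:
    """Move the weight toward 1.0 when an outcome is confirmed, toward 0.0 otherwise."""
    target = 1.0 if confirmed else 0.0
    current = accuracy[node_id][bucket]
    accuracy[node_id][bucket] = current + LEARNING_RATE * (target - current)
    return accuracy[node_id][bucket]


# A specialist site is confirmed five times; another site is not replicated once.
for _ in range(5):
    record_replication("site-specialist", "EGFR_pos", confirmed=True)
record_replication("site-other", "EGFR_pos", confirmed=False)

# Rank sites for this bucket by accumulated weight, highest first.
ranked = sorted(accuracy, key=lambda n: accuracy[n]["EGFR_pos"], reverse=True)
print(ranked)  # ['site-specialist', 'site-other']
```

The lower-ranked site is still in the list: it participates with less weight rather than being excluded.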

The clinical consequence: a new drug candidate can be routed, in a single query, to every node with demonstrated accuracy in the relevant biomarker space — simultaneously, without moving patient data, without a single data-sharing agreement.

Rare Disease: Finding N=2 Globally

The rare disease case is where the architecture difference is most acute.

A drug targeting a biomarker expressed in 0.01% of a population cannot be powered with a single-site trial. The patients exist — globally, they may number in the thousands — but they are distributed one or two per institution across hundreds of sites. No central registry captures them. No data-sharing agreement network reaches them. Individual sites have N too small to emit statistically powered outcome packets for that bucket, but they can emit outcome packets with low individual confidence.

QIS synthesizes across low-confidence contributions. Five sites each with N=1 or N=2 in a rare biomarker bucket, each emitting an outcome packet with confidence 0.3, synthesize into a network-level outcome with materially higher confidence than any single site can produce. The math of aggregating real outcomes from exact twins does the work — no reputation scoring or weighting mechanism needed. Five independent low-confidence observations from sites facing the same rare condition, synthesized together, produce signal that no single site could generate alone.
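To see why several weak observations can add up, consider one simple combination rule. The sketch below treats each packet's confidence as an independent probability that the reported outcome is real and combines them with a noisy-OR. This rule is our assumption for illustration, not the synthesis method defined by the protocol, and the independence assumption is a strong one that real sites may not satisfy.

```python
import math


def combine_independent(confidences: list) -> float:
    """Noisy-OR: probability that at least one independent observation is correct."""
    return 1.0 - math.prod(1.0 - c for c in confidences)


# Five sites, each reporting the same outcome with confidence 0.3.
print(round(combine_independent([0.3] * 5), 4))  # 0.8319
```

Under these assumptions, five packets at 0.3 combine to roughly 0.83, while a single packet stays at 0.3.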

No federated learning framework reaches this case cleanly, because federated approaches require a model to train locally and aggregate gradients — which requires enough local data for a meaningful gradient. QIS requires only that the site can emit an outcome packet. N=1 is sufficient to participate. The synthesis handles the aggregation.

Code Example: Outcome Packet Construction

The following Python constructs a trial outcome packet and computes the routing bucket hash from a biomarker profile. This is the unit of exchange on a QIS clinical node.

import hashlib
import json
import time
from typing import Literal

BIOMARKER_DIMENSIONS = [
    "EGFR_mutation_status",   # pos / neg / unknown
    "PD_L1_expression_pct",   # 0-100, binned to 10-unit buckets
    "TMB_mut_per_mb",         # binned: low / medium / high
    "MSI_status",             # MSS / MSI-L / MSI-H
    "KRAS_codon12_variant",   # specific variant or none
]

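def bin_pdl1(raw_pct: float) -> str:
    """Bin PD-L1 expression to 10-unit bucket for routing privacy."""
    # Clamp to 0-100 and cap the lower edge at 90 so that 100% lands in
    # PDL1_90_100 instead of a nonexistent PDL1_100_110 bucket.
    clamped = min(max(raw_pct, 0.0), 100.0)
    bucket = min(int(clamped // 10) * 10, 90)
    return f"PDL1_{bucket}_{bucket + 10}"

def bin_tmb(raw_mut_per_mb: float) -> str:
    """Bin tumor mutational burden (mutations/Mb) into low / medium / high."""
    if raw_mut_per_mb < 6:
        return "TMB_low"
    elif raw_mut_per_mb < 16:
        return "TMB_medium"
    return "TMB_high"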

def compute_routing_bucket(biomarker_profile: dict) -> str:
    """
    Hash a binned biomarker profile into a routing bucket.
    Binning before hashing ensures patients with similar profiles
    route to the same nodes — privacy-preserving semantic proximity.
    """
    binned = {
        "EGFR": biomarker_profile.get("EGFR_mutation_status", "unknown"),
        "PDL1": bin_pdl1(biomarker_profile.get("PD_L1_expression_pct", 0.0)),
        "TMB":  bin_tmb(biomarker_profile.get("TMB_mut_per_mb", 0.0)),
        "MSI":  biomarker_profile.get("MSI_status", "unknown"),
        "KRAS": biomarker_profile.get("KRAS_codon12_variant", "none"),
    }
    canonical = json.dumps(binned, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

def build_outcome_packet(
    site_id: str,
    treatment_protocol: dict,
    biomarker_profile: dict,
    outcome: Literal["response", "no-response", "partial-response", "adverse-event"],
    confidence: float,
) -> dict:
    """
    Construct a QIS outcome packet for a trial site node.
    Raw patient data is not included — only derived routing signals.
    """
    node_id = hashlib.sha256(site_id.encode()).hexdigest()
    treatment_hash = hashlib.sha256(
        json.dumps(treatment_protocol, sort_keys=True).encode()
    ).hexdigest()[:16]
    routing_bucket = compute_routing_bucket(biomarker_profile)

    return {
        "node_id":        node_id,
        "routing_bucket": routing_bucket,
        "treatment_hash": treatment_hash,
        "outcome_label":  outcome,
        "confidence":     round(min(max(confidence, 0.0), 1.0), 4),
        "timestamp":      int(time.time()),
        "packet_version": "1.0",
    }


# Example: site observing a responder to pembrolizumab in a high TMB, MSI-H patient
packet = build_outcome_packet(
    site_id="MSKCC-ONCOLOGY-UNIT-7",
    treatment_protocol={
        "drug": "pembrolizumab",
        "dose_mg_per_kg": 2,
        "schedule": "Q3W",
        "line_of_therapy": 2,
    },
    biomarker_profile={
        "EGFR_mutation_status": "neg",
        "PD_L1_expression_pct": 72.0,
        "TMB_mut_per_mb": 18.4,
        "MSI_status": "MSI-H",
        "KRAS_codon12_variant": "none",
    },
    outcome="response",
    confidence=0.81,
)

print(json.dumps(packet, indent=2))

The routing bucket hash is the key mechanism: by binning continuous biomarker values before hashing, patients with clinically similar profiles — PD-L1 between 70% and 80%, high TMB, MSI-H — hash to the same routing bucket. Queries for that profile route to nodes with demonstrated accuracy in that bucket. No raw measurement ever leaves the site.
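The bin-then-hash behavior can be checked directly. The sketch below is a simplified, standalone version of the routing bucket computation (it omits the KRAS field and takes the TMB bin as a ready-made string, both for brevity) and uses hypothetical function names.

```python
import hashlib
import json


def pdl1_bin(pct: float) -> str:
    """10-unit PD-L1 bin; the top bin is capped at 90-100."""
    low = min(int(pct // 10) * 10, 90)
    return f"PDL1_{low}_{low + 10}"


def bucket(egfr: str, pdl1_pct: float, tmb: str, msi: str) -> str:
    """Hash the binned profile, not the raw measurement."""
    canonical = json.dumps(
        {"EGFR": egfr, "PDL1": pdl1_bin(pdl1_pct), "TMB": tmb, "MSI": msi},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


a = bucket("neg", 72.0, "TMB_high", "MSI-H")
b = bucket("neg", 78.5, "TMB_high", "MSI-H")  # same 70-80 bin as a
c = bucket("neg", 81.0, "TMB_high", "MSI-H")  # next bin up

print(a == b, a == c)  # True False
```

Hard bin edges are a trade-off: two patients at 79.9% and 80.1% land in different buckets even though they are clinically close, so bin boundaries deserve the same care from domain experts as the choice of dimensions.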

Comparison: QIS vs. Current Approaches

| Dimension | QIS Protocol | Centralized Trial Database | Federated Analysis |
| --- | --- | --- | --- |
| Patient data sovereignty | Raw data never leaves institutional edge node; only outcome packets travel | Patient-level data transmitted to central repository; requires data transfer agreements | Gradients/model weights transmitted; intermediate leakage risk under reconstruction attacks |
| Statistical power | N(N-1)/2 synthesis pathways across all nodes; scales quadratically | Linear: power limited by data transferred to the central repository | Linear: power limited by participating site data volumes; requires sufficient local N |
| Cross-site coordination cost | Near-zero: protocol participation, no bilateral agreements | High: 499,500 agreements for 1,000 sites; legal, compliance, and IRB overhead per pair | Medium: requires shared model architecture and aggregation coordinator; still needs data governance |
| Rare disease feasibility | High: N=1 or N=2 sites can emit outcome packets; synthesis aggregates low-confidence contributions | Low: rare disease patients too sparse per site to power central analysis meaningfully | Low: insufficient local N for meaningful gradient contribution; rare disease nodes underweight |

What the Architecture Cannot Do

QIS synthesizes outcomes. It does not validate biomarker classification quality at the edge node. A site that miscategorizes PD-L1 expression emits a packet that appears valid but carries a corrupted routing bucket. The protocol handles this through volume: when hundreds of sites deposit accurate outcomes and one deposits a miscategorized one, the aggregate naturally overwhelms the noise. Quality control operates through the math of real outcomes at scale, not upfront gate-keeping. This scales; a centralized quality authority does not.

QIS does not replace Phase III trials. It changes what enters Phase III: tighter biomarker-stratified enrollment criteria, identified without moving patient data.

The Drug That Exists Right Now

There is a drug — almost certainly more than one — that produces a durable response in approximately 2% of patients with a specific biomarker combination. Those patients are in records at trial sites around the world, at institutions that have never communicated with each other about this combination, because finding them requires moving data that cannot move.

The drug exists in the data right now. The barrier is architectural: there is no system that can route a query — "show me every site that has observed a response in patients matching this biomarker profile, without sending me their records" — across a global trial network.

QIS is that routing system. The routing layer, whether a DHT or another efficient mechanism, directs the query to nodes with demonstrated accuracy in the relevant biomarker space. The accuracy feedback loop ensures synthesis weights the most reliable observations highest. No patient data moves. No bilateral agreement is required.

The 88% Phase II to Phase III attrition rate is not going to zero with any single technology. But the fraction attributable to poor biomarker stratification — to running Phase III in the wrong patient population because the right population could not be identified without moving data that cannot move — that fraction has a protocol-level solution available now.

QIS (Quadratic Intelligence Swarm) was discovered by Christopher Thomas Trevethan on June 16, 2025. 39 provisional patents have been filed. Protocol specification: yonderzenith.github.io/QIS-Protocol-Website. QIS is free for humanitarian, nonprofit, research, and education use.
