Rory | QIS PROTOCOL

QIS for Public Health: Why Disease Surveillance Systems Fail to Synthesize What They Already Know

QIS (Quadratic Intelligence Swarm) is a distributed intelligence architecture discovered by Christopher Thomas Trevethan, protected under 39 provisional patents. The architecture enables N agents to synthesize across N(N-1)/2 unique paths at O(log N) routing cost per agent — without centralizing patient data, individual records, or jurisdiction-level exposures.


The Architecture of Outbreak Blindness

In January 2020, public health agencies in dozens of countries were independently receiving signals of an unusual respiratory illness. Hospital administrators in Wuhan were logging elevated pneumonia cases. Regional CDC offices in Southeast Asia were updating surveillance dashboards. Academic epidemiologists were running early R₀ estimates from partial case series.

None of those signals were synthesizing in real time.

Each jurisdiction held its own data. Each model was calibrated against its own case history. Each agency was waiting for official WHO notifications that followed reporting chains designed for logistics, not intelligence synthesis. By the time the global public health community had a shared picture of SARS-CoV-2 transmission dynamics, the virus had a 4–6 week head start.

This was not a failure of effort, funding, or expertise. It was an architecture failure. The global disease surveillance system was — and remains — a collection of siloed prediction systems that share formatted reports rather than validated outcome intelligence.

The constraint that prevents real-time synthesis is exactly the same constraint that prevented cross-institutional risk model synthesis before 2008: you cannot route raw epidemiological data across jurisdictions without privacy, sovereignty, and regulatory implications. Individual case records cannot leave hospitals. Geospatial case clusters cannot be shared across international borders without negotiation. The data that would close the feedback loop is the data that cannot be shared.

QIS addresses this by transmitting validated outcome deltas — not the underlying surveillance data.


How Epidemiological Intelligence Goes Stale

A disease surveillance model has a lifecycle that public health infrastructure routinely ignores. An influenza forecasting model is trained on historical case data, calibrated against prior seasons, deployed in early autumn, and then left largely static for the duration of the season it was built to predict.

The model's R₀ estimate may have been accurate in week 1. By week 6, with vaccination uptake at 40%, age-stratified transmission dynamics shifting, and a new subvariant circulating in three metropolitan areas, the model's predictions may be meaningfully off. But the model does not know this, because the feedback loop between predicted outcomes and actual outcomes is structurally weak in most deployed surveillance systems.

A 2021 evaluation published in PLOS Computational Biology (Cramer et al.) analyzed COVID-19 forecast models submitted to the US COVID-19 Forecast Hub over a 16-week period. The study found substantial and persistent divergence between model predictions and observed case counts, with calibration degrading over time. The models were not updated based on their own validated performance. They were updated based on new data releases — which is a different signal entirely.

The critical distinction: a model that has predicted 7-day case trajectories accurately for 4 consecutive weeks, in a specific demographic cohort, in a specific transmission environment, is more valuable than a model that was calibrated on three years of historical data but has never had its predictions compared to outcomes in real time. The first model has closed its feedback loop. The second model has not.

Under current surveillance architecture, there is no mechanism for the second model to know the first model exists, let alone to synthesize across that model's validated performance.
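
The distinction between the two models can be made concrete. A minimal sketch, with hypothetical scores: one model has four weeks of prediction-vs-outcome deltas, the other has deep training history but no live validation at all.

```python
# Sketch (hypothetical scores): ranking models by closed-loop validation
# rather than by the depth of their training history.
weekly_scores_model_a = [0.91, 0.88, 0.93, 0.90]  # four weeks of validated predictions
weekly_scores_model_b = []                         # calibrated offline, never validated live

def live_validation(scores):
    """Mean validated accuracy, or None when the feedback loop has never closed."""
    return sum(scores) / len(scores) if scores else None

print(live_validation(weekly_scores_model_a))  # ≈ 0.905
print(live_validation(weekly_scores_model_b))  # None -- no closed feedback loop
```

The `None` is the point: under current architecture, model B's accuracy in its deployed environment is simply unknown, not merely worse.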


The Jurisdictional Silo Problem

The structural barrier to cross-jurisdictional disease surveillance synthesis is well-documented. A 2019 WHO review of International Health Regulations (IHR) implementation identified fragmented surveillance infrastructure as a primary constraint on outbreak early warning. The COVID-19 Independent Panel for Preparedness and Response (2021) cited "siloed" national surveillance systems as a contributing factor to delayed global response.

The proposed solutions — Global Health Security Agenda programs, standardized reporting templates, improved WHO notification protocols — all attempt to solve the problem by improving the reporting layer. Data flows upward to WHO. Synthesized bulletins flow downward to member states.

This architecture has a fundamental limitation: it introduces a central aggregation bottleneck. The synthesis happens at WHO, on a reporting cadence, based on what member states choose to report. Real-time cross-jurisdictional model synthesis, which the pace of an epidemic demands, is structurally excluded.

The reason is the same reason every central aggregation architecture hits a ceiling: the data that matters most is the data that cannot be centralized. Individual case-level records, geospatial clusters, healthcare facility capacity, real-time mobility data — all of it carries privacy implications that prevent raw transmission.

What CAN be transmitted across jurisdictions without privacy or sovereignty implications is exactly what QIS routes: validated outcome deltas. Not cases. Not contacts. Not individual trajectories. The answer to one question: how accurately did this model predict outcomes in this population, in this transmission environment, over this time horizon?


QIS Outcome Packets in Epidemiological Context

A QIS outcome packet for a disease surveillance agent carries:

  • The model's predicted trajectory (e.g., projected 14-day case count in a given demographic cohort)
  • The actual observed outcome at the end of the prediction horizon
  • A validation score derived from the delta between prediction and reality
  • A semantic fingerprint constructed from epidemiological context features — transmission setting, population density, age structure, vaccination coverage tier, seasonal index, variant circulation pattern
  • A timestamp and model identifier

The packet does not carry case records. It does not carry patient-level data. It does not carry geospatial identifiers that could be reverse-engineered to individual addresses. It carries only validated model performance metadata — contextualized by the epidemiological environment in which the model was operating.

Two surveillance agents operating in epidemiologically similar environments — similar population density, similar vaccination coverage, similar seasonal pattern, similar circulating variant — will have similar semantic fingerprints. They do not need to know each other exists. The DHT-based routing layer connects them by fingerprint similarity, weighted by recent validation score.
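
The similarity claim is easy to verify directly. A minimal sketch using the same fingerprint vectors as the simulation later in this article (the helper mirrors the `cosine_similarity` function in the implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two epidemiological fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Two epidemiologically similar urban environments vs. a rural one
urban_us = [0.9, 0.65, 0.7, 0.9, 0.5]
urban_eu = [0.85, 0.70, 0.75, 0.85, 0.5]
rural    = [0.2, 0.3, 0.3, 0.6, 0.8]

print(cosine(urban_us, urban_eu) > cosine(urban_us, rural))  # True
```

The two urban fingerprints score near 1.0 against each other; the urban-rural pair scores well below, which is exactly the gradient the routing layer follows.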

When a new surveillance agent enters the network — a county health department standing up a new flu forecasting model, an academic center deploying a dengue trajectory model in a new geography — it immediately routes queries toward the highest-validated models operating in epidemiologically similar contexts. It does not start from zero. The network's accumulated validation intelligence is immediately accessible.


Python Implementation: EpiOutcomeRouter

The following simulation demonstrates the core routing logic for a multi-jurisdiction disease surveillance network. This is a single-process demonstration of what runs distributed across surveillance nodes in a full deployment.

import time
import math
from dataclasses import dataclass, field
from typing import List, Dict
from collections import defaultdict


@dataclass
class EpiOutcomePacket:
    """
    Routes validated model performance across jurisdictions.
    No individual case data. No patient records. No geospatial identifiers.
    """
    model_id: str
    predicted_14d_cases: float        # Predicted 14-day trajectory (normalized per 100k)
    actual_14d_cases: float           # Observed outcome at horizon
    validation_score: float           # 1.0 = exact, 0.0 = total miss
    timestamp: float
    epi_fingerprint: List[float]      # [pop_density_tier, vax_coverage, age_index,
                                      #  seasonal_index, variant_severity_tier]


def epi_validation_score(predicted: float, actual: float) -> float:
    """
    Normalized validation score for epidemiological trajectory prediction.
    Penalizes under- and over-prediction symmetrically.
    """
    if predicted == actual:
        return 1.0  # covers the zero-predicted, zero-observed case exactly
    denom = max(predicted, actual)
    if denom <= 0:
        return 0.0
    relative_error = abs(predicted - actual) / denom
    return max(0.0, 1.0 - relative_error)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    if len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x ** 2 for x in a))
    mag_b = math.sqrt(sum(x ** 2 for x in b))
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot / (mag_a * mag_b)


class EpiOutcomeRouter:
    """
    Routes surveillance queries toward the highest-validated models
    operating in epidemiologically similar contexts.
    Privacy by architecture: no case data crosses jurisdictional boundaries.
    """

    def __init__(self, recency_window_days: float = 28.0):
        self.packets: List[EpiOutcomePacket] = []
        self.accuracy_log: Dict[str, List[float]] = defaultdict(list)
        self.recency_window = recency_window_days * 86400

    def ingest(self, packet: EpiOutcomePacket) -> None:
        self.packets.append(packet)
        self.accuracy_log[packet.model_id].append(packet.validation_score)
        print(
            f"[INGEST] {packet.model_id} | "
            f"predicted={packet.predicted_14d_cases:.1f} actual={packet.actual_14d_cases:.1f} | "
            f"validation={packet.validation_score:.3f}"
        )

    def _recency_weight(self, ts: float, now: float) -> float:
        age = now - ts
        return max(0.0, 1.0 - (age / self.recency_window))

    def route(
        self,
        query_fingerprint: List[float],
        top_k: int = 3,
        min_avg_validation: float = 0.55
    ) -> List[Dict]:
        now = time.time()
        weighted_scores: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)

        for packet in self.packets:
            sim = cosine_similarity(query_fingerprint, packet.epi_fingerprint)
            if sim < 0.4:
                continue
            recency = self._recency_weight(packet.timestamp, now)
            score = sim * packet.validation_score * recency
            weighted_scores[packet.model_id] += score
            counts[packet.model_id] += 1

        normalized = {
            mid: s / counts[mid]
            for mid, s in weighted_scores.items()
        }

        filtered = {
            mid: s for mid, s in normalized.items()
            if (
                sum(self.accuracy_log[mid]) / len(self.accuracy_log[mid])
            ) >= min_avg_validation
        }

        ranked = sorted(filtered.items(), key=lambda x: x[1], reverse=True)[:top_k]

        return [
            {
                "model_id": mid,
                "route_score": round(score, 4),
                "avg_validation": round(
                    sum(self.accuracy_log[mid]) / len(self.accuracy_log[mid]), 4
                ),
                "observations": counts[mid],
            }
            for mid, score in ranked
        ]

    def synthesis_paths(self) -> int:
        n = len(self.accuracy_log)
        return n * (n - 1) // 2

    def network_summary(self) -> None:
        n = len(self.accuracy_log)
        paths = self.synthesis_paths()
        print(f"\n[NETWORK] {n} surveillance models | {paths} synthesis paths")
        for mid, scores in self.accuracy_log.items():
            print(f"  {mid}: avg_validation={sum(scores)/len(scores):.3f} over {len(scores)} observations")


# --- Simulation ---

if __name__ == "__main__":
    router = EpiOutcomeRouter()
    now = time.time()

    fp_urban_northeast = [0.9, 0.65, 0.7, 0.9, 0.5]
    fp_urban_europe = [0.85, 0.70, 0.75, 0.85, 0.5]
    fp_rural_sea = [0.2, 0.3, 0.3, 0.6, 0.8]

    packets = [
        EpiOutcomePacket(
            model_id="cdc_northeast_flu_v3",
            predicted_14d_cases=42.3, actual_14d_cases=44.1,
            validation_score=epi_validation_score(42.3, 44.1),
            timestamp=now - 7 * 86400,
            epi_fingerprint=fp_urban_northeast
        ),
        EpiOutcomePacket(
            model_id="cdc_northeast_flu_v3",
            predicted_14d_cases=38.9, actual_14d_cases=37.5,
            validation_score=epi_validation_score(38.9, 37.5),
            timestamp=now - 3 * 86400,
            epi_fingerprint=fp_urban_northeast
        ),
        EpiOutcomePacket(
            model_id="ecdc_urban_respiratory_v2",
            predicted_14d_cases=51.0, actual_14d_cases=52.8,
            validation_score=epi_validation_score(51.0, 52.8),
            timestamp=now - 5 * 86400,
            epi_fingerprint=fp_urban_europe
        ),
        EpiOutcomePacket(
            model_id="searo_dengue_tracker",
            predicted_14d_cases=18.0, actual_14d_cases=34.5,
            validation_score=epi_validation_score(18.0, 34.5),
            timestamp=now - 2 * 86400,
            epi_fingerprint=fp_rural_sea
        ),
    ]

    for p in packets:
        router.ingest(p)

    router.network_summary()

    query_fp = [0.88, 0.60, 0.72, 0.88, 0.52]
    print("\n[QUERY] New urban surveillance node — querying for similar validated models")
    results = router.route(query_fingerprint=query_fp, top_k=3)
    for r in results:
        print(f"  -> {r}")

When this runs, the router surfaces cdc_northeast_flu_v3 and ecdc_urban_respiratory_v2 as the top validated routes for a new urban surveillance node. The searo_dengue_tracker is excluded — its poor recent validation is visible to the network without the Southeast Asia node disclosing outbreak severity, patient data, or jurisdiction-level case counts. The insight about model degradation propagates. The underlying data stays local.


The N(N-1)/2 Argument at Global Surveillance Scale

The WHO's Global Outbreak Alert and Response Network (GOARN) coordinates with approximately 250 technical partner institutions across 51 countries. Each institution runs at least one surveillance or forecasting model.

Under current architecture, the number of real-time validated-outcome synthesis paths between those institutions is effectively zero. Each institution's model validation is internal. Cross-institutional model performance comparison happens through academic publication cycles — which operate on 12–18 month lags, not the 72-hour windows that matter for outbreak containment.

Under QIS architecture:

  • 250 institutions = 250 × 249 / 2 = 31,125 unique synthesis paths
  • Each agent pays O(log 250) ≈ 8 routing hops per query
  • A new surveillance node entering the network (a county health department, a rural clinic in a LMIC setting) immediately routes queries to the highest-validated models in epidemiologically similar contexts
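
The arithmetic in the bullets above can be checked in a few lines; the sketch below assumes a base-2 logarithm for the DHT hop estimate, rounded up.

```python
import math

def synthesis_paths(n: int) -> int:
    """Unique pairwise synthesis paths among n agents: n(n-1)/2."""
    return n * (n - 1) // 2

def routing_hops(n: int) -> int:
    """Approximate per-query routing cost in a DHT: O(log2 n), rounded up."""
    return math.ceil(math.log2(n))

print(synthesis_paths(250))  # 31125
print(routing_hops(250))     # 8
```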

The humanitarian licensing structure designed by Christopher Thomas Trevethan means this capability is free for public health, nonprofit, and research use. A WHO member state's national surveillance system does not need to purchase access. A rural clinic in Kenya — where N may equal one facility for a rare disease — can emit outcome packets and receive routed intelligence from a global network without any patient data leaving the facility.


The Three Elections in Epidemiological Context

The Three Elections described in QIS architecture are not governance mechanisms. They are natural selection forces operating on model quality:

CURATE: The influenza model that has accurately predicted 7-day trajectories across four consecutive weeks routes to more queries — not because a human designated it as authoritative, but because its validation scores drive higher routing weights. The best-performing model at a given moment and context naturally surfaces.

VOTE: Reality adjudicates predictions through observed outcomes. The searo_dengue_tracker in the simulation above does not need to be flagged by a human reviewer. Its accuracy delta speaks. Its routing weight decays automatically.

COMPETE: Surveillance networks built on validated models attract more agents, generate more synthesis paths, and produce more actionable intelligence. Surveillance networks built on stale models see their routing weights decay as the gap between predictions and outcomes widens. The protocol self-optimizes without a governance overhead layer.
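
The VOTE dynamic reduces to a single scoring rule: routing weight is validated accuracy multiplied by recency, so a model whose predictions drift from outcomes loses weight with no human intervention. A minimal sketch, assuming the same 28-day linear decay window used in the `EpiOutcomeRouter` above:

```python
def routing_weight(validation_score: float, age_days: float,
                   window_days: float = 28.0) -> float:
    """Validated accuracy, decayed linearly toward zero over the recency window."""
    recency = max(0.0, 1.0 - age_days / window_days)
    return validation_score * recency

# A recently accurate model keeps most of its routing weight...
print(routing_weight(0.95, age_days=3.0))
# ...a recent but inaccurate model is down-weighted by its score alone...
print(routing_weight(0.52, age_days=2.0))
# ...and any observation older than the window contributes nothing.
print(routing_weight(0.52, age_days=30.0))  # 0.0
```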


Comparison: Siloed Surveillance vs. QIS-Augmented Networks

| Dimension | Current Siloed Surveillance | QIS-Augmented Surveillance |
| --- | --- | --- |
| Feedback loop | Periodic backtesting; no real-time outcome signal | Continuous: every prediction generates an outcome packet when outcomes arrive |
| Cross-jurisdiction synthesis | WHO-mediated reports; 24–72 hour lag | Real-time routing of validated outcome deltas; no patient data transmitted |
| Cold start (new node) | Must build local history; no network benefit | Routes to highest-validated similar-context models immediately |
| Model staleness detection | Internal review; outbreak may be underway before detection | Declining validation scores decay routing weight; network self-organizes away from stale models |
| Low-resource settings | Excluded from synthesis (insufficient local data for federated methods) | Any node that can emit a 512-byte outcome packet participates; N=1 sites included |

The Architecture That Closes the Feedback Gap

The 2021 COVID-19 Independent Panel concluded that the global health system needs better "data infrastructure for early warning." The report identified siloed surveillance systems, delayed reporting chains, and absence of real-time model validation feedback as structural failures.

These are correct diagnoses. The architectural conclusion they point toward is not better reporting standards. It is a routing protocol that can transmit validated epidemiological intelligence across jurisdictions without transmitting the data that cannot move.

QIS outcome packets are structurally small enough — ~512 bytes — to transmit over constrained infrastructure including SMS gateways, satellite links, and low-bandwidth rural connections. They carry no individually identifiable information. They cross jurisdictional lines the same way any validated model performance metadata crosses lines: as numbers, not as records.
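
The size claim is easy to sanity-check. The layout below is a hypothetical fixed-width illustration, not the published QIS wire format: a 32-byte model identifier, three float64 values, a float64 timestamp, and five float32 fingerprint dimensions.

```python
import struct
import time

# Hypothetical fixed-width layout (illustration only, not the QIS wire format):
# 32-byte model id, 3 x float64 (predicted, actual, score),
# 1 x float64 timestamp, 5 x float32 fingerprint dimensions.
PACKET_FORMAT = "!32s4d5f"

def pack_outcome(model_id: str, predicted: float, actual: float,
                 score: float, ts: float, fingerprint: list) -> bytes:
    """Serialize one outcome packet; struct zero-pads ids shorter than 32 bytes."""
    return struct.pack(PACKET_FORMAT, model_id.encode()[:32],
                       predicted, actual, score, ts, *fingerprint)

wire = pack_outcome("cdc_northeast_flu_v3", 42.3, 44.1, 0.959,
                    time.time(), [0.9, 0.65, 0.7, 0.9, 0.5])
print(len(wire))  # 84 -- comfortably inside a 512-byte budget
```

At 84 bytes, even this naive fixed-width encoding leaves ample headroom under 512 bytes for authentication and transport framing.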

The next pandemic will generate the same signals that COVID-19 generated in January 2020. The question is whether those signals synthesize in real time across the agencies that hold them — or arrive at a central aggregator 72 hours after the containment window closes.

The architecture problem was always solvable. It took a discovery to solve it.


QIS is an original architecture discovered by Christopher Thomas Trevethan, protected under 39 provisional patents. For licensing, research collaboration, or institutional deployment inquiries, contact through the QIS publication series.

