The OHDSI (Observational Health Data Sciences and Informatics) network represents one of the largest distributed health data collaborations in the world — over 800 million patient records across institutions in the United States, Europe, Asia, and beyond, all mapped to the OMOP Common Data Model. The OHDSI community has built an extraordinary infrastructure for standardized observational research. But the network's distributed query architecture faces a structural constraint: each study requires a new analysis package distributed to participating sites, run locally, and results aggregated centrally. This model works — it has produced landmark studies — but it scales linearly with institutional participation and requires coordination overhead for every research question.
QIS (Quadratic Intelligence Swarm) protocol offers a complementary routing layer that could extend OHDSI's distributed infrastructure from study-driven batch queries to continuous, real-time outcome routing — enabling OHDSI network routing at O(log N) cost with N(N-1)/2 quadratic synthesis opportunities, while preserving the data locality and standardized vocabularies that make OHDSI work.
This technical reference examines how QIS integrates with OMOP CDM, addresses OHDSI distributed query optimization, and enables distributed phenotyping across the network.
OHDSI's Current Architecture: Strengths and Structural Limits
What OHDSI Does Well
OHDSI's architecture is genuinely distributed. Each participating institution maintains its own database mapped to OMOP CDM. No raw patient data leaves any site. The OMOP Common Data Model provides vocabulary standardization across:
- SNOMED CT — clinical concepts (conditions, procedures, observations)
- RxNorm — drug ingredients and clinical drugs
- LOINC — laboratory and clinical measurements
- ICD-10 — diagnostic codes
- CPT/HCPCS — procedure codes
- ATC — drug classification
This standardization is a massive achievement. When a researcher at Columbia asks "what happened to patients with EGFR-mutant NSCLC who received osimertinib as first-line therapy?", every OHDSI site can execute that query against identically structured tables.
Where the Architecture Constrains
Constraint 1: Study-Driven, Not Continuous. Each OHDSI network study requires: (a) writing an analysis package (typically R or SQL), (b) distributing it to participating sites, (c) each site running the package locally, (d) collecting aggregate results centrally. This is powerful for planned studies but cannot answer ad-hoc clinical questions in real time.
Constraint 2: Linear Scaling. Adding the 100th OHDSI site to a study adds one more result set. The intelligence gain is approximately linear with participation — more data points, but the aggregation operation (typically meta-analysis) does not compound.
Constraint 3: Central Coordination. While data stays local, study coordination is centralized. A study lead designs the analysis, distributes the package, and collects results. This coordinator is a governance bottleneck and a practical one — coordination effort scales with network size.
Constraint 4: Batch, Not Streaming. OHDSI studies produce point-in-time snapshots. They do not create a continuously updating evidence base that grows with every new patient outcome across the network.
Constraint 5: Phenotype Development Silos. OHDSI's distributed phenotyping efforts — defining cohorts using standardized logic — are powerful but centrally coordinated. Different institutions may develop different phenotype definitions for the same condition, and the process of validating, sharing, and iterating on phenotypes requires manual coordination.
QIS as a Complementary Routing Layer for OHDSI
QIS does not replace OHDSI's study infrastructure. It adds a continuous outcome routing layer on top of the existing OMOP CDM foundation. The key insight: OHDSI has already solved the vocabulary standardization problem. QIS leverages that standardization for semantic fingerprinting and adds peer-to-peer outcome routing that OHDSI's batch architecture cannot provide.
The Integration Architecture
┌─────────────────────────────────────────────────────┐
│ OHDSI Site Node │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ OMOP CDM │───>│ QIS Outcome │───>│ Semantic │ │
│ │ Database │ │ Distiller │ │ Fingerprint│ │
│ │ │ │ │ │ + Routing │ │
│ │ person │ │ OMOP concept │ │ Key │ │
│ │ condition│ │ IDs → vector │ │ (SHA-256) │ │
│ │ drug_exp │ │ components │ │ │ │
│ │ procedure│ │ │ │ ~512 bytes │ │
│ │ measuremt│ │ Outcome │ │ per packet │ │
│ │ observatn│ │ extraction │ │ │ │
│ └──────────┘ └──────────────┘ └─────┬──────┘ │
│ │ │
└────────────────────────────────────────────┼─────────┘
│
O(log N) routing
│
┌──────────────┼──────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│ Site B │ │ Site C │ │ Site D │
│ (Dublin)│ │ (Zurich)│ │(Columbus│
│ │ │ │ │ OH) │
└─────────┘ └─────────┘ └─────────┘
Step-by-Step: OMOP CDM to QIS Outcome Packet
Step 1: Extract from OMOP CDM tables. The QIS node queries the local OMOP CDM database for patient outcomes relevant to a clinical domain. This uses standard OHDSI cohort definitions — the existing phenotyping infrastructure.
-- Example: Extract NSCLC treatment outcomes from OMOP CDM
SELECT
p.person_id,
c.condition_concept_id, -- SNOMED: 254637007 (NSCLC)
d.drug_concept_id, -- RxNorm: 1312706 (osimertinib)
d.drug_exposure_start_date,
m.value_as_number, -- Tumor response measurement
m.measurement_concept_id, -- LOINC: progression-free survival
o.observation_date,
o.value_as_concept_id -- Outcome observation
FROM person p
JOIN condition_occurrence c ON p.person_id = c.person_id
JOIN drug_exposure d ON p.person_id = d.person_id
JOIN measurement m ON p.person_id = m.person_id
JOIN observation o ON p.person_id = o.person_id
WHERE c.condition_concept_id IN (/* NSCLC SNOMED descendants */)
AND d.drug_concept_id IN (/* osimertinib RxNorm codes */)
Step 2: Distill into semantic fingerprint. OMOP concept IDs become the fingerprint components. This is where OHDSI's vocabulary standardization pays off — every site uses the same concept IDs, so fingerprints are automatically interoperable.
# OMOP concept IDs → QIS semantic fingerprint
fingerprint = {
"condition": 254637007, # SNOMED: Non-small cell lung cancer
"stage_concept": 4032806, # SNOMED: Stage IIIB
"histology_concept": 4028717, # SNOMED: Adenocarcinoma
"drug_concept": 1312706, # RxNorm: Osimertinib
"treatment_line": 1, # First-line
"biomarker_concept": 36403115, # OMOP: EGFR exon 19 deletion
"age_decile": "60-69",
"sex_concept": 8507 # OMOP: Male
}
# Deterministic routing key from categorical OMOP concepts
routing_key = sha256(json.dumps({
"condition": fingerprint["condition"],
"histology": fingerprint["histology_concept"],
"biomarker": fingerprint["biomarker_concept"],
"drug": fingerprint["drug_concept"],
"treatment_line": fingerprint["treatment_line"]
}))
Step 3: Create outcome packet. The ~512-byte packet encodes the treatment outcome using OMOP concept IDs — no patient identifiers, no raw clinical data.
outcome_packet = {
"fingerprint_hash": routing_key, # SHA-256 semantic address
"condition_concept": 254637007, # SNOMED: NSCLC
"intervention_concept": 1312706, # RxNorm: Osimertinib
"outcome_concept": 4161183, # SNOMED: Partial response
"outcome_metric": 0.73, # Progression-free at 12mo
"observation_window_days": 365,
"confidence": 0.89,
"timestamp": 1750000000,
"protocol_version": "QIS-1.0",
"source_vocabulary": "OMOP-CDM-v5.4",
"checksum": "sha256(...)"
}
# Total: ~512 bytes | Zero patient identifiers | OMOP-native vocabulary
Step 4: Route peer-to-peer. The outcome packet routes to its semantic address at O(log N) cost. Other OHDSI sites with patients matching the same clinical fingerprint find these outcomes when they query the same address. Routing is protocol-agnostic — DHT, vector database, pub/sub, or any O(log N) transport.
Step 5: Local synthesis. The querying site synthesizes outcome packets from multiple contributing sites locally. With 50 OHDSI sites contributing outcomes for EGFR-mutant NSCLC: 1,225 pairwise synthesis pathways. With 200 sites: 19,900. The intelligence scales quadratically.
Distributed Phenotyping with QIS
OHDSI's phenotyping infrastructure — standardized cohort definitions used to identify patient populations — is one of its greatest strengths. QIS extends this by enabling distributed phenotype validation and refinement through outcome feedback.
The Current Phenotyping Process
- Researcher develops cohort definition (e.g., "Type 2 diabetes with CKD stage 3+")
- Definition expressed in OHDSI's standardized cohort logic (ATLAS/JSON)
- Definition distributed to sites for validation
- Each site runs the definition against local OMOP CDM
- Results compared, definition refined through coordination
QIS-Enhanced Distributed Phenotyping
With QIS, phenotype definitions become the similarity templates (Election 1: Hiring). Different institutions can define different phenotypes for the same clinical question. Outcomes accumulate at the semantic addresses determined by each phenotype definition. The phenotype that produces better outcome routing — more clinically useful matches, higher synthesis quality — attracts more participation (Election 3: Darwinism).
This means phenotype validation becomes continuous and outcome-driven rather than periodic and coordination-dependent.
# Two competing phenotype definitions for T2DM-CKD
# Both coexist — outcomes determine which routes better
phenotype_a = {
"name": "T2DM_CKD_strict",
"curator": "Columbia_DBMI",
"criteria": {
"condition": [201826, 443238], # OMOP: T2DM concepts
"measurement": {
"concept": 3020564, # LOINC: eGFR
"operator": "<",
"value": 60 # CKD stage 3+
},
"drug_exposure": [1503297, 1502905] # Metformin, SGLT2i
}
}
phenotype_b = {
"name": "T2DM_CKD_broad",
"curator": "Zurich_USZ",
"criteria": {
"condition": [201826, 443238, 4024561], # Broader T2DM concept set
"measurement": {
"concept": 3020564,
"operator": "<",
"value": 45 # CKD stage 3b+ (stricter)
}
# No drug exposure requirement
}
}
# Both generate different routing keys → different outcome addresses
# Network reveals which phenotype produces more useful outcome matching
Federated OHDSI Query Optimization
OHDSI's current query model is batch-oriented: write a study package, distribute, run, collect. QIS enables a complementary real-time query model:
Batch vs. Continuous
| Dimension | OHDSI Study Package | QIS Outcome Routing |
|---|---|---|
| Query initiation | Researcher designs study protocol | Clinician or system queries semantic address |
| Execution model | Batch: distribute → run → collect | Continuous: outcomes accumulate in real-time |
| Time to result | Weeks to months (coordination + execution) | Seconds (O(log N) routing + local synthesis) |
| Result freshness | Point-in-time snapshot | Continuously updated with each new outcome |
| Coordination required | Study lead coordinates all sites | None — peer-to-peer routing |
| Scaling | Linear: more sites = more result sets | Quadratic: N(N-1)/2 synthesis opportunities |
| New site onboarding | Must receive and run analysis package | Immediately contributes and receives outcomes |
These Are Complementary, Not Competing
OHDSI study packages are designed for rigorous, pre-specified observational research with statistical controls. QIS outcome routing is designed for continuous evidence accumulation and real-time clinical decision support. A mature OHDSI network would use both:
- Study packages for formal research: comparative effectiveness studies, drug safety surveillance, outcome prediction model development
- QIS routing for real-time intelligence: "what outcomes have sites similar to ours seen for this specific patient profile?" answered in seconds, not months
The OMOP CDM provides the shared vocabulary for both. QIS does not require OHDSI sites to change their data model, their ETL pipeline, or their study infrastructure. It adds a lightweight routing layer that reads from the same OMOP CDM tables.
Byzantine Fault Tolerance Across the OHDSI Network
In a distributed network spanning hundreds of institutions across dozens of countries, data quality variance is inevitable. Some sites have clean, well-curated OMOP CDM databases. Others have mapping errors, incomplete data, or outdated vocabulary versions.
QIS handles this through aggregate math rather than data quality gatekeeping:
- Sites contributing accurate outcomes produce packets that are consistent with the majority of other honest sites
- Sites contributing noisy or incorrect outcomes produce packets that contradict the honest majority
- Across N(N-1)/2 synthesis pathways, consistent outcomes mathematically outweigh inconsistent ones
- No central authority decides which sites are "trusted" — the aggregate is the trust signal
This is Byzantine fault tolerance without a quorum protocol, without a trusted leader, and without a reputation system. In simulation (100,000 nodes), honest outcomes achieve 100% rejection of Byzantine contributions through aggregate math alone.
For OHDSI, this means: imperfect sites can participate without poisoning the network. Their contribution is automatically weighted by consistency with the broader outcome distribution. This is more inclusive than quality-gatekeeping approaches that exclude sites with imperfect data.
Implementation for OHDSI Sites
For an OHDSI site considering QIS integration:
Minimal Integration
- Read-only access to OMOP CDM. QIS node queries the existing CDM database. No schema changes required.
- Outcome distiller. A lightweight service that runs cohort definitions against the CDM and produces ~512-byte outcome packets using OMOP concept IDs as fingerprint components.
- Routing client. Connects to the QIS routing layer (DHT, vector DB, or other O(log N) transport).
- Local synthesis engine. Receives outcome packets from matched peers and synthesizes locally.
What Doesn't Change
- OMOP CDM schema and ETL pipeline remain unchanged
- Existing OHDSI study packages continue to run as before
- ATLAS cohort definitions serve double duty as QIS similarity templates
- Vocabulary mappings (SNOMED, RxNorm, LOINC) remain the interoperability foundation
- Site-level data governance unchanged — no new data-sharing agreements needed
Resource Requirements
- Compute: QIS node runs on commodity hardware; synthesis of 1,000 outcome packets takes 2-400ms depending on method
- Storage: Outcome packets at ~512 bytes each; 1 million packets = ~512 MB
- Bandwidth: O(log N) routing at ~512 bytes per hop; negligible compared to existing OHDSI network traffic
- Personnel: No new data science staff required; QIS node is infrastructure, not a research tool requiring analyst time
Network Effects for the OHDSI Community
The OHDSI network already has a critical mass of participating institutions. Adding QIS routing creates network effects that compound on OHDSI's existing foundation:
- Des Moines contributes heart failure outcomes using OMOP concept IDs
- Dublin contributes outcomes for the same clinical fingerprint from their European population
- Zurich adds outcomes with Swiss treatment patterns
- Columbus queries the same semantic address and synthesizes across all three — plus every other contributing site
Each new OHDSI site that joins the QIS routing layer adds N new synthesis pathways (where N is the current number of contributing sites for that clinical domain). The 50th site adds 49 pathways. The 200th adds 199. The intelligence scales as N(N-1)/2 — the same quadratic formula that defines the total synthesis opportunity space.
For a community that already has 800+ million patient records mapped to a common data model, the QIS routing layer transforms that existing standardization from a study-by-study resource into a continuously compounding intelligence network.
Open Questions for the OHDSI Community
Concept-level granularity. What level of OMOP concept specificity produces the best routing? SNOMED condition codes alone, or condition + drug + measurement combinations? This is an empirical question best answered by the OHDSI community's domain expertise.
Phenotype-as-template governance. If ATLAS cohort definitions serve as QIS similarity templates, how does the existing OHDSI phenotype library evolve when outcomes provide continuous feedback on template quality?
Cross-vocabulary routing. OHDSI sites mapping from different source vocabularies (ICD-10-CM vs ICD-10-GM vs ICD-10-AM) converge at the OMOP concept level. Does this convergence preserve enough clinical nuance for effective outcome routing?
Regulatory implications. OHDSI studies typically operate under IRB waivers for retrospective observational research. QIS outcome packets — anonymous by construction — may simplify the regulatory pathway further, but institutional review is warranted.
These are questions for the OHDSI community to investigate with the same rigor it applies to every methodological challenge. The mathematical foundation is established. The integration architecture leverages existing OMOP CDM infrastructure. The open questions are empirical, not theoretical.
Conclusion
OHDSI built the vocabulary standardization layer. OMOP CDM built the common data model. QIS adds the missing routing layer — continuous, peer-to-peer, quadratically scaling outcome intelligence that turns OHDSI's distributed database network into a distributed intelligence network.
The components are complementary: OMOP concept IDs become semantic fingerprint components. ATLAS cohort definitions become similarity templates. OHDSI's data locality principle is preserved — raw data never moves. What moves is the ~512-byte outcome packet, routed at O(log N) cost to the semantic address where clinicians and researchers need it.
For the OHDSI community — from Columbia DBMI to the network nodes in Des Moines, Dublin, Zurich, and Columbus — QIS routing is not a replacement for the study infrastructure you have built. It is the real-time complement that makes 800 million standardized patient records continuously available as synthesized intelligence, not just queryable data.
QIS (Quadratic Intelligence Swarm) protocol discovered by Christopher Thomas Trevethan, June 16, 2025. Technical documentation: qisprotocol.com. Published articles: dev.to/roryqis.
39 provisional patents pending. Protocol specification open for review.
Top comments (0)