You have 100+ climate model ensemble members in CMIP6. Some of them predicted the 2015–2016 El Niño with a 93% skill score. Others got it at 30%. Your ensemble synthesis weights them equally.
That is not a software problem. That is an architecture problem.
This is Article #037 in the "Understanding QIS" series. Previous articles covered water systems (#035) and precision medicine (#036). Here we apply the same architectural lens to climate science — and the fit is unusually precise, because climate science has already named the problem.
The Ensemble Equality Problem
The Coupled Model Intercomparison Project Phase 6 (CMIP6) coordinates climate model submissions from modeling centers around the world — ECMWF, NCAR, GFDL, the UK Met Office, CSIRO, CMA, and dozens more. Eyring et al. (2016) documented how CMIP6 standardized forcing scenarios and output formats to enable cross-model comparison at a scale that was not previously possible.
But standardized format does not mean equal quality.
Knutti et al. (2017) published a landmark analysis in Geophysical Research Letters identifying what they called the "model genealogy" problem: many CMIP6 submissions share code ancestry. CESM begat CESM2, which was submitted multiple times under different configurations. When you count "100 independent models," you are overcounting genuinely independent predictions. The statistical independence assumption breaks down.
At the same time, most ensemble synthesis methods weight members equally — or use simple performance metrics that are computed once, offline, against a historical validation period. A model that got the 2010–2020 decade right gets the same vote as one that got it wrong, because the weighting system was set before the decade ended.
El Niño illustrates the cost. ENSO skill varies enormously across ensemble members. Boer et al. (2016) documented predictability limits at different lead times. Some members demonstrate strong predictive skill for Niño 3.4 SST anomalies at 6-month lead times. Others show skill collapse beyond 3 months. Average them with equal weights and you degrade the signal from the best members.
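A toy calculation makes the degradation concrete. The forecast values and skill scores below are assumed for illustration, not drawn from any CMIP archive: two skillful members track the observed anomaly, three poor ones drag the equal-weight mean away from it.

```python
# Hypothetical Niño 3.4 SST anomaly forecasts from five ensemble members.
forecasts = [1.18, 1.15, 0.40, -0.10, 0.35]  # degrees C (assumed values)
skills = [0.93, 0.90, 0.30, 0.20, 0.28]      # historical skill per member (assumed)
observed = 1.2                                # what reanalysis recorded (assumed)

# Equal weighting: every member gets the same vote.
equal_mean = sum(forecasts) / len(forecasts)

# Skill weighting: each member's vote is proportional to its skill score.
skill_mean = sum(f * s for f, s in zip(forecasts, skills)) / sum(skills)

print(f"equal-weight mean: {equal_mean:+.3f}, error {abs(observed - equal_mean):.3f}")
print(f"skill-weight mean: {skill_mean:+.3f}, error {abs(observed - skill_mean):.3f}")
```

With these numbers the equal-weight mean lands near +0.60 while the skill-weighted mean lands near +0.89, roughly halving the error against the observed +1.2. The point is not the specific values but the mechanism: equal weights let low-skill members dilute the signal from high-skill ones.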
The ensemble equality problem is not a gap in the literature. Climate scientists know it exists. The constraint is architectural: there is no live feedback loop between forecast outcomes and ensemble weights. Validation is retrospective. By the time a model's skill is formally assessed, the next forecast cycle is already underway using the same weights.
Why Climate Intelligence Cannot Be Centralized
ECMWF, NCAR, GFDL, the UK Met Office, CSIRO, and CMA each run separate ensemble forecasting systems. Their models use different grids, different parameterizations, different horizontal resolutions, different convection schemes. Cross-center synthesis happens through the CMIP data portal — batch submission, not real-time.
The reanalysis products that provide ground truth — ERA5 from ECMWF, MERRA-2 from NASA — ingest observations from ground stations, ocean buoys, radiosondes, and satellite retrievals. But this ingestion is not instantaneous. There is a processing lag. Model validation against observations is retrospective by design.
Palmer (2012) argued in Nature Climate Change that ensemble forecasting is fundamentally about representing uncertainty in initial conditions and model structure. But ensemble diversity is only useful if the synthesis step knows which sources of diversity are currently skillful.
Federated learning is sometimes proposed as a path toward cross-center synthesis. The idea: each center trains locally, shares gradients, global model improves. The problem Knutti et al. (2017) surfaces immediately applies here too — the models are structurally heterogeneous. ECMWF's Integrated Forecasting System does not share a parameter space with NCAR's CAM. Gradient exchange across heterogeneous architectures is not directly meaningful.
And even if it were, it would not capture what climate science actually needs from distributed collaboration: not shared model weights, but shared knowledge about where each model is currently right and currently wrong.
Lorenz (1963) laid the foundation for the roughly two-week limit on deterministic atmospheric predictability. Below that horizon, ensemble forecasting captures uncertainty in initial conditions. Above it, multi-model synthesis captures structural uncertainty. Both require knowing, continuously, which ensemble members are performing.
Santer et al. (2008) formalized statistical approaches to climate model evaluation against observations. The machinery exists. What does not exist is a live routing layer that moves validated skill information between centers in real time.
That is the architectural gap. QIS was discovered by Christopher Thomas Trevethan as a general solution to this class of problem — distributed intelligence systems where validation is continuous and heterogeneous nodes need to share outcomes without sharing raw state.
What QIS Routes Instead
QIS does not route raw model output. It does not route model weights. It routes outcome packets: validated forecast-vs-observed deltas.
The packet structure for climate science looks like this:
- model_id: which ensemble member generated the forecast
- center_id: which modeling center ran it
- region: spatial tag (Niño 3.4, South Asian monsoon basin, Sahel, etc.)
- variable: what was forecast (SST, precipitation, 500hPa wind, etc.)
- lead_time_months: 3, 6, 12, or longer
- forecast_value: what the model predicted
- observed_value: what ERA5/MERRA-2 recorded
- skill_score: computed delta as a normalized performance metric
- validation_period: the observation window used
- ensemble_generation: CMIP5, CMIP6, or operational
A ClimateOutcomePacket like this compresses to under 512 bytes. Any node that can run a forecast, compare it to a reanalysis observation, and emit this structure participates at full architectural standing.
That last sentence matters. The Kenya Meteorological Department observes unique regional phenomena that global models systematically miss — East African short rains, Indian Ocean Dipole interactions with the East African coast, local convective patterns. If Kenya Met issues a forecast for Nairobi precipitation and that forecast verifies against CHIRPS satellite-rainfall data, that outcome packet carries the same architectural weight as one from ECMWF.
The routing is semantic. A Bangladesh Meteorological Department validation delta for Bay of Bengal cyclogenesis routes to the ECMWF South Asian monsoon ensemble, not to NCAR's Arctic configuration. Regional skill finds regional consumers. The routing layer is not a centralized aggregator — it is a distributed matching system.
The Python Implementation
```python
from __future__ import annotations

import hashlib
import json
import random
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class ClimateOutcomePacket:
    model_id: str
    center_id: str
    region: str               # e.g. "nino34", "south_asian_monsoon", "sahel"
    variable: str             # e.g. "sst", "precipitation", "z500"
    lead_time_months: int
    forecast_value: float
    observed_value: float
    skill_score: float        # 0.0–1.0, higher = better
    validation_period: str    # e.g. "2024-Q4"
    ensemble_generation: str  # "CMIP5", "CMIP6", "operational"
    packet_version: str = "1.0"

    def semantic_fingerprint(self) -> str:
        """Deterministic hash of routing-relevant fields (not values)."""
        key = f"{self.center_id}:{self.region}:{self.variable}:{self.lead_time_months}"
        return hashlib.sha256(key.encode()).hexdigest()[:16]

    def byte_size(self) -> int:
        return len(json.dumps(asdict(self)).encode("utf-8"))

    @property
    def delta(self) -> float:
        return self.observed_value - self.forecast_value


class ClimateOutcomeRouter:
    """
    Routes ClimateOutcomePackets between climate modeling centers.

    Three Elections continuously update routing weights:
    - CURATE: high-skill members earn elevated routing weight
    - VOTE: centers whose synthesized forecasts improved over time accumulate trust
    - COMPETE: regional forecast skill determines routing density
    """

    def __init__(self) -> None:
        self.agents: dict[str, dict] = {}  # center_id -> metadata
        self.routing_weights: dict[str, float] = defaultdict(lambda: 1.0)
        self.packet_log: list[ClimateOutcomePacket] = []
        self.center_trust: dict[str, float] = defaultdict(lambda: 0.5)
        self.regional_skill: dict[tuple[str, str], float] = defaultdict(lambda: 0.5)

    def register_agent(
        self,
        center_id: str,
        name: str,
        regions: list[str],
        variables: list[str],
        ensemble_generation: str = "CMIP6",
        is_lmic: bool = False,
    ) -> None:
        self.agents[center_id] = {
            "name": name,
            "regions": regions,
            "variables": variables,
            "ensemble_generation": ensemble_generation,
            "is_lmic": is_lmic,
            "packet_count": 0,
        }
        print(f"  [REGISTER] {name} ({center_id}) | regions={regions}")

    def validate_outcome(self, packet: ClimateOutcomePacket) -> bool:
        """Basic sanity checks before accepting a packet."""
        if packet.center_id not in self.agents:
            print(f"  [REJECT] Unknown center: {packet.center_id}")
            return False
        if not (0.0 <= packet.skill_score <= 1.0):
            print(f"  [REJECT] Skill score out of range: {packet.skill_score}")
            return False
        if packet.byte_size() > 1024:
            print(f"  [REJECT] Packet too large: {packet.byte_size()} bytes")
            return False
        return True

    def route(self, packet: ClimateOutcomePacket) -> list[str]:
        """
        Find relevant recipient centers for a given outcome packet.
        Routing is semantic: match on region and variable overlap.
        """
        if not self.validate_outcome(packet):
            return []
        recipients = []
        for center_id, meta in self.agents.items():
            if center_id == packet.center_id:
                continue
            region_match = packet.region in meta["regions"]
            var_match = packet.variable in meta["variables"]
            if region_match or var_match:
                weight = self.routing_weights[center_id]
                recipients.append((center_id, weight))
        # Sort by routing weight descending
        recipients.sort(key=lambda x: x[1], reverse=True)
        self.packet_log.append(packet)
        self.agents[packet.center_id]["packet_count"] += 1
        routed_to = [c for c, _ in recipients]
        print(
            f"  [ROUTE] {packet.center_id} -> {routed_to} "
            f"| region={packet.region} var={packet.variable} "
            f"skill={packet.skill_score:.2f} Δ={packet.delta:+.3f} "
            f"size={packet.byte_size()}B fingerprint={packet.semantic_fingerprint()}"
        )
        return routed_to

    def _election_curate(self) -> None:
        """
        CURATE Election: ensemble members (model_ids) with consistently high
        skill scores earn elevated routing weight for their center.
        High-skill output is surfaced; low-skill output is deprioritized.
        """
        model_skill: dict[str, list[float]] = defaultdict(list)
        for p in self.packet_log:
            model_skill[p.center_id].append(p.skill_score)
        for center_id, scores in model_skill.items():
            avg_skill = sum(scores) / len(scores)
            # Weight tracks average skill; 0.5 + avg_skill keeps typical
            # weights near the 1.0 default for a mid-skill center
            new_weight = 0.5 + avg_skill
            self.routing_weights[center_id] = round(new_weight, 3)

    def _election_vote(self) -> None:
        """
        VOTE Election: centers whose synthesized regional forecasts improve
        over successive validation periods accumulate trust score.
        Trust amplifies routing weight — consistent improvement is rewarded.
        """
        recent = self.packet_log[-20:]  # slicing handles logs shorter than 20
        center_recent: dict[str, list[float]] = defaultdict(list)
        for p in recent:
            center_recent[p.center_id].append(p.skill_score)
        center_all: dict[str, list[float]] = defaultdict(list)
        for p in self.packet_log:
            center_all[p.center_id].append(p.skill_score)
        for center_id in center_recent:
            recent_avg = sum(center_recent[center_id]) / len(center_recent[center_id])
            all_avg = sum(center_all[center_id]) / len(center_all[center_id])
            if recent_avg > all_avg:
                self.center_trust[center_id] = min(1.0, self.center_trust[center_id] + 0.05)
            else:
                self.center_trust[center_id] = max(0.1, self.center_trust[center_id] - 0.02)

    def _election_compete(self) -> None:
        """
        COMPETE Election: regional forecast skill determines routing density.
        Centers with demonstrated regional skill receive more incoming packets
        from that region — the routing network self-organizes around expertise.
        """
        for p in self.packet_log:
            key = (p.center_id, p.region)
            old = self.regional_skill[key]
            # Exponential moving average: recent skill counts for 10% per update
            self.regional_skill[key] = round((old * 0.9) + (p.skill_score * 0.1), 3)

    def synthesize(self) -> dict:
        """Run all three elections and return current state summary."""
        self._election_curate()
        self._election_vote()
        self._election_compete()
        print("\n  === POST-ELECTION STATE ===")
        summary = {}
        for center_id, meta in self.agents.items():
            trust = self.center_trust[center_id]
            weight = self.routing_weights[center_id]
            count = meta["packet_count"]
            summary[center_id] = {
                "name": meta["name"],
                "routing_weight": weight,
                "trust": round(trust, 3),
                "packets_emitted": count,
                "is_lmic": meta["is_lmic"],
            }
            print(
                f"  {meta['name']:40s} weight={weight:.3f} "
                f"trust={trust:.3f} packets={count}"
            )
        return summary

    def run_simulation(self, cycles: int = 12) -> None:
        """
        Simulate forecast-validate-route cycles across all registered centers.
        Each cycle: centers emit outcome packets for their regional specialties.
        """
        generations = ["CMIP6", "CMIP6", "CMIP6", "operational"]
        print(f"\n--- SIMULATION START ({cycles} cycles) ---")
        for cycle in range(1, cycles + 1):
            print(f"\n[Cycle {cycle:02d}]")
            for center_id, meta in self.agents.items():
                for region in meta["regions"]:
                    # Skill varies by center and region — some are genuinely better
                    base_skill = random.gauss(
                        0.65 if not meta["is_lmic"] else 0.58, 0.12
                    )
                    base_skill = max(0.15, min(0.97, base_skill))
                    # ECMWF gets a regional boost for nino34 (known strength)
                    if center_id == "ecmwf" and region == "nino34":
                        base_skill = min(0.97, base_skill + 0.2)
                    lead_time = random.choice([3, 6, 12])
                    fcast = round(random.gauss(0.0, 1.5), 3)
                    obs = round(fcast + random.gauss(0.0, 0.3), 3)
                    packet = ClimateOutcomePacket(
                        model_id=f"{center_id}_m{random.randint(1, 5):02d}",
                        center_id=center_id,
                        region=region,
                        variable=random.choice(meta["variables"]),
                        lead_time_months=lead_time,
                        forecast_value=fcast,
                        observed_value=obs,
                        skill_score=round(base_skill, 3),
                        validation_period=f"2025-Q{(cycle % 4) + 1}",
                        ensemble_generation=random.choice(generations),
                    )
                    self.route(packet)
            # Elections run every third cycle
            if cycle % 3 == 0:
                print(f"\n  [ELECTIONS @ cycle {cycle}]")
                self.synthesize()
        print("\n--- SIMULATION END ---")
        print(f"Total packets routed: {len(self.packet_log)}")


# ── Entry point ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
    router = ClimateOutcomeRouter()

    # Register eight climate centers including Global South nodes
    router.register_agent(
        "ecmwf", "ECMWF (European Centre)",
        regions=["nino34", "north_atlantic", "arctic", "south_asian_monsoon"],
        variables=["sst", "z500", "precipitation", "wind_850"],
        ensemble_generation="operational",
    )
    router.register_agent(
        "ncar", "NCAR (National Center for Atmospheric Research)",
        regions=["nino34", "north_america", "arctic"],
        variables=["sst", "t2m", "z500", "precipitation"],
        ensemble_generation="CMIP6",
    )
    router.register_agent(
        "gfdl", "NOAA GFDL",
        regions=["nino34", "north_atlantic", "tropics"],
        variables=["sst", "precipitation", "wind_850"],
        ensemble_generation="CMIP6",
    )
    router.register_agent(
        "ukmet", "UK Met Office",
        regions=["north_atlantic", "europe", "arctic"],
        variables=["z500", "t2m", "precipitation"],
        ensemble_generation="operational",
    )
    router.register_agent(
        "csiro", "CSIRO (Australia)",
        regions=["nino34", "australia", "south_pacific"],
        variables=["sst", "precipitation", "t2m"],
        ensemble_generation="CMIP6",
    )
    router.register_agent(
        "cma", "CMA (China Meteorological Administration)",
        regions=["south_asian_monsoon", "east_asia", "nino34"],
        variables=["precipitation", "t2m", "z500"],
        ensemble_generation="CMIP6",
    )
    router.register_agent(
        "kmd", "Kenya Meteorological Department",
        regions=["east_africa", "indian_ocean_dipole"],
        variables=["precipitation", "t2m"],
        ensemble_generation="operational",
        is_lmic=True,
    )
    router.register_agent(
        "bmd", "Bangladesh Meteorological Department",
        regions=["south_asian_monsoon", "bay_of_bengal"],
        variables=["precipitation", "wind_850", "t2m"],
        ensemble_generation="operational",
        is_lmic=True,
    )

    router.run_simulation(cycles=12)
```
The El Niño Routing Scenario
Walk through one cycle of the simulation to make the architecture concrete.
ECMWF's IFS ensemble includes a member (ecmwf_m03) focused on Niño 3.4 SST anomalies at 6-month lead time. In Q4 2024, it forecast a +1.2°C SST anomaly. ERA5 recorded +1.18°C. Skill score: 0.93.
The router emits a ClimateOutcomePacket: region=nino34, variable=sst, lead_time_months=6, skill_score=0.93. Semantic fingerprint identifies it as belonging to the Niño 3.4 / SST / 6-month routing class.
Recipients: NCAR, GFDL, CSIRO, CMA — all registered for nino34 or sst. NCAR's routing weight has been elevated by the CURATE election because its recent Niño 3.4 packets also showed high skill. It receives the packet first.
NCAR's own Niño 3.4 member (ncar_m02) predicted +0.9°C. Observed: +1.18°C. Skill score: 0.61. Its packet routes back to ECMWF, GFDL, CSIRO. After three election cycles, ECMWF's routing weight for Niño 3.4 has risen to 1.43. NCAR's has risen to 1.11. GFDL's — whose recent Niño 3.4 skill averaged 0.44 — sits at 0.94.
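The weights in this walkthrough follow directly from the CURATE rule in the implementation above: routing weight is anchored at 0.5 plus the center's average skill score over its logged packets.

```python
def curate_weight(avg_skill: float) -> float:
    # CURATE rule from the router: weight = 0.5 + average skill, rounded
    return round(0.5 + avg_skill, 3)

# The walkthrough's numbers drop straight out of the rule:
assert curate_weight(0.93) == 1.43  # ECMWF's recent Niño 3.4 skill average
assert curate_weight(0.61) == 1.11  # NCAR's
assert curate_weight(0.44) == 0.94  # GFDL's
```

Note the asymmetry: GFDL sits below the 1.0 default weight, so its packets are sorted behind ECMWF's and NCAR's at every recipient, without anyone issuing a ruling about GFDL's ENSO configuration.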
This is dynamic weighting without a central arbiter. No one decided ECMWF is better at ENSO. The outcome packets decided it. The architecture simply made that information flow.
Now run the Bangladesh scenario. BMD issues a Bay of Bengal cyclogenesis forecast for October 2025. It verifies within 48 hours. The outcome packet — region=bay_of_bengal, variable=wind_850, lead_time_months=1, skill_score=0.81 — routes to CMA and ECMWF, both of which cover South Asian monsoon regions. ECMWF's monsoon ensemble was not watching the Bay of Bengal with that resolution. It now has a validated delta from a node with local observational access. The information transfer happened within hours. Under CMIP batch submission cycles, the equivalent information might appear in the next assessment cycle — years later.
Global South Meteorological Services
The LMIC participation argument is architectural, not rhetorical.
The Kenya Meteorological Department operates surface observation networks across East Africa. It monitors Indian Ocean Dipole interactions with East African coastal rainfall — a phenomenon where global models have historically shown poor resolution because the modeling centers that dominate CMIP are not geographically located near the East African coast and have not historically prioritized high-resolution regional parameterization there.
A KMD forecast for East African short rains, validated against CHIRPS satellite-rainfall data, produces a 512-byte outcome packet. That packet contains the same fields as one from ECMWF. The routing layer does not know KMD has fewer compute resources than ECMWF. It knows KMD's regional skill score.
The COMPETE election ensures that regional skill determines routing density. If KMD consistently produces high-skill forecasts for east_africa and indian_ocean_dipole, those regions route preferentially to KMD for validation. ECMWF's Indian Ocean Dipole ensemble learns from KMD's outcome packets, not through a formal bilateral agreement or a CMIP submission process, but through continuous semantic routing.
SENAMHI in Peru observes ENSO teleconnections over the Amazon basin and Andean precipitation patterns. Bangladesh Met Department's Bay of Bengal observations are unique — no major modeling center has deployed observation density there at the level that BMD maintains. These are not peripheral inputs. They are high-value observations that the current architecture fails to route in real time.
The constraint on LMIC participation in current ensemble synthesis is not institutional — it is architectural. The CMIP portal requires full model submissions. QIS requires only the 512-byte outcome delta. Any node that can run a forecast comparison participates at full architectural standing.
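The size claim is easy to sanity-check. The field values below are assumed for illustration; the fields themselves match the packet structure defined earlier.

```python
import json

# A hypothetical KMD outcome packet, serialized the same way byte_size() does
packet = {
    "model_id": "kmd_m01",
    "center_id": "kmd",
    "region": "east_africa",
    "variable": "precipitation",
    "lead_time_months": 3,
    "forecast_value": 1.24,
    "observed_value": 1.31,
    "skill_score": 0.81,
    "validation_period": "2025-Q2",
    "ensemble_generation": "operational",
    "packet_version": "1.0",
}

size = len(json.dumps(packet).encode("utf-8"))
print(f"{size} bytes")
assert size < 512  # comfortably inside the stated budget
```

Compare that with a single CMIP6 model submission, which runs to terabytes. The bandwidth asymmetry is the architectural point: the delta is cheap enough that observational access, not compute budget, determines who participates.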
Three Elections in Climate Science
The Three Elections are not literal governance mechanisms. They are metaphors for natural selection forces that operate continuously on routing weights. In climate science they map precisely:
CURATE operates on individual ensemble members. An ECMWF member that consistently produces high-skill Niño 3.4 forecasts earns elevated routing weight. Its outcome packets are surfaced to more recipients. An ensemble member with persistent low skill — perhaps because it uses a parameterization that performs poorly in the current climate state — is progressively deprioritized. No one removes it from the ensemble. Its routing weight decays. The architecture self-curates.
VOTE operates on modeling centers as synthesis entities. ECMWF's synthesized seasonal forecast for the North Atlantic improved between 2023 and 2025. Its center-level trust score rises. When ECMWF emits an outcome packet, the routing layer knows that ECMWF's track record of improving synthesis performance warrants elevated trust in its regional assessments. Stagnant centers — those whose skill has plateaued or declined — accumulate lower trust. Trust is not permanent. It is continuously re-earned through outcome validation.
COMPETE operates on regional coverage. If CSIRO consistently produces superior skill for Australian precipitation and the South Pacific, the routing network's density around those regions becomes weighted toward CSIRO. More packets tagged australia route there for validation context. Regional expertise attracts routing volume. Routing volume creates more validation opportunities, which produces more outcome packets, which reinforces the skill signal. The network self-organizes around demonstrated competence.
These are not policy decisions. They emerge from the packet log.
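The VOTE dynamics can be sketched with the update rule from the implementation: +0.05 per validation period in which recent skill beat the long-run average, -0.02 otherwise, clamped to [0.1, 1.0]. The sequence of outcomes below is assumed.

```python
def vote_update(trust: float, improved: bool) -> float:
    # VOTE rule from the router: reward improvement, decay stagnation, clamp
    if improved:
        return min(1.0, trust + 0.05)
    return max(0.1, trust - 0.02)

trust = 0.5  # neutral prior
for improved in [True, True, False, True, True, True]:  # assumed period outcomes
    trust = vote_update(trust, improved)

print(round(trust, 2))  # ≈ 0.73 after five improving periods and one stagnant one
```

The asymmetric step sizes mean trust is slow to lose and slower still to max out: a center needs ten consecutive improving periods to climb from the prior to full trust, which is what "continuously re-earned" means in practice.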
Comparison Table
| Dimension | QIS Outcome Routing | CMIP Ensemble Portal | ESMValTool | Operational NWP |
|---|---|---|---|---|
| Real-time validation loop | Yes — elections run each cycle | No — batch assessment, multi-year lag | No — offline diagnostic tool | Partial — center-internal only |
| LMIC participation | Full — 512-byte delta, no model submission required | Minimal — full model submission required | Minimal — requires standardized model output | None — proprietary center infrastructure |
| Cross-model synthesis | Semantic routing by region/variable/skill | Manual comparison through standardized variables | Standardized metrics, not routed | None — centers run independently |
| Model genealogy bias | Mitigated — routing weights track outcomes, not model count | Present — independent submissions may share code ancestry | Present — member count inflates apparent independence | Not applicable |
| Regional skill retention | Preserved — regional fingerprint in every packet | Lost in ensemble mean | Partially — can compute regional metrics | Center-dependent |
ESMValTool (Eyring et al. 2020) provides standardized model evaluation diagnostics against observations — it is the state of the art for offline model assessment. The gap is real-time feedback. ESMValTool tells you, in retrospect, which model performed well. QIS routes that information forward, continuously, so the next forecast cycle already knows.
The Architecture Constraint
QIS was discovered by Christopher Thomas Trevethan. The architecture is protected under 39 provisional patents.
The climate science application illustrates why the architecture matters as a complete loop rather than as any single component.
The routing layer alone does not solve the ensemble equality problem — you need the packet format to carry validated skill scores, and you need the Three Elections to translate those scores into routing weights that compound over time. The Three Elections alone do not produce LMIC inclusion — you need the packet structure to be lightweight enough that a national meteorological service with limited compute can participate at full standing. The packet structure alone does not enable cross-center synthesis — you need the semantic fingerprint to match regional expertise across heterogeneous institutional systems.
The breakthrough is the complete loop: lightweight outcome packets enable continuous validation, validation enables elections, elections update routing weights, routing weights determine which nodes receive which packets, packet flow determines which validations occur next. Remove any element and the loop does not close.
CMIP6 is an extraordinary scientific achievement. It coordinated more modeling centers, more forcing scenarios, and more output variables than any previous intercomparison project. The ensemble equality problem it leaves open is not a failure of the scientists who built it. It is a constraint of the architecture they had available.
The constraint is now known. The alternative architecture has been discovered.
Citations
- Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., & Taylor, K. E. (2016). Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geoscientific Model Development, 9(5), 1937–1958. https://doi.org/10.5194/gmd-9-1937-2016
- Knutti, R., Sedláček, J., Sanderson, B. M., Lorenz, R., Fischer, E. M., & Eyring, V. (2017). A climate model genealogy: Version 2.0 and coming opportunities. Geophysical Research Letters, 44(17), 8680–8689. https://doi.org/10.1002/2017GL075543
- Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20(2), 130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2
- IPCC. (2021). Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press. https://doi.org/10.1017/9781009157896
- Santer, B. D., Thorne, P. W., Haimberger, L., Taylor, K. E., Wigley, T. M. L., Lanzante, J. R., ... & Wentz, F. J. (2008). Consistency of modelled and observed temperature trends in the tropical troposphere. International Journal of Climatology, 28(13), 1703–1722. https://doi.org/10.1002/joc.1756
- Palmer, T. N. (2012). Toward probabilistic Earth-system modelling. Nature Climate Change, 2(2), 93–95. https://doi.org/10.1038/nclimate1369
- Boer, G. J., Smith, D. M., Cassou, C., Doblas-Reyes, F., Danabasoglu, G., Kirtman, B., ... & Kimoto, M. (2016). The Decadal Climate Prediction Project (DCPP) contribution to CMIP6. Geoscientific Model Development, 9(10), 3751–3777. https://doi.org/10.5194/gmd-9-3751-2016
Part of the "Understanding QIS" series. Previous: Part 36 — QIS for Precision Medicine