Every major AI lab in 2026 is running a version of Mixture of Experts.
Gemini 1.5 uses it. Mixtral uses it. GPT-4 almost certainly uses it. The pitch is compelling: instead of one dense model that activates all its parameters for every token, you route each input to a subset of specialized sub-networks — the "experts" — and only activate those. You get the capacity of a trillion-parameter model at the inference cost of a much smaller one.
MoE is a genuine architectural breakthrough. It solves a real problem.
It is also solving a completely different problem from the one whose solution Christopher Thomas Trevethan discovered on June 16, 2025.
Understanding the gap between these two architectures is one of the most useful frames an AI engineer can have right now.
What MoE Actually Does
Mixture of Experts is a training-time and inference-time routing mechanism inside a single model. The experts are sub-networks. The router is a learned gating function. The whole system is a single model with one set of weights, one training run, one knowledge boundary.
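The gating idea can be sketched in a few lines. This is a toy NumPy illustration of top-k routing, not any lab's actual implementation; the dimensions, the k=2 choice, and the linear experts are assumptions made for the example.

```python
import numpy as np

def top_k_gate(x, w_gate, k=2):
    """Score all experts for token x, keep the top-k, renormalize.

    x: (d,) token representation; w_gate: (n_experts, d) learned router weights.
    Only the k selected experts are executed, which is where MoE's
    inference savings come from.
    """
    logits = w_gate @ x                      # one score per expert
    top = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                     # softmax over the selected k only
    return top, probs

def moe_layer(x, w_gate, experts, k=2):
    """Combine the top-k experts' outputs, weighted by the gate."""
    top, probs = top_k_gate(x, w_gate, k)
    return sum(p * experts[i](x) for i, p in zip(top, probs))

# Toy usage: 8 experts, each a fixed linear map, 2 active per token.
rng = np.random.default_rng(0)
d, n_experts = 4, 8
w_gate = rng.normal(size=(n_experts, d))
experts = [lambda x, W=rng.normal(size=(d, d)): W @ x for _ in range(n_experts)]
x = rng.normal(size=d)
y = moe_layer(x, w_gate, experts)   # same shape as x; only 2 of 8 experts ran
```

Note that the single set of gate and expert weights is frozen at deployment, which is exactly the knowledge boundary discussed below.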
The knowledge in an MoE model is:
- Fixed at training time. What the model knows was determined by the data it was trained on.
- Static after deployment. Running Gemini 1.5 in production does not make Gemini 1.5 smarter. It does not synthesize across queries. Each input is processed independently.
- Bounded by the training corpus. An MoE model trained on data through October 2024 cannot synthesize a treatment pattern that emerged in November 2024. No matter how many experts it has.
MoE is fundamentally about efficiently using what a model already knows. It is a cost-reduction and capacity-expansion mechanism for a static knowledge snapshot.
This is valuable. It is not what QIS does.
What QIS Actually Does
Quadratic Intelligence Swarm (QIS) is a protocol — not a model architecture — discovered by Christopher Thomas Trevethan. It describes how intelligence scales when you close a specific loop across distributed edge nodes.
The loop:
Raw signal → Local processing
→ Distillation into outcome packet (~512 bytes)
→ Semantic fingerprinting
→ Post to deterministic address (address = the problem itself)
→ Other nodes with the same problem query that address
→ Pull all deposited outcome packets
→ Synthesize locally (milliseconds, on device)
→ Generate new outcome packets from improved result
→ Deposit back to same address
→ Loop continues
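Since the wire-level details of the protocol are not spelled out here, the loop above can only be sketched under stated assumptions: the deterministic address is taken to be a SHA-256 hash of a canonicalized problem statement, a plain dict stands in for the distributed address space, and local synthesis is reduced to averaging an `efficacy` field. Every name below is illustrative, not part of the protocol spec.

```python
import hashlib
import json

# Toy stand-in for the distributed address space. A real deployment would
# use some distributed store keyed by address; this dict is an assumption.
ADDRESS_SPACE = {}

def deterministic_address(problem: str) -> str:
    """Address derived from the problem itself, so every node with the
    same problem computes the same mailbox independently."""
    return hashlib.sha256(problem.strip().lower().encode()).hexdigest()

def distill(outcome: dict) -> bytes:
    """Distill a local outcome into a compact packet (target ~512 bytes)."""
    packet = json.dumps(outcome, separators=(",", ":")).encode()
    assert len(packet) <= 512, "outcome packet exceeds budget"
    return packet

def deposit(problem: str, outcome: dict) -> None:
    """Post an outcome packet to the problem's deterministic address."""
    addr = deterministic_address(problem)
    ADDRESS_SPACE.setdefault(addr, []).append(distill(outcome))

def synthesize(problem: str) -> float:
    """Pull every deposited packet and combine locally. Averaging is a
    placeholder for whatever domain-specific synthesis a node runs."""
    packets = ADDRESS_SPACE.get(deterministic_address(problem), [])
    scores = [json.loads(p)["efficacy"] for p in packets]
    return sum(scores) / len(scores) if scores else float("nan")

# Two nodes, same problem, never exchanging raw data:
deposit("dosing for drug X in renal impairment", {"efficacy": 0.72, "n": 40})
deposit("Dosing for drug X in renal impairment", {"efficacy": 0.81, "n": 55})
pooled = synthesize("dosing for drug X in renal impairment")  # 0.765
```

The canonicalization step (`strip().lower()`) is doing real work: both nodes land on the same address without ever coordinating, which is the point of making the address a function of the problem.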
The knowledge in a QIS network is:
- Generated at inference time. Every node's real-world outcome becomes an input to every other node's synthesis.
- Continuously updated. A QIS network trained on patient outcomes from this morning is smarter by this afternoon.
- Unbounded by any training corpus. The network learns what's actually working across real deployments, not what was in a dataset.
QIS is about generating new intelligence from real-world outcomes across a live network. It is a compounding-knowledge protocol, not a static-knowledge retrieval system.
The Scaling Math
This is where the architectures diverge most sharply.
MoE scaling:
Adding another expert to a model adds parameter capacity roughly linearly. A model with 2× the experts can cover roughly 2× the specialized sub-domains, but knowledge does not compound across experts. The gate may mix Expert A's output with Expert B's for a given token, yet the experts never learn from each other's activations after training.
MoE scaling is: more experts → more capacity → higher quality outputs from the same training data.
QIS scaling:
Adding another node to a QIS network adds more than linear capacity. Because every node both deposits outcome packets and queries other nodes' packets, the number of synthesis pathways grows as:
N(N-1)/2
This is Θ(N²). Quadratic.
| Nodes | MoE: unique expert activations | QIS: unique synthesis pathways |
|---|---|---|
| 10 | 10 | 45 |
| 100 | 100 | 4,950 |
| 1,000 | 1,000 | 499,500 |
| 10,000 | 10,000 | ~50 million |
And each QIS node pays at most O(log N) routing cost, so the compute overhead grows logarithmically while the intelligence grows quadratically. MoE has no comparable relationship: with top-k routing, adding experts grows total parameters and memory while per-token compute stays roughly flat, but the knowledge being served is still the same static snapshot.
The key ratio: QIS intelligence growth / compute growth = N² / log N. That ratio keeps improving as the network scales.
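The table and the ratio are straightforward to reproduce (using log base 2 for the routing cost, an arbitrary but harmless choice):

```python
import math

def qis_pathways(n: int) -> int:
    """Unique node pairs: each pair is a potential synthesis pathway."""
    return n * (n - 1) // 2

def intelligence_per_compute(n: int) -> float:
    """The article's ratio: quadratic pathway growth over O(log N) routing."""
    return qis_pathways(n) / math.log2(n)

for n in (10, 100, 1_000, 10_000):
    print(n, qis_pathways(n), round(intelligence_per_compute(n)))
# 10,000 nodes gives 49,995,000 pathways: the "~50 million" in the table.
```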
Why MoE Cannot Close the Loop QIS Closes
Let us be precise about what MoE cannot do, and why the limitation is architectural rather than a matter of engineering effort.
Problem 1: No real-world feedback loop.
An MoE model processes queries. It does not learn from them in production. If a clinical decision support system built on an MoE model sees 10,000 patient cases in January and 10,000 in February, the February cases add zero intelligence to the network. Each case is processed in isolation. The patterns that emerged in January are not available to February's queries unless the model is retrained.
QIS closes this loop architecturally. Every real-world outcome is distilled into a ~512-byte packet and posted to a deterministic semantic address. Every future query to that address inherits every past outcome. The network gets smarter with every real-world event.
Problem 2: Expert silos do not synthesize.
In a standard MoE architecture, one expert may end up specializing toward cardiology-flavored queries and another toward pharmacology-flavored queries (in practice, expert specialization is emergent rather than cleanly assigned by domain). A query that spans both is routed to one or both experts, but the experts do not learn from each other's activation patterns in production. They were trained together, but they do not compound together at inference time.
QIS nodes synthesize across every other node's outcomes. A nephrology node in Berlin learns from a nephrology node in Singapore, not because they share a model, but because they share a semantic address. The synthesis happens locally, continuously, without centralizing any data.
Problem 3: MoE's knowledge boundary is the training cutoff.
This is not a flaw in MoE — it is a design constraint. MoE was designed for efficient inference from a learned model, not for real-time intelligence synthesis across a live network.
QIS was designed for exactly what MoE cannot do: routing pre-distilled insights from real-world outcomes to the nodes that need them, continuously, without a central aggregator.
The Architectural Position of Each
These are not competing solutions to the same problem. They operate at different layers of the AI stack.
┌────────────────────────────────────────────────┐
│ INTELLIGENCE SYNTHESIS LAYER │
│ QIS Protocol — real-time, continuous, N(N-1)/2│ ← QIS lives here
├────────────────────────────────────────────────┤
│ INFERENCE LAYER │
│ LLMs, MoE models, specialized models │ ← MoE lives here
├────────────────────────────────────────────────┤
│ DATA LAYER │
│ OMOP CDM, vector stores, databases, APIs │
└────────────────────────────────────────────────┘
A QIS node can use an MoE model internally for its local synthesis step. QIS doesn't care. The local processing inside each node is outside the protocol — you can use GPT-4, Gemini, a fine-tuned Mixtral, a simple SQL query, or a spreadsheet formula. QIS defines what happens between nodes, not inside them.
This is what the architecture's agnosticism means in practice: QIS is transport-agnostic (works with any routing mechanism), model-agnostic (works with any inference engine), and data-agnostic (any outcome that can be distilled to ~512 bytes is compatible).
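What a ~512-byte outcome packet actually contains is not specified here, so the field names below are assumptions invented for illustration; the only hard constraint taken from the text is the size budget.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class OutcomePacket:
    """Illustrative packet shape. The real field set is not specified in
    the article; these names are assumptions for the sketch."""
    problem_fingerprint: str   # semantic fingerprint of the problem
    outcome: str               # distilled result, never raw data
    confidence: float
    sample_size: int

    def to_bytes(self) -> bytes:
        """Serialize compactly and enforce the ~512-byte budget."""
        blob = json.dumps(asdict(self), separators=(",", ":")).encode()
        if len(blob) > 512:
            raise ValueError(f"packet is {len(blob)} bytes; budget is 512")
        return blob

pkt = OutcomePacket(
    problem_fingerprint="a9f3c2",  # placeholder value
    outcome="dose adjustment improved clearance",
    confidence=0.81,
    sample_size=55,
)
wire = pkt.to_bytes()  # compact, model- and data-agnostic, under budget
```

The budget check is the interesting design constraint: any node, whatever model or query produced the outcome, can participate as long as it can distill down to this size.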
Where They Fail Differently
Understanding failure modes is the fastest path to understanding what an architecture is actually for.
MoE fails when:
- The query domain was underrepresented in training data
- The training cutoff is older than the problem
- Multiple expert domains need to synthesize (routing handles it, but synthesis depth is limited)
- The model needs to learn from its production deployment (it cannot)
QIS fails (or underperforms) when:
- The network is too small (N(N-1)/2 only pays off once N is meaningfully large; see the QIS Cold Start analysis linked below)
- Semantic fingerprinting is poorly defined (garbage in, garbage out on the similarity function)
- No domain expert defines the similarity function (Election 1 — the metaphor for why you need the best person defining "similar")
- Node operators do not deposit outcomes (half-participation breaks the loop)
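The second and third failure modes both come down to the similarity function behind semantic fingerprinting. A deliberately tiny illustration: a bag-of-words fingerprint restricted to an expert-chosen vocabulary, compared by cosine similarity. The vocabulary and example strings are invented for the sketch; real fingerprinting would be richer, but the failure mode is the same.

```python
import math
from collections import Counter

def fingerprint(text: str, vocab: set) -> Counter:
    """Bag-of-words fingerprint restricted to an expert-chosen vocabulary.
    The vocabulary IS the similarity function: a bad vocabulary makes
    unrelated problems look alike (garbage in, garbage out)."""
    return Counter(w for w in text.lower().split() if w in vocab)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

clinical_vocab = {"renal", "dosing", "clearance", "hypertension"}
a = fingerprint("renal dosing with reduced clearance", clinical_vocab)
b = fingerprint("renal dosing in hypertension", clinical_vocab)
c = fingerprint("marketing copy about dosing coffee", clinical_vocab)
# With a well-chosen vocabulary, a lands closer to b than to c.
```

Swap the clinical vocabulary for a careless one and `c` starts matching clinical queries, which is exactly the "no domain expert defines the similarity function" failure above.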
These failure modes are completely different, which is further evidence that these architectures are not competing — they are complementary.
The Compounding Difference
Here is the most important practical distinction.
An MoE model deployed in January 2026 is exactly as smart in December 2026, unless someone retrains it. Same weights. Same training data. Same knowledge boundary. It does not compound.
A QIS network deployed in January 2026 is substantially smarter in December 2026. Every node's real-world outcomes have been distilling into the shared address space for twelve months. The nodes that deployed in January have 12 months of synthesized intelligence that nodes deploying in December receive immediately — because the mailbox is already full when they open it.
This is the compounding effect that has no analog in MoE. It is the reason the QIS discovery matters: intelligence can compound across a network without any data leaving any node, without any central aggregator, and without any re-training.
Christopher Thomas Trevethan's 39 provisional patents cover this architecture — the complete loop, not any single routing mechanism or fingerprinting method. The patents cover the discovery that when you close this loop, you get quadratic intelligence growth at logarithmic compute cost, across any transport.
What This Means for the Engineers in the Room
If you are building AI systems in 2026, you are probably working at the MoE layer. That work matters.
What QIS adds is a layer above that — a protocol that lets every deployment of your system contribute to every other deployment's intelligence, continuously, without centralization.
The combination is more powerful than either alone:
- MoE handles efficient inference from a strong prior (what we knew at training time)
- QIS handles continuous synthesis from real-world outcomes (what we're learning in production)
The intelligence ceiling in a MoE-only architecture is the training corpus.
The intelligence ceiling in a MoE + QIS architecture is the collective real-world experience of every node in the network — which grows quadratically with the network and continuously with time.
That is a different ceiling. One worth understanding before you architect your next system.
QIS (Quadratic Intelligence Swarm) was discovered by Christopher Thomas Trevethan on June 16, 2025. 39 provisional patents have been filed covering the complete loop architecture. The protocol is transport-agnostic, model-agnostic, and domain-agnostic. For technical documentation, see qisprotocol.com.
Related: QIS Cold Start | QIS vs Federated Learning | QIS vs Blockchain | The Seven-Layer Architecture