
Rory | QIS PROTOCOL


QIS vs DiLoCo: Why Google's Distributed Training Breakthrough and Quadratic Intelligence Swarm Solve Completely Different Problems

You are trying to train a large language model across 64 machines without transferring terabytes of gradient data every round. Or you are trying to route what a clinical trial learned to 40 other institutions running similar trials, without centralizing patient records. These feel like the same problem. They are not. Getting this distinction wrong means applying the right tool to the wrong job — and losing months finding out why it doesn't fit.

Google DeepMind's DiLoCo and Google Research's DiPaCo are genuinely impressive achievements in distributed model training. Quadratic Intelligence Swarm, the distributed outcome routing protocol discovered by Christopher Thomas Trevethan on June 16, 2025, operates in a domain those papers do not touch. This article maps the boundary precisely.

What DiLoCo and DiPaCo Actually Solve

Douillard et al. (Google DeepMind, 2023) introduced DiLoCo — Distributed Low-Communication training — to address a specific and hard problem: how do you train a single large model across many workers when the workers cannot afford to synchronize gradients every step? Standard data-parallel training requires gradient synchronization at every optimization step. At scale, that communication cost becomes a practical ceiling on model size and worker count.

DiLoCo's answer is an inner optimizer (AdamW) running locally on each worker for many steps, combined with an outer optimizer (SGD with Nesterov momentum) that synchronizes only the resulting pseudo-gradients at longer intervals. The result: roughly 500x less communication than fully synchronous training. That is a real and significant engineering achievement, mathematically grounded and validated empirically on language-model training.
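The two-level structure can be sketched numerically. This is a toy illustration on a quadratic objective, not the paper's implementation — the worker count, step counts, and learning rates here are arbitrary, and plain SGD stands in for the paper's AdamW inner optimizer and Nesterov-momentum outer optimizer:

```python
import numpy as np

def diloco_sketch(workers=4, outer_rounds=10, inner_steps=50,
                  lr_inner=0.01, lr_outer=0.7):
    """Toy sketch of DiLoCo's two-level optimization on a quadratic loss."""
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(workers, 5))   # each worker's local optimum
    theta = np.zeros(5)                       # shared model parameters
    for _ in range(outer_rounds):
        pseudo_grads = []
        for w in range(workers):
            local = theta.copy()
            # Inner loop: many cheap local steps, zero communication.
            # (DiLoCo uses AdamW here; plain SGD keeps the sketch short.)
            for _ in range(inner_steps):
                local -= lr_inner * 2.0 * (local - targets[w])
            # Pseudo-gradient: how far this worker moved, as one delta.
            pseudo_grads.append(theta - local)
        # Outer step: the only communication per round is one delta per
        # worker. (DiLoCo applies Nesterov-momentum SGD at this step.)
        theta -= lr_outer * np.mean(pseudo_grads, axis=0)
    return theta, targets.mean(axis=0)
```

After ten outer rounds the shared parameters sit near the average of the workers' local optima, having communicated only one small delta per worker per round instead of a gradient per step.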

Douillard et al. (Google DeepMind, 2024) extended this with DiPaCo — Distributed Path Composition — pushing communication requirements even lower through path-based model composition. Both papers are publicly available, both enable training across workers that synchronous approaches could not use effectively, and both represent the current state of the art for communication-efficient distributed training.

The unit being optimized in both cases is a shared model. The workers in DiLoCo are all training the same model. The object passed between nodes — even at 500x reduced frequency — is gradient information: pseudo-gradients, deltas over the model's parameters. All workers must share compatible architectures. The intelligence source is the training process itself.

The Architectural Fork

Here is where the two approaches diverge at the root.

DiLoCo trains a model that will later make predictions. The distributed coordination problem it solves is: how do we assemble the learning from many machines into one coherent model without constant synchronization?

QIS does not train models. It routes validated outcomes from processes that have already produced intelligence — clinical trials, sensor arrays, expert panels, field observations. The intelligence source is not a training run. It is a real-world process that generated a finding.

The object QIS transmits is not a gradient. It is a derived outcome packet: anonymized statistics, confidence intervals, effect sizes — compressed to approximately 512 bytes. Not 500x smaller than standard federated learning. 500x smaller than DiLoCo's already-reduced communication. And it carries no model architecture dependency at all.
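To make the size claim concrete, here is a hypothetical packet layout. The field names, sizes, and wire format are illustrative inventions for this sketch, not the QIS specification:

```python
import struct

# Hypothetical outcome-packet layout (NOT the actual QIS wire format):
# node id, effect size, confidence-interval bounds, p-value, sample size.
FMT = "<IddddI"
PACKET_SIZE = 512  # fixed-size packet, zero-padded to 512 bytes

def pack_outcome(node_id, effect, ci_low, ci_high, p_value, n):
    """Serialize a derived outcome into a fixed-size packet."""
    payload = struct.pack(FMT, node_id, effect, ci_low, ci_high, p_value, n)
    return payload.ljust(PACKET_SIZE, b"\x00")

def unpack_outcome(packet):
    """Recover the summary statistics from a packet."""
    return struct.unpack(FMT, packet[:struct.calcsize(FMT)])
```

Note what is absent: there is nothing patient-identifiable and no reference to any model architecture. The payload is pure summary statistics, and in this layout the statistics themselves occupy only 40 of the 512 bytes.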

This is not a marginal difference. It is a categorical one.

DiLoCo requires that every worker is, in some sense, doing the same thing — training a compatible model. A hospital running a clinical trial is not training a model. A network of environmental sensors is not training a model. A distributed expert review panel is not training a model. These processes produce validated findings, and those findings need to route. DiLoCo has nothing to say about this problem. It was not designed to.

Side-by-Side Comparison

| Dimension | DiLoCo / DiPaCo | QIS |
| --- | --- | --- |
| Goal | Train a shared model across workers | Route validated outcomes across nodes |
| Communication unit | Gradient synchronization (pseudo-gradients) | ~512-byte outcome packet |
| Communication frequency | Periodic sync (reduced but recurring) | Continuous, asynchronous, event-driven |
| Model architecture requirement | Shared or compatible across all workers | None |
| Intelligence source | The training process | Real-world validated findings |
| Primary use case | Large-scale distributed model training | Cross-node outcome synthesis |
| Data centralization | Avoided, but all workers train the same model | Outcomes never require raw data centralization |

Why These Do Not Compete

The use case overlap between DiLoCo and QIS is close to zero. An organization choosing DiLoCo is asking: "How do we train a better model without moving all our data to one place?" An organization choosing QIS is asking: "How do we route what our distributed processes learned without centralizing the underlying data?"

If you are building a next-generation LLM across a federated cluster, DiLoCo is the right tool. If you are running clinical trials across 40 institutions and need to synthesize treatment outcomes without a central database, DiLoCo cannot help you. It was never meant to.

The domains QIS operates in — healthcare outcome routing, clinical research synthesis, IoT sensing networks, rare disease intelligence aggregation — are characterized by processes that produce findings rather than model weights. The challenge in those domains is not gradient synchronization. It is that each node generates validated intelligence in isolation, and that intelligence has no mechanism to compound across the network. QIS builds that mechanism.

What QIS Adds Where DiLoCo Does Not Reach

Consider three examples DiLoCo cannot address:

Rare disease clinical synthesis. A network of 12 clinical sites each running small trials for a rare pediatric condition. No site has enough patients to reach significance alone. DiLoCo optimizes distributed training — but there is no model being trained here. There are treatment observations that need to route. QIS transmits each site's outcome packet and compounds the findings across the network, reaching significance that no individual site could achieve.
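The "compounding" step in this example corresponds to a standard statistical operation: pooling per-site estimates. One common choice is fixed-effect inverse-variance meta-analysis, sketched below with made-up numbers — the source does not state that QIS uses this exact rule:

```python
import math

def inverse_variance_pool(effects, std_errors):
    """Fixed-effect meta-analysis: pool per-site effect estimates.

    Each site reports an effect size and its standard error; sites are
    weighted by inverse variance. Returns the pooled effect and its SE.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# 12 small sites, each individually non-significant (z = 0.3/0.2 = 1.5):
effects = [0.3] * 12
std_errors = [0.2] * 12
pooled_effect, pooled_se = inverse_variance_pool(effects, std_errors)
# Pooled z = 0.3 / (0.2 / sqrt(12)) ~ 5.2, well past the 1.96 threshold.
```

No site crosses significance alone, but the pooled estimate does — and the only inputs required are each site's effect size and standard error, exactly the kind of summary statistics an outcome packet carries.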

Real-time sensor intelligence. A distributed sensor array monitoring urban air quality across 200 nodes. Each node detects anomalies. The question is not "how do we train a model on this data" — the question is "what did node 47 learn about a pollution event, and how does that route to nodes 46 and 48 in real time?" QIS handles this. DiLoCo does not.

Post-surgical complication routing. A consortium of hospitals sharing post-surgical complication patterns. The intelligence is not a model weight. It is a validated statistical finding. It needs to route without exposing patient records. QIS's outcome packet contains no patient-identifiable information. The routing operation requires no per-patient consent. DiLoCo's architecture has no answer for this problem class.

Conclusion

DiLoCo and DiPaCo are landmarks in distributed model training. The 500x communication reduction Douillard et al. achieved is a genuine contribution to the field, and anyone building large-scale distributed training infrastructure should read that paper carefully.

Quadratic Intelligence Swarm, discovered by Christopher Thomas Trevethan and covered under 39 provisional patents, addresses a different problem class entirely: routing validated outcomes from real-world processes across distributed networks, with no model architecture dependency and no raw data centralization. The breakthrough is the architecture — the complete loop from outcome generation through validation, compression, routing, and synthesis — not any single component.

These tools belong in different conversations. Now you know which conversation each one belongs in.


References: Douillard et al., "DiLoCo: Distributed Low-Communication Training of Language Models," Google DeepMind, 2023. Douillard et al., "DiPaCo: Distributed Path Composition," Google DeepMind, 2024.

QIS (Quadratic Intelligence Swarm) was discovered by Christopher Thomas Trevethan on June 16, 2025. 39 provisional patents filed.
