You've built a federated learning pipeline with TensorFlow Federated. The architecture is sound on paper: keep raw data on-device, aggregate model updates centrally, preserve privacy by never moving the raw records. Then reality sets in.
Your synchronization rounds keep failing because three of your eleven hospital sites have unpredictable network windows. Your security audit flags gradient inversion as a documented attack vector. Your bandwidth budget is being consumed by gradient tensors that grow proportionally with your model size. And your rare-disease research partner — a single clinic with forty patients — cannot participate at all because the minimum cohort requirement makes their data statistically invisible.
These are not configuration problems. They are architectural constraints that sit below the level of any hyperparameter you can tune.
This article explains what those constraints are, why they exist, and how a fundamentally different architecture — the Quadratic Intelligence Swarm (QIS) protocol — eliminates them by routing outcomes instead of gradients.
What TensorFlow Federated Actually Does
TensorFlow Federated (TFF) implements federated learning as defined by McMahan et al. (2017): instead of centralizing raw data, you distribute the training computation and centralize the model updates.
The canonical algorithm is FedAvg (Federated Averaging). Each participating client:
- Downloads the current global model weights from the central server
- Runs local training on its private dataset for a fixed number of steps
- Computes the gradient update — the difference between the local weights after training and the global weights before training
- Transmits that gradient tensor back to the aggregation server
- The server averages gradients across all clients (weighted by dataset size) and publishes the new global model
TFF formalizes this through its tff.learning API, which handles the round-trip coordination. A minimal TFF pipeline looks roughly like:
```python
# Conceptual TFF structure (not production code)
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn=create_keras_model,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(1.0),
)

state = iterative_process.initialize()
for round_num in range(NUM_ROUNDS):
    state, metrics = iterative_process.next(state, federated_data)
```
Each call to iterative_process.next() is a synchronization round. Every participating node must be available, responsive, and capable of completing local training within the round window. If a node drops out mid-round, its contribution is typically discarded.
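The aggregation step at the heart of each round can be sketched in plain NumPy. This is an illustrative reduction of the weighted averaging FedAvg performs, not TFF's actual implementation; `fedavg_round` and its signature are invented here:

```python
import numpy as np

def fedavg_round(global_weights, client_deltas):
    """One FedAvg aggregation step: average client weight deltas,
    weighted by local dataset size, then apply to the global weights.

    client_deltas: list of (delta_array, dataset_size) tuples.
    """
    total = sum(n for _, n in client_deltas)
    avg_delta = sum((n / total) * d for d, n in client_deltas)
    return global_weights + avg_delta

# Two clients: 300 samples pushing weights by +1, 100 samples by +3.
global_w = np.zeros(4)
deltas = [(np.ones(4) * 1.0, 300), (np.ones(4) * 3.0, 100)]
print(fedavg_round(global_w, deltas))  # [1.5 1.5 1.5 1.5]
```

The weighting by dataset size is why a straggler or dropout matters: a missing client's delta simply vanishes from the average for that round.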
This is the first architectural constraint: synchronous participation is assumed.
The Four Structural Limits of TF Federated
1. The Synchronization Wall
FedAvg and its variants are round-based protocols. The central server waits for a quorum of clients to complete their local training passes before averaging and advancing. This works well in controlled environments — mobile phones with predictable connectivity, devices under direct organizational management.
It breaks in environments with irregular availability: hospitals with scheduled maintenance windows, edge devices in intermittent-connectivity regions, research sites across time zones with different operational hours. Partial-round submissions are wasted compute. Stragglers block the round from completing. The system is as slow as its slowest reliable participant.
2. The Gradient Leakage Problem
The privacy argument for federated learning rests on a clean claim: raw data never leaves the device. But what leaves the device are gradients — and gradients carry information about the training data.
Zhu et al. (2019) demonstrated that training data can be reconstructed from gradients with high fidelity using a technique called Deep Leakage from Gradients (DLG). Geiping et al. (2020) extended this, showing reconstruction is possible even with gradient compression and aggregation in some configurations.
The gradient is not a privacy-neutral artifact. It is a mathematical fingerprint of the data that produced it. Transmitting gradients is transmitting a lossy, but often recoverable, representation of your training data.
This matters for any deployment under HIPAA, GDPR, or institutional IRB constraints. "We transmitted gradients, not records" is no longer an unqualified defense.
3. Bandwidth Scales with Model Size
In FedAvg, each client transmits a gradient tensor with the same dimensionality as the model's weight space. A BERT-base model has 110 million parameters. Each round, each participating client transmits 110M floats — roughly 440 MB per client per round at float32 precision, or 220 MB at float16.
Compression techniques (gradient quantization, sparsification, delta encoding) reduce this, but the fundamental relationship holds: bandwidth consumption scales with model size and participation count. As models grow larger and participation scales wider, the bandwidth problem compounds.
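The scaling relationship is easy to make concrete. A back-of-envelope helper (hypothetical, not part of any library) reproduces the figures above and shows how the cost compounds with participation:

```python
def per_round_bytes(num_params, bytes_per_param=4, compression_ratio=1.0, clients=1):
    """Upload volume per FedAvg round: each client transmits one value
    per model parameter, optionally reduced by a compression ratio."""
    return int(num_params * bytes_per_param / compression_ratio) * clients

BERT_BASE = 110_000_000  # parameters

print(per_round_bytes(BERT_BASE) / 1e6)               # 440.0 MB: float32, 1 client
print(per_round_bytes(BERT_BASE, 2) / 1e6)            # 220.0 MB: float16, 1 client
print(per_round_bytes(BERT_BASE, 4, 100, 11) / 1e6)   # 48.4 MB: 100x compression, 11 clients
```

Even with aggressive 100x compression, eleven participating sites still move tens of megabytes per round, and the figure grows linearly with every added parameter and every added client.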
There is no version of gradient routing that reduces to a fixed-size transmission independent of model size. The gradient is the model delta — you cannot transmit less than the information it contains.
4. The Minimum Cohort Problem
FedAvg requires meaningful gradient contributions from each participating site. A single site with 40 patients in a rare disease cohort produces a gradient that is statistically dominated by noise. Most production TFF deployments establish minimum client dataset sizes (often 500+ samples) precisely because small-N participants destabilize aggregation.
This is not a bug in TFF — it is a consequence of the aggregation objective. When you are averaging gradients to converge a shared model, you need statistically meaningful inputs from each contributor. The architecture excludes rare-disease clinics, single-practice specialists, and low-volume edge sites by design.
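The small-N instability can be demonstrated directly. Treating per-sample gradients as noisy estimates of a true gradient, the error of a site's mean gradient shrinks only as 1/√N, so a 40-patient site contributes an estimate dominated by noise. A quick simulation (illustrative numbers; `noise_sd` is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_GRAD = 1.0
NOISE_SD = 5.0  # per-sample gradient noise, assumed for illustration

def site_gradient_error(n_samples, trials=2000):
    """Mean absolute error of a site's averaged gradient vs the true gradient,
    estimated over many simulated sites of the same size."""
    grads = TRUE_GRAD + NOISE_SD * rng.standard_normal((trials, n_samples))
    return np.abs(grads.mean(axis=1) - TRUE_GRAD).mean()

for n in (40, 500, 5000):
    print(n, round(site_gradient_error(n), 3))
```

The 40-sample site's gradient error is several times larger than the 500-sample site's, which is why production deployments impose exactly the kind of minimum cohort threshold described above.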
What QIS Does Instead
The Quadratic Intelligence Swarm (QIS) protocol, discovered by Christopher Thomas Trevethan, operates on a different primitive: the outcome packet, not the gradient.
The architectural loop works like this:
- An agent observes something — a measurement, a classification, a prediction, a decision
- It distills that observation into a compact semantic representation: an outcome packet of approximately 512 bytes
- The packet includes a semantic fingerprint — a signature of what the agent knows and what it found
- The packet is routed by similarity: it travels toward agents whose fingerprints are most similar to the sender's
- Similar agents receive the packet and synthesize it with their own local knowledge
- The synthesis produces new outcomes, which continue circulating
Raw data never moves. Gradients never move. What moves is approximately 512 bytes describing what worked.
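The loop above can be sketched in a few lines of Python. This is an illustrative reduction, not the QIS protocol itself: the packet fields mirror the article's conceptual structure, the fingerprint vectors are toy 3-dimensional stand-ins for real semantic fingerprints, and `route` simply picks the top-k cosine-similar agents:

```python
import numpy as np

def make_packet(fingerprint, outcome_vector, confidence, tags):
    """Build a conceptual outcome packet (field names follow the article)."""
    return {"agent_fingerprint": fingerprint,
            "outcome_vector": np.asarray(outcome_vector, dtype=np.float32),
            "confidence": confidence,
            "domain_tags": tags}

def route(packet, agents, k=2):
    """Similarity-routing sketch: deliver the packet to the k agents whose
    fingerprint vectors are most cosine-similar to its outcome vector."""
    v = packet["outcome_vector"]
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(agents, key=lambda a: cos(a["vector"], v), reverse=True)
    return [a["id"] for a in ranked[:k]]

agents = [{"id": "oncology_a", "vector": np.array([1.0, 0.1, 0.0])},
          {"id": "cardiology", "vector": np.array([0.0, 1.0, 0.2])},
          {"id": "oncology_b", "vector": np.array([0.9, 0.2, 0.1])}]
pkt = make_packet("a3f7", [1.0, 0.15, 0.05], 0.91, ["oncology"])
print(route(pkt, agents))  # ['oncology_a', 'oncology_b']
```

The key property is visible even in the toy version: routing depends only on fingerprint similarity, never on model weights, so any transport that can deliver a small record to a named recipient can carry the protocol.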
The routing cost per agent is O(log N) in the number of agents — the same asymptotic cost as a DHT lookup. That bound comes from one specific transport (a DHT); many transport implementations achieve O(1) routing. The protocol is transport-agnostic: the same outcome packets route over shared folders, database tables, pub/sub queues, HTTP APIs, or peer-to-peer networks. The transport is interchangeable.
The intelligence scales quadratically. With N agents each capable of synthesizing with every other agent, the network creates N(N-1)/2 pairwise synthesis opportunities. At 100 agents, that is 4,950 potential synthesis paths. At 1,000 agents, it is 499,500. This is where the name "Quadratic Intelligence Swarm" originates — not from quadratic compute cost, but from quadratic opportunity for intelligence to compound.
The Packet vs. The Gradient
The difference in what gets transmitted is not a detail. It is the entire distinction.
A TF Federated gradient update looks like this conceptually:
```python
gradient_update = {
    "round_id": 47,
    "client_id": "site_boston_general",
    "layer_0_weights": [...110M floats...],
    "layer_0_biases": [...768 floats...],
    # ... all layers ...
    "metadata": {
        "local_steps": 5,
        "dataset_size": 2847,
        "timestamp": 1712345678,
    },
}
# Transmission size: ~220–440 MB per round per client at standard precision
```
The gradient encodes how the model changed when trained on local data. This is mathematically entangled with the local data. Reconstruction attacks exploit this entanglement.
A QIS outcome packet looks like this conceptually:
```python
outcome_packet = {
    "agent_fingerprint": "a3f7...b12c",   # semantic identity hash
    "outcome_vector": [...64 floats...],  # distilled semantic signal
    "confidence": 0.91,
    "domain_tags": ["oncology", "staging", "CT"],
    "synthesis_count": 3,                 # times this signal has been refined
    "timestamp": 1712345678,
}
# Transmission size: ~512 bytes regardless of local model size
```
The outcome packet encodes what the agent found useful. It contains no gradient information, no weight delta, no mathematical fingerprint of training data. It cannot be inverted to recover raw records because no information about raw records is present. Privacy is a structural property of the packet format, not a policy imposed on top of it.
The 512-byte ceiling is intentional and held constant regardless of how large the local model is. An agent running a 70-billion-parameter local model transmits the same size packet as an agent running logistic regression. Bandwidth does not scale with model complexity.
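One way to see how a constant ceiling can be enforced is to pack the conceptual fields into a fixed-size binary frame. The layout below is an assumption for illustration, not the QIS wire format; the point is that every field is either fixed-width or truncated/padded to fit:

```python
import hashlib
import struct

def pack_outcome(fingerprint, outcome_vector, confidence, tags, timestamp):
    """Pack an outcome packet into a fixed 512-byte frame.
    Assumed layout: 32-byte fingerprint hash, 64 float32s (256 B),
    float32 confidence (4 B), uint64 timestamp (8 B), tags truncated
    and padded to fill the remainder.
    """
    assert len(outcome_vector) == 64
    body = hashlib.sha256(fingerprint.encode()).digest()   # 32 B identity
    body += struct.pack("<64f", *outcome_vector)           # 256 B signal
    body += struct.pack("<f", confidence)                  # 4 B
    body += struct.pack("<Q", timestamp)                   # 8 B
    body += ",".join(tags).encode()[: 512 - len(body)]     # tags, truncated
    return body.ljust(512, b"\0")                          # pad to exactly 512

frame = pack_outcome("a3f7b12c", [0.0] * 64, 0.91, ["oncology", "CT"], 1712345678)
print(len(frame))  # 512
```

Whatever the local model looks like, the frame length never changes; only the contents of the 64-float signal vary.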
Direct Comparison
| Dimension | TensorFlow Federated | QIS Protocol |
|---|---|---|
| What gets transmitted | Gradient tensors (model deltas) | ~512-byte outcome packets |
| Transmission size | Scales with model size (MB–GB per round) | Fixed ~512 bytes regardless of model |
| Synchronization requirement | Synchronous rounds; all nodes must be available | Asynchronous mailbox model; nodes participate independently |
| Privacy mechanism | Policy-based; gradients can be inverted (Zhu et al. 2019) | Architecture-based; outcome packets carry no raw data |
| Central aggregator | Yes; FedAvg requires a central coordination server | No; routing is peer-to-peer by semantic similarity |
| Minimum cohort requirement | Yes; small-N sites often excluded | No; works at any N, including N=1 |
| Transport dependency | Requires TFF server infrastructure | Transport-agnostic: folders, DB, HTTP, DHT, pub/sub |
| Routing mechanism | Central round coordination | Semantic fingerprint matching (O(log N) upper bound) |
| Intelligence scaling | Linear with participating nodes | Quadratic: N(N-1)/2 synthesis opportunities |
| Attack surface | Gradient inversion, model poisoning | No gradient surface; no aggregator to attack |
Where Each Architecture Wins
This is an honest comparison. TF Federated and QIS are solving different problems, and both are the right choice in certain contexts.
TF Federated is the right choice when:
- You need a globally converged model artifact. TF Federated produces a single trained model that all participants can download and run identically. QIS produces a continuously evolving distribution of knowledge across agents, not a single model file.
- Your infrastructure is centrally managed and synchronization rounds are feasible. In controlled enterprise environments — managed mobile fleets, regulated device ecosystems — the round-based model is predictable and auditable.
- You need compatibility with existing ML tooling. TFF integrates with TensorFlow's training loop, Keras model definitions, and standard evaluation infrastructure. If your team is already TF-native, TFF has minimal adoption friction.
- Your privacy threat model tolerates gradient transmission. If you have assessed gradient inversion risk and determined it is acceptable for your deployment (e.g., sufficiently large aggregation batches, gradient compression reducing reconstruction fidelity), TFF's privacy properties may be sufficient.
QIS is the right choice when:
- Raw data privacy must be architectural, not policy-dependent. If your threat model requires that nothing recoverable from training data ever leaves the edge — medical records, financial transactions, behavioral data — the outcome packet model is the only architecture that satisfies this by construction.
- Nodes participate asynchronously and intermittently. Field sensors, clinical sites with variable uptime, research partners across jurisdictions — the mailbox model handles these without round failures or straggler penalties.
- You are working with rare cohorts or small-N sites. A single clinic with 40 patients can contribute outcome packets that carry legitimate signal. The minimum cohort problem does not exist in outcome routing.
- Bandwidth is constrained at the edge. Transmitting ~512 bytes per outcome versus hundreds of megabytes of gradients per round is a reduction of nearly six orders of magnitude. This matters on satellite links, cellular connections, and bandwidth-metered cloud deployments.
- You need to eliminate the central aggregator. If your architecture cannot tolerate a central point of coordination — regulatory constraint, organizational trust, infrastructure resilience — QIS operates without one.
- Your network needs to scale to thousands or tens of thousands of nodes. At N=10,000 agents, the quadratic synthesis surface creates ~50 million potential intelligence pathways. No central aggregator architecture can maintain those connections simultaneously. QIS routes to them asynchronously.
The Core Distinction
Federated learning was built on a sound intuition: move computation to the data rather than data to the computation. TensorFlow Federated implements this well within the constraints of the gradient-aggregation model.
But the gradient is still a model artifact. Moving gradients to an aggregator is still moving something derived from data to a central point. The aggregator still exists. The synchronization requirement still exists. The bandwidth proportionality to model size still exists.
QIS does not improve on federated learning. It changes the unit of exchange entirely.
FL moves models to data. QIS moves outcomes to addresses.
An outcome is not a model delta. It is a distilled signal: this is what I observed, this is what worked, this is how confident I am, here is who should see it. The semantic fingerprint routes it to agents who can synthesize with it productively. No aggregator collects all the signals. No round waits for all participants. No gradient carries recoverable information about the training set.
The architecture is the breakthrough. Not the routing mechanism specifically — you can route outcome packets over DHT, over HTTP, over shared folders, over a database table. Not the fingerprint format specifically — the semantic similarity function is domain-configurable. Not the 512-byte limit specifically — that is a design parameter, not a law.
The breakthrough is the complete loop: outcome distillation → semantic fingerprinting → similarity routing → synthesis → new outcomes. That loop, operating at scale, produces quadratic intelligence growth from linear participation. It does not require synchronization because it is not averaging. It does not expose gradients because it does not transmit them. It does not exclude small-N sites because it is not converging a shared model.
If you are building distributed intelligence systems and hitting the architectural walls of gradient-based federated learning, the question worth asking is not "how do I fix my TFF pipeline?" It is "am I routing the right thing?"
Gradients describe how a model changed. Outcomes describe what worked. Those are not the same question, and they do not require the same architecture.
Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol on June 16, 2025. 39 provisional patents filed.